
khushbirkaur/SQL-Data-Cleaning-Project


🧹 SQL Data Cleaning Project


📌 Project Overview

This project cleans and transforms a raw dataset of company layoffs using SQL. The goal is to make the data accurate, consistent, and ready for analysis.


🎯 Objectives

  • Remove duplicate records
  • Handle missing/null values
  • Standardize inconsistent data formats
  • Convert columns to appropriate data types
  • Prepare the dataset for further analysis

🛠️ Tools & Technologies

  • SQL (MySQL)
  • Database Management System (DBMS)

📂 Dataset Description

The dataset includes information such as:

  • Company name
  • Industry
  • Total layoffs
  • Date
  • Country
  • Stage

📌 Note:

  • The original raw dataset (uncleaned) is also included in this repository.
  • It contains inconsistencies, null values, and duplicates.
  • The SQL script demonstrates how this raw data is cleaned step-by-step.

⚙️ Data Cleaning Steps

1. Removing Duplicates

  • Identified duplicate rows using window functions (ROW_NUMBER() with PARTITION BY)
  • Deleted the redundant copies, keeping one row per record
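The duplicate-removal step can be sketched as follows. This is a minimal illustration, not the project's script: it runs the SQL through Python's built-in `sqlite3` module so no MySQL server is needed, and the table name `layoffs`, its columns, and the sample rows are assumptions based on the dataset description. ROW_NUMBER() numbers the copies inside each PARTITION BY group, so every row with `row_num > 1` is a redundant copy. Targeting those rows through `rowid` is SQLite-specific; in MySQL a common alternative is to copy the data into a staging table that stores `row_num` as a real column and delete from that table.

```python
import sqlite3

# Hypothetical table and sample rows; names are assumptions, not the project's schema.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE layoffs (
        company TEXT, industry TEXT, total_laid_off INTEGER,
        date TEXT, country TEXT, stage TEXT
    )
""")
rows = [
    ("Acme", "Retail", 100, "3/6/2023", "United States", "Post-IPO"),
    ("Acme", "Retail", 100, "3/6/2023", "United States", "Post-IPO"),  # exact duplicate
    ("Beta", "Crypto", 50, "1/10/2023", "India", "Series B"),
]
conn.executemany("INSERT INTO layoffs VALUES (?, ?, ?, ?, ?, ?)", rows)

# Number the copies of each record in a CTE; row_num > 1 marks a duplicate.
# rowid is SQLite's implicit row identifier (MySQL tables have no equivalent by default).
conn.execute("""
    WITH numbered AS (
        SELECT rowid AS rid,
               ROW_NUMBER() OVER (
                   PARTITION BY company, industry, total_laid_off, date, country, stage
               ) AS row_num
        FROM layoffs
    )
    DELETE FROM layoffs
    WHERE rowid IN (SELECT rid FROM numbered WHERE row_num > 1)
""")

remaining = conn.execute("SELECT COUNT(*) FROM layoffs").fetchone()[0]
print(remaining)  # 2
```

Partitioning by every column means only fully identical rows count as duplicates; partitioning by fewer columns would also merge rows that differ in the omitted fields.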

2. Standardizing Data

  • Fixed inconsistent company names
  • Standardized inconsistent values in the industry and country fields
  • Trimmed leading and trailing whitespace
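A minimal sketch of the standardization step, using Python's `sqlite3` as a stand-in for MySQL. The messy sample values and the `LIKE 'Crypto%'` rule are hypothetical illustrations of the kind of variants such a dataset can contain; the real variants, and the rules the project applies, may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE layoffs (company TEXT, industry TEXT, country TEXT)")
# Hypothetical messy values: stray spaces, spelling variants, a trailing period.
conn.executemany(
    "INSERT INTO layoffs VALUES (?, ?, ?)",
    [
        ("  Acme ", "Crypto", "United States"),
        ("Beta", "Crypto Currency", "United States."),
        ("Gamma", "CryptoCurrency", "India"),
    ],
)

# Trim leading and trailing whitespace from company names.
conn.execute("UPDATE layoffs SET company = TRIM(company)")

# Collapse spelling variants into one canonical industry label with CASE WHEN.
conn.execute("""
    UPDATE layoffs
    SET industry = CASE
        WHEN industry LIKE 'Crypto%' THEN 'Crypto'
        ELSE industry
    END
""")

# Drop a trailing period from country names.
# SQLite uses RTRIM(x, '.'); the MySQL form is TRIM(TRAILING '.' FROM country).
conn.execute("UPDATE layoffs SET country = RTRIM(country, '.')")

print(conn.execute("SELECT company, industry, country FROM layoffs ORDER BY company").fetchall())
```

Running a `SELECT DISTINCT industry ... ORDER BY 1` before and after each update is a simple way to spot the remaining variants.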

3. Handling Null Values

  • Replaced null or blank values where a reliable value was available, and removed rows that could not be recovered
  • Checked that the remaining values are consistent and meaningful
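One way to handle missing values is sketched below with Python's `sqlite3` standing in for MySQL. It is an assumed approach, not necessarily the project's: blank strings are first turned into NULL, a missing industry is then filled from another row of the same company, and rows with no layoff figure are deleted as an example policy. The sample rows are hypothetical. MySQL rejects a subquery on the table being updated, so a MySQL version would typically use a self-join instead, e.g. `UPDATE layoffs t1 JOIN layoffs t2 ON t1.company = t2.company SET t1.industry = t2.industry WHERE t1.industry IS NULL AND t2.industry IS NOT NULL;`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE layoffs (company TEXT, industry TEXT, total_laid_off INTEGER)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO layoffs VALUES (?, ?, ?)",
    [
        ("Acme", "Retail", 100),
        ("Acme", "", 40),      # blank industry, recoverable from the other Acme row
        ("Beta", None, None),  # nothing to recover from and no layoff figure
    ],
)

# Treat empty strings as missing so a single IS NULL check covers both cases.
conn.execute("UPDATE layoffs SET industry = NULL WHERE industry = ''")

# Fill a missing industry from another row of the same company, if one exists.
# If no such row exists, the subquery yields NULL and the value stays NULL.
conn.execute("""
    UPDATE layoffs
    SET industry = (
        SELECT t2.industry
        FROM layoffs AS t2
        WHERE t2.company = layoffs.company
          AND t2.industry IS NOT NULL
        LIMIT 1
    )
    WHERE industry IS NULL
""")

# Example policy (an assumption): drop rows that still have no layoff figure.
conn.execute("DELETE FROM layoffs WHERE total_laid_off IS NULL")

print(conn.execute("SELECT * FROM layoffs ORDER BY total_laid_off DESC").fetchall())
```

Filling before deleting matters: a row that is later removed can still serve as the source of a missing value for its company.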

4. Date Formatting

  • Converted the text date column to the SQL DATE type using STR_TO_DATE()
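SQLite has no STR_TO_DATE(), so the sketch below registers a small Python function under that name to mimic it, then converts text dates to ISO `YYYY-MM-DD` strings. The `month/day/year` input format and the sample rows are assumptions. In MySQL the equivalent would be `` UPDATE layoffs SET `date` = STR_TO_DATE(`date`, '%m/%d/%Y'); `` followed by `` ALTER TABLE layoffs MODIFY COLUMN `date` DATE; `` so the column actually changes type.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE layoffs (company TEXT, date TEXT)")
# Hypothetical sample rows stored as month/day/year text.
conn.executemany(
    "INSERT INTO layoffs VALUES (?, ?)",
    [("Acme", "3/6/2023"), ("Beta", "12/16/2022"), ("Gamma", None)],
)


def str_to_date(text, fmt):
    """Stand-in for MySQL's STR_TO_DATE(): parse text with fmt, return 'YYYY-MM-DD'."""
    if text is None:
        return None  # keep NULLs as NULL, like STR_TO_DATE(NULL, ...)
    return datetime.strptime(text, fmt).date().isoformat()


# Register the Python function so SQL can call str_to_date(column, format).
conn.create_function("str_to_date", 2, str_to_date)

conn.execute("UPDATE layoffs SET date = str_to_date(date, '%m/%d/%Y')")

print(conn.execute("SELECT company, date FROM layoffs ORDER BY company").fetchall())
```

Storing dates as real DATE values (or ISO strings in SQLite) lets them sort and compare chronologically, which text such as `12/16/2022` does not.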

5. Data Transformation

  • Structured the dataset for easier querying and analysis

📊 Key SQL Concepts Used

  • ROW_NUMBER()
  • PARTITION BY
  • CTE (Common Table Expressions)
  • UPDATE statements
  • CASE WHEN
  • STR_TO_DATE()
  • Data filtering and transformation

📈 Outcome

  • Clean and structured dataset
  • Improved data quality
  • Ready for analysis and visualization

👩‍💻 Author

Khushbir Kaur Bamrah, Artificial Intelligence & Data Science Student

About

SQL Data Cleaning Project using MySQL. This project cleans a raw layoffs dataset by handling null values, removing duplicates, standardizing formats, and preparing the data for analysis.
