This project focuses on cleaning and transforming raw data using SQL. The dataset contains layoffs-related information, and the goal is to make it accurate, consistent, and ready for analysis.
- Remove duplicate records
- Handle missing/null values
- Standardize inconsistent data formats
- Convert data types properly
- Prepare dataset for further analysis

Tools used:
- SQL (MySQL)
- Database Management System (DBMS)
The dataset includes information such as:
- Company name
- Industry
- Total layoffs
- Date
- Country
- Stage
📌 Note:
- The original raw dataset (uncleaned) is also included in this repository.
- It contains inconsistencies, null values, and duplicates.
- The SQL script demonstrates how this raw data is cleaned step-by-step.

Cleaning steps performed:
- Identified duplicate rows using window functions
- Deleted redundant records
- Fixed inconsistent company names
- Cleaned industry and country fields
- Trimmed unwanted spaces
- Replaced or removed null values where necessary
- Ensured meaningful data consistency
- Converted the date column into a proper SQL DATE format using `STR_TO_DATE()`
- Structured dataset for better querying and analysis
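
The steps above can be sketched roughly as follows in MySQL 8. This is a minimal illustration, not the repository's actual script: the table names (`layoffs_staging`, `layoffs_staging2`), the column names and types, the `'Crypto%'` pattern, and the `'%m/%d/%Y'` date format are all assumptions made for the example and should be adjusted to match the real data.

```sql
-- Hypothetical staging table; column names and types are assumed
-- from the fields listed in this README.
CREATE TABLE IF NOT EXISTS layoffs_staging (
    company        VARCHAR(255),
    industry       VARCHAR(255),
    total_laid_off INT,
    `date`         VARCHAR(20),   -- raw text date; format assumed to be m/d/Y
    country        VARCHAR(255),
    stage          VARCHAR(255)
);

-- 1. Identify duplicates: rows sharing every column value get row_num > 1.
WITH duplicate_cte AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY company, industry, total_laid_off,
                            `date`, country, stage
           ) AS row_num
    FROM layoffs_staging
)
SELECT * FROM duplicate_cte WHERE row_num > 1;

-- 2. Materialize the numbered rows, then delete the redundant copies.
--    (MySQL cannot DELETE through a CTE, hence the second table.)
CREATE TABLE layoffs_staging2 AS
SELECT *,
       ROW_NUMBER() OVER (
           PARTITION BY company, industry, total_laid_off,
                        `date`, country, stage
       ) AS row_num
FROM layoffs_staging;

DELETE FROM layoffs_staging2 WHERE row_num > 1;

-- 3. Standardize text fields.
UPDATE layoffs_staging2 SET company = TRIM(company);

-- Illustrative only: collapse spelling variants into one label.
UPDATE layoffs_staging2
SET industry = CASE
                   WHEN industry LIKE 'Crypto%' THEN 'Crypto'
                   ELSE industry
               END;

-- Strip a trailing period from country names, if any.
UPDATE layoffs_staging2 SET country = TRIM(TRAILING '.' FROM country);

-- 4. Handle missing values: treat empty strings as NULL, then fill
--    industry from another row of the same company where one exists.
UPDATE layoffs_staging2 SET industry = NULL WHERE industry = '';

UPDATE layoffs_staging2 t1
JOIN layoffs_staging2 t2
  ON t1.company = t2.company
SET t1.industry = t2.industry
WHERE t1.industry IS NULL
  AND t2.industry IS NOT NULL;

-- 5. Convert the text date to a real DATE column.
UPDATE layoffs_staging2
SET `date` = STR_TO_DATE(`date`, '%m/%d/%Y');

ALTER TABLE layoffs_staging2 MODIFY COLUMN `date` DATE;

-- 6. Drop the helper column once deduplication is done.
ALTER TABLE layoffs_staging2 DROP COLUMN row_num;
```

Working on a copy of the raw table keeps the original data intact, so any step can be re-run or revised. If the client runs with safe-update mode enabled, the `UPDATE` and `DELETE` statements without a key-based `WHERE` clause may be rejected until that mode is turned off for the session.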

SQL concepts used:
- `ROW_NUMBER()`
- `PARTITION BY`
- CTE (Common Table Expressions)
- `UPDATE` statements
- `CASE WHEN`
- `STR_TO_DATE()`
- Data filtering and transformation

Results:
- Clean and structured dataset
- Improved data quality
- Ready for analysis and visualization

Khushbir Kaur Bamrah, Artificial Intelligence & Data Science Student