A collection of data mining and machine learning projects implemented in Python using Jupyter Notebooks. Each project applies core data science techniques — from regression and classification to clustering — on real-world datasets.
| File | Dataset | Techniques Used |
|---|---|---|
| Data_mining_pj1(Utilities).ipynb | Utilities | Clustering (K-Means), Exploratory Data Analysis |
| Data_mining_pj2(Airfares).ipynb | Airfares | Linear Regression, Feature Selection |
| Data_mining_pj3(Baseball_Hitters).ipynb | Baseball Hitters | Regression, LASSO/Ridge, Model Evaluation |
| Data_mining_pj4(SpamBase).ipynb | SpamBase | Classification, Logistic Regression, Naive Bayes |

Explores utility company data to identify groups of similar companies using unsupervised learning. Applies K-Means clustering and visualizes cluster characteristics.
Dataset: Utilities.csv
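
As a rough illustration of the clustering step, the sketch below runs K-Means on standardized features. It does not read Utilities.csv: the data is synthetic, and the two-feature layout and `n_clusters=2` are assumptions made for the example, not settings taken from the notebook.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for numeric utility-company features (assumption:
# the real column names and values are not shown in this README).
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[1.0, 10.0], scale=0.2, size=(10, 2))
group_b = rng.normal(loc=[3.0, 50.0], scale=0.2, size=(10, 2))
X = np.vstack([group_a, group_b])

# Standardize so that features on larger scales do not dominate
# the Euclidean distances K-Means relies on.
X_scaled = StandardScaler().fit_transform(X)

# n_clusters=2 is an illustrative choice, not the notebook's setting.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

print(labels)                    # one cluster label per row
print(kmeans.cluster_centers_)   # centroids in standardized units
```

With real data, the number of clusters would usually be chosen by inspecting something like an elbow plot or silhouette scores rather than fixed in advance.
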
Analyzes domestic airfare pricing to identify key factors that drive ticket costs. Builds regression models to predict fare prices across routes.
Dataset: Airfares.csv
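
A minimal sketch of the regression workflow, assuming a train/test split and an ordinary least-squares model. The data is synthetic and the predictor names (`distance`, `competitors`) are hypothetical; they are not taken from Airfares.csv.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic fares generated from two hypothetical predictors.
rng = np.random.default_rng(0)
n = 200
distance = rng.uniform(100, 2000, size=n)   # hypothetical: route distance
competitors = rng.integers(1, 6, size=n)    # hypothetical: carriers on route
fare = 50 + 0.1 * distance - 5 * competitors + rng.normal(0, 5, size=n)

X = np.column_stack([distance, competitors])
X_train, X_test, y_train, y_test = train_test_split(
    X, fare, test_size=0.25, random_state=0
)

# Fit on the training split, then evaluate on held-out routes.
model = LinearRegression().fit(X_train, y_train)
r2 = model.score(X_test, y_test)

print("coefficients:", model.coef_)
print("test R^2:", round(r2, 3))
```

The sign and size of each coefficient give a first indication of which factors push fares up or down, which is the kind of question this project asks of the real data.
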
Predicts baseball player salaries using batting statistics. Implements and compares multiple regression approaches including regularization methods to handle multicollinearity.
Dataset: Hitters.csv
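
To illustrate why regularization helps with multicollinearity, the sketch below fits Ridge and Lasso to synthetic data in which two predictors are nearly collinear. The variable names (`hits`, `at_bats`) and the `alpha` values are assumptions chosen for the example, not the notebook's actual columns or tuning.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200
hits = rng.normal(100, 30, size=n)
at_bats = 3.5 * hits + rng.normal(0, 5, size=n)   # nearly collinear with hits
unrelated = rng.normal(0, 1, size=n)              # carries no signal
salary = 5.0 * hits + rng.normal(0, 20, size=n)   # synthetic target

X = np.column_stack([hits, at_bats, unrelated])

# Scale first: both penalties depend on the magnitude of the coefficients.
ridge = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X, salary)
lasso = make_pipeline(StandardScaler(), Lasso(alpha=5.0)).fit(X, salary)

ridge_coefs = ridge.named_steps["ridge"].coef_
lasso_coefs = lasso.named_steps["lasso"].coef_

print("ridge:", ridge_coefs)   # shrinks coefficients toward zero
print("lasso:", lasso_coefs)   # can set some coefficients exactly to zero
```

In practice `alpha` would be selected by cross-validation, for example with `RidgeCV` or `LassoCV`, rather than fixed by hand.
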
Builds a spam email classifier using features extracted from email content. Compares classification models for accuracy, precision, and recall.
Dataset: Spambase.csv
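
The comparison described above could look roughly like this. The sketch uses a synthetic binary dataset from `make_classification` in place of Spambase.csv, and pairing logistic regression with Gaussian Naive Bayes is an assumption about which variants the notebook uses.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic two-class data standing in for the spam / not-spam features.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5,
    n_clusters_per_class=1, class_sep=2.0, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "naive_bayes": GaussianNB(),
}

# Fit each model and record the three metrics on the held-out split.
results = {}
for name, model in models.items():
    y_pred = model.fit(X_train, y_train).predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
    }

for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

For spam filtering, precision is often watched most closely, since a legitimate email flagged as spam tends to cost more than a spam message that slips through.
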
- Python 3.x
- Jupyter Notebook
- pandas, numpy
- scikit-learn
- matplotlib, seaborn
- Clone the repository: `git clone https://github.com/janmejoykar1807/Python_Data_Mining_Projects.git`
- Install dependencies: `pip install pandas numpy scikit-learn matplotlib seaborn jupyter`
- Launch Jupyter Notebook: `jupyter notebook`
- Open any `.ipynb` file to explore the project.
Janmejoy Kar: Data Science learner applying Python, R, and SQL to data analysis and predictive modeling. GitHub Profile