Real-world machine learning projects — from exploratory data analysis to production-ready predictive models.
Each project tackles a genuine business or scientific problem using structured ML workflows: data cleaning, EDA, feature engineering, model selection, hyperparameter tuning, and performance evaluation.
Goal: Predict used car prices across two decades of Mercedes-Benz listings using model, mileage, AMG/4MATIC flags, and trim level.
Highlights: 9 models compared · Overfitting analysis · Log transformation · Ridge vs XGBoost
Result: Ridge Regression won with CV R² = 0.708 and a near-zero overfit gap (0.03); XGBoost overfit severely despite a high training score
Regression · Ridge · XGBoost · Overfitting Detection · Feature Engineering
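The overfit gap quoted above can be illustrated with a small sketch. It assumes the gap is defined as the training score minus the validation (or cross-validated) score, which this README does not spell out. The function names and the 0.05 tolerance are hypothetical.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def overfit_gap(train_score, validation_score):
    """Assumed definition: training score minus validation (or CV) score."""
    return train_score - validation_score


def flag_overfit(train_score, validation_score, threshold=0.05):
    """Return True when the gap exceeds a hypothetical tolerance."""
    return overfit_gap(train_score, validation_score) > threshold
```

With illustrative scores (not taken from the project), `flag_overfit(0.74, 0.71)` stays under the tolerance, while `flag_overfit(0.98, 0.70)` is flagged as overfitting.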
Goal: Predict student employment outcomes using academic performance, coding skills, and internship experience.
Highlights: 8 algorithms benchmarked · 5-fold cross-validation · Overfitting detection · Feature importance
Result: XGBoost achieved 96.80% test accuracy (CV mean: 96.48%); the Decision Tree's accuracy dropped 14.2% from train to test
Classification · XGBoost · Model Benchmarking · Cross-Validation
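A benchmark of this kind can be sketched roughly as follows. This is a minimal illustration, not the project's notebook: the data is synthetic (`make_classification`), only three scikit-learn models stand in for the eight benchmarked, and scikit-learn's `GradientBoostingClassifier` is swapped in for XGBoost to keep the example dependency-light.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the student employment data (an assumption).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}

results = {}
for name, model in models.items():
    # 5-fold cross-validation on the training split only.
    cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
    model.fit(X_train, y_train)
    results[name] = {
        "cv_mean": cv_scores.mean(),
        "train_acc": model.score(X_train, y_train),
        "test_acc": model.score(X_test, y_test),
    }

for name, r in results.items():
    gap = r["train_acc"] - r["test_acc"]
    print(f"{name:20s} CV={r['cv_mean']:.3f} train={r['train_acc']:.3f} "
          f"test={r['test_acc']:.3f} gap={gap:.3f}")
```

Reporting the train-to-test gap next to the CV mean is what makes an overfitting model, such as an unpruned decision tree, stand out in the comparison.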
Goal: Classify regional water scarcity levels (Low / Moderate / High) across 200+ countries from 2000–2025.
Highlights: 8 models compared · Negligible overfitting validated · Groundwater depletion as #1 predictor (56% importance)
Result: XGBoost achieved 99.62% test accuracy with a train-test gap of only 0.003
Classification · XGBoost · LightGBM · Environmental ML · Multi-Class
Goal: Predict median home values using socioeconomic, environmental, and structural features.
Highlights: 7 models compared · Log transformation · RobustScaler · Multicollinearity mitigation
Result: Tuned Gradient Boosting achieved R² = 0.88, RMSE = $2,940; lower-income population share (%) and average number of rooms were the top drivers
Regression · Gradient Boosting · EDA · Feature Engineering
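One way to combine a log-transformed target with `RobustScaler` is sketched below. The data is synthetic and the pipeline layout is an assumption. The README names the techniques but not how they were wired together.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler

# Synthetic, right-skewed target with outliers in one unused feature (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
X[::25, 4] *= 50  # inject feature outliers
y = 20.0 * np.exp(0.5 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.1, size=400))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# RobustScaler centres on the median and scales by the IQR,
# so extreme values have less influence than with mean/std scaling.
pipeline = make_pipeline(RobustScaler(), GradientBoostingRegressor(random_state=0))

# Fit on log1p(y); predictions are mapped back with expm1 automatically.
model = TransformedTargetRegressor(
    regressor=pipeline, func=np.log1p, inverse_func=np.expm1
)
model.fit(X_train, y_train)
pred = model.predict(X_test)

print(f"Test R^2 (original scale): {r2_score(y_test, pred):.3f}")
```

Tree ensembles are largely insensitive to monotonic feature scaling, so the scaler matters mainly for the linear and distance-based models in such a comparison. Keeping it in a shared pipeline simply lets every model receive identical preprocessing.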
Goal: Dual-dataset regression — King County residential pricing + medical insurance cost estimation.
Highlights: 4 models compared · GridSearchCV tuning · Polynomial features · SVR vs Decision Tree
Result: Tuned Decision Tree achieved R² = 0.7908 on 21,613 housing records
Regression · Decision Tree · Polynomial Regression · SVR · GridSearchCV
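Tuning a decision tree with `GridSearchCV` might look like the sketch below. The parameter grid and the synthetic data are assumptions for illustration. The grid actually searched in the project is not stated here.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the housing records (an assumption).
X, y = make_regression(
    n_samples=600, n_features=8, n_informative=5, noise=15.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hypothetical grid: depth and leaf size both limit tree complexity.
param_grid = {
    "max_depth": [3, 5, 8, None],
    "min_samples_leaf": [1, 5, 20],
}

search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid,
    cv=5,
    scoring="r2",
)
search.fit(X_train, y_train)

print("Best params:", search.best_params_)
print(f"CV R^2: {search.best_score_:.3f}")
print(f"Test R^2: {search.score(X_test, y_test):.3f}")
```

Because the search refits the best estimator on the full training split by default, `search.score` on the held-out test set gives an estimate that the tuning never saw.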
Goal: Predict individual annual medical insurance costs from health and demographic data.
Highlights: End-to-end pipeline · Custom prediction function · Feature importance via coefficients
Result: Linear Regression achieved R² = 0.78, RMSE = $5,796; smoking status was identified as the dominant cost driver
Regression · Linear Regression · EDA · Feature Importance
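Coefficient-based feature importance and a custom prediction function could be sketched as follows. Everything here is illustrative: the column names, the `predict_cost` helper, and the synthetic data are assumptions. The data is deliberately generated so that `smoker` dominates, so the ranking demonstrates the technique and does not reproduce the project's result.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = ["age", "bmi", "children", "smoker"]  # hypothetical column names

# Synthetic data with a made-up generating process (assumption).
rng = np.random.default_rng(1)
n = 500
X = np.column_stack([
    rng.integers(18, 65, n),   # age
    rng.normal(30, 6, n),      # bmi
    rng.integers(0, 5, n),     # children
    rng.integers(0, 2, n),     # smoker flag (0 or 1)
]).astype(float)
y = (250 * X[:, 0] + 300 * X[:, 1] + 500 * X[:, 2]
     + 20000 * X[:, 3] + rng.normal(0, 3000, n))

# Standardising first makes coefficient magnitudes comparable across features.
model = make_pipeline(StandardScaler(), LinearRegression()).fit(X, y)
coefs = model.named_steps["linearregression"].coef_
ranking = sorted(zip(FEATURES, coefs), key=lambda fc: abs(fc[1]), reverse=True)

for name, coef in ranking:
    print(f"{name:10s} {coef:12.1f}")


def predict_cost(age, bmi, children, smoker):
    """Hypothetical helper: predict annual cost for one person."""
    row = np.array([[age, bmi, children, int(smoker)]], dtype=float)
    return float(model.predict(row)[0])
```

A call such as `predict_cost(40, 30.0, 1, True)` returns a single estimate. Comparing it with the same inputs and `smoker=False` shows the effect of that one feature in isolation.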
Goal: Explore Netflix's content library size and subscription pricing across countries.
Highlights: Multi-library visualisation (Plotly · Seaborn · Matplotlib) · Pandas Profiling · Regional pricing patterns
Result: Identified significant regional disparities in both content availability and pricing strategy
EDA · Data Visualisation · Plotly · Pandas Profiling
Core ML       │ Scikit-learn · XGBoost · LightGBM · Gradient Boosting
Analysis      │ Pandas · NumPy · SciPy
Visualisation │ Matplotlib · Seaborn · Plotly
Environment   │ Jupyter Notebook · Kaggle Kernels
Every project follows a consistent, professional workflow:
- Problem Definition — What business question are we answering?
- Exploratory Data Analysis — Distributions, correlations, outliers
- Feature Engineering — Encoding, scaling, new feature creation
- Model Selection — Multiple algorithms compared objectively
- Hyperparameter Tuning — Grid search / cross-validation
- Evaluation — Metrics relevant to the problem type
Full interactive notebooks with outputs, visualisations, and commentary are available on Kaggle:
👉 kaggle.com/brahimenesulusoy
I build custom ML solutions for businesses — sales forecasting, churn prediction, price estimation, and more.
- 🔗 LinkedIn: ibrahim-enes-ulusoy
- 🌐 Portfolio: enesulusoy-portfolio.netlify.app
- 📧 Email: c.enes.eng@gmail.com