My Python Data Science Journey
Welcome to my repository! This space documents my learning journey through data science, with a primary focus on mastering Python's core data manipulation and analysis libraries: NumPy and Pandas.
The goal of this repository is not just to store code, but to build a structured, real-world portfolio of my skills, showing my progress from fundamental concepts to more complex machine learning applications.
🎯 Goals
Master the Fundamentals: Gain a deep and practical understanding of NumPy for numerical computation and Pandas for data manipulation.
Build a Portfolio: Create a collection of clean, well-commented, and practical examples that showcase my abilities.
Explore Machine Learning: Apply data science libraries to solve real-world problems and build introductory machine learning models.
Track Progress: Visibly track my development and create a log of my journey for myself and others to see.
.
├── .gitignore
├── README.md
├── numpy_examples/
│   ├── 01_array_basics.py
│   ├── 02_indexing_and_slicing.py
│   └── 03_math_and_stats.py
├── pandas_examples/
│   ├── 01_series_and_dataframes.py
│   ├── 02_data_loading_and_cleaning.py
│   └── 03_data_aggregation.py
└── ml_projects/
    └── 01_simple_linear_regression.py
numpy_examples/: Contains scripts focused on core NumPy functionalities.
pandas_examples/: Holds scripts dedicated to Pandas operations, from data structures to data cleaning.
ml_projects/: A place for more comprehensive projects where NumPy and Pandas are used for building and training simple machine learning models.
🗺️ Learning Roadmap
This is the path I am following. Each new concept and project will be added to the repository as it's completed.
➡️ NumPy Deep Dive
Understanding the ndarray object.
Array creation, manipulation, and broadcasting.
Indexing, slicing, and boolean operations.
Mathematical and statistical functions.
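To give a flavor of the NumPy topics above, here is a minimal sketch touching array creation, broadcasting, slicing, boolean indexing, and a column-wise statistic. The variable names (`grid`, `row_offsets`, and so on) are illustrative assumptions and are not taken from the scripts in `numpy_examples/`.

```python
import numpy as np

# Array creation: a 3x4 grid holding the integers 0..11.
grid = np.arange(12).reshape(3, 4)

# Broadcasting: the 1-D array of shape (4,) is applied to each of the 3 rows.
row_offsets = np.array([10, 20, 30, 40])
shifted = grid + row_offsets

# Slicing: first two rows, last two columns.
corner = grid[:2, -2:]

# Boolean indexing: keep only the even values (result is 1-D).
evens = grid[grid % 2 == 0]

# Statistics along an axis: the mean of each column.
column_means = grid.mean(axis=0)

print(shifted.shape)   # (3, 4)
print(corner)
print(evens)
print(column_means)
```

Broadcasting works here because the trailing dimension of `grid` (4) matches the length of `row_offsets`, so no explicit loop or tiling is needed.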
➡️ Pandas Mastery
Working with Series and DataFrame objects.
Importing data from various sources (CSV, Excel).
Data cleaning: handling missing values, duplicates, and incorrect data types.
Data wrangling: grouping (groupby), merging, and reshaping data.
Time series analysis.
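The cleaning and grouping steps listed above could look roughly like the sketch below. The `raw` table and its column names are made-up data for illustration only; the actual scripts in `pandas_examples/` may be organized quite differently.

```python
import pandas as pd

# Hypothetical sales records containing a duplicate row, a missing value,
# and numeric amounts stored as strings.
raw = pd.DataFrame({
    "region": ["north", "south", "south", "north", "south"],
    "amount": ["100", "250", "250", None, "50"],
    "day": ["2024-01-01", "2024-01-01", "2024-01-01",
            "2024-01-02", "2024-01-02"],
})

cleaned = (
    raw.drop_duplicates()               # remove the repeated "south" row
       .dropna(subset=["amount"])       # drop rows with a missing amount
       .assign(
           # Convert the string amounts to numbers.
           amount=lambda df: pd.to_numeric(df["amount"]),
           # Parse the day strings into datetime values.
           day=lambda df: pd.to_datetime(df["day"]),
       )
)

# Grouping: total amount per region.
totals = cleaned.groupby("region")["amount"].sum()
print(totals)
```

Chaining the steps keeps `raw` untouched, which makes it easy to compare the data before and after cleaning.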
➡️ Introductory Machine Learning
Using scikit-learn with NumPy and Pandas.
Project 1: Simple Linear Regression.
Project 2: Classification (e.g., Logistic Regression or KNN).
Data visualization with Matplotlib or Seaborn.
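As a rough preview of Project 1, the sketch below fits a simple linear regression on synthetic data. Note the assumption: it uses plain NumPy least squares (`np.linalg.lstsq`) rather than scikit-learn, purely to keep the example dependency-light; the project itself is planned around scikit-learn. The data and the true coefficients (slope 3, intercept 2) are invented for the illustration.

```python
import numpy as np

# Synthetic data: y = 3x + 2 plus a little noise (seeded for repeatability).
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 3.0 * x + 2.0 + rng.normal(scale=0.5, size=x.shape)

# Design matrix with a column of ones so the fit includes an intercept.
X = np.column_stack([x, np.ones_like(x)])

# Ordinary least squares: find coef minimizing ||X @ coef - y||^2.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
slope, intercept = coef

# R^2: fraction of the variance in y explained by the fitted line.
predictions = X @ coef
r_squared = 1 - np.sum((y - predictions) ** 2) / np.sum((y - y.mean()) ** 2)

print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r_squared:.3f}")
```

The recovered slope and intercept should land close to 3 and 2, which is a quick sanity check that the fit behaves as expected before moving to real data.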
📫 Connect with Me
I'm always open to feedback and collaboration. Feel free to connect with me!
LinkedIn: https://www.linkedin.com/in/safwansaba/