A web-based application for classifying penguin species using the Palmer Penguins dataset and the Gaussian Naive Bayes algorithm. Built with Streamlit.
This application allows users to explore the Palmer Penguins dataset, train a machine learning model, and make predictions on penguin species based on physical measurements.
- Data Preview: View the first 10 rows of the dataset
- Statistics: Display descriptive statistics for key features
- Model Training: Automatic training of a Gaussian Naive Bayes model with an 80/20 train-test split
- Model Evaluation: Accuracy score and classification report
- Interactive Prediction: Input measurements to predict penguin species
- Responsive UI: Clean and user-friendly interface
Experience the application live at: https://ircham3.streamlit.app
- Clone the repository:
    git clone <repository-url>
    cd dataapps-2
- Create a virtual environment (optional but recommended):
    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
- Install dependencies:
    pip install streamlit pandas scikit-learn pillow matplotlib
The application uses penguins_cleaned.csv, which contains measurements from the Palmer Penguins dataset including:
- Species (Adelie, Chinstrap, Gentoo)
- Bill length and depth (mm)
- Flipper length (mm)
- Body mass (g)
- Island and sex (not used in modeling)
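As a rough illustration of the dataset layout described above, the sketch below loads a CSV and keeps only the modeling columns. The column names (`bill_length_mm`, `bill_depth_mm`, `flipper_length_mm`, `body_mass_g`, `species`) are an assumption based on the common Palmer Penguins naming and may not match penguins_cleaned.csv exactly. The helper `load_penguins` is hypothetical and not part of main.py, and the inline sample rows are illustrative only.

```python
import io

import pandas as pd

# Assumed column names (common Palmer Penguins naming); adjust to match your CSV.
FEATURES = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]
TARGET = "species"


def load_penguins(source):
    """Hypothetical helper: read the CSV and keep only complete modeling rows."""
    df = pd.read_csv(source)
    missing = [col for col in FEATURES + [TARGET] if col not in df.columns]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    # Island and sex are dropped here because they are not used in modeling.
    return df[FEATURES + [TARGET]].dropna()


# Illustrative rows standing in for penguins_cleaned.csv (the last row is incomplete).
sample_csv = io.StringIO(
    "species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex\n"
    "Adelie,Torgersen,39.1,18.7,181,3750,male\n"
    "Gentoo,Biscoe,46.1,13.2,211,4500,female\n"
    "Chinstrap,Dream,46.5,,192,3500,female\n"
)
sample = load_penguins(sample_csv)
print(sample)
```

To run this against the real file, pass the path instead of the in-memory sample, for example `load_penguins("penguins_cleaned.csv")`.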
- Ensure penguins_cleaned.csv is in the same directory as main.py
- Run the application:
    streamlit run main.py
- Open your browser to the provided local URL (usually http://localhost:8501)
- Algorithm: Gaussian Naive Bayes
- Features: Bill length, bill depth, flipper length, body mass
- Target: Species
- Preprocessing: Label encoding for species, stratified train-test split
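The modeling steps listed above could be sketched roughly as follows with scikit-learn. This is a minimal sketch, not the code in main.py: the function `train_and_evaluate`, the column names, and the fixed `seed` are assumptions, and the demo data is synthetic (random values around rough, made-up centers) so that the example runs without the CSV.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import LabelEncoder

# Assumed column names; adjust to match penguins_cleaned.csv.
FEATURES = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]


def train_and_evaluate(df, seed=42):
    """Hypothetical helper: label-encode species, split 80/20 (stratified), fit GaussianNB."""
    encoder = LabelEncoder()
    y = encoder.fit_transform(df["species"])
    X = df[FEATURES]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    model = GaussianNB().fit(X_train, y_train)
    y_pred = model.predict(X_test)
    report = classification_report(y_test, y_pred, target_names=encoder.classes_)
    return model, encoder, accuracy_score(y_test, y_pred), report


# Synthetic stand-in data: 40 rows per species around illustrative centers.
rng = np.random.default_rng(0)
centers = {
    "Adelie": [39.0, 18.0, 190.0, 3700.0],
    "Chinstrap": [49.0, 18.5, 196.0, 3700.0],
    "Gentoo": [47.5, 15.0, 217.0, 5000.0],
}
frames = []
for species, center in centers.items():
    values = rng.normal(loc=center, scale=[2.0, 1.0, 5.0, 300.0], size=(40, 4))
    frame = pd.DataFrame(values, columns=FEATURES)
    frame["species"] = species
    frames.append(frame)
demo = pd.concat(frames, ignore_index=True)

model, encoder, accuracy, report = train_and_evaluate(demo)
print(f"accuracy: {accuracy:.3f}")
print(report)

# Predict the species for one set of measurements (values are illustrative).
new_penguin = pd.DataFrame([[39.1, 18.7, 181.0, 3750.0]], columns=FEATURES)
predicted = encoder.inverse_transform(model.predict(new_penguin))[0]
print(f"predicted species: {predicted}")
```

Stratifying on the encoded labels keeps the species proportions the same in the training and test sets, which matters when one class is smaller than the others.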
- Add data visualization plots
- Implement additional ML algorithms
- Add model comparison functionality
- Include hyperparameter tuning
This project is open-source. Feel free to use and modify.
Contributions are welcome! Please open an issue or submit a pull request.