Instructions:
-
Select a Dataset:
- Choose any publicly available dataset of interest. You can explore sources such as Kaggle or any other reputable data repository. Do not use proprietary or private data.
- Ensure that the dataset is suitable for classification or regression.
- Use the code we used in class. Do not use ANY other code.
-
Data Exploration and Preprocessing:
- Preprocess the data as necessary, including handling missing values, encoding categorical variables, scaling numerical features, etc.
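
The preprocessing steps named above could look roughly like the following. This is only an illustrative sketch, not the code from class, which remains the required reference. The toy DataFrame and its column names (age, income, city) are made up for illustration, and median / most-frequent imputation is one assumed strategy among several.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: one missing numeric value and one missing category.
df = pd.DataFrame({
    "age": [25.0, 32.0, np.nan, 41.0],
    "income": [40000.0, 52000.0, 61000.0, 75000.0],
    "city": ["Austin", "Boston", np.nan, "Austin"],
})

numeric_cols = ["age", "income"]
categorical_cols = ["city"]

# Numeric columns: fill missing values with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: fill missing values with the most frequent category,
# then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocessor = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", categorical_steps, categorical_cols),
])

X_processed = preprocessor.fit_transform(df)
print(X_processed.shape)  # 2 scaled numeric columns + 2 one-hot columns
```

On a real dataset, the preprocessor would normally be fitted on the training split only and then applied to the test split, so that no information from the test data leaks into training.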
-
Algorithm Selection and Implementation:
- Select any machine learning algorithm(s) that we have covered so far in class. Do not use dimensionality reduction techniques or Neural Networks.
- Implement the selected algorithm(s) using scikit-learn only.
-
Model Training and Evaluation:
- Split the dataset into training and testing sets.
- Train the model on the training set and evaluate its performance on the testing set using appropriate metrics.
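
As a rough sketch of the split / train / evaluate steps above (again, the class code takes precedence): the example assumes a classification task, uses scikit-learn's bundled breast cancer dataset as a stand-in for your chosen dataset, and picks logistic regression only as a placeholder for whichever algorithm was covered in class. The 80/20 split and `random_state=42` are arbitrary choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in dataset (assumption): replace with your own features X and target y.
X, y = load_breast_cancer(return_X_y=True)

# Hold out 20% of the rows for testing; stratify keeps class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scaling is fitted on the training data only, inside the pipeline.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Evaluate on the held-out test set.
y_pred = model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {test_accuracy:.3f}")
print(classification_report(y_test, y_pred))
```

For a regression task, the same structure applies with a regressor and metrics such as `mean_squared_error` or `r2_score` from `sklearn.metrics` in place of accuracy.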
-
Results Analysis and Interpretation:
- Interpret the model’s performance and discuss any insights gained from the analysis.
-
Documentation and Submission:
- Prepare a Jupyter notebook and submit it via Canvas. You do not need to write a report; the notebook alone is sufficient.