Description
In this assignment you shall classify two datasets, Spiral and Diabetes, using machine learning algorithms. You shall present and explain your code in an oral examination.
Development environment
See the Development environment page.
Submission instructions
See the Deadlines and Submissions page.
Requirements
- Write code for classifying the two datasets Spiral and Diabetes (see the Datasets page). You are not allowed to use a GUI, such as the one available in Weka
- You can choose between the following machine learning libraries: Weka (Java), Scikit-learn (Python) or Caret (R)
- Classify each dataset with at least three algorithms of your choice, using either cross-validation or train-test split
- Each result shall be presented with accuracy score and confusion matrix
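The requirements above can be sketched in Scikit-learn roughly as follows. This is a minimal illustration, not a reference solution: the data is a synthetic stand-in generated with make_classification (an assumption, since the file format of Spiral and Diabetes is described on the Datasets page), and the three classifiers and the choice of 5 folds are arbitrary examples.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): replace X and y with the features and
# labels loaded from the Spiral or Diabetes files on the Datasets page.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, random_state=0
)

# Three example algorithms; any three supported by the library would do.
classifiers = {
    "kNN": KNeighborsClassifier(n_neighbors=5),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
}

results = {}
for name, clf in classifiers.items():
    # 5-fold cross-validation: every sample is predicted by a model
    # that did not see it during training.
    y_pred = cross_val_predict(clf, X, y, cv=5)
    acc = accuracy_score(y, y_pred)
    cm = confusion_matrix(y, y_pred)  # rows: true class, columns: predicted class
    results[name] = (acc, cm)
    print(f"{name}: accuracy = {acc:.3f}")
    print(cm)
```

Collecting the out-of-fold predictions with cross_val_predict makes it possible to compute one accuracy score and one confusion matrix over the whole dataset, which matches the way results are requested above.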
You can verify that your results are correct by running the same dataset through Web ML Experimenter, a web-based machine learning tool. Note that the results for cross-validation and train-test split can differ slightly because the dataset is split randomly, but they should be close to each other.
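The effect of the random split can be seen in a small experiment. The sketch below again uses synthetic stand-in data, and the 80/20 split, the kNN classifier and the helper name holdout_accuracy are all assumptions chosen for illustration. It repeats a train-test split with different seeds and reports the accuracy for each.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption): replace with Spiral or Diabetes.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, random_state=0
)


def holdout_accuracy(seed):
    """Accuracy of kNN on one stratified 80/20 split (hypothetical helper)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
    return accuracy_score(y_test, clf.predict(X_test))


# The same model and data, but a different random split each time.
scores = [holdout_accuracy(seed) for seed in range(5)]
print(scores)
```

Fixing random_state makes a single run reproducible, which is helpful when comparing your numbers with another tool or explaining them in the oral examination.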
Getting started
Before starting the assignment, watch lectures L00 and L01. Lectures L02 to L05 are also worth watching.
Here are some useful guides to get you started:
Scikit-learn:
- Your First Machine Learning Project in Python Step-By-Step
- A Gentle Introduction to k-fold Cross-Validation
Weka:
R:
- Supervised Machine Learning: The Caret Package
- How To Estimate Model Accuracy in R Using The Caret Package
If you need help
Ask questions in the Slack channel or contact the main instructor to book an online meeting.