A4 - Machine Learning
Description
- In this assignment, you shall implement the Naïve Bayes machine learning algorithm and use it on some datasets.
- You can use any programming language you like.
- You can work alone or in a group of two students.
- You shall present your application and code at an oral examination.
- You are not required to build a REST web service for this assignment.
Submission instructions
See the Deadlines and Submissions page.
Requirements
Grade | Requirements |
---|---|
E |
|
C-D |
|
A-B |
|
Note! The purpose of this assignment is for you to learn how to implement Naïve Bayes, encode label strings as integers, calculate accuracy scores, perform cross-validation, and generate a confusion matrix. This functionality is often available in machine learning libraries such as Weka or Scikit-learn, which you are not allowed to use. You may use library functions for loading and shuffling the data and for all necessary mathematical operations, data structures, etc.
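As an illustration of the label-encoding step mentioned in the note, here is a minimal sketch in plain Python. The function name `encode_labels` and the choice to number labels in order of first appearance are assumptions made for this example, not part of the assignment specification.

```python
def encode_labels(labels):
    """Map each distinct label string to an integer.

    Integers are assigned in order of first appearance (an assumed
    convention; sorting the labels first would work equally well).
    """
    mapping = {}
    encoded = []
    for label in labels:
        if label not in mapping:
            mapping[label] = len(mapping)  # next unused integer
        encoded.append(mapping[label])
    return encoded, mapping


# Example usage with made-up label strings
y, mapping = encode_labels(["setosa", "virginica", "setosa", "versicolor"])
```

Keeping the returned `mapping` makes it possible to translate integer predictions back into the original label strings.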
Code structure requirements
NaiveBayes class | |
---|---|
void fit ( X:float[][], y:int[] ) | Trains the model on input examples X and labels y. |
int[] predict ( X:float[][] ) | Classifies examples X and returns a list of predictions. |
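One possible shape for this class is sketched below. It assumes the Gaussian variant of Naïve Bayes (one normal distribution per class and input variable), which suits float inputs but is not stated in the table above. The population standard deviation, the small floor on it, and the helper name `_log_posterior` are likewise choices made for this sketch.

```python
import math


class NaiveBayes:
    """Gaussian Naive Bayes sketch: one normal distribution per (class, feature)."""

    def fit(self, X, y):
        """Estimate class priors and per-feature means and standard deviations."""
        self.classes = sorted(set(y))
        self.priors, self.means, self.stds = {}, {}, {}
        n_features = len(X[0])
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.priors[c] = len(rows) / len(X)
            self.means[c] = [
                sum(r[j] for r in rows) / len(rows) for j in range(n_features)
            ]
            # Population standard deviation; the 1e-9 floor (an assumption)
            # avoids division by zero when a feature is constant in a class.
            self.stds[c] = [
                max(
                    math.sqrt(
                        sum((r[j] - self.means[c][j]) ** 2 for r in rows) / len(rows)
                    ),
                    1e-9,
                )
                for j in range(n_features)
            ]

    def _log_posterior(self, x, c):
        """Log prior plus the sum of log Gaussian densities for example x."""
        total = math.log(self.priors[c])
        for v, mu, sd in zip(x, self.means[c], self.stds[c]):
            total += -math.log(sd * math.sqrt(2 * math.pi)) - (v - mu) ** 2 / (2 * sd ** 2)
        return total

    def predict(self, X):
        """Return, for each example, the class with the highest log posterior."""
        return [max(self.classes, key=lambda c: self._log_posterior(x, c)) for x in X]
```

Working with log probabilities rather than multiplying densities directly avoids numerical underflow when there are many input variables.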
Other methods | |
---|---|
float accuracy_score ( preds:int[], y:int[] ) | Calculates accuracy score for a list of predictions. |
int[][] confusion_matrix ( preds:int[], y:int[] ) | Generates a confusion matrix and returns it as an integer matrix. |
int[] crossval_predict ( X:float[][], y:int[], folds:int ) | Runs n-fold cross-validation and returns a list of predictions. |
Input data (a float matrix with input variables as columns and examples as rows) is usually denoted X. The categories/labels (a list of integers) are usually denoted y. Predictions (a list of integers) shall be compared with the actual labels (y) when calculating the accuracy score (percentage of correct predictions) and generating the confusion matrix.
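The comparison of predictions with actual labels could be sketched as follows. Two conventions here are assumptions: `accuracy_score` returns a fraction between 0 and 1 (multiply by 100 for a percentage), and the confusion matrix uses rows for actual labels and columns for predicted labels. Labels are assumed to be integers starting at 0.

```python
def accuracy_score(preds, y):
    """Fraction of predictions that equal the actual labels."""
    correct = sum(1 for p, t in zip(preds, y) if p == t)
    return correct / len(y)


def confusion_matrix(preds, y):
    """Count matrix with actual labels as rows and predictions as columns.

    Assumes labels are integers 0..k-1 (the row/column convention is a
    choice made for this sketch).
    """
    n_classes = max(max(preds), max(y)) + 1
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for p, t in zip(preds, y):
        matrix[t][p] += 1
    return matrix


# Example usage with made-up predictions and labels
example_preds = [0, 1, 1, 2]
example_y = [0, 1, 2, 2]
acc = accuracy_score(example_preds, example_y)
cm = confusion_matrix(example_preds, example_y)
```

In this example three of four predictions are correct, and the single error (an example of class 2 predicted as class 1) appears off the diagonal in row 2, column 1.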
Test cases
You can verify your results against the results in Web ML Experimenter. The Iris dataset is built into Web ML (click the Try Iris dataset button), and the Banknote authentication dataset can be uploaded from the CSV file. Note that the cross-validation results can differ slightly depending on how the data is split into folds, but the accuracy you get should be very close to the accuracy in Web ML.
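To show where fold splitting enters the picture, here is a minimal cross-validation sketch. It assumes that `X` and `y` are Python lists that have already been shuffled, and it splits them into contiguous folds whose sizes differ by at most one. The extra `make_model` parameter is hypothetical, added so the sketch stays independent of any particular classifier; it should return a fresh object with `fit` and `predict` methods.

```python
def crossval_predict(X, y, folds, make_model):
    """Return one out-of-fold prediction per example.

    make_model is a hypothetical zero-argument factory returning a new,
    untrained model with fit(X, y) and predict(X) methods.
    """
    n = len(X)
    preds = [0] * n
    for k in range(folds):
        # Contiguous test fold [start, stop); assumes the data is shuffled.
        start = k * n // folds
        stop = (k + 1) * n // folds
        X_train = X[:start] + X[stop:]
        y_train = y[:start] + y[stop:]
        model = make_model()  # fresh model so folds do not leak into each other
        model.fit(X_train, y_train)
        preds[start:stop] = model.predict(X[start:stop])
    return preds
```

Because every example lands in exactly one test fold, the returned list lines up with the labels and can be passed straight to an accuracy or confusion matrix calculation. A different shuffle or fold layout changes which examples are trained on together, which is why scores may differ slightly between implementations.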