# A4 – Machine Learning

## Description

• In assignment 4, you shall implement the Naïve Bayes machine learning algorithm and apply it to some datasets
• It can be implemented in any programming language you like
• You can work alone or in a group of two students
• You shall present your application and code at an oral examination
• You are not required to build a REST web service for this assignment

## Submission instructions

See the Deadlines and Submissions page.

## Requirements

Grade E
• Implement the Naïve Bayes algorithm, using the code structure below (you are allowed to add more classes and methods if needed)
• Train the model on the Iris and Banknote authentication datasets (see Datasets page)
• Calculate classification accuracies for both datasets (use all data for both training and testing)

Grade C-D
• Implement code for generating confusion matrices, using the code structure below

Grade A-B
• Implement code for n-fold cross-validation, using the code structure below
• It shall be possible to use 3, 5 or 10 folds (it is okay if your implementation also supports other numbers of folds)
• Calculate the accuracy score for 5-fold cross-validation on both datasets

Note! The purpose of this assignment is for you to learn how to implement Naïve Bayes, encode label strings as integers, calculate accuracy scores, perform cross-validation and generate confusion matrices. This functionality is often available in machine learning libraries such as Weka or Scikit-learn, which you are not allowed to use. You are allowed to use library functions for loading and shuffling the data, and for all necessary mathematical operations, data structures etc.
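
As an illustration of the label-encoding step mentioned in the note, here is a minimal Python sketch. The function name `encode_labels` and the example label strings are assumptions made for this example; they are not part of the required code structure.

```python
def encode_labels(labels):
    """Map each distinct label string to an integer (0, 1, 2, ...).

    Returns the encoded list and the sorted list of class names, so that
    classes[i] is the original string for integer label i.
    """
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    return [index[name] for name in labels], classes


# Illustrative label strings (assumed, not taken from the dataset files).
y, classes = encode_labels(["setosa", "virginica", "setosa"])
print(y)        # [0, 1, 0]
print(classes)  # ['setosa', 'virginica']
```

Keeping the `classes` list around makes it possible to translate integer predictions back to the original strings when presenting results.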

## Code structure requirements

NaiveBayes class

| Method | Description |
|---|---|
| `void fit ( X:float[][], y:int[] )` | Trains the model on input examples X and labels y |
| `int[] predict ( X:float[][] )` | Classifies examples X and returns a list of predictions |
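
One way the `fit`/`predict` structure could look in Python is sketched below. The assignment does not state which Naïve Bayes variant to use; this sketch assumes a Gaussian likelihood per feature, which is a common choice for continuous inputs. The small constant added to the variance and the use of the population variance (dividing by n) are also assumptions.

```python
import math


class NaiveBayes:
    """Gaussian Naive Bayes sketch (the Gaussian likelihood is an assumption)."""

    def fit(self, X, y):
        """Estimate a prior and per-feature mean/variance for each class."""
        self.classes = sorted(set(y))
        self.priors, self.means, self.variances = {}, {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            n = len(rows)
            self.priors[c] = n / len(X)
            self.means[c] = [sum(col) / n for col in zip(*rows)]
            # Population variance plus a small constant (an assumption) to
            # avoid division by zero for constant features.
            self.variances[c] = [
                sum((v - m) ** 2 for v in col) / n + 1e-9
                for col, m in zip(zip(*rows), self.means[c])
            ]

    def _log_posterior(self, x, c):
        """Unnormalised log P(c | x): log prior plus summed log densities."""
        total = math.log(self.priors[c])
        for v, m, var in zip(x, self.means[c], self.variances[c]):
            total += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return total

    def predict(self, X):
        """Return the class with the highest log posterior for each row."""
        return [max(self.classes, key=lambda c: self._log_posterior(x, c)) for x in X]


# Toy data for illustration only (not one of the assignment datasets).
model = NaiveBayes()
model.fit([[1.0, 2.0], [1.2, 1.8], [5.0, 6.0], [5.2, 6.1]], [0, 0, 1, 1])
print(model.predict([[1.1, 1.9], [5.1, 6.0]]))  # [0, 1]
```

Working with log probabilities instead of multiplying raw densities avoids numerical underflow when there are many features.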

Other methods

| Method | Description |
|---|---|
| `float accuracy_score ( preds:int[], y:int[] )` | Calculates accuracy score for a list of predictions |
| `int[][] confusion_matrix ( preds:int[], y:int[] )` | Generates a confusion matrix and returns it as an integer matrix |
| `int[] crossval_predict ( X:float[][], y:int[], folds:int )` | Runs n-fold cross-validation and returns a list of predictions |
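
The two evaluation methods could be sketched in Python as follows. This assumes labels are already encoded as integers 0..k-1. The orientation of the matrix (rows for actual labels, columns for predicted labels) is a convention chosen for this sketch, and the accuracy is returned as a fraction that can be multiplied by 100 to get a percentage.

```python
def accuracy_score(preds, y):
    """Return the fraction of predictions equal to the actual labels."""
    correct = sum(1 for p, actual in zip(preds, y) if p == actual)
    return correct / len(y)


def confusion_matrix(preds, y):
    """Return counts as matrix[actual][predicted].

    Assumes labels are integers 0..k-1; the row/column orientation is a
    convention chosen here, not something the assignment specifies.
    """
    k = max(max(preds), max(y)) + 1
    matrix = [[0] * k for _ in range(k)]
    for p, actual in zip(preds, y):
        matrix[actual][p] += 1
    return matrix


# Made-up labels and predictions for illustration.
y = [0, 0, 1, 1, 2, 2]
preds = [0, 1, 1, 1, 2, 0]
print(f"{100 * accuracy_score(preds, y):.1f}%")  # 66.7%
print(confusion_matrix(preds, y))  # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
```

The diagonal of the matrix holds the correct predictions, so its sum divided by the number of examples equals the accuracy.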

Input data (a float matrix with examples as rows and input variables as columns) is usually denoted X, and the categories/labels (a list of integers) are usually denoted y. Predictions (a list of integers) shall be compared with the actual labels (y) when calculating the accuracy score (the percentage of correct predictions) and generating the confusion matrix.

## Test cases

You can verify your results against the results in Web ML Experimenter. The Iris dataset is built into Web ML (click the Try Iris dataset button), and the Banknote authentication dataset can be uploaded from the csv file. Note that the cross-validation results can differ slightly due to differences in how the data is split into folds, but the accuracy you get should be close to the accuracy in Web ML.
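
To illustrate why fold splitting affects the result, here is a minimal Python sketch of n-fold cross-validation. The `make_model` and `seed` parameters are hypothetical additions to the required `crossval_predict` signature, used here so the example is self-contained, and `MajorityClass` is only a stand-in classifier for the demo. How Web ML splits its folds is not known here; a different shuffle or split will generally give slightly different predictions.

```python
import random


def crossval_predict(X, y, folds, make_model, seed=0):
    """Return one out-of-fold prediction per example, in the original order.

    make_model and seed are hypothetical additions to the assignment's
    signature: make_model() must return a fresh object with fit and predict.
    """
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split repeatable
    preds = [None] * len(X)
    for k in range(folds):
        test_idx = indices[k::folds]  # every folds-th shuffled index forms fold k
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        model = make_model()
        model.fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        fold_preds = model.predict([X[i] for i in test_idx])
        for i, p in zip(test_idx, fold_preds):
            preds[i] = p  # store at the example's original position
    return preds


class MajorityClass:
    """Stand-in classifier for the demo: always predicts the most common label."""

    def fit(self, X, y):
        self.label = max(set(y), key=y.count)

    def predict(self, X):
        return [self.label] * len(X)


# Made-up data: 7 examples of class 0 and 3 of class 1.
X = [[float(i)] for i in range(10)]
y = [0] * 7 + [1] * 3
preds = crossval_predict(X, y, folds=5, make_model=MajorityClass)
print(preds)  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

Because every example is predicted exactly once by a model that never saw it during training, the returned list can be compared directly with y to compute an accuracy score or a confusion matrix.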
