In this project, I build and compare several machine learning classification models that predict the diagnosis of Alzheimer's disease. I also perform exploratory data analysis (EDA) to gain insight into every feature in the data. The project uses the Alzheimer's Disease Dataset, and the notebook is also available on Kaggle. The entire project is written in Python.
The first step is the EDA, which gives insight into the features and shows which of them contribute significantly to the target variable. In this dataset the features are not correlated with one another, which simplifies feature engineering.
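One way to check a claim like "no correlation among the features" is to list every feature pair whose absolute Pearson correlation exceeds a threshold. The following is a minimal sketch, not the notebook's actual code: the helper `correlated_pairs`, the 0.8 threshold, and the synthetic columns (`Age`, `BMI`, `AgeInMonths`) are illustrative assumptions.

```python
import numpy as np
import pandas as pd


def correlated_pairs(df, threshold=0.8):
    """Return (feature_a, feature_b, r) for numeric pairs with |Pearson r| >= threshold."""
    corr = df.select_dtypes("number").corr()
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):  # upper triangle only, skip the diagonal
            r = float(corr.iloc[i, j])
            if abs(r) >= threshold:
                pairs.append((cols[i], cols[j], r))
    return pairs


# Synthetic stand-in data (hypothetical columns, not the real dataset).
rng = np.random.default_rng(0)
age = rng.normal(75, 8, 500)
demo = pd.DataFrame({
    "Age": age,
    "BMI": rng.normal(27, 4, 500),
    "AgeInMonths": age * 12,  # deliberately redundant with Age
})

pairs = correlated_pairs(demo)
print(pairs)
```

An empty result on the real feature table would support the observation above; any pair that shows up is a candidate for dropping or combining during feature engineering.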
The data was first split into training and test sets to build the models. Since this is a skewed classification problem, metrics such as the confusion matrix, precision, and recall are used to evaluate the models. Recall should be as high as possible, since false negatives are patients who actually require a diagnosis but are missed by the model. Precision, reported in the table below, measures how many of the predicted positives are correct.
Precision: \( \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}} \)
Recall: \( \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}} \)
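To make the two metrics concrete, here is a small sketch that counts true positives, false positives, and false negatives directly. Precision divides true positives by all predicted positives (TP + FP), while recall divides them by all actual positives (TP + FN). The function name `precision_recall` and the toy labels are made up for illustration.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Compute (precision, recall) for the given positive class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy example: 4 actual patients, 6 healthy people.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case two of the four patients are missed, so recall is 0.50 even though precision is about 0.67, which shows why the two numbers need to be read together.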
Serial No. | Model | Precision |
---|---|---|
1 | Logistic Regression | 0.85 |
2 | Decision Tree | 0.95 |
3 | Random Forest | 0.96 |
4 | Gradient Boost | 0.96 |
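
A comparison like the one in the table can be set up roughly as follows. This is a hedged sketch using scikit-learn on synthetic data from `make_classification`, not the notebook's code: the class weights, default hyperparameters, split size, and random seeds are all assumptions, so the scores it prints will not match the table.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the real dataset (assumption).
X, y = make_classification(
    n_samples=600, n_features=10, weights=[0.65, 0.35], random_state=42
)

# Stratify so both splits keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Gradient Boost": GradientBoostingClassifier(random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    scores[name] = (
        precision_score(y_test, y_pred, zero_division=0),
        recall_score(y_test, y_pred, zero_division=0),
    )

for name, (prec, rec) in scores.items():
    print(f"{name}: precision={prec:.2f} recall={rec:.2f}")
```

Reporting recall next to precision makes it easier to see whether a high-precision model is also missing patients.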
If you are interested in contributing to or collaborating on this project, please contact me on LinkedIn.
This Notebook has been released under the Apache 2.0 open-source license.