Parkinson's Disease Detection
What is Parkinson’s Disease?
Parkinson’s disease is a progressive neurodegenerative disorder of the central nervous system that affects the dopamine-producing neurons in the brain. It impairs movement and causes tremors and stiffness. The disease progresses through 5 stages and affects more than 1 million individuals every year in India. It is chronic and has no cure yet.
What is XGBoost?
XGBoost is an open-source machine learning library for gradient boosting, designed with speed and performance in mind. Its name stands for eXtreme Gradient Boosting, and it builds ensembles of decision trees. In this project, we import XGBClassifier from the xgboost library; it is XGBoost's classifier exposed through the scikit-learn estimator API.
In this Python project, I use the XGBoost machine learning algorithm to detect the presence of Parkinson’s disease in individuals from a set of biomedical voice measurements. I used an XGBClassifier for the model and the sklearn library to prepare the dataset.
This detector achieves an accuracy of 94.87%, which is impressive given how few lines of code the project needs.
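As a rough check on what this figure means, assume the 195 records and the 20% test split used in this project (scikit-learn's `train_test_split` rounds the test fraction up). The test set then holds 39 rows, and 94.87% corresponds to 37 correct predictions out of 39. This count is inferred from the reported percentage, not stated in the original results.

```latex
n_{\text{test}} = \lceil 0.2 \times 195 \rceil = 39
\qquad
\text{accuracy} = \frac{37}{39} \approx 0.9487 = 94.87\%
```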
The entire script can be found here:
Detecting Parkinson’s Disease with XGBoost — Objective
To build a model to accurately detect the presence of Parkinson’s disease in an individual.
Detecting Parkinson’s Disease with XGBoost — About the Python Machine Learning Project
In this Python machine learning project, we use the Python libraries scikit-learn, numpy, pandas, and xgboost to build a model with an XGBClassifier. We load the dataset into a DataFrame, get the features and labels, scale the features, split the dataset, build an XGBClassifier, and then calculate the accuracy of the model.
Dataset for Python Machine Learning Project
You’ll need the UCI ML Parkinsons dataset for this; you can download it here. The dataset has 24 columns (a ‘name’ identifier, 22 voice-measurement features, and the ‘status’ label) and 195 records, and it is only 39.7 KB.
Steps for Detecting Parkinson’s Disease with XGBoost
Below are the steps for this Python machine learning project:
- Install the required libraries with pip, then make the necessary imports.
- Read the data into a DataFrame and display the first 5 records.
- Get the features and labels from the DataFrame. The features are all the columns except ‘name’ (a string identifier for each recording) and ‘status’, and the labels are the values in the ‘status’ column.
- The ‘status’ column holds the labels 0 and 1 (1 for Parkinson’s disease, 0 for a healthy subject); count how many records carry each label.
- There are 147 ones and 48 zeros in the ‘status’ column of our dataset.
- Initialize a MinMaxScaler and scale the features to the range -1 to 1 to normalize them. The MinMaxScaler transforms each feature by scaling it to a given range, and its fit_transform() method fits to the data and then transforms it. The labels do not need scaling.
- Split the dataset into training and testing sets, keeping 20% of the data for testing.
- Initialize an XGBClassifier and train the model. XGBClassifier classifies using eXtreme Gradient Boosting, which adds decision trees one after another, each fitted to reduce the error left by the trees before it. This places it in the category of Ensemble Learning, where many models are trained and their predictions are combined into one stronger output.
- Finally, generate y_pred (the predicted values for x_test), calculate the model’s accuracy, and print it.