The Stroke Prediction Dataset on Kaggle is a good example of how machine learning can be applied to disease prediction.
The dataset contains just over 5,000 observations in 12 columns: a patient identifier, attributes describing each patient's clinical and demographic profile (heart disease, hypertension, average glucose level, smoking status, and so on), and a binary target variable indicating whether the patient had a stroke.
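As a rough sketch of how such records could be split into features and target, the snippet below reads CSV text with Python's standard csv module. The column names (`id`, `stroke`, and the others) and the sample rows are assumptions made up for illustration. They should be checked against the actual file before reuse.

```python
import csv
import io

# Hypothetical sample in the assumed layout; these rows are made up for
# illustration and are not taken from the Kaggle file.
SAMPLE_CSV = """id,age,hypertension,heart_disease,avg_glucose_level,stroke
1,67,0,1,228.7,1
2,45,0,0,95.1,0
3,80,1,0,105.9,1
4,31,0,0,88.4,0
"""

def load_records(handle, target="stroke", drop=("id",)):
    """Read CSV rows and return (features, labels).

    features is a list of dicts mapping column name to the raw string value;
    labels is a list of ints taken from the target column.
    """
    features, labels = [], []
    for row in csv.DictReader(handle):
        labels.append(int(row.pop(target)))   # separate the binary target
        for column in drop:                   # identifiers carry no signal
            row.pop(column, None)
        features.append(row)
    return features, labels

features, labels = load_records(io.StringIO(SAMPLE_CSV))
print(len(features), "records,", sum(labels), "with stroke")
```

To read the real file, an open file handle can be passed to `load_records` in place of the `io.StringIO` object. The values stay as strings here, so numeric conversion and handling of missing entries would still be needed before training.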
We can build a model to predict the occurrence of a stroke by training standard classification algorithms such as logistic regression, k-nearest neighbors, support vector machines, or decision trees.
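To make the idea concrete, here is a minimal sketch of one of these algorithms, k-nearest neighbors, written with only the Python standard library. The function name `knn_predict` and the toy points (age, average glucose level) are hypothetical and invented for illustration. They are not real patient data and say nothing about actual stroke risk.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest
    training points, using Euclidean distance."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Made-up toy points (age, average glucose level); not real patient data.
train_X = [(30, 85.0), (35, 90.0), (42, 95.0),
           (70, 210.0), (75, 190.0), (81, 230.0)]
train_y = [0, 0, 0, 1, 1, 1]  # 1 = stroke, 0 = no stroke

print(knn_predict(train_X, train_y, (78, 200.0)))
print(knn_predict(train_X, train_y, (33, 88.0)))
```

In practice one would more likely use a library implementation such as scikit-learn's `KNeighborsClassifier`, scale the features so that no single column dominates the distance, and evaluate on held-out data.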
Of course, the practical applicability of such a model depends on how representative the patient dataset is of the population the model would be used on. Even so, it is a good exercise for anyone who wants to understand how to apply machine learning in healthcare.
Reference:
https://www.kaggle.com/fedesoriano/stroke-prediction-dataset