Recent applications of machine learning and deep learning models in the prediction, diagnosis, and management of diabetes: a comprehensive review

Afsaneh, Elaheh; Sharifdini, Amin; Ghazzaghi, Hadi; Ghobadi, Mohadeseh Zarei

doi:10.1186/s13098-022-00969-9

Table 1 Some applied ML models in the published papers. Publicly available datasets are shown in bold

Sample size / dataset	ML models	Refs.
Early diagnosis and prediction of diabetes
T2DM
15,005 subjects with age ≥ 3	XGBoost, DNN, and RF	[63]
1512 subjects	LR, RF, Naive Bayes (NB), SVM, XGBT, ANN, K-nearest neighbor (KNN), DT, Xception, ResNet50, DenseNet121, VGG16, VGG19, InceptionV3, and a stacking model combining non-invasive variables with the ResNet50 model	[53]
530 participants: 272 diabetic and 258 non-diabetic patients	Deep autoencoder learning algorithm with CNNs and a deep radial basis function neural network (RBFNN) classifier	[52]
217 participants with diabetes, prediabetes and normal conditions	SVM, K-nearest neighbors, RF, XGBoost, hybrid feature selection-XGBoost	[91]
2371 T1-weighted whole-body MRI data sets	DenseNet architecture	[54]
8454 subjects over five years of follow-up	XGBoost, SVM, LR, RF, and ensemble algorithms	[64]
16,429 men and non-pregnant women ≥ 20 years of age	ANN, LR, and RF models	[55]
453,487 T2DM patients	Reverse engineering and forward simulation (REFS)	[124]
82 obese women (40 non-diabetic and 42 diabetic)	Separability-correlation measure (SCM) and ANN	[57]
13,309 Canadian patients	GBM and LR	[92]
Kaggle diabetes dataset	RF	[58]
1492 healthy individuals	SVM	[59]
10 patients	LR, CNN, Multi-Layer Perceptrons (MLPs), and ensembling methods	[60]
4870 subjects (2955 females and 1915 males)	Bayes classifier and LR	[79]
768 individuals, 500 healthy and 268 with T2DM (UCI Machine Learning Repository: Pima Indians diabetes data set)	AIRS2 and MAIRS2	[61]
Pima Indian women	DT and LR	[65]
2970 youth aged 12–19 years (NHANES dataset)	LR, LogitBoost, and decision tree	[66]
746 subjects	SVM, XGBoost, RF, and their combinations	[62]
GDM
22,242 singleton pregnancies (3182 women developed GDM)	RF, logistic regression, decision tree, XGB, GBDT, LGB, AdaBoost, Vote, logistic regression with RCS, and stepwise logistic regression	[75]
490 pregnant women, 215 with GDM and 275 controls	SVM and light gradient boosting machine (LightGBM)	[76]
588,622 pregnancies from 368,351 women	Gradient-boosting machine model constructed by decision-tree base-learners	[74]
4378 cases	CSHM, BN, LR, CHAID tree, SVM, and NN	[77]
152 women	AIRS	[80]
4771 pregnant women in early gestation	Multivariate Bayesian logistic regression using a Markov chain Monte Carlo (MCMC) simulation algorithm	[81]
All types of diabetes
2001 cases with diabetes (Kaggle dataset)	Filter-based DT (ID3) algorithm for feature selection, with hold-out, K-fold, and LOSO schemes for classification	[83]
852,454 individuals with prediabetes	LightGBM	[86]
1050 glucose concentration curves from type 1 and type 2 diabetic patients	Double-Class AdaBoost	[87]
268 females and 500 controls	Gaussian process (GP)-based classification approach	[88]
5301 African Americans	RF	[89]
268 females and 500 controls	Fuzzy c-means (FCM) combined with an adaptive network-based fuzzy inference system (ANFIS)	[90]
Pima Indian Diabetes Dataset and Biostat Diabetes Dataset	RLEFRBS	[95]
Prediction of blood glucose (BG)
OhioT1DM dataset: six participants with T1D between 40 and 60 years old	SVM, extended tree classifier (ETC), and random forest classifier (RFC)	[121]
IDIAB, OhioT1DM, and T1DMS datasets	Fully convolutional neural network	[101]
225 T1DM patients with 315,000 h of CGM data	Linear extrapolation, NNs, last observation carried forward, and ensemble methods using LSBoost and bagging (one with and one without error weights)	[122]
180 h of blood glucose concentration values from diabetic patients, with CGM readings every 5 min	Multi-scale blood glucose prediction model (VMD-KELM-AdaBoost)	[102]
OhioT1DM dataset	Autoregressive model with exogenous inputs (ARX), ML-based regression models, and DL models including a TCN and a vanilla LSTM network	[104]
10 adult T1DM subjects generated using the UVA/Padova T1D simulator	Multi-layer convolutional recurrent neural network (CRNN) architecture	[105]
OhioT1DM dataset	LSTM-based deep RNN	[107]
104 people who had experienced at least one hypoglycemia alert value during a three-day CGM session	SVM with radial basis function or linear kernels, RF, LR, and K-nearest neighbor	[116]
CGM data points from 10 T1DM patients	A combination of AR, SVR, and ELM	[109]
10 T1DM adults studied over 12 weeks	SVM and MLP	[119]
10,000 users with more than 1 million nights of CGM data	RF	[120]
26 participants	LSTM-NN-TF-DTW model	[143]
8501 eligible participants	LASSO regression and RF	[130]
124 CGM traces collected over 10 days	Autoregressive, autoregressive moving average, and autoregressive integrated moving average (ARIMA) models, and nonlinear machine-learning procedures (SVR, feed-forward neural network (fNN), regression random forest, and LSTM-NN)	[110]
Six subjects with T1DM aged 23–52 years (mean 39 ± 10)	Jump Neural Network	[154]
124 people (22,804 valid nights of data) with T1D	SVR	[117]
463 people with T1DM	Linear discriminant analysis	[118]
154 observations of in-clinic aerobic exercise in 43 adults with T1DM	Decision tree and random forest	[126]
16 children with T1DM	Extreme learning machine (ELM)-based neural network	[127]
Training set of 8 patients (320 data points) and a testing set of 8 patients (269 data points)	ELM-trained feed-forward neural network	[128]
24,331 adults	Bayesian scoring algorithm	[78]
Ten male subjects with T1DM	Pattern classification algorithm	[125]
27,050 adult individuals with no prior diagnosis of T2DM	XGBoost, RF, Glmnet, and LightGBM	[112]
6.8 million data points	Combination of GBD and SVR	[131]
10 patients, using 70 mg/dL and 54 mg/dL as the consensus thresholds for Level 1 and Level 2 hypoglycemia, respectively	Developed SVM	[129]
Health data from 18,691 ICU stays and 14,742 critical care patients (MIMIC-III database)	GBT	[132]
29,601 entries from 47 different patients	SVR, WNN, KNN, RFR, GPR, ANN, and RR	[134]
54,978 inpatients who had at least 4 BG measurements and received at least 1 U of insulin during hospitalization	RF classification, multivariable logistic regression, stochastic gradient boosting (SGB), and naive Bayes	[114]
25 T1DM patients	RF	[135]
OhioT1DM dataset	LR, vanilla LSTM, and BiLSTM	[137]
Detection of blood glucose
12 healthy subjects	Back-propagation neural network (BPNN) and multivariate polynomial regression	[151]
540 patients with T2DM	Nonlinear and linear predictive algorithms	[152]
2787 consecutive participants	Combination of elastic net with RF, SVM, and back-propagation artificial neural network (BP-ANN) algorithms, as well as LR	[155]
1772 paired data points ranging from 65–492 mg/dL and 80–352 mg/dL	AdaBoost	[156]
15 patients with T1DM under free-living conditions	RReliefF, RF, Gaussian, SVR	[157]
EMRs covering the first 72 h of ICU care for 127 patients who, on ICU admission, had a diagnosis of T1DM (N = 8) or T2DM (N = 97), or a glucose value > 150 mg/dL (N = 22)	GBT	[133]
Insulin resistance predicting models
8842 Korean participants	LR, XGBoost, random forest, and ANN	[159]
1344 samples	HOMA-IR model	[160]
2433 T2DM patients	MIL-Boost	[161]
968 patients not affected by T2DM (FIMMG_obs dataset)	TyG-er	[162]
315 T1DM patients	MARSplines and ANN	[163]
Determination of the start of treatment and its effect
13,904 individuals with diabetes	LASSO	[164]
100 virtual adult subjects	LASSO and MLR	[166]
100 virtual subjects	GBT and RF	[165]
87 patients	Reinforcement learning	[168]
Two studies with similar designs that enrolled patients who were treatment-naïve (study 1, n = 677) or receiving background metformin (study 2, n = 686)	RF and classification tree algorithms	[169]
12,147 commercially insured adults and Medicare Advantage beneficiaries with prediabetes or diabetes	LR with and without regularization and/or stepwise feature selection, tree-based models, SVM, multivariate adaptive regression splines, and flexible discriminants	[170]
1270 patients with T2DM	Weighted SVM	[171]
100 virtual adults	Neural networks	[167]
3029 patients	Logistic ML algorithm	[172]
Risk assessment of diabetes
25,186 patients	Regularized and weighted RSF	[177]
273,678 patients	DeepSurv and RSF	[178]
11,000 persons	RF classifier	[179]
15,928 Chinese adults without diabetes at baseline (DRYAD)	XGBoost	[180]
1,832,270 cases of type 2 diabetes	Gradient boosting decision tree algorithm and LightGBM	[181]
6025 participants	Naive Bayes approaches and LR	[182]
40,124 patients from the GIANTT database	Ridge logistic regression, logistic regression with backward selection, LASSO LR, elastic net logistic regression, and RF	[183]
36,652 eligible participants from the Henan Rural Cohort Study	Classification and regression tree (CART), RF, GBM, LR, SVM, and ANN	[184]
997 subjects with CT scans and contextual EMR scores	Deep neural network	[185]
17,658 in-patients with diabetes who underwent 32,758 admissions	LR and XGBoost	[186]
10,464 diabetic patients	LR	[187]
34 patients	Aggregation method	[188]
1647 obese, hypertensive patients	KNN and RF	[189]
800 T2DM patients	BN, ANN, CRT, CHAID, discriminant analysis, QUEST, and ensemble models	[190]
112 patients over 90 days	LR and RF	[191]
Dietary and insulin dose modifications
23 adults with newly diagnosed T2DM	Algorithm-based personalized postprandial-targeting	[194]
100 adults under different realistic scenarios lasting three simulated months	Reinforcement learning	[195]
Diabetes management
12 subjects with T1DM	Linear discriminant analysis, ensemble learning, Gaussian process regression, KNN, SVM, decision trees, and deep neural networks with LSTM	[199]
16,848 inpatients receiving subcutaneous insulin who achieved target blood glucose control of 100–180 mg/dL on a calendar day	A combination of RF, regularized regression, and GBT	[200]
110 pediatric patients with T1DM	RF and quantile regression forest	[202]
68,274 samples collected from 1119 subjects	Deep learning	[204]
116 subjects	SVM	[205]
D1NAMO dataset: data from nine patients with T1DM	RNN-LSTM	[207]
70 participants with T1DM	K-means clustering	[208]
100 subjects over a two-month scenario	XBM	[209]
250 24-h CGM plots	SVR and multilayer perceptrons	[210]
15 patients with T1DM	SVR	[211]
3 real subjects	Multiple boundaries and domain-based, density-based, reconstruction-based, and unsupervised models	[212]

ISSN: 1758-5996

Diabetology & Metabolic Syndrome
