
Table 1 Some applied ML models in the published papers. The publicly available datasets are mentioned in bold

From: Recent applications of machine learning and deep learning models in the prediction, diagnosis, and management of diabetes: a comprehensive review

Sample number

ML models

Refs.

Early diagnosis and prediction of diabetes

T2DM

15,005 subjects with age ≥ 3

XGBoost, DNN, and RF

[63]

1512 subjects

LR, RF, Naive Bayes (NB), SVM, XGBT, ANN, K-nearest neighbor (KNN), DT, Xception, ResNet50, DenseNet121, Vgg16, Vgg19, and InceptionV3; stacking model of non-invasive variables and the ResNet50 model

[53]

530 participants: 272 were diabetic patients and 258 were non-diabetic patients

Deep autoencoder learning algorithm with CNN networks and deep radial basis function neural network (RBFNN) classifier

[52]

217 participants with diabetes, prediabetes and normal conditions

SVM, K-nearest neighbors, RF, XGBoost, hybrid feature selection-XGBoost

[91]

2371 T1-weighted whole-body MRI data sets

DenseNet architecture

[54]

8454 subjects over five years of follow-up

XGBoost, SVM, LR, RF, and ensemble algorithms

[64]

16,429 men and non-pregnant women ≥ 20 years of age

ANN, LR, and RF models

[55]

453,487 T2DM patients

Reverse engineering and forward simulation (REFS)

[124]

82 obese women (40 non-diabetic and 42 diabetic)

Separability-correlation measure (SCM) and ANN

[57]

13,309 Canadian patients

GBM and LR

[92]

Kaggle diabetes dataset

RF

[58]

1492 healthy individuals

SVM

[59]

10 patients

LR, CNN, Multi-Layer Perceptrons (MLPs), and ensembling methods

[60]

4870 subjects (2955 females and 1915 males)

Bayes classifier and LR

[79]

768 individuals, 500 healthy and 268 with T2DM (UCI Machine Learning Repository: Pima Indians diabetes data set)

AIRS2 and MAIRS2

[61]

Pima Indian women

DT and LR

[65]

2970 youth aged 12–19 years (NHANES dataset)

LR, LogitBoost, and decision tree

[66]

746 subjects

SVM, XGBoost, RF, and their combinations

[62]

GDM

22,242 singleton pregnancies (3182 women developed GDM)

RF, logistic, decision tree, XGB, GDBT, LGB, AdaBoost, Vote, logistic regression with RCS and stepwise logistic regression

[75]

490 pregnant women, 215 with GDM and 275 controls

SVM and light gradient boosting machine (lightGBM)

[76]

588,622 pregnancies from 368,351 women

Gradient-boosting machine model constructed by decision-tree base-learners

[74]

4378 cases

CSHM, BN, LR, CHAID tree, SVM, and NN

[77]

152 women

AIRS

[80]

4771 pregnant women in early gestation

Multivariate Bayesian logistic regression using Markov Chain Monte Carlo simulation algorithm

[81]

All types of Diabetes

2001 cases with diabetes (Kaggle dataset)

Filter based DT-(ID3) algorithm for features selection and Hold out, K-fold, and LOSO for classification

[83]

852 454 individuals with pre-diabetes

LightGBM

[86]

1050 curves of glucose concentration of type 1 and type 2 diabetics

Double-Class AdaBoost

[87]

268 females and 500 controls

Gaussian process (GP)-based classification approach

[88]

5301 African Americans

RF

[89]

268 females and 500 controls

Fuzzy c-means (FCM) with an adaptive network-based fuzzy inference system (ANFIS)

[90]

Pima Indian Diabetes Dataset and Biostat Diabetes Dataset

RLEFRBS

[95]

Prediction of blood glucose (BG)

OhioT1DM dataset: six participants with T1D between 40 and 60 years old

SVM, extended tree classifier (ETC), and random forest classifier (RFC)

[121]

IDIAB, OhioT1DM dataset, and T1DMS datasets

Fully convolutional neural network

[101]

225 T1DM patients with 315,000 h of CGM data

Linear extrapolation, NNs, last observation carried forward, ensemble methods using LSBoost and bagging, one with error-weights, and one without error-weights

[122]

Blood glucose concentration values over 180 h in diabetic patients, with CGM readings every 5 min

Multi-scale blood glucose prediction model (VMD-KELM-AdaBoost)

[102]

OhioT1DM dataset

Autoregression with ARX model, ML-based regression models, and DL models including a TCN and a vanilla LSTM Network

[104]

10 adult T1DM subjects generated using the UVA/Padova T1D

Multi-layer convolutional recurrent neural network (CRNN) architecture

[105]

OhioT1DM dataset

LSTM-based deep RNN

[107]

104 people who had experienced at least one hypoglycemia alert value during a three-day CGM session

SVM using radial basis or linear functions, RF, LR, and K-nearest neighbor

[116]

10 T1DM patients with continuous glucose monitoring system data points

A combination of AR, SVR, and ELM

[109]

10 T1DM adults studied during 12 weeks

SVM and MLP

[119]

10,000 users with more than 1 million nights of CGM data

RF

[120]

26 participants

LSTM-NN-TF-DTW model

[143]

8501 eligible participants

LASSO regression and RF

[130]

124 CGM traces collected over 10 days

Autoregressive, autoregressive moving average, and autoregressive integrated moving average (ARIMA) models, and nonlinear machine-learning procedures (SVR, feed-forward neural network (fNN), regression random forest, and LSTM-NN)

[110]

Six subjects suffering from T1DM aged between 23 and 52 (average 39 ± 10)

Jump Neural Network

[154]

124 people (22,804 valid nights of data) with T1D

SVR

[117]

463 people with T1DM

Linear discriminant analysis

[118]

154 observations of in-clinic aerobic exercise in 43 adults with T1DM

Decision tree and Random forest

[126]

16 children with T1DM

Extreme learning machine (ELM)-based neural network

[127]

8 patients (320 data points) and a testing set with 8 patients (269 data points)

ELM trained feed-forward neural network

[128]

24 331 adults

Bayesian scoring algorithm

[78]

Ten male subjects with T1DM

Pattern classification algorithm

[125]

27,050 adult individuals with no prior diagnosis of T2DM

XGBoost, RF, Glmnet, and LightGBM

[112]

6.8 million data points

Combination of GBD and SVR

[131]

10 patients using 70 mg/dL and 54 mg/dL as thresholds according to the consensus for Level 1 and Level 2 hypoglycemia

Developed SVM

[129]

The health data associated with 18 691 ICU stays and 14 742 critical care patients (MIMIC-III database)

GBT

[132]

29,601 entries from 47 different patients

SVR, WNN, KNN, RFR, GPR, ANN, and RR

[134]

54 978 inpatients who had a minimum of 4 BG measurements and took a minimum of 1 U of insulin during hospitalization

RF classification, multivariable logistic regression, stochastic gradient boosting (SGB), and naive Bayes

[114]

25 T1DM patients

RF

[135]

OhioT1DM dataset

LR, vanilla LSTM, and BiLSTM

[137]

Detection of blood glucose

12 healthy subjects

Back-propagation neural network (BPNN) and multivariate polynomial regression

[151]

540 patients with T2DM

Nonlinear and linear predictive algorithms

[152]

2787 consecutive participants

Combination of elastic network with RF, SVM, and back-propagation artificial neural network (BP-ANN) algorithms as well as LR

[155]

1772 paired data varying from 65 ~ 492 mg/dl and 80 ~ 352 mg/dl

AdaBoost

[156]

15 patients with T1DM under free-living conditions

RReliefF, RF, Gaussian, SVR

[157]

EMR of 127 patients for the first 72 h of ICU care who upon admission to the ICU had a diagnosis of type 1 (N = 8), T2DM (N = 97), or a glucose value > 150 mg/dl (N = 22)

GBT

[133]

Insulin resistance predicting models

8842 Korean participants

LR, XGBoost, random forest, and ANN

[159]

1344 samples

HOMA-IR model

[160]

2433 T2DM patients

MIL-Boost

[161]

968 patients not affected by T2DM (FIMMG_obs dataset)

TyG-er

[162]

315 T1DM patients

MARSplines and ANN

[163]

Determination of the start of treatment and its effect

13 904 individuals with diabetes

LASSO

[164]

100 virtual adult subjects

LASSO and MLR

[166]

100 virtual subjects

GBT and RF

[165]

87 patients

Reinforcement learning

[168]

The two studies had a similar design but enrolled patients who were treatment-naïve (study 1, n = 677) or receiving background metformin (study 2, n = 686)

RF and classification tree algorithms

[169]

12,147 commercially-insured adults and Medicare Advantage beneficiaries with prediabetes or diabetes

RL both with and without regularization and/or stepwise feature selection, Tree-based models, SVM, multivariate adaptive regression splines, and flexible discriminants

[170]

1270 patients with T2DM

Weighted SVM

[171]

100 virtual adults

Neural networks

[167]

3029 patients

Logistic ML algorithm

[172]

Risk assessment of Diabetes

25,186 patients

Regularized and weighted RSF

[177]

273 678 patients

DeepSurv and RSF

[178]

11,000 persons

RF classifier

[179]

15,928 Chinese adults without diabetes at baseline (DRYAD)

XGBoost

[180]

1,832,270 cases of type 2 diabetes

Gradient boosting decision tree algorithm and LightGBM

[181]

6025 participants

Naive Bayes approaches and LR

[182]

40,124 patients from the GIANTT database

Ridge logistic regression, logistic regression with backward selection, LASSO LR, elastic net logistic regression, and RF

[183]

36,652 eligible participants from the Henan Rural Cohort Study

Classification and regression tree (CART), RF, GBM, LR, SVM, and ANN

[184]

997 subjects with CT scans and contextual EMR scores

Deep neural network

[185]

17,658 in-patients with diabetes who underwent 32,758 admissions

LR and XGBoost

[186]

10,464 diabetic patients

LR

[187]

34 patients

Aggregation method

[188]

1647 obese, hypertensive patients

KNN and RF

[189]

800 T2DM patients

BN, ANN, CRT, CHAID, discriminant, QUEST, and ensemble models

[190]

112 patients over a range of 90 days

LR and RF

[191]

Dietary and insulin dose modifications

23 adults with newly diagnosed T2DM

Algorithm-based personalized postprandial-targeting

[194]

100 adults under different realistic scenarios lasting three simulated months

Reinforcement learning

[195]

Diabetes management

12 subjects with T1DM

Linear discriminant analysis, ensemble learning, Gaussian process regression, KNN, SVM, decision trees, and deep neural networks with LSTM

[199]

16,848 inpatients receiving subcutaneous insulin who achieved target blood glucose control of 100–180 mg/dL on a calendar day

A combination of RF, regularized regression, and GBT

[200]

110 pediatric patients with T1DM

RF and quantile regression forest

[202]

68,274 samples collected from 1119 subjects

Deep learning

[204]

116 subjects

SVM

[205]

D1NAMO dataset contains data for nine patients with T1DM

RNN-LSTM

[207]

70 participants with T1DM

K-means clustering

[208]

100 subjects over a two-month scenario

XBM

[209]

250 24 h CGM plots

SVR and multilayer perceptrons

[210]

15 patients with T1DM

SVR

[211]

3 real subjects

Multiple boundaries and domain-based, density-based, reconstruction-based, and unsupervised models

[212]