### Data for model development

We aggregated data from three prior diabetes and exercise studies to determine the significant predictors and to test potential model structures. These prior studies were conducted in Virginia, USA [10]; São Paulo, Brazil [11]; and Quebec, Canada [12]. The aggregated dataset represents 56 individuals with T2DM performing 488 exercise sessions. All individuals were taking an oral diabetes medication and had complete data records for the following variables: 1) pre-exercise blood glucose (measured no more than five minutes prior to the start of exercise); 2) post-exercise blood glucose (measured within five minutes of exercise termination); 3) age; 4) sex; 5) hemoglobin A1c; 6) metformin status (dummy coded); 7) sulfonylurea status (dummy coded); 8) exercise session number; 9) minutes since last meal; 10) exercise duration; and 11) percentage of age-adjusted maximum heart rate (% AAMHR) during exercise. Additional file 1 includes a table describing the three original datasets that were combined to create the development dataset.

### Data for testing the model

The data used to test the model were collected between December 2009 and November 2011 from participants in a supervised community exercise group at the University of Utah. For this analysis, we included data from participants taking an oral diabetes medication who had complete records on the variables identified as significant in the model development phase. While data on individuals’ baseline physical activity levels were not available, program staff stated that, as in the trial data, the majority of participants reported engaging in little to no physical activity in the 12 months preceding enrollment.

Since both datasets were de-identified and the data from the diabetes exercise group at the University of Utah were routinely collected on all participants, this study was approved with a waiver of informed consent by the University of Utah Institutional Review Board.

### Assessment of model error in relation to accuracy of glucose measurements

The glucometers used in both the retrospective dataset for model development and the prospective dataset for model testing are designed for individual self-monitoring. These glucometers are required to meet the International Organization for Standardization (ISO) specification for measurement error: ± 0.83 mmol/L if blood glucose is less than 4.2 mmol/L and ± 20% otherwise [13]. Therefore, model error in this study was defined as a prediction error greater than the measurement error of these devices.
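
The error definition above can be sketched as a small check. This is a minimal illustration, not the study's code: the function names are hypothetical, and it assumes the tolerance is computed relative to the measured post-exercise value, which the text does not state explicitly.

```python
def iso_tolerance(measured_mmol_l: float) -> float:
    """Allowed absolute deviation (mmol/L) around a measured glucose value.

    Assumption: the tolerance band is centred on the measured value.
    """
    if measured_mmol_l < 4.2:
        return 0.83                    # fixed band below 4.2 mmol/L
    return 0.20 * measured_mmol_l      # +/- 20% otherwise


def is_model_error(predicted_mmol_l: float, measured_mmol_l: float) -> bool:
    """True when a prediction misses by more than the meter tolerance."""
    deviation = abs(predicted_mmol_l - measured_mmol_l)
    return deviation > iso_tolerance(measured_mmol_l)
```

Under this reading, a prediction of 11.5 mmol/L against a measured 10.0 mmol/L falls inside the ± 20% band and is not counted as model error, whereas a prediction of 12.5 mmol/L is.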

### Transformation of minutes since eating

Based on prior evidence suggesting that the glycemic lowering effect of exercise is greater postprandially than pre-prandially [14], a dummy variable was created for the postprandial state (≤ 120 minutes since eating when exercise began vs. >120 minutes).

We also performed a non-linear transformation of minutes since eating. The first component of this transformation was intended to model the variation in postprandial insulin levels [15]. The second component was intended to model the increasing effect of counter-regulatory hormones as the time since eating increased beyond 180 minutes. These hormones promote glycogenolysis, which results in a smaller decrease, or even an increase, in glucose compared with postprandial exercise [4, 16].

For the transformation, we first divided the minutes since eating variable into two vectors. The first vector was the range of minutes since eating ≤180. This was multiplied by *π*, sine transformed, and normalized from 0 to 1. The second vector was the range of minutes since eating >180. This vector was normalized from 0 to 1 and multiplied by -1. The two vectors were then recombined (see Additional file 1 for a graph displaying the transformed variable in relation to the variable prior to transformation).
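
One possible reading of the two transformations of minutes since eating is sketched below. The order of operations in the sine component is ambiguous in the description, so this sketch assumes that minutes are first rescaled to 0–1 over the 0–180 range and then passed through sin(π·x), which peaks at 90 minutes. It also assumes the second component is rescaled linearly from 180 minutes up to a supplied maximum. The function names and the `max_minutes` parameter are hypothetical.

```python
import math


def postprandial_flag(minutes_since_meal: float) -> int:
    """Dummy variable: 1 if exercise began <= 120 minutes after eating."""
    return 1 if minutes_since_meal <= 120 else 0


def transform_minutes(minutes_since_meal: float, max_minutes: float) -> float:
    """Non-linear transform of minutes since eating (one interpretation).

    <= 180 min: sine arc rising from 0 to 1 and returning to 0.
    >  180 min: linear decline from 0 down to -1 at max_minutes.
    """
    if max_minutes <= 180:
        raise ValueError("max_minutes must exceed 180")
    if minutes_since_meal <= 180:
        scaled = minutes_since_meal / 180.0          # assumed 0-1 rescaling
        return math.sin(math.pi * scaled)
    span = max_minutes - 180.0                       # assumed min-max range
    return -(minutes_since_meal - 180.0) / span
```

Additional file 1 remains the reference for the exact shape of the transformed variable; the sketch only shows how the two segments join at 180 minutes.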

### Determination of significant predictors

The following twelve variables were candidate predictors: pre-exercise blood glucose, age, sex, hemoglobin A1c, metformin status (dummy coded), sulfonylurea status (dummy coded), exercise session number, minutes since last meal (as a linear predictor), non-linear minutes since meal, postprandial state, exercise duration, and percentage of age-adjusted maximum heart rate during exercise (% AAMHR).

For variable selection, we used a mixed-effects LASSO (Least Absolute Shrinkage and Selection Operator) procedure. As with ordinary multivariable linear regression, the LASSO minimizes the residual sum of squares, but does so subject to the constraint that the sum of the absolute values of the model coefficients is less than a tuning parameter, S. The result of this penalization is that some model coefficients are constrained to zero while the remaining coefficients are shrunk toward zero [17]. The mixed-effects LASSO accounts for the repeated measures within subjects and for the unbalanced structure of the data (i.e., varying numbers of exercise sessions for individual subjects).
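
For reference, the constrained form of the ordinary (fixed-effects only) LASSO described above can be written as follows. The symbols $y_i$, $x_{ij}$, $n$, and $p$ are notational assumptions introduced here, and the random-effects terms of the mixed-effects variant are omitted for brevity.

$$
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2
\quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| \le S
$$

As S grows large the constraint stops binding and the ordinary least-squares solution is recovered; as S shrinks, more coefficients are set exactly to zero.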

Since our goal was to use the LASSO procedure to identify novel predictors, we “forced” the predictors previously identified as significant by Jeng et al. [8] (pre-exercise glucose, % AAMHR, and exercise duration) into the model by including them unpenalized in the LASSO. We then systematically decreased the penalization constant in decrements of 0.1, beginning from a point at which only the unpenalized predictors were included and ending at a value of 0 (no penalization). From the set of potential models output by the LASSO, we selected the model for which the Bayesian Information Criterion (BIC) statistic was minimized.
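
The selection step can be illustrated with a short sketch. It does not fit the mixed-effects LASSO itself; it assumes each candidate model on the penalty path has already been fitted and is summarized by its penalty value, log-likelihood, and number of estimated parameters. The function names and the example numbers are hypothetical, and BIC is computed in its standard form, k·ln(n) − 2·ln(L).

```python
import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Standard Bayesian Information Criterion: k*ln(n) - 2*ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def penalty_grid(start: float, step: float = 0.1) -> list:
    """Penalty values from `start` down to 0 in fixed decrements."""
    n_steps = round(start / step)
    # Build from integer multiples to avoid accumulating float error.
    return [round(step * k, 10) for k in range(n_steps, -1, -1)]


def select_by_bic(candidates, n_obs: int):
    """Return the (penalty, log_likelihood, n_params) tuple with lowest BIC."""
    return min(candidates, key=lambda c: bic(c[1], c[2], n_obs))


# Hypothetical fitted summaries, for illustration only.
example_path = [(0.2, -1000.0, 4), (0.1, -990.0, 6), (0.0, -989.0, 12)]
best = select_by_bic(example_path, n_obs=488)
```

In this made-up example the intermediate penalty wins: the unpenalized model gains little likelihood for its six extra parameters, so its BIC is higher.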

### Determination of relative importance of predictors

Using the predictors that remained with non-zero coefficients after the LASSO, we applied the Lindeman, Merenda, and Gold (LMG) algorithm to estimate the relative importance of each predictor. This algorithm calculates each predictor's contribution to the proportion of variance explained, averaged over all orderings of the predictors [18]. Since this algorithm does not account for repeated measures within subjects, only data from the first exercise session were used for this estimation.
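
The averaging over orderings can be made concrete with a minimal sketch. It assumes a caller-supplied function that returns the R² of a regression on any subset of predictors (here a lookup table with made-up values); the function and variable names are hypothetical and this is not the implementation used in the study.

```python
from itertools import permutations


def lmg_importance(predictors, r_squared):
    """Average sequential R^2 gain of each predictor over all orderings.

    r_squared: callable mapping a frozenset of predictor names to R^2.
    """
    shares = {p: 0.0 for p in predictors}
    orderings = list(permutations(predictors))
    for ordering in orderings:
        included = frozenset()
        for p in ordering:
            larger = included | {p}
            # Gain in explained variance when p enters after `included`.
            shares[p] += r_squared(larger) - r_squared(included)
            included = larger
    return {p: total / len(orderings) for p, total in shares.items()}


# Made-up R^2 values for two predictors, for illustration only.
r2_table = {
    frozenset(): 0.0,
    frozenset({"a"}): 0.3,
    frozenset({"b"}): 0.2,
    frozenset({"a", "b"}): 0.4,
}
importance = lmg_importance(["a", "b"], r2_table.__getitem__)
```

The shares sum to the R² of the full model, which is what makes them interpretable as a decomposition of explained variance. The number of orderings grows factorially, so this brute-force form is practical only for a small number of predictors.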

### Rationale for mixed effects model

The data used in this analysis encompass variation at two levels: variation between subjects and variation within subjects (repeated measures of blood glucose levels from the same individual). We chose a mixed effects model because it can account for these two forms of variability and improve the estimation of population level (fixed) effects [19]. In addition, the accuracy of these models improves as the individual contributes more data, supporting our goal of developing a practically useful model [20].
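
As a generic illustration of the two levels of variation, a random-intercept model for session $j$ of subject $i$ can be written as below. The notation is an assumption introduced here rather than the paper's own, with $y_{ij}$ the outcome and $\mathbf{x}_{ij}$ the vector of predictors.

$$
y_{ij} = \beta_0 + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + b_i + \varepsilon_{ij},
\qquad b_i \sim N(0, \sigma_b^2),
\qquad \varepsilon_{ij} \sim N(0, \sigma^2)
$$

Here $\sigma_b^2$ captures between-subject variation and $\sigma^2$ captures within-subject variation; the subject-specific term $b_i$ is estimated more precisely as a subject contributes more sessions.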

### Testing of model structure

We used a series of likelihood ratio tests to compare the baseline mixed model (random intercepts only; grouped by subject ID) to more complex models. These tests were performed sequentially, in the order of the relative importance of predictors estimated by the LMG algorithm. Predictors that vary within subjects were modeled as random slopes (e.g., individual-specific coefficients for pre-exercise glucose). Predictors that vary between individuals were modeled as grouping factors for random intercepts (e.g., individuals grouped within levels of hemoglobin A1c). Within the lme package in R, random effects are estimated as components of the full model using the expectation–maximization algorithm [21].
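
The comparison of nested models can be sketched as follows, assuming the log-likelihoods of the simpler and more complex models are already available from the fitting software. The statistic is twice the difference in log-likelihood, referred to a chi-square distribution with degrees of freedom equal to the number of added parameters. The function names are hypothetical, and the chi-square tail is computed from a closed-form recurrence for integer degrees of freedom so that the example needs only the standard library.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    t = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-t)                  # Q(1, t)
    else:
        a, q = 0.5, math.erfc(math.sqrt(t))       # Q(1/2, t)
    # Recurrence: Q(a + 1, t) = Q(a, t) + t^a * exp(-t) / Gamma(a + 1)
    while a < df / 2.0:
        q += t ** a * math.exp(-t) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(loglik_simple, loglik_complex, extra_params):
    """Return (statistic, p-value) for two nested models."""
    statistic = 2.0 * (loglik_complex - loglik_simple)
    return statistic, chi2_sf(statistic, extra_params)
```

When the added parameter is a variance component, the null value lies on the boundary of the parameter space, so the chi-square reference distribution tends to give conservative p-values; the sketch does not adjust for this.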

### Cross validation of model in development dataset

The resulting model was then tested in a leave-one-out cross-validation using the development dataset. We quantified model accuracy as the percentage of predictions that were within measurement error.
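
The cross-validation loop can be sketched generically. The text does not say whether a single session or a whole subject is held out at each step; this sketch assumes one session at a time. The mixed model is replaced by a deliberately trivial stand-in (predicting the mean glucose change of the training sessions) so that the example is self-contained, and all names and data values are hypothetical.

```python
def within_tolerance(predicted: float, measured: float) -> bool:
    """ISO band from the text: +/-0.83 mmol/L below 4.2, else +/-20%."""
    tolerance = 0.83 if measured < 4.2 else 0.20 * measured
    return abs(predicted - measured) <= tolerance


def loo_percent_within_error(sessions, fit, predict) -> float:
    """Leave-one-out: refit without each session, then predict it."""
    hits = 0
    for i, held_out in enumerate(sessions):
        training = sessions[:i] + sessions[i + 1:]
        model = fit(training)
        if within_tolerance(predict(model, held_out), held_out["post"]):
            hits += 1
    return 100.0 * hits / len(sessions)


# Toy stand-in for the fitted model: mean change in glucose.
def fit_mean_change(training):
    return sum(s["post"] - s["pre"] for s in training) / len(training)


def predict_mean_change(mean_change, session):
    return session["pre"] + mean_change


# Made-up sessions (mmol/L), for illustration only.
toy_sessions = [
    {"pre": 10.0, "post": 8.0},
    {"pre": 9.0, "post": 7.0},
    {"pre": 12.0, "post": 10.0},
    {"pre": 8.0, "post": 3.0},
]
percent_ok = loo_percent_within_error(
    toy_sessions, fit_mean_change, predict_mean_change
)
```

With repeated measures, holding out single sessions leaves other sessions from the same subject in the training data; holding out whole subjects would instead assess performance for a new individual.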