
PHQ-9, CES-D, health insurance data—who is identified with depression? A Population-based study in persons with diabetes



Several instruments are used to identify depression among patients with diabetes and have been compared for their test criteria, but not for overlaps and differences, for example in the sociodemographic and clinical characteristics of the individuals identified by the different instruments.


We conducted a cross-sectional survey among a random sample of persons with diabetes insured by a statutory health insurance (SHI) (n = 1,579) and linked it with longitudinal SHI data. Depression symptoms were identified using either the Centre for Epidemiological Studies Depression (CES-D) scale or the Patient Health Questionnaire-9 (PHQ-9), and a depressive disorder was identified with a diagnosis in SHI data, resulting in 8 possible groups. Groups were compared using a multinomial logistic model.


In total, 33·0% of our analysis sample were identified with depression by at least one method, and 5·0% were identified with depression by all methods. Multinomial logistic analysis showed that identification through SHI data only, compared to the group with no depression, was associated with gender (women). Identification through at least SHI data was associated with taking antidepressants and previous depression. Health-related quality of life, especially the mental summary score, was associated with depression, but not when depression was identified through SHI data only.


The methods overlapped less than expected. We did not find a clear pattern between methods used and characteristics of individuals identified. However, we found first indications that the choice of method is related to specific underlying characteristics in the identified population. These findings need to be confirmed by further studies with larger study samples.

Key points

  • Patients with diabetes often have comorbid depression. Those patients struggle to meet their treatment goals. Thus, they have a higher risk of developing diabetes-related complications such as coronary heart disease.

  • A lot of different tools and instruments are available to diagnose depression, to screen for depression among patients with diabetes or to identify depression symptoms or depressive disorder in clinical or epidemiological studies, including interview, questionnaires or claims data. It would be helpful to know if the tools that are used identify the same people or, if this is not the case, whether people identified by different tools have different characteristics or health outcomes.

  • We found that different methods do not identify the same people with depression. There was no clear pattern of differences between the identified groups. However, we found some initial indications that the method chosen is related to particular underlying characteristics in the population identified. Further research with larger data sets is necessary to see whether the persons identified by different tools differ, so that recommendations can be given on which screening tool to use for what purpose.


Patients with diabetes have an increased prevalence of depression compared to the general population [1]. Although it remains controversial whether diabetes leads to depression or vice versa, or whether there is a bidirectional association, there is sufficient evidence that depression can have a serious impact on a person’s wellbeing and their ability to self-manage their diabetes [2,3,4,5]. Individuals with diabetes and comorbid depression are found to have unfavorable diabetes-related outcomes such as reduced adherence to their diabetes treatment, higher HbA1c levels, increased diabetes symptoms, or unfavorable micro- and macrovascular outcomes [2,3,4,5,6]. Beyond unfavorable health outcomes, Brüne et al. (2021) found that people with diabetes and depression had almost two times higher total health care costs compared to people with diabetes without depression [7]. Despite the relevance of comorbid depression in people with diabetes, it is assumed that only 50% of cases are recognized and an even smaller proportion is appropriately treated [2].

Several methods are used to identify depression or to estimate its prevalence. Prevalence estimates of depression among people with diabetes differ, partly because a range of different methods are used to assess depression [1, 8]. Three systematic reviews found that in studies where a questionnaire was used to assess depression, the prevalence was about two to three times higher than in those that used a diagnostic interview [8,9,10].

The method used to assess the presence of depression depends on several factors. For example, it may depend on study design, time constraints, personal preferences of the researchers, availability or the aim of the assessment. Furthermore, there are a variety of questionnaires, each with a different objective and somewhat different background or focus [11,12,13,14]. Knowledge of the different methods and instruments to assess depression is therefore important. Up to now, there are a number of studies available that validate these questionnaires in general [15, 16]. Very few studies have compared the different instruments for identifying depression among patients with diabetes. These studies either intended to validate a certain instrument against another in a specific population or wanted to compare psychometric properties or internal reliability [17,18,19,20].

A method other than questionnaires is the use of diagnoses in statutory health insurance (SHI) data to identify persons with depressive disorder. Up to now, there is no study in which SHI data were used for comparison purposes. In our study, we used two of the most common instruments in addition to SHI data to investigate whether the different methods identify more or less the same individuals or whether they identify different individuals. In particular, if the identified individuals differ, we were interested in possible patterns of characteristics of the identified groups. Thus, in contrast to existing validation studies, the aim of this study was to assess and describe in detail the overlap and the differences between groups identified by different methods to find persons with depression (symptoms or disorders), as well as potential associations between individual and clinical characteristics and the method used to identify a person.

Specifically, three methods to identify depression were used and compared: the Centre for Epidemiological Studies Depression (CES-D) scale, the Patient Health Questionnaire-9 (PHQ-9) - the two most frequently evaluated questionnaires among people with diabetes [20] - or a diagnosis in SHI data. In this way, we aimed to gain basic insights and better understand the issues associated with the use of different methods.


Study design

The study design and recruitment of participants have been described elsewhere [21]. In brief, a cross-sectional survey was conducted in a random sample of individuals with diabetes (N = 3,642) insured by one SHI covering 673,366 persons in Germany. Individuals with diabetes type 1 or 2 were identified using an algorithm taking into account diagnosis based on the 10th International Classification of Diseases (ICD-10) for ‘diabetes’ (E10–E14), prescription of antihyperglycemic drugs (Anatomical-Therapeutic-Chemical [ATC] classification A10), and documentation of blood glucose or HbA1c measurements. This algorithm has been validated and used in previous studies [22]. We linked data of the survey to longitudinal SHI data on an individual level. The initial aim of the study was to assess differences between people with diabetes with and without depression regarding costs and health-related quality of life. The presented analyses are secondary analyses that were developed in the course of the study.

Data source

The baseline survey was a 9-page postal questionnaire conducted in 2013. It assessed information on sociodemographic characteristics such as age, sex, and years of education, duration of diabetes, and type of diabetes. PHQ-9 and the German version of the CES-D were used to assess depression symptoms.

SHI data on health care utilization patterns and health care costs for all in- and outpatient treatments were available for the period covering four quarters before and after the quarter of the baseline survey.

Study population

Of 46,566 individuals with diabetes in the SHI, 3,642 persons were randomly selected and contacted to participate in the study. In total, 1,860 persons sent back their questionnaire (response rate: 51%) and gave written informed consent to use their SHI data. Responders did not differ from non-responders in having a history of depression diagnosis [23]. For 201 of these persons, data were not available for the complete observation period, e.g. because the person switched health insurance during that time. This left 1,659 persons to be considered for the analysis. A further 80 persons were excluded as they provided incomplete information in the questionnaire. Thus, a total of 1,579 persons were included in our analysis (Appendix Fig. 1).

Ethical approval was obtained from the ethics committee of the Heinrich Heine University Düsseldorf and is available under the study reference 3762.

Main outcome – assessment of depression


The CES-D and its German version (Allgemeine Depressionsskala) are brief self-report measures designed to assess symptoms of depression in the general population in epidemiological studies, based on nine signs and symptoms of depression defined by the American Psychiatric Association Diagnostic and Statistical Manual, fourth edition [11]. Several studies have assessed the validity of the CES-D in different populations [24, 25]. We used the short form of the German version of the CES-D in our study (Allgemeine Depressionsskala Kurzform (ADS-K)) [25]. The instrument comprises 15 statements regarding depression. On a four-point scale (ranging from “rarely or never” (0 points) to “frequently, all the time” (3 points)), respondents rate how often depressive symptoms occurred during the last week. The total score, which can range from 0 to 45, is the sum of the points from each statement. We used a cut-off value of ≥ 17 to define clinically meaningful depressive symptoms as suggested by validation studies [25].
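The scoring rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, responses are assumed to be already coded 0 to 3 so that higher values mean more frequent depressive symptoms (any reverse-keyed items recoded beforehand), and missing items are not handled.

```python
from typing import Sequence

ADS_K_ITEMS = 15    # number of statements in the short form (from the text)
ADS_K_CUTOFF = 17   # cut-off used in this study: score >= 17


def ads_k_score(responses: Sequence[int]) -> int:
    """Sum the 15 item responses, each coded 0..3 (assumed pre-oriented)."""
    if len(responses) != ADS_K_ITEMS:
        raise ValueError(f"expected {ADS_K_ITEMS} responses, got {len(responses)}")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each response must be 0, 1, 2 or 3")
    return sum(responses)


def ads_k_positive(responses: Sequence[int]) -> bool:
    """True if the sum score reaches the study's cut-off."""
    return ads_k_score(responses) >= ADS_K_CUTOFF
```

For example, a respondent answering 1 on thirteen items and 2 on two items reaches a score of 17 and would be classified as having clinically meaningful depressive symptoms.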


The PHQ-9 is a multipurpose instrument used to screen, monitor and measure the severity of depression symptoms. The PHQ-9 can be scored using different methods: as a diagnostic algorithm to make a probable diagnosis of major depressive disorder using the nine criteria of the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) or to test for other depressive disorders, and as a cut-off based on summed-item scores to assess the severity of depression symptoms [12]. The algorithm is the scoring method that was originally proposed to screen for depression. Within this study we focused on the PHQ-9 as a screening instrument. Following Kroenke et al. (2001), we defined depression as present when two or more of the nine symptoms were present at least “more than half the days” in the past two weeks, and one of the symptoms was depressed mood or anhedonia. The item on thoughts of suicide counts as present if it is endorsed at all, regardless of the reported duration [12]. Several studies have used the PHQ-9 to assess depression among individuals with diabetes and used a similar approach [2, 26].
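As a minimal sketch of the algorithm as described in this section, the rule could be written as follows. The function name and the `min_symptoms` parameter are hypothetical; the code assumes the standard PHQ-9 item order (item 1 anhedonia, item 2 depressed mood, item 9 thoughts of suicide) and the usual response coding 0 = "not at all", 1 = "several days", 2 = "more than half the days", 3 = "nearly every day".

```python
from typing import Sequence


def phq9_algorithm_positive(responses: Sequence[int], min_symptoms: int = 2) -> bool:
    """Apply the screening rule described in the text to nine item responses."""
    if len(responses) != 9:
        raise ValueError("the PHQ-9 has exactly nine items")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each response must be 0, 1, 2 or 3")

    # A symptom counts as present at "more than half the days" or more ...
    present = [r >= 2 for r in responses]
    # ... except the suicide item, which counts if endorsed at all.
    present[8] = responses[8] >= 1

    # At least one core symptom (anhedonia or depressed mood) is required.
    core_symptom = present[0] or present[1]
    return core_symptom and sum(present) >= min_symptoms
```

Keeping the symptom threshold as a parameter makes it easy to see how the identified group changes if a stricter count is applied.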

Depression in SHI data

For a diagnosis in SHI data, an ICD-10 code for unipolar depression during the study period of nine quarters was required. Diagnosis of unipolar depression included the following codes:

F32.0-F32.9 Depressive episode,

F33.0-F33.9 Recurrent depressive disorder,

F34.1 Dysthymia,

F38.1 Other recurrent mood [affective] disorders, and

F41.2 Mixed anxiety and depressive disorder.
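The code list above translates into a simple membership check. The following is a hedged sketch, not the selection logic actually run on the SHI data: the function name is hypothetical, and it assumes codes arrive as plain strings such as "F32.1" without additional suffixes (for example diagnostic certainty flags), which real claims data may carry.

```python
import re

# F32.0-F32.9 and F33.0-F33.9: one digit after the decimal point
_RANGE_CODES = re.compile(r"F3[23]\.\d")
# Single codes from the list: dysthymia, other recurrent mood disorders,
# mixed anxiety and depressive disorder
_SINGLE_CODES = {"F34.1", "F38.1", "F41.2"}


def is_unipolar_depression_code(code: str) -> bool:
    """True if the ICD-10 code belongs to the study's unipolar depression list."""
    normalized = code.strip().upper()
    return bool(_RANGE_CODES.fullmatch(normalized)) or normalized in _SINGLE_CODES
```

A person would then count as having a diagnosis in SHI data if at least one of their codes within the nine-quarter window passes this check.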

Group composition based on depression measurement

We classified the participants into eight groups after linking SHI data with survey data. Group 1 reported depression symptoms in the CES-D and PHQ-9 and had a diagnosis in SHI data. Group 2 reported depression symptoms in the CES-D and PHQ-9 but had no diagnosis in SHI data. Group 3 had symptoms according to the PHQ-9 but not according to the CES-D and had a diagnosis in SHI data. For group 4 no symptoms were reported with the PHQ-9 but with the CES-D and they had a diagnosis in SHI data. Group 5 was only identified with the CES-D, Group 6 only with the PHQ-9 and group 7 only with a diagnosis in SHI data. Group 8 had no depression symptoms or diagnosis and was considered as a reference group (Appendix Table 1).
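The group definition can be expressed as a lookup from the three binary indicators to the group number. This sketch uses a hypothetical function name; the mapping itself follows the description in the paragraph above.

```python
def depression_group(cesd: bool, phq9: bool, shi: bool) -> int:
    """Map the three indicators (CES-D, PHQ-9, SHI diagnosis) to groups 1..8."""
    groups = {
        # (CES-D, PHQ-9, SHI diagnosis): group number
        (True, True, True): 1,     # identified by all three methods
        (True, True, False): 2,    # both questionnaires, no SHI diagnosis
        (False, True, True): 3,    # PHQ-9 and SHI diagnosis
        (True, False, True): 4,    # CES-D and SHI diagnosis
        (True, False, False): 5,   # CES-D only
        (False, True, False): 6,   # PHQ-9 only
        (False, False, True): 7,   # SHI diagnosis only
        (False, False, False): 8,  # reference group, no depression
    }
    return groups[(bool(cesd), bool(phq9), bool(shi))]
```

Because three binary indicators yield 2 × 2 × 2 = 8 combinations, every participant falls into exactly one group.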

Possible associated variables and covariates

All potentially associated variables and covariates considered as potential predictors were recorded during the baseline survey, except information on clinical and disease related measures (based on SHI data). Based on a literature review and clinical expertise, we considered socio-demographic variables, patient-reported measures on health-related quality of life (HRQoL) and diabetes related distress as well as clinical and disease-specific variables.

The following variables were included as sociodemographic factors: age, gender, marital status (married, single, divorced or separated, widowed), relationship status (with/without partner), origin (resident in Germany since birth/not residing in Germany since birth) as well as employment (yes/no), and retirement status (yes/no). The International Standard Classification of Education (ISCED) was used to categorize participants according to the duration of their education (< 10 years, 10–14 years, > 14 years) [27]. Furthermore, type and duration of diabetes were also assessed in the baseline survey as well as information on a previous diagnosis of depression by a health professional.

HRQoL was investigated using the 12-item Short Form health survey (SF-12), a multipurpose generic measure of health status [28]. The SF-12 can be used to compose a physical health and a mental health summary score (PCS-12 and MCS-12).

We also assessed diabetes-specific distress using the Problem Areas in Diabetes Scale (PAID), a 20-item scale consisting of emotional problems commonly reported among patients with diabetes [19].

SHI data was used to assess clinical and disease related measures. Comorbidities were measured using diagnostic groupings, which are necessary for the morbidity-oriented risk structure adjustment by SHIs in Germany. We used the number of coded morbidity groups in the year prior to the baseline survey (2012) to assess the number of comorbidities [29].

Healthcare costs were calculated from the perspective of a SHI including all costs imposed to the SHI. We took net costs into consideration without taking discounts into account. Costs were analyzed for every person individually, covering the survey quarter plus the four quarters before and after, a total of nine quarters.

The adapted Diabetes Complications Severity Index (aDCSI) was used to assess diabetes complications and thereby to determine diabetes severity [30].

Treatment of diabetes was assessed by looking for prescriptions of insulin or oral antihyperglycemic drugs (OADs) in the SHI data for each participant during the course of the study. Additionally, we checked whether persons took antidepressants during the course of the study. These were defined by the ATC code N06A.

Statistical analyses

We described the study population using mean, standard deviation and median for quantitative variables and frequency and percentage for categorical variables. We used the Mann–Whitney U test to compare quantitative variables between two groups and the Kruskal–Wallis test for three or more groups. Pearson’s chi-square test was used to assess whether differences in categorical variables were significant. The p-values of these tests give the probability of observing the actual value of the test statistic, or a more extreme value, under the null hypothesis that there are no differences between groups. Smaller p-values indicate evidence against the null hypothesis.

To compare the eight groups, we imputed the missing data (cf. description of the study population and Table 1) with missForest, a machine-learning-based imputation algorithm implemented in R. To assess the quality of the imputation, we calculated out-of-bag (OOB) imputation errors as the proportion of falsely classified cases (PFC) for categorical variables and as the normalized root mean squared error (NRMSE) for quantitative variables. Since the comparison of all eight groups with each other (the so-called many-to-many problem) requires 28 pairwise comparisons, each with respect to a variety of characteristics, one should expect a considerable number of false rejections. In order to control the type I error (here the family-wise error rate, i.e., the probability of rejecting at least one true null hypothesis) and still be able to detect effects after adjustment for multiple testing (done by the Bonferroni correction), we focused on comparing the seven groups with depressive disorder with the group with no depression or depressive symptoms (i.e., group 8) as the reference group (the so-called many-to-one problem). We used a multinomial logistic regression to model group membership, whereby the log odds of being in one group relative to being in the reference group are modelled as a linear combination of predictor variables. Thus, the seven groups with depressive disorder can be compared with each other indirectly by comparing their differences from the reference group.
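The difference between the many-to-many and the many-to-one setting can be made concrete with a short calculation. The sketch below is illustrative only: the helper name is hypothetical, and the number of tests entering the authors' Bonferroni correction is not stated in the text, so the per-characteristic counts used here are an assumption.

```python
from math import comb

N_GROUPS = 8
ALPHA = 0.05  # significance level stated in the text

# Many-to-many: every group is compared with every other group.
many_to_many = comb(N_GROUPS, 2)   # 8 * 7 / 2 = 28 pairwise comparisons
# Many-to-one: each depression group is compared with the reference group.
many_to_one = N_GROUPS - 1         # 7 comparisons


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error rate <= alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Per-comparison thresholds for a single characteristic (assumed test counts).
threshold_many_to_many = bonferroni_threshold(ALPHA, many_to_many)
threshold_many_to_one = bonferroni_threshold(ALPHA, many_to_one)
```

With fewer comparisons, each individual test faces a less strict threshold, which is why the many-to-one design leaves more room to detect effects after adjustment.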

Gender, age, marital status, employment status, type of diabetes and diabetes duration, insulin and OAD usage, aDCSI score, previous depression and intake of antidepressant medication, number of comorbidities, HRQoL, PAID score, and total health care costs were used as potential candidates for independent variables in the multinomial model. We selected the final multinomial model by keeping important variables (age, sex, comorbidities, MCS-12 and PCS-12), removing collinear variables and minimizing the Akaike information criterion (AIC). The final model includes the following independent variables: age, gender, taking insulin, previous depression, taking antidepressants, the number of comorbidities, HRQoL, and the PAID score.

The p-values for the odds ratios (OR) estimated by the multinomial regression, i.e., for being in a group with depressive disorder compared with the reference group, are the probabilities of observing the actual value of the OR or more extreme values under the null hypothesis that there is no effect (OR = 1). Smaller p-values indicate that the null hypothesis may be wrong and that there is an effect.
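For orientation, the multinomial logistic model with group 8 as the reference can be written in its standard textbook form. The notation is an assumption introduced here for illustration: $G_i$ is the group of person $i$, $x_{ij}$ the value of predictor $j$, and $\beta_{kj}$ the coefficient of predictor $j$ for group $k$.

```latex
\log \frac{P(G_i = k \mid x_i)}{P(G_i = 8 \mid x_i)}
  = \beta_{k0} + \sum_{j=1}^{p} \beta_{kj}\, x_{ij},
  \qquad k = 1, \dots, 7,

\mathrm{OR}_{kj} = \exp(\beta_{kj}),
  \qquad H_0\colon \beta_{kj} = 0 \;\Longleftrightarrow\; \mathrm{OR}_{kj} = 1 .
```

In this form, an OR above one means that a one-unit increase in the predictor raises the odds of belonging to group $k$ rather than to the reference group, holding the other predictors fixed.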

The significance level (also for multiple comparisons) was set to α = 0·05.


Description of the study population

Table 1 describes the 1,579 participants and their characteristics. For 271 subjects in the total sample (17·2%) data of at least one variable in the baseline survey were missing while 1,308 persons had complete data.

Participants had a mean age of 67 years and almost 40% were female. About 90% were German and 84% were in a relationship. About one in five had more than 14 years of education. Almost 70% of the participants were retired. More than 75% were married, around 7% were divorced or separated and 12·4% were widowed.

On average participants had diabetes for 11 years, the majority had T2DM (85·9%). About one-third of the participants were treated with insulin, around 67% took OAD. 17·5% took antidepressants. The mean healthcare costs in our sample were 10,123€. Participants had on average 41·7 points on the physical component summary scale (PCS) of the SF-12 and 50·1 on the mental component summary scale (MCS). The average PAID Score in the sample was 19·4. 14% of people in the sample reported that they had previously been diagnosed with depression.

Table 1 Baseline characteristics of the DiaDec sample

Prevalence of depression according to the different methods

Figure 1 displays overlaps between the different methods and reports the overall prevalence within the sample. In total 33·0% of our analysis sample (521) were identified with some form of depression by at least one method. The prevalence of depression in our sample ranged from 11·6% (PHQ-9) up to 22·4% (SHI data).

Fig. 1
figure 1

Venn diagram showing the persons identified by different methods to assess depressive disorder and intersections between the different methods

The different groups and their characteristics are described in Table 2. Group 8, the reference group, was the largest group with 1,058 persons, and group 3, identified through the PHQ-9 and a diagnosis in SHI data, was the smallest with 22 persons. With respect to sociodemographic variables, the percentage of females was highest in group 7 (51·7%) and lowest in group 6 (33·3%). Group 7 (only identified by a diagnosis in SHI data) had the highest share of persons of German origin (92·7%) and group 2 (identified by both instruments) the lowest (76·1%). The share of persons in a relationship was highest in group 8 (87·7%) and lowest in group 1 (identified by all methods; 70·1%). One third of group 6 (identified through PHQ-9 only) were retired, compared with about 52% of the persons in group 1. The share of persons with more than 14 years of education was highest in group 8 (23·2%) and in group 1 (20·3%) and lowest in group 2 (15·2%). Group 1 also had the highest share of persons with type 1 diabetes (12·6%). With regard to diabetes-specific and health care related outcomes, the highest share of persons with type 1 diabetes was found in group 1 (12·8%) and the lowest share was found in group 7 (4·9%). Groups 2 and 3 had the highest share of persons taking insulin (43·5% and 50%), whereas in all other groups the share was around 30%. In all groups, the share of persons taking OADs was between 60 and 70%. Health care costs were highest in group 3 with a median of more than 13,900 € and lowest in group 8 (median 5,283 €).

Table 2 Comparison of groups according to depression status

Looking at HRQoL, the PCS-12 score was highest in group 8 (median 47·3) and group 7 (median 42·3) and lowest in group 1 (median 30·6). These findings were similar for the MCS-12. The PAID score was highest in group 1 (median 45·0) and lowest in group 8 (median 10·0) and group 7 (median 15·0). Group 1 had the highest share of persons reporting a previous depression (67·1%) and group 8 the lowest share (4·9%).

Results of the multinomial model

Table 3 reports the results of the multinomial logistic regression model with imputed data (the OOB imputation errors are reasonably small, ranging from 0·086 to 0·71), comparing the seven groups with depressive disorder with the reference group with no depressive disorder (i.e., group 8). Overall, the associations between the independent variables and group membership differed in several respects across the groups identified by the three methods (even after Bonferroni adjustment). We did not find a clear pattern between methods used and characteristics of individuals identified. However, we found some remarkable points.

First, we observed that a person who took antidepressants compared to a person who did not take antidepressants was 12 times (or for that matter about 9, 8 and 7 times) more likely to be in group 3 (group 1, 7 or 4, respectively) than in the reference group, i.e., OR = 12·00 (8·94, 8·31 and 7·25, respectively). These four groups are characterized by a diagnosis in SHI data. By contrast, in groups not identified through a diagnosis in SHI data, i.e., groups 2, 5 and 6, the estimated effects of taking antidepressants were considerably smaller and not even significant for groups 5 and 6 (depression symptoms according to CES-D only and PHQ-9 only, respectively). A quite similar pattern was noticed for reporting previous depression and comorbidities: Persons reporting a previous depression were significantly more likely to be in one of the groups identified through a diagnosis in SHI data (groups 1, 3, 4 and 7) compared to the reference group, and people with more comorbidities were more likely to be in groups 4 and 7 (both identified through SHI diagnosis).

Second, women were almost twice as likely to be in the group with an SHI data-based diagnosis only (group 7) as in the reference group (OR = 1·86). There were no further significant associations for the other groups.

Third, age was a significant factor for group membership probability. With each year of life, it is less likely to be in any group with depressive disorder than in the reference group (all ORs are less than one); however, this was not significant for groups without an SHI-based depression diagnosis. Low HRQoL values and especially low MCS-12 values were associated with belonging to any group but not the one identified by SHI data only, each in comparison to the reference group. We observed that a person with low MCS-12 is significantly more likely to be in a group with symptoms according to both PHQ-9 and CES-D (i.e., group 1 and group 2) than in any other group. Furthermore, the results regarding the PAID score point in the same direction: values were associated with belonging to any group (except the smallest group) but not the one identified by SHI data only.

Table 3 Results of the multinomial model reporting odds ratio (OR) and 95% confidence intervals (95% CI) for belonging to the different groups compared to belonging to the group with no depression (group 8)


National and international guidelines recommend screening people with diabetes for depression to identify patients in need of psychological treatment [31, 32]. However, neither of these guidelines gives detailed instructions on which screening instrument to use or describes the differences between the identified groups. A recent meta-analysis of the diagnostic accuracy of depression questionnaires in adult patients with diabetes by de Joode et al. (2019) showed that the CES-D and the PHQ-9 are the most frequently evaluated depression questionnaires among patients with diabetes [20]. They differed in terms of sensitivity and specificity; however, neither instrument was found to be superior to the other. The results of our study show that between 11·6% and 22·4% of individuals with diabetes had depression, depending on the method used to assess it. High prevalence estimates can be expected since, on the one hand, there is evidence that depression is a risk factor for diabetes and, on the other hand, studies show that the distress caused by diabetes contributes to the development of depression [1, 8, 33,34,35,36]. The results of our study are within the range of findings from the two most recent meta-analyses on depression among persons with diabetes, where prevalence ranged from 1·8% up to 88·0% [1, 8]. One could assume that both instruments would identify more or less the same persons, since they both measure depression symptoms within the last week or the last two weeks. One could also assume an overlap between the two instruments and the persons identified through SHI data; however, this overlap would be expected to be a little less pronounced, as SHI data cover diagnoses from roughly two years (nine quarters). We indeed found some overlap between the methods; however, surprisingly, the majority of persons were identified by one method only (20·7% of the total sample), 7·3% of the whole sample were identified using two methods and 5·0% were identified with all three methods. The largest number of persons was identified through SHI data only (group 7). In total, 68·0% of those identified with depression in our sample were identified through SHI data, of which 42·1% were also identified through at least one of the two instruments. The characteristics of individuals identified by either of the two instruments were quite similar. Within our sample, women were more likely to belong to the group identified through SHI data only (group 7). This is in line with results of an analysis of routine German SHI data that found that women are diagnosed more frequently than men in all age groups [37].

It seems that persons who have a diagnosis of depression in SHI data but do not show symptoms in either of the questionnaires (group 7) do not noticeably differ in their HRQoL when compared with the group with no depression. Neither were the reported scores for diabetes related distress high in this group.

Screening for depression among individuals with diabetes seems to be necessary since all groups identified through at least one questionnaire (groups 1–6) had more unfavorable outcomes compared to the group with no depression.

Our findings show that, even though the same disease should be measured, the degree of variability in persons identified across the methods is substantial. Had we used the PHQ-9 only, we would have missed 133 patients who have depressive symptoms according to the CES-D but not according to the PHQ-9. Similarly, had we used only the CES-D, we would have missed 58 persons who had symptoms according to the PHQ-9 but not according to the CES-D. Unfortunately, the differences between the groups were not pronounced enough to draw conclusions on which method is to be preferred.

To keep in mind: We found some indication that the method chosen to identify persons with a depressive disorder might be related to particular underlying characteristics in the population identified. To our knowledge, there is no study, which has used a similar many-to-one approach. It will be interesting to compare findings of future studies with larger samples.

Strength and limitations

To our knowledge, this is the first study that analyses groups identified by different instruments to assess depression and also includes SHI data. The linkage of survey data with SHI data allows a more detailed description of the identified persons, which would not be possible using only one of the two data sources. The analyzed data set is rather large, allowing for robust estimates. Moreover, at 51·0% the response rate was reasonably high for a survey-based study. A nonresponse analysis did not reveal any major differences between responders and nonresponders, especially with respect to depression [23]. Thus, nonresponse bias should be small. However, only persons insured by one SHI could participate in the study, which might influence the results since, for example, the prevalence of diabetes varies among the different SHIs in Germany [38]. Survey data were only assessed at one point during the study period, whereas the SHI data cover the whole study period; thus the prevalence observed in SHI data might be higher, among other reasons, because the time frame during which it is assessed is longer. Moreover, it has to be kept in mind that a diagnosis in SHI data is not valid as a screening measure for depression since people with a diagnosis have most likely received some form of therapy. Furthermore, SHI data contain clinical diagnoses, whereas the results of the CES-D and the PHQ-9 are not clinical diagnoses but results of screening measures for depression. Additionally, it might be the case that once a person has received a depression diagnosis, it is not removed from the record even though the person no longer has depression. Likewise, we could not get a full history of diagnosed depression but only data on depression diagnoses 12 months before and after the baseline assessment. Our focus is on acute depression, in line with the two instruments used during the baseline survey, which is not covered by a lifetime history of depression. Since we include a considerable time frame before and after the baseline assessment, misclassification is assumed to be low.


Our study is the first to describe the overlap and differences between individuals identified with different methods to detect depression. Although several characteristics were found to be associated with belonging to the different groups, we did not find a clear pattern among those characteristics. However, we found some initial indications that the method chosen is related to particular underlying characteristics in the population identified. The methods have a relatively low overlap. The majority of persons were identified using a diagnosis in SHI data. Those identified through SHI data only did not differ in their HRQoL from those with no depression. This could be due either to successful therapy or to spontaneous remission. Our study shows that there might be similarities but also differences in the characteristics of identified persons depending on the method used. When using any of the three methods, one should be aware that certain persons are missed. Therefore, further research with a comprehensive data set that is sufficiently large in terms of case numbers is needed to address the implications of using each of the methods. Prospective studies investigating clinical outcomes would be especially important. This knowledge is crucial to enable clinicians to make an informed decision about the use of either of the two instruments in everyday practice, taking into account setting, time constraints and other relevant circumstances.

Data availability

Due to ethical concerns, supporting data on the results of the questionnaire cannot be made openly available. Additionally, the statutory health insurance data already existed and were obtained upon request, subject to licence restrictions, from a number of different sources. Full details on how these data were obtained are available in the documentation available at:


  1. Harding KA, Pushpanathan ME, Whitworth SR, Nanthakumar S, Bucks RS, Skinner TC. Depression prevalence in type 2 diabetes is not related to diabetes–depression symptom overlap but is related to symptom dimensions within patient self-report measures: a meta‐analysis. Diabet Med. 2019;36:1600–11.

  2. Katon WJ, Simon G, Russo J, Von Korff M, Lin EHB, Ludman E, et al. Quality of depression care in a population-based sample of patients with diabetes and major depression. Med Care. 2004;42:1222–9.

  3. Egede LE, Ellis C, Grubaugh AL. The effect of depression on self-care behaviors and quality of care in a national sample of adults with diabetes. Gen Hosp Psychiatry. 2009;31:422–7.

  4. Semenkovich K, Brown ME, Svrakic DM, Lustman PJ. Depression in type 2 diabetes mellitus: prevalence, impact, and treatment. Drugs. 2015;75:577–87.

  5. Gonzalez JS, Safren SA, Cagliero E, Wexler DJ, Delahanty L, Wittenberg E, et al. Depression, self-care, and medication adherence in type 2 diabetes: relationships across the full range of symptom severity. Diabetes Care. 2007;30:2222–7.

  6. Genis-Mendoza AD, González-Castro TB, Tovilla-Vidal G, Juárez-Rojop IE, Castillo-Avila RG, López-Narváez ML, et al. Increased levels of HbA1c in individuals with type 2 diabetes and depression: a Meta-analysis of 34 studies with 68,398 participants. Biomedicines. 2022;10:1919.

  7. Brüne M, Linnenkamp U, Andrich S, Jaffan-Kolb L, Claessen H, Dintsios C-M, et al. Health Care Use and costs in individuals with diabetes with and without Comorbid Depression in Germany: results of the cross-sectional DiaDec Study. Diabetes Care. 2021;44:407–15.

  8. Khaledi M, Haghighatdoost F, Feizi A, Aminorroaya A. The prevalence of comorbid depression in patients with type 2 diabetes: an updated systematic review and meta-analysis on huge number of observational studies. Acta Diabetol. 2019;56:631–50.

  9. Anderson RJ, Freedland KE, Clouse RE, Lustman PJ. The prevalence of comorbid depression in adults with diabetes: a meta-analysis. Diabetes Care. 2001;24:1069–78.

  10. Ali S, Stone MA, Peters JL, Davies MJ, Khunti K. The prevalence of co-morbid depression in adults with type 2 diabetes: a systematic review and meta-analysis. Diabet Med J Br Diabet Assoc. 2006;23:1165–73.

  11. Radloff LS. The CES-D scale: a self-report depression scale for research in the general population. Appl Psychol Meas. 1977;1:385–401.

  12. Kroenke K, Spitzer RL, Williams JB. The PHQ-9: validity of a brief depression severity measure. J Gen Intern Med. 2001;16:606–13.

  13. Hamilton M. A rating scale for depression. J Neurol Neurosurg Psychiatry. 1960;23:56–62.

  14. Beck AT. An inventory for Measuring Depression. Arch Gen Psychiatry. 1961;4:561.

  15. Vilagut G, Forero CG, Barbaglia G, Alonso J. Screening for Depression in the General Population with the Center for Epidemiologic Studies Depression (CES-D): A Systematic Review with Meta-Analysis. van der Feltz-Cornelis C, editor. PLOS ONE. 2016;11:e0155431.

  16. Levis B, Benedetti A, Thombs BD. Accuracy of Patient Health Questionnaire-9 (PHQ-9) for screening to detect major depression: individual participant data meta-analysis. BMJ. 2019;l1476.

  17. Khamseh ME, Baradaran HR, Javanbakht A, Mirghorbani M, Yadollahi Z, Malek M. Comparison of the CES-D and PHQ-9 depression scales in people with type 2 diabetes in Tehran, Iran. BMC Psychiatry. 2011;11:61.

  18. Zhang Y, Ting RZW, Lam MHB, Lam S-P, Yeung RO, Nan H et al. Measuring depression with CES-D in Chinese patients with type 2 diabetes: the validity and its comparison to PHQ-9. BMC Psychiatry [Internet]. 2015 [cited 2017 May 11];15. Available from:

  19. Hermanns N, Kulzer B, Krichbaum M, Kubiak T, Haak T. How to screen for depression and emotional problems in patients with diabetes: comparison of screening characteristics of depression questionnaires, measurement of diabetes-specific emotional problems and standard clinical assessment. Diabetologia. 2006;49:469–77.

  20. de Joode JW, van Dijk SEM, Walburg FS, Bosmans JE, van Marwijk HWJ, de Boer MR et al. Diagnostic accuracy of depression questionnaires in adult patients with diabetes: A systematic review and meta-analysis. Cheungpasitporn W, editor. PLOS ONE. 2019;14:e0218512.

  21. Kvitkina T, Brune M, Chernyak N, Begun A, Andrich S, Linnenkamp U, et al. Protocol of the DiaDec-study: quality of life, health care utilisation and costs in patients with diabetes: the role of depression. J Diabetol Endocrinol. 2016;1:12–7.

  22. Icks A, Haastert B, Trautner C, Giani G, Glaeske G, Hoffmann F. Incidence of lower-limb amputations in the diabetic compared to the non-diabetic population. Findings from nationwide insurance data, Germany, 2005–2007. Exp Clin Endocrinol Diabetes. 2009;117:500–4.

  23. Linnenkamp U, Gontscharuk V, Brüne M, Chernyak N, Kvitkina T, Arend W et al. Using statutory health insurance data to evaluate non-response in a cross-sectional study on depression among patients with diabetes in Germany. Int J Epidemiol. Forthcoming.

  24. Lehmann V, Makine C, Karşıdağ C, Kadıoğlu P, Karşıdağ K, Pouwer F. Validation of the turkish version of the centre for epidemiologic Studies Depression Scale (CES-D) in patients with type 2 diabetes mellitus. BMC Med Res Methodol. 2011;11:109.

  25. Maksimović S, Ziegenbein M, Machleidt W, Sieberer M. [Measurement invariance of the german version of the Center for Epidemiological Studies Depression Scale (CES-D 20) among males and females with and without a history of migration]. Psychiatr Prax. 2014;41:324–30.

  26. Reddy P, Philpot B, Ford D, Dunbar JA. Identification of depression in diabetes: the efficacy of PHQ-9 and HADS-D. Br J Gen Pract J R Coll Gen Pract. 2010;60:e239–245.

  27. Organisation for Economic Co-operation and Development (OECD). Classifying educational programmes: manual for ISCED-97 implementation in OECD countries | VOCEDplus, the international tertiary education and research database [Internet]. Paris: UNESCO Institute for Statistics; [cited 2017 May 4]. Available from:

  28. Ware J, Kosinski M, Keller SD. A 12-Item short-form Health Survey: construction of scales and preliminary tests of reliability and validity. Med Care. 1996;34:220–33.

  29. Buchner F, Goepffarth D, Wasem J. The new risk adjustment formula in Germany: implementation and first experiences. Health Policy Amst Neth. 2013;109:253–62.

  30. Chang H-Y, Weiner JP, Richards TM, Bleich SN, Segal JB. Validating the adapted Diabetes Complications Severity Index in claims data. Am J Manag Care. 2012;18:721–6.

  31. Nationale VersorgungsLeitlinie (NVL) Unipolare Depression Langfassung. 2. Auflage, Version. 1, 2015. AWMF-Register-Nr.: nvl-005 [Internet]. Available from:

  32. IDF Clinical Guidelines Task Force. Global Guideline for Type 2 Diabetes [Internet]. 2012 [cited 2017 Jul 26]. Available from:

  33. Rubin RR, Ma Y, Marrero DG, Peyrot M, Barrett-Connor EL, Kahn SE, et al. Elevated depression symptoms, Antidepressant Medicine Use, and risk of developing diabetes during the diabetes Prevention Program. Diabetes Care. 2008;31:420–6.

  34. Meng R, Liu N, Yu C, Pan X, Lv J, Guo Y, et al. Association between major depressive episode and risk of type 2 diabetes: a large prospective cohort study in chinese adults. J Affect Disord. 2018;234:59–66.

  35. Chen S, Zhang Q, Dai G, Hu J, Zhu C, Su L et al. Association of depression with pre-diabetes, undiagnosed diabetes, and previously diagnosed diabetes: a meta-analysis. Endocrine. 2016.

  36. Qiao Y, Liu S, Li G, Lu Y, Wu Y, Ding Y, et al. Role of depressive symptoms in cardiometabolic diseases and subsequent transitions to all-cause mortality: an application of multistate models in a prospective cohort study. Stroke Vasc Neurol. 2021;6:511–8.

  37. Grobe TG, Kleine-Budde K, Bramesfeld A, Thom J, Bretschneider J, Hapke U. Prävalenzen von Depressionen bei Erwachsenen – eine vergleichende Analyse bundesweiter Survey- und Routinedaten. Gesundheitswesen. 2019;81:1011–7.

  38. Hoffmann F, Icks A. Diabetes “epidemic” in Germany? A critical look at health insurance data sources. Exp Clin Endocrinol Diabetes Off J Ger Soc Endocrinol Ger Diabetes Assoc. 2012;120:410–5.



We thank Annett Fiege for collecting the data.


The study received funding from the German Federal Ministry of Education and Research (BMBF, No. 01GY1133); funding included peer review of the proposed research. The corresponding author had full access to all the data in the study and had final responsibility for the decision to submit for publication.

Open Access funding enabled and organized by Projekt DEAL.

Author information


Linnenkamp and Andrich wrote the paper. Chernyak, Icks, Brüne and Kruse conceived and designed the experiments. Brüne, Kvitkina and Arend performed the experiments. Linnenkamp, Andrich, Gontscharuk, Ogurtsova, Icks, Hoffmann, Hermanns, Kulzer, Evers and Hilligsmann analysed and interpreted the data. Schmitz-Losem contributed data. All authors carefully read and approved the final manuscript.

Corresponding author

Correspondence to Ute Linnenkamp.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Role of the funder/sponsor

The funder had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

Ethics approval

Ethical approval was obtained from the ethics committee of the Heinrich-Heine University Düsseldorf and is available under the study reference 3762.

Statement to confirm that all methods were carried out in accordance with relevant guidelines and regulations

We have adhered to best practice guidelines for Strengthening the Reporting of Observational Studies in Epidemiology (STROBE).


Different instruments to screen for depression do not identify the same persons; however, no systematic differences exist between the different groups of persons.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Andrea Icks and Silke Andrich shared last authorship.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

Supplementary Material 3

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Linnenkamp, U., Gontscharuk, V., Ogurtsova, K. et al. PHQ-9, CES-D, health insurance data—who is identified with depression? A Population-based study in persons with diabetes. Diabetol Metab Syndr 15, 54 (2023).
