Fink HA, Hemmy LS, Linskens EJ, et al. Diagnosis and Treatment of Clinical Alzheimer’s-Type Dementia: A Systematic Review [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2020 Apr. (Comparative Effectiveness Review, No. 223.)

Chapter 4. Key Question 1: Brief Cognitive Tests for Identifying CATD

Key Messages

  • Many brief cognitive tests had high (≥0.8) sensitivity and specificity for distinguishing between clinical Alzheimer’s-type dementia (CATD) and normal cognition in older adults. Of these, clock drawing, Mini-Mental State Exam (MMSE), list learning, and semantic [category] fluency were most frequently studied.
    • Cognitive tests were reported as more accurate for distinguishing CATD or moderate CATD from normal cognition than they were for distinguishing CATD from mild cognitive impairment (MCI) or for distinguishing mild CATD from normal cognition.
  • Most studies did not evaluate the accuracy of cut points suggested by prior work; this prevented validation of cut points, direct comparisons between studies, and pooling of data across studies.
  • Most studies that examined the accuracy of combinations of cognitive tests had high specificity for identifying CATD.
  • Few studies directly tested the effects of participant characteristics on the accuracy of brief cognitive tests for CATD, including whether accuracy varies by age, sex, race/ethnicity, education, or history of depression.
  • We found no data from eligible studies in older adults on the accuracy of several commonly used stand-alone tests and brief multidomain batteries for distinguishing CATD from MCI or normal cognition (e.g., Mini-Cog, Saint Louis University Mental Status [SLUMS], Telephone Interview for Cognitive Status [TICS]).

Eligible Studies

We identified 69 eligible publications reporting 65 unique studies that evaluated the accuracy of cognitive tests for identifying CATD. Nine studies were assessed as high risk of bias (ROB) and not used in our analyses. These excluded studies all had multiple concerns contributing to an overall rating of high ROB, commonly including patient selection, test not administered in English, index test definition or interpretation, interval between cognitive testing and CATD diagnosis, and participant attrition. The 56 remaining studies with low or medium ROB were analyzed.

Few studies evaluated the sensitivity or specificity of previously recommended brief cognitive test cut points, though these are not established for many tests. Instead, most calculated “optimal” cut points using data from their own samples to maximally separate diagnostic groups. Some reported cut points such as 1.5 or 2.0 standard deviations below their own cognitively normal comparison samples, to simulate the clinical practice of referencing an individual’s test performance to normative data. When available, we reported the classification accuracy of brief cognitive tests separately for clinically recommended and optimal cut points.
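The two cut point strategies described above can be sketched in code. The following is a minimal Python illustration, not the method of any included study: the scores are hypothetical, the function names are invented for this example, and it assumes a test on which higher scores are better, so a score below the cut point counts as test-positive for CATD.

```python
from statistics import mean, stdev


def sens_spec(catd_scores, normal_scores, cut):
    """Return (sensitivity, specificity) when score < cut is test-positive."""
    true_pos = sum(s < cut for s in catd_scores)
    true_neg = sum(s >= cut for s in normal_scores)
    return true_pos / len(catd_scores), true_neg / len(normal_scores)


def youden_optimal_cut(catd_scores, normal_scores):
    """Post hoc 'optimal' cut: the observed score maximizing sens + spec - 1."""
    candidates = sorted(set(catd_scores) | set(normal_scores))
    return max(
        candidates,
        key=lambda c: sum(sens_spec(catd_scores, normal_scores, c)) - 1,
    )


def normative_cut(normal_scores, n_sd=1.5):
    """Cut point n_sd standard deviations below the normal-sample mean."""
    return mean(normal_scores) - n_sd * stdev(normal_scores)


# Hypothetical scores on a 30-point test (illustration only).
catd = [18, 20, 21, 22, 23, 25]
normal = [24, 26, 27, 28, 29, 30, 30, 30]

best = youden_optimal_cut(catd, normal)
print(best, sens_spec(catd, normal, best))
print(round(normative_cut(normal), 1))
```

Because a post hoc cut point is tuned to the same sample in which it is evaluated, its apparent accuracy is likely to be optimistic until it is checked in an independent sample.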

Characteristics of the participants with CATD, MCI, and normal cognition enrolled in the 56 analyzed studies are shown in Tables 4.1a-c. Most participants were diagnosed with mild to moderate CATD. Mean age was 74 years and approximately 41 percent of participants were male. Among the few studies that reported race or ethnicity data, most participants were white. Appendix C provides evidence tables, plots, and summary ROB assessments. For brief cognitive tests included in this review, scoring metrics, scoring range, direction indicating better performance and administration times are detailed in Appendix Table C.9.

Table 4.1a. Characteristics of participants with CATD in studies evaluating classification accuracy of brief cognitive tests for CATD versus MCI or normal cognition.

Table 4.1b. Characteristics of participants with MCI in studies evaluating classification accuracy of brief cognitive tests for CATD versus MCI.

Table 4.1c. Characteristics of participants with normal cognition in studies evaluating classification accuracy of brief cognitive tests for CATD versus normal cognition.

Harms of Cognitive Testing

No studies reported data on harms of brief cognitive testing for identifying CATD. Further, among 30 identified systematic reviews of cognitive testing for dementia published since 2013, none reported data on harms of this cognitive testing.

Brief Cognitive Tests Commonly Used as Individual Stand-Alone Tests

Baseline Study Characteristics

Twenty-six unique studies (n=6,953) evaluated brief cognitive tests commonly used as individual stand-alone tests for the identification of CATD (Table 4.2). Ten of these evaluated clock drawing tests, seven evaluated the MMSE, three evaluated the Montreal Cognitive Assessment (MoCA), two evaluated the Memory Impairment Screen (MIS), one evaluated the 7 Minute Screen (7MS), one evaluated the Minnesota Cognitive Acuity Screen (MCAS), and one evaluated the Test Your Memory (TYM) test.

Study participants included 2,652 with CATD, 740 with MCI, and 3,561 healthy control older adults. Many studies reported using Diagnostic and Statistical Manual of Mental Disorders (DSM)-III-R or DSM-IV criteria for defining dementia. Most reported using National Institute of Neurological and Communicative Disorders and Stroke and Alzheimer’s Disease and Related Disorders (NINCDS-ADRDA) criteria for diagnosis of CATD, except for one that used only DSM-IV51 and one that used DSM-IV and International Classification of Disease (ICD)-10.52 None used National Institute on Aging-Alzheimer’s Association (NIA-AA) criteria. Participants with MCI were diagnosed using Petersen criteria,53 and/or by specifying a Clinical Dementia Rating (CDR) score of 0.5. Normal older adult control participants were most commonly defined as cognitively normal based on a diagnostic workup, though two studies defined it by patient self-report,54, 55 and two did not provide clear definitions.56, 57

Participant mean age was 74 years, 40 percent were male, and mean years of education was 14. Among eight studies reporting race or ethnicity, 87 percent of participants were white.51, 54, 57–63

Table 4.2. Summary of reported results for primary outcomes: brief cognitive tests commonly used as individual stand-alone tests.

Clock Drawing Tests

Eleven publications of 10 unique studies (n=1,177 participants) evaluated clock drawing tasks for identifying CATD.48, 49, 55, 56, 66–72 Several methods are available for administration and scoring of clock drawing tests, most requiring five minutes or less. While all eligible studies evaluated tasks that required the subject to draw the numbers and hands of a clock, two provided a pre-drawn circle71, 72 and others provided only a blank sheet. Some scoring methods used a holistic scale requiring raters to score the clock on multiple features considered together.56, 72 Many scoring rubrics assigned points for specific elements of overall clock appearance as well as accuracy of both number and hand placement,48, 67–71 and others used both types of scoring.49, 55, 66 Some scoring rubrics were newly generated, while others were based on or adapted from existing literature.69, 72–77

Classification Accuracy

CATD Versus Normal Cognition

Nine studies evaluated clock drawing tests for distinguishing patients with CATD as defined by NINCDS-ADRDA criteria (n=419) from demographically similar or matched older adults with normal cognition (n=603).48, 49, 55, 56, 66, 68, 69, 71, 72 To define cognitively normal controls, four studies reported completion of a diagnostic workup 48, 49, 66, 72 and the remainder reported brief cognitive testing or a medical history interview.

The most commonly evaluated clock drawing scoring scale (n=136 CATD, n=327 normal controls)48, 49, 66, 68 was the 10-point Rouleau scale,74 which assigns 2 points for clock face, 4 points for numbers, and 4 points for hands. The best-performing Rouleau cut point for sensitivity (by Bayesian algorithm) was <10 (sensitivity 0.93, specificity 0.42). The best-performing Rouleau cut point for specificity (by regression analysis or receiver operating characteristic [ROC] analyses with or without Youden Index) was <8 (sensitivity 0.74 to 0.88, specificity 0.63 to 0.88). Two studies (n=58 CATD, n=58 normal controls)55, 56 evaluated the Sunderland scale, a 10-point holistic scale with a qualitative description provided for each point on the scale. Sensitivity and specificity ranged from 0.57 and 1.00, respectively, for a cut point of 5 to 0.79 and 0.93, respectively, for a cut point of 8. Several studies reported indices derived from Mendez scoring,55, 68, 69 with the best performing total score of 18 (sensitivity 0.91 and specificity 1.00). Single studies each evaluated the Shulman,56, 78 Tuokko,71 and Watson66, 73 scoring methods, with sensitivities ranging from 0.52 to 0.97 and specificities ranging from 0.80 to 0.96. Finally, three studies evaluated the Wolf-Klein72 scoring method,56, 66, 72 with cut points ranging from 5 to 8 and a cut point of 5 producing the highest overall classification, with sensitivity and specificity of 0.87 and 0.93, respectively.
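For readers unfamiliar with the Youden Index mentioned above, it is conventionally defined as sensitivity plus specificity minus 1. The included studies' J values are not reported in this chapter; the worked figures below are simple arithmetic on sensitivity and specificity pairs quoted in the preceding paragraph and are shown for illustration only.

```latex
\begin{align*}
J &= \text{sensitivity} + \text{specificity} - 1 \\
\text{Rouleau, cut point} < 10:\quad J &= 0.93 + 0.42 - 1 = 0.35 \\
\text{Wolf-Klein, cut point } 5:\quad J &= 0.87 + 0.93 - 1 = 0.80
\end{align*}
```

A cut point chosen to maximize sensitivity alone, as with the Rouleau cut point of <10, can therefore have a low J because of the specificity it gives up.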

CATD Versus MCI

Two studies evaluated clock drawing tests for distinguishing CATD (n=93) from MCI (n=62).67, 70 Both used the CLOX, an executive clock drawing task,75 and scores evaluated included CLOX 1 (free drawn), CLOX 2 (copy), and a modified Rouleau error scoring method.74 Both derived post hoc optimal cut scores that best distinguished the CATD and MCI groups from each other using ROC analyses. The optimal cut scores and corresponding sensitivity and specificity were: CLOX 1: 11.5 (0.76 and 0.72),67 CLOX 2: 13.5 (0.67 and 0.62),67 and Rouleau error scoring: 11 (0.58 and 1.00).70

Variation in Classification Accuracy by Participant Characteristics

Three studies stratified CATD subjects by disease severity (i.e., very mild, mild, moderate) and separately evaluated clock drawing tests in distinguishing these CATD subgroups from cognitively normal adults.55, 56, 66 In one of these studies, sensitivity distinguishing from normal controls was 0.33 to 0.44 for participants with very mild CATD, 0.77 to 0.82 for mild CATD, and 1.00 for moderate CATD.55 In a second study, sensitivity distinguishing from normal controls mostly ranged from 0.5 to 0.6 in participants with mild CATD and from 0.8 to 1.0 in those with moderate CATD.66 In a third study, sensitivity of clock drawing was 0.13 to 0.88 in participants with MMSE ≥24 and 0.75 to 0.85 in participants with MMSE <24.56 Specificity for distinguishing from cognitively normal controls did not appear to vary by baseline CATD severity in any of these studies. No studies reported testing for an interaction. Further, no studies reported whether participant characteristics affected accuracy of clock drawing tests for distinguishing CATD from MCI.

Mini-Mental State Examination (MMSE)

Seven studies (n=1,892 participants) evaluated the performance of the MMSE79 total score for identifying CATD.51, 54, 57, 61, 64, 80, 81 The MMSE assesses orientation, attention, memory, language, and visual-spatial skills (maximum score 30, higher score is better, approximately 10 minutes administration time).

Classification Accuracy

CATD Versus Normal Cognition

Seven studies evaluated MMSE scores for distinguishing patients with mild-to-moderate CATD (n=818) from older adults with normal cognition (n=906). Studies evaluated total score cut points from 21 to 27. Most often, cut points were determined post hoc to maximize separation of participants with CATD from those with normal cognition in the study sample (using ROC curves or logistic regression). Less often, studies examined cut points commonly used in clinical settings (e.g., 24). Sensitivity ranged from 0.56 to 1.00 and specificity ranged from 0.59 to 1.00, with most >0.75. No clear pattern suggested an optimal MMSE cut point within the studied range.

CATD Versus MCI

Two studies evaluated optimal post hoc MMSE cut points for distinguishing patients with mild-to-moderate CATD (n=435) from demographically similar older adults with MCI (n=169).51, 61 For one, the cut point of 25.5 had a sensitivity of 0.88 and specificity of 0.83,61 whereas for the other, the cut point of 18 had a sensitivity of 0.79 and specificity of 0.79.51

Variation in Classification Accuracy by Participant Characteristics

Three studies reported evaluating whether the accuracy of MMSE performance for distinguishing CATD from normal cognition varied by participant characteristics. In one study, optimal cut points for distinguishing CATD from normal cognition did not differ significantly as a function of age, gender, or education.54 A second study examined MMSE accuracy distinguishing CATD from normal cognition within separate strata of educational attainment.57 For an MMSE cut point of 24, sensitivity and specificity, respectively, were 1.0 and 0.59 for participants with a middle school education, 0.88 and 0.79 in those with a high school education, and 0.83 and 1.0 in those with a college education. A third study stratified CATD subjects by severity (i.e. mild vs. moderate) and reported that MMSE sensitivity for distinguishing CATD from normal cognition was 0.79 for mild CATD (MMSE cut point not reported) and 1.0 for moderate CATD (MMSE cut point 23).80 None of these studies tested whether differences in MMSE classification rates by CATD severity were statistically significant.

No studies reported examining whether MMSE performance in distinguishing between CATD and MCI varied by participant characteristics.

Montreal Cognitive Assessment (MoCA)

Three studies (n=1,482 participants) evaluated MoCA82 scores for identifying CATD.51, 60, 83 The MoCA is designed for the assessment of somewhat higher functioning patients than many other individual stand-alone tests (higher ceiling). It assesses attention and concentration, executive function, memory, language, visuoconstructional skills, conceptual thinking, calculations, and orientation (maximum score 30, higher score is better, approximately 10 minutes administration time).

Classification Accuracy

CATD Versus Normal Cognition

Two studies51, 60 evaluated the MoCA for distinguishing patients with CATD (n=571) from those with normal cognition (n=293), with normal cognition confirmed by diagnostic evaluation. Both studies evaluated the traditional total score. One study reported results of MoCA classification using a combination of directly measured MoCA scores and MoCA scores estimated from MMSE.60 However, results reported here are based only on directly measured MoCA scores obtained by direct communication from the study authors. Post hoc optimal cut points of 22 and 23 produced sensitivity of 0.93 to 0.94 and specificity of 0.94 to 1.0. One study also evaluated a shortened version of the MoCA (maximum score 16) and reported that an optimal cut point of 10 had a sensitivity of 0.96 and specificity of 0.91.51

CATD Versus MCI

Three studies51, 60, 83 evaluated the MoCA for distinguishing CATD (n=671) from MCI defined in a manner consistent with the Petersen criteria53 (n=518). For post hoc optimal cut points of 19 to 24, sensitivities were 0.76 to 0.97 and specificities were 0.78 to 0.88. The one study among these that evaluated a 16-point version of the MoCA reported a sensitivity of 0.67 and specificity of 0.79 for an optimal cut point of 6.51

Variation in Classification Accuracy by Participant Characteristics

No studies reported on whether MoCA test performance for distinguishing CATD from normal cognition or MCI varied by participant characteristics.

Memory Impairment Screen (MIS)

Two studies (n=712 participants) evaluated the performance of the MIS total score for identifying CATD.58, 59 The MIS consists of four items that evaluate memory with both free and cued recall (maximum score 8, higher score is better, less than five minutes administration time).

Classification Accuracy

CATD Versus Normal Cognition

Two studies evaluated MIS test performance for distinguishing patients with mild-to-moderate CATD (n=67) from older adults defined with normal cognition based on a diagnostic work-up (n=645). Studies evaluated both post hoc cut scores to maximize CATD prediction in the study sample (ROC analysis) and a priori cut points with suspected clinical relevance (e.g. Alzheimer’s Association recommendations). MIS total score cut points evaluated ranged from 0 to 8, with the best performing scores ranging from 2 to 4 depending on the severity of the CATD sample. At these cut points, sensitivity ranged from 0.75 to 1.0 and specificity ranged from 0.85 to 1.0.

CATD Versus MCI

No studies reported data on MIS test performance for distinguishing between CATD and MCI.

Variation in Classification Accuracy by Participant Characteristics

Both studies reported on whether MIS test performance for distinguishing CATD from normal cognition varied by different participant characteristics. In each case, age, gender, education, and depression were tested but found to be non-significant.58, 59 No study reported statistical tests to evaluate variation in cognitive test accuracy by participant race/ethnicity, but one study reported that the frequency of false positive CATD classification with MIS did not differ between African American and white participants.58 Both studies also stratified CATD subjects by dementia severity, then separately evaluated the MIS for distinguishing each of these CATD subgroups from cognitively normal adults. In the first study, for an MIS cut point of 4, sensitivity and specificity, respectively, were 0.79 and 0.96 in participants with mild CATD and 0.95 and 0.96 in those with moderate CATD.58 In the second study, for an MIS cut point of 4, sensitivity was 0.75 in participants with very mild dementia (CDR 0.5), 0.81 in those with mild dementia (CDR 1.0), and 1.0 in those with moderate dementia (CDR 2.0), whereas specificity appeared similar regardless of dementia severity.59 Neither study tested whether differences in MIS classification rates by CATD severity were statistically significant.

Brief Alzheimer’s Screen (BAS)

One study (n=1,534 participants) evaluated the BAS weighted total score for identifying CATD.62 The BAS was developed from the Consortium to Establish a Registry for Alzheimer’s Disease (CERAD) neuropsychological evaluation data set and consists of items taken from the MMSE (date, 3-word recall, spelling ‘WORLD’ backwards) along with a 30-second semantic (animals) fluency evaluation (no maximum score, higher score is better, less than 5 minutes administration time).

Classification Accuracy

CATD Versus Normal Cognition

The BAS was evaluated for distinguishing patients with mild CATD (n=674) from healthy older adults evaluated with diagnostic workup (n=860). Weighted sum score cut points ranging from 22 to 26 resulted in sensitivities from 0.90 to 0.98 and specificities from 0.96 to 0.99.

CATD Versus MCI

No studies reported data on the BAS for distinguishing between CATD and MCI.

Variation in Classification Accuracy by Participant Characteristics

No studies reported on whether BAS test performance for distinguishing CATD from normal cognition varied by participant characteristics.

Test Your Memory (TYM)

One study (n=376 participants) evaluated the TYM total score for identifying CATD.84 TYM is a self-administered and performance-based test consisting of 10 common cognitive testing tasks. The ability to independently complete the test is also a performance item and added to the total score (maximum of 50, higher score is better, no provider administration time, approximately 2 minutes to score).

Classification Accuracy

CATD Versus Normal Cognition

The TYM total score was evaluated for distinguishing patients with mild-moderate CATD (n=94) from healthy older adults (n=282). For the post hoc optimal cut point of 42, sensitivity was 0.93 and specificity was 0.86.

CATD Versus MCI

No studies reported data on the TYM for distinguishing between CATD and MCI.

Variation in Classification Accuracy by Participant Characteristics

No studies reported on whether TYM diagnostic test performance for distinguishing CATD from normal cognition varied by participant characteristics.

Minnesota Cognitive Acuity Screen (MCAS)

One study (n=150 participants) evaluated the MCAS85 total score63 for identifying CATD. The MCAS is telephone-administered and assesses orientation, attention, delayed recall, comprehension, repetition, naming, computation, judgment, and verbal fluency (no maximum score, higher score is better, approximately 15 minutes administration time).

Classification Accuracy

CATD Versus Normal Cognition

No studies reported data on the MCAS test for distinguishing CATD from normal cognition.

CATD Versus MCI

This study evaluated the MCAS test for distinguishing patients with possible or probable mild CATD (n=50) from amnestic MCI defined by a diagnostic workup aligned with Petersen criteria53 (n=100). At the post hoc optimal cut point of 42.5, sensitivity was 0.86 and specificity was 0.77.

Variation in Classification Accuracy by Participant Characteristics

Analyses testing whether MCAS test performance for distinguishing CATD from MCI varied by participant age and education reported little improvement over base models and authors presented cut point data without adjustment.

7 Minute Screen (7MS)

One study (n=120 participants) evaluated the 7MS86 total score87 for identifying CATD. The 7MS assesses orientation, memory (cued recall), clock drawing, and verbal fluency (no maximum score, higher score is better, less than 10 minutes administration time).

Classification Accuracy

CATD Versus Normal Cognition

This study evaluated the 7MS for distinguishing patients with mild to moderate CATD (n=60) from healthy older adults evaluated by medical history and self-reported as functionally independent (n=60). 7MS cut points for the total score were identified by maximizing CATD prediction in the study sample using logistic regression including weighted terms for each subtest score. Using model estimated probability of CATD of <0.1 as a cut point for controls and >0.9 as a cut point for CATD, sensitivity ranged from 0.92 to 1.0 and specificity ranged from 0.96 to 1.0 in initial and repeated random subsamples for validation.
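The probability-based cut points described above can be illustrated with a short Python sketch. This is a hedged example under stated assumptions: the weights and intercept are hypothetical and are not the published 7MS coefficients, the function names are invented here, and the "unclassified" label for probabilities between 0.1 and 0.9 is an assumption, since the handling of such cases is not described in this chapter.

```python
import math


def catd_probability(subtest_scores, weights, intercept):
    """Logistic model: P(CATD) from a weighted sum of subtest scores."""
    z = intercept + sum(w * x for w, x in zip(weights, subtest_scores))
    return 1.0 / (1.0 + math.exp(-z))


def classify(probability, low=0.1, high=0.9):
    """Apply the <0.1 (control) and >0.9 (CATD) probability cut points."""
    if probability < low:
        return "control"
    if probability > high:
        return "CATD"
    return "unclassified"  # assumption: middle range left unassigned


# Hypothetical weights and intercept (NOT the published 7MS coefficients).
weights = [-0.8, -0.5, -0.6, -0.3]
intercept = 12.0
p = catd_probability([3, 4, 2, 5], weights, intercept)
print(round(p, 3), classify(p))
```

Using two probability thresholds rather than one means that sensitivity and specificity describe only how the model performs at the confident extremes of its predictions.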

CATD Versus MCI

No studies reported data on the 7MS for distinguishing CATD from MCI.

Variation in Classification Accuracy by Participant Characteristics

In a subset of AD patients with MMSE scores of ≥21 (n=95), using model probabilities of <0.1 and >0.9 as cut points, both sensitivity and specificity values were 0.98. In a subset of CATD patients with MMSE scores of ≥24 (n=13), sensitivity was 0.98 and specificity was 1.0. The study also evaluated participant age, gender, and education in the logistic regression predicting classification of CATD versus healthy older adults, but all were non-significant.

Brief Memory and Executive Test (BMET)

One study (n=102 participants) evaluated the BMET total score for identifying CATD.52 The BMET was developed to distinguish CATD from vascular cognitive impairment and consists of executive and memory tasks (maximum score 16, higher score is better, approximately 10 minutes administration time).

Classification Accuracy

CATD Versus Normal Cognition

The BMET was evaluated for distinguishing patients with mild to moderate CATD (n=51) from healthy older adults (n=51). At the post hoc optimal cut point of 13, sensitivity was 0.86 and specificity was 1.00.

CATD Versus MCI

No studies reported data on the BMET for distinguishing between CATD and MCI.

Variation in Classification Accuracy by Participant Characteristics

No studies reported on whether BMET diagnostic test performance for distinguishing CATD from normal cognition varied by participant characteristics.

Brief Multidomain Batteries

Baseline Study Characteristics

Ten unique studies evaluated the performance of summary metrics from established brief multidomain batteries of cognitive tests for identifying CATD (Table 4.3). Three of these studies evaluated the Dementia Rating Scale (DRS), one evaluated Addenbrooke’s Cognitive Examination (ACE), one evaluated the Alzheimer’s Disease Assessment Scale-Cognitive Subscale (ADAS-Cog), one evaluated the CogState Brief Battery (CBB), one evaluated the CERAD Neuropsychological Battery, one evaluated the Repeatable Battery for the Assessment of Neuropsychological Status (RBANS), one evaluated the Wechsler Adult Intelligence Scale (WAIS), and one evaluated Wechsler Memory Scales (WMS).

Study participants included 864 patients with CATD, 276 with MCI, and 1,536 healthy control older adults. All studies reported using NINCDS-ADRDA criteria for diagnosis of CATD and none used NIA-AA criteria. Participants with MCI were diagnosed consistent with Petersen criteria.53 Normal older adult control participants were most commonly evaluated with a diagnostic workup and/or assessment sufficient to assign a CDR score of 0. One study defined participants as cognitively unimpaired through self-report.88

Participant mean age was 72 years, 44 percent were male, and mean years of education was 13. Six studies reported race or ethnicity, with five describing predominantly white samples61, 88–91 and one reporting an all-Asian (predominantly Chinese) sample.92

Table 4.3. Summary of reported results for primary outcomes: brief multidomain battery summary scores for distinguishing CATD from normal cognition.

Dementia Rating Scale (DRS)

Three studies (n=936 participants) evaluated the DRS (also known as the Mattis DRS)93 for identifying CATD.89, 90, 94 The DRS is a brief battery of commonly used tasks designed to evaluate CATD in five domains: attention, initiation and perseveration, construction, conceptual ability, and memory (higher scores are better, approximately 30 minutes administration time).

Classification Accuracy

CATD Versus Normal Cognition

Two studies evaluated DRS performance for distinguishing patients with mild to moderate CATD (n=372) from demographically similar older adults with normal cognition defined by a diagnostic workup (n=417).89, 94 Studies evaluated the DRS total score, subscale scores, and subscale combinations. All cut points were identified by maximizing CATD prediction from normal cognition in the study samples (logistic regression, ROC analysis). Optimal DRS total score cut points ranged from 129 to 132, with sensitivity ranging from 0.96 to 0.97 and specificity from 0.92 to 0.99.89, 94 In further analyses, one study reported optimal cut points with corresponding sensitivity and specificity for DRS subscales as follows: Attention (sensitivity 0.71 and specificity 0.84 for a cut point of 35), Conceptualization (sensitivity 0.69 and specificity 0.91 for a cut point of 33), Construction (sensitivity 0.73 and specificity 0.70 for a cut point of 6), Memory (sensitivity 0.93 and specificity 0.98 for a cut point of 22), and Initiation/Perseveration (sensitivity 0.93 and specificity 0.94 for a cut point of 33).89 In data from two cohorts (n=641), a combined Memory and Initiation/Perseveration index (adjusted for age and education) was associated with sensitivity ranging from 0.91 to 0.98 and specificity from 0.93 to 0.98, respectively.89

CATD Versus MCI

One study90 evaluated the DRS for distinguishing between patients with CATD (n=49) and MCI determined by a diagnostic workup (n=98). At a post hoc optimal DRS total score cut point of 123, sensitivity was 0.78 and specificity was 0.83.

Variation in Classification Accuracy by Participant Characteristics

No studies reported on whether DRS diagnostic test performance for distinguishing CATD from normal cognition or MCI varied by participant characteristics.90

Alzheimer’s Disease Assessment Scale-Cognitive (ADAS-Cog)

One study (n=269) evaluated ADAS-Cog95 total score92 for distinguishing CATD from normal cognition or MCI. Designed to emphasize memory evaluation in CATD, the original ADAS-Cog includes 11 tasks assessing memory, language, and praxis, and the ADAS-Cog-12 adds a delayed recall task intended to increase sensitivity for earlier stages of AD96 (lower scores are better, approximately 30 minutes administration time).

Participants’ CATD was defined as mild (CDR scores 0.5 to 1.0) and those with normal cognition and MCI had CDR scores of 0 and 0.5, respectively. Unique compared with other studies that examined brief multidomain batteries, 83 percent of participants in this study self-reported as Chinese, and 13 percent self-reported as Malay, Indian, Eurasian, or other. Testing was administered in English for 75 percent of participants, the minimum for study eligibility in the current review, and in Mandarin for 25 percent. Participants with CATD were approximately 6 to 10 years older than those with MCI or normal cognition.

Classification Accuracy

CATD Versus Normal Cognition

ADAS-Cog 11 and ADAS-Cog 12 total scores were evaluated for distinguishing individuals with CATD (n=64) from those with normal cognition (n=125). A post hoc optimal ADAS-Cog 11 cut point of 14 had a sensitivity of 0.81 and specificity of 1.0, and a post hoc optimal ADAS-Cog 12 cut point of 21 had a sensitivity of 0.73 and specificity of 1.0.

CATD Versus MCI

ADAS-Cog 11 and ADAS-Cog 12 total scores also were evaluated for distinguishing individuals with CATD (n=64) from those with MCI (n=80). A post hoc optimal ADAS-Cog 11 cut point of 12 had a sensitivity of 0.86 and specificity of 0.89, and a post hoc optimal ADAS-Cog 12 cut point of 21 had a sensitivity of 0.79 and a specificity of 0.89.

Variation in Classification Accuracy by Participant Characteristics

No studies reported on whether ADAS-Cog test performance for distinguishing between CATD and either normal cognition or MCI varied by participant characteristics.

Cogstate Brief Battery (CBB)

The CBB is a computer-administered battery of four tasks assessing attention, processing speed, visual learning, and working memory (scoring characteristics are task and score dependent, higher summary scores are better, approximately 12 to 15 minutes administration time).

Classification Accuracy

CATD Versus Normal Cognition

One study (n=710) evaluated two CBB composite summary scores for distinguishing mild to moderate CATD (n=51) from normal cognition (n=659).97 Optimal cut scores were identified by maximizing CATD prediction in the study sample using ROC analysis. A post hoc optimal cut point for the Attention/Psychomotor composite summary score of <90 had a sensitivity of 0.53 and specificity of 0.86. A post hoc optimal cut point for the CBB Learning/Working Memory composite summary score of <90, evaluated in a subset of 684 participants, had a sensitivity of 1.0 and specificity of 0.85.

CATD Versus MCI

No studies reported data on the CBB for distinguishing between CATD and MCI.

Variation in Classification Accuracy by Participant Characteristics

No studies reported on whether CBB test performance for distinguishing CATD from normal cognition varied by participant characteristics.

Consortium to Establish a Registry for Alzheimer’s Disease (CERAD) Neuropsychological Battery

One study (n=250 participants) evaluated a total score derived from the CERAD Neuropsychological Battery98 for identifying CATD.61 The CERAD was developed with National Institute on Aging (NIA) support to standardize assessment procedures in AD. The CERAD Battery includes assessment of mental status, language ability, constructional praxis, and memory. The CERAD total score evaluated included all tasks in the original CERAD battery (verbal fluency, list-learning, constructional praxis, a brief Boston Naming Test [BNT]) except object naming, and the MMSE (maximum score 100, higher score is better, approximately 20 minutes administration time). Patients with mild CATD were compared with demographically similar patients with MCI,53 and healthy normal controls defined by a CDR score of 0.

Classification Accuracy

CATD Versus Normal Cognition

For distinguishing individuals with CATD (n=95) from those with normal cognition (n=95), a post hoc optimal cut point of 77 for this CERAD total score had a sensitivity of 0.94 and specificity of 0.93.

CATD Versus MCI

For distinguishing individuals with CATD (n=95) from those with MCI (n=60), a post hoc optimal cut point of 68 for this CERAD total score had a sensitivity of 0.80 and specificity of 0.81.

Variation in Classification Accuracy by Participant Characteristics

No studies reported data on whether CERAD Neuropsychological Battery test performance for distinguishing CATD from either normal cognition or MCI varied by participant characteristics.

Repeatable Battery for the Assessment of Neuropsychological Status (RBANS)

One study (n=238) evaluated performance of the RBANS for identifying CATD.88, 99 The RBANS is a brief battery assessing attention, language, visuospatial/construction, and memory, designed with multiple forms to be used in repeated assessment (higher scores are better, administration time approximately 30 minutes). The RBANS scores evaluated included a Verbal Index, Visual Index, and a combined Verbal plus Visual Index score (index scores have a mean of 100 and SD of 15). Patients with mild CATD, as defined by cognitive testing and clinical records, were compared with patients with MCI,53 and older adults who self-reported normal cognition.

Classification Accuracy

CATD Versus Normal Cognition

For distinguishing individuals with CATD (n=100) from those with normal cognition (n=100), unspecified post hoc optimal RBANS cut points had sensitivities and specificities, respectively, of 0.88 and 0.82 for the Verbal Index, of 0.86 and 0.77 for the Visual Index, and 0.92 and 0.79 for the combined Verbal plus Visual Index.

CATD Versus MCI

For distinguishing individuals with CATD (n=100) from those with MCI (n=38), unspecified post hoc optimal RBANS cut points had sensitivities and specificities, respectively, of 0.61 and 0.71 for the Verbal Index, 0.68 and 0.76 for the Visual Index, and 0.66 and 0.75 for the combined Verbal plus Visual Index.

Variation in Classification Accuracy by Participant Characteristics

No studies reported data on whether RBANS test performance for distinguishing CATD from either normal cognition or MCI varied by participant characteristics.

Wechsler Adult Intelligence Scale (WAIS)

One study (n=98 participants) evaluated a WAIS-derived summary score for identifying CATD.100 The WAIS is a battery of tests designed to evaluate general intellectual ability that also provides domain summary indices. The full WAIS battery takes too long to administer to qualify as brief cognitive testing, but abbreviated summary metrics may be administered in 30 minutes or less.

Classification Accuracy

CATD Versus Normal Cognition

This study evaluated the “Fuld profile,” a seven-subtest index found to be associated with cholinergic deficiency and CATD (profile scored as yes/no, approximately 30 minutes administration time for included subtests).101 Test performance consistent with the Fuld profile (yes or no) on the WAIS-R was used to distinguish between mild to moderate CATD (n=44) and normal cognition (n=54) and had a sensitivity of 0.07 and specificity of 0.93.100

CATD Versus MCI

No studies reported data on the WAIS for distinguishing between CATD and MCI.

Variation in Classification Accuracy by Participant Characteristics

No studies reported on whether WAIS diagnostic test performance for distinguishing between CATD and normal cognition varied by participant characteristics.

Wechsler Memory Scales (WMS)

One study (n=68) evaluated WMS-derived scores for identifying CATD.91 The WMS is a battery of tests producing various index scores to characterize memory ability/dysfunction (auditory versus visual, immediate versus delayed, etc.). The full WMS battery takes too long to administer to qualify as brief cognitive testing, but abbreviated summary metrics may be administered in 30 minutes or less.

Classification Accuracy

CATD Versus Normal Cognition

This study evaluated WMS-III derived index scores for distinguishing mild to moderate CATD (n=34) from normal cognition (n=34) and asked whether classification ability is better using the index alone versus when compared with a measure of general intellectual ability (the WAIS-III General Ability Index; GAI). The WMS-III index scores evaluated were the General Memory Index (GMI), Immediate Memory Index (IMI), and the Delayed Memory Index (DMI) (index scores have a mean of 100 and SD of 15, higher scores are better). For each WMS-III memory index score, classification was tested 1) using the index alone, 2) using a simple difference score between the memory index and the GAI, 3) using a memory index-GAI difference score stratified by GAI, and 4) using a memory index difference score that took into account the participant’s predicted memory ability from the GAI. Each of the four classification methods was assessed using a 5th and 10th age-based normative percentile cut point. Most resulting scores produced classification with sensitivity and specificity values at 0.70 or above with many much higher. In each case, optimal classification was achieved using only the WMS-III index score and not including the WAIS-III GAI. GMI classification resulted in sensitivities of 0.94 and 0.97 and specificities of 0.97 and 0.91, respectively, for the 5th and 10th percentiles. IMI classification resulted in sensitivities of 0.85 and 0.94 and specificities of 1.00 and 0.94, respectively, for the 5th and 10th percentiles. DMI classification resulted in sensitivities of 0.88 and 0.97 and specificities of 0.97 and 0.97, respectively, for the 5th and 10th percentiles.
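To make the percentile cut points concrete, the sketch below converts the 5th and 10th percentiles into index-score units and computes a simple memory index minus GAI difference score (method 2 above). It assumes index scores are normally distributed with a mean of 100 and SD of 15. The study used age-based norms, so its actual cut points may differ, and the participant scores shown are hypothetical.

```python
from statistics import NormalDist

# Assumption: index scores follow a normal distribution with mean 100, SD 15.
index_norms = NormalDist(mu=100, sigma=15)


def percentile_cut(percentile: float) -> float:
    """Index score at the given normative percentile (0 to 100)."""
    return index_norms.inv_cdf(percentile / 100)


def simple_difference(memory_index: float, gai: float) -> float:
    """Memory index minus General Ability Index (simple difference score)."""
    return memory_index - gai


cut_5th = percentile_cut(5)
cut_10th = percentile_cut(10)
print(f"5th percentile cut:  {cut_5th:.1f}")
print(f"10th percentile cut: {cut_10th:.1f}")

# Hypothetical participant: memory index 78, GAI 102.
memory_index, gai = 78, 102
print("below 5th percentile: ", memory_index < cut_5th)
print("below 10th percentile:", memory_index < cut_10th)
print("difference score:     ", simple_difference(memory_index, gai))
```

Under these assumptions the same hypothetical participant is classified as impaired at the 10th percentile but not at the 5th, which illustrates why a more lenient cut point tends to raise sensitivity at some cost to specificity.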

CATD Versus MCI

No studies reported data on the WMS for distinguishing between CATD and MCI.

Variation in Classification Accuracy by Participant Characteristics

No studies reported on whether WMS test performance for distinguishing CATD from normal cognition varied by participant characteristics.

Addenbrooke’s Cognitive Exam (ACE)

One study (n=59 participants) evaluated the ACE102, 103 for identifying CATD.104 The ACE is a very brief battery consisting of several brief tasks to assess attention, memory, fluency, language, and visuospatial abilities (maximum score 100, higher score is better, approximately 15-20 minutes administration time).

Classification Accuracy

CATD Versus Normal Cognition

This study evaluated the ACE-III total score for distinguishing individuals with early onset CATD (age ≤65 years; n=31) from otherwise undefined healthy controls (n=28).104 A post hoc optimal cut score of 88 had a sensitivity of 0.97 and specificity of 0.96.

CATD Versus MCI

No studies reported data on the ACE for distinguishing CATD from MCI.

Variation in Classification Accuracy by Participant Characteristics

No studies reported on whether ACE diagnostic test performance for distinguishing CATD from normal cognition varied by participant characteristics.

Memory Tests

Baseline Study Characteristics

Seventeen unique studies evaluated the performance of eligible memory tests for identifying CATD, including memory for word lists, prose, figure drawings, and common objects (Table 4.4). These included 13 studies that evaluated performance on list-learning tasks, four that evaluated prose recall, two that evaluated figure recall, and four that evaluated other memory tests.

Study participants included 1,341 with CATD, 242 with MCI, and 2,478 healthy control older adults. All studies reported using NINCDS-ADRDA criteria for diagnosis of CATD, with the exception of one study, which described DSM-IV dementia criteria and a neurologist-assigned subtype.105 Participants with MCI were diagnosed consistent with Petersen criteria.53 Normal older adult control participants were most commonly evaluated with a diagnostic workup and/or assessment sufficient to assign a CDR score of 0. Two studies described control participants as self-reporting that they were cognitively unimpaired.54, 85

Participant mean age was 74 years, 43 percent were male, and mean years of education was 13. Five studies reported race or ethnicity data, with four describing predominantly white samples54, 61, 106, 107 and one reporting an all Asian (predominantly Chinese) sample.92

Table 4.4. Summary of reported results for primary outcomes: memory.


List Learning

Fifteen publications of 13 studies (n=3,084)48, 50, 54, 61, 67, 81, 85, 92, 105–109, 111, 112 evaluated list-learning verbal memory tests for distinguishing CATD from normal cognition or MCI. Most list-learning procedures include four broad categories of performance: initial efforts to learn a list of words during multiple presentations (immediate recall trials), recall of the list at some later time often with alternate tasks in between (delayed recall), the ability to identify words on the list from distractors (recognition), and evaluation of errors made during the prior tasks.

Classification Accuracy

CATD Versus Normal Cognition

Thirteen publications of 11 studies evaluated list-learning performance for distinguishing patients with CATD (n=676) from demographically similar or matched older adults with normal cognition (n=2,038).48, 50, 54, 61, 81, 85, 92, 105, 107–109, 111, 112 Three studies evaluated various scores for the CERAD word list,98 including delayed recall, percent retention (savings), and recognition metrics. Cut scores were identified by maximizing diagnosis group separation within the study sample using ROC analysis or by comparing performance to control group norms (most commonly 2 standard deviations [SD] below the control group mean). CERAD list delayed recall scores, using cut points at 4, 4.5 and 2 SD below control performance, respectively, had sensitivity ranging from 0.86 to 0.93 and specificity ranging from 0.84 to 0.94.48, 61, 108, 109 One study reported that CERAD list delayed recall percent retention (savings) scores at a cut point of 66 percent had a sensitivity of 0.88 and specificity of 0.82.48 Three studies reported various CERAD list recognition scores, with sensitivity ranging from 0.25 to 0.60 and specificity ranging from 0.91 to 0.98.48, 108, 109 Finally, two studies reported group discrimination for CERAD intrusion error scores, with sensitivity ranging from 0.14 to 0.62 and specificity ranging from 0.78 to 0.96.50, 109
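The norm-based alternative mentioned above, a cut point placed 2 SD below the control group mean, can be computed directly from control scores. The sketch below uses hypothetical delayed recall scores and the sample standard deviation; whether the cited studies used the sample or population SD is not stated, so that choice is an assumption.

```python
from statistics import mean, stdev


def cut_two_sd_below(control_scores, n_sd=2.0):
    """Cut point placed n_sd sample standard deviations below the control mean."""
    return mean(control_scores) - n_sd * stdev(control_scores)


# Hypothetical control-group delayed recall scores (words recalled out of 10).
control_scores = [6, 7, 7, 8, 8, 8, 9, 9, 10]
sd_cut = cut_two_sd_below(control_scores)

# Hypothetical patient scores; those below the cut point are flagged.
patient_scores = [2, 4, 5, 7]
flagged = [score for score in patient_scores if score < sd_cut]
print(round(sd_cut, 2), flagged)
```

Unlike an ROC-derived cut point, this threshold depends only on the control distribution, so its sensitivity is not tuned to the patient sample.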

Two studies evaluated the Free and Cued Selective Reminding test (FCSR) for distinguishing CATD from normal cognition.105, 112, 113 Both reported results for free recall. One reported that an unspecified post hoc optimal cut point had a sensitivity of 0.91 and specificity of 0.73.112 The other evaluated a free recall cut point of 24 cited from prior work114 and also reported that an FCSR total recall cut point of 44 cited from prior work had a sensitivity of 0.71 and specificity of 0.94.

The other five studies each evaluated scores for different word lists. Three studies reported immediate list recall and learning trial totals, including the CogState International Shopping List Test (ISLT),111 Hopkins Verbal Learning Test (HVLT),54 and Neuropsychological Assessment Battery (NAB) word list.107 The best-performing post hoc optimal cut scores for these measures had sensitivity ranging from 0.78 to 0.92 and specificity ranging from 0.75 to 0.95.

Three studies reported delayed recall scores, including the DemTect,81 Delayed Word Recall (DWR) test,85 and NAB word list.107 The best-performing post hoc cut scores for these measures had sensitivity ranging from 0.89 to 0.93 and specificity ranging from 0.76 to 0.98. One study reported on performance of a combined metric from the ADAS-Cog (immediate recall, delayed recall, and recognition) in which a cut point of >14 had a sensitivity and specificity of 0.77 and 0.98, respectively.92

CATD Versus MCI

Five studies61, 67, 92, 106, 107 evaluated list-learning performance for distinguishing individuals with CATD (n=313) from similar or matched older adults with MCI (n=242). No two studies evaluated the same list learning test or scores, but four of the five included traditional measures of recall or retention.61, 67, 106, 107 One reported that an optimal performing cut point of 2 on the CERAD list delayed free recall score had a sensitivity of 0.68 and specificity of 0.81.61 A second study reported that an optimal performing cut point of 15 on the HVLT three trials total score had a sensitivity of 0.69 and specificity of 0.91.67 A third study reported both best-performing and conventional cut points for several scores from the NAB list.107 Optimal performing cut points, as determined by ROC analysis, and associated sensitivity and specificity, respectively, were as follows: 30 for list A immediate recall (0.58 and 0.86), 41 for list B immediate recall (0.65 and 0.72), 30 for list A short delay (0.73 and 0.83), and 36 for list A long delay (0.89 and 0.52). Another study reported that <30 percent retention scores for the RBANS list had a sensitivity of 0.90 and specificity of 0.72.106 Last, one study reported on performance of a combined metric from the ADAS-Cog (immediate recall, delayed recall, and recognition), and reported that the cut point of 14 had a sensitivity and specificity of 0.76 and 0.85, respectively.92, 106

Variation in Classification Accuracy by Participant Characteristics

In one study, optimal HVLT cut points for distinguishing between CATD and normal cognition did not differ significantly as a function of age, gender, or education.54 A second study stratified CATD subjects by severity (i.e. mild, moderate, or severe CATD), then separately evaluated CERAD list recognition scores for distinguishing each of these CATD subgroups from cognitively normal adults; however, this study did not statistically test for different classification rates.108, 109 For most scores reported, specificity was consistently high across severity groups, while sensitivity was generally higher in more severely impaired participants.

No studies reported on whether list learning test performance for distinguishing CATD from MCI varied by participant characteristics.

Prose Recall

Four studies (n=1,012) evaluated the performance of prose recall tasks (repeating short stories or paragraphs from memory) for identifying CATD. Three studies compared individuals with mild to moderate CATD to older adults with normal cognition94, 112, 115 and one compared individuals with CATD to those with amnestic MCI.106 All cut points evaluated were post hoc and no two studies evaluated the same score.

Classification Accuracy

CATD Versus Normal Cognition

Three studies94, 112, 115 evaluated prose recall for distinguishing individuals with CATD (n=220) from those with normal cognition (n=675) using the WMS Logical Memory (LM) subtest.116, 117 The first study evaluated several scores based on propositional content (breaking down text into small units of meaning) with sensitivities and specificities ranging from 0.75 to 0.84 and 0.81 to 0.89, respectively.115 The other two studies both evaluated the delayed recall score from the WMS-Revised LM subtest. In one, a post hoc optimal cut point of <10 had a sensitivity of 0.87 and specificity of 0.89,94 and in the other, an unspecified cut point had a sensitivity of 0.71 and specificity of 0.87.112

CATD Versus MCI

One study106 evaluated prose recall for distinguishing CATD (n=73) from amnestic MCI (n=44) using the RBANS Story Memory subtest.99 At a cut score of below 60 percent retention (savings), sensitivity was 0.85 and specificity was 0.55.
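A percent retention (savings) score generally expresses delayed recall as a percentage of what was recalled at learning. The sketch below assumes immediate recall is the denominator; the exact formula used by the cited study is not given here, and the scores are hypothetical.

```python
def percent_retention(immediate_recall: int, delayed_recall: int) -> float:
    """Delayed recall as a percentage of immediate recall (savings score)."""
    if immediate_recall <= 0:
        raise ValueError("immediate recall must be positive to compute retention")
    return 100.0 * delayed_recall / immediate_recall


def below_cut(retention: float, cut: float = 60.0) -> bool:
    """True when retention falls below the cut score (60 percent in the text)."""
    return retention < cut


# Hypothetical story memory scores: 10 units recalled immediately, 4 after delay.
retention = percent_retention(10, 4)
print(retention, below_cut(retention))
```

A participant who retains exactly 60 percent is not flagged under this sketch, because the cut score in the text is "below 60 percent."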

Variation in Classification Accuracy by Participant Characteristics

No studies reported data on whether prose recall test performance for distinguishing CATD from either normal cognition or MCI varied by participant characteristics.

Figure Recall

Three publications of two studies (n=447)48, 50, 94 evaluated the performance of figure recall tasks (most commonly reproducing designs from memory by drawing them on paper) for identifying CATD.

Classification Accuracy

CATD Versus Normal Cognition

Two studies evaluated the WMS116, 117 Visual Reproduction (VR) subtest for distinguishing individuals with CATD (n=127) from those with normal cognition (n=320). Best-performing cut points were identified by maximizing CATD prediction in the study samples (ROC, discriminant function). In one study, an optimal immediate recall cut point of <9 had a sensitivity of 0.90 and specificity of 0.79; an optimal savings score of <30 percent had a sensitivity of 0.74 and specificity of 0.93, and an optimal figural intrusions score of >0 had sensitivity of 0.27 and specificity of 0.82.48, 50 For delayed figure recall, this study reported sensitivity of 0.87 and specificity of 0.87 for an optimal cut point of 2,48, 50 while another study reported sensitivity of 0.87 and specificity of 0.86 for an optimal cut point of 3.94

CATD Versus MCI

No studies reported data on figure recall test performance for distinguishing CATD from MCI.

Variation in Classification Accuracy by Participant Characteristics

No studies reported on whether figure recall test performance for distinguishing CATD from either normal cognition or MCI varied by participant characteristics.

Other Memory Tests

Four studies (n=1,206) evaluated the performance of other eligible memory tasks for identifying CATD,67, 110, 112, 118 although no two studies reported data for the same test.

Classification Accuracy

CATD Versus Normal Cognition

One study (n=127) evaluated combined verbal (prose recall) and visuospatial (figure recall) memory tasks for distinguishing participants with mild to moderate CATD (n=58) from healthy older adults defined with normal cognition by medical history (n=69).118 A post hoc optimal cut point for combined percent retention (savings) scores for delayed recall performance on the WMS-Revised LM and VR subtests had a sensitivity of 0.88 and specificity of 0.99. A second study (n=583) evaluated a recollection estimate (as opposed to familiarity) from a dual process dissociation procedure for distinguishing mild CATD (n=64) from CDR-defined normal control participants (n=519).112 An unreported post hoc optimal cut point had a sensitivity of 0.77 and specificity of 0.86.

CATD Versus MCI

One study (n=68) evaluated the Placing Test, a visuospatial memory test, for distinguishing between CATD (n=40) and MCI (n=28).67 A post hoc optimal cut point for total score of 10.5 had a sensitivity of 0.90 and specificity of 0.50, a cut point for objects of 6.5 had a sensitivity of 0.80 and specificity of 0.71, and a cut point for faces of 5.5 had a sensitivity of 0.68 and specificity of 0.68.

Variation in Classification Accuracy by Participant Characteristics

One study (n=412 participants) reported age-stratified results for distinguishing between mild-moderate CATD (n=268) and normal cognition (n=144) with the Fuld Object Memory Evaluation (FOME),119, 120 a test of memory and tactile recognition.110 Total recall scores were evaluated for age groups 59 to 68, 69 to 78, and 79 to 90. Cut scores of 17, 18, and 19 had sensitivity ranging from 0.93 to 0.95 and specificity ranging from 0.94 to 1.00. Best-performing cut scores were 19 in the youngest group and 18 in the 79 to 90 age group. Cut scores of 17 and 18 performed equally well for the 69 through 78 age group. The study did not report on whether differences in FOME classification between CATD and normal cognition by participant age were statistically significant.

Tests of Executive Function

Baseline Study Characteristics

Five unique studies evaluated the performance of eligible tests of executive function, including complex trail and coding tasks, design fluency, and conceptual rule attainment (rule learning and switching) tasks, for identifying CATD (Table 4.5). These included three studies that evaluated part B of the Trail Making Test (TMT), one that evaluated the Wisconsin Card Sorting Test (WCST), one that evaluated the Digit Symbol substitution task, and one that evaluated the Graphic Pattern Generation Test (GPGT) performance.

Study participants included 394 patients with CATD, 200 with MCI, and 573 healthy control older adults. All studies reported using NINCDS-ADRDA criteria for the diagnosis of CATD and none used NIA-AA criteria. Participants with MCI were diagnosed consistent with Petersen criteria.53 Normal older adult control participants were most commonly evaluated with a diagnostic workup or some combination of history and brief cognitive assessment. One study described control participants as self-reporting they were cognitively unimpaired.121

Participant mean age was 76 years, 43 percent were male, and mean education was 15 years. Only one study reported race or ethnicity data, in which 69 percent of participants were white and 31 percent were black.122

Table 4.5. Summary of reported results for primary outcomes: executive function.


Trail Making Test (TMT) Part B

Three studies (n=736) evaluated the TMT part B123, 124 for distinguishing mild to moderate CATD from either normal cognition or MCI.48, 94, 122 In the TMT part B, mental flexibility is assessed by asking participants to quickly draw lines between circles with ascending numbers and letters, alternating between the two.

Classification Accuracy

CATD Versus Normal Cognition

Two studies evaluated TMT part B time to completion in seconds for distinguishing individuals with CATD (n=143) from those with workup-confirmed normal cognition (n=336).48, 94 Each identified an optimal cut point by maximizing CATD prediction in the study sample using ROC analysis. In the first study, an optimal cut point of >172 seconds had a sensitivity of 0.87 and specificity of 0.88.48 In the second study, an optimal cut point of >130 seconds had a sensitivity of 0.85 and specificity of 0.83.94

CATD Versus MCI

One study also evaluated TMT part B for distinguishing individuals with CATD (n=57) from those with MCI defined by Petersen criteria53 (n=200).122 ROC analysis was used to determine optimal cut points for several combinations of performance time and errors. These included completion time in seconds (cut point z score ‑1.0 compared with normative data had sensitivity 0.53 and specificity 0.57), number of errors (cut point >1 had sensitivity 0.72 and specificity 0.41), a combination of time and errors (sensitivity 0.44 and specificity 0.67), and a combination of time or errors (sensitivity 0.81 and specificity 0.31).
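The contrast between the "time and errors" and "time or errors" results follows from how the two criteria are combined: requiring both flags fewer people (lower sensitivity, higher specificity), while accepting either flags more (higher sensitivity, lower specificity). The sketch below illustrates this with hypothetical participants; the flags and counts are invented for illustration and are not study data.

```python
def rates(flags, has_catd):
    """Return (sensitivity, specificity) for boolean test flags vs. true status."""
    true_pos = sum(f and d for f, d in zip(flags, has_catd))
    true_neg = sum((not f) and (not d) for f, d in zip(flags, has_catd))
    n_pos = sum(has_catd)
    n_neg = len(has_catd) - n_pos
    return true_pos / n_pos, true_neg / n_neg


# Hypothetical participants: (slow completion time?, more than 1 error?, CATD?)
people = [
    (True, True, True), (True, False, True),
    (False, True, True), (False, False, True),
    (True, True, False), (True, False, False),
    (False, False, False), (False, False, False),
]
slow = [p[0] for p in people]
many_errors = [p[1] for p in people]
catd = [p[2] for p in people]

both = [s and e for s, e in zip(slow, many_errors)]    # "time and errors"
either = [s or e for s, e in zip(slow, many_errors)]   # "time or errors"

print("and:", rates(both, catd))
print("or: ", rates(either, catd))
```

The direction of the tradeoff in this toy example matches the pattern reported above (sensitivity 0.44 versus 0.81 and specificity 0.67 versus 0.31), although the magnitudes are unrelated to the study's data.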

Variation in Classification Accuracy by Participant Characteristics

No studies reported on whether TMT part B test performance for distinguishing between CATD and either normal cognition or MCI varied by participant characteristics.

Digit Symbol Substitution

One study (n=283) evaluated a test of digit symbol substitution for distinguishing CATD from normal cognition.48 The Wechsler Adult Intelligence Scale Revised (WAIS-R)117 Digit Symbol subtest was analyzed for distinguishing individuals with mild to moderate CATD (n=45) from older adults with normal cognition (n=238). In the Digit Symbol task, processing speed and divided attention are assessed by asking participants to quickly re-code a sheet of numbers into abstract symbols based upon a provided key of digit/symbol pairs.

Classification Accuracy

CATD Versus Normal Cognition

This study defined an optimal cut point for the WAIS-R Digit Symbol subtest total score by maximizing CATD prediction in the study sample using ROC analysis. For an optimal cut point of <34, sensitivity was 0.95 and specificity was 0.67.

CATD Versus MCI

No studies reported data on digit symbol substitution test performance for distinguishing CATD from MCI.

Variation in Classification Accuracy by Participant Characteristics

No studies reported on whether digit symbol substitution test performance for distinguishing CATD from either normal cognition or MCI varied by participant characteristics.

Tests of Design/Figure Fluency

One study (n=277) evaluated the performance of a test of figural fluency for distinguishing CATD from normal cognition.125 This study evaluated the GPGT,126, 127 a test of design fluency in which participants must draw as many different designs as possible within a set of parameters, but without time limits. Patients with mild to moderate CATD (n=110) were compared with demographically similar older adults with normal cognition (n=167; defined by MMSE ≥27).

Classification Accuracy

CATD Versus Normal Cognition

This study evaluated GPGT scores for row 1 perseverations (repeated figure design errors) and row 1 unique figure designs. Optimal cut points were identified by maximizing CATD prediction in the study sample using ROC analysis. A cut point of 4 on perseverations in row 1 was associated with a sensitivity of 0.76 and specificity of 0.37, and a cut point of 15 unique designs in row 1 was associated with a sensitivity of 0.81 and a specificity of 0.36.

CATD Versus MCI

No studies reported data on figure fluency test performance for distinguishing CATD from MCI.

Variation in Classification Accuracy by Participant Characteristics

No studies reported on whether GPGT diagnostic test performance for distinguishing CATD from either normal cognition or MCI varied by different participant characteristics.

Wisconsin Card Sorting Test (WCST)

One study (n=162) evaluated the performance of the WCST,128, 129 a test of abstraction and mental flexibility, for distinguishing between CATD and normal cognition.121 Participants completing the WCST are asked to sort cards by color, shape, or number according to rules that change once a pattern has been established, requiring them to identify implicit rules and infer when they have changed. In this study, the performance of a modified version of the WCST was evaluated for distinguishing between mild to moderate CATD (n=87) and self-reported normal cognition (n=75) in older adults.

Classification Accuracy

CATD Versus Normal Cognition

Optimal cut points for WCST non-perseverative errors, perseverative errors, and the number of categories achieved were identified by maximizing CATD prediction in the study sample using ROC analysis. A cut point of >15 non-perseverative errors had a sensitivity of 0.58 and specificity of 0.84. A cut point of >5 perseverative errors had a sensitivity of 0.76 and specificity of 0.93. A cut point of <5 categories had a sensitivity of 0.93 and specificity of 0.82.

CATD Versus MCI

No studies reported on WCST performance for distinguishing CATD from MCI.

Variation in Classification Accuracy by Participant Characteristics

In a subgroup analysis evaluating the WCST for distinguishing between mild CATD (defined by DRS >120; n=27) and normal cognition, results appeared similar to those for the comparison between participants with mild to moderate severity and those with normal cognition. In this subgroup, >15 non-perseverative errors had a sensitivity of 0.48 and specificity of 0.84, >5 perseverative errors had a sensitivity of 0.74 and specificity of 0.93, and <5 categories had a sensitivity of 0.83 and specificity of 0.81. However, no statistical tests for interaction by CATD severity were reported.

Language Tests

Baseline Study Characteristics

Twelve publications of 10 unique studies evaluated eligible language tests, including tests of verbal fluency and confrontation naming, for distinguishing CATD from normal cognition (Table 4.6).48, 50, 66, 67, 81, 94, 106, 108, 109, 130–132 These included three studies that evaluated the BNT and 10 that evaluated various types of verbal fluency tasks.

Study participants included 751 patients with CATD, 29 with MCI, and 896 healthy control older adults. All studies reported using NINCDS-ADRDA criteria for diagnosis of CATD and none used NIA-AA criteria. Participants with MCI were diagnosed consistent with Petersen criteria.53 Normal older adult control participants were evaluated with a diagnostic workup, except one study that described control participants as defined by medical history.132

Mean participant age was 74 years, 44 percent were male, and mean years of education was 13. The only study that reported race or ethnicity data was 98 percent white.132

Table 4.6. Summary of reported results for primary outcomes: language.


Tests of Verbal Fluency

Ten publications of nine unique studies (n=1,586)48, 50, 66, 81, 94, 109, 130–133 evaluated tests of verbal fluency for distinguishing between individuals with mild to moderate CATD and older adults with normal cognition. Verbal fluency tests assess both language and executive functions. Most commonly, participants are asked to provide as many words as possible within one minute that fall into a known category (semantic fluency) or begin with a specific letter (phonemic fluency).

Classification Accuracy

CATD Versus Normal Cognition

Semantic (Category) Fluency

All nine studies evaluated semantic (category) fluency tasks. Four studies evaluated classification metrics specifically for the naming of animals.66, 109, 130, 132 Most reported data for cut points identified by maximizing CATD prediction in the study sample using either logistic regression or ROC analyses.

For optimal cut points ranging from 12 to 16, sensitivity ranged from 0.73 to 0.92 and specificity ranged from 0.87 to 1.00. Only one study identified a cut point based upon the commonly used clinical threshold of 2 SD below the control group performance mean, for which sensitivity ranged from 0.35 to 0.94, with lower values for groups with less severe cognitive impairment, and specificity was 1.0.109 Three studies evaluated classification metrics for the combined total of animals, fruit, and vegetable naming,94, 131, 132 with each reporting cut points identified by maximizing CATD prediction in the study sample. For optimal cut points ranging from 28 to 38, sensitivity ranged from 0.93 to 1.00 and specificity ranged from 0.88 to 1.00. Two studies evaluating the naming of items found in a supermarket81, 132 reported that sensitivity and specificity for unspecified best-performing cut points ranged from 0.92 to 0.93 and from 0.81 to 0.97, respectively. One of these studies also reported that sensitivity and specificity, respectively, for unspecified optimal cut points were 0.96 and 0.89 for naming fruits, 0.96 and 0.87 for naming vegetables, and 0.94 to 0.96 and 0.87 to 0.92 for first names.132 Finally, one study reported classification with modeling of combined semantic fluency scores (including correct responses, perseveration errors, intrusion errors, response clustering and switching) to maximize diagnostic assignment.133 A model restricted to correct responses and errors had a sensitivity of 0.90 and specificity of 0.89. A second model adding response clustering and switching to the first model produced a sensitivity of 0.93 and specificity of 0.95.

Phonemic (Letter) Fluency

Four studies evaluated phonemic (letter) fluency tasks.48, 130–132 All cut points were identified by maximizing CATD prediction in the study sample using ROC analysis. Two evaluated the task of naming words beginning with the letter A.130, 132 For one of these studies, a cut point of <13 had a sensitivity of 0.76 and specificity of 0.74,130 while for the other study, a cut point of <7 had a sensitivity of 0.72 and specificity of 0.93.132 One of these studies also reported sensitivity and specificity for the optimal cut points for naming F-words (cut point <9 had a sensitivity of 0.79 and specificity of 0.87), for naming S-words (cut point <11 had a sensitivity of 0.87 and specificity of 0.87), and for the combined total of F, A, and S-word tasks (cut points 30 to 31 had sensitivity ranging from 0.87 to 0.89 and specificity ranging from 0.85 to 0.92).132 A third study evaluated the combined total of C, F, and L-word tasks and reported that for an optimal cut point of 25, sensitivity was 0.73 and specificity was 0.78.131
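
As a rough illustration of how an "optimal" cut point can be selected from the study sample itself, the sketch below scans every observed score and keeps the threshold that maximizes Youden's J (sensitivity + specificity - 1). The included studies did not necessarily use this criterion; Youden's J is assumed here only because it is a common choice in ROC analysis. The scores and the function name are hypothetical.

```python
def best_cut_point(case_scores, control_scores):
    """Return (cut, sensitivity, specificity) maximizing Youden's J.

    A score strictly below the cut is treated as a positive screen,
    mirroring the '<13' style of cut point reported above.
    """
    best = None
    for cut in sorted(set(case_scores) | set(control_scores)):
        sens = sum(s < cut for s in case_scores) / len(case_scores)
        spec = sum(s >= cut for s in control_scores) / len(control_scores)
        j = sens + spec - 1
        # Keep the first threshold reaching the highest J seen so far.
        if best is None or j > best[0]:
            best = (j, cut, sens, spec)
    return best[1:]

# Hypothetical letter-fluency totals (illustrative only, not study data).
catd = [4, 6, 7, 8, 9, 10, 12, 15]
normal = [9, 13, 13, 14, 15, 16, 18, 20]

cut, sens, spec = best_cut_point(catd, normal)
print(f"cut point <{cut}: sensitivity {sens}, specificity {spec}")
```

Because the threshold is tuned on the same participants used to measure accuracy, the resulting sensitivity and specificity tend to be optimistic, which is one reason validation of previously suggested cut points matters.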

Combined Semantic and Phonemic Fluency

Finally, three studies50, 130, 131 evaluated metrics that combined semantic and phonemic fluency performance. One evaluated the difference between semantic and phonemic fluency (number of animals named minus number of F-words named), and reported that for an optimal cut point of ‑1, sensitivity was 0.53 and specificity was 0.96.130 The second study also evaluated difference scores (number of words with a given letter named minus number of words from a given category named) and reported a non-significant odds ratio for prediction of classification and no further information.131 The third study reported that for a combined proportion of intrusion errors (incorrect words produced), at an optimal cut point of 0, sensitivity was 0.39 and specificity was 0.87. For a combined proportion of perseverative errors (repeated responses), at an optimal cut point of 2, sensitivity was 0.67 and specificity was 0.52.50

CATD Versus MCI

No studies reported data on tests of verbal fluency for distinguishing CATD from MCI.

Variation in Classification Accuracy by Participant Characteristics

Four studies66, 109, 130, 132, 134 stratified CATD subjects by clinical severity (i.e., mild, moderate, and severe) and evaluated verbal fluency tests for distinguishing cognitively normal participants from individuals in different CATD severity categories. However, none statistically tested whether optimal cut points or diagnostic accuracy for verbal fluency tests differed between these different CATD severity subgroup-normal cognition comparisons. One study evaluated verbal fluency for distinguishing between CATD and normal cognition stratified by sex, but did not test whether differences in classification rates by sex were statistically significant.

Boston Naming Test (BNT)

Four publications of three studies (n=542)48, 50, 67, 109 evaluated the 15- and 30-item versions of the BNT,135 a commonly used test of confrontation naming in which participants are asked to name common objects from line drawings.

Classification Accuracy

CATD Versus Normal Cognition

Two studies compared individuals with CATD (n=192) to demographically similar or matched older adults with normal cognition (n=287).48, 50, 109 Studies evaluated a variety of BNT versions and scores with no overlap between studies. At an optimal cut point of ≤22 on the 30-item BNT, as determined with ROC analysis, one study reported a sensitivity of 0.75 and a specificity of 0.85.48, 50 Using alternate scoring methods based on semantic (concept) and lexical (word) naming errors, sensitivity ranged from 0.50 to 0.74 and specificity ranged from 0.70 to 0.72.50

CATD Versus MCI

One study evaluated the BNT for distinguishing between CATD (n=36) and Petersen criteria MCI53 (n=27).67 At an optimal cut point of ≤21 on the 30-item BNT, as determined using ROC analysis in the study sample, sensitivity was 0.64 and specificity was 0.81.

Variation in Classification Accuracy by Participant Characteristics

One study evaluated the 15-item BNT total score for distinguishing CATD from normal cognition using a cut point at 2 SD below the control sample mean in separate CATD severity strata.109 In participants with mild, moderate, and severe CATD, sensitivities were 0.53, 0.55, and 0.84, respectively, while specificity was 0.92 for each CATD severity group. However, the study did not test whether differences in sensitivity by group were statistically significant. No studies reported data on whether BNT performance for distinguishing CATD from MCI varied by participant characteristics.

Test Combinations

Baseline Study Characteristics

Ten publications of nine eligible studies evaluated test combinations for identifying CATD (Table 4.7). These included three studies that evaluated adding an additional test to the MMSE or MIS,56, 80, 136 and six studies that evaluated other test combinations.48, 50, 94, 109, 137–139

Most studies compared individuals with mild to moderate CATD by NINCDS-ADRDA criteria with older adults with normal cognition as confirmed by a diagnostic workup. Exceptions included one study that defined dementia by CDR and physician diagnosed CATD,139 two that defined normal cognition by CDR,137, 139 and one that did not report methods for establishing normality.56 None used NIA-AA criteria.

Participant mean age was 76 years and 38 percent of participants were male. Of the three studies that reported race/ethnicity, participants were predominantly white (93-100%) in two137, 139 and slightly over half were African American in the third.136

Table 4.7. Summary of reported results for primary outcomes: test combinations.

Supplementing Brief Stand-Alone Cognitive Tests

Two studies (n=204) evaluated a test protocol that supplemented use of the MMSE with another commonly used test56, 80 to identify CATD and one study (n=295) evaluated combining the MIS with semantic (category) verbal fluency.136

Classification Accuracy

CATD Versus Normal Cognition

One study evaluated combining the MMSE with the clock drawing task.56 With the MMSE cut point kept constant at 23, the clock drawing scoring method was varied. For double failure (both the MMSE and clock drawing), sensitivity ranged from 0.36 to 0.50 and specificity was 1.00. For single failure (either MMSE or clock drawing), sensitivity ranged from 0.86 to 0.96 and specificity ranged from 0.96 to 1.00. A second study that evaluated combining the MMSE with phonemic (letter) verbal fluency only presented results stratified by CATD severity, which are detailed below.80 A third study evaluated the combination of MIS and a semantic verbal fluency (animals) task for diagnosis of CATD.136 Failing both tests (MIS ≤4 and animals ≤9) had a sensitivity of 0.91 and specificity of 0.81.
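
The contrast between the double-failure and single-failure rules follows from how two test results are combined. The minimal Python sketch below uses invented pass/fail results, not data from the included studies, and a hypothetical function name. Requiring both tests to be failed can only lower sensitivity and raise specificity relative to requiring either one.

```python
def combined_accuracy(cases, controls, rule):
    """Sensitivity and specificity for a two-test screening rule.

    Each participant is a (failed_test_a, failed_test_b) pair of booleans.
    rule='both'   -> positive only if both tests are failed (double failure)
    rule='either' -> positive if at least one test is failed (single failure)
    """
    combine = {"both": all, "either": any}[rule]
    sens = sum(combine(p) for p in cases) / len(cases)
    spec = sum(not combine(p) for p in controls) / len(controls)
    return sens, spec

# Hypothetical results for 10 CATD cases and 10 controls (illustrative only).
cases = ([(True, True)] * 4 + [(True, False)] * 3
         + [(False, True)] * 2 + [(False, False)] * 1)
controls = [(True, False)] * 1 + [(False, False)] * 9

both_rule = combined_accuracy(cases, controls, "both")
either_rule = combined_accuracy(cases, controls, "either")

print("double failure:", both_rule)
print("single failure:", either_rule)
```

With these toy data the double-failure rule gives sensitivity 0.4 and specificity 1.0, while the single-failure rule gives 0.9 and 0.9, the same qualitative pattern as the MMSE plus clock drawing results above.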

CATD Versus MCI

No studies reported data on test combinations of brief, stand-alone cognitive tests with another brief cognitive test for distinguishing CATD from MCI.

Variation in Classification Accuracy by Participant Characteristics

One study evaluated combining the MMSE with a phonemic (letter – “F”, “A”, and “S”) verbal fluency task in a model, stratified by CATD severity.80 When compared to normal controls, among participants with CATD and MMSE>23, sensitivity and specificity for distinguishing CATD from normal cognition were 0.88 and 0.99, respectively, whereas among participants with MMSE ≤23, sensitivity for distinguishing CATD from normal cognition was 1.0 and specificity was 0.99.

Other Test Combinations

Seven publications of six studies (n=1,189) evaluated other brief test combinations,48, 50, 94, 108, 109, 137–139 primarily including versions and subsets of the BNT, WAIS Digit Symbol, list learning, verbal fluency, TMT-B, WMS Logical Memory, and WMS Visual Reproduction.

Classification Accuracy

CATD Versus Normal Cognition

One study reported that a combination of delayed recall scores from the CERAD list-learning and WMS VR tasks, time to complete TMT B, and the 30-item BNT best discriminated between CATD and normal cognition. Incorporating these test results in regression models, with and without adjustment for age, sensitivity ranged from 0.97 to 0.98 and specificity ranged from 0.79 to 0.82.48 The same investigators also evaluated a combined intrusion errors score from the 15-item BNT, CERAD list-learning, and semantic and phonemic fluency.50 The optimal combination, determined by logistic regression, had a sensitivity of 0.29 and specificity of 0.98. A third study reported that a best-performing combination of tests of CERAD list-learning delayed recall and the 15-item BNT had sensitivity of 0.90 and specificity of 0.92.109

Two studies (n=302) evaluated the combination of the 60-item BNT, Digit Symbol, and WMS LM for distinguishing CATD from normal cognition.137, 139 In one of the studies, which compared mild CATD to normal cognition, sensitivity was 0.95 and specificity was 1.0,139 and in the other study, which compared very mild CATD (CDR 0.5) to normal cognition, sensitivity was 0.68 and specificity was 0.74.137

Two studies evaluated combinations of semantic (category) fluency plus one other metric.94, 137 When the WMS VR delayed recall task was added to semantic fluency, sensitivity was 0.96 and specificity was 0.93. When the WMS LM delayed recall task was added to semantic fluency, sensitivity was 0.78 and specificity was 0.74. When the DRS total score was added to semantic fluency, sensitivity was 0.95 and specificity was 0.94.

Finally, one study evaluated a linear logistic combination of the WAIS-R Performance IQ index and the trials total score from the FCSR list learning task, resulting in a sensitivity of 0.93 and specificity of 0.93.

CATD Versus MCI

No studies reported data on other test combinations for distinguishing CATD from MCI.

Variation in Classification Accuracy by Participant Characteristics

One study reported that age and education were not statistically significant in its classification prediction model using the 30-item BNT, CERAD list learning, TMT-B, and WMS VR.48 Otherwise, no studies reported analyses addressing whether diagnostic performance of other test combinations for distinguishing between CATD and either normal cognition or MCI varied by participant characteristics.

Comparative Accuracy of Cognitive Tests

Study Characteristics

Nine studies (n=2,746) reported statistical tests to directly compare accuracy between individual cognitive tests or combinations for distinguishing between CATD and either normal cognition or MCI in the same study population.49, 51, 52, 54, 61, 89, 106, 112, 133 Most reported NINCDS-ADRDA criteria for the diagnosis of CATD, with the exception of two studies that reported DSM-IV criteria.51, 52 None used NIA-AA criteria. Cognitively normal older adult samples were defined by diagnostic workup, with the exception of one sample that self-reported as unimpaired54 and one that was not described.52 Participants with MCI were defined consistent with Petersen criteria.53 For the nine studies that reported comparative diagnostic accuracy, mean participant age was 74 years and 41 percent were male. In the six studies reporting race/ethnicity, all described participants as either exclusively or majority white.49, 51, 54, 61, 89, 106

Classification Accuracy

CATD Versus Normal Cognition

Eight studies evaluated the comparative accuracy of cognitive tests for distinguishing CATD from normal cognition, but there was no overlap in the tests compared across studies. In one study, the global Clock Drawing Test (CDT) score was found to significantly improve specificity (72% vs 63%) over Rouleau scoring total (p<.04, n=279).49 In a second study (n=478), MoCA classification was statistically significantly better than the MMSE (AUC 0.99 vs. 0.98, p<.05), and the same study found that the traditional MoCA was statistically significantly better than the short MoCA (AUC 0.99 vs. 0.99, p<.05).51 Other studies reported that classification performance did not statistically significantly differ between the MMSE and BMET (total plus two domain scores) (n=102),52 the MMSE and HVLT (list learning) trials total (n=380),54 the MMSE and CERAD total score (n=190),61 or between the CERAD total score and the CERAD list learning delayed recall (n=190).61

One study evaluating DRS summary scores (n=359) reported results for the proportion of participants who were correctly classified.89 The proportion correctly classified by the Memory subscale (94%) was higher than for the Construction (72%, p<0.001), Attention (74%, p<0.001), and Conceptualization (75%, p<0.001) subscales. In addition, the proportion correctly classified by the Initiation/Perseveration subscale (93%) was higher than for the Construction (72%, p<0.001), Attention (74%, p<0.001), and Conceptualization (75%, p<0.001) subscales.89

The last two studies examined whether alternate ways of assessing language and memory offer improved classification over traditional neuropsychological measures. In the first (n=85), semantic fluency scoring that included correct responses and errors (intrusions and perseverations) distinguished CATD from normal cognition significantly less well than scoring that also included clustering and switching metrics (reviewed above) (p for comparison of AUCs <0.05).133 In the second of these studies (n=583), a memory process dissociation procedure classified participants statistically significantly better than MMSE, WMS LM immediate recall, WMS paired associate learning, semantic fluency, phonemic fluency, BNT, TMT B, or WAIS-R Digit Symbol, not significantly differently from WMS LM delayed recall, and statistically significantly worse than FCSR test free recall.112

CATD Versus MCI

Three studies evaluated the comparative accuracy of cognitive tests for distinguishing CATD from MCI, with no overlap in test comparisons across studies. In one study (n=449), both the full MoCA (AUC 0.83 vs. 0.81, p<0.05) and the MMSE (AUC 0.85 vs. 0.81, p<0.05) were significantly more accurate than a short version of the MoCA.51 A second study (n=155) reported that the MMSE score better discriminated between CATD and MCI than the CERAD total score (p<0.01) and that the CERAD total score performed better than the CERAD list delayed recall (p<0.007).61 The third study (n=117) reported that the RBANS Delayed Memory Index did not distinguish CATD from MCI differently than did either the RBANS list learning retention or prose/story recall retention scores alone.106

Variation in Classification Accuracy by Participant Characteristics

No studies reported data on whether the comparative accuracy of different cognitive tests for distinguishing CATD from either normal cognition or MCI varies by patient characteristics.
