5 Food-Based Assessment of Dietary Intake

This chapter addresses the question, What food-based dietary assessment methods hold promise for eligibility determination in WIC based on criteria related to either failure to meet Dietary Guidelines (indicated primarily by not meeting Food Guide Pyramid recommendations) or inadequate intake (indicated by falling below nutrient intake cut-off points based on Dietary Reference Intakes)? To answer the question, the committee examined the scientific basis for the potential performance of food-based methods for eligibility determination at the individual level. This examination required consideration of relevant dietary research at the group level. The committee was most interested in reviewing studies of dietary methods designed to assess the usual1 or long-term intakes of individuals and groups, especially those methods that may have the characteristics that meet the criteria for assessing dietary risk described in Chapter 4. To the extent possible, the committee focused on studies conducted with populations served by WIC: women in the childbearing years, children younger than 5 years of age, and low-income women and children from diverse ethnic backgrounds.

The term food-based dietary assessment methods refers to assessment tools used to estimate the usual nutrient or food intake of an individual or a group. Because direct observation of intake by trained observers is impractical, dietary intake must be self-reported by individuals; this makes dietary assessment more challenging for WIC eligibility determination than the use of anthropometric or biochemical measures. To use a dietary method to assess an individual's dietary risk of failure to meet Dietary Guidelines or inadequate intake, the method must have acceptable performance characteristics (described in Chapter 4). The committee focused on available dietary tools with regard to their ability to estimate usual intake and their performance characteristics (validity, reliability, measurement error, bias, and misclassification error). The intent was to determine how well the tools could identify an individual's WIC eligibility status based on the dietary risk of failure to meet Dietary Guidelines or inadequate intake. The committee considered data related to the correct identification of intakes of nutrients, foods, and food groups since elements from any of these three groupings could be used as the indicator on which a criterion could be based. For example, a method to identify failure to meet Dietary Guidelines must be able to identify accurately a person's usual intake from each of the five basic food groups of the Food Guide Pyramid.

This chapter describes (1) the importance of assessing usual intake, (2) commonly used research-quality dietary assessment methods, including their strengths and limitations, (3) methods that compare food intakes with the Dietary Guidelines, and (4) conclusions about food-based methods for eligibility determination.

A FOCUS ON USUAL INTAKE

As explained below, dietary assessment for the purpose of determining WIC eligibility must be based on long-term intake or the usual pattern of dietary intake, rather than intake reported for a single day or a few days. In the United States and other developed countries, a person's dietary intake varies substantially from day to day (Basiotis et al., 1987; Carriquiry, 1999; IOM, 2000a; Nelson et al., 1989; Tarasuk, 1996; Tarasuk and Beaton, 1999). This variation introduces random error in estimates of usual intake. Day-to-day variation in intake arises from multiple biologic and environmental influences such as appetite, physical activity, illness, season of the year, holidays, and personal economic conditions. An individual's intake may become either more erratic or more monotonous when economic constraints are added to other influences on dietary intake.

Relationships Among Daily Nutrient Intakes, Usual Intakes, and a Cut-Off Point

Figure 5-1 presents distributions of intake for a hypothetical nutrient X that is normally distributed. It depicts the relationship between the distributions of usual intakes of individuals within a population and the distribution of usual intake for that population (solid line P). The vertical line L marks the cut-off point for determining whether an individual's usual intake is above or below the specified level. The dotted lines represent the day-to-day intakes of individuals; taken together, those daily intakes comprise each person's usual intake. On any given day, Individual A and Individual B can have a dietary intake for a specified nutrient that is at, above, or below L. However, Individual A has a long-term average intake (usual intake) below cutpoint L, whereas Individual B has an average or usual intake above cutpoint L. Compared with a set of recalls, a single recall or day of observation would identify many more individuals as falling below L for most nutrients. Therefore, the accurate approximation of an individual's usual intake requires data collection over many days (Basiotis et al., 1987; Beaton, 1994; IOM, 2000a; Sempos et al., 1993).

FIGURE 5-1 Relationship Between Distributions of Usual Intakes of Nutrient X for Individuals Within a Population (P) and a Generic Cut-Off Level L. SOURCE: Adapted from Yudkin (1996).
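The point can be illustrated with a minimal simulation. All values below (means, variances, and the cut-off) are invented for illustration and do not come from WIC data; the sketch simply assumes normally distributed usual intakes with additional day-to-day noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative values (not WIC data) for nutrient X: usual intakes vary
# between people, and daily intakes vary around each person's usual intake.
n_people = 10_000
usual = rng.normal(loc=100.0, scale=15.0, size=n_people)  # usual (long-run) intakes
one_day = usual + rng.normal(scale=25.0, size=n_people)   # a single day's recall
L = 90.0                                                  # cut-off level

print(f"usual intake below L:   {np.mean(usual < L):.1%}")
print(f"one-day recall below L: {np.mean(one_day < L):.1%}")
```

With these illustrative numbers, a single day flags roughly a third of the population even though only about a quarter have usual intakes below L.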

Identifying Who Falls Above or Below a Cut-Off Point

Estimating the proportion of a population group with a nutrient intake above or below L requires the collection of one day of intake data per person in the population plus an independent second day of intake for at least a subsample of the population (Carriquiry, 1999; IOM, 2000a; Nusser et al., 1996). This procedure allows for statistical adjustment of the distribution of nutrient intake for the group. That is, with data from 2 days, one can account for the day-to-day variation in intake that is described in the previous section. The statistical methods used account for day-to-day variability of intake in the population and other factors such as day-of-the-week effects and the skewness of the intake of nutrient X. However, no method based on one or two recalls is available to identify whether an individual's usual intake would be above or below L.
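A simplified sketch of this kind of adjustment appears below. It shrinks each one-day observation toward the group mean in proportion to the estimated between-person share of the variance. The actual procedure of Nusser et al. (1996) is considerably more elaborate (it also handles skewness and nuisance effects such as day of the week), so this is only a conceptual illustration with invented values.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated survey: everyone gives day 1; an independent day 2 is also
# collected (in practice, only from a subsample).
n = 5_000
usual = rng.normal(100.0, 15.0, n)
day1 = usual + rng.normal(0.0, 25.0, n)
day2 = usual + rng.normal(0.0, 25.0, n)

# Within-person variance from paired days: var(day1 - day2) = 2 * sigma_w^2.
sigma_w2 = np.var(day1 - day2, ddof=1) / 2.0
sigma_tot2 = np.var(day1, ddof=1)
sigma_b2 = max(sigma_tot2 - sigma_w2, 0.0)  # between-person variance

# Shrink each one-day value toward the group mean so the adjusted
# distribution has (approximately) the between-person variance only.
shrink = np.sqrt(sigma_b2 / sigma_tot2)
adjusted = day1.mean() + (day1 - day1.mean()) * shrink

L = 90.0
print(f"% below L, one-day data: {np.mean(day1 < L):.1%}")
print(f"% below L, adjusted:     {np.mean(adjusted < L):.1%}")
print(f"% below L, true usual:   {np.mean(usual < L):.1%}")
```

The adjusted distribution recovers the group proportion below L, but the shrunken value for any single person is not a reliable estimate of that person's usual intake, which is the committee's point.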

Variability in Food Intake

Turning from nutrients to foods, some individuals are relatively consistent in their intake of a few foods (such as low-fat milk or coffee) from day to day, but they may vary widely in their intake of other foods (e.g., corn or watermelon) (Feskanich et al., 1993). Available data suggest that within-person variability is at least as great a problem in estimating an individual's food intake as it is in estimating an individual's nutrient intake. In a German study based on 12 diet recalls per person collected over 1 year, the ratio of within-person to between-person variation in food group consumption was greater than 1.0 for nearly all of the 24 food groups included (Bohlscheid-Thomas et al., 1997). The ratio of within-person to between-person variation ranged from 0.6 for spreads to 65.1 for legumes. The high ratios2 reflect large day-to-day within-person variation in the consumption of different foods.
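For readers who want to see how such ratios are estimated, the sketch below computes the within- to between-person variance ratio from repeated recalls using a standard one-way analysis of variance. The intake array is hypothetical, constructed only to show the calculation.

```python
import numpy as np

def variance_ratio(intakes: np.ndarray) -> float:
    """Within- to between-person variance ratio from repeated recalls.

    `intakes` has shape (n_people, n_days). A one-way ANOVA with persons
    as groups yields the two variance components.
    """
    n_people, n_days = intakes.shape
    person_means = intakes.mean(axis=1)
    ms_within = ((intakes - person_means[:, None]) ** 2).sum() / (n_people * (n_days - 1))
    ms_between = n_days * ((person_means - intakes.mean()) ** 2).sum() / (n_people - 1)
    sigma_w2 = ms_within
    sigma_b2 = max((ms_between - ms_within) / n_days, 0.0)
    return sigma_w2 / sigma_b2 if sigma_b2 > 0 else float("inf")

# Hypothetical check: simulate 12 recalls per person, as in the German study.
rng = np.random.default_rng(3)
usual = rng.normal(2.0, 0.5, size=(1_000, 1))              # between-person spread
recalls = usual + rng.normal(0.0, 1.5, size=(1_000, 12))   # day-to-day noise
print(f"estimated ratio: {variance_ratio(recalls):.1f}")   # ~ (1.5/0.5)^2 = 9
```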

In summary, a large body of literature indicates that day-to-day variation in nutrient and food intake is so large in the United States that one or two diet recalls or food records cannot provide accurate information on usual nutrient and food intake for an individual.

OVERVIEW OF RESEARCH-QUALITY DIETARY METHODS FOR ESTIMATING FOOD OR NUTRIENT INTAKE

A large body of literature addresses the performance of methods developed to assess dietary intakes and conduct research on diet and health. Four methods—diet history, diet recall (typically 24-hour), food record, and food frequency questionnaire (FFQ)—have been widely studied (Bingham, 1987; Dwyer, 1999; IOM, 2000a; Pao and Cypel, 1996; Tarasuk, 1996; Thompson and Byers, 1994). Most studies of dietary data collection methods focus on the ability of a method to estimate nutrient intake accurately—ranging from just one nutrient to a wide array of them. Some studies examine performance with respect to intake of foods or food groups. The findings discussed in this chapter highlight 24-hour diet recalls and FFQs, since these are the most commonly used dietary methods in the WIC clinic (see Chapter 2).3

General Characteristics

The strengths and limitations of available dietary methods have been extensively reviewed elsewhere (Bingham, 1987; Briefel et al., 1992; Dwyer, 1999; Pao and Cypel, 1996; Tarasuk, 1996; Willett, 2000) and are summarized in Table 5-1. Each of the four methods may be used to provide nutrient intake data, food intake data, or both. Table 5-1 also presents major findings that have implications for use of each of the four methods in the WIC program. In addition, after providing descriptive information about the methods, the table presents two major groups of characteristics that are related to the framework described in Chapter 4—performance characteristics and characteristics related to responsiveness to operational constraints in the WIC setting. The latter include the resources required to administer the method (WIC staff, time, and facilities such as computer software) and the burden on the client, together with the client's ability to report or record intake accurately.

TABLE 5-1 Comparison of Performance and Operational Constraints of Selected Dietary Assessment Methods in the WIC Setting.

As shown in Table 5-1, the diet history and FFQ methods attempt to estimate the usual intake of individuals over a long period of time, often the past year. The 24-hour diet recall and food record methods reflect intake over 1 day or a few days. As discussed in the previous section, recalls and records are not good measures of an individual's usual intake unless a number of independent days are observed.4 On average, diet recalls and food records tend to underestimate usual intake—energy intake in particular. On the other hand, FFQs and diet histories tend to overestimate mean energy intakes, depending on the length of the food lists that are used and subjects' abilities to estimate accurately the frequency and typical portion sizes of foods they consume.

Methods Studies Conducted with Low-Income Women and Children

Table 5-2 summarizes the few dietary methods studies that have been conducted with low-income pregnant women and young children or in the WIC population. These studies have been aimed primarily at developing or testing the use of a food frequency instrument to assess nutrient intake in the clinic setting. For nutrient intakes, the correlations between the FFQ and sets of diet recalls in these studies are similar to or lower than those reported in studies of more advantaged populations (e.g., see Table 5-1).

TABLE 5-2 Dietary Studies Conducted in the Low-Income or WIC Population.

Sources of Error in Dietary Methods

The validity of a diet method depends on the use of a standardized methodology, the interviewer's skill, and the subject's ability to report intake accurately. The reliability or reproducibility of a diet method relates to actual within-person variability in intake as well as to measurement error. Measurement error may be introduced by the subject, the interviewer, the methodology (such as the food measurement aids used to estimate portion size), and functions such as food coding. Bias may be caused by the systematic underreporting, overreporting, or omission of foods by an individual; interviewing or scoring processes; or errors in the food composition database used to code the dietary intake data. Discussion of some important sources of error follows.

Day-to-Day Variation

The major source of random error is day-to-day variation in intake, described earlier in this chapter (see “A Focus on Usual Intake”). Because of high day-to-day variation in intake, high reliability (e.g., 0.8 or greater) of the diet recall or food record method would require many days of intake data. The number of days varies by the nutrient and frequency of consumption of food items containing the nutrient (Basiotis et al., 1987; IOM, 2000a; Nelson et al., 1989; Sempos et al., 1985). The error introduced by within-person variation is so large that it rules out the usefulness of a single diet recall or diet record as a method of estimating an individual's usual intake. It appears impossible to eliminate within-person variation as a source of random error in the estimation of an individual's usual intake. Even if usual nutrient intake could be assessed with several days of observations of an individual's intake, collection of multiple days of intake is not feasible in the WIC clinic setting (see criterion 6, “Operational Constraints,” in Chapter 4).

There are two approaches to minimizing within-person variability in dietary data. The first involves collecting many days of dietary intake data and averaging the data to capture usual (mean) intake as well as the precision of the estimate (standard deviation around the mean). The number of days needed to attain the commonly desired reliability of 0.8 or higher varies by the nutrient or food group to be measured because it is directly related to the magnitude of the within-person variability (IOM, 2000a; Nelson et al., 1989), as sketched below. Although the errors of individuals in a group tend to cancel each other out and leave an unbiased estimate of the true value for the group, estimates of usual intake with sufficient accuracy and reliability to judge an individual's eligibility status require multiple measures of daily intake.
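The standard relationship behind this statement is that the reliability of a k-day mean is R = s_b^2 / (s_b^2 + s_w^2/k), which can be solved for the number of days k. The short sketch below applies it to the variance ratios from the German study cited above; treat the results as rough orders of magnitude.

```python
def days_needed(variance_ratio: float, reliability: float = 0.8) -> float:
    """Days of intake data needed for a k-day mean to reach the target
    reliability: solving R = s_b^2 / (s_b^2 + s_w^2 / k) for k gives
    k = (s_w^2 / s_b^2) * R / (1 - R)."""
    return variance_ratio * reliability / (1.0 - reliability)

# Variance ratios reported by Bohlscheid-Thomas et al. (1997):
for food_group, ratio in [("spreads", 0.6), ("legumes", 65.1)]:
    print(f"{food_group}: about {days_needed(ratio):.0f} days of recalls")
# spreads: about 2 days; legumes: about 260 days
```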

The second approach to minimizing within-person variability is to use an FFQ. With this method, the individual is expected to summarize the usual intake of food items, based on her knowledge of how her dietary choices vary from day to day. In this case, reliability is typically judged by assessing the reproducibility of the intake estimates from repeated administrations of the questionnaire, and validity is assessed by comparing the intake estimates with usual nutrient intakes estimated from multiple days of intake using either diet recalls or diet records.

Reliability or reproducibility of both nutrient intake and food intake may be a problem in FFQs, just as it is in diet recalls. Using FFQs, correlation coefficients of 0.4–0.7 are typical for the reliability (reproducibility) of nutrients (McPherson et al., 2000; Serdula et al., 2001; Thompson and Byers, 1994), food groups, and single food items (Ajani et al., 1994; Bohlscheid-Thomas et al., 1997; Colditz et al., 1987; Feskanich et al., 1993; Jain and McLaughlin, 2000; Jarvinen et al., 1993; Salvini et al., 1989).

In a review of the literature on diet methods for children, McPherson et al. (2000) reported on two test-retest reliability studies of FFQs among adolescents. Rockett and colleagues (1995) administered an FFQ to 9- to 18-year-olds twice, 1 year apart. They found average correlations of 0.5 for fruits, vegetables, and fruits and vegetables combined, and higher reproducibility for girls than for boys. Frank et al. (1992) compared results from a 64-item FFQ administered twice, 2 weeks apart, to 12- to 17-year-olds. Two-thirds of the adolescents reported similar results for low-fat milk, diet carbonated soft drinks, and shellfish. For 12 food groups, there was 50 percent or better agreement between the two FFQs.

Underreporting and Overreporting Intake

Diet Recalls and Food Records. Table 5-1 indicates that, in affluent societies such as the United States, diet recalls and food records for adults are both subject to systematic error or bias, primarily the underreporting of energy intake (Bingham, 1987, 1991). In U.S. dietary intake surveys that use diet recalls, up to 31 percent of the subjects may underreport their intake (Briefel et al., 1995, 1997; Klesges et al., 1995a). Compared with individuals of healthy weight, overweight adults and adolescents (and those trying to lose weight) are more likely to underreport energy intakes (Briefel et al., 1997; Klesges et al., 1995a). Similarly, those with lower socioeconomic status, education, and literacy levels are more likely to underreport intake than are other groups (Briefel et al., 1995, 1997; Klesges et al., 1995a). Baranowski et al. (1991) found that mothers were more likely to underreport than to overreport their young children's food intake during 24-hour diet recalls; mothers underreported food intake 18 percent of the time and overreported food intake 10 percent of the time. Several research groups (Johnson et al., 1998; Kroke et al., 1999; Sawaya et al., 1996; Tran et al., 2000) confirm that 24-hour diet recalls underreport energy intake when intakes are compared with estimates of energy expenditure as measured by doubly labeled water. A review of dietary assessment among preschool children found that diet recalls both overestimated and underestimated energy intake (Serdula et al., 2001). A review of dietary method studies among children ages 5–18 years also found that food records underestimated energy intake compared to doubly labeled water (McPherson et al., 2000). At present, there is no definitive way to identify individuals who either underreport or overreport their intake on diet recalls or food records—except, perhaps, in the extreme. Such systematic errors may mean that sets of diet recalls or records are questionable standards for evaluating the performance of FFQs and diet histories, but such evaluation is common practice (see below).

Food Frequency Questionnaires and Diet Histories. Studies using doubly labeled water have shown that FFQs overestimate children's total energy intake by about 50 percent (Goran, 1998; McPherson et al., 2000; Serdula et al., 2001). Kaskoun and colleagues (1994) reported that FFQs completed by parents for 4- to 6-year-old children overestimated the children's energy intake by 58 percent in comparison with total energy expenditure as measured by doubly labeled water. Dietary studies using FFQs also overestimated energy intake among children ages 5–18 years (McPherson et al., 2000). Taylor and Goulding (1998) found that a 35-item FFQ overestimated calcium intake by 18 percent compared with 4-day diet records based on parents' reports of the intakes of their 3- to 6-year-old children. The overestimation of intake based on long lists of foods in an FFQ is one reason that researchers statistically adjust for a group's total caloric intake when analyzing nutrient intakes from an FFQ. The usefulness of such adjustments when using a tool to establish eligibility is questionable. In addition, care must be taken not to overadjust (Thompson and Byers, 1994). Diet histories also were found to overestimate energy intake, by 12 percent in a small group of 3-year-old children and by 8 percent in 5-year-old children, compared with the doubly labeled water method (Serdula et al., 2001). Therefore, using a diet history method to assess an individual's dietary risk for WIC eligibility would be biased toward higher estimates of energy, food, and nutrients. Using a diet assessment tool that overestimates intake would result in falsely classifying many individuals as meeting the Dietary Guidelines or having intakes that exceed a cut-off point for nutrient intake.

Differences by Type of Food. Several investigators (e.g., Feskanich et al., 1993; Salvini et al., 1989; Worsley et al., 1984) reported that people tend to overestimate their intake of foods perceived as healthy (such as vegetables) and underreport foods considered to be less healthy. Bingham (1987) suggested that fat, sugar, and alcohol are most subject to underreporting; however, there are no definitive conclusions about systematic errors related to specific foods or dietary patterns (Schoeller, 1990; Tarasuk, 1996; Tarasuk and Brooker, 1997). A tendency to overreport vegetables and underreport sources of fat, sugar, and alcohol would lead to overestimation of intake of one food group and some essential nutrients and to underestimation of energy. These inaccuracies could result in a low specificity. That is, people who truly do not meet criteria based on the Dietary Guidelines or nutrient cut-off points would be misclassified as ineligible for WIC.

Portion Size Estimation

Diet recalls and food records are subject to respondents' errors in reporting or recalling portion sizes consumed. Weighed food records provide more accurate portion size data, but weighing requires additional time and effort by the subject. In a series of experiments investigating the cognitive processes involved in long-term recall, Smith (1991) studied the ability of subjects to recall or distinguish portion sizes accurately. He found that individuals cannot distinguish between the definitions of portion size provided by commonly used FFQs (for example, small, medium, or large; or medium = 1 medium apple). This research suggests that individuals have poor ability to provide accurate portion size information and that typical food frequency instruments are not satisfactory for collecting high-quality information on portion sizes. This limits the usefulness of FFQs for quantifying the number of standard servings or the nutrient intake of an individual—thus increasing the chance of misclassifying a person's WIC eligibility status.

Interviewer Bias

The person collecting the dietary intake data may introduce systematic error by assuming certain cultural practices rather than asking the subject, or by using unstandardized, leading probes to elicit information. In the research setting, controls ordinarily are in place to minimize these problems. In a service setting, however, there may be interruptions, distractions, time constraints, and minimally trained staff collecting dietary intake information.

The Accuracy of Food Frequency Questionnaires

Correlations with Usual Intake from Diet Recalls or Food Records—Adolescents and Adults

FFQs have many features that make them seem attractive for dietary data collection in WIC settings (Table 5-1), but do they reduce within-person variation and other sources of error enough that a valid result can be obtained in a short time? To examine the validity of FFQs, investigators often compare results from an FFQ with the estimation of usual intake obtained from a set of research-quality diet recalls or food records. They estimate usual intake of the individuals in the group by obtaining 24-hour recalls or food records over many days using standardized methods. The results are sometimes called a gold standard against which the accuracy of other methods can be compared, despite the possibility of systematic underreporting as mentioned above.

Correlations between estimates from FFQs and two 7-day records are typically in the range of 0.3 to 0.6 for most nutrients (Sempos et al., 1992). After statistical adjustments (deattenuation) for energy intake and within-person variation using data from diet recalls or diet records, correlations reported for FFQs used in research studies range between 0.4 and 0.8 (Block et al., 1990; Blum et al., 1999; Brown et al., 1996; Friis et al., 1997; Robinson et al., 1996; Stein et al., 1992; Suitor et al., 1989; Treiber et al., 1990; Willett et al., 1987). Mean correlation coefficients cluster around 0.5 (Jain and McLaughlin, 2000; Jain et al., 1996; Longenecker et al., 1993). In general, correlations for adolescents between the validation standard and diet method were higher for single diet recalls and diet records than for FFQs (McPherson et al., 2000). In one study among adolescents, correlations between 3-day diet records and serum micronutrients ranged from 0.32 to 0.65 (McPherson et al., 2000).
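The deattenuation adjustment mentioned above has a simple form: the observed correlation is multiplied by a factor that grows with the within- to between-person variance ratio of the reference method and shrinks with the number of reference days. The values in the sketch below are invented for illustration.

```python
import math

def deattenuate(r_observed: float, variance_ratio: float, n_days: int) -> float:
    """Correct an FFQ-versus-recalls correlation for within-person variation
    in the reference recalls: r_true ~= r_obs * sqrt(1 + ratio / n_days)."""
    return r_observed * math.sqrt(1.0 + variance_ratio / n_days)

# Invented example: observed r of 0.40 against 3 recalls, variance ratio 1.5.
print(f"deattenuated r: {deattenuate(0.40, 1.5, 3):.2f}")  # ~0.49
```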

The nutrients being assessed and the number of items on an FFQ can affect the validity of the questionnaire. A 15-item questionnaire designed to determine the adequacy only of calcium intake had a 0.8 correlation with intake determined from a 4-day food record (Angus et al., 1989). Among tools that assessed a broad range of nutrients, the highest correlation coefficients that the committee found for women were those reported by the EPIC Group of Spain (1997) for a 50- to 60-minute diet history interview compared with 24-hour recalls obtained over the previous year. Excluding cholesterol, the correlations ranged from 0.51 for β-carotene to 0.83 for alcohol; half were 0.7 or greater. However, even a correlation coefficient of 0.8 reflects a substantial degree of error when examined at the level of the individual (see “Agreement of Results by Quantile and Misclassification,” below).

Wei et al. (1999) reported on the use of a modified FFQ to assess nutrients in low-income pregnant women ages 14 to 43 years (see Table 5-2). Fourteen percent of the sample was excluded because of unusually high reported intakes (above 4,500 calories), indicating probable overestimation for a proportion of the population. After these exclusions, unadjusted correlation coefficients ranged from 0.3 for carotene to 0.6 for folate, with a mean correlation coefficient of 0.47.

The validity of questionnaires with regard to food or food group intake also is a problem. Little evidence is available concerning the ability of FFQs to estimate intake correctly when servings of foods or food groups (rather than nutrients) are the units of comparison (Thompson et al., 2000). In the study by Bohlscheid-Thomas and colleagues (1997), correlation coefficients between food group intakes obtained from the 24-hour recalls and a subsequent FFQ ranged from 0.14 for legumes to 0.9 for alcoholic beverages. For 9 food groups, correlations were less than 0.4; for 11 they were between 0.4 and 0.6; and for 4 they were greater than 0.6. Similarly, Feskanich and coworkers (1993) reported a range of 0.17 for “other nuts” to 0.95 for “bananas,” with a mean correlation of 0.6 after adjusting for within-person variation in intake. Among ninth to twelfth graders, Field et al. (1998) found correlations between a 27-item FFQ and the average of three diet recalls of 0.1 to 0.3 for vegetables, fruit juices, and fruits, and of 0.4 for fruits and vegetables combined. In general, these correlation coefficients are no better than those found by investigators studying nutrients rather than foods.

Correlations with Usual Intake from Diet Recalls or Food Records—Young Children

Few validity studies have been conducted of questionnaires designed to assess the diets of young children (Baranowski et al., 1991; Blum et al., 1999; Goran et al., 1998; McPherson et al., 2000; Persson and Carlgren, 1984). Blum et al. (1999) assessed the validity of the Harvard Service FFQ in Native American and Caucasian children 1 to 5 years of age in the North Dakota WIC Program (see Table 5-2). An 84-item FFQ was self-administered twice by parents, at the first WIC visit and then after the completion of three 24-hour recalls. Correlations ranged from 0.26 for fiber to 0.63 for magnesium and averaged 0.5.

Persson and Carlgren (1984) evaluated various dietary assessment techniques in a study of Swedish infants and children. They found that a short FFQ (asked of parents) was a poor screening instrument with systematic biases when used for 4-year-olds. Staple foods such as potatoes, bread, cheese, and fruits were overestimated and sucrose-rich foods such as cakes were underestimated compared with results from food records.

Agreement of Results by Quantile and Misclassification

A number of researchers question the appropriateness of using the correlation coefficient (Hebert and Miller, 1991; Liu, 1994) or a single type of correlation coefficient (Negri et al., 1994) to assess the validity and reliability of food-based questionnaires because a high correlation does not necessarily mean high agreement. This question is especially relevant to the situation in WIC, where estimation errors are of great concern if they result in the misclassification of individuals with regard to their dietary risk. Another way to examine validity and the potential misclassification problem is to examine results of studies that report agreement of the results by quantile.

Robinson et al. (1996) compared results from a 4-day diet record obtained at 16 weeks of gestation with those from a 100-item FFQ obtained at 15 weeks of gestation. Agreement varied widely by nutrient: 30 percent of the women were classified in the same quartile of intake for starch, and 41 percent were in the same quartile for calcium. Eight percent were classified in opposite quartiles for energy, protein, and vitamin E intakes. Friis et al. (1997) found that 71 percent of young women were in the same quintile or within one quintile when comparing intakes from an FFQ and three sets of 4-day food records. On average, 3.8 percent were grossly misclassified into the highest and lowest quintiles by the two methods.

Freeman, Sullivan, et al. (1994) compared a 4-week FFQ (either the Block FFQ or the Harvard Service FFQ) with three 24-hour diet recalls conducted by telephone among 94 children and 235 women participating in WIC (see Table 5-2). Most correlations between the FFQ and the average of three recalls were below 0.5. The FFQ performed more poorly among children than among women and also among Hispanics than among African Americans and non-Hispanic whites.

Suitor et al. (1989) compared the results of three 24-hour dietary recalls and a 90-item FFQ among pregnant women and found that fewer than half of the women who were in the lowest quintile by one method also were in the lowest quintile by the other method (see Table 5-2). The quintile agreement ranged from 27 percent for iron to 54 percent for calcium. Percentage agreement improved (to 43 percent for protein and to 77 percent for calcium) when individuals from the first and second quintile of the FFQ were compared with those in the first quintile of the 24-hour dietary recalls. Clearly, substantial misclassification of nutrient intake occurred at the individual level.

Different questionnaires give different results with the same subjects (McCann et al., 1999; Wirfalt et al., 1998). Although McCann and colleagues (1999) reported that the results of different methods are correlated (i.e., r ranges from 0.29 to 0.80), the methods would likely classify individuals differently. Walker and Blettner (1985) examined potential agreement when results from an imperfect method of dietary assessment (e.g., an FFQ) are compared with those from a method believed to be accurate (e.g., many days of research-quality food records). Table 5-3 shows their calculations of the probabilities of misclassification in quintile ranking for correlation coefficients ranging from 0.0 to 0.95.

TABLE 5-3 Probabilities of Misclassification of a Reference Ranking in Quintiles, Using an Imperfect Alternative.

Note that even if the correlation coefficient between the two methods were 0.8 (ordinarily considered to be excellent correspondence), less than half of all respondents would be allocated to the same quintile by the two methods. This indicates that FFQs hold great potential for misclassification at the level of the individual—regardless of whether nutrient, food, or food group intakes are being estimated.
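This consequence of imperfect correlation is easy to reproduce by simulation. The sketch below draws a reference ranking and an alternative measure correlated with it at a chosen level, then counts how often the two agree on quintile membership. It assumes a simple bivariate normal model, in the spirit of Walker and Blettner's calculations.

```python
import numpy as np

rng = np.random.default_rng(2)

def same_quintile_rate(r: float, n: int = 200_000) -> float:
    """Share of people that a reference method and an imperfect method
    (correlated r with the reference) place in the same quintile."""
    truth = rng.standard_normal(n)
    noisy = r * truth + np.sqrt(1.0 - r**2) * rng.standard_normal(n)
    edges_t = np.quantile(truth, [0.2, 0.4, 0.6, 0.8])
    edges_n = np.quantile(noisy, [0.2, 0.4, 0.6, 0.8])
    return float(np.mean(np.searchsorted(edges_t, truth)
                         == np.searchsorted(edges_n, noisy)))

for r in (0.5, 0.8, 0.95):
    print(f"r = {r:.2f}: {same_quintile_rate(r):.0%} placed in the same quintile")
```

At r = 0.8, agreement is indeed below 50 percent.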

Another way to examine the error in misclassification would be to consider the sensitivity and specificity of the tool and how they would translate to numbers of people miscategorized. Using the relatively high sensitivity and specificity values from the example in the following section and assuming that 25 percent of the population meets the Dietary Guidelines (a value much higher than currently estimated), we see in Table 5-4 that roughly one-fourth of the population (275/1,000 individuals) would be misclassified. Increasing the sensitivity by increasing the cut-off would increase the number of eligible individuals who test positive and reduce misclassification. If a lower, more realistic value representing the percentage of the population that meets the Dietary Guidelines were used, the percent of eligible persons who would be found ineligible would be larger (Table 5-5).
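The arithmetic behind Table 5-4 can be laid out in a few lines. The sketch below tallies the false negatives (truly eligible applicants screened out) and false positives (truly ineligible applicants admitted) for 1,000 applicants, using the sensitivity of 75 percent and specificity of 65 percent from the Caan et al. (1995) example in the next section.

```python
def misclassified(n: int, pct_meeting: float,
                  sensitivity: float, specificity: float) -> float:
    """Count misclassified applicants when a positive test means 'fails to
    meet the Dietary Guidelines' (i.e., is eligible on this criterion)."""
    failing = n * (1.0 - pct_meeting)                # truly eligible
    meeting = n * pct_meeting                        # truly ineligible
    false_negatives = failing * (1.0 - sensitivity)  # eligible, screened out
    false_positives = meeting * (1.0 - specificity)  # ineligible, admitted
    return false_negatives + false_positives

# Table 5-4 scenario: 25 percent of the population meets the Guidelines.
print(misclassified(1_000, 0.25, 0.75, 0.65))  # 275.0, i.e., 275 per 1,000
```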

TABLE 5-4 Results from a Dietary Tool with a Relatively High Sensitivity and Specificity when 25 Percent of the Population Meets the Dietary Guidelines.

TABLE 5-5 Results from a Dietary Tool with a Relatively High Sensitivity and Specificity when 5 Percent of the Population Meets the Dietary Guidelines.

Limitations and Uses of Brief Dietary Methods

Shortening and simplifying FFQs may make it easier for WIC clientele to respond (whether the FFQ is self-administered or administered by WIC personnel) (Subar et al., 1995), but is the validity of short FFQs acceptable? Based on studies by Byers et al. (1985), Caan et al. (1995), Haile et al. (1986), and others, it is unreasonable to expect that a shortened FFQ will be more accurate than a longer version. For example, Caan et al. (1995) evaluated the sensitivity, specificity, and positive predictive value of a 15-item fat screener when used to identify persons with total fat intakes greater than 38 percent of calories. When they compared results with those obtained from the 60-item Health Habits and History Questionnaire (Block et al., 1990), the fat screener had a low rate (2.7 percent) of gross misclassification—for example, the rate when the lowest quintile by the FFQ was compared with the highest two quintiles by the screener. Caan and colleagues (1995) found that the fat screener had insufficient sensitivity and specificity to be used as a single assessment method for fat. For example, when sensitivity was 75 percent, specificity was 65 percent; but when the cut-off point was raised, sensitivity was 47 percent and specificity was 89 percent. They suggested that the screener would be useful in combination with other dietary methods that also estimate energy intake.

Others have found that measures taken to shorten and simplify questionnaires reduce their validity in the research setting. For example, Schaffer and colleagues (1997) reported that median energy intake from a shortened telephone version of an FFQ was 23 percent lower in women than that obtained from a longer FFQ. These investigators reported correlation coefficients ranging from 0.45 for vitamin E to 0.78 for fiber for the two FFQs, suggesting considerable lack of agreement. Similarly, Thompson and coworkers (2000) reported that both a 7-item and a 16-item screener for fruit and vegetable consumption underestimate intake.

Brief dietary tools have varying degrees of usefulness, depending upon the need for quantitative, qualitative, or behavioral data. They have been developed to measure usual intake, to screen for high intakes of certain nutrients (e.g., total fat, iron, calcium), or to measure usual intake of particular food groups (such as fruits and vegetables). Several examples have been published (e.g., Block et al., 1989; Caan et al., 1995; Feskanich et al., 1993; Kristal et al., 1990; McPherson et al., 2000; NCHS, 1994; Thompson and Byers, 1994). A major limitation of using them to assess intake in the WIC clinic is that they usually target one nutrient or food group, rather than the entire diet. Thus, they are not directly relevant to determining whether the individual met the Dietary Guidelines or consumed an adequate diet, but they may be useful for planning targeted nutrition education.

METHODS TO COMPARE FOOD INTAKES WITH THE DIETARY GUIDELINES

The committee was given the charge of investigating methods to determine if an individual fails to meet the Dietary Guidelines. For example, can a practical, accurate method be found or developed to compare reported food intake with recommendations derived from the Dietary Guidelines (USDA/HHS, 2000)? The committee found no studies that directly examine the performance of dietary intake tools used to compare an individual's food intake with the Dietary Guidelines, but it did find the following related information.

Dietary Intake Form Method

Strohmeyer and colleagues (1984) claimed that a rapid dietary screening device (called the Dietary Intake Form, or DIF) “ … provides a rapid, valid, reliable, and acceptable method of identifying the individual with a poor diet” (p. 428). Although the DIF was developed before the existence of the Dietary Guidelines and the Food Guide Pyramid, it was intended to compare a person's intake with reference values that are similar to the Pyramid's five food group recommendations. The DIF asks the person to write the number of times the following foods are consumed per week: yogurt and milk; cheese; fish, eggs, meat; dried peas and beans; leafy green vegetables; citrus fruit; other fruits and vegetables; bread; and noodles, grains, and cereals. It also asks the respondent to circle his or her portion size as it compares with specified standards. The average time to complete the DIF is about 4.5 minutes, with a range of 2 to 10 minutes.5 A staff member computes a DIF score by a series of arithmetic processes.

The methods that Strohmeyer and colleagues used to test reliability and validity are of questionable relevance to the WIC setting. They tested reliability using 40 college students who completed the DIF on two occasions 2 weeks apart. Correlation coefficients for the paired food-group and total dietary scores averaged 0.81. Validity testing involved input by researchers rather than by clients and scoring by researchers rather than by clinic staff. Researchers entered data from 29 eight-day food diaries onto DIFs and then computed dietary scores. Subsequently, they correlated those DIF scores with total mean Nutrient Adequacy Ratios (NAR, in which NAR equals the subject's daily intake of a nutrient divided by the Recommended Dietary Allowance of that nutrient). Under these carefully controlled conditions, the correlation of DIF and NAR scores was 0.83. It is likely that reliability and validity testing using the clinic population and clinic personnel would produce less favorable results. More importantly, the limitations described for brief FFQs would apply to the DIF as well.

Mean Adequacy Ratio Methods

A more recent study examined the sensitivity and specificity of two Pyramid-based methods of scoring nutritional adequacy (Schuette et al., 1996). For both scoring methods, registered dietitians obtained data from 1-day food records. They assigned the reported food items to the five Pyramid food groups and “other” (fats, oils, sugars). In the first method, the score represents the number of food groups from which the person consumed at least the minimum recommended number of servings. In the second method, the score represents the number of food groups from which the person consumed at least one serving. The two types of scores were compared with a mean adequacy ratio (MAR-5)6 based on the subject's intakes of iron, calcium, magnesium, vitamin A, and vitamin B6 as calculated from the same food record. For the first method, sensitivity was 99 percent but specificity was only 16 percent. That is, the first food group method classified nutritionally inadequate diets as inadequate, but it had extremely low ability to classify nutritionally adequate diets as adequate. For the second method, sensitivity was 89 percent and specificity was 45 percent. Thus, even when the cut-off point was more lenient (as in the second method), the ability to identify the nutritionally adequate diets was no better than chance. Either MAR method would depend on data from one or two 24-hour diet recalls, and thus would be subject to all the limitations of diet recalls presented earlier in this chapter.
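As footnote 6 indicates, the reference score is built from nutrient adequacy ratios. A minimal sketch of the computation follows; the RDA and intake values in the usage example are illustrative stand-ins, not the reference values used by Schuette et al.

```python
def nutrient_adequacy_ratio(intake: float, rda: float) -> float:
    """NAR: intake as a percentage of the RDA, truncated at 100 (footnote 6)."""
    return min(intake / rda * 100.0, 100.0)

def mar5(intakes: dict[str, float], rdas: dict[str, float]) -> float:
    """MAR-5: mean NAR over the five index nutrients used by Schuette et al."""
    nutrients = ("iron", "calcium", "magnesium", "vitamin_a", "vitamin_b6")
    return sum(nutrient_adequacy_ratio(intakes[n], rdas[n]) for n in nutrients) / 5.0

# Illustrative reference and intake values only, not those used in the study.
rdas = {"iron": 15.0, "calcium": 800.0, "magnesium": 280.0,
        "vitamin_a": 800.0, "vitamin_b6": 1.6}
intakes = {"iron": 9.0, "calcium": 700.0, "magnesium": 250.0,
           "vitamin_a": 900.0, "vitamin_b6": 1.2}
print(f"MAR-5 = {mar5(intakes, rdas):.0f} of a possible 100")
```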

Estimating the Number of Pyramid Portions

Accuracy of Estimation

The portion sizes that an individual consumes can make a great difference in the degree to which his or her intake meets the recommendations made in the Food Guide Pyramid. Questionnaires either assume a standard portion size, which may or may not be shown on the questionnaire,7 or ask the respondent to choose a single average portion size (small, medium, or large). However, two major factors affect the accuracy of portion size estimation: (1) within-person variability in portion size, and (2) ability to recall portion size (see earlier section, “Portion Size Estimation”).

Within-person variability in portion size is greater than between-person variability for most foods and for all the food groups studied by Hunter et al. (1988). That is, for food groups, the range of the variance ratios (within/between) obtained from four 1-week diet records was 1.6 (fruit) to 4.8 (meat) when pizza was excluded (the variance ratio was 22 for pizza). No studies were found that examine the extent to which the portion size used on a questionnaire reflects the individual's average portion size.

Using the U.S. Department of Agriculture Protocol for Portion Sizes

Even if portion size has been reported accurately, the consumption of mixed foods complicates the estimation of the number of portions a person consumes from each of the five Pyramid food groups. For example, 1 cup of some kinds of breakfast cereal may be about half grain and half sugar by weight, so it should be counted as only one-half serving from the breads and cereals food group.

To determine the numbers of servings of foods in the five major food groups from diet recalls or records accurately, researchers at the U.S. Department of Agriculture (USDA) developed the Continuing Survey of Food Intake by Individuals (CSFII) Pyramid Servings database (Cleveland et al., 1997). Eighty-nine percent of the foods in this database are multiple-ingredient foods. USDA separated these foods into their ingredients and categorized these ingredients into food groups that were consistent with Pyramid definitions for serving sizes (USDA, 1992). If a woman reported eating chicken pie, for example, the database allows estimation of the servings or fractions of a serving of grains, meat, vegetables, and milk products (if applicable) provided by the specified weight of the pie. This means that the accurate comparison of food group intake with recommended intake would require accurate food intake data collected over a number of independent days together with computerized assignment of food ingredients to food groups. Notably, this method of estimating servings was used in two rigorous studies that found that fewer than 1 percent of women (Krebs-Smith et al., 1997) and young children (Munoz et al., 1997) met recommendations for all five food groups (see Chapter 8).
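The chicken pie example translates into a simple lookup-and-scale computation once a recipe file like the CSFII Pyramid Servings database is available. The sketch below shows the idea; the food-group fractions are invented for illustration and are not values from the USDA database.

```python
# Each entry lists, per 100 g of the food as eaten, the Pyramid servings
# contributed by its disaggregated ingredients (numbers below are made up).
PYRAMID_SERVINGS_PER_100G = {
    "chicken pie": {"grain": 0.9, "meat": 0.5, "vegetable": 0.3, "milk": 0.1},
}

def servings_from_intake(food: str, grams: float) -> dict[str, float]:
    """Scale the per-100-g food-group servings to the reported amount eaten."""
    per100 = PYRAMID_SERVINGS_PER_100G[food]
    return {group: s * grams / 100.0 for group, s in per100.items()}

print(servings_from_intake("chicken pie", 250.0))
# {'grain': 2.25, 'meat': 1.25, 'vegetable': 0.75, 'milk': 0.25}
```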

Healthy Eating Index Scores

USDA's Center for Nutrition Policy and Promotion developed the Healthy Eating Index (HEI) to assess and monitor the dietary status of Americans (Kennedy et al., 1995). The 10 components of the HEI represent different aspects of a healthful diet. Five of the components cover the five food groups from the Food Guide Pyramid and the other five cover elements of the 1995 Dietary Guidelines concerning fat, saturated fat, cholesterol, sodium, and variety. The computation of the number of servings from each food group requires the use of complex computerized methods to disaggregate mixed foods into ingredients (Cleveland et al., 1997). Each component may receive a maximum score of 10. The index yields a single score (the maximum score is 100) covering diet as a whole and measuring “how well the diets of all Americans conform to the recommendations of the Dietary Guidelines and the Food Guide Pyramid” (Variyam et al., 1998).
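The mechanics of the food-group half of the score can be sketched briefly. In the published HEI, each food-group component gives proportional credit up to a cap of 10 when the recommended number of servings is reached; the serving counts below are hypothetical, and in practice the recommended amounts depend on the person's energy needs.

```python
def food_group_component(servings: float, recommended: float) -> float:
    """One HEI food-group component: proportional credit, capped at 10."""
    return min(servings / recommended, 1.0) * 10.0

# Hypothetical day; recommended servings here follow a generic Pyramid pattern.
recommended = {"grain": 6, "vegetable": 3, "fruit": 2, "milk": 2, "meat": 2}
eaten = {"grain": 5, "vegetable": 2, "fruit": 1, "milk": 2, "meat": 2}

food_score = sum(food_group_component(eaten[g], recommended[g]) for g in recommended)
print(f"food-group subtotal: {food_score:.1f} of 50")
# The full HEI adds five more components (fat, saturated fat, cholesterol,
# sodium, variety), each also scored 0 to 10, for a maximum of 100.
```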

Theoretically, an HEI score would be a comprehensive indicator of whether a potential WIC participant of at least 2 years of age fails to meet Dietary Guidelines. However, the complexity of methods required to obtain this score limits the feasibility of using it in the WIC setting. The process described above must be used to separate foods into ingredients and categorize the ingredients into food groups, and separate scores must be computed for each of the 10 components of the HEI score.

An HEI score of 100 is equivalent to meeting all the Food Guide Pyramid recommendations plus recommendations for fat, saturated fat, cholesterol, and sodium.8 According to Bowman and colleagues (1998), a score of more than 80 implies a “good” diet. Based on 1994–1996 CSFII data, approximately 12 percent of the population had a good diet. A good ranking, based on an HEI score of 80, is considerably more lenient than a criterion in which intake of fewer than the recommended number of servings in the Food Guide Pyramid is the cut-off for failure to meet Dietary Guidelines. Even if an HEI score could be obtained accurately in the WIC setting, the score would likely be sensitive, but not specific. The HEI score can be no more accurate than the data from which it is derived; thus, it is subject to the limitations of the diet recall or FFQ used.

CONCLUSIONS REGARDING FOOD-BASED DIETARY ASSESSMENT METHODS FOR ELIGIBILITY DETERMINATION

Under the best circumstances in a research setting, dietary assessment tools are not accurate for individuals. In particular, a diet recall or food record cannot provide a sufficiently accurate estimate of usual food or nutrient intake to avoid extensive misclassification. Similarly, research-quality FFQs result in substantial misclassification of individuals in a group when results from FFQs are compared with those from sets of diet recalls or food records. Moreover, studies by Bowman et al. (1998), Krebs-Smith et al. (1997), McCullough et al. (2000), and Munoz et al. (1997) (see Chapter 8) suggested that even if the use of research methods were possible in the WIC setting, such methods would identify nearly everyone as failing to meet Dietary Guidelines.

In WIC, a dietary assessment method is used by the competent professional authority (CPA) to determine an individual's eligibility for WIC in the event that the person has no anthropometric, medical, or biochemical risks (see Chapter 2). The result thus may determine whether or not the applicant will receive WIC benefits for a period of several months or longer. Ordinarily, the CPA compares the individual's reported intake of foods with preset standards for the numbers of servings in five or more food groups. Even if reported intakes were accurate, estimation of food group scores would likely be inaccurate because of the high frequency of mixed foods. If reported intake or assigned food group scores are inaccurate, correct identification of eligibility status is compromised.

Shortening FFQs tends to decrease their validity. Very short screens are targeted to one nutrient or food group rather than providing a relatively complete assessment of dietary intake. Methods to compare food intakes with dietary guidance have the limitations of short screens or are too complex to be useful in the WIC setting. Environmental and other factors present in the WIC setting are expected to decrease the validity of tools when compared with those found in the research setting. Consequently, the validity reported for research-quality FFQs can be considered an upper limit for the validity of questionnaires used by WIC.

When using these dietary assessment procedures for group assessment, researchers generally have been willing to tolerate a substantial amount of error, for which they could partially compensate by increasing the number of participants in their research or using statistical correction procedures, called corrections for attenuation (Traub, 1994). Error in the assessment of an individual for certification in the WIC program (that is, misclassification error), however, has serious consequences: truly eligible individuals may not be classified as eligible for the services (less than perfect sensitivity), or individuals not truly eligible for the services may receive them (less than perfect specificity).

Because of these limitations, the committee concludes that there are not now, nor will there likely ever be, dietary assessment methods that are both sufficiently valid and practical to distinguish individuals who are ineligible from those eligible for WIC based on the criterion failure to meet Dietary Guidelines or based on cut-off points for nutrient intake. Nonetheless, dietary tools have an important role in WIC in planning or targeting nutrition education for WIC clients, as described in Chapter 9.

Footnotes

1. Usual intake is defined as the long-run average intake of food, nutrients, or a specific nutrient for an individual (IOM, 2000a).

2. The within-person variability is an individual's day-to-day variability in reported intakes (or intraindividual variability or standard deviation within). The between-person variability (or interindividual variability) is the variability in intakes from person to person. A higher ratio of within- to between-person variability means that the variability of the food or nutrient intake is greater within an individual than the variability between individuals.

3. The dietary history method used in the WIC clinic is not necessarily the traditional diet history method, which takes about one to two hours to administer properly. Food records are not often used because of time limitations and difficulties obtaining complete and accurate records.

4. For some nutrients (such as vitamin A) that are highly concentrated in certain foods, or foods that are eaten sporadically, many days or months of intake may be needed to accurately estimate the usual intake of an individual (IOM, 2000a).

5. It is notable that 21 percent of the subjects did not complete the forms; reasons were not reported.

6. MAR-5 = average nutrient adequacy ratio (NAR) of the five nutrients. NAR is the nutrient content calculated as a percentage of the RDA and truncated at 100.

7. Often the portion size used is either the median for the population group as obtained from a nationwide survey or a common unit such as one slice of bread.

8. The HEI also includes a variety score, but it is not applicable to the current Dietary Guidelines.