NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.
ECRI Health Technology Assessment Group. Determinants of Disability in Patients With Chronic Renal Failure. Rockville (MD) : Agency for Healthcare Research and Quality (US); 2000 May. (Evidence Reports/Technology Assessments, No. 13.)
This publication is provided for historical reference only and the information may be out of date.
Because we were unable to locate any published studies that reported the kind of data useful to this project, we attempted to use individual patient data from the USRDS for our purposes. In particular, the Dialysis Morbidity and Mortality Study (DMMS) Wave 2, which was conducted as a prospective quality-of-life study on more than 4,000 patients who started dialysis during 1996 and 1997, was expected to provide particularly useful data for the purposes of this report. However, it was not designed to study disability.
Description of USRDS Data
Since its creation in 1988, the USRDS has pursued the collection and analysis of information on the incidence, prevalence, treatment, morbidity, and mortality of ESRD in the United States. The USRDS was operated by the Coordinating Center at the University of Michigan from 1995 to 1999, and is now operated by the Minneapolis Medical Research Foundation. It is funded primarily by the NIDDK of the National Institutes of Health with supplementary funding from the Health Care Financing Administration (HCFA).
The USRDS Database
HCFA provides most of the basic data in the USRDS database. In addition to all the data from its ESRD Program Management and Medical Information System (PMMIS) and the Annual Facility Survey, HCFA shares data on transplant followup and Medicare Parts A and B services derived from Medicare claims. These HCFA-supplied data are the core of the USRDS database. Data in the USRDS database collected by HCFA's ESRD Networks, Federal insurance carriers, and fiscal intermediaries are supplemented by data from the Social Security Administration, the U.S. Bureau of the Census, local and national ESRD provider databases, and international ESRD registries.
In addition, HCFA helps the USRDS with Special Studies, smaller studies with a specific purpose that collect data from a patient subgroup of interest. Most of the new primary data for Special Studies are collected through the 18 ESRD Networks, which are funded by HCFA. Data from the Special Studies are fully integrated into the USRDS database. Data collection began in 1996 for the DMMS Wave 2 (to be described and analyzed in this report). Data not otherwise contained in the USRDS database were collected for the entire DMMS project (Waves 1 to 4) from a national sample of nearly 24,000 patients drawn from all U.S. dialysis units.
The USRDS database is updated and a summary published every year. The last update was in the spring of 1999, using data collected through early 1998. Because of delays in processing data through the Medicare system, the USRDS has generally waited 15 months before reporting patient-specific data for a given time period. Thus, tables in the 1996 Annual Data Report (ADR), for example, generally reported data through December 1993. Because of improvements in the flow of data to the USRDS, this 15-month rule was relaxed in the 1997 and 1998 ADRs.
USRDS Goals
The USRDS has six primary goals. The last two were added in 1994 and have been reflected in all data reports since then:
- 1.
to characterize the total ESRD patient population and describe the distribution of patients by sociodemographic variables across treatment modalities;
- 2.
to report on the incidence, prevalence, mortality rates, and trends over time of ESRD by primary diagnosis, treatment modality, and other sociodemographic variables;
- 3.
to develop and analyze data on the effect of various modalities of treatment by disease and patient group categories;
- 4.
to identify problems and opportunities for more focused special studies of renal research issues (this goal has been addressed with special studies requiring new data collection);
- 5.
to conduct cost-effectiveness studies and other economic studies of ESRD; and
- 6.
to put new emphasis on supporting investigator-initiated projects to conduct biomedical and economic analyses of ESRD patients.
Data Files
Individual patient data are made available on CD-ROM to interested researchers who apply for access. Key patient data that may compromise the privacy of these individuals are removed before dissemination.
DMMS Wave 2 Study Description
The DMMS Wave 2 was a prospectively designed study conducted during the years 1996 to 1997. It included a random sampling of 25 percent of U.S. dialysis centers (989 centers total). Only incident dialysis patients were included (patients who had started dialysis within 60 days of the study start date).
Patient selection was performed in the following manner: dialysis center staff were asked to identify all new peritoneal dialysis and in-center hemodialysis patients. All incident peritoneal dialysis patients were asked to participate. Twenty percent of in-center hemodialysis patients, picked based on the last digit of their social security number, were also asked to participate, in order to create an approximately equal ratio of peritoneal to hemodialysis patients. Home hemodialysis patients were excluded from this study (United States Renal Data System, 1999b).
The primary goal of the DMMS Wave 2 study was to assess pre-ESRD treatment practices, vascular access, and quality-of-life of patients starting on dialysis. It consisted of four basic segments: the first was a medical records questionnaire filled out by the dialysis center staff after the patient had agreed to participate in the study. The second was a patient quality-of-life questionnaire, filled out by the patient with or without assistance within 3 months of the study start date. The third section was the same patient quality-of-life questionnaire filled out approximately 9 to 12 months later. The fourth section was an abbreviated version of the initial medical questionnaire that was filled out 9 to 12 months after the original. An analysis of this data set has been published in abstract form by the originating researchers, and a full-length peer-reviewed journal article is also being prepared (Hirth, 1999).
The data from DMMS Wave 2, as provided by the USRDS Coordinating Center, are presented in Appendix A, including minor coding changes made by ECRI.
Methodology
Acquisition of Data
The CD-ROM that included DMMS Wave 2 data was obtained through application to USRDS and NIDDK. We explained the purpose of this study, and received permission to obtain the CD-ROM in April 1999. The CD was obtained from the USRDS Coordinating Center, then at the University of Michigan.
Validity Analysis
Before using the USRDS database in de novo analyses, it was important to ensure that the results of such analyses would be relevant to the entire U.S. population of persons with ESRD who might apply for SSDI or SSI disability benefits. Any results derived from these data would be of limited utility if they could not be extended beyond the specific study sites and patients (Cook and Campbell, 1979). It was also important to determine whether each variable measures what it is supposed to measure and to confirm that no major coding errors were present.
There are several different types of validity analyses that are possible, some of which can be performed after a study has been conducted, and others of which require appropriate study design (and, hence, depend upon the originating researchers to ensure and report). Below, we discuss three general types of validity (external, internal, and construct) and our results after assessing these aspects of the USRDS data.
External Validity
The term "external validity" refers to whether the findings of a study can be generalized to the population it was intending to represent, as well as across populations, places, and times (Cook and Campbell, 1979). Not all aspects of external validity can be empirically assessed, but they must be ensured and reported by the researchers conducting the study; for example, response rate in a voluntary-response study and the description of those agreeing to participate versus those refusing are important indications of external validity that only the original researchers can (and should) assess. As post-hoc researchers, we can only compare the characteristics of patients in the DMMS Wave 2 to those reported for the entire USRDS database in the USRDS Annual Data Report.
Internal Validity
Internal validity refers to whether the data contained in a study can reliably lead to the types of conclusions that the study was intended to make. DMMS Wave 2 was intended to assess the quality-of-life of patients on dialysis, their 9- to 12-month outcomes according to the type of treatment they received, and the extent of physician contact and treatment they received before going on dialysis.
Internal validity is primarily concerned with cause-and-effect relationships (e.g., whether a certain treatment caused a particular outcome). Because the DMMS Wave 2 study was not designed to establish causal relationships (but rather, simply to report the characteristics of patients with ESRD), we did not consider internal validity further in this report.
Construct Validity
Construct validity refers to whether variables in a study measure the concept of interest. This is often tested by looking at correlations between variables. In particular, construct validity means that variables that claim to measure the same phenomenon correlate strongly with one another ("convergence"), and that related, but conceptually distinct, variables "diverge" (low correlation). If, for example, a database that asks about employment status in several different ways results in answers that are markedly different (and thus have low correlations), one would have to conclude that this database has low construct validity (Cook and Campbell, 1979).
The results of these validity tests were then reviewed by three physicians in the fields of nephrology and pathology.
Analysis Reliability
Even if the DMMS Wave 2 data "pass" all of the above types of validity tests, it may still be possible that these data are not usable for statistical analyses. This could result from a small number of patients relative to the number of variables of interest (insufficient power) or from a substantial amount of missing data for many of the patients. These situations become problematic when the analysis one wishes to do is multivariate. Multivariate models can be particularly unstable under certain circumstances, and multiple regression equations are prone to "shrinkage," such that results are not as significant when applied to the general population as they were when applied to the test population. Therefore, we followed a standard reliability procedure: determining whether the same results would be obtained using randomly selected halves of the database.
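The split-half idea can be illustrated with a minimal Python sketch. It is not the procedure as run in SPSS for this report; the function name `split_half`, the seed, and the example ages are hypothetical and serve only to show random assignment to two halves followed by a side-by-side comparison of a summary statistic.

```python
import random
import statistics

def split_half(records, seed=0):
    """Randomly assign each record to one of two (nearly) equal halves."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(records)       # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

# Hypothetical patient ages, for illustration only (not DMMS data).
ages = [34, 51, 47, 62, 29, 58, 41, 55, 38, 60, 44, 49]
half_a, half_b = split_half(ages, seed=42)

# A reliable analysis should give similar answers in both halves.
print("mean age, half A:", statistics.mean(half_a))
print("mean age, half B:", statistics.mean(half_b))
```

In practice the same comparison would be repeated for every statistic or model of interest; large disagreement between the halves would signal instability.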
Results
This section describes the validity and reliability analyses we conducted to ensure that the USRDS DMMS Wave 2 database was a reasonable source of individual patient data that could be reliably generalized to the entire U.S. population of patients with ESRD under age 65 who would be likely to apply for disability insurance. Our general conclusions were that this database lacked the necessary reliability to determine which patient characteristics predict that a patient is unable to work. The generalizability of these data to the population of interest, as related to disability, was also brought into question.
Validity Analyses
External Validity
Study size
In part, external validity is affected by the size of the study population. Studies with large numbers of patients are more likely to yield generalizable results than studies with small numbers of patients. In the DMMS Wave 2 study, information was recorded for 4,026 patients on dialysis at the start of the study in 1996: medical information was available for 3,985 patients on dialysis, and 2,713 completed the patient questionnaire addressing quality-of-life issues during an interview. While compliance with filling out the patient questionnaire was low, this would still be considered a large pool of individual data. However, as discussed below, the actual number of patients with followup information on employment status is much smaller than this, which causes some statistical difficulties for this analysis.
Patient selection bias
External validity also depends upon patient selection for the study. People and places selected randomly are more likely to yield data generalizable to the larger population than those nonrandomly selected. Information in the USRDS Researchers Guide indicates that, for DMMS Wave 2, 25 percent of U.S. dialysis centers were randomly selected from which to gather the patient pool. All identified peritoneal dialysis patients and 20 percent of all in-center hemodialysis patients at these centers were asked to participate. (These steps were taken due to the small proportion of PD patients in the U.S. dialysis population, and a desire to create a data set with equal numbers of patients receiving each type of treatment) (United States Renal Data System, 1999b). Hemodialysis patients were selected based on the last digit of their Social Security number. The fact that both centers and patients were selected without regard to individual characteristics represents a strength of the DMMS Wave 2 data.
The sampling design did, however, overrepresent patients on PD and exclude patients on home HD; these features are not a particular threat to external validity unless one analyzes the database in toto. It is possible to at least partly combat this threat by conducting separate statistical analyses on hemodialysis and peritoneal dialysis patients, so as not to misrepresent the makeup of the entire U.S. dialysis population. It is also possible, during statistical analyses, to "weight" the cases to reproportion the database.
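One common way to reproportion such a sample is inverse-probability weighting, sketched below in Python. The sampling fractions (all PD patients, 20 percent of in-center HD patients) come from the study description; the function names and the 50/50 example sample are hypothetical, and the sketch assumes equal response rates in both groups, which a real analysis would need to check.

```python
# Sampling fractions taken from the DMMS Wave 2 design described above.
SAMPLING_FRACTION = {"PD": 1.0, "HD": 0.2}

def case_weight(modality):
    """Inverse-probability weight: each sampled case stands for 1/f patients."""
    return 1.0 / SAMPLING_FRACTION[modality]

def weighted_share(modalities, target):
    """Weighted proportion of cases whose modality equals `target`."""
    total = sum(case_weight(m) for m in modalities)
    hits = sum(case_weight(m) for m in modalities if m == target)
    return hits / total

# Hypothetical sample with equal numbers of PD and HD patients.
sample = ["PD"] * 50 + ["HD"] * 50
pd_share = weighted_share(sample, "PD")
print(f"unweighted PD share: 0.50, weighted PD share: {pd_share:.3f}")
```

With these weights each sampled HD patient counts five times, so the weighted PD share falls well below the unweighted 50 percent.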
Self-selection bias is a problem inherent to any voluntary-response study such as this, since certain types of people are apt to agree to participate, while others are not. The number of patients that were initially asked to take part in DMMS Wave 2 was not reported, and therefore it is not possible to calculate the study's response rate. This represents an aspect of the external validity we cannot address because it relies upon reporting by the original investigators. We can determine, however, that 67.4 percent of patients represented in the database completed the initial patient questionnaire and 42.0 percent completed the followup questionnaire, which is not an unusually low participation rate for such epidemiological studies. It does provide evidence, however, that a patient self-selection bias could be present in this database.
Comparison of DMMS Wave 2 to USRDS
One way to determine the extent of the effects of self-selection bias involves an in-depth comparison of raw data from the DMMS Wave 2 patients to the entire USRDS patient population. Such an analysis, however, was beyond the purview of the present project. Because such comparisons were also not reported by the researchers, we approximated such a comparison by comparing the DMMS data to summary statistics provided in the USRDS Annual Data Report (United States Renal Data System, 1999a), and these results are shown in Table 2. Table 2 depicts group averages that are not the result of an original statistical analysis.
Table 2 presents results, when available, separately for peritoneal and hemodialysis patients because of the disproportionate number of each type of dialysis patient in this database compared to the entire USRDS patient population. From these data, it appears that patients with diabetes are overrepresented in the DMMS Wave 2 data, while patients with glomerulonephritis are underrepresented, thus bringing the generalizability of this database into question.
Loss of patients to followup
The DMMS Wave 2 data shown in Table 2 are those collected at the beginning of the study. A followup questionnaire was administered to these patients approximately 9 to 12 months later (mean = 10.0 months, 95 percent confidence interval = 4.8 months), and all patients who completed the first questionnaire were asked to participate in the followup. However, 2,329 of these patients (58 percent) did not complete the followup patient questionnaire and for 501 (12 percent) the followup medical information was not available. Followup data on the variables "Ability to work full time" and "Employment status," two important variables for this report, are available for 1,670 patients (41.5 percent). Of the 2,376 without this information, 978 (41.2 percent) were not followed due to known occurrence of death. Others were recorded as "lost to followup" (494), or (presumably) chose not to answer the second questionnaire.
Loss of some patients to followup is the rule rather than the exception in longitudinal studies and is often more severe when conducting surveys than when conducting an experimental research study. Data on employment from the second questionnaire are not available for about 58 percent of the patients in the DMMS Wave 2 study (53 percent if those dying are excluded from the dropout rate). It is unknown what factors other than death account for this. This dropout rate is only important, however, if it is determined that there are substantial differences in medical, demographic, or functional measures between the patients who were followed and those who were not. Dropout can also present difficulties if the number of data points remaining is not sufficient to conduct reliable and reproducible statistical analyses. (This is addressed below.)
Comparison of patients with and without followup data
In order to address whether the remaining data were suitable to our purposes, we conducted a de novo analysis of the DMMS Wave 2 data, specifically comparing those patients who would be included in the final analysis (a total of 546) to those who would be excluded (a total of 3,480) (see Table 3 below for a summary of the included subgroup of patients). A better analysis would compare this final data set to the USRDS as a whole, but, as mentioned above, such an analysis was beyond the scope of this report.
Included in the final data set are incident dialysis patients who were younger than 65, who provided followup data on employment status or self-reported ability to work, and who were working at some point before or during initiation of dialysis treatment. This subset allows us to best identify predictors of inability to work because only those who have worked at some point are faced with the decision of whether they can continue working or not. All medical, demographic, and functional measures taken during the first interview were compared for these two groups of patients. To control for age effects, we compared only those under age 65 in both the excluded and included groups and, to control for the effects of severity of disease, we excluded data from all patients who died during the study (even though these patients may be maintained in the final data set, because death is an outcome of significance in the disability assessment process).
The analysis conducted consisted of a series of three types of univariate statistics comparing those in the final data set to those excluded, using SPSS as the statistical software (SPSS 9.0, SPSS, Inc., Chicago, IL). This analysis employed the phi coefficient to compare nominal categorical variables; all data on demographics and most functional status variables were analyzed using this statistic. The second analysis was a Kolmogorov-Smirnov Z Test, a nonparametric test that compares ordinally ranked categorical variables for two groups. This type of analysis was appropriate for several items on the patient questionnaires. The third analysis was a one-way ANOVA, suitable for analyzing the continuous variables in this database. Several items on the medical questionnaire (e.g., weight, blood chemistry measurements) were appropriate for this type of analysis.
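The three statistics named above can be sketched in plain Python to show what each one measures. This is an illustration under stated assumptions, not a reproduction of the SPSS output: the function names and the small example inputs are hypothetical, and the Kolmogorov-Smirnov function returns only the D statistic (the largest gap between the two empirical distributions) rather than the Z value and p-value that SPSS reports.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for the 2x2 table [[a, b], [c, d]]; ranges from -1 to 1."""
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom

def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov D: largest gap between empirical CDFs."""
    d = 0.0
    for v in sorted(set(x) | set(y)):
        fx = sum(1 for xi in x if xi <= v) / len(x)
        fy = sum(1 for yi in y if yi <= v) / len(y)
        d = max(d, abs(fx - fy))
    return d

def anova_f(*groups):
    """One-way ANOVA F: between-group over within-group mean square."""
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = n - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical inputs, for illustration only (not DMMS data).
phi = phi_coefficient(30, 10, 10, 30)          # included/excluded by yes/no
ks_d = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])  # ordinal ratings
f_stat = anova_f([1, 2, 3], [2, 3, 4])         # a continuous lab value
print(phi, ks_d, f_stat)
```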
These analyses indicated that there were statistically significant differences (as indicated by the p-value) between those included in the final data set and those excluded. While these may seem important, the effect sizes are very low (see Table 4). Effect sizes are expressed similarly to correlations, where a more extreme negative or positive number conveys a stronger effect. None of these effect sizes was above 0.3, a low to moderate effect size, suggesting that these findings may not be clinically significant. Some of these statistically "significant" relationships between variables may also be spurious due to collinearity of other variables not accounted for in these univariate analyses.
All differences between the excluded and included patients were in the direction anticipated, with younger and healthier patients more likely to be included in the final data set.
Two points are worthy of mention regarding our analyses of these differences. First, we did not attempt to correct for the fact that we employed multiple individual statistical tests on the same data set. By not doing so, we minimize the chance that we will overlook any difference between excluded and included patients (i.e., we maximize the statistical power, thus reducing the probability of a Type II error), but increase the probability that at least some of these apparent differences are the result of chance (i.e., there is an inflated Type I error rate in our comparisons). As such, these results are a "worst case" scenario, chosen to illustrate the maximum possible differences that could exist between included and excluded patients.
Second, when examining these results, one should not rely on the p-values to determine the magnitude of these differences. P-values are heavily influenced by the size of a study, and thus are a poor measure of the magnitude of difference between two groups. There are numerous examples in the literature of studies that used large numbers of patients and found that very small differences were statistically significant. An example of this is the putative statistically significant relationship between height and IQ (Dowdney, Skuse, Morris et al., 1998; Downie, Mulligan, Stratford et al., 1997; Wilson, Hammer, Duncan, et al., 1986).
The results of the statistics described in Table 4 were derived from what may be regarded as a relatively large number of patients. This explains why the p-values are relatively low. However, when one examines the effect sizes, which are a more accurate reflection of the magnitude of the differences between these two groups, a different picture emerges. In general, none of them is large. It is also important to note that the 34 variables listed in Table 4 are the significant results of 466 variables on which analyses were performed. Thus, for 431 socioeconomic, demographic, clinical, and laboratory values, there were no differences between these two groups. It is important to remember, however, that because we did not adjust the p-values, one would expect 23 differences (5 percent) to be significant simply due to chance.
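The arithmetic behind the expected number of chance findings can be checked with a short Python sketch. The test count (466) and the 0.05 threshold come from the text; the familywise error rate assumes the tests are independent, which is a simplification, and the Bonferroni threshold is shown only for comparison, since no such correction was applied in this report.

```python
n_tests = 466   # number of variables tested, from the text
alpha = 0.05    # unadjusted significance threshold

# Expected number of significant results if no true differences exist.
expected_false_positives = n_tests * alpha

# Probability of at least one false positive, assuming independent tests
# (an assumption; the real variables are correlated).
familywise_error = 1 - (1 - alpha) ** n_tests

# Per-test threshold a Bonferroni correction would have required
# (shown for comparison; not used in the report).
bonferroni_alpha = alpha / n_tests

print(f"expected chance findings: {expected_false_positives:.1f}")
print(f"familywise error rate:    {familywise_error:.6f}")
print(f"Bonferroni threshold:     {bonferroni_alpha:.6f}")
```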
Another important between-group comparison would be one comparing previously working patients for whom followup data were available with previously working patients who were lost to followup. However, this comparison would have been most appropriate after the main logistic regression analysis. As discussed below, we did not conduct that regression analysis, and therefore did not perform the group comparison recommended here.
Conclusions about external validity
The results presented in the sections above offer mixed evidence about the external validity of this database. One basic problem is that patients were not chosen completely at random, but rather in a way to make a 50/50 mix of PD and HD patients. This means that this database cannot be analyzed in toto and cannot be expected to represent the characteristics of ESRD patients in the United States. When patient characteristics for PD and HD were examined separately and compared to the USRDS as a whole, differences still emerged in the makeup of these groups, such that certain diseases were disproportionately represented in DMMS Wave 2.
However, we would not use the entire DMMS Wave 2 database for our analysis, but rather a subset of patients for whom followup data were available, who were under 65, and who had worked at some point in the past. This subset of 546 patients appears to be younger and healthier than the patients excluded from the final subset, as might be expected.
Construct Validity
Statistical analyses
As mentioned earlier, construct validity refers to whether a test or questionnaire score represents the concept of interest. This can be assessed by correlating different measures of the same characteristic and seeing how well they match. We conducted several types of statistical analyses to assess the construct validity of the DMMS Wave 2 data. The first was a series of bivariate correlation analyses. This method indexes the degree to which the value of any single variable varies consistently with any other single variable. Both Spearman's rank correlation (rho) and Pearson's r were used; Spearman's rho is calculated the same as Pearson's r except that the value of each variable has been transformed to a rank. Pearson's r is only appropriate for continuous variables, while Spearman's rho is more appropriate for ordinal variables. Because both types of variables were present in this database, both types of correlation analyses were used. When comparing continuous and ordinal data, Spearman's rho was used. We used this as a method of approximation to check the data for any gross errors, such as coding errors or illogical correlations (such as divergence between two variables where convergence might logically be expected to occur).
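The relationship between the two coefficients can be made concrete with a minimal Python sketch: Spearman's rho is computed here as Pearson's r applied to ranks, with tied values given their average rank. The function names and example values are hypothetical and are not drawn from the DMMS data.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1   # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical monotonic but nonlinear relationship.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]
r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Because the example relationship is perfectly monotonic, the rank-based coefficient reaches 1 while Pearson's r, which is sensitive to the outlying value, does not.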
We also performed analyses similar to those that were done for external validity. Thus, we used the phi coefficient to relate nominal categorical variables, the Kolmogorov-Smirnov nonparametric method to relate nominal and ordinal categorical variables, and one-way ANOVA to relate nominal variables with continuous variables. These tests were conducted for cases in which at least one of the variables was nominal categorical. For example, such analyses were used with the variable "treatment modality type," which is coded as 1 = hemodialysis and 2 = peritoneal dialysis. This variable is not easily correlated with such continuous variables as creatinine levels and body-mass index.
Bivariate correlations were calculated for more than 300 variables. These analyses resulted in many statistically significant findings, perhaps because of the large number of cases (more than 3,000 for some variables). As discussed above, when a large number of cases are involved, statistically significant results are likely, not because of a large effect, but because of the large number of patients. Similarly, "false-positive" results (i.e., Type I errors) are likely when one conducts a large number of univariate statistical tests, not taking into account covariation of other variables that may be affecting results. This is partly why the magnitude of the correlation (r or rho) is a better indicator of the magnitude of a relationship than the p-value.
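The dependence of the p-value on sample size can be illustrated with a short Python sketch. It converts a correlation to a t statistic, t = r * sqrt((n - 2) / (1 - r^2)), and then, as a simplifying assumption, uses the normal approximation to the t distribution, which is reasonable only for large n. The function name and the example values of r and n are hypothetical.

```python
import math

def correlation_p_value(r, n):
    """Approximate two-sided p-value for H0: no correlation.

    Uses a normal approximation to the t distribution, so it is only a
    rough guide, and is closest to the exact value when n is large.
    """
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return math.erfc(abs(t) / math.sqrt(2))

# The same weak correlation (r = 0.05, i.e., 0.25 percent shared variance)
# evaluated at two hypothetical sample sizes.
p_small_n = correlation_p_value(0.05, 100)
p_large_n = correlation_p_value(0.05, 3000)
print(f"n = 100:  p = {p_small_n:.3f}")
print(f"n = 3000: p = {p_large_n:.4f}")
```

The correlation is identical in both cases; only the number of cases changes, yet the result crosses the conventional 0.05 threshold in the larger sample.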
Correlational trends
Table 5 shows a summary of important correlations among different measures of employment status and ability to work. Because of the large number of univariate analyses involved in these correlation matrices and the risk of Type I errors, we are, for the purposes of the present document, defining a "significant" correlation as any one whose p-value was <0.0012 and r or rho-value above 0.2. As can be seen in the correlation matrices and summaries below, few of these correlations were large (defined arbitrarily as above 0.5), and most fell in the 0.2 to 0.4 range. As none of these questions asked exactly the same question about exactly the same point in time, it is to be expected that these variables would be correlated, but not strongly (related, but conceptually distinct measures). It must also be remembered that any statistically significant correlations may be spurious, as bivariate correlations do not account for collinearity of other variables.
Table 6 shows correlations among measures of functional status, and between functional status and employment status. Again, as predicted, these measures are significantly correlated with one another in a logical way, but many are only moderately correlated because they measure conceptually distinct, but related, constructs (such as height and weight). As mentioned above concerning employment-related variables, few of the questions on the patient questionnaire asked about exactly the same phenomenon, characteristic, or symptom. Therefore, they should not be strongly correlated. Those few questions that did ask conceptually similar questions yielded higher correlations (e.g., energy and "pep" had a correlation of about 0.7).
Table 7 shows correlations between laboratory measurements and employment variables. As expected, pre- and postdialysis measures of the same laboratory values correlate with one another moderately to strongly (0.4 to 0.8). When the same laboratory measures are compared at initial interview to those taken at followup interview 9 to 12 months later, there is still a correlation, but it is weaker (0.2 to 0.4). There were few significant correlations between laboratory/medical values and employment measures, and those are shown in Table 7. This may indicate that no single clinical measurement is predictive of employment status or ability to work, and it indicates the need to use several variables simultaneously to predict employment. Logistic regression allows for such simultaneous consideration, as is discussed later in this document.
In addition to those shown in the tables below, more correlational trends are provided in Appendix B.
The low correlation of hematocrit and hemoglobin is an interesting finding, as these are related measurements of red blood cell count. When hematocrit is graphed against hemoglobin on an x-y axis, the following pattern is seen:
Figure 2. Hematocrit values (percent) versus hemoglobin values (g/dL)
It appears that there is a subgroup of patients (n = 63) for whom hemoglobin was high while hematocrit was low, an unusual and difficult-to-explain finding. This may be due to hemolysis during dialysis (resulting in an artificially low red blood cell, or hematocrit, count). Dialysis facilities were not instructed as to when, in relation to the dialysis session, the laboratory measurements were to be taken. These findings would suggest that for some patients, these readings were taken after dialysis. Alternatively, it could suggest that these were patients in crisis with extremely low hemoglobin, for whom transfusions were given during which hemolysis occurred. However, analysis suggests that these patients were no more likely to have been receiving transfusions than patients with normal blood count readings. It is also possible that this is a miscoding error for some patients in whom hematocrit was recorded as hemoglobin, and vice versa. This, however, is speculation. As a result, these findings are currently unexplained.
Conclusions about construct validity
Construct validity appears to be adequate in this database with a few exceptions, in particular, that of an apparent coding error with hemoglobin and/or hematocrit. The knowledge of such a discrepancy gives us the option either to discard these apparently invalid data in our final analysis if we feel that the results would be unreliable, or to recode them properly if the source of the error can be determined.
Data Reliability
Analysis protocol
While the above results provide information about whether this database can be generalized to the entire dialysis population and whether individual datapoints measure what they are intended to measure, they do not indicate whether the data set is reliable enough for the particular statistical methods we intended to use. We have noted that for many patients, certain segments of data, especially quality-of-life and employment data, are missing. Large amounts of missing data can cause difficulties in conducting statistical analysis intended to identify variables associated with an inability to work.
The general goal of this reliability analysis was to compare analyses on one randomly selected half of the DMMS Wave 2 database to the results of analyses on the other half. Failure to obtain equivalent results from analyses of both halves of the database would suggest that the results of this analysis are unreliable. The protocol for this analysis was as follows:
- 1.
Randomly assign each patient in the DMMS Wave 2 database to one of two groups.
- 2.
From each resulting half of the database, select only those patients who, according to their medical records, were currently less than age 65 and who were working prior to diagnosis of ESRD. The latter selection was made using the question "Occupation level before ESRD" of the DMMS Wave 2 Medical Questionnaire. Only patients whose primary occupations were listed as professional, clerical, tradesperson, or manual laborer were included. Patients who were not employed or whose primary occupations were student, other, or homemaker were excluded.
- 3.
Compute basic statistics (mean, median, minimum, and maximum) for the individual variables for each half of the database from the following sections of the medical and patient questionnaires: Patient and Facility Identification, Patient History within 10 Years Prior to Study Start Date, Information at Study Start Date, Laboratory Data, Patient Questionnaire, and Medical Care before Regular Dialysis. Items of the Patient Questionnaire were not examined individually, but as scored subscales as constructed by the developers of the KDQOL™ (see Appendix C).
- 4.
Exclude any variable from further analysis if data were available for less than 50 percent of the patients on this variable. This exclusion is required for technical reasons. Specifically, a logistic regression analysis requires that data from any given patient can be included only if all data from that patient are present; if there are any missing data from that patient, all of that patient's data must be excluded. Consequently, including an item for which there are data from only a few patients causes the entire regression analysis to be based on only a very few patients.
- 5.
Incorporate all questions for which more than 50 percent of the data were present into a logistic regression equation, as performed by SPSS 9.0. Conduct a separate multiple regression for each of the questionnaire subsections described in (3) above for each half of the database. The dependent variable in this regression was death within 1 year, a variable we created to identify all patients who died within 12 months of the study start date. The purpose of using this variable was to maximize the number of patients available for analysis, so that we could determine the maximum reliability provided by any relevant outcome measure. Death is an outcome measure of interest to SSA, and more data are available on this than on the outcome measures of employment status or self-reported ability to work at one year. Predictor variables were automatically entered into each equation (using SPSS statistical software) in a forward stepwise manner. In this method of regression, the variable with the greatest correlation with the dependent variable is entered first, that with the second greatest correlation entered second, and so on. Variables are entered into the regression until the point at which addition of another variable changes the log likelihood by less than 0.01 percent. This statistical technique "selects" only those variables that are correlated with death in 1 year. Variables that are not correlated with the dependent variables are not entered into the equation. Thus, one arrives at a set of variables that "predicts" death.
In some cases, the relationship between a variable that was entered into the equation and the dependent variable was not statistically significant. Such variables were, for the purposes of this analysis, treated as if they were statistically significant and used in the regression described in Step (6), below.
Only dichotomous variables were considered as independent categorical variables. Variables with three or more categories were not considered as categorical. Not classifying these latter variables as categorical has no effect on the statistical calculations.
This step results in 6 multiple regression equations for each randomly selected half of the database.
- 6.
For each half of the database, incorporate all variables entered into preceding equations into another forward stepwise multiple logistic regression. The result is another set of variables that "predict" death in 1 year. This set, unlike the set described in the preceding step, is derived from all of the questions described in Step (3), above.
One disadvantage of this approach is that using several regressions and using forward stepwise regression increases the probability of obtaining a chance relationship between an independent variable and the dependent variable that is statistically significant. In statistical parlance, the strategy and techniques applied here maximize the probability of a Type 1 error. It is unlikely that this is a fatal flaw. It does not seem likely that the presence of chance relationships would mask truly large relationships between any given independent variable and the dependent variable.
Another disadvantage of the forward stepwise method that we employed is that it has no theoretical basis. However, as we noted above, published information about which variables one might preferentially wish to include in any regression equation is scarce.
Finally, we stress that we did not attempt to search for nonlinearity. This would not affect the results of comparisons of random halves of the database, but does mean that any results we present are only for purposes of validation.
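Steps 1 and 4 of the protocol above, together with the case-exclusion behavior of logistic regression described in Step 4, can be sketched as follows. This is an illustration under stated assumptions, not the SPSS procedure actually used. The function names, the variable names (`age`, `albumin`, `sf36`), and the toy records are hypothetical. A variable with data for exactly 50 percent of patients is retained here, a boundary case the protocol does not address.

```python
import random


def split_halves(patient_ids, seed=0):
    """Step 1: randomly assign patients to two groups of (nearly) equal size."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    mid = len(ids) // 2
    return ids[:mid], ids[mid:]


def variables_to_keep(rows, min_fraction=0.5):
    """Step 4: keep variables with data for at least min_fraction of patients.

    Each row is a dict for one patient; None marks a missing value.
    """
    if not rows:
        return []
    n = len(rows)
    return [
        name for name in rows[0]
        if sum(row[name] is not None for row in rows) / n >= min_fraction
    ]


def complete_cases(rows, variables):
    """Keep only patients with no missing value on any listed variable."""
    return [row for row in rows if all(row[v] is not None for v in variables)]


# Toy records with hypothetical variable names, not DMMS data.
rows = [
    {"age": 50, "albumin": 3.5, "sf36": None},
    {"age": 61, "albumin": None, "sf36": None},
    {"age": 43, "albumin": 3.9, "sf36": 40},
    {"age": 58, "albumin": 4.1, "sf36": None},
]
kept = variables_to_keep(rows)
print(kept)                                    # variables passing the 50 percent rule
print(len(complete_cases(rows, kept)))         # patients usable after the exclusion
print(len(complete_cases(rows, list(rows[0]))))  # patients usable without it
```

In the toy data, retaining the sparsely recorded variable would leave a single patient for the regression, which is the problem that the 50 percent rule is intended to limit.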
Results
The following table (Table 8) depicts the results of the comparison of the database halves. The important finding here is that for each random half of the database, each analysis entered different variables into the regression equation, indicating that for each random half of the database, different variables predicted death as an outcome. Such differences could result from the fact that there were a substantial number of patients from whom not all data were available (the impact of this is further discussed below).
Summary of Validity Analyses
Published analyses of the entire USRDS database have demonstrated its reliability (completeness and accuracy) ("Completeness and reliability...," 1992; "Improvements in data...," 1992). Our validity analysis of the DMMS Wave 2 database suggests that its external validity (generalizability to the ESRD population as a whole) may be limited. In particular, patients with diabetes appear to be overrepresented in the DMMS Wave 2 data, while patients with glomerulonephritis are underrepresented.
Construct validity within this database appears to be acceptable for the purposes of this report. Variables that were expected to correlate strongly with one another did, while those for which no relationship was expected did not, with the exception of a lower than expected correlation between hematocrit and hemoglobin. Followup analyses were conducted to confirm the findings of the correlation trends. These analyses confirmed and strengthened the significant findings revealed in the correlation, and further confirmed the construct validity of this database.
Analyses were conducted to compare the subset of patients to be included in ECRI's proposed final analyses with those patients excluded from this final analysis. Expected differences were found between the included and excluded groups: the patients with followup data were younger, healthier, and more likely to be undergoing peritoneal dialysis. We considered these differences to be valid, and while results may not be generalizable to the whole ESRD population, it is not intended that they should be. Results will be generalizable only to that population of patients who have worked at some point in the recent past, and thus are applying for SSA disability under Title 2.
The DMMS Wave 2 study is the largest prospective study yet conducted on the topic of quality-of-life among dialysis patients. However, it was not designed to study disability. It is also a study of incident (new) cases, which should maximize the number of patients in the database who are employed at the start of the study. This number, however, was low. Only 1,221 out of 4,026 patients were employed full or part time within 2 years of the study start date, and 670 patients were employed at the study start date. The inclusion criteria for our proposed final analysis of previous employment status, followup information availability, and age under 65 further reduced the number of patients who contributed relevant data to 546. Although still a large number of patients, it is important to note that there are more than 300 variables in this database. A very general criterion for conducting multivariate statistics such as factor analysis is that there be 10 patients for every variable being examined. This final data subset does not meet this criterion, and therefore any multivariate statistical analyses may not have adequate statistical power.
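The arithmetic behind this criterion can be made explicit. The sketch below uses the figures given above (546 patients, and 300 as a lower bound for "more than 300" variables) and the rule of 10 patients per variable. The function names are ours, for illustration only.

```python
def patients_per_variable(n_patients, n_variables):
    """Observed number of patients available per variable examined."""
    return n_patients / n_variables


def max_variables(n_patients, per_variable=10):
    """Largest number of variables that satisfies the rule of thumb."""
    return n_patients // per_variable


ratio = patients_per_variable(546, 300)  # 300 is a lower bound on the variable count
limit = max_variables(546)
print(round(ratio, 2), limit)
```

With fewer than 2 patients per variable, against a criterion of 10, the subset could support at most about 54 variables under this rule.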
Analysis Reliability
Because of the reduced number of patients who contributed relevant data to our analysis, we undertook an analysis to assess the reliability of any statistical analysis we conducted. One way of accomplishing this is to compare results of randomly selected halves of a database. The results we obtained using such an analysis suggest that we would be unable to obtain reliable results from our planned analyses of these data. This lack of reliability occurred for each of the six questionnaire subsections.
We conducted this reliability analysis after discarding all variables for which fewer than half of the patients contributed data. It is possible that we could have discarded additional poorly represented variables. This would likely increase the number of patients upon which results would be based. However, there is no evidence-based way to ascertain the importance of the discarded variables. Therefore, the generalizability of the results of such an analysis would be suspect.
On the basis of the results of the validity and reliability analyses, we conclude that the proposed statistical analyses cannot be performed using the data currently available. It appears that the patient pool would be too small for the large number of variables, and would have too many missing data points. We therefore can only offer the interested reader tables of descriptive statistics of these patients (see below) and offer suggestions for future research that would enable us to perform the analysis of interest.
Sample Analysis
Although the data above did not provide the reliability required for SSA's purposes, it was valuable to proceed with an analysis that illustrates the statistical methods that might be used if more data were available. In Appendix E, we have outlined the processes of replacing missing data, recoding of data necessary for regression analyses, two sample regression analyses with employment-related outcome measures, and receiver operating characteristic (ROC) curves that illustrate the diagnostic accuracy of the results of the regression analyses.
These analyses serve only as an example and not a definitive central analysis for this project. They serve to guide future research and recommend statistical methods for most accurate identification of predictor variables. The results serve to illustrate that working status is probably not an accurate surrogate measure for ability to work.
Descriptive Statistics
Although we did not perform any definitive statistical analyses on these data, the DMMS Wave 2 database still offers some relevant epidemiological data about the employment status of patients with ESRD. However, some caveats should accompany the following presentation of summary statistics.
Limitations of Univariate Analyses
Throughout this document, we have discussed the limitations inherent in univariate analyses, such that they do not account for multicollinearity among the independent/predictor variables. The following example relevant to this report illustrates the dangers in interpreting univariate statistics at face value.
One may notice that, among patients under age 65, those with a primary diagnosis of glomerulonephritis are significantly more likely to be working full time than are patients with diabetes. According to our statistics, 29.6 percent of patients with glomerulonephritis are working full time at the start of the study, versus 13.1 percent of diabetes patients. A cursory look at this statistic might lead one to prematurely conclude that patients with glomerulonephritis are not as sick as patients with diabetes. This conclusion is likely to be erroneous.
There are many ways in which glomerulonephritis patients differ from diabetes patients that may be affecting the rate of employment. One major difference is the average age of these two patient groups. Patients with glomerulonephritis have a mean age of 44 years, versus 51 years for patients with diabetes. This is a statistically significant difference (p <0.001) on a variable that can have a substantial impact on the likelihood of a person working. It may be the case that the diabetes group includes more patients over the age of 60 who are approaching retirement. Other variables that significantly differentiate these two patient groups include the following:
- Body-mass index (BMI) (glomerulonephritis higher)
- Hematocrit levels (diabetes higher)
- Presence of coronary artery disease (CAD) (more likely in diabetes)
- Presence of cerebrovascular disease (more likely in diabetes)
Of these variables, BMI, presence of CAD and cerebrovascular disease also significantly differentiate whether a person works full time.
All of these intertwined variables make it unclear whether patients with diabetes are less likely to work because of the diabetes itself, more severe symptomatology, coexisting diseases, older age, or a combination of severity of disease and age.
Another difficulty with using univariate tests to address questions about predictors of disability can arise when one analyzes the data as if persons with and without some characteristic were in two separate groups and then attempts to determine whether patients in these "groups" differ in their inability to work (however measured). This difficulty can be illustrated with the DMMS Wave 2 data. For this illustration, we used data from patients who are under 65 years of age, and who were employed or a student sometime during the 2 years prior to the start of the DMMS study. We then divided these patients into a "group" of patients with diabetes and another group of patients without diabetes. We determined whether one group was more likely to continue to work, as indexed by patients' answers on the followup questionnaire. These data are arrayed in the 2 x 2 table shown below:
Patient group | Discontinue working | Continue to work |
---|---|---|
Patients with diabetes | 102 | 43 |
Patients who do not have diabetes | 158 | 115 |
Subjecting these data to a statistical analysis (here we use the odds ratio, but other statistics could also be used) yields an odds ratio of 1.75 with a 95 percent confidence interval of 1.12 to 2.65. Because this interval does not include 1.0, this odds ratio is statistically significant. As a result, it is tempting to conclude that patients with diabetes discontinue working at greater rates than those without diabetes and, therefore, that one can use presence of diabetes as a criterion for determining disability. This conclusion is, however, a poor one.
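For readers who wish to reproduce this kind of calculation, the sketch below computes an odds ratio and a 95 percent confidence interval from the 2 x 2 table using the standard log-odds (Woolf) method. The report does not state which method was used, so the choice of method is an assumption, as is the function name `odds_ratio_ci`.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and confidence interval for a 2 x 2 table.

    a, b: group 1 with and without the outcome
    c, d: group 2 with and without the outcome
    Uses the log-odds (Woolf) standard error; z = 1.96 gives a 95 percent interval.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Counts from the table: diabetes (102 discontinue, 43 continue),
# no diabetes (158 discontinue, 115 continue).
odds_ratio, lower, upper = odds_ratio_ci(102, 43, 158, 115)
print(round(odds_ratio, 2), round(lower, 2), round(upper, 2))
```

Under this method the interval rounds to 1.12 to 2.65, in agreement with the interval reported above, while the point estimate comes to about 1.73 rather than 1.75. The small difference may reflect rounding or a different estimation method in the original analysis.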
The flaw lies in the fact that the results of this group-based statistical test do not convey any information about how often this hypothetical criterion will lead one to a "correct" disability determination, or how often it will lead one to an "incorrect" determination. This is because this kind of comparison of two groups does not provide information about the diagnostic performance of this "test" for disability. To obtain information about performance, one needs to look at the results as if this were a diagnostic test. Thus, from the above table, one can compute that the sensitivity of diabetes for predicting disability is 39.23 percent, the specificity of this test is 72.78 percent, its positive predictive value is 70.34 percent, and its negative predictive value is 42.12 percent. Thus, the presence of diabetes is a fair indicator that a patient with ESRD will not continue to work (moderate positive predictive value), but the absence of diabetes is not a good indicator that a patient will continue working (low negative predictive value). In practical terms, using only the presence or absence of diabetes as a criterion for disability would appropriately provide disability benefits to patients with diabetes, but would also tend to inappropriately deny benefits to patients who do not have diabetes and who are also unable to work. To account for these latter patients, additional criteria (used in conjunction with diabetes) are needed.
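The four performance measures quoted above follow directly from the 2 x 2 table when presence of diabetes is treated as a positive "test" and discontinuing work as the condition being detected. A minimal sketch, with a function name of our own choosing:

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, and predictive values from 2 x 2 counts."""
    return {
        "sensitivity": tp / (tp + fn),  # positives among those with the condition
        "specificity": tn / (tn + fp),  # negatives among those without it
        "ppv": tp / (tp + fp),          # condition present among test positives
        "npv": tn / (tn + fn),          # condition absent among test negatives
    }


# Diabetes and discontinued work: 102; diabetes and continued work: 43;
# no diabetes and discontinued work: 158; no diabetes and continued work: 115.
metrics = diagnostic_metrics(tp=102, fp=43, fn=158, tn=115)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f} percent")
```

The sketch makes the asymmetry explicit: 158 of the 260 patients who discontinued working did not have diabetes, and these are the patients a diabetes-only criterion would miss.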
In choosing these additional criteria, one does not want to choose any criterion that is highly correlated with the presence of diabetes. For example, imagine that there is a characteristic common to all persons with diabetes, and that this characteristic is not found in patients without diabetes (such as high glucose levels in the blood). Were we to use this characteristic as our second criterion for determining disability, the performance of the test would not change (i.e., the sensitivity, specificity, etc., of the test would be the same as if we used only diabetes to predict disability). The hypothetical second criterion does not provide any information beyond that provided by a diabetes diagnosis. Therefore, the second criterion that one chooses must have a low correlation with diabetes, but a high predictive value for inability to continue working.
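The point about redundant criteria can be checked with a small sketch. It rebuilds patient-level records from the counts in the 2 x 2 table, adds a hypothetical marker that is present in exactly the patients with diabetes, and compares the resulting classification counts. The marker and the function name `confusion_counts` are illustrative assumptions only.

```python
def confusion_counts(test_positive, outcome):
    """Return (tp, fp, fn, tn) for paired lists of booleans."""
    tp = sum(1 for t, o in zip(test_positive, outcome) if t and o)
    fp = sum(1 for t, o in zip(test_positive, outcome) if t and not o)
    fn = sum(1 for t, o in zip(test_positive, outcome) if not t and o)
    tn = sum(1 for t, o in zip(test_positive, outcome) if not t and not o)
    return tp, fp, fn, tn


# Patient-level records rebuilt from the table counts.
diabetes = [True] * 145 + [False] * 273
stopped_work = [True] * 102 + [False] * 43 + [True] * 158 + [False] * 115

# Hypothetical second criterion found in all patients with diabetes and in no others.
marker = list(diabetes)

diabetes_only = confusion_counts(diabetes, stopped_work)
both_required = confusion_counts(
    [d and m for d, m in zip(diabetes, marker)], stopped_work
)
either_sufficient = confusion_counts(
    [d or m for d, m in zip(diabetes, marker)], stopped_work
)
print(diabetes_only, both_required, either_sufficient)
```

All three rules classify every patient identically, so sensitivity, specificity, and the predictive values are unchanged. A useful second criterion must therefore identify at least some patients differently from the first.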
Choosing a second criterion requires one to simultaneously consider its relationship with diabetes and with inability to continue working. Further, there is no guarantee that adding this second criterion will be sufficient to provide enough predictive value. One might need to use a "test" for disability that consists of three or more criteria. This means that choosing the third criterion involves simultaneously considering its relationship to the first two criteria and to the inability to continue working. These simultaneous considerations are best accomplished by using multivariate statistics.
The need for multivariate statistics is accentuated because choosing multiple criteria for a "test" for disability rapidly becomes very complex. Highlighting this complexity is that there is no characteristic of ESRD patients that is obviously correlated with an inability to continue working. (This is implied by the data shown in Appendix D.) These data also imply that one must examine a minimum of dozens of variables to arrive at a "test" for disability that appropriately awards benefits and does not inappropriately deny them.
Because of these complexities, we performed no univariate inferential statistical comparisons of the data we present below; we wish to minimize the possibility that a reader may come to erroneous conclusions about predictors of ability to work. Thus, we present the summary statistics below only for their potential use in future research.
Summary Statistics
Of particular interest is the small number of patients who were employed full time at any point during the study. Table 9 shows a summary of the number of patients working full time, or reporting that they were able to work full time, during the first interview or the followup interview. Patients over age 65 and under age 18 were excluded, as well as those who had their first maintenance dialysis before 1995, to focus on incident patients. There were 2,260 patients included in our analyses. We then computed descriptive statistics on these data using SPSS 9.0 (SPSS, Inc., Chicago, IL, USA). These computations involved use of the Crosstabs module for categorical variables (which counts the number of instances of an answer in each category), and the Case Summaries feature for quantitative variables.
The third and fourth rows of data in Table 9 show that different results are obtained depending on whether the medical questionnaire or the patient questionnaire was used. The resulting statistics are somewhat different (19.9 percent v. 16.9 percent), and it is unclear whether this is because the medical records (obtained from the dialysis center personnel) are inaccurate (or possibly out of date), because of self-selection bias in the patient questionnaire, or because there was a time lag of about 60 days between the collection of medical records data and patient questionnaire data.
Statistics in Table 9 indicate that the number of dialysis patients employed full time dropped dramatically over a 2- to 3-year period, from predialysis to 1 year postdialysis. However, because information was available for such a small proportion of patients at 1-year followup, these statistics cannot be considered reliable. We hypothesized that some of the individuals who could work full time might instead be working part time in order to continue receiving disability benefits, but the statistics shown in Table 9 do not support any such widespread practice.
It is also interesting to track the number of patients who were working before ESRD who continue to work full time once on dialysis, as shown in Table 10. There is an obvious sharp dropoff in the number of patients continuing to work.
The mean age of these patients was 45.7 years v. 49.8 years for those not maintaining employment. Men were slightly more likely to maintain their jobs than women, as shown in Table 11. The occupations of those continuing employment are shown in Table 12, indicating that white-collar workers are substantially more likely to continue working than blue-collar workers. However, because these are descriptive statistics, such findings can be deceptive and may be influenced by factors other than type of employment.
A comparison of education levels of those who continued to work with those who did not is shown in Table 13. The finding that those who are college educated continue to work is consistent with the finding that white-collar workers are more likely to continue working. Again, it is unclear what factors are causing these group differences in working status.
Tables 14 and 15 show the employment status of patients who report that they are able to work full time at the start of the study and 1 year later. At both points in time, a substantial majority of patients self-reportedly able to work are working (77 to 80 percent). Of those who are "able to work full time" but are not in fact doing so, most are either working part time or listed themselves as "disabled," probably indicating not that they are unable to work, but that they are receiving disability benefits. This particular category accounts for about 5 percent of patients reportedly able to work full time, suggesting that only a small percentage of individuals who indicate that they are able to work instead use the system to receive benefits.
Table 16 illustrates some of the coding errors present with regard to employment status. It depicts a comparison of the medical record's information on patient employment status to the patient's self-reported employment status, both at the start of the study. It can be seen that these variables, which should correlate almost perfectly, do not, and that nonsensical patterns of employment are reported for a small number of patients in this database. However, for some patients, these differences may be real, as medical records and self-report were separated by about a 2-month period.
While Tables 9 to 16 provide interesting information about the working status of patients with ESRD, they do not provide information about which patients make up this small subset who continue to work full time. Appendix D provides additional information about these patients. These data are also limited because of their descriptive nature. Thus, it cannot be determined what causes the group differences seen in these tables.
Summary
As anticipated, the DMMS Wave 2 database indicates that only a very small number of patients continue to work full time once on dialysis. Out of almost 2,000 patients for whom data were available, only 114 worked full time continuously throughout the length of this study (more than 1 year). Almost every patient who reported being able to work full time was working full time; however, the significance of this finding is unclear.
There are many different variables in the DMMS Wave 2 database that differentiate those patients who continue to work full time from those who do not, but it is unclear how these predictor variables interact with one another and which of them accounts for the most variance (i.e., which among them is the "strongest" predictor).
Footnotes
- 1
Adapted from the 1998 USRDS Researcher's Guide (United States Renal Data System, 1998).
- 2
Because of the large number of tests (>300), even the p-value of 0.001 is anti-conservative.