Methods

Jeff Andrews; Amanda Yunker; W Stuart Reynolds; Frances E Likis; Nila A Sathe; Rebecca N Jerome

NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Andrews J, Yunker A, Reynolds WS, et al. Noncyclic Chronic Pelvic Pain Therapies for Women: Comparative Effectiveness [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2012 Jan. (Comparative Effectiveness Reviews, No. 41.)

This publication is provided for historical reference only and the information may be out of date.

Topic Development and Refinement

The topic for this report was nominated in a public process. We drafted the initial Key Questions (KQ) and analytic framework and refined them with input from key informants. After review from the Agency for Healthcare Research and Quality (AHRQ), the questions and framework were posted to a public Web site. The public was invited to comment on these questions.

After reviewing the public commentary, we drafted final KQs and submitted them to AHRQ for review. We identified technical experts on the topic of chronic pelvic pain in women in the fields of gynecology and women's health to provide assistance during the project. The Technical Expert Panel (TEP) contributed to the AHRQ's broader goals of (1) creating and maintaining science partnerships as well as public-private partnerships and (2) meeting the needs of an array of potential customers and users of its products. Thus, the TEP was both an additional resource and a sounding board during the project. The TEP included 5 members serving as technical or clinical experts. To ensure robust, scientifically relevant work, we called on the TEP to provide reactions to work in progress. TEP members participated in conference calls and discussions through e-mail to:

Refine the analytic framework and KQs at the beginning of the project;
Discuss the preliminary assessment of the literature, including inclusion/exclusion criteria;
Provide input on assessing the quality of the literature.

Analytic Framework

We developed the analytic framework (Figure 1) based on clinical expertise and refined it with input from our key informants and TEP members. The framework summarizes the process by which women with noncyclic chronic pelvic pain (CPP) make and modify treatment choices. Treatment choices include surgical or nonsurgical approaches and may lead to outcomes including changes in pain status (e.g., resolution of pain, continuing pain, continued need for pain medication), patient satisfaction, quality of life, or harms/adverse effects.

Figure 1

Analytic framework. Abbreviations: BSO = bilateral salpingo-oophorectomy; CAM = complementary and alternative medicine; KQ = key question

Treatment choices may also not provide pain relief or improvements in functional status or quality of life, and women with CPP may undergo additional interventions after a treatment approach has failed. In addition, outcomes may vary by diagnosis in those patients receiving a confirmed diagnosis for the etiology of their CPP.

Literature Search Strategy

Databases

We employed search strategies provided in Appendix A to retrieve research on the treatment of CPP in women. Our primary literature search employed 4 databases: MEDLINE® via the PubMed interface, PsycINFO (psychology and psychiatry literature), EMBASE Drugs and Pharmacology, and the Cumulative Index of Nursing and Allied Health Literature (CINAHL) database. Our search strategies used a combination of subject heading terms appropriate for each database and key words relevant to CPP (e.g., chronic pelvic pain, pelvic pain). We limited searches to the English language and literature published since 1990, when laparoscopic techniques became more widely used.

We also manually searched the reference lists of included studies and of recent narrative and systematic reviews and meta-analyses addressing CPP. We also invited TEP members to provide additional citations.

Grey Literature

The AHRQ Scientific Resource Center also searched for information on the following specific medications and devices used to treat CPP. We requested grey literature on these drugs and devices because they either are commonly used and have a number of known side effects or are beginning to be used in the CPP population and have not yet been well reported in the published literature (e.g., aromatase inhibitors):

Medroxyprogesterone;
Gonadotropin-releasing hormone (GnRH) agonists, including buserelin, goserelin, leuprolide, and nafarelin (with or without add-back estrogen therapy);
Selective progesterone receptor modulators (SPRMs) (mifepristone and ulipristal acetate);
Selective estrogen receptor modulators (SERMs) (tibolone, raloxifene, clomiphene, and tamoxifen);
Aromatase inhibitors (anastrozole and letrozole); and
Transcutaneous electrical nerve stimulation (TENS).

The Scientific Resource Center sought grey literature in resources including the websites of the US Food and Drug Administration and Health Canada and clinical trials registries such as ClinicalTrials.gov. We also gave manufacturers of these medications and devices an opportunity to provide additional information.

Ongoing Research

To examine the direction of ongoing and recently completed research, we also searched ClinicalTrials.gov and the European Union Clinical Trials Register for CPP intervention studies.

Search Terms

Controlled vocabulary terms served as the foundation of our literature search in each database, complemented by additional keyword phrases. We also employed indexing terms when possible within each of the databases to exclude undesired publication types (e.g., reviews, case reports, news), items from non-peer-reviewed journals, and items published in languages other than English.

Our literature searches were executed between September 2010 and May 2011. Appendix A provides our search terms and the yield from each database. We imported all citations into an electronic database created using EndNote. Our search for ongoing research was conducted in July 2011 using the key words “chronic pelvic pain” in each trial registry and limiting to studies in process.

Process for Study Selection

For this review, the relevant population for all KQs was adult women (≥ age 18) with noncyclic or mixed cyclic/noncyclic CPP, which we defined as pain that has persisted for more than 3 months, is localized to the anatomic pelvis (lower abdomen below the umbilicus), and is of sufficient severity that it causes the patient to become functionally disabled or to seek medical care. Pain may sometimes occur in a cyclic pattern; however, a noncyclic component is always present. CPP as described throughout this review refers to noncyclic or mixed cyclic/noncyclic pelvic pain unless otherwise noted.

Inclusion and Exclusion Criteria

We developed criteria for inclusion and exclusion based on the patient populations, interventions, outcome measures, and types of evidence specified in the KQs and in consultation with the TEP. Table 1 summarizes criteria.

Table 1

Inclusion and exclusion criteria.

Study Population

Studies needed to provide adequate information to ensure that participants fell within the target age range and pain criteria. For studies with populations including women under age 18, we retained the study if we could infer that at least 80 percent of the study participants were over the age of 18. Similarly, some studies included women with cyclic chronic pelvic pain and women with noncyclic chronic pelvic pain. We retained studies with participants with both cyclic and noncyclic/mixed chronic pelvic pain if at least 80 percent of the population was composed of women with noncyclic/mixed chronic pelvic pain.

We also applied this criterion to studies including both women and men, retaining studies that included men if the study population was composed of at least 80 percent women with CPP. We attempted to extract data only on the population of interest (adult women with noncyclic/mixed CPP) where possible. We chose the figure of 80 percent as we considered studies in which a majority of participants were within our target age range (18 and older), or had noncyclic CPP, or included a low proportion of men as providing data applicable to the population of adult women with noncyclic CPP.

Including fewer than 20 percent of participants with characteristics outside the review's inclusion criteria may introduce bias in the results, but not to such a degree that the results would not be useful. As appropriate, we note in our discussion of studies that results apply to a heterogeneous age range or pain group or include data from some male participants.
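The 80 percent rule described above can be illustrated with a short sketch. The function name and arguments are hypothetical and are not part of the review protocol; the sketch assumes that the counts of participants in the target population (adult, female, noncyclic/mixed CPP) can be read or inferred from a study report.

```python
def meets_population_threshold(n_target: int, n_total: int,
                               threshold_percent: int = 80) -> bool:
    """Return True if at least `threshold_percent` of participants fall in
    the target population (hypothetical helper, for illustration only)."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_target <= n_total:
        raise ValueError("n_target must be between 0 and n_total")
    # Integer arithmetic avoids floating-point edge cases at exactly 80 percent.
    return n_target * 100 >= threshold_percent * n_total


# A study of 120 participants in which 100 are adult women with noncyclic CPP
# (about 83 percent) would be retained; one with 90 of 120 (75 percent) would not.
print(meets_population_threshold(100, 120))
print(meets_population_threshold(90, 120))
```

The same check would be applied separately to age, pain type, and sex, since the text applies the 80 percent figure to each characteristic.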

Sample Size

For KQs 2 through 5, we excluded studies with fewer than 50 total participants. We considered the following factors in choosing this study size:

Prevalence of noncyclic CPP (Prevalence varies by population; to maximize acceptable study size, we set prevalence at 100 percent.)
Loss to followup (Loss varies by study; to maximize acceptable sample size, we assumed 0 percent.)
Placebo effect (Placebo effects are known to be from 30 to 50 percent in chronic pain studies.⁶³⁻⁶⁷)
Type I error, alpha level, or p value (We set at a standard of 5 percent.)
Desired statistical power level (We set at a standard of 0.80.)
Statistic (We used the two-tailed z-test and the t-test for sample size.)
Clinical effect size anticipated or clinically relevant reduction in pain (We considered 30 percent as a minimum. We selected a target of 30 percent based on published recommendations that propose that reductions in chronic pain intensity of at least 30 percent reflect moderate clinically important differences.⁶⁸)
Sample size
- Considering a null hypothesis of effect size of 30 percent, a study would need 176 subjects per group; a total sample size of 352 would be the smallest acceptable.
- Considering a null hypothesis of effect size of 50 percent, a study would need 64 subjects per study group; a total sample size of 128 would be the smallest acceptable.

Therefore, a single study, with 100 percent of participants with noncyclic CPP, with no loss to follow-up, with a pain reduction in the placebo group of 30 percent, and a pain reduction of at least 60 percent in the intervention group would require a sample size of 350 patients.
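The report does not state the formula behind these figures. The following is a minimal sketch under two assumptions that are ours, not the authors': (1) the effect sizes of 30 and 50 percent are read as standardized mean differences of 0.3 and 0.5, and (2) the per-group size is computed with the usual normal-approximation formula for a two-sided, two-sample comparison, plus a common small-sample correction term that approximates the t-test result. Under those assumptions the sketch reproduces the per-group figures of 176 and 64 given above.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05,
                power: float = 0.80) -> int:
    """Approximate per-group sample size for a two-sided, two-sample test.

    effect_size is a standardized mean difference (an assumption about how
    the report's percentages are to be read).
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_beta = std_normal.inv_cdf(power)           # quantile for desired power
    n_z = 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2
    # Adding z_alpha^2 / 4 is a common approximation to the t-test sample size.
    return ceil(n_z + z_alpha ** 2 / 4)


for d in (0.3, 0.5):
    n = n_per_group(d)
    print(f"effect size {d}: {n} per group, {2 * n} total")
# effect size 0.3: 176 per group, 352 total
# effect size 0.5: 64 per group, 128 total
```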

Rather than choose a sample size of 350, we set a conservative lower limit for sample size at 50, to account for potential meta-analyses aggregating smaller trials at sufficient power to produce a confidence interval that excludes 1. Studies in the chronic pelvic pain realm rarely have identical patient populations or identical interventions, or identical outcome measures; hence the heterogeneity across studies would be problematic, and it would be important to have studies of sufficient size.

To examine the effects of our sample size requirement of at least 50 participants with CPP, we re-reviewed the randomized controlled trials that were excluded from the review and had fewer than 50 participants with CPP. Most of these studies were also excluded on another basis. Of those studies with an N of less than 50 that otherwise would have met the inclusion criteria at the full-text phase, none matched another in population, comparators, or interventions. None of these small studies used the same intervention; there was significant heterogeneity in the population and in the outcomes reported. Therefore, it would not have been possible to combine any two or more of these small studies and perform a meta-analysis as part of the systematic review. Moreover, these small studies, all addressing different interventions, would not have provided substantive data for the review.

We did not address harms of surgical interventions in this review as we felt that the studies meeting our inclusion criteria would necessarily provide desultory evidence of harms of surgical interventions. Most of the surgical interventions used for CPP are deployed in a broader context for other indications; a systematic review of the harms of the procedures would require a different and much larger search than the current review assignment, protocol, and KQs dictated. Reporting only the harms represented in the select studies meeting our criteria for addressing surgical intervention for CPP would present only a partial picture of potential harms of surgery.

Study Design

We accepted study designs including controlled trials and prospective cohort studies addressing the effectiveness of surgical or nonsurgical approaches (KQ2, KQ4), outcomes if an etiology for CPP is identified (KQ3), or effectiveness of one intervention over another to treat persistent CPP (KQ5). We considered prospective cohort studies to be comparative studies, in which separate groups of participants received different interventions. Prospective cohort study designs could use contemporaneous controls or historic controls. We also accepted prospective or retrospective case series or cross-sectional studies with at least 100 participants with CPP and addressing the prevalence of comorbidities of interest (KQ1) or harms of nonsurgical therapies (KQ4).

We selected the comorbidities of interest based upon reporting in the CPP literature. We extracted data regarding a study's use of validated tools to diagnose comorbidities or the provision of an operational definition for a comorbid condition. As described below, we factored the use of a validated tool into our quality assessment of studies providing data on the selected comorbidities.

Language

To gauge the relevance of research published in other languages, we located non-English literature for the time period of interest using our MEDLINE search strategy and identified 168 citations. Twenty-nine of these citations appeared potentially relevant on a title scan. We reviewed the abstracts of 28 of these, and none met our review criteria. We believed that the one study for which we could not locate an abstract would not substantially alter the findings of the review and excluded non-English studies.

In addition, we excluded studies that:

addressed pelvic pain related to cancer or pregnancy as the etiology of and treatment for these entities is significantly different from CPP related to other or unknown causes;
did not report information pertinent to the KQs;
were published prior to the year 1990 and the widespread use of laparoscopic techniques and introduction of medications such as serotonin reuptake inhibitors used to treat CPP; and
were not original research.

Screening of Studies

Once we identified articles through the electronic database searches, review articles, and bibliographies, we examined abstracts of articles to determine whether studies met our criteria. Two reviewers separately evaluated each abstract for inclusion or exclusion, using an Abstract Review Form (Appendix D). If one reviewer concluded that the article could be eligible for the review based on the abstract, we retained it for full text assessment.

Two reviewers independently assessed the full text of each included study using a standardized form (Appendix D) that included questions stemming from our inclusion/exclusion criteria. Disagreements between reviewers were resolved by a third-party adjudicator. The group of abstract and full text reviewers included expert clinicians (JA, SR, AY, FL) and health services researchers (RJ, NS).
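The two screening stages follow different decision rules, which can be sketched as follows. The function names are hypothetical; the sketch assumes each reviewer's judgment is recorded as a simple include/exclude value.

```python
from typing import Optional


def abstract_decision(reviewer_a: bool, reviewer_b: bool) -> bool:
    """Abstract stage: retain for full-text review if either reviewer
    judges that the article could be eligible."""
    return reviewer_a or reviewer_b


def full_text_decision(reviewer_a: bool, reviewer_b: bool,
                       adjudicator: Optional[bool] = None) -> bool:
    """Full-text stage: agreement stands; a disagreement is resolved by
    a third-party adjudicator."""
    if reviewer_a == reviewer_b:
        return reviewer_a
    if adjudicator is None:
        raise ValueError("reviewers disagree; an adjudicator decision is required")
    return adjudicator


# One reviewer's "include" is enough to advance an abstract...
print(abstract_decision(True, False))
# ...but a split at full text goes to the adjudicator.
print(full_text_decision(True, False, adjudicator=False))
```

The abstract-stage rule is deliberately inclusive, so that a study is dropped before full-text assessment only when both reviewers agree it is ineligible.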

Data Extraction and Data Management

The staff members and clinical experts who conducted this review jointly developed the evidence tables, which were used to extract data from the studies. We designed the tables to provide sufficient information to enable readers to understand the studies, including issues of study design, descriptions of the study populations (for applicability), description of the intervention, and baseline and outcome data on constructs of interest. Our outcomes of interest included:

Pain status (reduction in pain, pain recurrence, subsequent intervention for unresolved or worsening pain);
Functional status (activities of daily living, sexual functioning);
Quality of life;
Patient satisfaction with pain management; and
Harms or adverse effects of nonsurgical interventions.

The team abstracted several articles into evidence tables and then discussed the utility of the table design as a group. We repeated this process through several iterations until we decided that the tables included the appropriate categories for gathering the information contained in the articles. All team members shared the task of initially entering information into the evidence tables. Another member of the team also reviewed the articles and edited all initial table entries for accuracy, completeness, and consistency. The full research team met regularly during the article extraction period and discussed global issues related to the data extraction process.

Where available, we also captured data on potential risk factors related to CPP or conditions thought to occur commonly with CPP. These data included:

History of sexual or physical abuse;
History of pelvic surgery;
Pregnancy-related risk factors (e.g., history of Caesarean births, vaginal births, operative vaginal birth, genital tract trauma, pregnancy termination); and
History of comorbidities of interest (anxiety, depression, dysmenorrhea, fibromyalgia, headache, irritable bowel syndrome (IBS), interstitial cystitis/painful bladder syndrome (IC/PBS), low back pain, and sexual dysfunction).

This list of comorbidities represents conditions thought to occur frequently with CPP and was determined in consultation with our TEP.

The final evidence tables are presented in their entirety in Appendix C. Studies are presented in the evidence tables alphabetically by the last name of the first author within each year. When possible to identify, analyses resulting from the same study were grouped into a single evidence table.

Individual Study Quality Assessment

We used a components approach to assessing the quality of individual studies, following methods outlined in the EPC's Methods Guide for Effectiveness and Comparative Effectiveness Reviews.⁶⁹ Decision rules regarding application of the tools were developed a priori by the research team. We developed separate quality assessment approaches for randomized controlled trials (RCTs), observational studies, and studies addressing the prevalence of comorbidities. Two reviewers independently assessed each study, with disagreements between assessors resolved via a third adjudicator.

We assessed each domain described below individually and integrated them for an overall quality level as described in the Determining Quality Levels section. We assessed studies as having “met” or “not met” a criterion; where relevant, criteria could also be judged as not applicable (NA) to a study. For the final integration of the assessment of quality, 3 levels were possible: good, fair, and poor.

We describe the individual quality components below and report individual quality assessments for each study in Appendix E.

RCTs

We assessed quality factors recommended in the Evidence-based Practice Centers' (EPCs') Methods Guide for Effectiveness and Comparative Effectiveness Reviews and in the Cochrane Handbook.

Sequence generation. We assessed study randomization by considering the following questions:

Was the assignment randomized?
Was the method used to generate the sequence of randomization described and was it appropriate?

We considered the following elements in determining the appropriateness of a study's randomization methods: Were random techniques, such as computer-generated sequences or sequentially numbered opaque envelopes, used? Were technically nonrandom techniques, such as alternate days of the week, used?

Scoring. Studies providing a description of a truly random technique were assessed as “met” for this element.

Blinding. We considered four elements to assess blinding:

Was the allocation to study groups (and interventions) adequately concealed from patients/participants?
Was the allocation to study groups (and interventions) adequately concealed from investigators?
Was the allocation to study groups (and interventions) adequately concealed from clinical providers/caregivers?
Was the allocation to study groups (and interventions) adequately concealed from outcome assessors?

Scoring. We defined adequate concealment as reasonable attempts (e.g., non-investigators involved in allocation, appropriate sham treatments used, etc.) by investigators to conceal intervention allocation groups. We assessed these criteria as met if the study provided such evidence of blinding.

Incomplete outcome data addressed. We considered four elements to assess the completeness of outcomes data reporting:

Was complete information about participant flow provided, such as CONSORT diagram or equivalent information (numbers at random assignment; numbers receiving intended intervention; numbers completing protocol; and numbers analyzed for primary outcome, drop-out, lost to followup)?
Was an intention-to-treat analysis (analysis as assigned) conducted and reported appropriately?
Were incomplete/missing outcome data adequately reported?
Were missing outcome data managed by an accepted method?

Scoring. We considered the following to be acceptable methods of managing missing data: last observation carried forward; mean/median imputation; worst outcome imputation; or longitudinal regression imputation.

Selective outcome reporting. We assessed this domain using a single question: Was the primary outcome planned and described in the Methods section?

Scoring. Studies describing an a priori primary outcome determination were assessed as meeting this criterion.

Other bias. We assessed whether the study was largely free of other bias by considering the following elements: Was the trial stopped early for benefit? Was there an extreme baseline imbalance? Was there a conflict of interest that posed a substantive, important threat to the validity of the results?

Scoring. We scored studies as meeting this criterion if there was no evidence of such biases.

Sample size and power. We assessed this domain by determining whether an a priori sample size calculation was provided for the primary outcome.

Scoring. We scored studies as meeting this criterion if evidence of a sample size calculation was provided.

Statistical analysis. We considered the suitability of a study's analysis using the following questions:

Was the statistical analysis performed appropriate for the study design?
Were the statistical results reliable?

Scoring. We scored studies as having “met” these criteria if our judgment was that the statistical analysis and results were appropriate and reliable for the stated study design and outcome. A glaring inconsistency or statistical error would result in a score of “not met.”

Dropout proportion. We evaluated studies for this domain using the question: What proportion of enrolled participants assigned to an intervention declined to continue the assigned intervention?

Scoring. We considered studies with a dropout rate of less than or equal to 10 percent as having “met” this criterion. We assessed studies with a rate greater than 10 percent or an unreported rate as having “not met” the criterion.

Follow-up. We assessed the adequacy of follow-up by determining what proportion of enrolled population was present or accessible at the time of the primary followup.

Scoring. We considered studies with a rate of less than or equal to 20 percent loss as having “met” this criterion. Studies with greater than 20 percent loss or not reporting the percentage were scored as having “not met” this criterion.
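The dropout and follow-up scoring rules above share the same shape: a reported percentage at or below a cutoff is scored as “met,” and a higher or unreported percentage is scored as “not met.” A minimal sketch, with hypothetical function names and with an unreported value represented as None:

```python
from typing import Optional


def score_threshold(rate_percent: Optional[float], max_percent: float) -> str:
    """Score a criterion as 'met' only when a rate is reported and does not
    exceed the cutoff; unreported rates are scored 'not met'."""
    if rate_percent is None:
        return "not met"
    return "met" if rate_percent <= max_percent else "not met"


def score_dropout(dropout_percent: Optional[float]) -> str:
    # Cutoff of 10 percent, as stated for the dropout proportion domain.
    return score_threshold(dropout_percent, 10.0)


def score_followup_loss(loss_percent: Optional[float]) -> str:
    # Cutoff of 20 percent loss, as stated for the follow-up domain.
    return score_threshold(loss_percent, 20.0)


print(score_dropout(10.0))         # at the cutoff
print(score_dropout(None))         # unreported
print(score_followup_loss(25.0))   # above the cutoff
```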

Observational Studies

For observational studies we considered these domains: (1) the selection of the study groups; (2) the comparability of the study groups; (3) the ascertainment and measurement of either the exposure/intervention or outcome of interest for case-control or cohort studies, respectively; (4) avoidance of detection bias; and (5) methods for limiting bias and confounding.

For example, for a cohort study, the fundamental criteria included: representativeness of cohort, selection of nonexposed cohort, ascertainment of exposure, outcome of interest, comparability of cohorts, assessment of outcome, adequate duration of follow-up, and adequate follow-up of cohort. Other sources of bias would include baseline imbalances, source of funding, early stopping for benefit, and appropriateness of crossover design.

Selection of participants in study groups. We considered three elements to evaluate a study's risk of bias in the selection of study group participants:

Were the characteristics of the participants/patients included in the study groups clearly described?
Were the inclusion and/or exclusion criteria described?
Were the criteria applied equally to all groups?

Scoring. We scored studies as having “met” this criterion if related data were provided.

Comparability of the study groups. We used the following questions to assess this domain:

Was there an assessment of baseline comparability, with regard to confounders (disease status, risk factors, prognostic factors, case-mix adjustment) for the most important factors (attempts to balance the groups by design), and did this demonstrate comparability?
Were concurrent controls used?

Scoring. We scored studies as having met these criteria if related data were provided.

Intervention description. We used the following questions to assess this domain:

Was there a clear definition of the intervention?
Was the measurement method of the intervention standard, valid, and reliable?

We considered the following elements in making a determination about these questions: Did all participants receive the same intervention? Were the interventions performed by the same person? Was the intervention measured equally in all study groups?

Scoring. Studies could be assessed as having “met” or “not met” these criteria. For question 2, we scored studies of pharmacologic interventions as NA.

Outcomes. We evaluated a study's measurement of outcomes using the questions:

Was the method of outcome assessment standard, valid, and reliable?
Was the follow-up duration long enough (≥12 weeks) for the outcomes to occur?

We considered whether references for measurement instruments were provided and whether authors indicated testing of an instrument in making determinations about these questions.