NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Smith GCS, Moraitis AA, Wastlund D, et al. Universal late pregnancy ultrasound screening to predict adverse outcomes in nulliparous women: a systematic review and cost-effectiveness analysis. Southampton (UK): NIHR Journals Library; 2021 Feb. (Health Technology Assessment, No. 25.15.)


Universal late pregnancy ultrasound screening to predict adverse outcomes in nulliparous women: a systematic review and cost-effectiveness analysis.


Chapter 13: Designing a randomised controlled trial of screening and intervention

Implications of the health economic analysis

The economic analysis demonstrated that, on average, the most cost-effective approach was to screen all nulliparous pregnant women with a presentation-only scan. However, the probability that this approach was the most cost-effective was only 44%, and a scan that included fetal biometry had a ≈ 39% chance of being the most cost-effective. Moreover, if the time horizon was extended, it became likely that such a scan in late pregnancy would be the most cost-effective approach. These observations indicate that implementing such a scan could be considered. However, one of the major obstacles to implementing such a policy is that there is no direct evidence from a RCT that this screening and intervention is clinically effective. The Cochrane review of universal late-pregnancy ultrasound failed to show any benefit to the mother or infant.21 However, as discussed in the introduction, this review has a number of methodological issues, and it is more accurate to state that it does not provide a definitive answer to the question of whether or not universal late-pregnancy ultrasound reduces the risk of perinatal death.
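Probabilities such as "a 44% chance of being the most cost-effective" typically come from a probabilistic sensitivity analysis (PSA): the model is run many times with parameters drawn from their uncertainty distributions, and each strategy's probability of being optimal is the share of runs in which it has the highest net monetary benefit. The Python sketch below shows that calculation. The three strategy labels, the willingness-to-pay threshold and all simulated costs and QALYs are hypothetical placeholders, not outputs of the model described in this report.

```python
import numpy as np

WTP_PER_QALY = 20_000  # assumed willingness-to-pay threshold (£ per QALY), hypothetical
STRATEGIES = ["no scan", "presentation-only scan", "scan with fetal biometry"]


def prob_most_cost_effective(qalys: np.ndarray, costs: np.ndarray, wtp: float) -> np.ndarray:
    """Share of PSA runs in which each strategy has the highest net monetary benefit.

    qalys and costs both have shape (n_runs, n_strategies).
    """
    nmb = wtp * qalys - costs                 # net monetary benefit per run and strategy
    best = np.argmax(nmb, axis=1)             # index of the optimal strategy in each run
    counts = np.bincount(best, minlength=nmb.shape[1])
    return counts / nmb.shape[0]


# Hypothetical PSA draws: incremental QALYs and costs per woman, relative to no scan
rng = np.random.default_rng(seed=1)
n_runs = 10_000
qalys = np.column_stack([
    np.zeros(n_runs),
    rng.normal(0.004, 0.003, n_runs),
    rng.normal(0.005, 0.004, n_runs),
])
costs = np.column_stack([
    np.zeros(n_runs),
    rng.normal(40.0, 10.0, n_runs),
    rng.normal(75.0, 15.0, n_runs),
])

for name, p in zip(STRATEGIES, prob_most_cost_effective(qalys, costs, WTP_PER_QALY)):
    print(f"{name}: {p:.2f}")
```

Repeating the calculation over a range of thresholds traces out a cost-effectiveness acceptability curve.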

Interestingly, the VOI analysis identified reducing uncertainty about the costs of IOL as the priority for further research. Given the above, this may be regarded as somewhat counterintuitive. However, the parameters used in the VOI analysis in relation to the screening performance of ultrasound and the effect of intervention were known with sufficient precision that reducing their uncertainty was not the most cost-effective research question. For example, the ability of ultrasound to predict SGA, the relationship between SGA birthweight and the risk of stillbirth, and the ability of IOL to reduce the risk of stillbirth are all known quite precisely and are based on high-quality data. Consequently, even though there is no direct evidence that universal late-pregnancy ultrasound would reduce the risk of stillbirth, the model estimates quite a high chance that it is the most cost-effective approach, and the VOI analysis does not highlight reducing the uncertainty in these parameters. By contrast, previous health economic analyses of IOL have generated quite wide CIs,176,194 and hence the model has identified reducing this uncertainty as the key question.
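The same PSA output can be used to compute the overall expected value of perfect information (EVPI): the expected net benefit if the decision could always be made knowing the true parameter values, minus the expected net benefit of the strategy that is best on average. The Python sketch below assumes a net-benefit matrix of the kind produced by a PSA and uses illustrative numbers only. Identifying which parameters drive that value (here, the costs of IOL) requires partial, parameter-specific EVPI methods that are not shown.

```python
import numpy as np


def evpi(net_benefit: np.ndarray) -> float:
    """Per-person expected value of perfect information from PSA net benefits.

    net_benefit has shape (n_runs, n_strategies).
    """
    nb = np.asarray(net_benefit, dtype=float)
    value_with_perfect_info = nb.max(axis=1).mean()  # pick the best strategy in every run
    value_with_current_info = nb.mean(axis=0).max()  # pick one strategy from the averages
    return float(value_with_perfect_info - value_with_current_info)


# Illustrative: strategy A is better on average (mean 1.0 vs 0.5),
# but strategy B is better in one of the two runs
example = np.array([
    [0.0, 1.0],
    [2.0, 0.0],
])
print(evpi(example))  # 0.5
```

Multiplying the per-person EVPI by the number of women affected over the lifetime of the decision gives the population EVPI, an upper bound on the value of further research.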

Case for considering a randomised controlled trial of screening and intervention

In this chapter we consider the practicalities of designing a RCT of screening and intervention using fetal biometry in nulliparous women at 36 weeks’ gestation. We have done this because, even though the parameters in the modelling were reasonably certain, these parameters were calculated from a range of different study designs (i.e. we did not perform the VOI analysis based on the uncertainty of parameters calculated from a large RCT of late pregnancy screening and intervention in nulliparous women). Rather, we performed the analysis using parameters from a range of observational studies and a range of studies of interventions in women who were deemed to be high risk for other reasons. The concern in this case is external validity. The parameters may be reasonably certain in relation to the setting where they were derived but there is an unquantifiable uncertainty in relation to how well they inform our research question. The obvious way to address this would be to perform a study in the setting of interest. Such a study could be the definitive study or it could be a pilot or a proof-of-principle study. The former might be a trial of screening compared with not screening, with perinatal death as the primary outcome. The latter might exploit alternative study designs and use of proxies. Hence, there are a number of important considerations to take into account when designing a RCT of screening and intervention using universal ultrasound, and we will consider each of these in turn.

Candidate primary outcomes

In relation to the primary outcome of a RCT, we believe that the strongest case can be made for perinatal death. First, losing an infant at term is clearly a devastating outcome for a family. In the absence of a lethal anomaly, preventing death would lead to an entire life gained which, from a health-care and health economic perspective, is a gain of unique magnitude. Second, the main intervention available is earlier delivery. There is strong evidence that IOL is effective in reducing the risk of perinatal death. Over two-thirds of perinatal deaths at term are antepartum stillbirths54 (i.e. intrauterine fetal death prior to the onset of labour). Self-evidently, antepartum stillbirth cannot occur after an infant has been delivered.17 Delivery at or after 38–39 weeks’ gestation carries the same risk of intrapartum stillbirth and neonatal death as delivery at a later week of gestation.17,198 These epidemiological observations underlie the 67% reduction in the risk of perinatal death associated with IOL at term.16

Proxies

The main problem with a primary outcome of perinatal death is that the outcome is uncommon, which creates major problems of statistical power. Indicators of perinatal morbidity could be used as an alternative outcome. First, as the same factors might be involved in death and morbidity, morbidity could be used as a proxy for death. Second, perinatal morbidity is important in its own right. For example, birth asphyxia is one of the major determinants of the burden of litigation in the health service because of its devastating effects on the later health of the child, such as CP. There is evidence to support the use of a single indicator in both roles: an Apgar score of < 4 at 5 minutes was associated with a relative risk of ≈ 360 for early neonatal death174 and of > 400 for CP.173 Hence, a primary outcome based on perinatal morbidity, such as an Apgar score of < 4, could be clinically important both as a proxy for death and as a determinant of long-term outcome. Morbidity could also be a more pragmatic outcome, as rates of severe morbidity are much greater than the risk of death, making it easier to design a trial with morbidity as the primary outcome.

Subgroups

A further refinement of the primary outcome is to restrict it to events in infants who were actually born SGA or LGA. Self-evidently, screening for SGA or LGA will primarily have an impact on outcomes related to fetal growth disorder. Many adverse perinatal outcomes, both lethal and non-lethal, are unrelated to fetal growth abnormalities. Consequently, if a screening study of fetal biometry has a primary outcome that includes infants across the full range of birthweight, most of the primary outcome events in both arms of the trial will be unrelated to fetal growth disorder and therefore cannot be prevented by screening for fetal growth disorder and intervention. This means that the potential for screening to have an impact on the rate of death is limited, and extremely large sample sizes would be required. For example, around one-third of perinatal deaths at term are related to being SGA or LGA.54 The background rate of perinatal death at term is ≈ 2 per 1000. Even if a screening test were perfect (i.e. detected all cases of growth disorder) and the intervention were perfect (i.e. prevented all such deaths), a power calculation still indicates that > 100,000 women would have to be recruited to the trial. However, if the primary outcome was perinatal death of a SGA or LGA infant, the sample size would be ≈ 22,000 (note that this figure is for illustration only and is not a practical proposition, as the screening and intervention characteristics were assumed to be perfect). An analogy might be a trial of breast cancer screening. Screening reduces deaths related to breast cancer but does not reduce all-cause mortality.199 This is likely to be explained by the fact that no study could be sufficiently powered to detect an effect of screening for breast cancer on all-cause mortality, because most deaths are due to other causes.
Consequently, one approach to addressing the problem of statistical power in trials of screening using fetal biometry would be to define primary outcomes related to fetal growth abnormalities. An insistence on evidence of a reduction in all-cause perinatal death would simply rule out the implementation of screening and intervention, which could lead to harm that might otherwise have been prevented in a cost-effective way.
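The arithmetic behind this illustration can be approximated with the standard normal-approximation formula for comparing two proportions, using the two-sided 5% significance level and 90% power stated for Table 17. This is an assumption: the report does not say which formula or power was used for this particular example, so the sketch's totals need not match the figures quoted. With these settings it gives well over 100,000 women for all-cause perinatal death, and a total several times smaller when the outcome is restricted to deaths of SGA or LGA infants.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_arm(p_control: float, p_treated: float, alpha: float = 0.05, power: float = 0.90) -> int:
    """Women per arm to detect p_control vs p_treated (normal approximation, two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_treated) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_control * (1 - p_control) + p_treated * (1 - p_treated))
    ) ** 2
    return ceil(numerator / (p_control - p_treated) ** 2)


BACKGROUND = 2 / 1000              # perinatal death at term (text: ≈ 2 per 1000)
GROWTH_RELATED = BACKGROUND / 3    # about one-third linked to SGA or LGA

# Perfect test and perfect intervention: every growth-related death is prevented
all_cause_total = 2 * n_per_arm(BACKGROUND, BACKGROUND - GROWTH_RELATED)
restricted_total = 2 * n_per_arm(GROWTH_RELATED, 0.0)

print(f"all-cause perinatal death: {all_cause_total:,} women")
print(f"death of an SGA/LGA infant: {restricted_total:,} women")
```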

Early delivery and iatrogenic harm

Routine induction at term had less dramatic effects on neonatal morbidity, with a 12% reduction in the risk of NICU admission and a 30% reduction in the risk of a low Apgar score. Moreover, these effects may be lost or even reversed with early-term IOL. Most trials in the Cochrane review of term induction studied pregnancies at 41 weeks’ gestation and beyond.16 As post-term pregnancy is associated with an increased risk of neonatal morbidity, preventing it should improve immediate neonatal outcomes as well as preventing stillbirth. For IOL at < 39 weeks’ gestation, epidemiological data indicate that the intervention may actually increase neonatal morbidity.160 The potential for earlier intervention to cause harm is increasingly recognised. The Awareness of fetal movements and care package to reduce fetal mortality (AFFIRM) study200 was a stepped-wedge RCT of a programme to inform women about reduced fetal movements and to standardise intervention. Although it did not show a significant reduction in stillbirth, the intervention was associated with increased risks of neonatal morbidity.200 This trial has some parallels with the current question. Even though women were selected on the basis of having a risk factor (reduced fetal movements, which is associated with stillbirth), the trial still failed to demonstrate a reduction in stillbirth rates, and the programme was associated with increased rates of obstetric intervention and adverse outcomes. The results of the trial underline two key issues: (1) the need for better predictors of adverse outcome and (2) the potential for intervention to cause harm.

Current status of screening tests

Unfortunately, our systematic reviews of diagnostic effectiveness and a Cochrane DTA review23 failed to identify any ultrasonic marker that was clearly predictive of the risk of stillbirth when women are scanned in late pregnancy. Moreover, if neonatal morbidity is regarded as a proxy for stillbirth, the tests again performed very poorly. Finally, an actual birthweight < 3rd percentile was associated with a 0.9–1% risk of perinatal death at term, compared with a background risk of just over 0.2%.54 Hence, even knowing that the actual birthweight was < 3rd percentile would correspond to a positive LR of only between 4 and 5. In the POP study, of 562 women whose scan indicated that their infant was SGA, only 12% delivered an infant with a birthweight < 3rd percentile, a further 23% delivered an infant ≥ 3rd and < 10th percentile, and about two-thirds delivered an infant ≥ 10th percentile. Hence, given the association between the EFW and the actual birthweight, and the relationship between the actual birthweight and the risk of stillbirth, it is highly unlikely that detecting a SGA infant is strongly predictive of the risk of stillbirth. Given this lack of information, we modelled outcomes with variable incidence and assessed different screening test values to establish what characteristics a test would need to make a trial of screening and intervention feasible.
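The step from absolute risks to a likelihood ratio uses Bayes' theorem in odds form: post-test odds equal pre-test odds multiplied by the LR. A minimal Python sketch, taking the background risk as exactly 0.2% (the text says "just over 0.2%"):

```python
def likelihood_ratio(pre_test_risk: float, post_test_risk: float) -> float:
    """LR implied by moving from a pre-test to a post-test probability (odds form of Bayes' theorem)."""
    pre_odds = pre_test_risk / (1 - pre_test_risk)
    post_odds = post_test_risk / (1 - post_test_risk)
    return post_odds / pre_odds


BACKGROUND_RISK = 0.002  # approximation of "just over 0.2%"
for risk_below_3rd_centile in (0.009, 0.010):
    lr = likelihood_ratio(BACKGROUND_RISK, risk_below_3rd_centile)
    print(f"risk {risk_below_3rd_centile:.1%}: LR+ ≈ {lr:.2f}")
```

This gives LRs of about 4.5 and 5.0. Because the true background risk is slightly above 0.2%, the implied LRs are slightly lower, consistent with the range of 4 to 5 given in the text.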

Possible trial designs

Broadly speaking, there are two main approaches to trial design (Figure 24).32 In the first (hereinafter referred to as ‘screen vs. no screen’), women might be randomised either (1) to be screened, with the offer of intervention if they screen positive, or (2) to receive routine care, which currently involves scanning only when there is a conventional clinical indication. The result of this design is a simple comparison between the two groups. In the event of a negative result, however, it is impossible to determine whether this was because the screening test did not work or because the intervention did not mitigate the higher risk in screen-positive women. The second approach, hereafter referred to as ‘screen all’, is to screen the whole population and randomise high-risk women to an intervention or to routine care (masking the result in the latter group). The advantages of the second approach are that substantially fewer women need to be recruited and that the same trial can assess both the diagnostic effectiveness of the screening test and the clinical effectiveness of the intervention. The two approaches are illustrated in Figure 24.

FIGURE 24. Flow charts of possible trial designs: (a) screen vs. no screen; and (b) screen all.

Acceptability of the ‘screen-all’ approach

When discussing the possibility of randomising women with a high-risk screening result, some of the co-applicants expressed concerns. Interestingly, however, when we surveyed pregnant women, they actually preferred a study design in which all participants were scanned. In the focus group, women tended to be more concerned about being offered interventions. These observations underline the different perspectives of pregnant women and professionals. We envisaged that women recruited to a ‘screen all’ trial would have some information revealed irrespective of their randomisation status. For example, we do not feel that it would be practical or ethical to withhold whether the presentation of the infant is cephalic or non-cephalic. Hence, presentation would probably be revealed in a ‘screen all’ trial design. In the POP study, although scans were blinded, breech presentation was revealed. In subsequent interviews, participants whose infant was breech were highly positive about this element of the study [Dacey 2015; www.repository.cam.ac.uk/handle/1810/280595 (accessed June 2019)]. However, a drawback of a ‘screen all’ design that reveals breech presentation is that it would not capture the health benefits of detecting breech presentation. Other findings that should be considered for disclosure are previously undiagnosed major congenital anomalies and placenta praevia. In the POP study, there were no cases of placenta praevia, but two patients had major anomalies diagnosed for whom revealing the result optimised care and, in one case (unilateral hydrothorax with severe mediastinal shift), probably prevented intrauterine fetal demise.

Power calculations

To determine the feasibility of a RCT, we performed power calculations for the two study designs described above. The sample size calculations are presented in Table 17. All power calculations were performed for a two-sided significance level of 0.05 with 90% power to detect the effect. We selected a range of possible primary outcomes: perinatal death, severe neonatal morbidity, any neonatal morbidity and delivery of a SGA infant with complications. In relation to perinatal death, we found no adequately powered studies of the diagnostic effectiveness of ultrasound to predict this outcome, and the Cochrane DTA review23 of SGA also found no data in relation to this question. Therefore, we modelled a series of possible screening performances, varying the screen-positive rate and positive LR. In relation to morbidity, we used two publications reporting data from the POP study, in The Lancet8 and The Lancet Child & Adolescent Health.149 As described above, the POP study was one of only two studies (the Perinatal Ireland Genesis study being the other) that performed blinded ultrasound scanning in late gestation in nulliparous women. Unfortunately, the Genesis study did not report the association between SGA and morbidity, and its only publication in relation to LGA is an abstract addressing shoulder dystocia. The two POP study publications8,149 address the relationship between morbidity and SGA, SGA combined with reduced growth velocity (which was the best-performing predictor of morbidity from a range of candidate predictors of FGR) and the Delphi consensus definition of late FGR.
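Modelling a screening test from its screen-positive rate and positive LR requires converting those two quantities into a sensitivity, a false-positive rate (FPR) and the outcome risk in each screen group, for a given outcome incidence. If s is the screen-positive rate, π the incidence and LR+ = sensitivity / FPR, then s = sensitivity × π + FPR × (1 − π), which fixes both rates. The Python sketch below implements that algebra; it is one reasonable way to parameterise a test and is not necessarily the exact method used for Table 17. The function name and the example inputs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScreenPerformance:
    sensitivity: float
    false_positive_rate: float
    risk_if_positive: float   # outcome risk in screen-positive women
    risk_if_negative: float   # outcome risk in screen-negative women


def performance_from_spr_lr(incidence: float, screen_positive_rate: float, lr_positive: float) -> ScreenPerformance:
    """Derive test characteristics from incidence, screen-positive rate and positive LR."""
    if not (0 < incidence < 1 and 0 < screen_positive_rate < 1 and lr_positive > 0):
        raise ValueError("incidence and screen-positive rate must lie in (0, 1); LR+ must be positive")
    # screen_positive_rate = sens * incidence + fpr * (1 - incidence), with sens = lr_positive * fpr
    fpr = screen_positive_rate / (lr_positive * incidence + 1 - incidence)
    sens = lr_positive * fpr
    if sens > 1 or fpr > 1:
        raise ValueError("this screen-positive rate and LR+ are incompatible at this incidence")
    return ScreenPerformance(
        sensitivity=sens,
        false_positive_rate=fpr,
        risk_if_positive=sens * incidence / screen_positive_rate,
        risk_if_negative=(1 - sens) * incidence / (1 - screen_positive_rate),
    )


# Hypothetical inputs: incidence 2 per 1000, 10% screen positive, LR+ of 5
perf = performance_from_spr_lr(incidence=0.002, screen_positive_rate=0.10, lr_positive=5.0)
print(perf)
```

The check on sensitivity matters in practice: a high screen-positive rate combined with a high LR+ can be arithmetically impossible for a rare outcome.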

TABLE 17. Sample size calculations for different outcomes, screening tests and trial designs

In all of these calculations we assumed that the intervention would reduce the risk of the given event by 50%. Given the lack of data, a range of figures could be considered. We used this figure as we felt that it was conservative in relation to perinatal death. It could be argued, based on the discussion above, that it is optimistic in relation to neonatal morbidity. However, by concentrating the outcome of morbidity on infants that are actually SGA, it is plausible that the combined effect of making the diagnosis and intervening could substantially reduce the rate of adverse events. It should be borne in mind that in the relevant RCT, DIGITAT,99 randomisation occurred after ultrasound scanning led to suspicion of SGA. Hence, the group randomised to expectant management would still have received enhanced monitoring and high-risk care during labour as the infant was known to be SGA. By contrast, routine care in a trial of screening means that neither antenatal nor intrapartum care is tailored to the suspected SGA status of the fetus.
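Combining the 50% risk reduction assumed above with a test described by its screen-positive rate and LR+ shows why the two designs in Figure 24 differ so much in size. In a ‘screen all’ trial only screen-positive women are randomised, so the comparison is between the risk in screen-positive women with and without intervention, and every woman recruited is scanned. In a ‘screen versus no screen’ trial the effect is diluted across all women, because only the detected share of events can be prevented. The Python sketch below is a simplified model under stated assumptions (the intervention acts only in screen-positive women who will have the event; scans in the routine-care arm are not counted); it is not a reconstruction of Table 17, and the function names and inputs are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_arm(p1: float, p2: float, alpha: float = 0.05, power: float = 0.90) -> int:
    """Normal-approximation sample size per arm for two proportions (two-sided test)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    num = (z_a * sqrt(2 * p_bar * (1 - p_bar)) + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(num / (p1 - p2) ** 2)


def compare_designs(incidence: float, spr: float, lr_pos: float, rrr: float = 0.5) -> dict:
    """Women randomised and scanned under each design (hypothetical simplified model)."""
    fpr = spr / (lr_pos * incidence + 1 - incidence)
    sensitivity = lr_pos * fpr
    risk_screen_pos = sensitivity * incidence / spr

    # 'Screen all': everyone recruited is scanned; only screen-positive women are randomised
    n_all = n_per_arm(risk_screen_pos, risk_screen_pos * (1 - rrr))
    screen_all = {"randomised": 2 * n_all, "scanned": ceil(2 * n_all / spr)}

    # 'Screen vs no screen': all women randomised; only the screened arm is scanned,
    # and only the detected share of events (sensitivity) can be reduced
    n_vs = n_per_arm(incidence, incidence * (1 - rrr * sensitivity))
    screen_vs_none = {"randomised": 2 * n_vs, "scanned": n_vs}

    return {"screen all": screen_all, "screen vs no screen": screen_vs_none}


# Hypothetical morbidity outcome: 2% incidence, 10% screen positive, LR+ of 3
for design, counts in compare_designs(0.02, 0.10, 3.0).items():
    print(design, counts)
```

Varying the inputs shows how the numbers randomised and scanned under each design depend on the outcome incidence, the screen-positive rate and LR+.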

Implications of sample size calculations

We present the data on sample size calculations, but we are not recommending a specific trial design. A trial might also be considered in which the combination of screening parameters, intervention effect and outcome is not listed in Table 17. The exact design of the trial would depend on the resources available and the research question. We do, however, discuss some of the issues that may motivate a choice.

We believe that the calculations above rule out a trial based on either perinatal death or severe neonatal morbidity, as the sample size required is so great that the trial may not be feasible and would inevitably be extremely expensive. Whether the screening test is simply for SGA or uses one of the FGR indicators will depend on the trade-off between labelling much larger numbers of women as screen positive and the sample size required. In all calculations, the screen-positive rate was higher for SGA, but the sample size was smaller.

Whether a ‘screen versus no screen’ or a ‘screen all’ approach is used will depend on the information required and on the screening test evaluated. A problem with the ‘screen all’ approach is that it would not reflect the real-world comparison between screening and not screening. It would also not capture the health benefits of diagnosing non-cephalic presentation at 36 weeks’ gestation. However, it would provide more information for the evidence base, as it would allow the performance of the screening test and the effect of the intervention to be quantified separately. Finally, the ‘complicated SGA’ outcome is delivery of a small infant where either the mother experiences pre-eclampsia or the infant experiences morbidity. This outcome has the attraction of focusing on the cases most likely to reflect true FGR, and it is perhaps in this group that the intervention is most likely to yield a positive result. However, a primary outcome that includes morbidity in all infants may be preferred if the priority is to determine the overall effect of screening and intervention. It is also worth noting that, for the ‘complicated SGA’ outcome, the ‘screen all’ design would actually involve performing more scans than the ‘screen versus no screen’ design if the screening test was simple SGA or the Delphi consensus definition of FGR.

Copyright © 2021 Smith et al. This work was produced by Smith et al. under the terms of a commissioning contract issued by the Secretary of State for Health and Social Care. This is an Open Access publication distributed under the terms of the Creative Commons Attribution CC BY 4.0 licence, which permits unrestricted use, distribution, reproduction and adaption in any medium and for any purpose provided that it is properly attributed. See: https://creativecommons.org/licenses/by/4.0/. For attribution the title, original author(s), the publication source – NIHR Journals Library, and the DOI of the publication must be cited.
Bookshelf ID: NBK568291
