Evaluation of Combined Artificial Intelligence and Radiologist Assessment to Interpret Screening Mammograms

Thomas Schaffter; Diana S M Buist; Christoph I Lee; Yaroslav Nikulin; Dezso Ribli; Yuanfang Guan; William Lotter; Zequn Jie; Hao Du; Sijia Wang; Jiashi Feng; Mengling Feng; Hyo-Eun Kim; Francisco Albiol; Alberto Albiol; Stephen Morrell; Zbigniew Wojna; Mehmet Eren Ahsen; Umar Asif; Antonio Jimeno Yepes; Shivanthan Yohanandan; Simona Rabinovici-Cohen; Darvin Yi; Bruce Hoff; Thomas Yu; Elias Chaibub Neto; Daniel L Rubin; Peter Lindholm; Laurie R Margolies; Russell Bailey McBride; Joseph H Rothstein; Weiva Sieh; Rami Ben-Ari; Stefan Harrer; Andrew Trister; Stephen Friend; Thea Norman; Berkman Sahiner; Fredrik Strand; Justin Guinney; Gustavo Stolovitzky; and the DM DREAM Consortium; Lester Mackey; Joyce Cahoon; Li Shen; Jae Ho Sohn; Hari Trivedi; Yiqiu Shen; Ljubomir Buturovic; Jose Costa Pereira; Jaime S Cardoso; Eduardo Castro; Karl Trygve Kalleberg; Obioma Pelka; Imane Nedjar; Krzysztof J Geras; Felix Nensa; Ethan Goan; Sven Koitka; Luis Caballero; David D Cox; Pavitra Krishnaswamy; Gaurav Pandey; Christoph M Friedrich; Dimitri Perrin; Clinton Fookes; Bibo Shi; Gerard Cardoso Negrie; Michael Kawczynski; Kyunghyun Cho; Can Son Khoo; Joseph Y Lo; A Gregory Sorensen; Hwejin Jung

doi:10.1001/jamanetworkopen.2020.0265

JAMA Netw Open. 2020 Mar 2;3(3):e200265. doi: 10.1001/jamanetworkopen.2020.0265.

Authors

Thomas Schaffter¹, Diana S M Buist², Christoph I Lee³, Yaroslav Nikulin⁴, Dezso Ribli⁵, Yuanfang Guan⁶, William Lotter⁷, Zequn Jie⁸, Hao Du⁹, Sijia Wang¹⁰, Jiashi Feng¹¹, Mengling Feng¹², Hyo-Eun Kim¹³, Francisco Albiol¹⁴, Alberto Albiol¹⁵, Stephen Morrell¹⁶, Zbigniew Wojna¹⁷, Mehmet Eren Ahsen¹⁸, Umar Asif¹⁹, Antonio Jimeno Yepes¹⁹, Shivanthan Yohanandan¹⁹, Simona Rabinovici-Cohen²⁰, Darvin Yi²¹, Bruce Hoff¹, Thomas Yu¹, Elias Chaibub Neto¹, Daniel L Rubin²², Peter Lindholm²³, Laurie R Margolies²⁴, Russell Bailey McBride²⁵, Joseph H Rothstein²⁶, Weiva Sieh²⁷, Rami Ben-Ari²⁰, Stefan Harrer¹⁹, Andrew Trister²⁸, Stephen Friend¹, Thea Norman²⁹, Berkman Sahiner³⁰, Fredrik Strand³¹,³², Justin Guinney¹, Gustavo Stolovitzky³³; and the DM DREAM Consortium; Lester Mackey³⁴, Joyce Cahoon³⁵, Li Shen³⁶, Jae Ho Sohn³⁷, Hari Trivedi³⁸, Yiqiu Shen³⁹, Ljubomir Buturovic⁴⁰, Jose Costa Pereira⁴¹, Jaime S Cardoso⁴¹, Eduardo Castro⁴¹, Karl Trygve Kalleberg⁴², Obioma Pelka⁴³,⁴⁴, Imane Nedjar⁴⁵, Krzysztof J Geras⁴⁶, Felix Nensa⁴⁴, Ethan Goan⁴⁷, Sven Koitka⁴³,⁴⁶, Luis Caballero¹⁴, David D Cox⁴⁸, Pavitra Krishnaswamy⁴⁹, Gaurav Pandey²⁶,⁵⁰, Christoph M Friedrich⁴³, Dimitri Perrin⁴⁷, Clinton Fookes⁴⁷, Bibo Shi⁵¹, Gerard Cardoso Negrie⁵², Michael Kawczynski⁵³, Kyunghyun Cho³⁹, Can Son Khoo⁵⁴, Joseph Y Lo⁵⁵, A Gregory Sorensen⁷, Hwejin Jung⁵⁶

Affiliations

¹ Computational Oncology, Sage Bionetworks, Seattle, Washington.
² Kaiser Permanente Washington Health Research Institute, Seattle, Washington.
³ University of Washington School of Medicine, Seattle.
⁴ Therapixel, Paris, France.
⁵ Department of Physics of Complex Systems, ELTE Eötvös Loránd University, Budapest, Hungary.
⁶ Department of Computational Medicine and Bioinformatics, Michigan Medicine, University of Michigan, Ann Arbor.
⁷ DeepHealth Inc, Cambridge, Massachusetts.
⁸ Tencent AI Lab, Shenzhen, China.
⁹ National University of Singapore, Singapore.
¹⁰ Integrated Health Information Systems Pte Ltd, Singapore.
¹¹ Department of Electrical and Computer Engineering, National University of Singapore, Singapore.
¹² National University Health System, Singapore.
¹³ Lunit Inc, Seoul, Korea.
¹⁴ Instituto de Física Corpuscular (IFIC), CSIC-Universitat de València, Valencia, Spain.
¹⁵ Universitat Politecnica de Valencia, Valencia, Valenciana, Spain.
¹⁶ Centre for Medical Image Computing, University College London, Bloomsbury, London, United Kingdom.
¹⁷ Tensorflight Inc, Mountain View, California.
¹⁸ University of Illinois at Urbana-Champaign, Urbana.
¹⁹ IBM Research Australia, Melbourne, Australia.
²⁰ IBM Research Haifa, Haifa University Campus, Mount Carmel, Haifa, Israel.
²¹ Stanford University, Stanford, California.
²² Department of Biomedical Data Science, Radiology, and Medicine (Biomedical Informatics), Stanford University, Stanford, California.
²³ Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden.
²⁴ Department of Diagnostic, Molecular and Interventional Radiology, Icahn School of Medicine at Mount Sinai, New York, New York.
²⁵ Department of Pathology, Molecular and Cell-Based Medicine, Icahn School of Medicine at Mount Sinai, New York, New York.
²⁶ Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York.
²⁷ Department of Population Health Science and Policy, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York.
²⁸ Fred Hutchinson Cancer Research Center, Seattle, Washington.
²⁹ Bill and Melinda Gates Foundation, Seattle, Washington.
³⁰ Center for Devices and Radiological Health, Food and Drug Administration, Silver Spring, Maryland.
³¹ Department of Oncology-Pathology, Karolinska Institutet, Stockholm, Sweden.
³² Breast Radiology, Karolinska University Hospital, Stockholm, Sweden.
³³ IBM Research, Translational Systems Biology and Nanobiotechnology, Thomas J. Watson Research Center, Yorktown Heights, New York.
³⁴ Microsoft New England Research and Development Center, Cambridge, Massachusetts.
³⁵ North Carolina State University, Raleigh.
³⁶ Icahn School of Medicine at Mount Sinai, New York, New York.
³⁷ Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco.
³⁸ Emory University, Atlanta, Georgia.
³⁹ New York University, New York.
⁴⁰ Clinical Persona, East Palo Alto, California.
⁴¹ Institute for Systems and Computer Engineering, Technology and Science, Porto, Portugal.
⁴² KolibriFX, Oslo, Norway.
⁴³ Department of Computer Science, University of Applied Sciences and Arts, Dortmund, Germany.
⁴⁴ Department of Diagnostic and Interventional Radiology and Neuroradiology, University Hospital Essen, Essen, Germany.
⁴⁵ Biomedical Engineering Laboratory Tlemcen University, Tlemcen, Algeria.
⁴⁶ Department of Radiology, NYU School of Medicine, New York, New York.
⁴⁷ Queensland University of Technology, Brisbane, Australia.
⁴⁸ MIT-IBM Watson AI Lab, IBM Research, Cambridge, Massachusetts.
⁴⁹ Institute for Infocomm Research, A*STAR, Singapore.
⁵⁰ Icahn Institute for Data Science and Genomic Technology, New York, New York.
⁵¹ Carl E. Ravin Advanced Imaging Laboratories, Department of Radiology, Duke University School of Medicine, Durham, North Carolina.
⁵² Satalia, London, United Kingdom.
⁵³ Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco.
⁵⁴ University College London, London, United Kingdom.
⁵⁵ Department of Radiology, Duke University School of Medicine, Durham, North Carolina.
⁵⁶ Korea University, Seoul, Korea.

Abstract

Importance: Mammography screening currently relies on subjective human interpretation. Artificial intelligence (AI) advances could be used to increase mammography screening accuracy by reducing missed cancers and false positives.

Objective: To evaluate whether AI can overcome human mammography interpretation limitations with a rigorous, unbiased evaluation of machine learning algorithms.

Design, setting, and participants: In this diagnostic accuracy study conducted between September 2016 and November 2017, an international, crowdsourced challenge was hosted to foster AI algorithm development focused on interpreting screening mammography. More than 1100 participants, comprising 126 teams from 44 countries, took part. Analysis began November 18, 2016.

Main outcomes and measurements: Algorithms used images alone (challenge 1) or combined images, previous examinations (if available), and clinical and demographic risk factor data (challenge 2) and output a score that translated to a yes/no prediction of cancer within 12 months. Algorithm accuracy for breast cancer detection was evaluated using area under the curve, and algorithm specificity was compared with radiologists' specificity with radiologists' sensitivity set at 85.9% (United States) and 83.9% (Sweden). An ensemble method aggregating top-performing AI algorithms and radiologists' recall assessment was developed and evaluated.
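The two measures named above, area under the curve and specificity at a fixed sensitivity, can be illustrated with a minimal Python sketch. The function names and the toy scores are hypothetical and are not taken from the study; the threshold rule (the highest score cutoff whose sensitivity reaches the target) is one reasonable convention, assumed here for illustration.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise ranking (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def specificity_at_sensitivity(
    scores: Sequence[float], labels: Sequence[int], target_sensitivity: float
) -> float:
    """Specificity at the highest cutoff whose sensitivity >= target (assumed rule)."""
    if not 0.0 <= target_sensitivity <= 1.0:
        raise ValueError("target_sensitivity must lie in [0, 1]")
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Walk cutoffs from strictest to most lenient; a case is called positive
    # when its score is at or above the cutoff.
    for cutoff in sorted(set(scores), reverse=True):
        sensitivity = sum(s >= cutoff for s in pos) / len(pos)
        if sensitivity >= target_sensitivity:
            return sum(s < cutoff for s in neg) / len(neg)
    raise RuntimeError("unreachable: the lowest cutoff yields sensitivity 1.0")


# Hypothetical toy data: 4 cancer-positive and 4 cancer-negative examinations.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(auc(toy_scores, toy_labels))
print(specificity_at_sensitivity(toy_scores, toy_labels, 0.75))
```

In the study's setting, the target would be the radiologists' observed sensitivity (85.9% for the United States cohort, 83.9% for the Swedish cohort), so that algorithm and radiologist specificity are compared at the same operating point.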

Results: Overall, 144 231 screening mammograms from 85 580 US women (952 cancer positive ≤12 months from screening) were used for algorithm training and validation. A second independent validation cohort included 166 578 examinations from 68 008 Swedish women (780 cancer positive). The top-performing algorithm achieved an area under the curve of 0.858 (United States) and 0.903 (Sweden) and 66.2% (United States) and 81.2% (Sweden) specificity at the radiologists' sensitivity, lower than community-practice radiologists' specificity of 90.5% (United States) and 98.5% (Sweden). Combining top-performing algorithms and US radiologist assessments resulted in a higher area under the curve of 0.942 and achieved a significantly improved specificity (92.0%) at the same sensitivity.
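The abstract does not describe how the ensemble was built. Purely as an illustration of the general idea of combining algorithm scores with a radiologist's binary recall decision, the sketch below uses a simple weighted average; the function name, the averaging rule, and the weight are all assumptions and should not be read as the study's method.

```python
from typing import Sequence


def ensemble_score(
    algorithm_scores: Sequence[float],
    radiologist_recall: int,
    radiologist_weight: float = 0.5,  # hypothetical weight, not from the study
) -> float:
    """Blend the mean algorithm score with a binary recall decision (0 or 1)."""
    if not algorithm_scores:
        raise ValueError("at least one algorithm score is required")
    if radiologist_recall not in (0, 1):
        raise ValueError("radiologist_recall must be 0 (no recall) or 1 (recall)")
    if not 0.0 <= radiologist_weight <= 1.0:
        raise ValueError("radiologist_weight must lie in [0, 1]")
    mean_ai = sum(algorithm_scores) / len(algorithm_scores)
    return (1.0 - radiologist_weight) * mean_ai + radiologist_weight * radiologist_recall


# Hypothetical examination: two algorithms give low scores, the radiologist recalls.
print(ensemble_score([0.2, 0.4], radiologist_recall=1))
```

The combined score could then be evaluated with the same area under the curve and fixed-sensitivity specificity measures used for the individual algorithms.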

Conclusions and relevance: While no single AI algorithm outperformed radiologists, an ensemble of AI algorithms combined with radiologist assessment in a single-reader screening environment improved overall accuracy. This study underscores the potential of using machine learning methods for enhancing mammography screening interpretation.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Adult
Aged
Algorithms
Artificial Intelligence
Breast Neoplasms / diagnostic imaging*
Deep Learning*
Early Detection of Cancer
Female
Humans
Image Interpretation, Computer-Assisted / methods*
Mammography / methods*
Middle Aged
Radiologists*
Radiology
Sensitivity and Specificity
Sweden
United States
