Sun Kim
Research Scientist
(Currently at Amazon Alexa AI)
Computational Biology Branch
National Center for Biotechnology Information (NCBI)
National Institutes of Health (NIH)

E-mail: sun.kim at nih.gov
More details at http://echosf.net


Short Bio
Sun Kim is a Research Scientist at the National Center for Biotechnology Information (NCBI), which he joined right after receiving his PhD in Computer Science and Engineering from Seoul National University in 2009. His research interests include:
  • Natural language processing
  • Biomedical text mining
  • Information retrieval
  • Machine learning
Academic Services
  • Reviewer: Bioinformatics, Briefings in Bioinformatics, Database, Journal of the American Medical Informatics Association, PLOS One, PLOS Computational Biology, BMC Bioinformatics, Journal of Biomedical Informatics, IEEE/ACM Transactions on Computational Biology and Bioinformatics, IEEE Transactions on Knowledge and Data Engineering, Advances in Bioinformatics, Applied Mathematics and Computation, Journal of Information Science, Algorithms, AMIA 2014-2020, PSB 2010, IJCNLP 2013
  • PC Member: ICMLA 2010-2020, ACL 2018-2021, EMNLP 2018-2021, NAACL 2019/2021, AAAI 2020/2022, COLING 2020, AACL-IJCNLP 2020, AAAI Fall Symposium 2012, BioNLP-OST 2019, CSBio 2013-2015/2018/2020, BioDM 2015, ACM-BCB 2017
  • OC/LOC Member: BioCreative III, BioCreative 2012, BioCreative V, BioCreative 2016, BioCreative VI
Datasets
Tools
  • TeamTat: a collaborative text annotation tool.
  • LitSense: a web-based system that specializes in sentence retrieval for PubMed abstracts and PMC full-text articles.
  • ezTag: a web-based annotation tool that allows curators to perform annotation and provide training data interactively.
  • NCBITextLib: a software library for building a large-scale data infrastructure for text mining.
  • Meshable: a web service for searching PubMed abstracts by utilizing MeSH and MeSH-derived topical terms.
  • BioC Viewer: a web interface for displaying and merging annotations in BioC.
  • BioQRator: a general-purpose user interface for annotating bio-entities and relationships.
  • PIE the search: a web service to find protein-protein interaction informative articles from PubMed.
  • PIE: a configurable web service to extract protein-protein interaction sentences from biomedical literature.
Publications
  • 2021
  • Handling Long-tail Queries with Slice-aware Conversational Systems, C. Wang, S. Kim, T. Park, S. Choudhary, S. Park, Y.-B. Kim, R. Sarikaya, and S. Lee, ICLR 2021 Workshop on Weakly Supervised Learning, 2021. [PDF]
  • NLM-Chem, a New Resource for Chemical Entity Recognition in PubMed Full Text Literature, R. I. Doğan, R. Leaman, S. Kim, D. Kwon, C.-H. Wei, D. Comeau, Y. Peng, D. Cissel, C. Coss, C. Fisher, R. Guzman, P. G. Kochar, S. Koppel, D. Trinh, K. Sekiya, J. Ward, D. Whitman, S. Schmidt, and Z. Lu, Scientific Data, 8, 91, 2021. [PDF]
  • 2020
  • Better Synonyms for Enriching Biomedical Search, L. Yeganova*, S. Kim*, Q. Chen, G. Balasanov, W. J. Wilbur, and Z. Lu, Journal of the American Medical Informatics Association, 27(12), pp. 1894-1902, 2020. [PDF]
  • TeamTat: A Collaborative Text Annotation Tool, R. I. Doğan, D. Kwon, S. Kim, and Z. Lu, Nucleic Acids Research, 48, W5-W11, 2020. [PDF]
  • Deep Learning with Sentence Embeddings Pre-trained on Biomedical Corpora Improves the Performance of Finding Similar Sentences in Electronic Medical Records, Q. Chen, J. Du, S. Kim, W. J. Wilbur, and Z. Lu, BMC Medical Informatics and Decision Making, 20(Suppl 1), 73, 2020. [PDF]
  • BioConceptVec: Creating and Evaluating Literature-based Biomedical Concept Embeddings on a Large Scale, Q. Chen, K. Lee, S. Yan, S. Kim, C.-H. Wei, and Z. Lu, PLOS Computational Biology, 16(4), e1007617, 2020. [PDF]
  • 2019
  • Evaluation of Five Sentence Similarity Models on Electronic Medical Records, Q. Chen, J. Du, S. Kim, W. J. Wilbur, and Z. Lu, ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, pp. 533, 2019. [PDF]
  • LitSense: Making Sense of Biomedical Literature at Sentence Level, A. Allot*, Q. Chen*, S. Kim*, R. Vera Alvarez, D. C. Comeau, W. J. Wilbur, and Z. Lu, Nucleic Acids Research, 47, W594-W599, 2019. [PDF]
  • Overview of the BioCreative VI Precision Medicine Track: Mining Protein Interactions and Mutations for Precision Medicine, R. I. Doğan, S. Kim, A. Chatr-aryamontri, C.-H. Wei, D. C. Comeau, R. Antunes, S. Matos, Q. Chen, A. Elangovan, N. C. Panyam, K. Verspoor, H. Liu, Y. Wang, Z. Liu, B. Altınel, Z. M. Hüsünbeyi, A. Özgür, A. Fergadis, C.-K. Wang, H.-J. Dai, T. Tran, R. Kavuluru, L. Luo, A. Steppi, J. Zhang, J. Qu, and Z. Lu, Database, 2019, bay147, 2019. [PDF]
  • 2018
  • Combining Rich Features and Deep Learning for Finding Similar Sentences in Electronic Medical Records, Q. Chen, J. Du, S. Kim, W. J. Wilbur, and Z. Lu, BioCreative/OHNLP Challenge, 2018. [PDF]
  • Efficient Rule-based Approaches for Tagging Named Entities and Relations in Clinical Text, D. Kim, S.-Y. Shin, H.-W. Lim, and S. Kim, BioCreative/OHNLP Challenge, 2018. [PDF]
  • Discovering Themes in Biomedical Literature Using a Projection-Based Algorithm, L. Yeganova, S. Kim, G. Balasanov, and W. J. Wilbur, BMC Bioinformatics, 19, 269, 2018. [PDF]
  • ezTag: Tagging Biomedical Concepts via Interactive Learning, D. Kwon*, S. Kim*, C.-H. Wei, R. Leaman, and Z. Lu, Nucleic Acids Research, 46, W523-W529, 2018. [PDF]
  • PubMed Phrases, an Open Set of Coherent Phrases for Searching Biomedical Literature, S. Kim, L. Yeganova, D. C. Comeau, W. J. Wilbur, and Z. Lu, Scientific Data, 5, 180104, 2018. [PDF]
  • A Fast Deep Learning Model for Textual Relevance in Biomedical Information Retrieval, S. Mohan, N. Fiorini, S. Kim, and Z. Lu, The Web Conference (WWW 2018), pp. 77-86, 2018. [PDF]
  • 2017
  • Bridging the Gap: Incorporating a Semantic Similarity Measure for Effectively Mapping PubMed Queries to Documents, S. Kim, N. Fiorini, W. J. Wilbur, and Z. Lu, Journal of Biomedical Informatics, 75, pp. 122-127, 2017. [PDF] (original version in arXiv)
  • Overview of the BioCreative VI Precision Medicine Track, R. I. Doğan, S. Kim, A. Chatr-aryamontri, C.-H. Wei, D. C. Comeau, and Z. Lu, Sixth BioCreative Challenge Workshop, pp. 83-87, 2017. [PDF]
  • The BioCreative VI Precision Medicine Track Corpus, R. I. Doğan, A. Chatr-aryamontri, C.-H. Wei, C. S. Chang, R. Oughtred, J. Rust, L. Boucher, S. Kim, D. C. Comeau, Z. Lu, K. Dolinski, and M. Tyers, Sixth BioCreative Challenge Workshop, pp. 88-93, 2017. [PDF]
  • Deep Learning for Biomedical Information Retrieval: Learning Textual Relevance from Click Logs, S. Mohan, N. Fiorini, S. Kim, and Z. Lu, ACL 2017 Workshop on Biomedical Natural Language Processing, pp. 222-231, 2017. [PDF]
  • BioCreative VI Precision Medicine Track: Creating a Training Corpus for Mining Protein-Protein Interactions Affected by Mutations, R. I. Doğan, A. Chatr-aryamontri, S. Kim, C.-H. Wei, Y. Peng, D. C. Comeau, and Z. Lu, ACL 2017 Workshop on Biomedical Natural Language Processing, pp. 171-175, 2017. [PDF]
  • The BioC-BioGRID Corpus: Full Text Articles Annotated for Curation of Protein-Protein and Genetic Interactions, R. I. Doğan*, S. Kim*, A. Chatr-aryamontri*, C. S. Chang, R. Oughtred, J. Rust, W. J. Wilbur, D. C. Comeau, K. Dolinski, and M. Tyers, Database, 2017, baw147, 2017. [PDF]
  • 2016
  • BioCreative V BioC Track Overview: Collaborative Biocurator Assistant Task for BioGRID, S. Kim*, R. I. Doğan*, A. Chatr-aryamontri, C. S. Chang, R. Oughtred, J. Rust, R. Batista-Navarro, J. Carter, S. Ananiadou, S. Matos, A. Santos, D. Campos, J. L. Oliveira, O. Singh, J. Jonnagaddala, H.-J. Dai, E. C. Su, Y.-C. Chang, Y.-C. Su, C.-H. Chu, C. C. Chen, W.-L. Hsu, Y. Peng, C. Arighi, C. H. Wu, K. Vijay-Shanker, F. Aydın, Z. M. Hüsünbeyi, A. Özgür, S.-Y. Shin, D. Kwon, K. Dolinski, M. Tyers, W. J. Wilbur, and D. C. Comeau, Database, 2016, baw121, 2016. [PDF]
  • BioC Viewer: A Web-Based Tool for Displaying and Merging Annotations in BioC, S.-Y. Shin*, S. Kim*, W. J. Wilbur, and D. Kwon, Database, 2016, baw106, 2016. [PDF]
  • PubTermVariants: Biomedical Term Variants and Their Use for PubMed Search, L. Yeganova, W. Kim, S. Kim, R. I. Doğan, W. Liu, D. C. Comeau, Z. Lu, and W. J. Wilbur, ACL 2016 Workshop on Biomedical Natural Language Processing, pp. 141-145, 2016. [PDF]
  • Meshable: Searching PubMed Abstracts by Utilizing MeSH and MeSH-Derived Topical Terms, S. Kim, L. Yeganova, and W. J. Wilbur, Bioinformatics, 32(19), pp. 3044-3046, 2016. [PDF]
  • The DDINCBI Corpus - Towards a Larger Resource for Drug-Drug Interactions in PubMed, L. Yeganova, S. Kim, G. Balasanov, K. Bennett, H. Liu, and W. J. Wilbur, LREC 2016 Workshop on Cross-Platform Text Mining and Natural Language Processing Interoperability, pp. 38-41, 2016. [PDF]
  • 2015
  • Summarizing Topical Contents from PubMed Documents Using a Thematic Analysis, S. Kim, L. Yeganova, and W. J. Wilbur, Conference on Empirical Methods in Natural Language Processing (EMNLP 2015), pp. 805-810, 2015. [PDF]
  • Overview of BioCreative V BioC Track, S. Kim, R. I. Doğan, A. Chatr-aryamontri, M. Tyers, W. J. Wilbur, and D. C. Comeau, Fifth BioCreative Challenge Workshop, pp. 1-9, 2015. [PDF]
  • Identifying Genetic Interaction Evidence Passages in Biomedical Literature, R. I. Doğan, S. Kim, A. Chatr-aryamontri, D. C. Comeau, and W. J. Wilbur, Fifth BioCreative Challenge Workshop, pp. 36-41, 2015. [PDF]
  • Extracting Drug-Drug Interactions from Literature Using a Rich Feature-Based Linear Kernel Approach, S. Kim, H. Liu, L. Yeganova, and W. J. Wilbur, Journal of Biomedical Informatics, 55, pp. 23-30, 2015. [PDF]
  • Identifying Named Entities from PubMed for Enriching Semantic Categories, S. Kim, Z. Lu, and W. J. Wilbur, BMC Bioinformatics, 16, 57, 2015. [PDF]
  • 2014
  • Retro: Concept Based Clustering of Biomedical Topical Sets, L. Yeganova, W. Kim, S. Kim, and W. J. Wilbur, Bioinformatics, 30(22), pp. 3240-3248, 2014. [PDF]
  • Assisting Manual Literature Curation for Protein-Protein Interactions Using BioQRator, D. Kwon*, S. Kim*, S.-Y. Shin, A. Chatr-aryamontri, and W. J. Wilbur, Database, 2014, bau067, 2014. [PDF]
  • Author Name Disambiguation for PubMed, W. Liu, R. I. Doğan, S. Kim, D. C. Comeau, W. Kim, L. Yeganova, Z. Lu, and W. J. Wilbur, Journal of the Association for Information Science and Technology (JASIST), 65(4), pp. 765-781, 2014. [PDF]
  • 2013
  • BioQRator: A Web-Based Interactive Biomedical Literature Curating System, D. Kwon, S. Kim, S.-Y. Shin, and W. J. Wilbur, Fourth BioCreative Challenge Workshop, pp. 241-246, 2013. [PDF]
  • 2012
  • Prioritizing PubMed Articles for the Comparative Toxicogenomic Database Utilizing Semantic Information, S. Kim, W. Kim, C.-H. Wei, Z. Lu, and W. J. Wilbur, Database, 2012, bas042, 2012. [PDF]
  • Thematic Clustering of Text Documents Using an EM-Based Approach, S. Kim and W. J. Wilbur, Journal of Biomedical Semantics, 3(Suppl 3), S6, 2012. [PDF]
  • Classifying Gene Sentences in Biomedical Literature by Combining High-Precision Gene Identifiers, S. Kim, W. Kim, D. C. Comeau, and W. J. Wilbur, NAACL 2012 Workshop on Biomedical Natural Language Processing, pp. 185-192, 2012. [PDF]
  • System Description for the BioCreative 2012 Triage Task, S. Kim, W. Kim, C.-H. Wei, Z. Lu, and W. J. Wilbur, BioCreative Workshop 2012, pp. 20-24, 2012. [PDF] (ranked first on MAP)
  • PIE the search: Searching PubMed Literature for Protein Interaction Information, S. Kim, D. Kwon, S.-Y. Shin, and W. J. Wilbur, Bioinformatics, 28(4), pp. 597-598, 2012. [PDF]
  • 2011
  • An EM Clustering Algorithm which Produces a Dual Representation, S. Kim and W. J. Wilbur, International Conference on Machine Learning and Applications (ICMLA 2011), pp. 90-95, 2011. [PDF]
  • Classifying Protein-Protein Interaction Articles Using Word and Syntactic Features, S. Kim and W. J. Wilbur, BMC Bioinformatics, 12(Suppl 8), S9, 2011. [PDF]
  • The Protein-Protein Interaction Tasks of BioCreative III: Classification/Ranking of Articles and Linking Bio-Ontology Concepts to Full Text, M. Krallinger, M. Vazquez, F. Leitner, D. Salgado, A. Chatr-Aryamontri, A. Winter, L. Perfetto, L. Briganti, L. Licata, M. Iannuccelli, L. Castagnoli, G. Cesareni, M. Tyers, G. Schneider, F. Rinaldi, R. Leaman, G. Gonzalez, S. Matos, S. Kim, W. J. Wilbur, L. Rocha, A. V. Tendulkar, S. Agarwal, F. Liu, X. Wang, R. Rak, K. Noto, C. Elkan, Z. Lu, R. I. Doğan, J.-F. Fontaine, M. A. Andrade-Navarro, and A. Valencia, BMC Bioinformatics, 12(Suppl 8), S3, 2011. [PDF]
  • 2010
  • Improving Protein-Protein Interaction Article Classification Performance by Utilizing Grammatical Relations, S. Kim and W. J. Wilbur, Third BioCreative Challenge Workshop, pp. 83-88, 2010. [PDF] (ranked first)
  • 2009
  • Evolutionary Hypernetwork Classifiers for Protein-Protein Interaction Sentence Filtering, J. Bootkrajang, S. Kim, and B.-T. Zhang, Genetic and Evolutionary Computation Conference (GECCO 2009), pp. 185-192, 2009. [PDF]
  • Evolving Hypernetwork Models of Binary Time Series for Forecasting Price Movements on Stock Markets, E. Bautu, S. Kim, A. Bautu, H. Luchian, and B.-T. Zhang, IEEE Congress on Evolutionary Computation (CEC 2009), pp. 166-173, 2009. [PDF]
  • Ensembled Support Vector Machines for Human Papillomavirus Risk Type Prediction from Protein Secondary Structures, S. Kim, J. Kim, and B.-T. Zhang, Computers in Biology and Medicine, 39(2), pp. 187-193, 2009. [PDF]
  • 2008 and before
  • Introducing Meta-Services for Biomedical Information Extraction, F. Leitner, M. Krallinger, C. Rodriguez-Penagos, J. Hakenberg, C. Plake, C.-J. Kuo, C.-N. Hsu, R. T. Tsai, H.-C. Hung, W. W. Lau, C. A. Johnson, R. Satre, K. Yoshida, Y. H. Chen, S. Kim, S.-Y. Shin, B.-T. Zhang, W. A. Baumgartner, Jr., L. Hunter, B. Haddow, M. Matthews, X. Wang, P. Ruch, F. Ehrler, A. Ozgur, G. Erkan, D. R. Radev, M. Krauthammer, T. Luong, R. Hoffmann, C. Sander, and A. Valencia, Genome Biology, 9(Suppl 2), S6, 2008. [PDF]
  • PIE: An Online Prediction System for Protein-Protein Interactions from Text, S. Kim*, S.-Y. Shin*, I.-H. Lee, S.-J. Kim, R. Sriram, and B.-T. Zhang, Nucleic Acids Research, 36, W411-W415, 2008. [PDF]
  • Finding Cancer-Related Gene Combinations Using a Molecular Evolutionary Algorithm, C.-H. Park, S.-J. Kim, S. Kim, D.-Y. Cho, and B.-T. Zhang, IEEE International Symposium on Bioinformatics and Biomedical Engineering (BIBE 2007), pp. 158-163, 2007. [PDF]
  • Evolving Hypernetwork Classifiers for microRNA Expression Profile Analysis, S. Kim*, S.-J. Kim*, and B.-T. Zhang, IEEE Congress on Evolutionary Computation (CEC 2007), pp. 313-319, 2007. [PDF]
  • Use of Evolutionary Hypernetworks for Mining Prostate Cancer Data, C.-H. Park, S.-J. Kim, S. Kim, D.-Y. Cho, and B.-T. Zhang, International Symposium on Advanced Intelligent Systems, pp. 702-706, 2007.
  • Identifying Protein-Protein Interaction Sentences Using Boosting and Kernel Methods, S.-Y. Shin*, S. Kim*, J.-H. Eom, B.-T. Zhang, and R. Sriram, Second BioCreative Challenge Workshop, pp. 187-192, 2007. [PDF]
  • Text Classifiers Evolved on a Simulated DNA Computer, S. Kim, M.-O. Heo, and B.-T. Zhang, IEEE Congress on Evolutionary Computation (CEC 2006), pp. 2646-2652, 2006. [PDF]
  • Human Papillomavirus Risk Type Classification from Protein Sequences Using Support Vector Machines, S. Kim and B.-T. Zhang, Lecture Notes in Computer Science (EVOBIO 2006), 3907, pp. 57-66, 2006. [PDF]
  • A Tree Kernel-Based Method for Protein-Protein Interaction Mining from Biomedical Literature, J.-H. Eom, S. Kim, S.-H. Kim, and B.-T. Zhang, Lecture Notes in Bioinformatics (KDLL 2006), 3886, pp. 42-52, 2006. [PDF]
  • Multi-objective Evolutionary Probe Design Based on Thermodynamic Criteria for HPV Detection, I.-H. Lee, S. Kim, and B.-T. Zhang, Lecture Notes in Artificial Intelligence (PRICAI 2004), 3157, pp. 742-750, 2004. [PDF]
  • Genetic Mining of HTML Structures for Effective Web-Document Retrieval, S. Kim and B.-T. Zhang, Applied Intelligence, 18(3), pp. 243-256, 2003. [PDF]
  • Evolutionary Learning of Web-Document Structure for Information Retrieval, S. Kim and B.-T. Zhang, IEEE Congress on Evolutionary Computation (CEC 2001), pp. 1253-1260, 2001. [PDF]
  • SCAI Experiments on TREC-9, Y.-H. Kim, S. Kim, J.-H. Eom, and B.-T. Zhang, Text Retrieval Conference (TREC-9), pp. 392-399, 2000.
  • Web-Document Retrieval by Genetic Learning of Importance Factors for HTML Tags, S. Kim and B.-T. Zhang, PRICAI 2000 Workshop on Text and Web Mining, pp. 13-23, 2000.
  • SCAI TREC-8 Experiments, D.-H. Shin, Y.-H. Kim, S. Kim, J.-H. Eom, H.-J. Shin, and B.-T. Zhang, Text Retrieval Conference (TREC-8), pp. 511-518, 1999.
  • Abstracts
  • Sentence Similarity Measures Revisited: Ranking Sentences in PubMed Documents, Q. Chen*, S. Kim*, W. J. Wilbur, and Z. Lu, ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, pp. 531-532, 2018. [PDF]
  • Towards seamless format conversion between BioC and PubAnnotation for sharing PubMed/PubMed Central documents and annotations, S. Kim, D. C. Comeau, R. I. Doğan, and Z. Lu, Biomedical Linked Annotation Hackathon 3, 2017.
  • Building a cost-effective gold standard set for enriching PubAnnotation, D. Kwon, C.-H. Wei, S. Kim, R. Leaman, and Z. Lu, Biomedical Linked Annotation Hackathon 3, 2017.
  • BioCconvert: A Conversion Tool Between BioC and PubAnnotation, D. C. Comeau, R. I. Doğan, S. Kim, C.-H. Wei, W. J. Wilbur, and Z. Lu, International Conference on Biological Ontology & BioCreative, 2016.
  • BioCreative V: A Community-wide Effort for the Evaluation of Text Mining and its Relevance for Biomedical Curation, C. N. Arighi, K. B. Cohen, D. C. Comeau, R. I. Doğan, J. Fluck, L. Hirschman, S. Kim, M. Krallinger, Z. Lu, F. Rinaldi, A. Valencia, T. Wiegers, W. J. Wilbur, and C. H. Wu, International Biocuration Conference, 2016.
  • Biocuration and Text Mining: Lessons Learned from Developing an Interoperable Collaborative Biocurator Assistant Tool for BioGRID, R. I. Doğan, S. Kim, A. Chatr-aryamontri, W. J. Wilbur, and D. C. Comeau, International Biocuration Conference, 2016.
  • BioCreative V: A New Challenge in Text Mining for Biocuration, C. N. Arighi, A. Chatr-aryamontri, K. B. Cohen, D. C. Comeau, J. Fluck, R. I. Doğan, L. Hirschman, S. Kim, M. Krallinger, F. Leitner, Z. Lu, J. Oyarzabal, O. Rabal, F. Rinaldi, C. O. Tudor, A. Valencia, T. Wiegers, W. J. Wilbur, and C. H. Wu, International Biocuration Conference, 2015.
  • Analyzing MEDLINE Topics with a Projection Method, L. Yeganova, S. Kim, and W. J. Wilbur, NIPS 2014 Workshop on Modern Machine Learning and Natural Language Processing, 2014.
  • Extracting Drug-Drug Interactions from Literature Using a Rich Feature-Based Linear Kernel Approach, S. Kim, H. Liu, L. Yeganova, and W. J. Wilbur, AMIA 2014 Annual Symposium, 2014.
Publications in Google Scholar


Revised: Aug 1, 2021