Skip to main page content Skip to main page content

Web APIs

Publication

Chih-Hsuan Wei, Robert Leaman, Zhiyong Lu (2016). Beyond accuracy: Creating interoperable and s calable text mining web services, Bioinformatics, DOI:10.1093/bioinformatics/btv760.

Introduction

We report here our recently developed web-based text mining services for biomedical concept recognition and normalization. The below Figure describes the overall architecture of our web services, which use sta ndard HTTP method calls (often known as RESTful services) and allow two access modes: A) a batch-oriented processing function for any arbitrary text input (abstract, full text, patent, etc), submitted via HTTP POST an d B) instant retrieval of pre-tagged results of PubMed abstracts via HTTP GET. For the batch processing function, users may submit one or multiple documents per batch, and large requests will be sent to a computer clu ster for parallel processing. When retrieving pre-tagged results of PubMed abstracts, the request only requires the PMIDs of the requested abstracts. This option is provided because annotating biomedical literature is the most common use case for such a text-mining service. From a technical stand-point, the preprocessing is made possible by our previous system PubTator, which stores text-mined annotations for every article in PubM ed and keeps in sync with PubMed via nightly updates.

Overview of the NCBI text mining web services Figure 1. Overview of the NCBI text mining web services.

Performance

Taggers Bioconcepts Evaluation corpus Precision Recall F-measure
GNormPlus (Wei et al., 2015) Gene BioCreative II - GN 87.08% 86.41% 86.74%
tmChem (Leaman et al., 2014) Chemical CHEMDNER 89.09% 85.75% 87.39%
DNorm (Leaman et al., 2013) Disease NCBI Disease corpus 80.30% 76.30% 78.20%
tmVar (Wei et al., 2013) Mutation MutationFinder 98.80% 89.62% 93.98%
SR4GN (Wei et al., 2012) Species Linnaeus corpus 85.82% 85.28% 85.55%
Table 1. Results of our individual taggers when benchmarked on public test collections.
Taggers HTTP Submission Method Description Throughput
(seconds per article; averaged over 5000 articles)
PubTator GET

Return Pre-Annotations

0.044s
GNormPlus* POST On-Demand Processing 0.127s
tmChem POST 0.008s
DNorm POST 0.007s
tmVar POST 0.090s
Table 2. Estimated processing time of our web services on NCBI clusters (subject to availability).
* SR4GN is used by GNormPlus. The results of GNormPlus include both gene and species.

Supplementary

We provide below several sample codes to show how to use our RESTful API service via programs.
RESTful sample codes avaliable in Perl, Python and Java.
We also provide a script for input formats (i.e., BioC, PubTator and JSON) checking.