GNormPlus: An Integrative Approach for Tagging Gene, Gene Family and Protein Domain
Authors: Chih-Hsuan Wei, Hung-Yu Kao and Zhiyong Lu (PI)
Research highlights
GNormPlus is an end-to-end system that handles both gene/protein name detection and identifier detection in biomedical literature, covering gene/protein mentions, family names and domain names. GNormPlus also integrates several advanced text-mining techniques (namely GenNorm, SR4GN, SimConcept, Ab3P and CRF++) for resolving composite gene names. On two public benchmarking datasets, GNormPlus compares favorably with other state-of-the-art methods.
Method overview
Our proposed approach consists of two main steps: mention recognition and concept normalization. In the mention recognition step, we developed a new module based on CRF++ and combined it with our previous species recognition system, SR4GN, to recognize gene and species names and match them accordingly. In the concept normalization step, we applied our previous system, GenNorm, combined with a composite mention simplification tool (SimConcept) and an abbreviation resolution tool (Ab3P) for optimized performance.
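To make the two-step structure concrete, here is a minimal Python sketch of a recognize-then-normalize pipeline. It is not the GNormPlus implementation. Regular expressions stand in for the CRF++ and SR4GN components, a nearest-species heuristic stands in for gene-species matching, and a small dictionary stands in for the GenNorm lexicon. All names (`TOY_LEXICON`, `recognize_mentions`, `normalize_mentions`, `tag`) and the identifiers are hypothetical placeholders.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical toy lexicon: (species, lowercased gene name) -> identifier.
# The identifiers are placeholders, not real database IDs.
TOY_LEXICON = {
    ("human", "brca1"): "GENE:H1",
    ("mouse", "brca1"): "GENE:M1",
    ("human", "tp53"): "GENE:H2",
}

# Toy stand-ins for the gene and species recognizers (assumption: the real
# system uses trained models, not regular expressions).
GENE_PATTERN = re.compile(r"\b(BRCA1|TP53)\b", re.IGNORECASE)
SPECIES_PATTERN = re.compile(r"\b(human|mouse)\b", re.IGNORECASE)


@dataclass
class Mention:
    text: str
    start: int
    end: int
    species: Optional[str] = None
    identifier: Optional[str] = None


def recognize_mentions(text: str) -> List[Mention]:
    """Step 1: find gene mentions and pair each with the nearest species mention."""
    species_hits = [(m.start(), m.group().lower()) for m in SPECIES_PATTERN.finditer(text)]
    mentions = []
    for m in GENE_PATTERN.finditer(text):
        species = None
        if species_hits:
            # Nearest-species heuristic, assumed here for illustration only.
            species = min(species_hits, key=lambda hit: abs(hit[0] - m.start()))[1]
        mentions.append(Mention(m.group(), m.start(), m.end(), species))
    return mentions


def normalize_mentions(mentions: List[Mention]) -> List[Mention]:
    """Step 2: map each (species, name) pair to an identifier, if the lexicon has one."""
    for mention in mentions:
        mention.identifier = TOY_LEXICON.get((mention.species, mention.text.lower()))
    return mentions


def tag(text: str) -> List[Mention]:
    """Run both steps in order."""
    return normalize_mentions(recognize_mentions(text))


if __name__ == "__main__":
    for mention in tag("Human BRCA1 and mouse Brca1 were compared."):
        print(mention.text, mention.species, mention.identifier)
```

Keeping the two steps as separate functions mirrors the description above: the recognizer can be swapped out without touching the normalization logic, and the same gene name can resolve to different identifiers depending on the species assigned in step 1.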
Results
The first evaluation is a species-specific experiment in which only human genes are considered. GNormPlus was evaluated on the BioCreative II GN test set and compared with several previously reported systems, including our previous system, GenNorm. In the second experiment, we evaluated GNormPlus on multi-species gene normalization using the BioCreative III GN task data set. GNormPlus performs competitively in both evaluations.
Human gene normalization (BioCreative II GN test set):
Open source tools | Precision | Recall | F-measure |
GNormPlus | 87.1% | 86.4% | 86.7% |
GenNorm | 78.9% | 81.4% | 80.1% |
GNAT | 90.7% | 82.4% | 86.4% |
Multi-species gene normalization (BioCreative III GN task data set):
Open source tools | TAP-5 | TAP-10 | TAP-20 | F-measure |
GNormPlus | 33.3% | 36.7% | 36.7% | 50.1% |
GenNorm | 32.8% | 35.5% | 35.5% | 46.9% |
GeneTuKit | 29.7% | 31.4% | 32.5% | - |
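As a quick consistency check on the table that reports precision and recall, the short Python sketch below recomputes the F-measure, assuming it is the balanced F1 score (the harmonic mean of precision and recall). The function name `f_measure` and the `reported` dictionary are illustrative. The numbers are copied from the table.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F-measure %) copied from the table.
reported = {
    "GNormPlus": (87.1, 86.4, 86.7),
    "GenNorm": (78.9, 81.4, 80.1),
    "GNAT": (90.7, 82.4, 86.4),
}

for name, (p, r, f) in reported.items():
    print(f"{name}: computed F = {f_measure(p, r):.1f}%, reported F = {f}%")
```

Under that assumption, the computed values match the reported F-measures to one decimal place. The second table cannot be checked the same way, because it reports TAP-k scores rather than precision and recall.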
Downloads
GNormPlus Software in Java or Perl
GNormPlus Corpus
GNormPlus-tagged PubMed results in PubTator
GNormPlus RESTful API
Please cite
- Wei C-H, Kao H-Y, Lu Z. GNormPlus: An Integrative Approach for Tagging Gene, Gene Family and Protein Domain. BioMed Research International Journal, Text Mining for Translational Bioinformatics special issue, Article ID 918710; DOI: dx.doi.org/10.1155/2015/918710 (2015)