DATATOOL - NCBI data conversion tool

Program Description

DATATOOL is a utility program that converts data specifications and data between ASN.1 and XML. It can convert an ASN.1 specification into an XML DTD or XML schema, a DTD into an ASN.1 specification (with limitations), and a DTD into an XML schema. Once the specification is known, DATATOOL can also convert data from ASN.1 to XML or from XML to ASN.1.

DATATOOL is part of the NCBI C++ Toolkit, which can be freely downloaded from:
ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools++/CURRENT/
For more information please refer to:
http://ncbi.github.io/cxx-toolkit/pages/ch_app

Prebuilt DATATOOL binaries for some platforms can be found at:
ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools++/BIN/CURRENT/datatool/

Basic instructions

DATATOOL can be used to formally convert any ASN.1 or XML data or data specification. For the list of command-line arguments, please refer to:
http://ncbi.github.io/cxx-toolkit/pages/ch_app

Important Note

Because DATATOOL performs only formal data conversion, it cannot be used to perform any additional processing on the converted data. If you need additional data processing, you can either:

Example

Converting a GenBank ASN.1 data file to XML:

  1. Obtain a GenBank ASN.1 data file from: ftp://ftp.ncbi.nlm.nih.gov/ncbi-asn1/. The daily-nc directory there contains individual files, in ASN.1 format, for each day's new or updated entries since the close-of-data for the last GenBank release.
    Additional documentation:
    /ncbi-asn1/README.asn1
    /ncbi-asn1/daily-nc/README.asn1.daily-nc
  2. Download the appropriate datatool binary for your platform:
    ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools++/BIN/CURRENT/datatool/
  3. Download the NCBI data specification file:
    https://ncbi.nlm.nih.gov/data_specs/asn/NCBI_all.asn
  4. Run the program:
    ./datatool -m NCBI_all.asn -d gbest225.aso -t Bioseq-set -px gbest225.xml
    Here:
      gbest225.aso  is the name of the source GenBank data file in ASN.1 binary format
      Bioseq-set    is the name of the data type in the source file
      gbest225.xml  is the name of the output file in XML format
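
The command in step 4 can be wrapped in a small script when several files need converting. The following is only a minimal sketch: the function names datatool_cmd and convert_to_xml and the DATATOOL variable are hypothetical and are not part of the DATATOOL distribution. It uses only the options shown in step 4 (-m, -d, -t, -px).

```shell
#!/bin/sh
# Sketch: reusable wrappers around the datatool invocation from step 4.
# DATATOOL is a hypothetical variable naming the binary (default: ./datatool).
DATATOOL=${DATATOOL:-./datatool}

# Print the command line that would be run, without executing it.
# $1 = specification file, $2 = ASN.1 binary input,
# $3 = data type name,     $4 = XML output file
datatool_cmd() {
    printf '%s -m %s -d %s -t %s -px %s\n' "$DATATOOL" "$1" "$2" "$3" "$4"
}

# Check that both input files exist, then run the conversion.
convert_to_xml() {
    for f in "$1" "$2"; do
        if [ ! -f "$f" ]; then
            echo "missing input file: $f" >&2
            return 1
        fi
    done
    "$DATATOOL" -m "$1" -d "$2" -t "$3" -px "$4"
}
```

With these definitions loaded, running "convert_to_xml NCBI_all.asn gbest225.aso Bioseq-set gbest225.xml" performs the same conversion as step 4, but stops early with an error message if the specification or the data file is missing.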


PLEASE NOTE:


Please email questions to: info@ncbi.nlm.nih.gov

Last updated: Mar 30, 2006