How to submit data to GenBank
The most important source of new data for GenBank® is direct submissions from scientists. GenBank depends on its contributors to help keep the database as comprehensive, current, and accurate as possible. NCBI provides timely and accurate processing and biological review of new entries and updates to existing entries, and is ready to assist authors who have new data to submit.
Receiving an Accession Number for your Manuscript
Most journals require that DNA and amino acid sequences cited in articles be submitted to a public sequence repository (DDBJ/ENA/GenBank - INSDC) as part of the publication process. Data exchange between DDBJ, ENA and GenBank occurs daily, so it is only necessary to submit the sequence to one database, whichever is most convenient, regardless of where the sequence may be published. Sequence data submitted in advance of publication can be kept confidential if requested. GenBank will provide accession numbers for submitted sequences, usually within two working days. The accession number serves as an identifier for your submitted data and allows the community to retrieve the sequence upon reading the journal article. Include the accession number in your manuscript, preferably in a footnote on the first page of the article, or as required by individual journal procedures.
Submissions to GenBank
There are several options for preparing and submitting data to GenBank.
Web-based submission tools that send submissions to GenBank automatically:
- BankIt, a web-based submission tool with wizards to guide the submission process.
- Submission Portal, a unified system for multiple submission types. Currently only ribosomal RNA (rRNA), rRNA-ITS, metazoan mitochondrial COX1, eukaryotic nuclear mRNA, Influenza, Norovirus, Dengue or SARS-CoV-2 sequences can be submitted with the GenBank component of this tool. Genome and Transcriptome Assemblies can be submitted through the Genomes and TSA portals, respectively. This will be expanded in the future to include other types of GenBank submissions.
Submission preparation tools whose output must be uploaded via the Submission Portal or emailed to gb-sub@ncbi.nlm.nih.gov, as relevant:
- table2asn, a command-line program that replaces the older tool tbl2asn and automates the creation of sequence records for submission to GenBank. It is used primarily for submission of annotated genomes and large batches of sequences, and is available by FTP for use on Mac, PC and Unix platforms.
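As a rough illustration of how a table2asn run might be scripted, the sketch below only assembles a command line and prints it; nothing is executed. The file names are hypothetical placeholders, and the options shown (-t for the submission template, -i for the input FASTA file, -o for the output file) are assumed from commonly documented usage, so check them against the help output of your installed version before relying on them.

```python
import shlex
from pathlib import Path


def build_table2asn_command(template: Path, fasta: Path, output: Path) -> list[str]:
    """Return an argument list for a basic table2asn run (nothing is executed)."""
    return [
        "table2asn",
        "-t", str(template),  # assumed option: submission template file
        "-i", str(fasta),     # assumed option: input FASTA file of sequences
        "-o", str(output),    # assumed option: output file to upload
    ]


# Hypothetical file names, used only for illustration.
command = build_table2asn_command(
    Path("template.sbt"), Path("sequences.fsa"), Path("sequences.sqn")
)

# Print the command so it can be reviewed before being run by hand
# or passed to subprocess.run(command, check=True).
print(shlex.join(command))
```

Building the command as a list rather than a single string avoids shell quoting problems if file names contain spaces.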
Submissions of Raw Sequence Reads
Runs of unassembled reads, for example from 454, Illumina, or PacBio, can be submitted to the Sequence Read Archive (SRA).
Updating or Revising a GenBank Sequence
Revisions or updates to GenBank entries can be made by the submitters at any time. Information about the correct format for different types of updates can be found on the Update guidelines page. Send updates and revisions to gb-admin@ncbi.nlm.nih.gov. Be sure to include the accession number of the sequence to be updated in the subject line.
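For authors who script their correspondence, an update request with the accession number in the subject line could be drafted along these lines. This is a minimal sketch: the accession AB123456, the sender address, the subject wording, and the message body are all placeholders, and the message is only constructed, not sent.

```python
from email.message import EmailMessage


def build_update_email(accession: str, sender: str, description: str) -> EmailMessage:
    """Draft an update request with the accession number in the subject line."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = "gb-admin@ncbi.nlm.nih.gov"
    # Subject wording is an assumption; the guidance only asks that the
    # accession number appear in the subject line.
    message["Subject"] = f"Update request for {accession}"
    message.set_content(description)
    return message


# Placeholder values, for illustration only.
update_message = build_update_email(
    accession="AB123456",
    sender="author@example.org",
    description="Please correct the organism name for this record.",
)
print(update_message["Subject"])
```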
Confidentiality
Some authors are concerned that the appearance of their data in GenBank prior to publication will compromise their work. GenBank will, upon request, withhold release of new submissions for a specified period of time. However, if the accession number or sequence data appears in print or online before the specified date, your sequence will be released. To prevent a delay in the release of published sequence data, we urge authors to inform us when their data is published. As soon as it is available, please send the full publication data--all authors, title, journal, volume, pages and date--to the following address: update@ncbi.nlm.nih.gov
Privacy
If you are submitting human sequences to GenBank, do not include any data that could reveal the personal identity of the source. We assume that you have obtained any informed consent authorizations that your organizations require before submitting your sequences.