New! RefSeq Release 224

Check out RefSeq release 224, now available online and from the FTP site. You can access RefSeq data through NCBI Datasets.

What’s included in this release?

As of May 6, 2024, this full release incorporates genomic, transcript, and protein data containing:

  • 435,879,646 records
  • 324,246,652 proteins
  • 62,348,147 RNAs
  • Sequences from 150,742 organisms

The release is provided in several directories, both as a complete dataset and divided into logical groupings.

Automated Lineage Definitions Now Available in NCBI Virus SARS-CoV-2 Variants Overview

Recently, NCBI Virus SARS-CoV-2 Variants Overview moved from a manual to an automated process for selecting the mutations required to define a lineage (e.g., Omicron, BA.2, JN.1). With this update, the SARS-CoV-2 Variants Overview covers all SARS-CoV-2 lineages and is no longer limited to lineages with CDC status. The SARS-CoV-2 Variants Overview website reports results from analyzing both GenBank and unassembled Sequence Read Archive (SRA) sequence data. It allows you to view geographic and frequency trends of records assigned to Pango lineages and to search for sequence records using lineage-defining or other mutations (example shown in Figure 1).

NCBI Pathogen Detection Presents the Antibiotic Susceptibility Test (AST) Browser

Have you ever wanted to compare antibiotic resistance data and resistance gene calls in bacteria? Now you can! Easily access and browse antibiotic susceptibility testing (AST) data and link to other NCBI resources using the new AST Browser. NCBI has collected AST data for many isolates in the Pathogen Detection system.  

Features and Benefits 
  • Data is in a searchable, tabular format 
  • Download data for further analysis 
  • Use the Cross-browser selection tool to link out to the Isolates Browser or MicroBIGG-E to identify the isolates and the genetic elements associated with each AST result 

GenBank Release 260.0 is Available!

GenBank release 260.0 (April 19, 2024) is now available on the NCBI FTP site. This release has 31.18 trillion bases and 4.46 billion records.

The current release has:

  • 250,803,006 traditional records containing 3,213,818,003,787 base pairs of sequence data
  • 3,333,621,823 WGS records containing 27,225,116,587,937 base pairs of sequence data
  • 741,066,498 bulk-oriented TSA records containing 689,648,317,082 base pairs of sequence data
  • 135,115,766 bulk-oriented TLS records containing 53,492,243,256 base pairs of sequence data

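As a quick consistency check, the four division counts listed above can be summed and compared with the headline totals of 4.46 billion records and 31.18 trillion bases. The sketch below is only illustrative arithmetic; the dictionary keys are labels chosen for this example, not official division names.

```python
# Division counts copied from the release 260.0 list above: (records, base pairs).
# The dictionary keys are illustrative labels, not official identifiers.
divisions = {
    "traditional": (250_803_006, 3_213_818_003_787),
    "WGS": (3_333_621_823, 27_225_116_587_937),
    "TSA": (741_066_498, 689_648_317_082),
    "TLS": (135_115_766, 53_492_243_256),
}

# Sum each column across the four divisions.
total_records = sum(records for records, _ in divisions.values())
total_bases = sum(bases for _, bases in divisions.values())

print(f"{total_records:,} records ({total_records / 1e9:.2f} billion)")
print(f"{total_bases:,} bases ({total_bases / 1e12:.2f} trillion)")
# → 4,460,607,093 records (4.46 billion)
# → 31,182,075,152,062 bases (31.18 trillion)
```

The sums agree with the rounded totals quoted for the release.
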
Now Available! Updated Bacterial and Archaeal Reference Genomes Collection

Now Available! Updated Bacterial and Archaeal Reference Genomes Collection

Download the updated bacterial and archaeal reference genome collection! We built this collection of 19,328 genomes by selecting the “best” genome assembly for each species among the 350,000+ prokaryotic genomes in RefSeq (except for E. coli, for which two assemblies were selected as reference).

What’s New?
  • 413 species are represented in this collection for the first time
  • 198 species are represented by a better assembly
  • 27 species were removed because of changes in NCBI Taxonomy or uncertainty in their species assignment 

NCBI Hidden Markov Models (HMM) Release 15.0 Now Available!

Download release 15.0 of the NCBI protein profile Hidden Markov models (HMMs) used by the Prokaryotic Genome Annotation Pipeline (PGAP)! Search this collection against your favorite prokaryotic proteins to identify their function using the HMMER sequence analysis package.
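A search of this kind could be scripted roughly as follows. This is a minimal sketch, assuming HMMER's `hmmsearch` is installed and on your `PATH`; the file names (`ncbi_hmms.hmm`, `my_proteins.faa`, `hits.tbl`) and the E-value cutoff are hypothetical placeholders, not names or settings taken from the release.

```python
import shutil
import subprocess


def build_hmmsearch_command(hmm_file, protein_fasta, table_out, evalue=1e-5):
    """Return the argument list for an hmmsearch run.

    --tblout writes a per-target hit table; -E sets the reporting E-value
    threshold. The default cutoff here is an arbitrary illustrative choice.
    """
    return [
        "hmmsearch",
        "--tblout", table_out,
        "-E", str(evalue),
        hmm_file,
        protein_fasta,
    ]


def run_hmmsearch(hmm_file, protein_fasta, table_out):
    """Run hmmsearch if it is installed; return True if it ran successfully."""
    if shutil.which("hmmsearch") is None:
        print("hmmsearch not found on PATH; skipping the search")
        return False
    command = build_hmmsearch_command(hmm_file, protein_fasta, table_out)
    # Discard the verbose alignment report; the hit table goes to table_out.
    result = subprocess.run(command, stdout=subprocess.DEVNULL)
    return result.returncode == 0


if __name__ == "__main__":
    # Hypothetical file names: substitute the downloaded HMM library
    # and your own protein FASTA file.
    run_hmmsearch("ncbi_hmms.hmm", "my_proteins.faa", "hits.tbl")
```

Keeping the command construction separate from its execution makes it easy to inspect or log the exact call before running it on a large protein set.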

What’s New?

Release 15.0 contains:

  • 16,667 HMMs maintained by NCBI
  • 279 new HMMs since release 14.0
  • Several hundred HMMs with better names, EC numbers, Gene Ontology (GO) terms, gene symbols, or publications

Cleaner BLAST Databases for More Accurate Results

Removing contaminated sequences using NCBI quality assurance tools 

Do you use BLAST to identify a sequence or the evolutionary scope of a gene? That can be challenging if contaminated and misclassified sequences are in the BLAST databases and show up in your search results. To address this problem, we now use NCBI quality assurance tools to systematically remove these misleading sequences from the default nucleotide (nt) and protein (nr) BLAST databases.

Conserved Domain Database Version 3.21 Now Available!

Check out the newly released Conserved Domain Database (CDD) version 3.21. Updated content is available on the CDD FTP site.

Browse Taxonomy Records with NCBI Datasets

New & improved NCBI Datasets Taxonomy pages and command-line service 

NCBI Datasets is excited to introduce new features to our Taxonomy pages, making it easier for you to access, browse, and download taxonomic information about organisms at any taxonomic level.

What’s new?
  • Explore Taxonomy records with an updated look and feel  
  • Access and download taxonomic metadata from the web or with our updated command-line interface (CLI) tools
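
For the command-line route, a script might wrap the `datasets` tool along these lines. This is a sketch under assumptions: it presumes a recent NCBI Datasets CLI is installed and that the `datasets summary taxonomy taxon` subcommand is available in your version; the helper names and the example taxon are chosen only for illustration, and the shape of the returned JSON is not assumed here.

```python
import json
import shutil
import subprocess


def build_taxonomy_summary_command(taxon):
    """Return the argument list for a taxonomy metadata summary request.

    Assumes the installed NCBI Datasets CLI provides the
    'summary taxonomy taxon' subcommand.
    """
    return ["datasets", "summary", "taxonomy", "taxon", taxon]


def fetch_taxonomy_summary(taxon):
    """Return the parsed JSON summary, or None if the CLI is unavailable or fails."""
    if shutil.which("datasets") is None:
        print("datasets CLI not found on PATH; skipping the request")
        return None
    result = subprocess.run(
        build_taxonomy_summary_command(taxon),
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        print(result.stderr.strip())
        return None
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Example taxon chosen for illustration; any name or NCBI Taxonomy ID
    # accepted by the CLI should work here.
    summary = fetch_taxonomy_summary("Escherichia coli")
    if summary is not None:
        print(json.dumps(summary, indent=2))
```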
