Search Field Descriptions for Sequence Database

Monica Romiti; Peter Cooper

NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Entrez Sequences Help [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2010-.

Monica Romiti, M.L.S. and Peter Cooper, Ph.D.

Created: December 3, 2010; Last Update: February 9, 2011.

Table 1.

Fields available for all Sequence Databases (Nucleotide, Protein, EST, GSS). Fields only available for the EST and GSS databases are given in Table 2.

Search Field	Short Field Specifier†	Definition
[Accession]	[ACCN]	The accession number assigned by NCBI. Examples: AF123456[ACCN] Nucleotide NP_000240[ACCN] Protein
[All Fields]	[ALL]	All terms from all search fields in the database. Example: human[All Fields] Nucleotide Protein EST GSS (Compare with human[Organism], see [Organism] entry in this table.)
[Author]	[AU] [AUTH]	All authors from all references in the records. The format is last name [space] first initial(s), without punctuation. Example: venter jc[AUTH] Nucleotide Protein
[EC/RN Number]	[ECNO]	Enzyme Commission (EC) number for an enzyme activity. Example: 5.3.1.9[ECNO] Protein Nucleotide (glucose-6-phosphate isomerase)
[Feature Key] (Nucleotide, Protein, GSS)	[FKEY]	Biological features listed in the Feature Table of the sequence records. Examples: polya signal[FKEY] Nucleotide nonstdres[FKEY] Protein gene[FKEY] GSS The GenBank feature table definition has more information on available features.
[Filter]	[FILT] [SB]	Filtered subsets of the database. An important kind of filter is based on the presence of links to other records. Other filters create useful subsets of data, such as those set as Filters in the Discovery column of search results. Examples: Links: nucleotide_protein[Filter] Nucleotide protein_structure[Filter] Protein nucest_unigene[Filter] EST nucgss_unists[Filter] GSS Organism or properties subsets: all[filter] Nucleotide Protein EST GSS mrna[filter] Nucleotide refseq[filter] Nucleotide Protein mammals[filter] Nucleotide Protein EST GSS
[Gene Name]	[GENE]	Gene names annotated on database records. For NCBI Reference Sequences, these names correspond to official nomenclature guidelines when possible. Submitters provide the gene names on GenBank/GenPept records. Gene names on submitted records may be historical names or vary from official guidelines for other reasons. Example: BRCA1[GENE] Nucleotide Protein
[Genome Project]	-	The numeric unique identifier for the genome project that produced the sequence records. Examples: 13139[Genome Project] Nucleotide Protein (Oryza sativa Japonica) 21117[Genome Project] Nucleotide EST GSS (Pelagic Microbial Assemblages in the Oligotrophic Ocean)
[Issue]	[ISS]	The issue number of the journals cited on sequence records. Not generally useful in sequence databases.
[Journal]	[JOUR]	The name of the journals cited on sequence records. Journal names are indexed in the database in abbreviated form, although many full titles are mapped to their abbreviations. Journals are also indexed by their International Standard Serial Number (ISSN). Examples: proceedings of the national academy of sciences of the united states of america[Journal] Nucleotide Protein EST GSS Proc Natl Acad Sci U S A[Journal] Nucleotide Protein EST GSS 0027-8424[Journal] Nucleotide Protein EST GSS
[Keyword]	[KYWD]	Keywords applied by submitter or from controlled vocabularies applied by NCBI or other databases. Except for specific kinds of records, such as the examples given below, the terms in this index are not well controlled. This field is unpopulated for many GenBank/GenPept records. Examples: BARCODE[KYWD] Nucleotide Protein HTG[KYWD] Nucleotide RefSeqGene[KYWD] Nucleotide WGS_MASTER[KYWD] Nucleotide
[Modification Date]	[MDAT]	The date of most recent modification of a sequence record. The date format is YYYY/MM/DD. Only the year is required. The Modification Date is often used as a range of dates. The colon ( : ) separates the beginning and end of a date range. Examples: 2009/01/08[MDAT] Nucleotide Protein EST GSS 1995/09[MDAT] Nucleotide Protein EST GSS 2010/01:2010/12/31[MDAT] Nucleotide Protein EST GSS
[Molecular Weight] (Protein only)	[MOLWT]	The molecular weight in Daltons of the protein chain calculated from the amino acids only. This may not correspond to the molecular weight of the protein obtained from biological samples because of incomplete data or post-translational modifications of the protein in living systems. The colon ( : ) separates the beginning and end of a molecular weight range. Examples: 3039[MOLWT] Protein 25000:75000[MOLWT] Protein
[Organism]	[ORGN]	The scientific and common names for the complete taxonomy of organisms that are the source of the sequence records. This vocabulary includes all available nodes in the NCBI taxonomy database. Examples: cellular organisms[ORGN] Nucleotide Protein EST GSS firmicutes[ORGN] Nucleotide Protein human[ORGN] Nucleotide Protein EST GSS Escherichia coli O157:H7[ORGN] Nucleotide Protein
[Page Number]	[PAGE]	The page numbers of the articles that are cited on the sequence record. Not generally useful in sequence databases.
[Primary Accession]	[PACC]	The primary accession number of the sequence record. This is the first one appearing on the ACCESSION line in the GenBank/GenPept format. Many records have additional secondary accessions representing records that have been merged. The Accession field indexes both primary and secondary accessions. Examples: U01317[PACC] Nucleotide M18047[PACC] Nucleotide (Compare: M18047[ACCN] Nucleotide, see [Accession] entry in this table.)
[Primary Organism]	[PORGN]	The primary organism when there is more than one source organism. Example: human[PORGN] Nucleotide (Compare with human[ORGN], see [Organism] entry in this table.)
[Properties]	[PROP]	Molecular type, source database, and other properties of the sequence record. Terms indexed for this field are a useful classification system for sequence records. Examples: Molecule type: biomol_crna[PROP] Nucleotide biomol_genomic[PROP] Nucleotide biomol_mrna[PROP] Nucleotide Cellular location: gene_in_genomic[PROP] Nucleotide Protein gene_in_mitochondrion[PROP] Nucleotide Protein GenBank division: gbdiv_htg[PROP] Nucleotide gbdiv_vrt[PROP] Nucleotide Protein (These GenBank division queries must be combined with srcdb_genbank[PROP] to retrieve only GenBank records.) Database source: srcdb_genbank[PROP] Nucleotide Protein EST GSS srcdb_ddbj/embl/genbank[PROP] Nucleotide Protein EST GSS srcdb_refseq_known[PROP] Nucleotide Protein srcdb_refseq_predicted[PROP] Nucleotide Protein srcdb_swiss-prot[PROP] Protein srcdb_pdb[PROP] Nucleotide Protein
[Protein Name]	[PROT]	The names of protein products as annotated on sequence records. The content of this field is not well controlled for GenBank/GenPept records and may contain inaccurate or incomplete information. Example: aldolase[Protein Name] Nucleotide Protein
[Publication Date]	[PDAT]	The date that records were made public in Entrez. The date format is YYYY/MM/DD. The colon ( : ) separates the beginning and end of a date range. Examples: 2009/01/08[PDAT] Nucleotide EST GSS 2009/01/10[PDAT] Protein 1995/09[PDAT] Nucleotide Protein EST GSS 2010/01:2010/12/31[PDAT] Nucleotide Protein EST GSS
[SeqID String]	[SQID]	The NCBI identifier string for the sequence record. This is a brief structured format used by NCBI software. Example: gnl asm gca 000000215 2 chr3 45328308[SeqID String] Nucleotide
[Sequence Length]	[SLEN]	The total length of the sequence, that is, the number of nucleotides or amino acids in the sequence. The colon ( : ) separates the beginning and end of a length range. Examples: 755[SLEN] Nucleotide Protein EST GSS 100:1000[SLEN] Nucleotide Protein EST GSS
[Substance Name]	[SUBS]	The names of chemical substances associated with a record. This field is only populated for sequences extracted from structure records (PDB-derived sequences). The associated residue position is often included. Examples: mg, 1010[Substance Name] Nucleotide atp[Substance Name] Protein
[Text Word]	[WORD]	Text on a sequence record that is not indexed in other fields. Terms indexed here are included in an All Fields search, so this field is not generally useful on its own.
[Title]	[TI] [TITL]	Words and phrases found in the title of the sequence record. The title is the DEFINITION line of the GenBank/GenPept format of the record. This line summarizes the biology of the sequence and includes the organism, product name, gene symbol, molecule type, and sequence completeness. Examples: complete cds[TI] Nucleotide kinesin[TI] Nucleotide Protein liver[TI] Nucleotide Protein EST uncultured[TI] Nucleotide Protein EST GSS
[Volume]	[VOL]	The volume number of the journals in references on the sequence record. Not generally useful in sequence databases.

†: A query consisting of a term followed by the full name of an indexed field in square brackets retrieves only records with that term indexed in that field. For example, a search with apolipoprotein[Title] finds only records with “apolipoprotein” indexed in their Title field. Some fields have shorter names that can be used instead of the full name. These are listed in the Short Field Specifier column of Table 1 when available.
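
The term[Field] and start:end[Field] patterns used throughout Table 1 can be assembled programmatically. The following is a minimal Python sketch, not an NCBI tool: the helper names fielded, ranged, and all_of are hypothetical, and joining clauses with the Boolean operator AND is an assumption about Entrez query syntax that this page does not itself describe.

```python
def fielded(term: str, field: str) -> str:
    """Return a field-qualified term, e.g. apolipoprotein[Title]."""
    return f"{term}[{field}]"


def ranged(start, end, field: str) -> str:
    """Return a range query; the colon separates the start and end values."""
    return f"{start}:{end}[{field}]"


def all_of(*clauses: str) -> str:
    """Combine clauses so that every one must match (assumed AND operator)."""
    return " AND ".join(clauses)


# Field names below are short specifiers taken from Table 1.
query = all_of(
    fielded("BRCA1", "GENE"),
    fielded("human", "ORGN"),
    ranged("2010/01", "2010/12/31", "MDAT"),
)
print(query)
```

The same ranged helper covers the other range-capable fields in Table 1, for example ranged(100, 1000, "SLEN") for sequence length or ranged(25000, 75000, "MOLWT") for protein molecular weight.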

Table 2.

Fields available only for EST and GSS databases.

Index Search Field	Description
[Clone ID]	The clone identifier provided by the submitter of the EST or GSS records. Examples: image 1000232[Clone ID] EST ZMMBBb0001G04f[Clone ID] GSS
[EST Name] [GSS Name]	The name given to the EST or GSS record by the submitter. Examples: R-OVA-119[EST Name] EST DKFZP761J17121[GSS Name] GSS
[EST ID] [GSS ID]	Legacy dbEST or dbGSS unique identifier provided by NCBI. Examples: 2081316[EST ID] EST 14283478[GSS ID] GSS
[Library Class] (GSS Only)	Information about the kind of genomic DNA library that was the source of the clone. Examples: bac ends[Library Class] GSS methylation filtered[Library Class] GSS cosmid ends[Library Class] GSS shotgun[Library Class] GSS
[Library Name] (EST Only)	The name given to the cDNA library that is the source of the clone, provided by the submitter and taken verbatim from the record. May contain useful information about the cell, tissue, or organ source. Examples: soares fetal liver spleen 1nfls[Library Name] EST full length enriched swine cdna library, adult adrenal gland[Library Name] EST
[Submitter Name]	Submitter name of EST and GSS records. Unlike [Author], the Submitter Name content is not controlled and is taken verbatim from the EST or GSS record. Examples: smith tpl[Submitter Name] EST GSS david severson[Submitter Name] EST da lightfoot and chris town[Submitter Name] GSS
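
Queries that use the fields in Table 2 contain spaces and square brackets, so they need URL encoding when passed to a web service. The sketch below builds (but does not send) a request URL for the NCBI E-utilities ESearch endpoint using only the Python standard library. The helper name esearch_url is hypothetical, and the database identifier "nucgss" for the GSS database is an assumption that should be checked against the current list of Entrez databases before use.

```python
from urllib.parse import urlencode

# Base address of the E-utilities ESearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL; urlencode escapes spaces and brackets in term."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


# "nucgss" is an assumed database name; the query is an example from Table 2.
url = esearch_url("nucgss", "bac ends[Library Class]")
print(url)
```

Keeping URL construction separate from the network call makes the encoded query easy to inspect before any request is made.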

Bookshelf ID: NBK49540

Romiti M, Cooper P. Search Field Descriptions for Sequence Database. 2010 Dec 3 [Updated 2011 Feb 9]. In: Entrez Sequences Help [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2010-.