LocusLink: Cross-Referencing Across Databases

Data for a particular genetic locus, such as a gene, may exist in several qualitatively different resources. A phenotype and cytogenetic map position may be registered for a locus in OMIM, official and alternate names may be listed on the Human Gene Nomenclature page, and representative sequence data may be presented in a UniGene cluster. Additionally, GenBank itself may contain multiple sequences for a single locus. LocusLink (www.ncbi.nlm.nih.gov/LocusLink), developed by NCBI’s Donna Maglott, provides an integrated querying and cross-referencing system to facilitate movement from one source to another.


Figure 1. LocusLink report for APOC3.


LocusLink anchors an official gene name, gene aliases, database IDs, phenotypes, map positions, sequence accession numbers, and other identifiers to a stable LocusID number. The cross-referencing allows locus searches of various types to reliably converge on the same data. Currently limited to human gene loci, the service will add other organisms in the future.
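The anchoring idea can be sketched as a small index in which every name, alias, or identifier points back to one stable numeric ID. This is only an illustrative sketch, not LocusLink's actual implementation: the class name `LocusIndex`, its methods, and all sample values below are hypothetical placeholders.

```python
from collections import defaultdict


class LocusIndex:
    """Toy cross-reference index: many identifiers converge on one locus ID.

    Hypothetical illustration only; not the real LocusLink data model.
    """

    def __init__(self):
        self._by_identifier = defaultdict(set)  # identifier -> set of locus IDs
        self._symbols = {}                      # locus ID -> official symbol

    def add(self, locus_id, symbol, identifiers):
        """Register a locus under its symbol and every other identifier."""
        self._symbols[locus_id] = symbol
        for ident in [symbol, *identifiers]:
            # Store keys in lower case so lookups are case-insensitive.
            self._by_identifier[ident.lower()].add(locus_id)

    def lookup(self, term):
        """Return the sorted locus IDs that the given identifier points to."""
        return sorted(self._by_identifier.get(term.lower(), set()))

    def symbol(self, locus_id):
        """Return the official symbol stored for a locus ID."""
        return self._symbols[locus_id]


# Placeholder data: the IDs, symbol, and accessions are invented.
index = LocusIndex()
index.add(1, "GENEA", ["ALIAS1", "X00001", "Hs.0001", "11q00"])

for term in ("GENEA", "alias1", "X00001", "Hs.0001"):
    print(term, "->", index.lookup(term))
```

Because every identifier is stored against the same ID, a search by alias, accession number, or cluster number lands on the same record, which is the convergence the paragraph above describes.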


Formulating LocusLink Queries

One way to search LocusLink is to select from an alphabetical list of official gene symbols. In addition, a search box supports diverse queries consisting of official gene names and aliases, accession numbers or other database identifiers, protein names, phenotypes, EC numbers, OMIM numbers, UniGene clusters, or map positions. For example, the accession number “AF053356,” the UniGene cluster number “Hs.74561,” and the EC number “4.1.2.13” are all acceptable search terms.

The query syntax supports field restrictions and a wild-card symbol (*). Two particularly useful fields are chr for chromosome and mim for OMIM number. The query “2[chr]” returns a list of all loci found on human chromosome 2. The query “A2*” retrieves a list of all records containing a word beginning with “A2.” A multiword query such as “apolipoprotein hypertriglyceridemia” finds only those reports that contain both words.
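The three query forms described above (field restriction, wild card, and multiword search) can be mimicked with a toy matcher. This is a sketch of the described behavior under stated assumptions, not the LocusLink query engine: the functions `matches` and `search`, the record layout, and every value in `RECORDS` are hypothetical, and the real service may treat case, punctuation, and field values differently.

```python
import fnmatch
import re

# A field-restricted query looks like "value[field]", e.g. "2[chr]".
FIELD_QUERY = re.compile(r"^(?P<value>.+)\[(?P<field>\w+)\]$")


def matches(record, query):
    """Return True if a toy record satisfies a LocusLink-style query."""
    query = query.strip()
    field_match = FIELD_QUERY.match(query)
    if field_match:
        # Field restriction: compare the value with that one field only.
        field = field_match.group("field").lower()
        value = field_match.group("value").strip().lower()
        return str(record.get(field, "")).lower() == value

    terms = query.lower().split()
    if not terms:
        return False  # an empty query matches nothing in this sketch
    words = [w.lower() for w in re.findall(r"[\w.]+", record["text"])]
    # Every term must match some word; "*" in a term acts as a wild card.
    return all(
        any(fnmatch.fnmatchcase(word, term) for word in words)
        for term in terms
    )


def search(records, query):
    """Return the locus IDs of all records that match the query."""
    return [r["locus_id"] for r in records if matches(r, query)]


# Placeholder records: IDs, MIM numbers, and text are invented.
RECORDS = [
    {"locus_id": 1, "chr": "2", "mim": "900001",
     "text": "GENEA A2X placeholder protein"},
    {"locus_id": 2, "chr": "11", "mim": "900002",
     "text": "GENEB apolipoprotein hypertriglyceridemia"},
    {"locus_id": 3, "chr": "2", "mim": "900003",
     "text": "GENEC apolipoprotein"},
]

print(search(RECORDS, "2[chr]"))
print(search(RECORDS, "A2*"))
print(search(RECORDS, "apolipoprotein hypertriglyceridemia"))
```

In this sketch a multiword query is treated as a logical AND over terms, which matches the description that such a query finds reports containing both words.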


LocusLink Reports

Search results are summarized in a browsable list showing, for each entry, the LocusID, locus symbol, gene name, cytogenetic position, and a color-coded array of links to other resources that cite the locus, such as GenBank, RefSeq, UniGene, PubMed, dbSNP, or OMIM.

A full LocusLink report, illustrated in Figure 1 for APOC3, begins with a row of database buttons for resources that contain data on the locus, followed by the organism name and the official gene symbol assigned by the HUGO Human Nomenclature Committee.

Additional information includes the NCBI-assigned LocusID, any known alternative names or symbols, the gene product, cytogenetic location, associated OMIM records and UniGene clusters, phenotype, and a brief summary of gene function if available. Next comes a list of sequences for the locus, including a reference sequence if available. The reference sequence in the APOC3 example lists the accession numbers of the GenBank source sequence, the RefSeq mRNA, and the translated protein reference sequence. A second GenBank sequence for APOC3 is also reported.


LocusLink by FTP

The LocusLink database itself, updated weekly, is available by FTP through a Download link. A README file details the content of each data file. One file contains the basic LocusLink data, and three others contain cross-references from LocusIDs to GenBank, RefSeq, or OMIM numbers.
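Once downloaded, a cross-reference file could be loaded into a lookup table along the following lines. The file layout here is an assumption made for illustration (two tab-separated columns, LocusID then external identifier); the README file is the authority on the real format. The function name `load_cross_references` and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def load_cross_references(handle):
    """Read LocusID-to-identifier pairs from a tab-delimited text stream.

    Assumed layout (hypothetical): column 1 is the LocusID, column 2 is the
    external identifier. Blank lines, short rows, and "#" comments are skipped.
    """
    xrefs = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue
        locus_id, external_id = row[0].strip(), row[1].strip()
        xrefs[locus_id].append(external_id)
    return dict(xrefs)


# Placeholder content standing in for a downloaded file; values are invented.
SAMPLE = "# LocusID\taccession\n1\tX00001\n1\tX00002\n\n2\tX00003\n"

refs = load_cross_references(io.StringIO(SAMPLE))
print(refs)
```

For a real file, the `io.StringIO` stand-in would be replaced by an opened file handle, and the column positions adjusted to whatever the README specifies.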

—DM, DW


