LocusLink: Cross-Referencing Across Databases
Data for a particular genetic locus, such as a gene, may exist in several qualitatively different resources. A phenotype and cytogenetic map position may be registered for a locus in OMIM, official and alternate names may be listed on the Human Gene Nomenclature page, and representative sequence data may be presented in a UniGene cluster. Additionally, GenBank itself may contain multiple sequences for a single locus.

LocusLink (www.ncbi.nlm.nih.gov/LocusLink), developed by NCBI's Donna Maglott, provides an integrated querying and cross-referencing system to facilitate movement from one source to another. LocusLink anchors an official gene name, gene aliases, database IDs, phenotypes, map positions, sequence accession numbers, and other identifiers to a stable LocusID number. Because every identifier is tied to the same LocusID, locus searches of various types reliably converge on the same data. The service is currently limited to human gene loci; other organisms will be added in the future.

Formulating LocusLink Queries

One way to search LocusLink is to select from an alphabetical list of official gene symbols. In addition, a search box supports diverse queries consisting of official gene names and aliases, accession numbers or other database identifiers, protein names, phenotypes, EC numbers, OMIM numbers, UniGene clusters, or map positions. For example, the accession number AF053356, the UniGene cluster number Hs.74561, and the EC number 4.1.2.13 are all acceptable search terms.

The query syntax supports field restrictions and a wild-card symbol (*). Two particularly useful fields are chr for chromosome and mim for OMIM number. The query 2[chr] returns a list of all loci found on human chromosome 2. The query A2* retrieves a list of all records containing a word beginning with A2. A multiword query such as apolipoprotein hypertriglyceridemia finds reports containing both words.
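The query forms described above (a field restriction such as 2[chr], a trailing wild card such as A2*, and a multiword query that requires every word) can be illustrated with a small sketch. This is not LocusLink's implementation; it is a minimal Python model of the described behavior, and the names parse_query and search, the record layout, and the three toy records are all hypothetical.

```python
import re

# Toy records invented for illustration only; they are NOT real LocusLink data.
RECORDS = [
    {"locus_id": 101, "symbol": "A2X", "chr": "2", "mim": "100001",
     "text": "apolipoprotein hypertriglyceridemia"},
    {"locus_id": 102, "symbol": "B7Y", "chr": "11", "mim": "100002",
     "text": "apolipoprotein"},
    {"locus_id": 103, "symbol": "C9Z", "chr": "2", "mim": "100003",
     "text": "kinase"},
]

# A field-restricted term looks like value[field], e.g. 2[chr].
FIELD_TERM = re.compile(r"^(?P<value>.+?)\[(?P<field>\w+)\]$")


def parse_query(query):
    """Split a query into (field, pattern) terms; field is None if unrestricted."""
    terms = []
    for token in query.split():
        m = FIELD_TERM.match(token)
        if m:
            terms.append((m.group("field").lower(), m.group("value")))
        else:
            terms.append((None, token))
    return terms


def _matches(pattern, word):
    """Case-insensitive whole-word match where * stands for any run of characters."""
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.fullmatch(regex, word, flags=re.IGNORECASE) is not None


def search(records, query):
    """Return the locus_id of every record that satisfies ALL terms of the query."""
    terms = parse_query(query)
    hits = []
    for rec in records:
        # Words from every field of the record, used for unrestricted terms.
        words = " ".join(str(v) for v in rec.values()).split()

        def term_ok(field, pattern):
            if field is not None:
                return _matches(pattern, str(rec.get(field, "")))
            return any(_matches(pattern, w) for w in words)

        if all(term_ok(f, p) for f, p in terms):
            hits.append(rec["locus_id"])
    return hits


print(search(RECORDS, "2[chr]"))                               # [101, 103]
print(search(RECORDS, "A2*"))                                  # [101]
print(search(RECORDS, "apolipoprotein hypertriglyceridemia"))  # [101]
```

In this sketch a multiword query is treated as a conjunction, matching the statement that such a query finds reports containing both words. How the real service tokenizes text or ranks results is not described in this article and is not modeled here.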
LocusLink Reports

Search results are summarized in a browsable list showing, for each entry, the LocusID, locus symbol, gene name, cytogenetic position, and a color-coded array of links to other resources that cite the locus, such as GenBank, RefSeq, UniGene, PubMed, dbSNP, or OMIM.

A full LocusLink report, illustrated in Figure 1 for APOC3, begins with a row of database buttons for resources that contain data on the locus, followed by the organism name and the official gene symbol assigned by the HUGO Human Nomenclature Committee. Additional information includes the NCBI-assigned LocusID, any known alternative names or symbols, the gene product, cytogenetic location, associated OMIM records and UniGene clusters, phenotype, and a brief summary of gene function if available. Next comes a list of sequences for the locus, including a reference sequence if available. The reference sequence in the APOC3 example lists the accession numbers of the GenBank source sequence, the RefSeq mRNA, and the translated protein reference sequence. A second GenBank sequence for APOC3 is also reported.

LocusLink by FTP

The LocusLink database itself, updated weekly, is available by FTP through a Download link. A README file details the content of each data file. One file contains the basic LocusLink data, and three others contain cross-references from LocusIDs to GenBank, RefSeq, or OMIM numbers.

DM, DW