NCBI

Commonly Used Genome Terms


Accession number:

An accession number is a unique identifier given to a sequence when it is submitted to one of the DNA repositories (GenBank, EMBL, DDBJ). The initial deposition of a sequence record is referred to as version 1. If the sequence is updated, the version number is incremented but the accession number will remain constant.
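
Because the accession stays constant while the version increments, identifiers are commonly handled in the combined `ACCESSION.VERSION` form (for example `NM_000546.5`). The Python sketch below splits such a string and checks whether one record is a later version of another. The regular expression is a simplified assumption (letters, an optional underscore, digits), not an official accession-format specification, and `AF123456` is a placeholder accession used only for illustration.

```python
import re
from typing import NamedTuple


class AccessionVersion(NamedTuple):
    accession: str
    version: int


# Simplified, assumed pattern: 1-6 capital letters, optional "_", digits, ".", version.
# Real accession formats vary by database and record type.
_ACC_VER = re.compile(r"^([A-Z]{1,6}_?[0-9]+)\.([0-9]+)$")


def parse_accession_version(text: str) -> AccessionVersion:
    """Split an 'ACCESSION.VERSION' string into its accession and integer version."""
    match = _ACC_VER.match(text.strip())
    if match is None:
        raise ValueError(f"not an accession.version string: {text!r}")
    return AccessionVersion(match.group(1), int(match.group(2)))


def is_later_version(old: str, new: str) -> bool:
    """True if `new` has the same accession as `old` and a higher version number."""
    a = parse_accession_version(old)
    b = parse_accession_version(new)
    return a.accession == b.accession and b.version > a.version


print(parse_accession_version("NM_000546.5"))
print(is_later_version("AF123456.1", "AF123456.2"))
```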


aCGH:

A technique involving the competitive hybridization of “test” and “reference” DNA probes to target genomic clones (or cDNA clones) immobilized on a microarray. Most often used to detect copy number variation (CNV), aCGH also has applications in gene annotation and diagnostics.
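
At each arrayed target, the relative signal of the test and reference samples is usually summarized as a log2 ratio: values near 0 suggest equal copy number, positive values a gain and negative values a loss in the test DNA. The sketch below assumes already normalized signal intensities and uses hypothetical thresholds of ±0.3; real analyses add normalization and segmentation across neighboring targets.

```python
import math


def log2_ratio(test_signal: float, reference_signal: float) -> float:
    """log2 of the test/reference signal ratio for one array target."""
    return math.log2(test_signal / reference_signal)


def call_copy_number(test_signals, reference_signals, gain=0.3, loss=-0.3):
    """Label each target 'gain', 'loss' or 'normal' from its log2 ratio.

    The +/-0.3 thresholds are illustrative assumptions, not a published standard.
    """
    calls = []
    for test_signal, reference_signal in zip(test_signals, reference_signals):
        ratio = log2_ratio(test_signal, reference_signal)
        if ratio >= gain:
            calls.append("gain")
        elif ratio <= loss:
            calls.append("loss")
        else:
            calls.append("normal")
    return calls


print(call_copy_number([2000, 1000, 500], [1000, 1000, 1000]))
```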

Read more about aCGH

Mantripragada, K.K., et al. (2004) Trends in Genetics 20:87-94.

Wong, K.K., et al. (2007) Am. J. Hum. Genet. 80:91-104.

Redon, R., et al. (2006) Nature 444: 444-454.


AGP:

A file that describes how primary sequences can be assembled to make a non-redundant, contiguous sequence. The sequence being assembled may be a contig or a chromosome. For each component sequence, the file describes the portion of the component that is used and its location on the assembled sequence. For more information about the file specification, see the format definition page.
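
An AGP file is tab-delimited, with one line per component or gap. The sketch below assumes the nine-column AGP 2.0 layout, in which column 5 gives the component type and the letters `N` and `U` mark gaps; the object name and the accessions in the example are made-up placeholders.

```python
from dataclasses import dataclass


@dataclass
class Component:
    object_id: str       # the sequence being built (contig, scaffold or chromosome)
    object_beg: int      # 1-based start on the object
    object_end: int      # 1-based end on the object
    component_id: str    # accession.version of the component sequence
    component_beg: int   # first base of the component that is used
    component_end: int   # last base of the component that is used
    orientation: str     # "+", "-", "?", "0" or "na"


@dataclass
class Gap:
    object_id: str
    object_beg: int
    object_end: int
    gap_length: int
    gap_type: str


GAP_TYPES = {"N", "U"}  # N: gap of specified size, U: gap of unknown size


def parse_agp(lines):
    """Split AGP lines into component and gap records, skipping comments and blanks."""
    components, gaps = [], []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        obj, beg, end, comp_type = cols[0], int(cols[1]), int(cols[2]), cols[4]
        if comp_type in GAP_TYPES:
            gaps.append(Gap(obj, beg, end, int(cols[5]), cols[6]))
        else:
            components.append(
                Component(obj, beg, end, cols[5], int(cols[6]), int(cols[7]), cols[8])
            )
    return components, gaps


example = [
    "chrX\t1\t5000\t1\tF\tAC000001.1\t1\t5000\t+",
    "chrX\t5001\t5100\t2\tN\t100\tscaffold\tyes\tpaired-ends",
    "chrX\t5101\t9100\t3\tF\tAC000002.2\t201\t4200\t-",
]
components, gaps = parse_agp(example)
print(components[1], gaps[0])
```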


Allelic series:

A collection of distinct mutations that affect a single locus. Often, these different mutations will produce different phenotypes, thus providing a powerful genetic tool for the dissection of gene function.

Read more about alleles and complementation

references:

Vivian JL et al. An allelic series of mutations in Smad2 and Smad4 identified in a genotype-based screen of N-ethyl-N- nitrosourea-mutagenized mouse embryonic stem cells. Proc Natl Acad Sci U S A 2002; 99(24):15542-7.

Steingrimsson E et al. Interallelic complementation at the mouse mitf locus. Genetics 2003; 163(1):267-76.


Annotation:

Adding biological information to genome sequence. This is a very complex task, and the process for doing this is rapidly evolving. Several groups are doing automated computational annotation of several genomes. Features that are added to the genome often include gene models, SNPs, and STSs.

Annotation at NCBI

Annotation at Ensembl

Annotation at UCSC

references:

Reese MG et al. Genome annotation assessment in Drosophila melanogaster. Genome Res. 2000; 10(4):483-501.

Hubbard T et al. The Ensembl genome database project. Nucleic Acids Res. 2002; 30(1):38-41.


BAC/PAC:

Bacterial Artificial Chromosome/P1 Artificial Chromosome. Commonly used cloning vectors for the human genome project. These vectors can hold large inserts, typically 80-200 kb, and propagate in E. coli as a single copy episome.

Read more about using BACs

Track specific clones at the NCBI Clone Registry

BacPac Resources: for information on the construction and maintenance of several BAC and PAC libraries.

references:

Osoegawa et al. Bacterial artificial chromosome libraries for mouse sequencing and functional analysis. Genome Res. 2000; 10(1): 116-28.

Osoegawa et al. A bacterial artificial chromosome library for sequencing the complete human genome. Genome Res. 2001; 11(3): 483-96.


BES:

BAC end sequence. The ends of BAC clones are sequenced and the clone association information is retained. In this way, BAC clones whose inserts have not been sequenced can be integrated with other BAC clones or with WGS assemblies.

Human BAC end sequencing info

Mouse BAC end sequencing info

Rat BAC end sequencing info

references:

Mahairas GG et al. Sequence-tagged connectors: a sequence approach to mapping and scanning the human genome. Proc Natl Acad Sci 1999; 96(17): 9739-44.

Zhao S et al. Human BAC ends quality assessment and sequence analyses. Genomics 2000; 63(3): 321-22.

Zhao S et al. Mouse BAC ends quality assessment and sequence analysis. Genome Res 2001; 11(10): 1736-45.


BLAST

Basic Local Alignment Search Tool. A method for performing sequence comparisons. Either protein sequences or nucleotide sequences can be used. This algorithm has been extended and now includes a suite of programs including megaBLAST and discontiguous megaBLAST.

Choose a BLAST program

Learn more about similarity searching

references:

Altschul et al. Basic local alignment search tool. J Mol Biol 1990; 215:403-10.

Zhang et al. A greedy algorithm for aligning DNA sequences. J Comput Biol 2000; 7(1-2):203-14.


BLAT

A hashing algorithm developed by Jim Kent to allow rapid searching of large amounts of genome sequence. A hashing algorithm divides the database into words of a prescribed size (often 12-14 bases). The locations of these words are stored in memory. The query sequence is scanned for exact matches to words stored in memory. These types of algorithms tend to be very fast and effective for closely related sequences, but fail as sequences diverge. In addition to nucleotide BLAT, translated BLAT allows for comparison of protein sequences. This sequence aligner also allows for accurate alignment of transcribed sequences by looking at splice site information.
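
The word-indexing idea can be sketched in a few lines of Python. Following Kent (2002), the sketch indexes non-overlapping words of the database and looks up every overlapping word of the query. The default word size of 12 and the tiny example sequences are assumptions; real BLAT goes on to cluster hits and extend them into alignments.

```python
from collections import defaultdict


def build_index(database: str, k: int = 12) -> dict:
    """Map every non-overlapping k-mer of the database to its start positions."""
    index = defaultdict(list)
    for start in range(0, len(database) - k + 1, k):
        index[database[start:start + k]].append(start)
    return dict(index)


def find_hits(query: str, index: dict, k: int = 12):
    """Scan every overlapping k-mer of the query; return (query_pos, db_pos) exact hits."""
    hits = []
    for qpos in range(len(query) - k + 1):
        for dpos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, dpos))
    return hits


database = "ACGTACGTTTGGCCAATTGGCCAAGT"
index = build_index(database, k=4)
print(find_hits("GGCCAA", index, k=4))
```

Hits that share the same diagonal (db_pos minus query_pos) are candidates for a single alignment, which is why a hash lookup can be so fast for closely related sequences and why it misses diverged ones that share no exact words.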

reference:

Kent WJ. BLAT: the BLAST-like alignment tool. Genome Res 2002; 12(4):656-64.


CAGE:

Cap Analysis Gene Expression. A technique for identifying transcription start sites and quantifying promoter usage in eukaryotic genomes. The method is based on the isolation and concatenation of short sequence tags (~21 bp) from the 5’ ends of individual mRNAs into longer DNA molecules that are subsequently sequenced. Transcription start sites are determined by mapping the tags to a reference genome. See also SAGE (Serial Analysis of Gene Expression).

references:

Kodzius, R., et al. (2006) Nat. Methods 3:211-222.

Shiraki, T., et al. (2003) PNAS 100:15776-15781.


CDS:

Coding sequence. The portion of an mRNA or genomic sequence that encodes a protein.
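
A CDS is usually given as coordinates on an mRNA or genomic record. Assuming a spliced mRNA and 0-based, end-exclusive coordinates (an assumption; database records often use 1-based coordinates), the sketch below extracts the CDS and translates it with the standard genetic code.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated with bases in
# T, C, A, G order, first position varying slowest: TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)
}


def translate_cds(mrna: str, cds_start: int, cds_end: int) -> str:
    """Translate mrna[cds_start:cds_end] (0-based, end-exclusive) up to the first stop codon."""
    cds = mrna[cds_start:cds_end].upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    protein = []
    for i in range(0, len(cds), 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":  # stop codon ends the coding sequence
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate_cds("GGCAUGGCUUGGUAAGC", 3, 15))
```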


ChIP:

Chromatin Immunoprecipitation. A method for identifying protein-DNA interactions. Genomic DNA and associated proteins are cross-linked, sheared, and immunoprecipitated with antibodies that recognize specific DNA-binding proteins. Purified DNA fragments are then assayed by various techniques to determine whether specific sequences are associated with the protein of interest (see ChIP/chip, ChIP/SAGE and ChIP/SEQ below).

Read more about ChIP.


ChIP/chip

The hybridization of ChIP-purified DNA to microarrays containing genomic DNA sequences to achieve genome-wide identification of protein-DNA interactions.

references:

Iyer, V.R., et al. (2001) Nature 409:533-538.

Ren, B. et al. (2000) Science 290: 2306-2309.


ChIP/SAGE

The preparation of small tags from ChIP purified DNA and their subsequent SAGE analysis to achieve genome-wide identification of protein-DNA interactions.

reference:

Impey, S., et al. (2004) Cell 119:1041-54.


ChIP/SEQ

A technique involving size selection, high throughput sequencing (typically using next generation sequencing technologies that produce millions of reads in a run) and mapping of ChIP purified DNA onto a reference genome to achieve genome-wide identification of protein-DNA interactions.

references:

Johnson, D.S., et al. (2007) Science 316:1497-1502.

Robertson, G. et al. (2007) Nat Methods 4:651-657.


Chromosomal rearrangement:

These events are mediated by double-strand breaks in the genome and their subsequent repair. When these breaks are repaired, landmarks in the genome have often changed location or been removed completely. There are many different types of rearrangements.

These events may have no phenotypic consequences, depending upon the amount of DNA involved and the location of the breakpoints. However, there are many well-characterized human syndromes that are associated with these events.

Read more about chromosomal rearrangements.

Learn more about rearrangements associated with cancer.

reference:

Inoue K, Lupski JR. Molecular mechanisms for genomic disorders. Annu Rev Genomics Hum Genet 2002; 3:199-242.


Component

A sequence used to construct a larger sequence (a sequence contig or a scaffold). Typically this is a sequence found in GenBank/EMBL/DDBJ, often a clone sequence or a WGS contig but occasionally a PCR product.


Contig

Short for contiguous sequence. When two sequences overlap at their ends (a "dove-tail" overlap), they can be collapsed into a single, non-redundant sequence.
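
A toy version of collapsing a dove-tail overlap is shown below. It is a sketch that assumes exact matches, a known left/right order and a hypothetical minimum overlap of 3 bases; real assemblers tolerate sequencing errors and consider both strands.

```python
def dovetail_overlap(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest exact suffix of `left` that equals a prefix of `right`."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    for length in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-length:] == right[:length]:
            return length
    return 0


def merge_dovetail(left: str, right: str, min_overlap: int = 3):
    """Collapse two dove-tailed sequences into one non-redundant sequence, or return None."""
    overlap = dovetail_overlap(left, right, min_overlap)
    if overlap == 0:
        return None
    return left + right[overlap:]


print(merge_dovetail("ACGTTGCA", "TGCAGGAT"))
```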

Read more about contigs

references:

Jang W et al (1999) Making effective use of human genomic sequence data. Trends Genet. 15(7): 284-6.

Kent WJ and Haussler D (2001) Assembly of the working draft of the human genome with GigAssembler. Genome Res 11(9): 1541-8.

Agarwala R. Chapter 9 in: Handbook of Computational Molecular Biology (Srinivas Aluru, editor). Chapman & Hall/CRC Computer & Information Science Series, Volume 9.


Copy Number Variation (CNV):

Large-scale structural changes in DNA that vary from individual to individual. These include insertions, deletions, duplications and complex multi-site variants that range from kilobases to megabases in size. CNVs can alter gene dosage and influence gene expression and phenotypic variation; in certain instances they may be associated with developmental disorders, cause disease, or confer susceptibility to complex disease traits.

references:

Egan, C.M., et al. (2007) Nat. Genet. 39:1384-1389.

Korbel, J.O., et al. (2007) Science 318:420-426.

Wong, K.K., et al. (2007) Am. J. Hum. Genet. 80:91-104.

Redon, R., et al. (2006) Nature 444:444-454.

Sharp, A.J., et al. (2005) Am. J. Hum. Genet. 77:78-88.

Tuzun, E., et al. (2005) Nat. Genet. 37:727-732.

Iafrate A.J., et al. (2004) Nat. Genet. 36:949-951.

Sebat, J., et al. (2004) Science 305:525-528.


Cosmid

A cloning vector that typically carries inserts of about 35-45 kb. These vectors are hybrids of lambda phage and plasmids and can be propagated as plasmids or packaged into phage particles. The name comes from the fact that these vectors retain the phage cos sites, which are required for packaging DNA into lambda phage heads. Cosmids are generally maintained in multiple copies in E. coli.

Read more about cosmids

references:

Evans GA et al. High efficiency vectors for cosmid microcloning and genomic analysis. Gene 1989; 79(1):9-20.

Coulson A et al. The physical map of the Caenorhabditis elegans genome. Methods Cell Biol 1995; 48:533-50.


Draft sequence

This term has had several definitions, but generally refers to sequence that is not yet finished but is of generally high quality. For clone-based projects, draft sequence refers to a project in which more than 90% of the bases are of high quality. As a result, a draft clone record typically consists of several sequence fragments separated by runs of Ns, and the order and orientation of these fragments is often unknown. Nevertheless, these sequences, in conjunction with other data, are a useful substrate for genome assembly and annotation. See HTGS for additional information.
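
Because a draft clone record is a set of fragments joined by runs of Ns, its fragments and gaps can be recovered by scanning for those runs. This sketch assumes that any run of at least `min_gap` Ns marks a gap; that threshold is a hypothetical parameter, since a single N may simply be an ambiguous base.

```python
import re


def split_on_gaps(sequence: str, min_gap: int = 1):
    """Return (fragments, gaps); gaps are 0-based, end-exclusive runs of N."""
    pattern = re.compile(f"[Nn]{{{min_gap},}}")
    gaps = [(m.start(), m.end()) for m in pattern.finditer(sequence)]
    fragments = [piece for piece in pattern.split(sequence) if piece]
    return fragments, gaps


print(split_on_gaps("ACGTNNNNNGGCCNNTTA"))
print(split_on_gaps("ACGTNNNNNGGCCNNTTA", min_gap=3))
```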

references:

Collins FS et al. New goals for the U.S. Human Genome Project: 1998-2003. Science 1998; 282(5389):682-9.

The Human Genome Sequence Consortium. Initial sequencing and analysis of the human genome. Nature 2001; 409(6822):860-921.


End Sequence Profiling (ESP):

A method for detecting genome-level variation. End sequences of clones from a library of interest are mapped in silico onto a reference genome. Analysis of end sequence density and end sequence pair plots can reveal regions containing translocations, inversions, deletions/insertions and other complex structures. See also Paired End Mapping (PEM).

references:

Tuzun, E., et al. (2005) Nat. Genet. 37:727-732.

Volik, S. et al. (2006) Genome Res. 16:394-404.

Volik, S. et al. (2003) PNAS 100:7696-7701.


e-PCR

Electronic PCR. A program that searches a given sequence for the presence of primer pairs. The primers must be in the proper orientation and a specified distance apart to define a match. There are currently two versions of this program: forward e-PCR takes a sequence as a query and searches a database of sequence-tagged sites (STSs), while reverse e-PCR uses an STS to search a sequence for the presence of its primers in the correct orientation at the specified distance.
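
The matching rule can be sketched as follows. This is not the NCBI e-PCR program: it assumes exact primer matches, searches only the given strand and uses hypothetical product-size limits; a full implementation would also need to handle mismatches and the opposite strand.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_all(seq: str, motif: str):
    """All start positions of `motif` in `seq`, including overlapping ones."""
    positions, start = [], seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def epcr(sequence, forward_primer, reverse_primer, min_size, max_size):
    """Return (start, end) of each predicted product on the given strand (0-based, end-exclusive)."""
    seq = sequence.upper()
    fwd = forward_primer.upper()
    # The reverse primer's sequence matches the bottom strand, so on the top strand
    # its site reads as the primer's reverse complement.
    rev_site = reverse_complement(reverse_primer.upper())
    products = []
    for f in find_all(seq, fwd):
        for r in find_all(seq, rev_site):
            end = r + len(rev_site)
            # Proper orientation: the reverse site lies downstream of the forward primer.
            if r >= f + len(fwd) and min_size <= end - f <= max_size:
                products.append((f, end))
    return products


print(epcr("TTACGGATCCAAAGGGTTTCCCAAGCTTGG", "ACGGATCC", "CCAAGCTT", 20, 40))
```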

reference:

Schuler GD. Electronic PCR: bridging the gap between genome mapping and genome sequencing. Trends Biotechnol 1998; 16(11):456-9.


EST

Expressed sequence tag. These are single-pass sequences of cDNA clones. Databases of EST sequences are highly redundant but quite useful for gene identification. There are many efforts to cluster EST sequences to remove the redundancy and low-quality sequences.

EST clusters in UniGene

Gene indexes at Harvard

EST clusters at Allgenes

references:

Adams MD et al. Complementary DNA sequencing: expressed sequence tags and human genome project. Science 1991; 252(5013):1651-6.

Marra M et al. An encyclopedia of mouse genes. Nat Genet 1999; 21(2):191-4.


ExoFish

A technique that uses Whole Genome Shotgun (WGS) reads from the pufferfish Tetraodon nigroviridis to identify potential coding sequences in mammalian genomes based on homology. This technique was first used to annotate the human genome.

Read more about exoFISH

reference:

Roest Crollius et al. Estimate of human gene number provided by genome-wide analysis using Tetraodon nigroviridis DNA sequence. Nat Genet 2000; 25(2):235-8.


Fingerprint

The pattern of bands produced when a clone is digested with a particular restriction enzyme, such as HindIII. Related (overlapping) clones will have fingerprint bands in common; the more bands they share, the greater the degree of overlap.
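
Counting shared bands can be sketched by greedily pairing band sizes that agree within a tolerance. The band values and the tolerance below are hypothetical; tools such as FPC additionally assess whether the number of shared bands is more than expected by chance.

```python
def shared_bands(bands_a, bands_b, tolerance=7):
    """Count bands of clone A that pair with a distinct band of clone B within `tolerance`."""
    unmatched = sorted(bands_b)
    shared = 0
    for band in sorted(bands_a):
        for i, other in enumerate(unmatched):
            if abs(band - other) <= tolerance:
                shared += 1
                del unmatched[i]  # each band of clone B can be used only once
                break
    return shared


print(shared_bands([100, 250, 400, 1200], [103, 398, 900, 1210], tolerance=5))
```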

A BAC fingerprint map of the Mouse Genome

Human BAC map information

references:

Marra M et al. High throughput fingerprint analysis of large-insert clones. Genome Res 1997; 7(11):1072-84.

Marra M et al. A map for sequence analysis of the Arabidopsis thaliana genome. Nat Genet 1999; 22(3):265-70.

McPherson JD et al. A physical map of the human genome. Nature 2001; 409(6822):934-41.

Soderlund C et al. Contigs built with fingerprints, markers and FPC v4.7. Genome Res 2000; 10:1772-1787.

Soderlund C et al. FPC: a system for building contigs from restriction fingerprinted clones. CABIOS 1997; 13: 523-535.


Finished Sequence

A clone insert that has been sequenced with an error rate of <0.01% (fewer than 1 error per 10,000 bases). These sequence records generally have no gaps.

references:

Collins FS et al. New goals for the US Human Genome Project: 1998-2003. Science 1998; 282(5389):682-9.

The Human Genome Sequence Consortium. Initial sequencing and analysis of the human genome. Nature 2001; 409(6822):860-921.


FISH:

Fluorescent in situ hybridization. Genomic clones are fluorescently labeled and hybridized to chromosome spreads. In this way a clone can be mapped to a discrete cytogenetic band. If the clone has sequence associated with it, this information can be used to integrate sequence with cytogenetic information.

Read more about FISH

FISH methodology

reference:

Dyer SA and Green EK. Fluorescent in situ hybridization. Methods Mol Biol 2002; 187:73-86.


Fosmid:

A cloning system based on the E. coli F factor. Fosmid clones have an average insert size of 40 kb, with a very small standard deviation.

reference:

Birren BW et al. A human chromosome 22 fosmid resource: mapping and analysis of 96 clones. Genomics 1996; 34(1):97-106.


Gap:

A region of the genome for which no sequence is currently available. There are two types of gaps: heterochromatic and euchromatic. Heterochromatic gaps consist largely of highly repetitive sequence, while euchromatic gaps are more likely to contain genes. Gaps may occur both within and between genomic scaffolds.

references:

Bovee, D., et al. (2008) Nat. Genet. 40:96-101.

Eichler, E.E., et al. (2004) Nat. Rev. Genet. 5:345-354.


Gene targeting:

A specific type of transgenesis that targets a particular gene. If a mutated copy of a gene is electroporated into a cell, the introduced DNA can undergo homologous recombination with the endogenous copy of the gene at some frequency (1-25%). If this event occurs in embryonic stem cells, cells carrying the new copy of the gene can be used to generate embryos that can be assessed for the phenotypic consequences of the mutation. This technique is used frequently in mice to study loss-of-function mutations.

Read more about gene targeting

NIH Mouse Knockout project (KOMP)

North American Conditional Mouse Mutagenesis project (NORCOMM)

European Union Conditional Mouse Mutagenesis project (EUCOMM)

references:

Thomas et al. High frequency targeting of genes to specific sites in the mammalian genome. Cell 1986; 44(3):419-28.

van der Weyden L, et al. Tools for targeted manipulation of the mouse genome. Physiol Genomics 2002; 11(3):133-64.


Gene trapping

This strategy uses transgenesis to introduce DNA carrying a reporter gene (lacZ or GFP) flanked by various genomic signals (splice donor or acceptor sites, promoters, etc.). Expression of the reporter gene indicates that the DNA has integrated into a region of the genome containing a gene. The gene that has been trapped can be recovered using the DNA sequences associated with the reporter construct. Often, the introduction of the gene trapping vector inactivates the gene into which it was introduced.

Go to the Gene Trap web page

NIH Mouse Knockout project (KOMP)

North American Conditional Mouse Mutagenesis project (NORCOMM)

European Union Conditional Mouse Mutagenesis project (EUCOMM)

references:

Gossler et al. Mouse embryonic stem cells and reporter constructs to detect developmentally regulated genes. Science 1989; 244(4903):463-5.

Stanford et al. Gene-trap mutagenesis: past, present and beyond. Nat Rev Genet 2001; 2(10):756-68.


Haplotype (haploid genotype):

A set of closely linked genetic markers present on one chromosome that tend to be inherited together. A haplotype may also refer to a set of single nucleotide polymorphisms (SNPs) on a single chromatid that are statistically associated with one another.

For more information on haplotypes, visit the HapMap page.

Read more about haplotypes.


HTGS

High Throughput Genome Sequence. A term used to distinguish genomic sequence generated in a high-throughput manner. To release data more rapidly, it became standard for all sequencing centers to submit unfinished sequence to the public repositories (the "Bermuda Rules"). This sequence is deposited in the HTG division of GenBank/EMBL/DDBJ. In general, the term is used to describe clone-based (BAC/PAC/fosmid) projects.

keywords associated with HTGS:

reference:

Marshall E. Bermuda rules: community spirit, with teeth. Science 2001; 291(5507):1192.


Linkage mapping

This type of mapping measures meiotic recombination between polymorphic markers to determine the relative order of the markers. Distance between markers is measured in centiMorgans (cM); one centiMorgan corresponds to a 1% crossover (recombination) rate between two markers.
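
For two markers scored over a set of meioses, the recombination fraction is the proportion of recombinant offspring, and for closely linked markers the map distance in centiMorgans is approximately 100 times that fraction. The sketch below shows this arithmetic with hypothetical counts. The Haldane mapping function is an addition not described above, included to show how larger recombination fractions are commonly converted under the assumption of no crossover interference.

```python
import math


def recombination_fraction(recombinants: int, meioses: int) -> float:
    """Fraction of scored meioses that are recombinant between two markers."""
    return recombinants / meioses


def distance_cm(recombinants: int, meioses: int) -> float:
    """Map distance at 1 cM per 1% recombination; reasonable only for closely linked markers."""
    return 100 * recombination_fraction(recombinants, meioses)


def haldane_cm(r: float) -> float:
    """Haldane mapping function (assumes no crossover interference), valid for 0 <= r < 0.5."""
    return -50 * math.log(1 - 2 * r)


print(distance_cm(5, 200))
print(round(haldane_cm(0.025), 2))
```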

Read more about genetic mapping

Read about genetic mapping in mice

references:

Donis-Keller H, et al. A genetic linkage map of the human genome. Cell 1987; 51(2):319-37.

Sturtevant AH. The linear arrangement of six sex-linked factors in Drosophila, as shown by their mode of association. J Exp Zool 1913; 14:43-59.


Library

A library is a collection of DNA fragments that have been cloned into a vector that can propagate in E. coli. Once the DNA fragments are ligated into the vector, the recombinant molecules are transformed into a strain of E. coli suitable for propagation. To access individual clones, the E. coli hosts are grown on plates at a density low enough that individual colonies can be visualized; each colony should contain only one DNA fragment. Typically, individual colonies are arrayed into microtiter plates (either 96 or 384 wells), which allows for both indexing and easy storage of the bacterial clones.


Mate pair

Sequences obtained from opposite ends of a particular clone are referred to as mate pairs. Knowing that two sequences are derived from the same clone allows them to be linked, even if the full insert sequence of the clone is unavailable. This is key to WGS assemblies.

references:

Weber JL, Myers EW. Human whole-genome shotgun sequencing. Genome Res 1997; 7(5):401-409.

Venter JC et al. The sequence of the human genome. Science 2001; 291(5507):1304-51.

Batzoglou S et al. ARACHNE: a whole-genome shotgun assembler. Genome Res 2002; 12(1):177-89.

Mullikin JC, Ning Z. The phusion assembler. Genome Res 2003; 13(1):81-90.


N50

The contig or scaffold length such that half of the bases in a given assembly reside in contigs or scaffolds of that length or longer. This provides a measure of continuity. For instance, a scaffold N50 of 15 Mb means that at least half of the bases in the assembly are in scaffolds that are at least 15 Mb long.
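
The definition translates directly into code: sort the lengths from longest to shortest and accumulate them until at least half of the total assembly length is reached. A minimal sketch, using made-up contig lengths:

```python
def n50(lengths):
    """Return the length L such that sequences of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("no contig or scaffold lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding in total / 2
            return length


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Here the lengths total 54 bases; 10 + 9 + 8 = 27 reaches half of that, so the N50 is 8.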


Mutation

A sequence variation that deviates from the reference, or "wild type", sequence. This variation can be a SNP, an insertion of sequence, or a deletion of sequence. There can be a great deal of sequence variation between individuals in a population; for example, two humans may differ by as much as 1 base pair in every 1,000 bp. In practice, mutations are distinguished from other variation by their phenotypic consequences. Mutations that cause loss of Pax6 function lead to the eyeless mutation in flies, the Small eye mutation in mice, and aniridia in humans.

Read more about mutations and mutant analysis

reference:

Gehring WJ. The master control gene for morphogenesis and evolution of the eye. Genes Cells 1996; 1(1):11-5.


Optical Mapping:

A light-microscopy-based technique in which images of single DNA molecules undergoing restriction enzyme digestion are recorded and used to construct physical maps of large pieces of DNA. Optical maps often serve as scaffolds for the precise alignment of sequence contigs generated during genome sequencing projects.

Read more about optical mapping.

references:

Li, H., et al. (2007) J. Comput. Biol. 14:255-266.

Aston, C., et al. (1999) Trends Biotechnol. 17:297-302.

Schwartz D.C., et al. (1993) Science 262:110-114.


Paired End Mapping (PEM):

A method for detecting genome-level variation. Paired ends from size-selected, sheared genomic DNA fragments are subjected to high-throughput sequencing and then mapped in silico onto a reference genome. Analysis of paired-end spans can reveal regions containing translocations, inversions, deletions/insertions and other complex structures. See also End Sequence Profiling (ESP).
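
A simplified classification of mapped pairs is sketched below. It assumes a forward/reverse (`+`/`-`) read-pair orientation, leftmost mapping coordinates, and a hypothetical cutoff of three standard deviations around the mean library span; real orientation conventions depend on the library type and sequencing platform.

```python
def classify_pair(chrom_a, pos_a, strand_a, chrom_b, pos_b, strand_b,
                  span_mean, span_sd, n_sd=3):
    """Classify one mapped read pair by chromosome, orientation and span."""
    if chrom_a != chrom_b:
        return "translocation"
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pos_a, strand_a), (pos_b, strand_b)]
    )
    if left_strand == right_strand:
        return "inversion"
    if (left_strand, right_strand) != ("+", "-"):
        return "other"  # e.g. everted pairs, not interpreted in this sketch
    span = right_pos - left_pos
    if span > span_mean + n_sd * span_sd:
        return "deletion"   # ends map farther apart than expected: DNA missing from sample
    if span < span_mean - n_sd * span_sd:
        return "insertion"  # ends map closer than expected: extra DNA in sample
    return "concordant"


print(classify_pair("chr1", 1000, "+", "chr1", 5000, "-", span_mean=300, span_sd=30))
```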

reference:

Korbel, J.O., et al. (2007) Science 318:420-426.


Phenotype

An observable characteristic displayed by an organism. These characteristics can be controlled by genes, by the environment, or a combination of both. The characteristic can be directly observable, such as having brown eyes. In some cases, the phenotype will be measurable, such as having high blood pressure.

OMIM is an on-line catalog of human phenotypes


Positional cloning

Identification of a gene based on its physical location in the genome. Often, an individual has a phenotype, but the gene underlying this phenotype is unknown. Using linkage mapping, the phenotype can be assigned a position in the genome. Once a phenotype has been localized, overlapping sets of clones (for example, BACs) that cover the region are identified. Genes within the region are identified and compared to DNA from individuals that display the phenotype until the underlying mutation is identified.

Read more about positional cloning

references:

Kerem B, et al. Identification of the cystic fibrosis gene: genetic analysis. Science 1989; 245(4922):1073-80.


RefSeq

Reference Sequence. The goal of the RefSeq project is to produce a reference sequence for all naturally occurring molecules from the central dogma (DNA, RNA, Protein).

reference:

Pruitt KD, Maglott DR. RefSeq and LocusLink: NCBI gene-centered resources. Nucleic Acids Res 2001; 29(1):137-140.

Pruitt et al. (2007) Nucleic Acids Research 35:D61-D65.


RFLP

Restriction fragment length polymorphism. A type of polymorphism detectable in a genome by the size differences in DNA fragments generated by restriction enzyme analysis.
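
The idea can be illustrated with an in silico digest of two hypothetical alleles that differ at one base within an EcoRI site (G^AATTC). This is a sketch assuming linear DNA and exact site matching; real RFLP analysis measures fragment sizes on a gel.

```python
def digest(sequence: str, site: str = "GAATTC", cut_offset: int = 1):
    """Fragment lengths of a linear sequence cut within every occurrence of `site`.

    The defaults describe EcoRI (G^AATTC). Because GAATTC is palindromic, scanning one
    strand finds every site; non-palindromic sites on the other strand are not detected.
    """
    seq = sequence.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


allele_1 = "AAAAGAATTCAAAAAAAAAA"  # contains an EcoRI site
allele_2 = "AAAAGAATCCAAAAAAAAAA"  # a single-base change destroys the site
print(digest(allele_1), digest(allele_2))
```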

Read more about RFLPs

references:

Botstein D, et al. Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 1980; 32(3):314-31.

Donis-Keller H, et al. A genetic linkage map of the human genome. Cell 1987; 51(2):319-37.


RH mapping

Radiation Hybrid mapping. A physical mapping method that estimates linkage and distance relative to radiation-induced chromosome breaks. This is analogous to genetic mapping.

More on RH mapping

references:

Goss SJ, Harris H. New method for mapping genes in human chromosomes. Nature 1975; 255(5511):680-4.

Cox et al. Radiation hybrid mapping: a somatic cell genetic method for constructing high-resolution maps of mammalian chromosomes. Science 1990; 250(4978):245-50.

Hudson TJ, et al. A radiation hybrid map of mouse genes. Nat Genet 2001; 29(2):201-5.


SAGE (Serial Analysis of Gene Expression):

A technique for identifying and quantifying transcripts from eukaryotic genomes. This method is based on the isolation and concatenation of short sequence tags (~14 bp) from individual mRNAs into longer DNA molecules that are subsequently sequenced. A tag’s gene of origin is determined by mapping the tag to a reference genome. See also CAGE (Cap Analysis Gene Expression).
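
Tag extraction and counting can be sketched as follows. The sketch assumes the NlaIII anchoring site CATG used in the original SAGE protocol and takes the 3'-most site plus a hypothetical 10 following bases (about 14 bp in total); tag lengths differ between protocol variants, and real tags are produced enzymatically rather than cut from known cDNA sequence.

```python
from collections import Counter


def sage_tag(cdna: str, anchor: str = "CATG", tag_length: int = 10):
    """Return the 3'-most anchor site plus the next `tag_length` bases, or None.

    The cDNA is assumed to be written 5' to 3'.
    """
    seq = cdna.upper()
    site = seq.rfind(anchor)
    if site == -1 or site + len(anchor) + tag_length > len(seq):
        return None
    return seq[site:site + len(anchor) + tag_length]


def count_tags(cdnas, anchor: str = "CATG", tag_length: int = 10) -> Counter:
    """Tally tags across many cDNAs; tag counts approximate transcript abundance."""
    tags = (sage_tag(c, anchor, tag_length) for c in cdnas)
    return Counter(tag for tag in tags if tag is not None)


library = [
    "TTCATGAAAACATGCCCCCGGGGGTTTTT",
    "TTCATGAAAACATGCCCCCGGGGGTTTTT",
    "AAAAAAAAAA",  # no anchoring site, so no tag
]
print(count_tags(library))
```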

Read more about SAGE.

references:

Wang, S.M., et al. (2006) Trends Genet. 23:42-50.

Yamamoto, M., et al. (2001) J. Immunol. Methods 250:45-66.

Velculescu, V.E., et al. (1995) Science 270:484-487.


Scaffold

(see supercontig)


Segmental Duplication:

A region of genomic DNA ranging from 1 to 400 kb that may be found at more than one site in the genome. Segmental duplications often share >90% sequence identity. See also Copy Number Variation (CNV).

Read more about Segmental Duplications.

references:

Sharp, A.J., et al. (2005) Am. J. Hum. Genet. 77:78-88.

Bailey, J.A., et al. (2002) Science 297:1003-1007.

Bailey, J.A. et al. (2001) Genome Res. 11:1005-1017.


SNP

Single Nucleotide Polymorphism. A single base difference found when comparing the same DNA sequence from two different individuals.
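
Given two sequences that are already aligned to the same length, single-base differences can be listed by comparing positions. This is a sketch that assumes a gapped alignment using `-`, skips gaps and ambiguous `N` bases, and does not check whether a difference is common in a population.

```python
def find_snps(seq_a: str, seq_b: str):
    """List (position, base_a, base_b) where two aligned sequences differ at a single base."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    for pos, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper())):
        if a != b and "-" not in (a, b) and "N" not in (a, b):
            snps.append((pos, a, b))
    return snps


print(find_snps("ACGTACGT", "ACGAACGT"))
```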

More on SNPs

reference:

Weiss KM. In search of human variation. Genome Res 1998; 8(7): 691-7.


SSAHA

A hashing algorithm developed for rapid searching of large amounts of genome sequence. This program is similar to BLAT but does not use splice information to align mRNA sequences, nor can it perform translated searches.

reference:

Ning et al. SSAHA: a fast search method for large DNA databases. Genome Res 2001; 11(10):1725-9.


SSLP

Simple sequence length polymorphisms. Common examples of these in mammalian genomes include runs of dinucleotide or trinucleotide repeats (CACACACACACACACACA).
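
Runs of short tandem repeats such as the (CA)n example above can be found with a regular-expression backreference. In this sketch the unit lengths and the minimum of five copies are hypothetical thresholds, and homopolymer runs are ignored.

```python
import re


def find_microsatellites(sequence: str, unit_lengths=(2, 3), min_copies: int = 5):
    """Return sorted (start, end, unit, copies) for tandem repeats of the given unit lengths."""
    seq = sequence.upper()
    hits = []
    for unit_len in unit_lengths:
        # A unit of `unit_len` bases followed by at least `min_copies - 1` exact repeats.
        pattern = re.compile(rf"([ACGT]{{{unit_len}}})\1{{{min_copies - 1},}}")
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if len(set(unit)) == 1:
                continue  # skip homopolymer runs such as AAAAAA
            hits.append((m.start(), m.end(), unit, (m.end() - m.start()) // unit_len))
    return sorted(hits)


print(find_microsatellites("GGCACACACACACACACACAGG"))
```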

Read more about SSLPs

reference:

Weissenbach J, et al. A second-generation linkage map of the human genome. Nature 1992; 359(6398):777-8.


Stem Cells

Most cells of the adult body are terminally differentiated; that is, they are no longer able to replace themselves or to become another cell type. Stem cells are undifferentiated cells that are able to both proliferate and differentiate into numerous cell types. For example, embryonic stem cells are able to differentiate into any cell type found within the embryo, whereas hematopoietic stem cells can differentiate into any type of blood cell.

Read more about stem cells

reference:

Rossant J. Stem cells from the mammalian blastocyst. Stem Cells 2001; 19(6):477-82.


STS

Sequence-Tagged Site. A short sequence (typically 200-500 bp) for which oligonucleotide primers have been designed so that it can be amplified by PCR to produce a discrete band when analyzed by electrophoresis. STSs are generated throughout a genome and can be polymorphic or monomorphic. They are critical to integrating non-sequence-based maps (such as genetic or RH maps) with sequence-based maps.

Read more about STSs

references:

Green ED, Green P. Sequence-tagged site (STS) content mapping of human chromosomes: theoretical considerations and early experiences. PCR Methods Appl 1991; 1(2):77-90.

Hudson TJ et al. An STS-based map of the human genome. Science 1995; 270(5244):1919-20.


Supercontig (Scaffold):

A supercontig is formed when an association can be made between two contigs that have no sequence overlap. This commonly uses information obtained from paired clone ends (such as plasmid or BAC ends). For example, both ends of a BAC clone are sequenced; it can be inferred that these two sequences are approximately 150-200 kb apart (based on the average size of a BAC). If the sequence from one end is found in one sequence contig and the sequence from the other end is found in a different contig, the two contigs are said to be linked. In general, it is useful to have end sequences from more than one clone to provide evidence for linkage.
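
Linking contigs from clone-end placements can be sketched as counting how many clones join each pair of contigs. The sketch assumes that each clone's two end sequences have already been placed on contigs, uses hypothetical clone and contig names, and expresses the "more than one clone" rule as a hypothetical `min_support=2`; it does not infer contig order, orientation or gap size.

```python
from collections import Counter


def link_contigs(end_placements: dict, min_support: int = 2) -> dict:
    """Return {(contig_x, contig_y): clone_count} for contig pairs joined by enough clones.

    end_placements maps a clone name to the pair of contigs containing its two end sequences.
    """
    support = Counter()
    for contig_a, contig_b in end_placements.values():
        if contig_a != contig_b:  # both ends in one contig: no new linkage
            support[tuple(sorted((contig_a, contig_b)))] += 1
    return {pair: count for pair, count in support.items() if count >= min_support}


placements = {
    "clone1": ("ctg1", "ctg2"),
    "clone2": ("ctg2", "ctg1"),
    "clone3": ("ctg2", "ctg3"),  # a single clone: not enough evidence on its own
    "clone4": ("ctg4", "ctg4"),
}
print(link_contigs(placements))
```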


TILLING (Targeting Induced Local Lesions IN Genomes):

A reverse genetics technique that permits the directed identification of mutations in genes of interest. A chemical mutagen is used to induce lesions in an individual’s gametes, from which the DNA is recovered and subsequently analyzed by gene-specific PCR and enzymatic digestion to identify genes carrying mutations.

references:

Gilchrist, E.J. (2006) BMC Genomics 7:262.

Sood, R., et al. (2006) Methods 39:220-227.

Winkler, S. et al. (2005) Genome Res. 15:718-723.


TPF

Tiling Path File. A simple file that lists the order of clones along a chromosome. These files are often used in genome assembly to convey mapping information to the assembly program.


Transgenesis

The introduction of exogenous DNA into a cell. Typically, this term refers to the introduction of a gene into an embryo or other eukaryotic cell. In general, this DNA will insert into the genome at random, although specific loci can be targeted. The introduced DNA can range from small (a few base pairs) to quite large (over 100 kb).

Read more about transgenesis

reference:

Selbert S, Rannie D. Analysis of transgenic mice. Methods Mol Biol 2002;180:305-41.


Whole Genome Assembly Comparison (WGAC):

A computational method that detects identity between long stretches of genomic sequence, revealing regions of segmental duplication. A method complementary to Whole Genome Shotgun Sequence Detection (WSSD).

reference:

Bailey, J.A. et al. (2001) Genome Res. 11:1005-1017.


Whole Genome Shotgun Sequencing (WGS):

Whole Genome Shotgun. A sequencing method in which an entire genome is cut into fragments of discrete sizes (usually 2, 10, 50 and 150 kb) that are cloned into an appropriate vector. The ends of these clones are sequenced; the two end sequences from the same clone are referred to as mate pairs. If the library insert size is known, the distance between the two reads of a mate pair can be inferred and should fall within a narrow window.

references:

Weber JL, Myers EW. Human whole-genome shotgun sequencing. Genome Res 1997; 7(5): 401-409.

Venter JC et al. The sequence of the human genome. Science 2001; 291(5507):1304-51.

Batzoglou S et al. ARACHNE: a whole-genome shotgun assembler. Genome Res 2002; 12(1):177-89.

Mullikin JC, Ning Z. The phusion assembler. Genome Res 2003; 13(1):81-90.


Whole Genome Shotgun Sequence Detection (WSSD):

A computational method for comparing whole genome shotgun (WGS) sequence to a reference genome, commonly used for the detection of segmental duplications. A method complementary to Whole Genome Assembly Comparison (WGAC).

reference:

Bailey, J.A., et al. (2002) Science 297:1003-1007.


YAC

Yeast artificial chromosome. These cloning vectors were developed using yeast centromere and telomere sequences. Insert sizes typically range from 100 to 1,000 kb. These clones can span large portions of the genome but can be highly unstable.

Read more about YACs

reference:

Burke DT et al. Cloning of large segments of exogenous DNA into yeast by means of artificial chromosome vectors. Science 1987; 236(4803):806-12.

Page last updated: January 6, 2008
