Commonly Used Genome Terms

Accession number:

An accession number is a unique identifier given to a sequence when it is submitted to one of the DNA repositories (GenBank, EMBL, DDBJ). The initial deposition of a sequence record is referred to as version 1. If the sequence is updated, the version number is incremented but the accession number will remain constant.
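The accession.version convention described above can be illustrated with a small Python sketch that splits an identifier into its constant accession and its incrementing version. The function name and the example identifiers (AB123456.1, AB123456.2) are hypothetical, and the sketch assumes the accession itself contains no dot.

```python
from typing import Optional, Tuple


def split_accession_version(identifier: str) -> Tuple[str, Optional[int]]:
    """Split an 'ACCESSION.VERSION' identifier into its two parts.

    Assumes the accession itself contains no '.', so the version is the
    integer after the last dot. A bare accession returns version None.
    """
    accession, sep, version = identifier.rpartition(".")
    if not sep:
        return identifier, None
    return accession, int(version)


# Hypothetical identifiers: the accession stays constant while the version increments.
for ident in ["AB123456.1", "AB123456.2"]:
    print(split_accession_version(ident))
```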

aCGH:

Array comparative genomic hybridization. A technique involving the competitive hybridization of “test” and “reference” DNA probes to genomic (or cDNA) clones immobilized on a microarray. Most often used for the detection of copy number variation (CNV), aCGH also has applications in gene annotation and diagnostics.

AGP:

A file that describes how primary sequences can be assembled to make a non-redundant, contiguous sequence. The sequence being assembled may be a contig or a chromosome. For each component sequence, the file records the portion of the component used in the contig and the location of that portion on the contig. For more information about the file specification, see the format definition page.
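As an illustration of how an AGP file ties component sequences to positions on the assembled object, here is a minimal Python parser for component rows. It assumes the tab-delimited, nine-column layout of AGP version 2.0 (object, object_beg, object_end, part_number, component_type, then component_id, component_beg, component_end, orientation for sequence rows); treat that layout as an assumption and check it against the format definition page. The class, function, and identifiers (obj1, COMP_A.1, COMP_B.2) are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AgpComponent:
    object_id: str      # the contig or chromosome being assembled
    object_beg: int     # 1-based start on the assembled object
    object_end: int
    component_id: str   # accession.version of the component sequence
    component_beg: int  # portion of the component that is used
    component_end: int
    orientation: str    # '+' or '-'


def parse_agp_components(text: str) -> List[AgpComponent]:
    """Collect component rows (type 'W') from tab-delimited AGP text.

    Comment lines ('#'), gap rows (e.g. 'N', 'U') and any other row
    types are skipped in this simplified sketch.
    """
    components = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if cols[4] != "W":
            continue
        components.append(AgpComponent(
            object_id=cols[0],
            object_beg=int(cols[1]),
            object_end=int(cols[2]),
            component_id=cols[5],
            component_beg=int(cols[6]),
            component_end=int(cols[7]),
            orientation=cols[8],
        ))
    return components


# Hypothetical object built from two components separated by a 100 bp gap.
example = "\n".join([
    "obj1\t1\t5000\t1\tW\tCOMP_A.1\t1\t5000\t+",
    "obj1\t5001\t5100\t2\tN\t100\tscaffold\tyes\tpaired-ends",
    "obj1\t5101\t8000\t3\tW\tCOMP_B.2\t201\t3100\t-",
])
for comp in parse_agp_components(example):
    print(comp.component_id, comp.component_beg, comp.component_end, comp.orientation)
```

Note that the second component contributes only bases 201-3100, in reverse orientation; this is the "portion of the component sequence used" that the AGP entry records.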

Allelic series:

A collection of distinct mutations that affect a single locus. Often, these different mutations will produce different phenotypes, thus providing a powerful genetic tool for the dissection of gene function.

references:

Vivian JL et al. An allelic series of mutations in Smad2 and Smad4 identified in a genotype-based screen of N-ethyl-N-nitrosourea-mutagenized mouse embryonic stem cells. Proc Natl Acad Sci U S A 2002; 99(24):15542-7.

Steingrimsson E et al. Interallelic complementation at the mouse mitf locus. Genetics 2003; 163(1):267-76.

Annotation:

Adding biological information to genome sequence. This is a very complex task, and the process for doing it is rapidly evolving. Several groups perform automated computational annotation of multiple genomes. Features added to the genome often include gene models, SNPs, and STSs.

Annotation at NCBI

Annotation at Ensembl

Annotation at UCSC

references:

Reese MG et al. Genome annotation assessment in Drosophila melanogaster. Genome Res. 2000; 10(4):483-501.

Hubbard T et al. The Ensembl genome database project. Nucleic Acids Res. 2002; 30(1):38-41.

BAC/PAC:

Bacterial Artificial Chromosome/P1 Artificial Chromosome. Commonly used cloning vectors for the Human Genome Project. These vectors can hold large inserts, typically 80-200 kb, and propagate in E. coli as single-copy episomes.

references:

Osoegawa et al. Bacterial artificial chromosome libraries for mouse sequencing and functional analysis. Genome Res. 2000; 10(1): 116-28.

Osoegawa et al. A bacterial artificial chromosome library for sequencing the complete human genome. Genome Res. 2001; 11(3): 483-96.

BES

BAC end sequence. The ends of BACs are sequenced and the clone association information is retained. In this way, BAC clones whose inserts have not been sequenced can be integrated with other BAC clones or with WGS assemblies.

Human BAC end sequencing info

Mouse BAC end sequencing info

Rat BAC end sequencing info

references:

Mahairas GG et al. Sequence-tagged connectors: a sequence approach to mapping and scanning the human genome. Proc Natl Acad Sci 1999; 96(17): 9739-44.

Zhao S et al. Human BAC ends quality assessment and sequence analyses. Genomics 2000; 63(3): 321-22.

Zhao S et al. Mouse BAC ends quality assessment and sequence analysis. Genome Res 2001; 11(10): 1736-45.

BLAST

Basic Local Alignment Search Tool. A method for performing sequence comparisons. Either protein sequences or nucleotide sequences can be used. This algorithm has been extended and now includes a suite of programs including megaBLAST and discontiguous megaBLAST.

choose a BLAST program

Learn more about similarity searching

references:

Altschul et al. Basic local alignment search tool. J Mol Biol 1990; 215:403-10.

Zhang et al. A greedy algorithm for aligning DNA sequences. J. Comput Biol. 2000; 7(1-2):203-14.

BLAT

A hashing algorithm developed by Jim Kent to allow rapid searching of large amounts of genome sequence. A hashing algorithm divides the database into words of a prescribed size (often 12-14 bases). The locations of these words are stored in memory. The query sequence is scanned for exact matches to words stored in memory. These types of algorithms tend to be very fast and effective for closely related sequences, but fail as sequences diverge. In addition to nucleotide BLAT, translated BLAT allows for comparison of protein sequences. This sequence aligner also allows for accurate alignment of transcribed sequences by looking at splice site information.
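The word-hashing idea can be made concrete with a toy Python sketch; it is not BLAT's actual code. It indexes the database by non-overlapping words and looks up every overlapping word of the query, which is assumed here to match how BLAT arranges its index. The function names are hypothetical, and the word size k = 4 is chosen only for readability (the text cites 12-14 bases).

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_word_index(database: str, k: int) -> Dict[str, List[int]]:
    """Map each non-overlapping k-letter word of the database to its offsets."""
    index: Dict[str, List[int]] = defaultdict(list)
    for start in range(0, len(database) - k + 1, k):
        index[database[start:start + k]].append(start)
    return index


def find_word_hits(query: str, index: Dict[str, List[int]], k: int) -> List[Tuple[int, int]]:
    """Look up every overlapping query word; return exact (query, database) offsets.

    A real aligner would extend or chain these seed hits into alignments.
    """
    hits = []
    for q in range(len(query) - k + 1):
        for d in index.get(query[q:q + k], []):
            hits.append((q, d))
    return hits


database = "ACGTACGGTTCAGGA"   # hypothetical database sequence
index = build_word_index(database, k=4)
print(find_word_hits("GGTTCAG", index, k=4))
```

The single hit pairs query offset 2 with database offset 8, placing the query on the diagonal where database offset minus query offset is 6. The query word GGTT also occurs at database offset 6 but is not found directly, because that position is not a word boundary in the index; scanning every query offset is what recovers the diagonal anyway.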

reference:

Kent WJ. BLAT: the BLAST-like alignment tool. Genome Res 2002; 12(4):656-64.

CAGE:

Cap Analysis Gene Expression. A technique for identifying transcription start sites and quantifying promoter usage in eukaryotic genomes. The method is based on the isolation and concatenation of short sequence tags (~21 bp) from the 5’ ends of individual mRNAs into longer DNA molecules that are subsequently sequenced. Transcription start sites are determined by mapping the tags to a reference genome. See also SAGE (Serial Analysis of Gene Expression).

references:

Kodzius, R., et al. (2006) Nat. Methods 3:211-222.

Shiraki, T., et al. (2003) PNAS 100:15776-15781.

CDS:

Coding sequence. The portion of an mRNA or genomic sequence that encodes a protein sequence.
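To show how a CDS maps to a protein, here is a minimal Python sketch that reads the sequence codon by codon. It assumes the standard genetic code (NCBI translation table 1), an in-frame DNA CDS starting at its first base, and unambiguous A/C/G/T bases (other characters raise a KeyError); alternative start codons and other genetic codes are ignored. The function name and example CDS are hypothetical.

```python
BASES = "TCAG"
# Amino acids for all 64 codons in TCAG order (standard genetic code); '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_cds(cds: str) -> str:
    """Translate an in-frame CDS, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(cds) - len(cds) % 3, 3):
        amino_acid = CODON_TABLE[cds[pos:pos + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate_cds("ATGGCCAAATAA"))  # hypothetical CDS; prints MAK
```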

ChIP:

Chromatin Immunoprecipitation. A method for identifying protein-DNA interactions. Genomic DNA and associated proteins are cross-linked, sheared, and immunoprecipitated with antibodies that recognize specific DNA-binding proteins. The purified DNA fragments are then assayed by various techniques to determine the association of specific sequences with the protein of interest (see below).

ChIP/chip

The hybridization of ChIP-purified DNA to microarrays containing genomic DNA sequences to achieve genome-wide identification of protein-DNA interactions.

references:

Iyer, V.R., et al. (2001) Nature 409:533-538.

Ren, B. et al. (2000) Science 290: 2306-2309.

ChIP/SAGE

The preparation of small tags from ChIP purified DNA and their subsequent SAGE analysis to achieve genome-wide identification of protein-DNA interactions.

reference:

Impey, S., et al. (2004) Cell 119:1041-54.

ChIP/SEQ

A technique involving size selection, high-throughput sequencing (typically using next-generation sequencing technologies that produce millions of reads in a run) and mapping of ChIP-purified DNA onto a reference genome to achieve genome-wide identification of protein-DNA interactions.

references:

Johnson, D.S., et al. (2007) Science 316:1497-1502.

Robertson, G. et al. (2007) Nat Methods 4:651-657.

Chromosomal rearrangement:

These are events that are mediated by double-strand breaks in the genome and their subsequent repair. When these breaks are repaired, the locations of landmarks in the genome have often changed, or the landmarks have been removed completely. There are many different types of rearrangements:

deletion: the removal of a DNA sequence.
insertion: the addition of a DNA sequence.
translocation: the fusion of part of one chromosome to another chromosome.
inversion: an intra-chromosomal event in which two breaks occur on the same chromosome, the segment between them is flipped, and the ends are then repaired.

These events may have no phenotypic consequences, depending upon the amount of DNA involved and the location of the breakpoints. However, there are many well-characterized human syndromes that are associated with these events.

Learn more about rearrangements associated with cancer.

reference:

Inoue K, Lupski JR. Molecular mechanisms for genomic disorders. Annu Rev Genomics Hum Genet 2002; 3:199-242.

Component

A sequence used to construct a larger sequence (a sequence contig or a scaffold). Typically this is a sequence found in GenBank/EMBL/DDBJ, often a clone sequence or a WGS contig but occasionally a PCR product.

Contig

This is short for contiguous sequence. When two sequences overlap at their ends (known as a "dove-tail" overlap), they can be collapsed into a single, non-redundant sequence.
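A dove-tail overlap can be illustrated with a minimal Python sketch that finds the longest suffix of one sequence matching a prefix of another and collapses the pair. It assumes exact matches on a single strand; real assemblers tolerate sequencing errors and consider both orientations. The function name, the minimum-overlap parameter, and the example reads are hypothetical.

```python
from typing import Optional


def merge_dovetail(left: str, right: str, min_overlap: int) -> Optional[str]:
    """Collapse two sequences whose ends overlap exactly into one sequence.

    Finds the longest suffix of `left` that equals a prefix of `right`
    (at least `min_overlap` bases) and returns the merged, non-redundant
    sequence, or None if no such overlap exists.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return left + right[size:]
    return None


print(merge_dovetail("ACGTTGCAAGT", "CAAGTCCGA", min_overlap=4))
```

Here the 5-base overlap CAAGT is kept once, giving ACGTTGCAAGTCCGA.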

references:

Jang W et al (1999) Making effective use of human genomic sequence data. Trends Genet. 15(7): 284-6.

Kent WJ and Haussler D (2001) Assembly of the working draft of the human genome with GigAssembler. Genome Res 11(9): 1541-8.

Agarwala R. Chapter 9 in: Aluru S (editor), Handbook of Computational Molecular Biology. Chapman & Hall/CRC Computer & Information Science Series, Volume 9.

Copy Number Variation (CNV):

Large-scale structural changes in DNA that vary from individual to individual. These include insertions, deletions, duplications and complex multi-site variants that range from kilobases to megabases in size. CNVs can alter gene dosage and influence gene expression and phenotypic variation; in certain instances they may be associated with developmental disorders, cause disease, or confer susceptibility to complex disease traits.

references:

Egan, C.M., et al. (2007) Nat. Genet. 39:1384-1389.

Korbel, J.O., et al. (2007) Science 318:420-426.

Wong, K.K., et al. (2007) Am. J. Hum. Genet. 80:91-104.

Redon, R., et al. (2006) Nature 444:444-454.

Sharp, A.J., et al. (2005) Am. J. Hum. Genet. 77:78-88.

Tuzun, E., et al. (2005) Nat. Genet. 37:727-732.

Iafrate A.J., et al. (2004) Nat. Genet. 36:949-951.

Sebat, J., et al. (2004) Science 305:525-528.

Cosmid

Cloning vector that typically carries inserts of about 35-45 kb. These vectors are hybrids of lambda phage and plasmids and can be propagated as plasmids or packaged like phage. The name comes from the fact that these vectors retain the lambda cos sites, which are required for packaging DNA into phage heads. They are generally maintained in multiple copies in E. coli.

references:

Evans GA et al. High efficiency vectors for cosmid microcloning and genomic analysis. Gene 1989; 79(1):9-20.

Coulson A et al. The physical map of the Caenorhabditis elegans genome. Methods Cell Biol 1995; 48:533-50.

Draft sequence

This term has had several definitions, but it generally refers to sequence that is not yet finished but is of generally high quality. For clone-based projects, draft sequence refers to a project in which more than 90% of the bases are of high quality. A draft clone record will therefore typically consist of several fragments separated by runs of Ns, and the order and orientation of these fragments are often unknown. However, these sequences, in conjunction with other data, are a useful substrate for genome assembly and annotation. See HTGS for additional information.

references:

Collins FS et al. New goals for the U.S. Human Genome Project: 1998-2003. Science 1998; 282(5389):682-9.

International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 2001; 409(6822):860-921.

End Sequence Profiling (ESP):

A method for detecting genome-level variation. End sequences of clones from a library of interest are mapped onto a reference genome (in silico). Analysis of end sequence density and end sequence pair plots can reveal regions containing translocations, inversions, deletions/insertions and other complex structures. See also Paired End Mapping (PEM).

references:

Tuzun, E., et al. (2005) Nat. Genet. 37:727-732.

Volik, S. et al. (2006) Genome Res. 16:394-404.

Volik, S. et al. (2003) PNAS 100:7696-7701.

e-PCR

Electronic PCR. A program that searches a given sequence for the presence of primer pairs. To define a match, the primers must be in the proper orientation and a specified distance apart. There are currently two versions of this program. Forward e-PCR takes a sequence as a query and searches a database of sequence-tagged sites (STSs), while reverse e-PCR uses an STS to search sequence for the presence of its primers in the correct orientation at the specified distance.
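The matching rule described above (primers facing each other, a set distance apart) can be sketched in Python as follows. This is not the e-PCR program itself: it accepts only exact primer matches, searches only the given strand, and the function names, primers, and tolerance parameter are hypothetical.

```python
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_pcr_products(sequence: str, fwd_primer: str, rev_primer: str,
                      expected_size: int, tolerance: int) -> List[Tuple[int, int]]:
    """Return (start, product_size) for each primer pair in the proper orientation.

    The forward primer must occur on the given strand, and the reverse
    primer must occur downstream as its reverse complement, so that the
    two primers face each other. Exact matches only; the opposite strand
    is not searched in this sketch.
    """
    rev_site = reverse_complement(rev_primer)
    products = []
    start = sequence.find(fwd_primer)
    while start != -1:
        end = sequence.find(rev_site, start + len(fwd_primer))
        while end != -1:
            size = end + len(rev_site) - start
            if size > expected_size + tolerance:
                break  # any later reverse site would give an even larger product
            if size >= expected_size - tolerance:
                products.append((start, size))
            end = sequence.find(rev_site, end + 1)
        start = sequence.find(fwd_primer, start + 1)
    return products


# Hypothetical STS: primers ACGTAC / CAAGGT with an expected 20 bp product.
sequence = "GG" + "ACGTAC" + "T" * 8 + "ACCTTG" + "CC"
print(find_pcr_products(sequence, "ACGTAC", "CAAGGT", expected_size=20, tolerance=2))
```

The single product starts at offset 2 and is 20 bp long; asking for a 50 bp product on the same sequence returns no match, because the primers are present but the wrong distance apart.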

reference:

Schuler GD. Electronic PCR: bridging the gap between genome mapping and genome sequencing. Trends Biotechnol 1998; 16(11):456-9.

EST

Expressed sequence tag. These are single-pass sequences of cDNA clones. Databases of EST sequences are highly redundant but quite useful for gene identification. There are many efforts to cluster EST sequences to remove the redundancy and low-quality sequences.

EST clusters in UniGene

Gene indexes at Harvard

EST clusters at Allgenes

references:

Adams MD et al. Complementary DNA sequencing: expressed sequence tags and human genome project. Science 1991; 252(5013):1651-6.

Marra M et al. An encyclopedia of mouse genes. Nat Genet 1999; 21(2):191-4.

ExoFish

A technique that uses Whole Genome Shotgun (WGS) reads from the pufferfish, Tetraodon nigroviridis, to identify potential coding sequences in mammalian genomes based on homology. This technique was first used to annotate the human genome.

reference:

Roest Crollius et al. Estimate of human gene number provided by genome-wide analysis using Tetraodon nigroviridis DNA sequence. Nat Genet 2000; 25(2):235-8.

Fingerprint

The pattern of bands produced when a clone is digested with a particular restriction enzyme, such as HindIII. Clones that overlap will have fingerprint bands in common; the more bands in common, the greater the degree of overlap.
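Comparing fingerprints comes down to counting bands that two clones share within some sizing tolerance. The Python sketch below does only that counting. It is not FPC's Sulston score, which estimates the probability that the shared bands arose by chance. Band sizes are assumed to be in bp with a tolerance in bp (FPC itself works in gel migration units); the function name and band lists are hypothetical.

```python
from typing import List


def shared_band_count(bands_a: List[float], bands_b: List[float], tolerance: float) -> int:
    """Count bands in clone A that pair with a distinct band in clone B.

    Two bands "match" when their sizes differ by at most `tolerance`.
    Each band in B is used at most once (simple greedy pairing).
    """
    remaining = sorted(bands_b)
    shared = 0
    for band in sorted(bands_a):
        for i, other in enumerate(remaining):
            if abs(band - other) <= tolerance:
                shared += 1
                del remaining[i]
                break
    return shared


clone_a = [1200, 2500, 3100, 4800, 6000]   # hypothetical HindIII bands (bp)
clone_b = [1210, 2490, 4000, 6020, 7500]
print(shared_band_count(clone_a, clone_b, tolerance=20))  # 3 shared bands
```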

A BAC fingerprint map of the Mouse Genome

Human BAC map information

references:

Marra M et al. High throughput fingerprint analysis of large-insert clones. Genome Res 1997; 7(11):1072-84.

Marra M et al. A map for sequence analysis of the Arabidopsis thaliana genome. Nat Genet 1999; 22(3):265-70.

McPherson JD et al. A physical map of the human genome. Nature 2001; 409(6822):934-41.

Soderlund C et al. Contigs built with fingerprints, markers and FPC v4.7. Genome Res 2000; 10:1772-1787.

Soderlund C et al. FPC: a system for building contigs from restriction fingerprinted clones. CABIOS 1997; 13: 523-535.

Finished Sequence

A clone insert that has been sequenced with an error rate of <0.01%. These sequence records generally have no gaps.
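To put the <0.01% standard in concrete terms, consider a hypothetical 150 kb BAC insert (a size within the 80-200 kb range given under BAC/PAC). The arithmetic below only illustrates the threshold and is not drawn from any particular project:

```latex
0.01\% = \frac{0.01}{100} = \frac{1}{10\,000},
\qquad
150\,000~\text{bp} \times \frac{1}{10\,000} = 15~\text{errors}
```

A finished 150 kb insert would therefore be expected to contain fewer than 15 base errors.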

references:

Collins FS et al. New goals for the US Human Genome Project: 1998-2003. Science 1998; 282(5389):682-9.

International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 2001; 409(6822):860-921.

FISH:

Fluorescent in situ hybridization. Genomic clones are fluorescently labeled and hybridized to chromosome spreads. In this way a clone can be mapped to a discrete cytogenetic band. If the clone has sequence associated with it, this information can be used to integrate sequence with cytogenetic information.

reference:

Dyer SA and Green EK. Fluorescent in situ hybridization. Methods Mol Biol 2002; 187:73-86.

Fosmid:

A cloning system based on the E. coli F factor. These clones have an average insert size of 40 kb, with a very small standard deviation.

reference:

Birren BW et al. A human chromosome 22 fosmid resource: mapping and analysis of 96 clones. Genomics 1996; 34(1):97-106.

Gap:

A region of the genome for which no sequence is currently available. There are two types of gaps: heterochromatic and euchromatic. Heterochromatic gaps consist largely of highly repetitive sequence, while euchromatic gaps are more likely to contain genes. Gaps may occur both within and between genomic scaffolds.

references:

Bovee, D., et al. (2008) Nat. Genet. 40:96-101.

Eichler, E.E., et al. (2004) Nat. Rev. Genet. 5:345-354.

Gene targeting:

This is a specific type of transgenesis that targets a particular gene. If a mutated copy of a gene is electroporated into a cell, the introduced DNA can pair with the endogenous copy of the gene, and homologous recombination occurs at some frequency (1-25%). If this event occurs in embryonic stem cells, cells carrying the new copy of the gene can be used to generate embryos that can be assessed for the phenotypic consequences of the mutation. This technique is frequently used in mice to study loss-of-function mutations.

references:

Thomas et al. High frequency targeting of genes to specific sites in the mammalian genome. Cell 1986; 44(3):419-28.

van der Weyden L, et al. Tools for targeted manipulation of the mouse genome. Physiol Genomics 2002; 11(3):133-64.

Gene trapping

This strategy uses transgenesis to introduce DNA carrying a reporter gene (lacZ or GFP) flanked by various genomic signals (splice donor or acceptor sites, promoters, etc.). Expression of the reporter gene indicates that the DNA has integrated into a region of the genome containing a gene. The gene that has been trapped can be recovered using the DNA sequences associated with the reporter construct. Often, the introduction of the gene trapping vector inactivates the gene into which it was introduced.

Go to the Gene Trap web page

NIH Mouse Knockout project (KOMP)

North American Conditional Mouse Mutagenesis project (NORCOMM)

European Union Conditional Mouse Mutagenesis project (EUCOMM)

references:

Gossler et al. Mouse embryonic stem cells and reporter constructs to detect developmentally regulated genes. Science 1989; 244(4903):463-5.

Stanford et al. Gene-trap mutagenesis: past, present and beyond. Nat Rev Genet 2001; 2(10):756-68.

Haplotype (haploid genotype):

A set of closely linked genetic markers present on one chromosome that tend to be inherited together. A haplotype may also refer to a set of single nucleotide polymorphisms (SNPs) on a single chromatid that are statistically associated with one another.

For more information on haplotypes, visit the HapMap page.

HTGS

High Throughput Genome Sequence. This term distinguishes all genomic sequence generated in a high-throughput manner. In order to release data more rapidly, it became standard for all sequence centers to submit unfinished sequence into public repositories (the "Bermuda Rules"). This sequence is deposited into the HTG division of GenBank/EMBL/DDBJ. In general, these terms are used to describe clone-based (BAC/PAC/fosmid) projects.

keywords associated with HTGS:

HTGS_phase0: A project that has very light coverage, generally 1-2 fold coverage of the clone. This initial light coverage is produced to ensure that the clone is not redundant to other sequence.
HTGS_phase1: An unfinished project, usually representing 3-6 fold coverage of the clone. The fragments within the clone are not ordered or oriented with respect to each other.
HTGS_phase2: An unfinished project, usually representing 5-10 fold coverage of the clone. The fragments within the clone are ordered and oriented.
HTGS_phase3: A finished project. A single fragment of very high quality.
HTGS_draft: A draft project is either a phase 1 or phase 2 project that has exceeded a specified quality standard. Generally, this translates to 3-4 fold sequence coverage of the BAC clone in high-quality bases.
HTGS_fulltop: Added to a record when the center responsible for finishing the clone has added sufficient new shotgun coverage for their finishing process to begin.
HTGS_activefin: Added when the center responsible for finishing actually begins the process of finishing the sequence.
HTGS_cancelled: Added to clones that will never be finished.

reference:

Marshall E. Bermuda rules: community spirit, with teeth. Science 2001; 291(5507):1192.

Linkage mapping

This type of mapping measures meiotic recombination between polymorphic markers to determine the relative order of the markers. Distance between markers is measured in centiMorgans (cM); one centiMorgan corresponds to a 1% frequency of recombination (crossing over) between two markers per meiosis.
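Under the rule above (1 cM corresponds to 1% recombination), a map distance can be estimated directly from the fraction of recombinant meioses. The Python sketch below uses a hypothetical function name and hypothetical counts. The simple proportionality holds only for closely linked markers, because multiple crossovers between distant markers can go undetected.

```python
def map_distance_cm(recombinant: int, total_meioses: int) -> float:
    """Genetic distance in cM from the observed recombination frequency.

    Applies 1% recombination = 1 cM directly, which is only a good
    approximation for closely linked markers.
    """
    if total_meioses <= 0:
        raise ValueError("need at least one scored meiosis")
    return 100.0 * recombinant / total_meioses


print(map_distance_cm(recombinant=6, total_meioses=200))  # hypothetical counts; prints 3.0
```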

references:

Donis-Keller H, et al. A genetic linkage map of the human genome. Cell 1987; 51(2):319-37.

Sturtevant AH. The linear arrangement of six sex-linked factors in Drosophila, as shown by their mode of association. J Exp Zool 1913; 14:43-59. (see a reprint (pdf) of this paper)

Library

A library is a collection of DNA fragments that have been cloned into a vector that can propagate in E. coli. Once the DNA fragments are ligated into the vector, the recombinant vectors are transformed into a strain of E. coli suitable for propagation. To access individual clones, the transformed E. coli are grown on plates at a density low enough that individual colonies can be distinguished; each colony should contain only one DNA fragment. Typically, individual colonies are arrayed into microtiter plates (either 96 or 384 wells), which allows for both indexing and easy storage of the bacterial colonies.

Mate pair

Sequences obtained from opposite ends of a particular clone are referred to as mate pairs. Knowing that two sequences are derived from the same clone allows them to be linked, even if the full insert sequence of the clone is unavailable. This is key to WGS assemblies.

references:

Weber JL, Myers EW. Human whole-genome shotgun sequencing. Genome Res 1997; 7(5):401-409.

Venter JC et al. The sequence of the human genome. Science 2001; 291(5507):1304-51.

Batzoglou S et al. ARACHNE: a whole-genome shotgun assembler. Genome Res 2002; 12(1):177-89.

Mullikin JC, Ning Z. The phusion assembler. Genome Res 2003; 13(1):81-90.

N50

The contig or scaffold length L such that at least half of the bases in a given assembly reside in contigs (or scaffolds) of length L or longer. This provides a measure of continuity. For instance, a scaffold N50 of 15 Mb means that at least half of the bases in the assembly are in scaffolds that are at least 15 Mb long.
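Following the definition above, N50 can be computed by sorting lengths from longest to shortest and accumulating until at least half of the total is reached. A minimal Python sketch; the function name and the example lengths are hypothetical.

```python
from typing import List


def n50(lengths: List[int]) -> int:
    """Largest length L such that contigs/scaffolds of length >= L hold
    at least half of all assembled bases."""
    if not lengths:
        raise ValueError("empty assembly")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the loop always reaches the full total")


print(n50([2, 3, 4, 5, 6]))  # hypothetical scaffold lengths in Mb; prints 5
```

Here the total is 20 Mb; the 6 Mb scaffold alone holds less than half, while the 6 and 5 Mb scaffolds together hold 11 Mb, so the N50 is 5 Mb.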

Mutation

A sequence variation that deviates from the reference, or "wild type", sequence. This variation can be a SNP, an insertion of sequence, or a deletion of sequence. There can be a great deal of sequence variation between individuals in a population; for example, two humans may differ at as many as 1 base in every 1,000 bp. In practice, mutations are distinguished from other variation by their phenotypic consequences. Mutations in the Pax6 gene that cause a loss of the function of that gene lead to the eyeless mutation in flies, the Small eye mutation in mice, and aniridia in humans.

reference:

Gehring WJ. The master control gene for morphogenesis and evolution of the eye. Genes Cells 1996; 1(1):11-5.

Optical Mapping:

A light-microscope-based technique in which images of single DNA molecules undergoing restriction enzyme digestion are recorded and used to construct physical maps of large pieces of DNA. Optical maps often serve as scaffolds for the precise alignment of sequence contigs generated during genome sequencing projects.

references:

Li, H., et al. (2007) J. Comput. Biol. 14:255-266.

Aston, C., et al. (1999) Trends Biotechnol. 17:297-302.

Schwartz D.C., et al. (1993) Science 262:110-114.

Paired End Mapping (PEM):

A method for detecting genome-level variation. Paired ends from size-selected, sheared genomic DNA fragments are subjected to high-throughput sequencing and then mapped onto a reference genome (in silico). Analysis of paired end spans can reveal regions containing translocations, inversions, deletions/insertions and other complex structures. See also End Sequence Profiling (ESP).
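The span analysis described above can be sketched as a per-pair classification in Python. This is a simplified illustration, not the published PEM pipeline. It assumes a library whose reads normally map facing each other ('+' read on the left, '-' read on the right) about a known span apart, and it measures span between leftmost read coordinates. The data class, function name, and thresholds are hypothetical. Real analyses call a variant only when several independent pairs support it.

```python
from dataclasses import dataclass


@dataclass
class MappedPair:
    """Placement of both reads of one paired end on the reference (hypothetical type)."""
    chrom1: str
    pos1: int     # leftmost reference coordinate of read 1
    strand1: str  # '+' or '-'
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(pair: MappedPair, expected_span: int, tolerance: int) -> str:
    """Crude classification of one paired end against the expected library span."""
    if pair.chrom1 != pair.chrom2:
        return "translocation"
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if left_strand == right_strand:
        return "inversion"   # one read falls inside a flipped segment
    if left_strand == "-":
        return "other"       # reads face away from each other
    span = right_pos - left_pos
    if span > expected_span + tolerance:
        return "deletion"    # the sample lacks sequence present in the reference
    if span < expected_span - tolerance:
        return "insertion"   # the sample carries extra sequence
    return "concordant"


pair = MappedPair("chr1", 1000, "+", "chr1", 4000, "-")
print(classify_pair(pair, expected_span=500, tolerance=100))  # prints deletion
```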

reference:

Korbel, J.O., et al. (2007) Science 318:420-426.

Phenotype

An observable characteristic displayed by an organism. These characteristics can be controlled by genes, by the environment, or a combination of both. The characteristic can be directly observable, such as having brown eyes. In some cases, the phenotype will be measurable, such as having high blood pressure.

OMIM is an on-line catalog of human phenotypes

Positional cloning

Identification of a gene based on its physical location in the genome. Often, an individual has a phenotype, but the gene underlying this phenotype is unknown. Using linkage mapping, the phenotype can be assigned a position in the genome. Once a phenotype has been localized, overlapping sets of clones (for example, BACs) that cover the region are identified. Genes within the region are identified and compared to DNA from individuals that display the phenotype until the underlying mutation is identified.

references:

Kerem B, et al. Identification of the cystic fibrosis gene: genetic analysis. Science 1989; 245(4922):1073-80.

RefSeq

Reference Sequence. The goal of the RefSeq project is to produce a reference sequence for each naturally occurring molecule of the central dogma (DNA, RNA, and protein).

references:

Pruitt KD, Maglott DR. RefSeq and LocusLink: NCBI gene-centered resources. Nucleic Acids Res 2001; 29(1):137-140.

Pruitt et al. (2007) Nucleic Acids Research 35:D61-D65.

RFLP

Restriction fragment length polymorphism. A type of polymorphism detectable in a genome by the size differences in DNA fragments generated by restriction enzyme analysis.
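How a single-base change produces an RFLP can be shown with a minimal Python digest. EcoRI is used as an assumed example enzyme: it recognizes GAATTC and cuts between the G and the first A (G^AATTC). The sketch reports top-strand fragment lengths only and ignores partial digestion. The function name and the two alleles are hypothetical.

```python
from typing import List


def digest_fragment_lengths(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> List[int]:
    """Top-strand fragment lengths after cutting `seq` at every copy of `site`.

    Defaults model EcoRI (G^AATTC); partial digestion is ignored.
    """
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Two hypothetical alleles; in allele_2 a single-base change (GAATTC -> GAGTTC)
# destroys the second EcoRI site.
allele_1 = "A" * 10 + "GAATTC" + "A" * 10 + "GAATTC" + "A" * 10
allele_2 = "A" * 10 + "GAATTC" + "A" * 10 + "GAGTTC" + "A" * 10
print(digest_fragment_lengths(allele_1))  # [11, 16, 15]
print(digest_fragment_lengths(allele_2))  # [11, 31]
```

On a gel, the two alleles would give different band patterns (16 + 15 versus 31), which is the size difference the RFLP detects.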

references:

Botstein D, et al. Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 1980; 32(3):314-31.

Donis-Keller H, et al. A genetic linkage map of the human genome. Cell 1987; 51(2):319-37.

RH mapping

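Radiation Hybrid mapping. A physical mapping method that estimates the order of and distance between markers from how often they are separated by radiation-induced chromosome breaks. This is analogous to genetic (linkage) mapping, with radiation-induced breaks taking the place of meiotic recombination.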

references:

Goss SJ, Harris H. New method for mapping genes in human chromosomes. Nature 1975; 255(5511):680-4.

Cox et al. Radiation hybrid mapping: a somatic cell genetic method for constructing high-resolution maps of mammalian chromosomes. Science 1990; 250(4978):245-50.

Hudson TJ, et al. A radiation hybrid map of mouse genes. Nat Genet 2001; 29(2):201-5.

SAGE (Serial Analysis of Gene Expression):

A technique for identifying and quantifying transcripts from eukaryotic genomes. This method is based on the isolation and concatenation of short sequence tags (~14 bp) from individual mRNAs into longer DNA molecules that are subsequently sequenced. A tag’s gene of origin is determined by mapping the tag to a reference genome. See also CAGE (Cap Analysis Gene Expression).

references:

Wang, S.M., et al. (2006) Trends Genet. 23:42-50.

Yamamoto, M., et al. (2001) J. Immunol. Methods 250:45-66.

Velculescu, V.E., et al. (1995) Science 270:484-487.

Scaffold

(see supercontig)

Segmental Duplication:

A region of genomic DNA ranging from 1 to 400 kb in size that occurs at more than one site in the genome. Segmental duplications often share >90% sequence identity. See also Copy Number Variation (CNV).

references:

Sharp, A.J., et al. (2005) Am. J. Hum. Genet. 77:78-88.

Bailey, J.A., et al. (2002) Science 297:1003-1007.

Bailey, J.A. et al. (2001) Genome Res. 11:1005-1017.

SNP

Single Nucleotide Polymorphism. A single base difference found when comparing the same DNA sequence from two different individuals.
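For two copies of the same region that are already aligned base for base, the single-base differences can be listed directly. This Python sketch assumes equal-length, pre-aligned sequences (insertions and deletions are not handled) and reports 0-based positions. The function name and example sequences are hypothetical.

```python
from typing import List, Tuple


def find_snps(seq_a: str, seq_b: str) -> List[Tuple[int, str, str]]:
    """List (position, base_a, base_b) wherever two aligned sequences differ.

    Positions are 0-based; gaps and indels are not handled.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must already be aligned to equal length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


print(find_snps("ACGTTAGC", "ACGTCAGC"))  # [(4, 'T', 'C')]
```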

reference:

Weiss KM. In search of human variation. Genome Res 1998; 8(7): 691-7.

SSAHA

A hashing algorithm developed for rapid searching of large amounts of genome sequence. This program is similar to BLAT but does not use splice information to align mRNA sequences, nor can it perform translated searches.

reference:

Ning et al. SSAHA: a fast search method for large DNA databases. Genome Res 2001; 11(10):1725-9.

SSLP

Simple sequence length polymorphisms. Common examples of these in mammalian genomes include runs of dinucleotide or trinucleotide repeats (CACACACACACACACACA).
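Runs such as the (CA)n example above can be located with a regular-expression backreference. In this Python sketch, the definition of a run (a 2- or 3-base unit repeated at least five times in tandem, with homopolymers excluded) is an assumption chosen for illustration, as are the function name and the example marker.

```python
import re
from typing import List, Tuple


def find_repeat_runs(seq: str, min_copies: int = 5) -> List[Tuple[int, str, int]]:
    """Return (start, repeat_unit, copy_number) for di- and trinucleotide runs.

    A run is a 2- or 3-base unit repeated at least `min_copies` times in
    tandem. Homopolymer runs (e.g. AAAA...) are skipped.
    """
    # Lazy {2,3}? tries a dinucleotide unit before a trinucleotide one;
    # \1 requires further exact copies of the captured unit.
    pattern = re.compile(r"([ACGT]{2,3}?)\1{%d,}" % (min_copies - 1))
    runs = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        if len(set(unit)) == 1:
            continue  # homopolymer, not a di/trinucleotide repeat
        runs.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return runs


# Hypothetical marker: a (CA)9 run followed by a (CAG)5 run.
marker = "GGT" + "CA" * 9 + "TTA" + "CAG" * 5 + "A"
print(find_repeat_runs(marker))  # [(3, 'CA', 9), (24, 'CAG', 5)]
```

Because alleles of an SSLP differ in copy number, comparing the copy counts reported for the same locus in two individuals is the sequence-level equivalent of comparing PCR product lengths.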

reference:

Weissenbach J, et al. A second-generation linkage map of the human genome. Nature 1992; 359(6398):777-8.

Stem Cells

Most cells of the adult body are terminally differentiated; that is, they are no longer able to replace themselves or to become another cell type. Stem cells are undifferentiated cells that are able both to proliferate and to differentiate into numerous cell types. For example, embryonic stem cells are able to differentiate into any cell type found within the embryo, whereas hematopoietic stem cells can differentiate into any blood cell.

reference:

Rossant J. Stem cells from the mammalian blastocyst. Stem Cells 2001; 19(6):477-82.

STS

Sequence-Tagged Site. Short sequences (generally 200-500 bp) are defined at sites throughout a genome, and oligonucleotide primers are designed so that each sequence can be amplified using PCR to produce a discrete band when analyzed by electrophoresis. STS markers can be polymorphic or monomorphic. They are critical to integrating non-sequence-based maps (such as genetic or RH maps) with sequence-based maps.

references:

Green ED, Green P. Sequence-tagged site (STS) content mapping of human chromosomes: theoretical considerations and early experiences. PCR Methods Appl 1991; 1(2):77-90.

Hudson TJ et al. An STS-based map of the human genome. Science 1995; 270(5244):1919-20.

Supercontig (Scaffold):

A supercontig is formed when an association can be made between two contigs that have no sequence overlap. This commonly uses information obtained from paired clone ends (for example, from plasmids or BACs). For example, both ends of a BAC clone are sequenced; it can be inferred that these two sequences are approximately 150-200 kb apart (based on the average size of a BAC). If the sequence from one end is found in one sequence contig and the sequence from the other end is found in a different contig, the two contigs are said to be linked. In general, it is useful to have end sequences from more than one clone to provide evidence for linkage.
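The linking rule described above, including the preference for support from more than one clone, can be sketched in Python. The sketch only counts supporting clones per contig pair; it ignores end orientation and the inferred distance that a real scaffolder would also check. The function name, the `min_support` threshold of two clones, and the clone placements are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def link_contigs(end_hits: Dict[str, Tuple[str, str]],
                 min_support: int = 2) -> List[Tuple[str, str, int]]:
    """Infer contig-contig links from clone-end placements.

    `end_hits` maps a clone name to the contigs containing its two end
    sequences. A link is reported only when at least `min_support`
    independent clones connect the same pair of contigs.
    """
    support: Counter = Counter()
    for contig_a, contig_b in end_hits.values():
        if contig_a != contig_b:  # both ends in one contig give no linkage
            support[tuple(sorted((contig_a, contig_b)))] += 1
    return [(a, b, n) for (a, b), n in sorted(support.items()) if n >= min_support]


# Hypothetical BAC end placements.
placements = {
    "bac1": ("ctg1", "ctg2"),
    "bac2": ("ctg2", "ctg1"),
    "bac3": ("ctg2", "ctg3"),
    "bac4": ("ctg3", "ctg3"),
}
print(link_contigs(placements))  # [('ctg1', 'ctg2', 2)]
```

Only ctg1 and ctg2 are joined into a supercontig; the ctg2-ctg3 link rests on a single clone and is held back until more evidence is available.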

TILLING (Targeting Induced Local Lesions in Genomes):

A reverse genetics technique that permits the directed identification of mutations in genes of interest. A chemical mutagen is used to induce lesions in an individual’s gametes, from which DNA is recovered and subsequently analyzed by gene-specific PCR and enzyme digestion to identify genes with mutations.

references:

Gilchrist, E.J. (2006) BMC Genomics 7:262.

Sood, R., et al. (2006) Methods 39:220-227.

Winkler, S. et al. (2005) Genome Res. 15:718-723.

TPF

Tiling Path File. A simple file that lists the order of clones along a chromosome. These files are often used in genome assembly to convey mapping information to the assembly program.

Transgenesis

The introduction of exogenous DNA into a cell. Typically, this term refers to the introduction of a gene into an embryo or other eukaryotic cell. In general, this DNA will insert into the genome at random, although specific loci can be targeted. The introduced DNA molecule can range from small (a few base pairs) to quite large (over 100 kb).

reference:

Selbert S, Rannie D. Analysis of transgenic mice. Methods Mol Biol 2002;180:305-41.

Whole Genome Assembly Comparison (WGAC):

A computational method that detects identity between long stretches of genomic sequence, revealing regions of segmental duplication. A method complementary to Whole Genome Shotgun Sequence Detection (WSSD).

reference:

Bailey, J.A. et al. (2001) Genome Res. 11:1005-1017.

Whole Genome Shotgun Sequencing (WGS):

Whole Genome Shotgun. A sequencing method in which an entire genome is broken into fragments of discrete sizes (usually 2, 10, 50 and 150 kb) that are cloned into an appropriate vector. The ends of these clones are sequenced. The two end sequences from the same clone are referred to as mate pairs. The distance between the two members of a mate pair can be inferred if the library insert size is known, and this distance should fall within a narrow window.

references:

Weber JL, Myers EW. Human whole-genome shotgun sequencing. Genome Res 1997; 7(5): 401-409.

Venter JC et al. The sequence of the human genome. Science 2001; 291(5507):1304-51.

Batzoglou S et al. ARACHNE: a whole-genome shotgun assembler. Genome Res 2002; 12(1):177-89.

Mullikin JC, Ning Z. The phusion assembler. Genome Res 2003; 13(1):81-90.

Whole Genome Shotgun Sequence Detection (WSSD):

A computational method for comparing whole genome shotgun (WGS) sequence to a reference genome, commonly used for the detection of segmental duplications. A method complementary to Whole Genome Assembly Comparison (WGAC).

reference:

Bailey, J.A., et al. (2002) Science 297:1003-1007.

YAC

Yeast artificial chromosome. These cloning vectors were developed using yeast centromere and telomere sequences. The average insert size of these clones ranges from 100 to 1000 kb. These clones can span large portions of the genome but can be highly unstable.

reference:

Burke DT et al. Cloning of large segments of exogenous DNA into yeast by means of artificial chromosome vectors. Science 1987; 236(4803):806-12.

Page last updated: January 6, 2008