Interpretation of VecScreen Results

The VecScreen output lists all segments of the query sequence that closely match any of the sequences in the UniVec database. The origin of such segments should be questioned because they are likely to have been derived from foreign DNA. This document provides guidance for evaluating the significance of matches reported by VecScreen and also for decontaminating the query by removing the foreign sequences.

Contents

  1. Definitions
  2. Judging the Significance of a Match

    • Strong and Moderate Matches to Vector
    • Weak Matches to Vector
    • Internal Matches
    • mRNA 3' End Sequences
    • Corroborating Evidence of Contamination
    • When VecScreen May Underestimate the Extent or Significance of Contamination
  3. Identifying the Foreign Sequence

  4. Identifying the Boundaries of the Foreign Sequence
  5. Decontaminating the Query Sequence

    • Removing Terminal Segments of Foreign Sequence
    • Removing Internal Segments of Foreign Sequence
    • Final Check
  6. Exceptions

  7. How to Relate a UniVec Database Segment to Its Parent Sequence

Definitions

Source DNA/RNA
Nucleic acid, e.g. genomic DNA or mRNA, from the biological source organism being studied.

Native sequence
Sequence representing the genetic information from the biological source organism.

Foreign DNA
Any DNA that does not originate from the biological source organism.

Foreign sequence
Sequence derived from foreign DNA.

Contamination
A query sequence containing one or more segments of foreign sequence is said to be contaminated because the foreign sequence does not represent the genetic information from the biological source organism.

Judging the Significance of a Match

Strong and Moderate Matches to Vector

By definition, strong matches very rarely occur by chance, and moderate matches rarely occur by chance. Consequently, strong and moderate matches usually indicate that the segment originated from foreign DNA (vector, adapter, linker, or primer) that was attached to the source DNA/RNA during the cloning process. The occasional moderate match that occurs by chance will lack any corroborating evidence of contamination. Sometimes, however, there is a valid reason why part of the query sequence should match a sequence in the UniVec database (see Exceptions).

Weak Matches to Vector

Weak matches identify sequence segments that are potentially of foreign origin. Although weak matches often occur by chance, they indicate foreign sequence whenever there is corroborating evidence of contamination.

Internal Matches

Adapters, linkers, PCR primers, and vectors are all attached to the ends of the source DNA/RNA. Foreign sequences are therefore much more commonly found near the ends of a query. Occasionally, however, foreign DNA segments can be found in the middle of a query sequence. This can happen when a chimeric insert (an insert assembled from several separate pieces of DNA) is sequenced or when multiple contaminated sequences are assembled into a longer sequence. Internal matches should therefore be checked for the presence of cloning sites or other corroborating evidence that would support the conclusion that the segment is foreign. If no corroborating evidence is found, or if a large portion of the query sequence matches vector, or if the matching segment lies in an open reading frame or other critical region, see Exceptions for possible alternative explanations for the match.

mRNA 3' End Sequences

Because mRNAs end with a polyA tail, any sequence following a stretch of polyA in an mRNA (cDNA) sequence almost certainly originates from foreign DNA added during the cloning process.
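As a rough illustration of this rule, the sketch below truncates an mRNA (cDNA) sequence at the end of its last polyA run, keeping the polyA itself. The function name and the `min_run` threshold are assumptions made for this example; they are not VecScreen parameters, and the sequence is assumed to be in the sense orientation.

```python
import re


def trim_after_polya(seq, min_run=15):
    """Return seq truncated at the end of its last polyA run.

    min_run is an illustrative threshold (an assumption), not a VecScreen
    setting. If no run of at least min_run A's is found, seq is returned
    unchanged.
    """
    runs = list(re.finditer(r"A{%d,}" % min_run, seq.upper()))
    if not runs:
        return seq
    # Keep everything up to and including the polyA run; drop what follows.
    return seq[:runs[-1].end()]
```

Any sequence removed this way should still be reviewed by eye, since a long A-rich stretch inside the native sequence would also satisfy this simple pattern.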

Corroborating Evidence of Contamination

Additional signs that a segment of DNA is foreign include:

  1. The sequence of the segment is that of a vector, adapter, linker or primer that was used to clone the source DNA/RNA

    • The presence of sequence matching a foreign DNA used anytime in the cloning history of the source DNA/RNA is extremely strong evidence that the segment has a foreign origin. (See the section on Identifying the Foreign Sequence.)
  2. Cloning site(s) within or near the match

    • The presence of cloning sites, especially of specific site(s) known to have been used in cloning the source DNA/RNA, is strong evidence that the matching segment comes from a vector, linker, adapter, or primer. The location of potential cloning sites can be found by a restriction site analysis of the query sequence.
  3. The UniVec sequence matching the query segment includes the cloning site or adjacent sequence of a vector

    • Even short matches to a multiple cloning site (MCS), or to the sequence flanking a cloning site of a vector, are strong evidence that the segment has a foreign origin. To determine whether a UniVec segment includes the cloning site region, the base coordinates of the UniVec segment must be related to the location of the cloning site(s) as shown on a vector map, or as listed in the annotation for the full vector sequence.
  4. The match is to an adapter, linker, or PCR primer

    • Even short matches to adapter, linker, or PCR primer sequences are strong evidence that the segment has a foreign origin.
  5. Nearby stronger matches

    • A weak match adjacent to a stronger match usually indicates that the segment of foreign sequence is longer than indicated by the stronger match alone. A weak match within such a pattern may arise from adapter, linker, or primer sequences that are present in addition to vector sequences or from contamination with a variant of the vector in the UniVec database.
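The restriction site analysis mentioned in item 2 can be approximated with a plain string search. The sketch below is a minimal example that looks for three common recognition sites (EcoRI, BamHI, HindIII); the enzyme table and function name are assumptions chosen for illustration, and in practice the table should list the enzymes actually used in the cloning history.

```python
# Illustrative enzyme table (an assumption); replace with the sites that
# were actually used to clone the source DNA/RNA.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def find_sites(seq, sites=SITES):
    """Return sorted (position, enzyme) pairs for every site occurrence.

    Positions are 1-based and mark the first base of the recognition site.
    The sites in the default table are palindromic, so searching one
    strand is sufficient for them.
    """
    seq = seq.upper()
    hits = []
    for name, site in sites.items():
        start = seq.find(site)
        while start != -1:
            hits.append((start + 1, name))
            start = seq.find(site, start + 1)
    return sorted(hits)
```

Sites reported within or next to a VecScreen match are the ones worth comparing against the cloning history.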

When VecScreen May Underestimate the Extent or Significance of Contamination

If the origin of the foreign sequence is a vector, adapter, linker, or PCR primer that is not represented in the UniVec database, VecScreen may still report matches to similar sequences that are represented. However, in such cases the reported matches will underestimate the full extent and significance of the contaminating sequences.

Identifying the Foreign Sequence

Although the query sequence may be decontaminated without knowing the origin of the segments of foreign sequence, if the identity of the foreign DNA is known, then the boundaries of the contamination can be located more precisely.

Although the alignments shown in the VecScreen output identify the UniVec database entry that matches the query, the full extent of the match to any individual vector will not be apparent because the sequence for most vectors in UniVec is not present as one contiguous piece (see UniVec description). These alignments, therefore, do not indicate which vector has the best overall match to the query sequence.

The best way to identify the most likely source(s) of foreign sequence is to review the cloning history of the sequenced DNA/RNA. If you obtained the clone, library, cDNA, etc. from another source, the full history will include all previous cloning, subcloning, and modification of the material. Note which cloning vectors, linkers, adapters, and PCR primers were used to clone the source DNA/RNA and for any subsequent manipulation of the DNA prior to sequencing. The segments of foreign sequence identified by VecScreen can then usually be matched to one or more of the vectors and oligonucleotides used for cloning. Sometimes, however, the foreign sequence may come from an unexpected source, such as contamination with another DNA present in the laboratory.

If the cloning history is not known, it may be possible to identify the vector that has the best match to the foreign sequence segment by performing a BLAST search using a database that contains a contiguous sequence for each vector, such as the artificial sequences subset of NCBI's nr/nt database.

Identifying the Boundaries of the Foreign Sequence

The matches reported by VecScreen may not always locate the exact junction between the foreign sequence and the native sequence for the following reasons: (a) the full extent of the foreign sequence may not be recognized because it originates from a variant MCS, adapter, linker, or primer that is not represented in the UniVec database; (b) sequencing errors may cause the alignment to be truncated before the true junction; and (c) chance similarity to vector sequence may extend a match a few bases into the native sequence.

The precise boundary between foreign and native sequence should be easy to locate if the foreign DNA can be identified and the full cloning history of the sequenced DNA is known. However, the expected sequence across the junction is not always observed because of trimming of the cloning sites by nuclease activity, insertion of multiple linkers, or other aberrant cloning events. If the cloning history is unknown, a restriction site analysis on the matching segment and the flanking sequence may locate cloning sites that are good candidates for the boundary of the foreign DNA.

If the junction between the foreign sequence and the native sequence cannot be located accurately, the foreign sequence segment(s) identified by VecScreen and any intervening segments of suspect origin should be removed.

Decontaminating the Query Sequence

Removing Terminal Segments of Foreign Sequence

A segment of foreign sequence close to either end of the query sequence should be removed, along with any additional sequence between the foreign sequence and the end of the query. The one exception to this rule is that the polyA tail of an mRNA (cDNA) sequence should never be trimmed (even if it matches a UniVec sequence) because it provides a useful landmark. However, any sequence following the polyA tail should always be removed.
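A minimal sketch of terminal trimming, assuming the match coordinates are 1-based and inclusive as in alignment output. The function name is hypothetical, and the code does not decide whether a match is close enough to an end to count as terminal, nor does it apply the polyA exception; both are left to the caller.

```python
def trim_terminal(seq, match_start, match_end):
    """Remove a foreign segment and everything between it and the nearer end.

    match_start and match_end are 1-based inclusive query coordinates.
    """
    n = len(seq)
    if not 1 <= match_start <= match_end <= n:
        raise ValueError("match coordinates fall outside the sequence")
    bases_before = match_start - 1   # distance from the 5' end
    bases_after = n - match_end      # distance from the 3' end
    if bases_before <= bases_after:
        # Match is nearer the 5' end: keep only what follows the match.
        return seq[match_end:]
    # Match is nearer the 3' end: keep only what precedes the match.
    return seq[:match_start - 1]
```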

Removing Internal Segments of Foreign Sequence

A segment of foreign sequence in the middle of the query sequence usually indicates that two discontinuous pieces of native sequence have been joined, either at the cloning stage or during sequence assembly. In most cases, the foreign segment should therefore be removed and the query sequence split into two separate sequences.
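Removing an internal segment and splitting the query can be sketched as follows, again assuming 1-based inclusive coordinates. The function name is hypothetical.

```python
def split_at_internal(seq, match_start, match_end):
    """Drop an internal foreign span and return the two flanking pieces.

    match_start and match_end are 1-based inclusive query coordinates and
    must leave at least one base on each side of the span.
    """
    if not 1 < match_start <= match_end < len(seq):
        raise ValueError("span must lie strictly inside the sequence")
    left = seq[:match_start - 1]
    right = seq[match_end:]
    return left, right
```

The two returned pieces are then treated as separate sequences, since they may not be contiguous in the source organism.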

Occasionally, an internal segment of foreign DNA originates from a transposon or insertion sequence that was inserted into the cloned source DNA while it was being propagated in the Escherichia coli or yeast host. If the sequence is intended to represent the content of a particular clone, e.g., a BAC clone from a genome sequencing project, the transposable element sequence should be preserved but should be clearly annotated to indicate the location and identity of the transposon or insertion sequence. The sequence from transposable elements should, however, be removed during the assembly of composite sequences that are intended to represent the genetic information of the biological source organism, such as complete chromosome or genome sequences.

Final Check

Re-run VecScreen on the revised query sequence to check that all the foreign sequence has been removed.

Exceptions

A positive VecScreen result may not always indicate vector contamination. Exceptions arise when there is a rational explanation for similarity between the query sequence and an element found in vectors. Such cases can usually be discerned by comparing the source and function of the query segment to the source and function of the vector element that it matches.

The most common reason for a strong match to vector sequences, other than contamination, is that the query is related to the source of an element that has been incorporated into a vector. Strong matches to vector may be expected if the query contains sequences related to any of the following:

Virus genomes (including endogenous proviruses)
A few viruses have been modified for use as vectors, including: adenovirus, several baculoviruses, vaccinia, and various retroviruses. Promoters, enhancers, and polyadenylation signals from viruses are commonly used to provide vectors with expression cassettes, e.g., transcriptional signals derived from cytomegalovirus, cauliflower mosaic virus, herpes simplex virus, simian virus 40, and retrovirus long-terminal repeats. Viral replication origins are also incorporated into some vectors.

Yeast and bacterial biosynthetic genes
Used to provide vectors with selectable auxotrophic markers.

Transcriptional signals from bacteria, yeast, a few mammals, and other model organisms
Many different promoters, operators, enhancers, control elements, transcription terminators, and polyadenylation signals have been incorporated into vectors to enable them to express proteins.

Bacterial and yeast repressor and activator genes
Incorporated into vectors to enable protein expression to be regulated.

Bacterial genes mediating antibiotic resistance
Used to provide vectors with selectable markers.

Bacterial plasmid genes and replication origins
Used to enable vectors to replicate in bacterial hosts.

Yeast replication elements
Replication origins, centromeres, and telomeres are used to enable certain vectors to replicate in yeast.

Bacteriophage genomes
Modified bacteriophage are used as cloning vectors, and bacteriophage genes are also used to add functionality to many plasmid vectors.

Transposons and Insertion Sequences
These elements are found in the natural plasmid backbones of many vectors and also are a common source of antibiotic resistance genes. Some vectors also contain functional transposable elements by design.

Other specialized elements from a variety of sources (including a few from humans)
Used to provide additional functions, such as: reporter genes, targeting signals, and tags to facilitate protein purification or detection.

How to Relate a UniVec Database Segment to Its Parent Sequence

Example UniVec Definition Line

gnl|uv|L08786.1:609-673 BlueScribe SK Minus cloning vector
------ 1 ------|-- 2 --|--------------- 3 ----------------

The definition line for a segment in the UniVec database is composed of three parts.

  1. The UniVec identifier for the parent sequence. The first section (gnl|uv) indicates that the sequence is part of the UniVec database. The last section is either the GenBank identifier (Accession number.version) for the parent sequence or an identifier of the form NGBxxxxx.x if the parent sequence is not in GenBank.
  2. A span specifying the location of the segment within the parent sequence. Double spans, e.g., 2964-3005-49, indicate that the segment crosses the end/beginning junction of a parental sequence that was pseudo-circularized.
  3. A short description of the sequence.
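The three parts described above can be pulled apart with a regular expression. This is a sketch based only on the format described here; the function name and the returned dictionary keys are assumptions, and definition lines in other layouts would need a different pattern.

```python
import re

# identifier : span  description, e.g.
# gnl|uv|L08786.1:609-673 BlueScribe SK Minus cloning vector
_DEFLINE = re.compile(r"^>?gnl\|uv\|([^:\s]+):(\d+(?:-\d+)+)\s*(.*)$")


def parse_univec_defline(defline):
    """Split a UniVec definition line into parent id, span, and description.

    The span is returned as a list of integers: two values for a simple
    span, three for a double span such as 2964-3005-49.
    """
    m = _DEFLINE.match(defline.strip())
    if m is None:
        raise ValueError("not a UniVec definition line: %r" % defline)
    parent_id, span, description = m.groups()
    return {
        "parent_id": parent_id,
        "span": [int(x) for x in span.split("-")],
        "description": description,
    }
```

For the example definition line, the span comes back as [609, 673], so the segment begins at base 609 of the parent sequence.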

The VecScreen output may include an alignment of the query sequence against a segment of vector contained in the UniVec database. To determine the position of the output alignment within the parent vector, use the following relationship:

Location of alignment on parent sequence is ab + sb - 1 to ae + sb - 1

Where:

ab is the first base of the UniVec segment (subject sequence) in the alignment.
ae is the last base of the UniVec segment (subject sequence) in the alignment.
sb is the coordinate of the beginning of the UniVec segment taken from the span in the segment's definition line (see 2 above).

(Note that the alignment may extend across the end/beginning of a circular sequence.)
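The relationship above translates directly into code. In the sketch below, the optional `parent_len` argument is an assumption added for this example: when supplied, coordinates that run past the end of a pseudo-circularized parent are wrapped back to its beginning. For a double span such as 2964-3005-49, this example takes the middle number (3005) to be the parent length, which is an interpretation of the span format rather than something stated outright.

```python
def to_parent_coords(ab, ae, sb, parent_len=None):
    """Map alignment coordinates on a UniVec segment to the parent sequence.

    ab, ae: first and last base of the UniVec segment in the alignment.
    sb: start of the segment in the parent, from the definition-line span.
    parent_len: optional parent length (an assumption of this sketch),
        used to wrap coordinates around a circular parent.
    """
    start = ab + sb - 1
    end = ae + sb - 1
    if parent_len is not None:
        # Wrap 1-based coordinates around the end/beginning junction.
        start = (start - 1) % parent_len + 1
        end = (end - 1) % parent_len + 1
    return start, end
```

For the example segment, which begins at base 609, an alignment covering bases 1 to 65 of the segment corresponds to bases 609 to 673 of the parent sequence.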

Last updated: 2017-11-28T15:52:14Z