VAST+ Help Document

This help document describes how to use VAST+, an enhanced version of VAST that finds macromolecular structures that are similar in 3D shape, and groups them as complete or partial matches to the biological unit ("biounit") of a query structure, making it possible to find similar macromolecular complexes. Original VAST, which focuses on 3D similarities between individual protein molecules or 3D domains (substructures) rather than biological units, is described in the original VAST Help document. The VAST Search tool, which accepts input of a query structure's 3D coordinates and returns original-style VAST results, is described in the VAST Search Help document.

What is VAST+?

Vector Alignment Search Tool Plus
Comparison of original VAST vs. VAST+
How can VAST+ be used to learn more about proteins?
Illustrated examples of VAST+ results for:

Glutamate Dehydrogenase (Thermotoga maritima)
Deoxyhemoglobin (Homo sapiens)

Precalculated results

Input Options

PDB ID or MMDB ID

Enter ID directly into search box on VAST Home page

Structure or Protein database search:

3D structure → similar structures
protein sequence → structure → similar structures
protein sequence → related structures → similar structures

3D coordinates

input 3D coordinates of query structure into VAST Search page

Output

Query structure (summary information)

Interactions schematic
Molecular graphic
MMDB ID, PDB ID
Biological Unit
Source Organism
Number of proteins
Number of nucleotides
Number of chemicals

Similar structures

Folder tabs to select 3D alignment type:

All matching molecules superposed
Invariant substructure superposed

Filters to subset the search results by:

Number of aligned protein molecules

complete match to biological unit of query structure
partial match to biological unit of query structure

Taxonomy

Filter results by superkingdom

Ranked list of hits

Sort options

Aligned proteins
RMSD
Aligned residues
Sequence identity
Taxonomy

View details for a similar structure
Search within results

"Original VAST" button

View original style VAST results (by individual protein molecule)

"No matched structure found" (null result)
"New Search" button

Enter a new PDB ID or MMDB ID to find similar structures

Detailed Views/Graphical Displays for Individual Hits

Detailed view comparing the query structure and a matched structure

Aligned molecules
Interaction schematic of query structure vs. matched structure
List of matching protein molecules

Sequence Alignment Views

Align one pair of matching molecules
Display all aligned molecule sequences

3D Views

Superposition of query and neighbor in 3D
Cn3D version for viewing VAST+ superpositions (important note)

Original style VAST display (example)

similarities between individual protein molecules or 3D domains

VAST+ Algorithm

Overview of method used to identify similarly shaped macromolecular complexes
Illustration: VAST+ algorithm
Additional Details

3D alignment of biological units (macromolecular complexes)

initial alignment (all matching molecules superposed)

refined alignment (invariant substructure superposed)

Size limits

minimum size of structures that are neighbored

maximum size of structures that are neighbored

Log of Changes to VAST+

References

Citing VAST and VAST+

Additional References


What is VAST+?

definition | comparison of original VAST vs. VAST+ | how can VAST+ be used to learn more about proteins | illustrated examples of VAST+ results for: glutamate dehydrogenase, deoxyhemoglobin | precalculated results

VAST+ (Vector Alignment Search Tool Plus) Definition

VAST+ is a tool designed to compare 3-dimensional structures, with an emphasis on finding those with similar macromolecular complexes. The similarities are calculated using purely geometric criteria, without regard to sequence similarity, and therefore can identify distant homologs.

VAST+ is built upon the original Vector Alignment Search Tool (VAST), and expands the capabilities of that program by making it possible to now find macromolecular structures that have similarly shaped biological units (also referred to as "biounits"), not just those that share similarly shaped individual protein molecules or fragments.

The similar structures found by the programs are often referred to as "neighbors." VAST neighbors are structures that contain similarly shaped individual protein molecules or 3D domains, and VAST+ neighbors are structures that have similarly shaped biological units.

Comparison of original VAST vs. VAST+

The original VAST identifies 3D domains (substructures) within each protein structure in the Molecular Modeling Database (MMDB), and then finds other protein structures that have one or more similar 3D domains, using purely geometric criteria. The original VAST output reflects comparisons between individual protein molecules, which can share a similar shape along their entire length, or only along a fraction of their length, such as a single 3D domain (illustrated example of original VAST results). That output is available by clicking the "Original VAST" button near the upper right hand corner of a VAST+ results page. Additionally, that output is shown by default if you enter a query on the Original VAST home page, or if a particular query structure does not have any VAST+ neighbors but does have original VAST neighbors.

VAST+, on the other hand, focuses primarily on finding other macromolecular structures that have a similar biological unit, rather than those that are similar at the level of an individual protein molecule or 3D domain (illustrated examples of VAST+ results). To do this, VAST+ takes into consideration the complete set of 3D domains that VAST identified within a query structure, throughout all of its component protein molecules, and finds other macromolecular structures that have a similar set of proteins/3D domains. In this way, it is an extension of the original VAST, and reveals macromolecular complexes that are similar to each other by taking their quaternary structure into account. VAST+ output ranks the similar structures based on the extent of their similarity to the query structure, first listing complete matches to the query structure's biological unit, followed by partial matches, and ending with matches to individual protein molecules. Matches to individual 3D domains within protein molecules are not displayed in VAST+, but are still accessible by following the "Original VAST" link that is provided on a VAST+ results page.

Use Original VAST if you want to:

Focus on 3-dimensional similarities between individual protein molecules, or between individual 3D domains.
(Original VAST lists each protein molecule and 3D domain in the asymmetric unit of the query structure, and links to structures that are similar in shape to the protein molecule or 3D domain you select.)
illustrated example...

View the superposition of two *or more* protein molecules at the same time.

Align (in 3D space) two or more proteins from the same structure record.

Align (in 3D space) two or more domains from the same molecule to each other, for example, to compare repeats.
(For example, compare the first and second immunoglobulin (Ig) domain in a protein molecule.)

Live example of original VAST results for 1HBB, deoxyhemoglobin (Homo sapiens).

Illustrated/annotated example of original VAST results for 1PTH, prostaglandin H2 synthase-1 (Ovis aries).

Use VAST+ if you want to:

Focus on 3-dimensional similarities between macromolecular complexes (biological units).
illustrated examples...

View the superposition of two macromolecular complexes.
(The latest version of NCBI's structure viewing program, Cn3D 4.3.1, is needed in order to do this.)

Rank structures by their degree of similarity to the query protein's biological unit.

Compare different states of the same oligomer. (For example, compare the conformation of the deoxygenated and oxygenated states of hemoglobin.)

Live example of VAST+ results for 1HBB, deoxyhemoglobin (Homo sapiens).

Illustrated/annotated examples of VAST+ results for 1B26, glutamate dehydrogenase (Thermotoga maritima), and 1HBB, deoxyhemoglobin (Homo sapiens).

How can VAST+ be used to learn more about proteins?

Because VAST+ emphasizes the identification of similar biological units, it is now possible to:

Find protein complexes that have the same oligomeric state and similarly shaped proteins, regardless of the degree of sequence similarity.

Example: view the VAST+ neighbors for 4MGG: Crystal Structure of an Enolase (Mandelate Racemase Subgroup) From Labrenzia Aggregata IAM 12614 (Target Nysgrc-012903) With Bound MG, Space Group P212121.
(Included among the VAST+ results are structures such as 3SN4, which has remarkable conservation of the overall complex structure, despite very low sequence similarity.)

Example: view the VAST+ neighbors for 1B26: Crystal structure of glutamate dehydrogenase from the hyperthermophilic eubacterium thermotoga maritima at 3.0 Å resolution.
(Included among the VAST+ results are structures such as 1GTM, which has similarly shaped proteins and a hexameric biological unit, but a somewhat different configuration of the hexamer when compared with the query structure, as shown in the illustrated example below.)

Compare the conformation of different states of the same oligomer, such as oxy- and deoxyhemoglobin.

Example: view the VAST+ neighbors for 1HBB: High-resolution X-ray Study of Deoxyhemoglobin Rothschild 37beta Trp-> Arg: a Mutation That Creates an Intersubunit Chloride-binding Site.
(Included among the VAST+ results are structures such as 1GZX, an oxy T State hemoglobin, which can be viewed as a 3D superposition on the query structure, to compare configurations, using Cn3D 4.3.1.)

Compare the binding interfaces of a query structure and its nearest "neighbors," such as the differences in the binding interface among similarly shaped immunocomplexes.

Example: view the VAST+ neighbors for 3O6F: Crystal Structure of a Human Autoimmune TCR Ms2-3c8 Bound to MHC Class II Self-ligand Mbphla-dr4.
(Included among the VAST+ results are structures such as 1J8H, which contains a complex between HLA-DR3, an Influenza hemagglutinin peptide, and a human alpha/beta T-cell receptor. This demonstrates how well the auto-reactive T-cell receptor complex mimics complexes that include foreign peptides.)

Illustrated examples of VAST+ results:

glutamate dehydrogenase (Thermotoga maritima) | deoxyhemoglobin (Homo sapiens)

Illustrated Example: VAST+ results for Glutamate Dehydrogenase (Thermotoga maritima)

*VAST+ search results for 1B26 Glutamate Dehydrogenase (Thermotoga maritima)*, as of 02 October 2014, with detailed view of match to 1GTM. Click anywhere on the image to open a live web page with current VAST+ results for 1B26.

Illustrated Example: VAST+ results for Deoxyhemoglobin (Homo sapiens)

VAST+ search results for 1HBB High-resolution X-ray Study of Deoxyhemoglobin Rothschild 37beta Trp-> Arg: a Mutation That Creates an Intersubunit Chloride-binding Site, as of 02 October 2014, with detailed view of match to 1FSL. Click anywhere on the image to open a live web page with current VAST+ results for 1HBB.

VAST+ results are precalculated for structures in MMDB

VAST+ results are precalculated for structures that are publicly available in the Molecular Modeling Database (MMDB), as part of the MMDB data processing pipeline, and the results are updated each week as new structures are added to the database.

To see the VAST+ neighbors for a structure of interest, you can either (a) input the PDB ID or MMDB ID for the query structure on the VAST+ home page, or (b) start with a Structure or Protein database search and then follow links for "similar structures." In either case, VAST+ output will display a list of similar structures, ranking them by the extent of their similarity to the query structure's biological unit.

If your query structure is not yet available in MMDB, then you can input your structure's 3D coordinates into the VAST Search program, which will find 3D similar structures and display them in the original VAST output format.

Input Options

PDB ID or MMDB ID | Structure or Protein database search | 3D coordinates

PDB ID or MMDB ID

If you know the identifier for a structure that is currently in the Molecular Modeling Database (MMDB) and would like to find its VAST+ similar structures, simply:

Enter the PDB ID or MMDB ID for the structure of interest directly into the VAST+ home page.

VAST+ will then display your query structure followed by a list of geometrically similar structures, ranking them by the extent of their similarity to the query structure's biological unit.

A detailed view for each retrieved structure shows the correspondence between individual protein molecules in the query and subject structures. The "3D View" button enables you to view a superposition of the aligned regions.

Please note that the latest release of Cn3D (version 4.3.1) is necessary in order to properly view the superpositions.

Structure or Protein database search

If you don't know the identifier of a specific structure record, you can start by searching for a term(s) of interest (e.g., tumor suppressor, immunoglobulin) in the Entrez Structure or Entrez Protein database, then follow the links on the search results page from a structure or protein sequence of interest to "similar structures." There are several different ways this can be done:

3D structure → similar structures

Search the Entrez Structure database for a text word or phrase of interest.
For example, retrieve structures that contain the term immunoglobulin, or the phrase "tumor suppressor", anywhere in the record. If desired, you can limit your query to a specific search field of the record, such as the title of the article describing the structure, for example: "tumor suppressor"[title] or immunoglobulin[title]. (The MMDB Help document provides search tips for the structure database.)

On the Structure search results page, click on the thumbnail image or title of a structure of interest to open its summary page, then press the "Similar Structures: VAST+" button near the upper right hand corner of the page.

The resulting page will display the VAST+ search results, listing structures that are geometrically similar to the structure you checked or viewed, ranked by the extent of their similarity to the query structure's biological unit.
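The text search in the first step can also be expressed as a URL. The sketch below is only an illustration and is not part of the VAST+ interface: it assumes the Structure database is queried through NCBI's E-utilities ESearch service with `db=structure`, and the function name `build_structure_search_url` is hypothetical. It only builds the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_structure_search_url(term, max_results=20):
    """Build an ESearch URL for a text query against the Structure database.

    `term` uses the same syntax as the search box, including field
    qualifiers such as "tumor suppressor"[title].
    Assumption: the Structure database is addressed as db=structure.
    """
    params = {
        "db": "structure",
        "term": term,
        "retmax": max_results,  # maximum number of record IDs to return
        "retmode": "json",      # ask for a JSON response
    }
    return ESEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_structure_search_url('"tumor suppressor"[title]'))
    print(build_structure_search_url("immunoglobulin[title]", max_results=5))
```

The identifiers returned by such a query would still need to be opened on the structure summary page to reach the "Similar Structures: VAST+" button described above.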

protein sequence → structure → similar structures

Search the Entrez Protein database for a text word or phrase of interest. (For tips on searching the Entrez Protein database, see Entrez Help, Entrez Sequences Help, and Entrez Nucleotide and Entrez Protein FAQs.)

On the Protein search results page, click on the title of a protein of interest to open its sequence record, then scroll down to the "Related Information" section in the right hand margin of the page. If the protein sequence was derived from a 3D structure record, the Related Information section will include "Structure." Select that option.

The resulting page will display the 3D structure record from which the protein sequence was derived. From there, follow the link for "Similar Structures: VAST+" to see a list of geometrically similar structures identified by VAST+.

Note: If the "Related Information" section of a protein sequence record does not list "Structure" as an option, then use the next method described below, to traverse from protein sequence → related structure → similar structures.

protein sequence → related structures → similar structures

Only a small percentage of sequence records in the Entrez Protein database are derived from 3D structure records. However, it is still possible to find 3D structure records for the majority of protein sequence records by retrieving structures that are sequence-similar to your protein of interest. These are called "Related Structures" and they have been identified by a program called CBLAST. From there, you can find geometrically similar structures that have been identified by VAST. To do this, follow these steps:

Search the Entrez Protein database for a text word or phrase of interest. (For tips on searching the Entrez Protein database, see Entrez Help, Entrez Sequences Help, and Entrez Nucleotide and Entrez Protein FAQs.)

On the Protein search results page, click on the title of a protein of interest to open its sequence record, then scroll down to the "Related Information" section in the right hand margin of the page. An option for "Related Structures" will be present if there are 3D structures that are sequence-similar to the protein currently displayed. Select that option.

The resulting page will display a list of 3D structures that are sequence-similar to your protein of interest. Click on the thumbnail of any structure to open its summary page. From there, follow the link for "Similar Structures: VAST+" to see a list of geometrically similar structures identified by VAST+.

Note: If the "Related Information" section of a protein sequence record also lists "Structure" as an option, that means the protein sequence was derived from a 3D structure record. In that case, you can also use the previous method for accessing 3D structures, if desired, to traverse from protein sequence → structure → similar structures.

3D coordinates of a query structure

The methods described above for accessing similar structures (enter a PDB ID or MMDB ID, or start with a Structure or Protein database search) will work if your query structure of interest is already available in the Molecular Modeling Database (MMDB), and they retrieve pre-computed VAST+ results.

If your query structure is not yet available in MMDB, then you can use VAST Search to find 3D similar structures. VAST Search allows you to input the 3D coordinates of a newly resolved structure (in PDB format) and compare it, in a live search, against all structures in MMDB to find its neighbors. A separate VAST Search Help document provides additional details.

Please note that, at this time, VAST Search still returns results in the original VAST format, listing structures that have similarities to individual protein molecules in your query structure. Although some of the returned structures might also have a biological unit that is similar to the query, the original style VAST results do not group hits by biological unit similarity.

If you eventually submit your query structure to the Protein Data Bank (PDB), once your structure is publicly released by PDB, it will also become available in MMDB. At MMDB, your structure's VAST neighbors will be pre-computed (as part of the MMDB data processing procedure), grouped by biological unit similarity (complete or partial biological unit similarity), and ranked by the number of protein molecules in the query that simultaneously match the 3D shape of protein molecules in the VAST+ neighbor. Its VAST+ neighbors will then become accessible through the methods described above, and will be presented in the new style VAST+ display described in this document.

Output

query structure | similar structures | filters (complete biounit matches, partial biounit matches, taxonomy) | folder tabs to select 3D alignment type (all matching molecules superposed, invariant substructure superposed) | ranked list of hits (sort options, view details for a similar structure, search within results) | "original VAST" button | null result: "no matched structure found" | "new search" button

The VAST+ search results page displays summary information about your query structure, followed by a list of similar structures that are ranked based on their degree of similarity to the query structure's macromolecular complex (biological unit) (illustrated examples). By default, all similar structures are listed, starting with complete matches to the query structure's biological unit, followed by partial matches, and ending with matches to individual protein molecules. If desired, you can activate a subset of filters in order to display only the subset of similar structures that meet the specified criteria. It is also possible to change the sort order of the similar structures, and to search within results to see if/where a specific structure falls within the search results. Click on the plus button beside any structure to see a detailed view comparing the query structure to a matched structure. (If you prefer to view your search results in the original style VAST display, which lists structures that have similarities to individual protein molecules, or similarities to individual 3D domains, in your query structure, click the "Original VAST" button.)
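The default ordering described above can be sketched in a few lines. This is a minimal illustration, not the VAST+ implementation: the `Hit` record, the rule used to tell a partial match from a single-molecule match, and the use of RMSD as a tie-breaker are all assumptions, and the structure identifiers in the usage example are made up.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    pdb_id: str             # identifier of the similar structure (made up here)
    matched_molecules: int  # query molecules matched simultaneously
    rmsd: float             # RMSD of the superposition, in angstroms


# Display order of the three groups named in the text.
CATEGORY_ORDER = {"complete": 0, "partial": 1, "single molecule": 2}


def match_category(hit, query_molecules):
    """Assumed grouping rule for a hit against a query biological unit."""
    if hit.matched_molecules >= query_molecules:
        return "complete"
    if hit.matched_molecules > 1:
        return "partial"
    return "single molecule"


def rank_hits(hits, query_molecules):
    """Complete matches first, then partial, then single-molecule matches.

    Within a group, more matched molecules rank higher; a lower RMSD
    breaks remaining ties (the tie-breaker is an assumption).
    """
    return sorted(
        hits,
        key=lambda h: (
            CATEGORY_ORDER[match_category(h, query_molecules)],
            -h.matched_molecules,
            h.rmsd,
        ),
    )


if __name__ == "__main__":
    hits = [Hit("AAAA", 1, 0.5), Hit("BBBB", 4, 1.2),
            Hit("CCCC", 2, 0.9), Hit("DDDD", 4, 0.8)]
    for hit in rank_hits(hits, query_molecules=4):
        print(hit.pdb_id, match_category(hit, 4))
```

Under this rule, a hit to a monomeric query counts as a complete match, because its single protein molecule is the whole biological unit.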

A VAST+ search results page includes the following components:

Query structure (summary information)

The top section of a VAST+ results display provides the following summary information about the query structure:

interactions schematic | molecular graphic | MMDB ID, PDB ID | biological unit | source organism | number of proteins | number of nucleotides | number of chemicals

Interactions schematic

The interactions schematic shows the molecular components of the query structure's biological unit and the interactions among them. The molecular components of the biological unit can include the following:

Proteins, if present, are shown as circles.

Nucleotide sequences (DNA, RNA), if present, are shown as squares.

Chemicals, if present, are shown as diamonds.

If any protein or nucleotide molecules in the structure were generated by applying transformations from crystallographic symmetry, their labels are shown as alphanumeric combinations that indicate the source molecule from which they were generated and the copy number. Chemicals that interact only with such molecules were also generated by applying transformations from crystallographic symmetry.

The protein and nucleotide icons are scaled to show the relative sizes of those molecular components, so they are roughly comparable to each other based on molecular weight. All chemical icons are the same size.

Interactions among components are shown as lines, and an interaction is displayed only if there are at least 5 contacts at a distance of 4 Å or less between the heavy atoms of the molecules. (There is no meaning to the length of the lines in the interaction schematic. After the interactions are drawn, the diagram is flattened out to fit into the square, lengthening or shortening lines as needed.)
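The display threshold can be illustrated with a short sketch. It is not the code used by MMDB: it assumes that a "contact" is one pair of heavy atoms, one from each molecule, and that each molecule is supplied as a list of heavy-atom (x, y, z) coordinates in angstroms. The function names are hypothetical.

```python
from itertools import product
from math import dist


def count_contacts(atoms_a, atoms_b, cutoff=4.0):
    """Count heavy-atom pairs (one atom per molecule) within `cutoff` angstroms."""
    return sum(1 for a, b in product(atoms_a, atoms_b) if dist(a, b) <= cutoff)


def has_interaction(atoms_a, atoms_b, min_contacts=5, cutoff=4.0):
    """Apply the stated rule: at least 5 contacts at 4 angstroms or less."""
    return count_contacts(atoms_a, atoms_b, cutoff) >= min_contacts


if __name__ == "__main__":
    molecule_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    molecule_b = [(0.0, 3.0, 0.0), (1.0, 3.0, 0.0)]
    single_ion = [(0.0, 3.0, 0.0)]
    print(has_interaction(molecule_a, molecule_b))  # six pairs within 4 angstroms
    print(has_interaction(molecule_a, single_ion))  # only three pairs
```

The second call shows how a component with very few atoms, such as a single ion, can fall below the contact threshold even when it sits close to a protein.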

Because of the latter thresholds, ions that are part of the biological unit may be missing from the interaction diagram, but they will be listed in the table of molecular components and interactions on the query structure's MMDB summary page. Interactions for short peptides, or for molecule types other than protein, DNA/RNA, and chemical, are not calculated. Molecules, such as crystallization agents, etc., that are not part of the biologically active molecule are absent from both the interaction schematic and the molecular components list.

Mouse over any node in the schematic to view the molecule name.

Molecular graphic

The 3D molecular graphic shows a single static snapshot of the query structure's 3D shape, generated by the Cn3D program. In general, it shows the default biological unit of the structure.

To view the query's 3D structure interactively, click on its MMDB ID. That will open the corresponding structure summary page in MMDB, where you can choose to "View structure."

To view a 3D superposition of the query structure and any one of the VAST+ hits, click on the plus button beside a hit to open the detailed view comparing the query structure to a matched structure, then click the "Visualize 3D structure superposition with Cn3D" button at the bottom of that view.

In either case, the interactive 3D view will open in iCn3D, NCBI's WebGL-based viewer.

There is no need to install a separate application in order to use iCn3D; you just need to use a web browser that supports WebGL. If your browser doesn't support WebGL, you might need to modify the settings in the browser to enable WebGL, or update your web browser to a newer version that supports WebGL. (See the WebGL site for more information about compatibility with various web browsers.)

MMDB ID and PDB ID

MMDB ID is the unique identifier of the structure record in the Molecular Modeling Database (MMDB). It is a string of digits (e.g., 50885 for sheep prostaglandin H2 synthase); MMDB IDs are assigned consecutively to structure records as NCBI processes them.

PDB ID is the accession number of the Protein Data Bank (PDB) record from which the corresponding MMDB record was derived. It is generally an alphanumeric combination (e.g., 1PTH, which served as the source record for MMDB ID 50885).
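The two identifier types can usually be told apart by their form. The sketch below is a hypothetical helper, not something VAST+ exposes: it assumes the classic four-character PDB ID (a digit followed by three letters or digits) and treats any all-digit string as an MMDB ID.

```python
import re


def classify_structure_id(text):
    """Guess whether `text` looks like an MMDB ID or a PDB ID.

    Assumptions: an MMDB ID is a string of digits; a PDB ID is a digit
    followed by three alphanumeric characters. An all-digit string is
    treated as an MMDB ID even if it is four characters long.
    """
    candidate = text.strip()
    if re.fullmatch(r"\d+", candidate):
        return "MMDB ID"
    if re.fullmatch(r"\d[A-Za-z0-9]{3}", candidate):
        return "PDB ID"
    return "unknown"


if __name__ == "__main__":
    for value in ("50885", "1PTH", "1b26", "hemoglobin"):
        print(value, "->", classify_structure_id(value))
```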

Biological Unit

The biological unit ("biounit") of a structure is the biochemically active form of a biomolecule. It can range from a monomer (single protein molecule) to an oligomer of 100+ protein molecules, and is sometimes also referred to as the "biological assembly" or "macromolecular complex," reflecting the molecule's quaternary structure.

The query structure's biological unit, if available, is included in the summary information at the top of a VAST+ results page. (To see the biological unit of any individual VAST neighbor, and other details about the match, click on the plus button beside the structure of interest.)

The data processing section of the MMDB help document provides additional information about biological units, including the procedures used to identify biological units.

If the query structure does not have a biological unit defined in its source PDB file, it is only available as an asymmetric unit (i.e., raw structure data). In that case, the structure is not processed by VAST+, since VAST+ focuses on finding similarities among biological units. However, such structures are processed by the Original VAST algorithm, which may have found structures that are geometrically similar to individual proteins, or individual 3D domains, in your query. (Read more about the occasional circumstances in which a query structure might generate a null result in VAST+.)

Source Organism

The source organism(s) of the protein and/or nucleotide molecules in the structure record. If a structure record contains protein or nucleotide sequences from more than one organism (e.g., human AND HIV1), each source organism is listed and links to the corresponding taxonomic information in the NCBI Taxonomy database, including the organism's Taxonomy ID (TaxID) and lineage.

Number of proteins

The total number of protein molecules in the biological unit of the query structure.
For brevity, only the first protein name is displayed by default.
Click on the down arrow to view the complete list of proteins in the biological unit.

The MMDB help document provides additional information about the molecular components of a structure, including its proteins.

Number of nucleotides

The total number of nucleotide molecules (DNA and/or RNA) in the biological unit of the query structure.
For brevity, only the first nucleotide molecule's name is displayed by default.
Click on the down arrow to view the complete list of nucleotide molecules in the biological unit.

The MMDB help document provides additional information about the molecular components of a structure, including its nucleotide sequences.

Number of chemicals

The total number of chemicals in the biological unit of the query structure.
For brevity, only the first chemical name is displayed by default.
Click on the down arrow to view the complete list of chemicals in the biological unit.

The MMDB help document provides additional information about the molecular components of a structure, including its chemicals.

Similar Structures

The VAST+ search results appear immediately beneath the query structure summary information, and include the following features and functions:

Folder tabs to select 3D alignment type:

All matching molecules superposed

Invariant substructure superposed

Filters to subset the results by:

number of matching molecules

complete match to biological unit of query structure

partial match to biological unit of query structure

taxonomy

Ranked list of hits: a table listing similar structures ranked by the extent of their similarity to the query structure's biological unit.

Sort Options

Aligned proteins
RMSD

Aligned residues

Sequence identity

View details for a similar structure

Search within results

"Original VAST" button

"No matched structure found"

"New Search" button

Folder tabs to select 3D alignment type

The VAST algorithm calculates two different types of 3D alignments between a pair of similar structures: an initial alignment that shows all matching molecules superposed, and a refined alignment that shows the invariant substructure superposed.

Each of the following folder tabs on the VAST+ output page includes the complete set of search results, listing all of the structures that VAST+ found to be similar to your query structure.

However, a given structure might have a different rank in the first tab versus the second tab, because the statistics generated for that match (e.g., RMSD and number of aligned residues) generally differ between the two alignment types, and those statistics are used to determine the sort order of the search results.
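To make the effect on rank concrete, the sketch below computes an RMSD from paired coordinates and sorts two hits under each alignment type. It is only an illustration: it assumes the coordinates have already been superposed and paired by the alignment, and the hit names and RMSD values in the usage example are invented.

```python
from math import sqrt


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired, already superposed coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return sqrt(total / len(coords_a))


def sort_by_rmsd(rmsd_by_hit):
    """Return hit names ordered from lowest to highest RMSD."""
    return sorted(rmsd_by_hit, key=rmsd_by_hit.get)


if __name__ == "__main__":
    # Invented values: the same two hits under each alignment type.
    initial_alignment = {"hit A": 2.1, "hit B": 1.8}
    refined_alignment = {"hit A": 0.9, "hit B": 1.2}
    print(sort_by_rmsd(initial_alignment))
    print(sort_by_rmsd(refined_alignment))
```

Because the refined alignment superposes only the invariant substructure, its RMSD is computed over a different set of residue pairs than the initial alignment, so two hits can trade places between tabs as they do in this example.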

All matching molecules superposed (initial alignment)

When you click on the plus button beside any structure to see a detailed view/graphical display comparing the query structure to the matched structure, the "visualize 3D structure superposition" button will open the initial 3D alignment, with all matching molecules superposed. The VAST algorithm section of this document provides more details about the initial alignment as well as an example.

Invariant substructure superposed (refined alignment)

When you click on the plus button beside any structure to see a detailed view/graphical display comparing the query structure to the matched structure, the "visualize 3D structure superposition" button will open the refined 3D alignment, with the invariant substructure superposed. The VAST algorithm section of this document provides more details about the refined alignment as well as an example.

Filters to subset the search results

An option to Display Filters appears above the ranked list of similar structures. This option generates bar graphs that: (1) provide aggregate information about the structures; (2) categorize the retrieved structures by their degree of similarity to the query structure and their taxonomic distribution; and (3) enable you to subset the results.

By default, all similar structures are listed in the results display. Activate the check boxes beside one or more bar graphs to view only the subset of similar structures that meet the criteria you selected. (Note: Selecting all of the check boxes is the same as selecting none (default), and will display the complete set of results. Selecting a subset of the checkboxes will display the subset of results that meet the desired criteria.)
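The check-box behavior described above can be sketched as follows. This is only an illustration of the stated rule (selecting no boxes or all boxes yields the complete list), not the VAST+ implementation; the function name `apply_filter`, the dictionary-based hit records, and the `match_type` field are hypothetical.

```python
def apply_filter(hits, key, selected, all_options):
    """Return the hits whose hit[key] value is among the selected options.

    Selecting nothing, or selecting every option, returns the complete list.
    """
    selected = set(selected)
    if not selected or selected == set(all_options):
        return list(hits)
    return [hit for hit in hits if hit[key] in selected]


# Hypothetical hit records; "match_type" is an illustrative field name.
hits = [
    {"id": "hit_a", "match_type": "complete"},
    {"id": "hit_b", "match_type": "partial"},
    {"id": "hit_c", "match_type": "partial"},
]
options = ["complete", "partial"]
partial_only = apply_filter(hits, "match_type", ["partial"], options)
```

Here `partial_only` holds the two partial matches, while passing an empty selection or the full `options` list returns all three hits.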

Filter by number of matching molecules

Complete match to biological unit of query structure -- Each of the biopolymer molecules in the biological unit of the query structure has a match (i.e., is similar in 3D shape) to a biopolymer molecule in the subject. The illustrated examples of VAST+ results include complete matches, which are indicated by solid red circles.

Partial match to biological unit of query structure -- A subset of the biopolymer molecules, or an individual biopolymer molecule, from the biological unit of the query structure has a match (i.e., is similar in 3D shape) to a biopolymer molecule in the subject. The illustrated examples of VAST+ results include partial matches, which are indicated by half-filled red circles.
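The two categories above can be expressed as a small rule. The sketch below assumes that the only inputs needed are the number of biopolymer molecules in the query's biological unit and the number of them that have a match in the subject; the function name `classify_match` and its return values are hypothetical, and this is not the VAST+ code.

```python
def classify_match(query_biopolymers, matched_molecules):
    """Classify a hit as a 'complete' or 'partial' biological unit match.

    query_biopolymers: number of biopolymer molecules in the query's biological unit
    matched_molecules: number of those molecules with a match in the subject
    Returns None when nothing matches (such a structure would not be a hit).
    """
    if query_biopolymers <= 0:
        raise ValueError("the biological unit must contain at least one biopolymer")
    if not 0 <= matched_molecules <= query_biopolymers:
        raise ValueError("matched_molecules must be between 0 and query_biopolymers")
    if matched_molecules == 0:
        return None
    if matched_molecules == query_biopolymers:
        return "complete"
    return "partial"


# A four-protein biological unit in which all four proteins match:
tetramer_all = classify_match(4, 4)
# The same biological unit with only two matching proteins:
tetramer_two = classify_match(4, 2)
```

Because the count in the first argument covers every biopolymer in the biological unit, a query whose biological unit contains molecules that cannot be matched can only ever be classified as "partial" under this rule.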

Additional notes about matching molecules:

Biopolymers refer to protein, DNA, and RNA molecules.

At this time, VAST (and VAST+) operates only on protein molecules. Some biological units might contain other types of biopolymers as well, such as DNA or RNA molecules. VAST and VAST+ do not currently compute similarities between the 3D shapes of DNA molecules, or similarities between the 3D shapes of RNA molecules.

Example: 1TUP, "Tumor Suppressor P53 Complexed With DNA," has a pentameric biological unit composed of three proteins and two nucleotide sequences. VAST will find other structures that have similarly shaped proteins, but will not take the DNA molecules into consideration. Therefore, there will be no "complete match" structures in the VAST+ search results for 1TUP, as it is not currently possible to find a match for all five molecules in the biological unit. The top scoring hits, which match all three protein molecules in 1TUP, are nevertheless easy to find as they are displayed at the top of the "partial match" results, based on the default sort order.

Minimum size of protein: VAST does not act upon protein molecules that have fewer than three secondary structure elements (SSEs: alpha helices and/or beta sheets) because they can generate spurious hits. Such short proteins are therefore not considered in VAST+ calculations or results. For example, if a pentameric structure contains four large proteins, plus one small protein that doesn't meet the VAST size requirements, VAST will only operate on the four large proteins from the biological unit. Therefore, there will be no "complete match" structures in the VAST+ search results, and the top hits will be, at the most, structures that match all four long proteins.
Example: 2YPL, "Structural Features Underlying T-cell Receptor Sensitivity to Concealed MHC Class I Micropolymorphisms," contains four human proteins, plus one very small HIV protein. None of the structures in the VAST+ search results for 2YPL are listed as a complete hit, because the fifth protein in 2YPL's biological unit does not meet the minimum size requirements for VAST. So the best possible match that VAST (and VAST+) can find is to 4 out of 5 proteins in the query structure's biological unit, and the top hits will be structures that, at the most, match all four of the long proteins. Each of those hits will be listed as a "partial match."

Pairwise alignments between individual protein components of the query structure and subject are compared to each other for compatibility, and compatible/matching alignments are clustered into sets of alignments that together constitute a biological unit match. Pairwise alignments are compatible: (1) if they do not share the same macromolecules, i.e., an individual protein molecule from the query structure cannot be aligned to two or more protein molecules from the subject at the same time; and (2) if they generate similar instructions (spatial transformation matrices) for the superpositions of co-ordinate sets. A simple distance metric can be used to compare transformation matrices, and it lends itself to clustering alignment sets efficiently.
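The two compatibility conditions can be illustrated with a short sketch. Everything specific here is an assumption made for illustration: the `PairwiseAlignment` record, the Euclidean distance over rotation and translation entries, the `max_distance` threshold, and the greedy clustering loop. The text above does not say which distance metric or clustering procedure VAST+ actually uses.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PairwiseAlignment:
    query_chain: str      # protein molecule in the query structure
    subject_chain: str    # protein molecule in the subject
    rotation: tuple       # 3x3 rotation matrix, row-major, 9 floats
    translation: tuple    # translation vector, 3 floats


def transform_distance(a, b):
    """Illustrative metric: Euclidean distance over all 12 transformation entries."""
    rot = sum((x - y) ** 2 for x, y in zip(a.rotation, b.rotation))
    trans = sum((x - y) ** 2 for x, y in zip(a.translation, b.translation))
    return math.sqrt(rot + trans)


def compatible(a, b, max_distance=1.0):
    """Condition (1): no shared molecule. Condition (2): similar transformations."""
    shares_molecule = (a.query_chain == b.query_chain
                       or a.subject_chain == b.subject_chain)
    return not shares_molecule and transform_distance(a, b) <= max_distance


def cluster_alignments(alignments, max_distance=1.0):
    """Greedy clustering: add each alignment to the first cluster in which it
    is compatible with every member, otherwise start a new cluster."""
    clusters = []
    for aln in alignments:
        for cluster in clusters:
            if all(compatible(aln, member, max_distance) for member in cluster):
                cluster.append(aln)
                break
        else:
            clusters.append([aln])
    return clusters


IDENTITY = (1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0)
alignments = [
    PairwiseAlignment("A", "A", IDENTITY, (0.0, 0.0, 0.0)),
    PairwiseAlignment("B", "B", IDENTITY, (0.1, 0.0, 0.0)),
    PairwiseAlignment("A", "B", IDENTITY, (30.0, 0.0, 0.0)),
]
clusters = cluster_alignments(alignments)
```

In this toy input the first two alignments use different molecules and nearly identical transformations, so they form one candidate biological unit match; the third reuses query molecule A with a very different transformation and ends up in its own cluster.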

Filter by Taxonomy

Filter results by domain (superkingdom) -- The "Filter by Taxonomy" options enable you to view the subset of similar structures from the "Eukaryota," "Bacteria," or "Archaea" domains (superkingdoms), from "Viruses," or from "Other" groups (such as synthetic constructs). Activate the check box(es) beside the group(s) of interest to see only the desired subset(s) of similar structures.

Note that a taxonomy filter will retrieve structures in which all biomolecules in the structure are from the selected superkingdom. For example, the "Viruses" filter will retrieve structures that only contain molecules from viruses, and not molecules from any other superkingdom. Structures that are composed of synthetic constructs, or composed of molecules from two or more superkingdoms, are included in the "Others" taxonomy category. For example, PDB ID 4N9F would be classified with a taxonomy of "Others" because it includes molecules from Human immunodeficiency virus 1 and Homo sapiens, which fall into the superkingdoms of Viruses and Eukaryota, respectively.
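The rule for assigning a structure to a taxonomy category can be sketched as follows. The function name `taxonomy_category` and the idea of passing one superkingdom label per molecule are assumptions made for illustration; how VAST+ stores taxonomy internally is not described here.

```python
SUPERKINGDOM_FILTERS = ("Eukaryota", "Bacteria", "Archaea", "Viruses")


def taxonomy_category(molecule_superkingdoms):
    """Return the filter category for a structure, given one label per molecule.

    A structure falls under a superkingdom only if every molecule comes from
    that superkingdom; anything else (mixed sources, synthetic constructs)
    is grouped under "Others".
    """
    groups = set(molecule_superkingdoms)
    if len(groups) == 1:
        (only,) = groups
        if only in SUPERKINGDOM_FILTERS:
            return only
    return "Others"


# 4N9F contains molecules from a virus and from Homo sapiens:
category_4n9f = taxonomy_category(["Viruses", "Eukaryota"])
```

With this rule, `category_4n9f` is "Others", matching the 4N9F example above, while a structure whose molecules are all viral is categorized as "Viruses".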

Ranked list of hits

The VAST+ search results page lists all structures that were found to be similar in 3D shape to the query.

Default sort order: The similar structures are shown as a ranked list, based on the following scores:
(1) number of aligned proteins, (2) RMSD, (3) number of aligned residues, and (4) sequence identity, in that order.
It is also possible to sort by taxonomy, if desired.

The blue arrow that appears to the right of the column header indicates the primary sort criterion, and the direction (ascending/descending) of the sort.

If desired, you can override the default sort order by clicking any arrow beside a column header in the output table in order to choose that column as the primary sort criterion. In that case, your selected criterion is applied first, and the remaining sort criteria applied afterward (in their usual order). The arrow beside the selected column header will turn blue, indicating it is now the primary sort criterion, and indicating the direction of the sort (ascending or descending).
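The default ranking and the effect of choosing a different primary sort criterion can be sketched with a multi-key sort. The field names and hit records below are hypothetical, and the sort directions (more aligned proteins, lower RMSD, more aligned residues, and higher sequence identity first) are an assumption based on ranking by greater similarity; the text above does not state them explicitly.

```python
DEFAULT_ORDER = ["aligned_proteins", "rmsd", "aligned_residues", "sequence_identity"]

# Assumed default directions: True means larger values rank first.
DESCENDING_BY_DEFAULT = {
    "aligned_proteins": True,
    "rmsd": False,
    "aligned_residues": True,
    "sequence_identity": True,
}


def sort_hits(hits, primary=None, descending=None):
    """Sort hits by the default criteria, optionally moving one criterion first.

    primary: name of the column chosen as the primary sort criterion
    descending: optional direction override for the primary criterion
    """
    order = list(DEFAULT_ORDER)
    directions = dict(DESCENDING_BY_DEFAULT)
    if primary is not None:
        order.remove(primary)
        order.insert(0, primary)   # remaining criteria keep their usual order
        if descending is not None:
            directions[primary] = descending

    def key(hit):
        # Negate numeric values that should sort from largest to smallest.
        return tuple(-hit[name] if directions[name] else hit[name] for name in order)

    return sorted(hits, key=key)


# Hypothetical hits with made-up scores.
hits = [
    {"id": "hit_a", "aligned_proteins": 4, "rmsd": 2.0, "aligned_residues": 500, "sequence_identity": 40},
    {"id": "hit_b", "aligned_proteins": 4, "rmsd": 1.0, "aligned_residues": 480, "sequence_identity": 35},
    {"id": "hit_c", "aligned_proteins": 2, "rmsd": 0.5, "aligned_residues": 250, "sequence_identity": 90},
]
default_ranking = [h["id"] for h in sort_hits(hits)]
by_rmsd = [h["id"] for h in sort_hits(hits, primary="rmsd")]
```

In the default ranking, `hit_b` precedes `hit_a` because both align four proteins and `hit_b` has the lower RMSD; sorting with RMSD as the primary criterion moves `hit_c` to the top.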

Sort Options (Scores)

Aligned proteins -- The number of protein molecules in the query structure that have a match (i.e., similar 3D shape) to a protein molecule in the subject.

If all of the molecules in the query structure match molecules in the subject, then a solid red circle icon indicates there is a complete match to the biological unit of the query structure. (illustrated examples of VAST+ results)

If a subset of the molecules, or an individual molecule, from the query structure has matching molecules in the subject, then a half-filled red circle icon indicates there is a partial match to the biological unit of the query structure. (illustrated examples of VAST+ results)

RMSD -- The average root mean square deviation (RMSD), in Angstroms (Å), of all aligned residues between the query structure and the subject.

This number is calculated after optimal superposition of two structures, as the square root of the mean square distances between equivalent C-alpha atoms. Smaller RMSD values indicate greater structural similarity. (Note that the RMSD value scales with the extent of the structural alignments and that this size must be taken into consideration when using RMSD as a descriptor of overall structural similarity.)
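As a minimal sketch of the calculation described above, the function below computes the RMSD for two lists of equivalent C-alpha coordinates. It assumes the structures have already been superposed and that the coordinate lists are paired in alignment order; finding the optimal superposition itself is not shown, and the function name `rmsd` is illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between paired (x, y, z) coordinates, in the
    same units as the input (Angstroms for PDB-style coordinates)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Two aligned residue pairs, each 5 Angstroms apart (a 3-4-5 triangle).
query_ca = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
subject_ca = [(3.0, 4.0, 0.0), (4.0, 5.0, 1.0)]
example_rmsd = rmsd(query_ca, subject_ca)
```

Because the mean is taken over all aligned residue pairs, a handful of poorly fitting residues has less influence in a long alignment than in a short one, which is one reason the alignment length should be considered alongside the RMSD.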

The RMSD values displayed in VAST+ results might be much higher than those displayed in Original VAST results because the 3D alignment is optimized for the whole assembly rather than for individual protein molecules. Specifically, VAST+ takes all of the individual 3D domain alignments and calculates a global superposition. To do this, individual protein molecules might be shifted from their ideal pairwise superposition, in order to be able to generate a global superposition of all matched molecules between the query structure and the subject.

For example, Original VAST might give RMSD values of about 3-4 Å at the worst, and rarely up to 6 Å for very large structures. In contrast, VAST+ results might generate more hits with RMSD values of 5-6 Å or more, and a number of cases with large values such as 20 Å or more. This can be due to flexibility in the biological unit, which might indicate something interesting. An example is the chaperone protein 3J1B: Cryo-em Structure of 8-fold Symmetric Ratcpn-alpha in APO State. The VAST+ neighbors for 3J1B include 3J03: Lidless Mm-cpn in the Closed State With Atpalfx, with an RMSD value of 13.41 Å.

Aligned residues -- The total number of amino acids, from all protein molecules in the query structure's biological unit, that are spatially aligned to the subject.

Sequence identity -- The percent of aligned residues that are identical, averaged over all of the aligned protein molecules (rounded to the nearest whole number on the VAST+ results display).

Taxonomy -- Click on the up/down arrows in the taxonomy header to sort hits alphabetically by genus name.
If you choose to sort by taxonomy, the similar structures are sorted based on the following criteria:
(1) alphabetically by genus, then within a genus, the structures are sorted based on (2) number of aligned proteins, (3) RMSD, (4) number of aligned residues, and (5) sequence identity, in that order.
(Note: If you would like to only see the subset of search results from a specific superkingdom, use the taxonomy filter to select the desired superkingdom, then click on the arrows in the "Taxonomy" column header of the search results table to sort the structures from the superkingdom alphabetically by genus.)

View details for a similar structure

Click on the plus button beside any structure to see a detailed view comparing the query structure to the matched structure. A separate section of this help document provides a description of the detailed view.

Search within results

The "Search within results" text box enables you to enter a PDB ID, MMDB ID, or text word from the title of a structure of interest to find out whether it is among the similar structures retrieved by VAST. If it is, you are taken directly to the page of results on which it appears, with your structure of interest highlighted in yellow.

If you are viewing only a subset of the search results (for example, because you have checked a box beside a bar graph of interest to filter the results), and your structure of interest is not in that subset but is in the overall search results, VAST will display a relevant message that includes a link which opens the complete set of search results, automatically highlighting your structure of interest.

If, on the other hand, you enter a PDB ID of a structure that is not listed among the VAST results, you will see an error message indicating that your structure of interest was not among the results.

Example: Retrieve the VAST results for 1HBB: "High-resolution X-ray Study of Deoxyhemoglobin Rothschild 37beta Trp-> Arg: a Mutation That Creates an Intersubunit Chloride-binding Site (human hemoglobin)." Let's say that you suspect 1UMO ("The Crystal Structure Of Cytoglobin: The Fourth Globin Type Discovered In Man") will be one of the structures retrieved, and you want to quickly find where that structure falls within the ranked list of similar structures. To do that, simply enter the PDB ID 1UMO (or its corresponding MMDB ID 66087) in the "search within results" text box, and you will be taken immediately to the appropriate page of search results, with 1UMO highlighted in yellow. You can then click on the plus button to the left of that structure in order to see more details about the comparison between 1HBB and 1UMO. If you enter the PDB ID or MMDB ID of a structure that is not similar (as determined by VAST+) to the query structure, a pop-up message will appear that says, "XXX is not a simlar structure of 1HBB."

"Original VAST" button (on the search results page)

An "Original VAST" button is located near the top of a VAST+ search results page. It gives you the option to view your search results in the original style VAST display, which lists structures that have similarities to individual protein molecules or individual 3D domains in your query structure, and allows you to view their sequence alignments and 3D superpositions. An illustrated example of the original-style VAST results is provided in the Graphical Displays section of this document.

(In contrast, the new VAST+ search results rank similar structures based on their biological unit similarity to the query structure (illustrated examples of VAST+ results), and provide the ability to view sequence alignments and superpositions of the biological units.)

A separate section of this document describes the difference between Original VAST and VAST+.

"No matched structure found" (null result)

VAST+ results are available for most, but not all, structures. If your query generates the message, "No matched structure found," it can be for one of the reasons listed in the table below. If your query does not have VAST+ neighbors, but does have Original VAST neighbors, follow the link provided in the error message to view the available results. If your query structure is relatively new, try your search again at a later date, as VAST (and VAST+) neighbors are calculated/updated weekly.

Circumstances under which a structure has neither VAST+ nor Original VAST results:

VAST neighbors not yet calculated VAST neighbors are currently being calculated for your query structure and are therefore not yet available. Try your search again in several days, as VAST (and VAST+) neighbors are calculated/updated on a weekly basis.

structure is too small In order to be processed by VAST, a protein molecule must have at least three secondary structure elements (alpha helices and/or beta sheets). If an individual protein in a structure contains only one or two secondary structure elements, that protein will not be processed by VAST. Structures that are composed entirely of such proteins will therefore have no VAST neighbors.

structure has no protein Currently, VAST works only on protein molecules. If a structure does not contain any proteins, for example, if it contains only DNA or RNA molecules, it will not be processed by VAST, and will therefore have no VAST neighbors.

structure is unique The structure is large enough to be processed by VAST+ (i.e., has at least three secondary structure elements), but has a unique 3D shape which has not been found among any of the other publicly available structures.

Circumstances under which a structure has no VAST+ results, but does have Original VAST results:

VAST+ neighbors not yet calculated VAST+ neighbors are currently being calculated for your query structure and are therefore not yet available. As noted in the overview section of this document, VAST+ is an extension of, and built upon, the original VAST system. Therefore, there may be some lag between the availability of original VAST data and the subsequent appearance of VAST+ data in the system. Try your search again in several days, as VAST (and VAST+) neighbors are calculated/updated on a weekly basis.

structure has no biological unit defined If a structure record does not have a biological unit defined in its source PDB file, it is only available as an asymmetric unit (i.e., raw structure data). In that case, the structure is not processed by VAST+, since VAST+ focuses on finding similarities among biological units. However, such structures are processed by the Original VAST algorithm, which may have found structures that are geometrically similar to individual proteins, or individual 3D domains, in your query structure.

merged structure Merged structures, which have been assembled from PDB split files, do not have a biological unit defined, and are therefore not processed by VAST+. However, such structures are processed by the original VAST algorithm, which may have found structures that are geometrically similar to individual proteins in your query structure. In that case, view the original VAST results for your query structure.

structure is too large Structures that have biological units composed of 60 or more protein molecules are not currently processed by VAST+. However, such structures are processed by the original VAST algorithm, which may have found structures that are geometrically similar to individual proteins in your query structure. In that case, view the original VAST results for your query structure.

structure has no full length protein neighbor The structure has been neighbored by VAST, but did not generate VAST results for any full-length protein molecule in the query, so the structure has not been considered for VAST+.

VAST+ results only list structures that have a complete match or partial match to your query structure's biological unit, or have a match to at least one full protein molecule in the query structure.

Even if no full length protein matches are found, the query structure might still have hits in Original VAST, because it is still possible that a portion of a protein in the query structure, or an individual 3D domain, has a match to another structure. In that case, view the original VAST results for your query structure.
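The structural conditions listed above can be collected into a simple checklist. The sketch below is an assumption-laden illustration rather than the NCBI pipeline: the function name `explain_missing_results` and its inputs (one secondary structure element count per protein in the biological unit, plus two flags) are hypothetical, and the order in which the conditions are checked is a choice made here.

```python
MIN_SSE = 3                  # a protein needs at least three secondary structure elements
MAX_BIOUNIT_PROTEINS = 60    # biological units with 60 or more proteins are not processed by VAST+


def explain_missing_results(sse_counts, has_biological_unit=True, is_merged=False):
    """Return a reason why a structure would lack VAST+ results, or None.

    sse_counts: one secondary structure element count per protein molecule
    A return value of None only means no structural rule applies; results may
    still be missing because neighbors are not yet calculated, the structure
    is unique, or no full-length protein has a neighbor.
    """
    if not sse_counts:
        return "structure has no protein"
    if all(count < MIN_SSE for count in sse_counts):
        return "structure is too small"
    if is_merged:
        return "merged structure"
    if not has_biological_unit:
        return "structure has no biological unit defined"
    if len(sse_counts) >= MAX_BIOUNIT_PROTEINS:
        return "structure is too large"
    return None


reason_dna_only = explain_missing_results([])
reason_peptides = explain_missing_results([1, 2])
reason_large = explain_missing_results([10] * 60)
```

The first two reasons correspond to structures with neither VAST+ nor Original VAST results; the remaining three correspond to structures that may still have Original VAST results.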

"New Search" button (on the search results page)

To retrieve the VAST+ results for a different structure, simply enter its PDB ID or MMDB ID in the text box in the upper right hand corner of a VAST+ results page and press the "New Search" button. Or, enter either ID on the VAST+ query page.

Detailed Views/Graphical Displays for Individual Hits

detailed view comparing the query structure and a matched structure | sequence alignment views | 3D views | original style VAST display

Detailed View Comparing the Query Structure and a Matched Structure

The VAST+ search results page provides a concise list of the structures that are similar in 3D shape to your query structure, listing the PDB ID and description of each one, along with statistical information for the match. Click on the plus button beside any structure to see a detailed view/graphical display comparing the query structure to the matched structure.

Each of the illustrated examples of VAST+ search results features a detailed view for one of the matched structures:
1B26 (Glutamate Dehydrogenase, Thermotoga maritima) vs. 1GTM (Glutamate Dehydrogenase, Pyrococcus furiosus)
1HBB (Deoxyhemoglobin, Homo sapiens) vs. 1FSL (Leghemoglobin, Glycine max).

The detailed view includes:

Aligned Molecules

The detailed view of a VAST+ hit shows a table of "Aligned Molecules," which provides a graphical overview of the extent of the 3D similarity between the query structure and the matched structure (i.e., the VAST+ "neighbor"). It includes a clickable interaction schematic for each structure, and a clickable list of matching proteins, as described below.

The MMDB ID for each structure links to the corresponding MMDB summary page, which provides additional details about the structure's molecular components (proteins, nucleotides, chemicals), links to associated publications, and more. The summary page also provides a "View structure" link that opens an interactive 3D view of the individual structure. To view a 3D superposition of the query structure and the matched structure, click the "Visualize 3D structure superposition with Cn3D" button beneath the lists of Aligned Molecules. In either case, NCBI's free Cn3D program (version 4.3.1 or later) must be installed on your computer and configured as a helper application for your browser in order for the "View structure" or "Visualize 3D structure superposition with Cn3D" buttons to open the 3D view.

Interaction schematic of query structure vs. matched structure

The interaction schematics for both the query structure and the matched structure (VAST+ neighbor) are displayed side by side. Protein molecules that have a match are designated by a broken (dashed) outline. Protein molecules that do not have a match are shown with a solid outline. Click on the icon of any protein molecule in either one of the interaction schematics to see the protein's name and to highlight the corresponding molecule in the other structure. When you click on an icon, its broken (dashed) outline will change from red to black, and its protein name will be highlighted in yellow. The matching protein in the other structure will be highlighted in the same way.

The molecule that is highlighted in the query structure's interaction schematic and protein list is referred to as the selected protein, and is the focal point of the 3D and sequence alignment displays. For example, if you select a protein from the query structure and then press the button for "Visualize 3D structure superposition (with Cn3D app)" or "View aligned sequences," the 3D view and sequence alignment windows will be centered on the selected protein. You can then rotate the 3D view and scroll through the sequence alignment window to see other proteins, if desired, but the initial view will be centered on the selected protein.

List of matching protein molecules

Beside each interaction schematic is a list of the proteins that are spatially aligned to a protein in the matching structure. Click on the name of any protein to highlight that protein in both the list and in the interaction schematic. This action will also automatically highlight the name and icon of the matching protein in the other structure.

Regardless of how many proteins are present in the biological unit of the query structure or subject, the list of protein names shows only the subset of proteins that have 3D similarity to each other. (You can still see the names of other proteins in the query or subject, however, by clicking on the icons in the interaction schematics.)

Sequence Alignment Views

VAST does two things for each pair of similar proteins: it calculates an optimal 3D superposition for the conserved core, and constructs a sequence alignment based on the correlation of the 3D structures.

The detailed view of a query vs. subject structure comparison (which opens when you click on the plus button beside any hit) allows you to view the pairwise sequence alignments of matching protein pairs. To do this, click on the "View aligned sequences" button that appears beneath the list of matching protein molecules. A separate window will open, showing the pairwise sequence alignments for all matching protein molecules. If you selected a protein from the query structure's interaction schematic or protein list before pressing the "View aligned sequences" button, the sequence alignment display will automatically scroll to the selected protein.

The sequence alignments shown in VAST use the following conventions:

UPPER CASE LETTERS represent amino acids that are aligned in 3D space,

LOWER CASE LETTERS represent unaligned amino acids, and

RED LETTERS represent identical amino acids.

Because the protein alignments produced by VAST do not require sequence similarity (but instead simply require a common spatial position of amino acids), these views do not necessarily represent alignments of evolutionarily conserved sequence blocks. However, the 3D alignments often coincide with evolutionarily conserved sequence blocks.
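The letter-case conventions above can be read programmatically. The sketch below assumes the alignment is available as two plain-text rows of equal length, paired column by column, with "-" for gaps; that input format and the function name `alignment_stats` are assumptions for illustration. Since color is not available in plain text, identical residues are recomputed by comparing letters in aligned columns.

```python
def alignment_stats(query_row, subject_row):
    """Count aligned and identical residue pairs in a case-coded alignment.

    Upper-case letters mark residues aligned in 3D space; lower-case letters
    mark unaligned residues. Returns (aligned, identical, percent_identity).
    """
    if len(query_row) != len(subject_row):
        raise ValueError("alignment rows must have the same length")
    aligned = identical = 0
    for q, s in zip(query_row, subject_row):
        if q.isupper() and s.isupper():      # both residues are in the 3D alignment
            aligned += 1
            if q == s:
                identical += 1
    percent = 100.0 * identical / aligned if aligned else 0.0
    return aligned, identical, percent


# Made-up rows: six aligned columns, three of them identical.
stats = alignment_stats("mkVLSPADktn", "--VLSEGEwql")
```

For a multi-protein hit, per-pair percentages computed this way could then be averaged over all aligned protein pairs to give a single sequence identity value of the kind reported in the results table.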

3D Views

Superposition of query and similar structure in 3D

The detailed view of a query vs. subject structure comparison (which opens when you click on the plus button beside any hit) provides a "Visualize 3D structure superposition" button at the bottom of the table of aligned molecules. That button opens a menu that allows you to choose which program to use for interactively viewing the superposition:

iCn3D (Web) - NCBI's WebGL-based viewer that provides interactive views of macromolecular structures and chemicals without the need to install a separate application.

Note: To use iCn3D, you just need to use a web browser that supports WebGL. If your browser doesn't support WebGL, you might need to modify the settings in the browser to enable WebGL, or update your web browser to a newer version that supports WebGL. (See the WebGL site for more information about compatibility with various web browsers.)

iCn3D's 3D view window will show the complete biological unit of the query structure and its VAST+ neighbor.

Click on iCn3D's "View Sequence" button, if desired, to open a sequence view window that shows all pairs of aligned sequences from the query structure and its VAST+ neighbor.

Note that iCn3D can show two different types of 3D superpositions: (1) initial alignment, in which all matching molecules are superposed, and (2) refined alignment, in which an invariant substructure is superposed.

Cn3D (App) - a helper application for your web browser, available for Windows and Macintosh, which simultaneously displays structure, sequence, and alignment, and has powerful annotation and alignment editing features.

Note: In order for this selection to work, you must have Cn3D 4.3.1 installed on your computer.

Cn3D's 3D view window will show the complete biological unit of the query structure and its VAST+ neighbor. The 3D view will be centered on the selected protein (i.e., on the protein that was highlighted in the query structure's interaction schematic before you pressed the "Visualize 3D structure superposition" button).

Cn3D's "Sequence/Alignment Viewer" window will open automatically, showing the selected protein that you chose in the query structure's interaction schematic (or showing the first pair of aligned sequences from the query structure and its VAST+ neighbor, if you didn't click on any protein in the interaction schematic).
To view a different pair of aligned sequences, you can either:

go back to the VAST+ search results page and click on the protein of interest in the query structure's interaction schematic (and then follow the options again to "Visualize 3D structure superposition > Cn3D (app)"), or

choose the "Select > Pick Structures" menu option in Cn3D and then use CTRL+Click in the list of proteins to select the new protein you'd like to see in the "Sequence/Alignment Viewer" window.

The folder tabs on the VAST+ search results page affect which type of alignment will be shown in the 3D view.
For example, when you click on the "Visualize 3D structure superposition" button, you will either see:

all matching molecules superposed (initial alignment)
OR

invariant substructure superposed (refined alignment),

depending on which folder tab you have selected.

The VAST algorithm section of this document provides more details about the initial alignment and refined alignment, as well as an example of each.

Original style VAST display

The original style VAST display:

lists each protein molecule and 3D domain in the asymmetric unit of the query structure, and

retrieves structures that are similar in shape to any individual protein molecule or 3D domain, and enables you to view their sequence alignments and 3D superpositions.

An example of the original style VAST display is illustrated below.

Part 1 of the illustration shows colored bars that represent the compact substructures, or 3D domains, detected by VAST in the query structure's protein molecule(s). These 3D domains serve as the fundamental unit of structure comparison. (The data processing:geometrical features section of the MMDB help document provides more information about how the 3D domains and similar structures are identified.) To view the 3D domains that have been identified in a protein molecule, open the MMDB structure summary page for a structure of interest, scroll down to the table of molecules and interactions, and click on the "show annotation" link for the protein of interest. (For example, open the structure summary page for 1PTH.)

Part 2 of the illustration shows the original-style VAST display, which consists of a table summarizing the protein molecules and 3D domains in the query structure, and the number of structures that are geometrically similar to each individual protein molecule or 3D domain in your query structure. For any protein molecule or 3D domain of interest, click on the link in the "# of Related Structures" column to view a list of similar structures. The list includes a graphical display of alignment footprints and provides options to view sequence alignments and superposed structures.

To explore the example interactively, you can open a live web page with original-style VAST results for 1PTH, or click on the image below to open an interactive view of the 3D alignment of 1PTH's protein A, domain 1 and a sample similar structure, 1EQG (Ovine Cox-1 Complexed With Ibuprofen). (Please note that Cn3D 4.3.1 must be installed on your computer in order for the file to open. The Cn3D Tutorial provides additional details about viewing structure alignments in Cn3D.)

The original style VAST display is still accessible by clicking the "Original VAST" button near the top of a VAST+ search results display. The features, functions, and graphics of the original style VAST display are described in the original VAST help document.

Example - Original style VAST display (as of 07 November 2013) for 1PTH, "The Structural Basis of Aspirin Activity Inferred From the Crystal Structure of Inactivated Prostaglandin H2 Synthase" (sheep prostaglandin H2 synthase)

Open a live web page with original-style VAST results for 1PTH.

In contrast to the original VAST display shown above, which focuses on similarities between individual protein molecules or 3D domains, the newer VAST+ display groups 3D-similar structures based on their degree of similarity (complete or partial) to the macromolecular complex (biological unit) of the query structure; ranks them by the number of protein molecules in the query that simultaneously match the 3D shape of protein molecules in the VAST neighbor; and enables you to view the sequence alignments and 3D superpositions of the biological units. The VAST+ help document provides illustrated examples of VAST+ results and additional details about the difference between VAST and VAST+.

VAST+ Algorithm

overview | illustration | additional details | 3D alignments (initial alignment (all matching molecules superposed), refined alignment (invariant substructure superposed)) | size limits (minimum, maximum)

Overview of method used to identify similarly shaped macromolecular complexes

VAST+ is a tool designed to compare 3-dimensional structures, with an emphasis on finding those with similar macromolecular complexes of more than one protein. The similarities are calculated using purely geometric criteria, without regard to sequence similarity, and therefore can identify distant homologs.

VAST+ is built upon the original Vector Alignment Search Tool (VAST). While original VAST finds 3D similarities between individual protein molecules or individual 3D domains, VAST+ expands the capabilities of that program by making it possible to find macromolecular structures that have similarly shaped biological units (also referred to as "biounits"), using the method illustrated below. The section after the illustration provides additional details about the various steps in the process. The similar structures found by VAST and VAST+ are often referred to as "neighbors."

Illustration: VAST+ algorithm



	NOTE: You can use iCn3D to open an interactive view of the initial alignment of 1HHO and 4N7N, and an interactive view of the refined alignment of 1HHO and 4N7N, which are the sample structures for human oxy- and deoxyhemoglobin that are featured in the illustration above.

Additional details about the method used to identify similarly shaped macromolecular complexes

The process of identifying similar macromolecular complexes includes the following steps:

MMDB Data Processing:

Identifies geometrical features within a structure:

secondary structures

3D domains

Identifies biological units within a structure

Original VAST:

Compares the shape of individual 3D domains and individual protein molecules throughout the Molecular Modeling Database. (Details about the original VAST algorithm are provided in the references noted at the end of this file and in the original VAST help document.)

VAST+:

Compares the shape of oligomers throughout the Molecular Modeling Database, using the method illustrated above.

Finds structures that have a complete match to the query structure's biological unit

Finds structures that have a partial match to the query structure's biological unit

Calculates scores, and uses these to rank and sort hits by:

number of aligned proteins

RMSD

number of aligned residues

percent sequence identity
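The match classification and ranking described above can be sketched as follows. This is a minimal illustration, not the VAST+ implementation: the `Hit` record, its field names, the rule that a "complete" match aligns every protein in the query biounit, and the tie-breaking order after the number of aligned proteins are all assumptions made for the example, and the sample values are invented.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One similar structure (hypothetical record; field names are illustrative)."""
    name: str
    aligned_proteins: int   # query biounit proteins matched by this hit
    rmsd: float             # Angstroms
    aligned_residues: int
    seq_identity: float     # percent


def match_type(hit: Hit, query_protein_count: int) -> str:
    """Label a hit as a complete or partial match to the query biounit.

    Assumption: 'complete' means every protein molecule in the query
    biounit is aligned; anything less is 'partial'.
    """
    return "complete" if hit.aligned_proteins >= query_protein_count else "partial"


def rank_hits(hits: list[Hit]) -> list[Hit]:
    """Sort hits: most aligned proteins first, then lowest RMSD,
    then most aligned residues, then highest sequence identity.

    Only the first criterion is stated in the text; the tie-breaking
    order is an assumption of this sketch.
    """
    return sorted(
        hits,
        key=lambda h: (-h.aligned_proteins, h.rmsd, -h.aligned_residues, -h.seq_identity),
    )


# Invented values, for illustration only.
example_hits = [
    Hit("hit_a", 2, 0.8, 280, 95.0),
    Hit("hit_b", 4, 1.5, 560, 40.0),
    Hit("hit_c", 4, 1.0, 570, 99.0),
]
for h in rank_hits(example_hits):
    print(h.name, match_type(h, query_protein_count=4))
```

In the results display, each of the four scores is also offered as a separate sort option, which corresponds to using a single-element sort key in place of the combined tuple.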

Generates a 3D alignment of biological units (macromolecular complexes), as shown in the 4th and 5th sections of the illustration above

Initial alignment (all matching molecules superposed):

The initial superposition of a query structure and its VAST+ neighbor uses the complete set of individually aligned macromolecules and corresponding matching amino acids to calculate a superposition of the complex structures.

The spatial distance between the structures is then calculated as the root-mean-square deviation (RMSD), which is a measure of the average distance between the atoms (equivalent C-alpha atoms) of the superposed proteins.

For example, when comparing the structures of human hemoglobin in the oxy and deoxy form (1HHO and 4N7N, respectively), a total of 572 amino acids are aligned, with an RMSD of 1.02 Angstroms, as shown in the 4th section of the illustration above.
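As a rough illustration of the RMSD calculation, the sketch below computes the root-mean-square deviation over paired C-alpha coordinates that have already been superposed. It assumes the superposition itself has been done elsewhere; the function name and the coordinate representation are choices made for this example only.

```python
import math

Point = tuple[float, float, float]


def rmsd(coords_a: list[Point], coords_b: list[Point]) -> float:
    """RMSD (same units as the input) between paired, already-superposed atoms.

    coords_a[i] and coords_b[i] are the equivalent C-alpha atoms of the
    i-th aligned residue pair.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Sum of squared distances between equivalent atoms.
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


# Two aligned residue pairs, each displaced by exactly 1 unit along z.
print(rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
           [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]))
```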

You can use iCn3D to open an interactive view of the initial alignment of 1HHO and 4N7N, which are the sample structures for human oxy- and deoxyhemoglobin that are featured in the illustration above.

Note: After opening the interactive view, notice that the iCn3D URL ends with an "atype" parameter, which enables you to specify the desired alignment type. A value of "atype=0" loads the initial alignment. The iCn3D help document includes additional details about the URL format that can be used to open structures directly in iCn3D, including a list of optional parameters, such as "atype" and more.

Refined alignment (invariant substructure superposed):

Using the RMSD from the initial alignment as a threshold, VAST+ identifies a subset of amino acids that have highly similar 3D positions in the query and subject complex structures, and uses it to create a refined alignment that allows identification and visualization of the most similar portion of the structures (the structurally invariant core of the assembly), as well as the differences between the structures.

To do this, it finds all amino acid pairs that have a distance equal to, or less than, the original RMSD, and then regenerates the superposition to create a more finely tuned alignment for that subset of amino acids.

As a result, the total number of aligned amino acids and the RMSD generally decrease compared to the initial alignment.
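The selection step of the refinement can be sketched as follows. This is a simplified illustration under stated assumptions: it keeps the residue pairs whose distance is at or below the initial RMSD and recomputes the RMSD over that subset, but it does not regenerate the superposition, which VAST+ does as a further step. The function names and sample coordinates are invented for the example.

```python
import math

Point = tuple[float, float, float]


def rmsd(coords_a: list[Point], coords_b: list[Point]) -> float:
    """RMSD between paired, already-superposed atoms."""
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def select_invariant_pairs(coords_a: list[Point], coords_b: list[Point],
                           threshold: float) -> list[int]:
    """Indices of aligned pairs whose distance is <= threshold."""
    return [i for i, (p, q) in enumerate(zip(coords_a, coords_b))
            if math.dist(p, q) <= threshold]


# Invented coordinates: two pairs nearly coincide, one pair is far apart.
query = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
subject = [(0.1, 0.0, 0.0), (5.2, 0.0, 0.0), (13.0, 0.0, 0.0)]

initial = rmsd(query, subject)                      # threshold for refinement
kept = select_invariant_pairs(query, subject, initial)
refined = rmsd([query[i] for i in kept], [subject[i] for i in kept])

# Fewer pairs are kept, and the RMSD over the kept pairs is lower.
print(len(query), initial)
print(len(kept), refined)
```

Because every kept pair has a distance no greater than the initial RMSD, the RMSD over the subset cannot exceed the initial value even before the superposition is regenerated.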

For example, the refined alignment of human hemoglobin in the oxy and deoxy form (1HHO and 4N7N, respectively), has a total of 221 amino acids aligned (compared to the initial 572), and an RMSD of 0.39 Angstroms (compared to the initial 1.02 Angstroms), as shown in the 5th (last) section of the illustration above.

The refined alignment shows that the alpha-beta dimers are the most structurally invariant part of the tetramer when the oxy and deoxy forms are compared. That is, the main conformational change upon oxygen binding (or unbinding) is a shift in the relative positions of the two alpha-beta dimers.

This can be readily visualized in iCn3D by opening an interactive view of the refined alignment of 1HHO and 4N7N and then toggling between the two structures by clicking the "Alternate Selection" button.

Note: After opening the interactive view, notice that the iCn3D URL ends with an "atype" parameter, which enables you to specify the desired alignment type. A value of "atype=1" loads the refined alignment. The iCn3D help document includes additional details about the URL format that can be used to open structures directly in iCn3D, including a list of optional parameters, such as "atype" and more.
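Switching between the two alignment types by editing the "atype" parameter can be sketched as below. Only the parameter name "atype" and its values 0 (initial) and 1 (refined) come from the notes above; the base URL and the "align" parameter in the example are placeholders, not the documented iCn3D URL format, so consult the iCn3D help document for the real one.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_alignment_type(url: str, atype: int) -> str:
    """Return url with its 'atype' query parameter set to the given value.

    atype=0 requests the initial alignment, atype=1 the refined alignment.
    Any existing 'atype' parameter is replaced; other parameters are kept.
    """
    if atype not in (0, 1):
        raise ValueError("atype must be 0 (initial) or 1 (refined)")
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "atype"]
    query.append(("atype", str(atype)))
    # safe="," keeps comma-separated IDs readable instead of percent-encoding them.
    return urlunsplit(parts._replace(query=urlencode(query, safe=",")))


# Placeholder URL: host, path, and the 'align' parameter are hypothetical.
initial_url = "https://example.org/viewer?align=1HHO,4N7N&atype=0"
print(with_alignment_type(initial_url, 1))
```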

In general, the subset of amino acids that gives the best alignment is not unique. For example, in the hemoglobin tetramer case above, one solution is (roughly) one of the alpha-beta dimers, and another is the other alpha-beta dimer.

In either case, the refined alignment allows identification and visualization of the most similar portion of the structures (the structurally invariant core of the assembly), as well as the differences between the structures.

Size limits

Minimum size of structures that are neighbored:

VAST, and therefore VAST+, only operates on protein molecules that contain three or more secondary structure elements (SSEs).

Maximum size of structures that are neighbored:

VAST+ does not work on MMDB merged structures, because the biological unit of the complete structure is not specified in a computer readable way in the source PDB files. (However, structures that are similar to the individual protein molecules within the merged structure have been calculated by original VAST. To access them, open the MMDB structure summary page for the merged structure of interest and click on the "Similar Structures: VAST+" button near the top of the page. The resulting page will display a note that "no VAST+ neighbors are available," along with a table showing Original VAST results.)

Log of changes to VAST+

27 July 2016 A new version of VAST+ was released. It provides a refined structure-based alignment of similar macromolecular complexes, and displays the 3D superpositions in the recently released iCn3D, a WebGL-based structure viewer. The initial alignment of similar macromolecular complexes, which became available with the first release of VAST+, uses the complete set of individually aligned macromolecules and corresponding matching amino acids to calculate a superposition of the complex structures. In contrast, the new release of VAST+ identifies a subset of amino acids that have highly similar 3D positions in the query and subject complex structures, and uses it to create a refined alignment that allows identification and visualization of the most similar portion of the structures (the structurally invariant core of the assembly), as well as the differences between the structures.

An illustration summarizes the general concepts in the VAST+ algorithm and includes an example of the initial alignment, as well as the refined alignment, for 1HHO and 4N7N (oxy- and deoxy- forms of human hemoglobin).

iCn3D provides an interactive view of the initial alignment, and an interactive view of the refined alignment, of 1HHO and 4N7N, which are the sample structures for human oxy- and deoxyhemoglobin featured in the illustration mentioned above.

01 November 2013 Initial release of VAST+, a tool that identifies macromolecules that have similar 3-dimensional structures, with an emphasis on finding similar macromolecular complexes. The similarities are calculated using purely geometric criteria, without regard to sequence similarity, and therefore can identify distant homologs. VAST+ is built upon the original Vector Alignment Search Tool (VAST), and expands the capabilities of that program by taking into account the biological unit ("biounit") of each structure, not just individual protein molecules or their substructures. A recent publication provides details, and this VAST+ help document includes a comparison of original VAST and VAST+, as well as examples of how VAST+ can be used to learn more about proteins. (Please note: in order to view the 3D superpositions of similar biological units, you must install the most recent version of the NCBI molecular viewing software, Cn3D 4.3.1.)

The VAST Search tool, which accepts input of a query structure's 3D coordinates and returns original-style VAST results, continues to be available.

References

Citing VAST:

Citing VAST+

Madej T, Lanczycki CJ, Zhang D, Thiessen PA, Geer RC, Marchler-Bauer A, Bryant SH. MMDB and VAST+: tracking structural similarities between macromolecular complexes. Nucleic Acids Res. (Epub 2013 Dec 6) doi: 10.1093/nar/gkt1208 [PubMed PMID: 24319143] [Full Text]

Citing Original VAST

Gibrat JF, Madej T, Bryant SH. Surprising similarities in structure comparison. Curr Opin Struct Biol. 1996 Jun; 6(3): 377-85. [PubMed PMID: 8804824]

Citing similarity scores

Panchenko AR, Madej T. Analysis of protein homology by assessing the (dis)similarity in protein loop regions. Proteins 2004 Nov 15; 57(3): 539-47. [PubMed PMID: 15382231] [Free Author Manuscript in PubMedCentral]

Citing homologous cores and loops

Madej T, Panchenko AR, Chen J, Bryant SH. Protein homologous cores and loops: important clues to evolutionary relationships between structurally similar proteins. BMC Struct Biol. 2007 Apr 10; 7(1): 23. [PubMed PMID: 17425794] [Full Text]

Additional References

A separate page lists all publications about NCBI's 3D Macromolecular Structures Resources, including those listed here plus articles by the NCBI Structure group describing the results of computational biology research on the Molecular Modeling Database.

Revised 17 November 2017