Identical Protein Groups FAQs

What is the difference between the Identical Protein Groups resource (IPG) and the Identical Protein Reports?

Identical Protein Reports present information about sets of protein accessions and nucleotide coding-region annotations whose protein sequences are 100% identical, residue for residue and in length. The IPG resource is the searchable collection of Identical Protein Reports, and IPG search results are Identical Protein Reports. Because identical proteins are grouped into a single report, investigators get a smaller set of search results, which makes it easier to identify a protein sequence of interest.
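The identity rule above (same residues, same length) amounts to exact string equality on the sequence. The following is a minimal sketch of that grouping, assuming each protein is available as an (accession, sequence) pair; the function name and the accessions are hypothetical and this is not NCBI's implementation.

```python
from collections import defaultdict


def group_identical_proteins(records):
    """Group protein accessions whose sequences are 100% identical.

    records: iterable of (accession, sequence) pairs (hypothetical input format).
    Returns a dict mapping each distinct sequence to its accessions.
    """
    groups = defaultdict(list)
    for accession, sequence in records:
        # Using the full sequence string as the key means two entries share a
        # group only if every residue matches AND the lengths match; a
        # truncated copy of a sequence gets its own group.
        groups[sequence].append(accession)
    return dict(groups)


# Hypothetical accessions and sequences, for illustration only.
records = [
    ("PROT_A", "MKTAYIAK"),
    ("PROT_B", "MKTAYIAK"),
    ("PROT_C", "MKTAYIA"),  # one residue shorter, so not identical
]
print(group_identical_proteins(records))
# {'MKTAYIAK': ['PROT_A', 'PROT_B'], 'MKTAYIA': ['PROT_C']}
```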

Why is a particular accession number included in the header of an IPG report?

The accession number in the header of an IPG report is that of the 'best' member of the group. The hierarchy for determining the 'best' member is: RefSeq > SwissProt > PIR, PDB > GenBank > Patent.

How is the product name for a particular IPG determined?

The product name displayed for an IPG is the product name of the 'best' member of that group. The hierarchy for determining the 'best' member is: RefSeq > SwissProt > PIR, PDB > GenBank > Patent.
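Both the header accession and the displayed product name come from the same 'best' member, so one ranking function covers both questions. This is a minimal sketch assuming each member is a dict with hypothetical keys `accession`, `source`, and `product`. It gives PIR and PDB the same rank because the hierarchy lists them together. How ties within one rank are broken is not described here, so the sketch simply keeps the first such member.

```python
# Source priority as listed above: RefSeq > SwissProt > PIR, PDB > GenBank > Patent.
# PIR and PDB are listed together, so this sketch gives them the same rank.
SOURCE_RANK = {"RefSeq": 0, "SwissProt": 1, "PIR": 2, "PDB": 2, "GenBank": 3, "Patent": 4}


def best_member(members):
    """Return the member whose source ranks highest (lowest rank number).

    members: list of dicts with hypothetical keys 'accession', 'source', 'product'.
    Tie-breaking within a rank is an assumption: min() keeps the first such
    member in input order.
    """
    return min(members, key=lambda m: SOURCE_RANK[m["source"]])


def ipg_header(members):
    """Return the (accession, product name) pair shown for the group."""
    best = best_member(members)
    return best["accession"], best["product"]


# Hypothetical group members, for illustration only.
members = [
    {"accession": "GB_EXAMPLE", "source": "GenBank", "product": "hypothetical protein"},
    {"accession": "SP_EXAMPLE", "source": "SwissProt", "product": "example product"},
    {"accession": "PDB_EXAMPLE", "source": "PDB", "product": "Chain A"},
]
print(ipg_header(members))  # ('SP_EXAMPLE', 'example product')
```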

How do I find proteins from a particular source, such as RefSeq?

The proteins in the Identical Protein Groups come from RefSeq, SwissProt, PIR, PDB, GenBank, and patents. You can filter for a particular source either in the search results or within a specific report.

(1) Search results: Once you have completed your search, use the “SourceDB” facet on the left-hand side of the search page to select only those IPGs that contain proteins from that source.

(2) Report page: Within a particular IPG report, you can select the proteins from a particular source by choosing that source in the column labeled “Source”.
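The two filters act at different levels: the facet keeps whole groups that contain at least one protein from a source, while the report column keeps individual rows. The sketch below models that difference, assuming a hypothetical in-memory structure (a list of groups, each holding rows with a `source` field). It illustrates the logic only and is not an NCBI API.

```python
def ipgs_with_source(ipgs, source):
    """Search-results view: keep IPGs with at least one protein from `source`."""
    return [ipg for ipg in ipgs if any(row["source"] == source for row in ipg["rows"])]


def rows_from_source(rows, source):
    """Report-page view: keep only the rows whose Source column equals `source`."""
    return [row for row in rows if row["source"] == source]


# Hypothetical groups and accessions, for illustration only.
ipgs = [
    {"id": "IPG_1", "rows": [{"protein": "RS_EXAMPLE", "source": "RefSeq"},
                             {"protein": "GB_EXAMPLE_1", "source": "GenBank"}]},
    {"id": "IPG_2", "rows": [{"protein": "GB_EXAMPLE_2", "source": "GenBank"}]},
]
print([ipg["id"] for ipg in ipgs_with_source(ipgs, "RefSeq")])  # ['IPG_1']
print([row["protein"] for row in rows_from_source(ipgs[0]["rows"], "GenBank")])  # ['GB_EXAMPLE_1']
```

Note that the facet keeps IPG_1 whole, GenBank row included; narrowing to one source's rows is the report-page step.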

Why can an IPG report contain more rows than coding regions?

For GenBank and RefSeq proteins, each protein exists as an annotation on a nucleotide record, so each protein accession has a corresponding coding region. However, the Identical Protein Groups resource also includes proteins that are not annotations; these protein-only accessions come from SwissProt, PDB, and patent (PAT) records, which do not have corresponding nucleotide coding regions. Each of these proteins still occupies a row in the report, so a report can have more rows than coding regions.
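A small count makes the row arithmetic concrete. This sketch assumes a hypothetical row type in which protein-only entries (SwissProt, PDB, patent) carry no nucleotide accession; the type, field names, and accessions are illustrative only.

```python
from typing import NamedTuple, Optional


class ReportRow(NamedTuple):
    source: str
    protein: str
    # Nucleotide record carrying the coding region; None for protein-only
    # entries (SwissProt, PDB, patent), which have no coding region.
    nucleotide: Optional[str]


def count_rows_and_coding_regions(rows):
    """Return (number of rows, number of rows backed by a coding region)."""
    with_cds = sum(1 for row in rows if row.nucleotide is not None)
    return len(rows), with_cds


# Hypothetical report rows, for illustration only.
rows = [
    ReportRow("GenBank", "GB_EXAMPLE", "NUC_EXAMPLE_1"),
    ReportRow("RefSeq", "RS_EXAMPLE", "NUC_EXAMPLE_2"),
    ReportRow("SwissProt", "SP_EXAMPLE", None),
    ReportRow("PDB", "PDB_EXAMPLE", None),
]
print(count_rows_and_coding_regions(rows))  # (4, 2)
```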

How can a single protein accession be included in multiple coding regions?

The non-redundant RefSeq proteins, whose accessions begin with WP_, are generally present in multiple nucleotide records, because each WP_ accession represents a unique protein translation across all the RefSeq prokaryotic genomes. See https://www.ncbi.nlm.nih.gov/refseq/about/nonredundantproteins/ for more information about this non-redundant protein set.

A single protein accession can also be included in multiple coding regions when the annotated nucleotide is a component of a CON record (a scaffold or chromosome). If the scaffold or chromosome is not itself annotated, the annotation from its components is propagated up for display on the CON record, and both the component and the scaffold coding regions are presented in the Identical Protein Report. For example, EEX54505.1 is annotated on ACZS01000003.1, which is a component of scaffold GG704829.1, so EEX54505.1 is also displayed on scaffold GG704829.1.
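Both cases above produce one protein accession with several coding-region locations. The sketch below models them with a hypothetical input format: direct annotations as (nucleotide, protein) pairs, plus a map from each unannotated CON record to its components. It reproduces the EEX54505.1 example; the WP_ case is simply the same protein annotated directly on more than one record.

```python
from collections import defaultdict


def coding_regions_by_protein(annotations, con_components):
    """Map each protein accession to every nucleotide record that displays it.

    annotations: (nucleotide_accession, protein_accession) pairs annotated
        directly on a record (hypothetical input format).
    con_components: maps an unannotated CON record (scaffold or chromosome)
        to the accessions of its components.
    """
    displayed = defaultdict(list)
    for nucleotide, protein in annotations:
        displayed[protein].append(nucleotide)
    # Annotation on a component is also displayed on the CON record built from it.
    for con, components in con_components.items():
        for nucleotide, protein in annotations:
            if nucleotide in components:
                displayed[protein].append(con)
    return dict(displayed)


# The CON example from this FAQ.
print(coding_regions_by_protein(
    [("ACZS01000003.1", "EEX54505.1")],
    {"GG704829.1": ["ACZS01000003.1"]},
))
# {'EEX54505.1': ['ACZS01000003.1', 'GG704829.1']}
```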

Why are some proteins present in multiple species?

For each group in the IPG resource, the lowest taxonomic node common to all members of the group is presented. Some proteins are found in a single species, but many are found in sequences from different species or genera; in that case, the BLAST name of the common taxonomic node is presented. BLAST names provide an abbreviated, vernacular view of the classification and are used for display purposes (for example, in BLAST and in the Taxonomic Groups trees in various resources) when a species name might not be generally recognizable. Examples of BLAST names include firmicutes, enterobacteria, ascomycetes, mammals, and primates.
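Finding the lowest node common to all members is a lowest-common-ancestor computation over root-to-leaf lineages. This is a minimal sketch assuming each member's lineage is available as a list of taxon names. The lineages shown are abbreviated for illustration, and the final lookup of the node's BLAST name is not modeled.

```python
def lowest_common_node(lineages):
    """Return the deepest taxon shared by every lineage, or None.

    Each lineage is a list of taxon names ordered from root to leaf.
    """
    common = None
    # zip(*lineages) walks all lineages level by level and stops at the
    # shortest one.
    for level in zip(*lineages):
        if all(name == level[0] for name in level):
            common = level[0]
        else:
            break
    return common


# Abbreviated lineages, simplified for illustration.
e_coli = ["Bacteria", "Enterobacterales", "Enterobacteriaceae", "Escherichia", "Escherichia coli"]
s_enterica = ["Bacteria", "Enterobacterales", "Enterobacteriaceae", "Salmonella", "Salmonella enterica"]
print(lowest_common_node([e_coli, s_enterica]))  # Enterobacteriaceae
```

A group from a single species returns that species itself, matching the single-species case described above.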

Why are some proteins present in multiple kingdoms?

Proteins that are present in multiple kingdoms are generally mobile elements or viral or phage proteins. In some cases it is expected to find such proteins in both the host and the virus or phage. In other cases the sequence is a commonly used bacterial selectable marker, such as chloramphenicol acetyltransferase, so it may be present in eukaryotic synthetic constructs. In still other cases, the presence of the same protein across kingdoms may indicate a sample mix-up, misclassification, or contamination.

How do I subset a large IPG by taxonomic group?

To limit the results within an IPG report, use the Taxonomic Groups tree in the upper right-hand corner of the page. The tree represents all of the proteins within that IPG. Clicking a particular branch of the tree limits the report to the proteins in that taxonomic lineage. To return to the full list, click the “Show all” link that appears at the top of the table.
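The tree and the branch click can be modeled as counting proteins per taxon and then filtering on one taxon. This sketch assumes each report row carries a hypothetical `lineage` list of taxon names; it illustrates the idea and does not reproduce how the page itself is built.

```python
from collections import Counter


def taxonomy_counts(rows):
    """Count proteins under each taxon, the numbers a taxonomy tree summarizes."""
    counts = Counter()
    for row in rows:
        # set() so a protein is counted at most once per taxon.
        counts.update(set(row["lineage"]))
    return counts


def subset_by_taxon(rows, taxon):
    """Keep only rows whose lineage passes through `taxon` (a clicked branch)."""
    return [row for row in rows if taxon in row["lineage"]]


# Hypothetical rows with abbreviated lineages, for illustration only.
rows = [
    {"protein": "PROT_A", "lineage": ["Bacteria", "Enterobacteriaceae", "Escherichia"]},
    {"protein": "PROT_B", "lineage": ["Bacteria", "Enterobacteriaceae", "Salmonella"]},
    {"protein": "PROT_C", "lineage": ["Eukaryota", "Fungi"]},
]
print(taxonomy_counts(rows)["Bacteria"])  # 2
print([row["protein"] for row in subset_by_taxon(rows, "Salmonella")])  # ['PROT_B']
# "Show all" corresponds to going back to the unfiltered `rows` list.
```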

Last updated: 2017-06-30T08:22:48-04:00