Conserved domains on  [gi|12321254|gb|AAG50698|]

copia-type polyprotein, putative [Arabidopsis thaliana]

List of domain hits

Name Accession Description Interval E-value
RVT_2 super family cl06662
Reverse transcriptase (RNA-dependent DNA polymerase); A reverse transcriptase gene is usually ...
832-1075 1.61e-97

Reverse transcriptase (RNA-dependent DNA polymerase); A reverse transcriptase gene is usually indicative of a mobile element such as a retrotransposon or retrovirus. Reverse transcriptases occur in a variety of mobile elements, including retrotransposons, retroviruses, group II introns, bacterial msDNAs, hepadnaviruses, and caulimoviruses. This Pfam entry includes reverse transcriptases not recognized by the pfam00078 model.


The actual alignment was detected with superfamily member pfam07727:

Pssm-ID: 400190  Cd Length: 243  Bit Score: 311.83  E-value: 1.61e-97
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 12321254    832 NDTWELTSLPNGHKAIGVKWVYKAKKNSKGEVeRYKARLVAKGYSQRAGIDYDEVFAPVARLETVRLIISLAAQNKWKIH 911
Cdd:pfam07727    1 NETWTLVKLPKNVKPIGTTWVHTHKINDLKEV-QYKARLVAQGFRQIAGEDYDKVFSPVIRLSSVRLLLAIAAEYEWPVH 79
                           90       100       110       120       130       140       150       160
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 12321254    912 QMDVKSAFLNGDLEEEVYIEQPQGYIVKGEEDKVLRLKKALYGLKQAPRAWNTRIDKYFKEKDFIKCPYEHALYIKIQKE 991
Cdd:pfam07727   80 HMDVSSAFLNGDIDEEIYVKQPPGFNIDNESGKVWQLNKSLYGLKQAPYMWNTCITKVLMDLNFEPDTAESGMYCRGFGE 159
                          170       180       190       200       210       220       230       240
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 12321254    992 DILIACLYVDDLIFTGNNPSMFEEFKKEMTKEFEMTDIGLMSYYLGIEVKQEDNGIFITQEGYAKEVLKKFKMDDSNPVC 1071
Cdd:pfam07727  160 NKLIVGLYVDDMFITGSDITIINDFKLELAKHFKMKDLGDISEFLGIEFIQIAGGIRLSQHNYLNSVIKKFNLTNNNGKY 239

                   ....
gi 12321254   1072 TPME 1075
Cdd:pfam07727  240 TPII 243
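Each alignment block pairs a query row (gi 12321254) with a domain-model row and prints the first and last residue number of each row. The sketch below walks one block column by column, recomputes those end coordinates and counts identical aligned columns. It assumes that '-' is a gap and that a lowercase letter is a residue left unaligned to the other row. That reading is inferred from this page (for example, the lowercase `e` above sits opposite a '-'), not taken from NCBI documentation. `alignment_stats` is a hypothetical helper name.

```python
def alignment_stats(query_row, subject_row, q_start, s_start):
    """Walk one alignment block and return (q_end, s_end, aligned, identities).

    Assumption: '-' is a gap; a lowercase letter is a residue that advances its
    row's coordinate but is not aligned to the other row, so it is not counted.
    """
    if len(query_row) != len(subject_row):
        raise ValueError("alignment rows must have the same length")
    q_pos, s_pos = q_start - 1, s_start - 1
    aligned = identities = 0
    for q, s in zip(query_row, subject_row):
        if q != "-":
            q_pos += 1
        if s != "-":
            s_pos += 1
        # Only columns with uppercase residues on both rows count as aligned.
        if q.isupper() and s.isupper():
            aligned += 1
            if q == s:
                identities += 1
    # After the loop, q_pos and s_pos hold the last residue number of each row.
    return q_pos, s_pos, aligned, identities


# First 80-column block of the RVT_2 / pfam07727 alignment shown above.
query = "NDTWELTSLPNGHKAIGVKWVYKAKKNSKGEVeRYKARLVAKGYSQRAGIDYDEVFAPVARLETVRLIISLAAQNKWKIH"
model = "NETWTLVKLPKNVKPIGTTWVHTHKINDLKEV-QYKARLVAQGFRQIAGEDYDKVFSPVIRLSSVRLLLAIAAEYEWPVH"
q_end, s_end, aligned, same = alignment_stats(query, model, 832, 1)
print(q_end, s_end, aligned, same, f"{100 * same / aligned:.1f}% identity")
```

Reproducing the printed end coordinates (911 for the query, 79 for the model) is a quick check that the gap and lowercase handling matches the page.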
RNase_HI_RT_Ty1 cd09272
Ty1/Copia family of RNase HI in long terminal repeat retroelements; Ribonuclease H (RNase H) ...
1159-1298 1.30e-81

Ty1/Copia family of RNase HI in long terminal repeat retroelements; Ribonuclease H (RNase H) enzymes are divided into two major families, Type 1 and Type 2, based on amino acid sequence similarities and biochemical properties. RNase H is an endonuclease that cleaves the RNA strand of an RNA/DNA hybrid in a sequence non-specific manner in the presence of divalent cations. RNase H is widely present in various organisms, including bacteria, archaea, and eukaryotes. RNase HI has also been observed as an adjunct domain to the reverse transcriptase gene in retroviruses and in long terminal repeat (LTR)-bearing and non-LTR retrotransposons. RNase HI in LTR retrotransposons degrades the original RNA template, generates a polypurine tract (the primer for plus-strand DNA synthesis), and finally removes the RNA primers from the newly synthesized minus and plus strands. The catalytic residues for RNase H enzymatic activity, three aspartic acid residues and one glutamic acid residue (DEDD), are invariant across all RNase H domains. Based on phylogenetic patterns, the RNase HI domains of LTR retroelements are classified into five major families: Ty3/Gypsy, Ty1/Copia, Bel/Pao, DIRS1, and the vertebrate retroviruses. The Ty1/Copia family is widely distributed among the genomes of plants, fungi, and animals. RNase H has been explored as an anti-HIV drug target, because inactivating RNase H inhibits reverse transcription.



Pssm-ID: 260004  Cd Length: 140  Bit Score: 263.56  E-value: 1.30e-81
                         10        20        30        40        50        60        70        80
                 ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 12321254 1159 VGYSDSDWGGDVDDRKSTSGFVFYIGDTAFTWMSKKQPIVTLSTCEAEYVAATSCVCHAIWLRNLLKELSLPQEEPTKIF 1238
Cdd:cd09272    1 EGYSDADWAGDPDDRRSTSGYVFFLGGGPISWKSKKQTTVALSSTEAEYIALAEAAKEALWLRRLLEELGIPLDGPTTIY 80
                         90       100       110       120       130       140
                 ....*....|....*....|....*....|....*....|....*....|....*....|
gi 12321254 1239 VDNKSAIALAKNPVFHDRSKHIDTRYHYIRECVSKKDVQLEYVKTHDQVADIFTKPLKRE 1298
Cdd:cd09272   81 CDNQSAIALAKNPVFHSRTKHIDIRYHFIREKVEKGEIKVEYVPTEDQLADILTKPLPRP 140
Retrotran_gag_2 super family cl26047
gag-polypeptide of LTR copia-type; This family is found in plants and fungi, and contains ...
72-196 3.09e-20

gag-polypeptide of LTR copia-type; This family is found in plants and fungi and contains the polyproteins of copia-type LTR retrotransposons.


The actual alignment was detected with superfamily member pfam14223:

Pssm-ID: 464108  Cd Length: 130  Bit Score: 88.06  E-value: 3.09e-20
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 12321254     72 LIYQGLDEDTFEKVVEATSAKEAWEKLRTSYKGADQVKKVrlqTLRGEFEALQMKEGELVSDYFSRVLTVTNNLKRNGEK 151
Cdd:pfam14223    3 LIVLSLSDSLLRLVRNADTAKEAWDKLESTYERKSPANKL---TLRRQLHSLKMKEGESVLEHINKFEELVNKLSALGVE 79
                           90       100       110       120
                   ....*....|....*....|....*....|....*....|....*
gi 12321254    152 LDDVRIMEKVLRSLDLKFEHIVTVIEETKDLeaMTIEQLLGSLQA 196
Cdd:pfam14223   80 ISDEDLVVKLLRSLPESYENFVTAIESSSDK--ITLEELISKLLD 122
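The summary line above each alignment packs four values: Pssm-ID, Cd Length (the length of the domain model), Bit Score and E-value. Here the model row runs from position 3 to 122 of a 130-position model, so the alignment does not span the whole domain. The sketch below parses that line and computes the fraction of the model covered. The regular expression is inferred from the lines on this page, including the optional "[Multi-domain]" tag, and is not an NCBI-specified format. `parse_pssm_line` and `model_coverage` are hypothetical names.

```python
import re

# Layout inferred from this page; the "[Multi-domain]" tag is optional.
PSSM_LINE = re.compile(
    r"Pssm-ID:\s*(?P<pssm_id>\d+)\s*(?:\[Multi-domain\])?\s*"
    r"Cd Length:\s*(?P<cd_length>\d+)\s+"
    r"Bit Score:\s*(?P<bit_score>[\d.]+)\s+"
    r"E-value:\s*(?P<evalue>[\d.]+(?:[eE][-+]?\d+)?)"
)


def parse_pssm_line(line):
    """Parse a 'Pssm-ID: ... E-value: ...' summary line into typed fields."""
    m = PSSM_LINE.search(line)
    if m is None:
        raise ValueError(f"not a Pssm-ID summary line: {line!r}")
    return {
        "pssm_id": int(m["pssm_id"]),
        "cd_length": int(m["cd_length"]),
        "bit_score": float(m["bit_score"]),
        "evalue": float(m["evalue"]),
    }


def model_coverage(cd_length, model_from, model_to):
    """Fraction of the domain model spanned by the alignment (inclusive ends)."""
    return (model_to - model_from + 1) / cd_length


summary = parse_pssm_line("Pssm-ID: 464108  Cd Length: 130  Bit Score: 88.06  E-value: 3.09e-20")
print(summary)
# pfam14223 is aligned from model position 3 to 122 in the blocks above.
print(f"coverage: {model_coverage(summary['cd_length'], 3, 122):.1%}")
```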
transpos_IS481 super family cl41329
IS481 family transposase
523-684 1.92e-18

IS481 family transposase


The actual alignment was detected with superfamily member NF033577:

Pssm-ID: 468094 [Multi-domain]  Cd Length: 283  Bit Score: 87.26  E-value: 1.92e-18
                          10        20        30        40        50        60        70        80
                  ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 12321254   523 AQKPLELIHTDVC--GPIKPKslGKSnYFLLFIDDFSRktWVYFLKEKSEVFEIFKKFKAHVEKESGLVIKTMRSDRGGE 600
Cdd:NF033577  124 RAHPGELWHIDIKklGRIPDV--GRL-YLHTAIDDHSR--FAYAELYPDETAETAADFLRRAFAEHGIPIRRVLTDNGSE 198
                          90       100       110       120       130       140       150       160
                  ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 12321254   601 FTSK--EFLKYCEDNGIRRQLTVPRSPQQNGVAERKNRTILE--MARsmlkskRLPKELWAEAVACAVYLL---NRSPTK 673
Cdd:NF033577  199 FRSRahGFELALAELGIEHRRTRPYHPQTNGKVERFHRTLKDefAYA------RPYESLAELQAALDEWLHhynHHRPHS 272
                         170
                  ....*....|.
gi 12321254   674 SVSGKTPQEAW 684
Cdd:NF033577  273 ALGGKTPAERF 283
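This IS481 alignment has gaps and lowercase stretches in both rows, so model positions and query residue numbers drift apart. A common follow-up question is which query residue sits under a given model position. The sketch below answers it for one block. It assumes, as inferred from this page, that '-' is a gap and that lowercase letters are residues left unaligned to the other row; such positions return `None`. `model_to_query` is a hypothetical helper.

```python
def model_to_query(query_row, subject_row, q_start, s_start, model_pos):
    """Return the query residue number aligned to one model position, or None.

    Assumption: '-' is a gap and lowercase marks an unaligned residue. A model
    position counts as aligned only if both rows are uppercase in that column.
    """
    q_pos, s_pos = q_start - 1, s_start - 1
    for q, s in zip(query_row, subject_row):
        if q != "-":
            q_pos += 1
        if s != "-":
            s_pos += 1
            if s_pos == model_pos:
                return q_pos if (q.isupper() and s.isupper()) else None
    raise ValueError(f"model position {model_pos} is not in this block")


# First 80-column block of the transpos_IS481 / NF033577 alignment shown above.
query = "AQKPLELIHTDVC--GPIKPKslGKSnYFLLFIDDFSRktWVYFLKEKSEVFEIFKKFKAHVEKESGLVIKTMRSDRGGE"
model = "RAHPGELWHIDIKklGRIPDV--GRL-YLHTAIDDHSR--FAYAELYPDETAETAADFLRRAFAEHGIPIRRVLTDNGSE"
for pos in (134, 137, 139, 198):
    print(pos, model_to_query(query, model, 523, 124, pos))
```

Position 137 falls on the lowercase `k` opposite a query gap, so it maps to `None` rather than to a neighbouring residue.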
gag_pre-integrs pfam13976
GAG-pre-integrase domain; This domain is found associated with retroviral insertion elements ...
443-510 2.09e-16

GAG-pre-integrase domain; This domain is found associated with retroviral insertion elements and lies just upstream of the integrase region on the polyproteins.



Pssm-ID: 372857  Cd Length: 67  Bit Score: 74.71  E-value: 2.09e-16
                           10        20        30        40        50        60
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....
gi 12321254    443 RMFVLNIRNDIAQCLKMCYKE-ESWLWHLRFGHLNFGGLELLSRKEMVRGLPCINhpNQVCEGCLLGKQ 510
Cdd:pfam13976    1 GLYLLDLSSVANSSIAVASKDdETWLWHRRLGHPSFKGLKKLVKKGLLPGLPISK--DLVCESCQLGKQ 67
DUF4219 pfam13961
Domain of unknown function (DUF4219); This domain is very short and is found at the N-terminus ...
13-39 2.28e-08

Domain of unknown function (DUF4219); This domain is very short and is found at the N-terminus of many Gag-Pol polyproteins and related proteins. It contains a highly conserved YxxWxxxM sequence motif.



Pssm-ID: 433608  Cd Length: 27  Bit Score: 50.97  E-value: 2.28e-08
                           10        20
                   ....*....|....*....|....*..
gi 12321254     13 LTKSNYDNWSLRMKAILGAHDVWEIVE 39
Cdd:pfam13961    1 LDGDNYETWKLRMKLYLQAQDLWEVVE 27
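The DUF4219 description names a conserved YxxWxxxM motif, where each x stands for any residue. A minimal way to confirm it in the aligned query segment is to convert that shorthand into a regular expression and report hits in query residue numbers. The converter below also accepts counted forms such as `X4`. That convention is an assumption borrowed from PROSITE-style notation, not something this page defines. `motif_to_regex` and `find_motif` are hypothetical names.

```python
import re


def motif_to_regex(motif):
    """Convert shorthand like 'YxxWxxxM' or 'CX2CX4HX4C' into a regex string.

    Each x/X means 'any residue'; an optional count after it repeats it.
    """
    return re.sub(
        r"[xX](\d*)",
        lambda m: "." + (f"{{{m.group(1)}}}" if m.group(1) else ""),
        motif,
    )


def find_motif(fragment, first_residue, motif):
    """Return (residue number, matched text) for each non-overlapping hit."""
    pattern = re.compile(motif_to_regex(motif))
    return [(first_residue + m.start(), m.group()) for m in pattern.finditer(fragment)]


# Ungapped DUF4219 segment of the query (residues 13-39) and the model consensus.
print(find_motif("LTKSNYDNWSLRMKAILGAHDVWEIVE", 13, "YxxWxxxM"))
print(find_motif("LDGDNYETWKLRMKLYLQAQDLWEVVE", 1, "YxxWxxxM"))
```

In the query the motif starts at Tyr18, which lines up with the motif in the pfam13961 consensus (model position 6).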
ZnF_C2HC smart00343
zinc finger
280-295 1.58e-04

zinc finger



Pssm-ID: 197667 [Multi-domain]  Cd Length: 17  Bit Score: 39.73  E-value: 1.58e-04
                            10
                    ....*....|....*.
gi 12321254     280 KCYNCGKFGHYASECK 295
Cdd:smart00343    1 KCYNCGKEGHIARDCP 16
 
Name Accession Description Interval E-value
RVT_2 pfam07727
Reverse transcriptase (RNA-dependent DNA polymerase); A reverse transcriptase gene is usually ...
832-1075 1.61e-97

Reverse transcriptase (RNA-dependent DNA polymerase); A reverse transcriptase gene is usually indicative of a mobile element such as a retrotransposon or retrovirus. Reverse transcriptases occur in a variety of mobile elements, including retrotransposons, retroviruses, group II introns, bacterial msDNAs, hepadnaviruses, and caulimoviruses. This Pfam entry includes reverse transcriptases not recognized by the pfam00078 model.


Pssm-ID: 400190  Cd Length: 243  Bit Score: 311.83  E-value: 1.61e-97
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 12321254    832 NDTWELTSLPNGHKAIGVKWVYKAKKNSKGEVeRYKARLVAKGYSQRAGIDYDEVFAPVARLETVRLIISLAAQNKWKIH 911
Cdd:pfam07727    1 NETWTLVKLPKNVKPIGTTWVHTHKINDLKEV-QYKARLVAQGFRQIAGEDYDKVFSPVIRLSSVRLLLAIAAEYEWPVH 79
                           90       100       110       120       130       140       150       160
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 12321254    912 QMDVKSAFLNGDLEEEVYIEQPQGYIVKGEEDKVLRLKKALYGLKQAPRAWNTRIDKYFKEKDFIKCPYEHALYIKIQKE 991
Cdd:pfam07727   80 HMDVSSAFLNGDIDEEIYVKQPPGFNIDNESGKVWQLNKSLYGLKQAPYMWNTCITKVLMDLNFEPDTAESGMYCRGFGE 159
                          170       180       190       200       210       220       230       240
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 12321254    992 DILIACLYVDDLIFTGNNPSMFEEFKKEMTKEFEMTDIGLMSYYLGIEVKQEDNGIFITQEGYAKEVLKKFKMDDSNPVC 1071
Cdd:pfam07727  160 NKLIVGLYVDDMFITGSDITIINDFKLELAKHFKMKDLGDISEFLGIEFIQIAGGIRLSQHNYLNSVIKKFNLTNNNGKY 239

                   ....
gi 12321254   1072 TPME 1075
Cdd:pfam07727  240 TPII 243
RNase_HI_RT_Ty1 cd09272
Ty1/Copia family of RNase HI in long terminal repeat retroelements; Ribonuclease H (RNase H) ...
1159-1298 1.30e-81

Ty1/Copia family of RNase HI in long terminal repeat retroelements; Ribonuclease H (RNase H) enzymes are divided into two major families, Type 1 and Type 2, based on amino acid sequence similarities and biochemical properties. RNase H is an endonuclease that cleaves the RNA strand of an RNA/DNA hybrid in a sequence non-specific manner in the presence of divalent cations. RNase H is widely present in various organisms, including bacteria, archaea, and eukaryotes. RNase HI has also been observed as an adjunct domain to the reverse transcriptase gene in retroviruses and in long terminal repeat (LTR)-bearing and non-LTR retrotransposons. RNase HI in LTR retrotransposons degrades the original RNA template, generates a polypurine tract (the primer for plus-strand DNA synthesis), and finally removes the RNA primers from the newly synthesized minus and plus strands. The catalytic residues for RNase H enzymatic activity, three aspartic acid residues and one glutamic acid residue (DEDD), are invariant across all RNase H domains. Based on phylogenetic patterns, the RNase HI domains of LTR retroelements are classified into five major families: Ty3/Gypsy, Ty1/Copia, Bel/Pao, DIRS1, and the vertebrate retroviruses. The Ty1/Copia family is widely distributed among the genomes of plants, fungi, and animals. RNase H has been explored as an anti-HIV drug target, because inactivating RNase H inhibits reverse transcription.


Pssm-ID: 260004  Cd Length: 140  Bit Score: 263.56  E-value: 1.30e-81
                         10        20        30        40        50        60        70        80
                 ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 12321254 1159 VGYSDSDWGGDVDDRKSTSGFVFYIGDTAFTWMSKKQPIVTLSTCEAEYVAATSCVCHAIWLRNLLKELSLPQEEPTKIF 1238
Cdd:cd09272    1 EGYSDADWAGDPDDRRSTSGYVFFLGGGPISWKSKKQTTVALSSTEAEYIALAEAAKEALWLRRLLEELGIPLDGPTTIY 80
                         90       100       110       120       130       140
                 ....*....|....*....|....*....|....*....|....*....|....*....|
gi 12321254 1239 VDNKSAIALAKNPVFHDRSKHIDTRYHYIRECVSKKDVQLEYVKTHDQVADIFTKPLKRE 1298
Cdd:cd09272   81 CDNQSAIALAKNPVFHSRTKHIDIRYHFIREKVEKGEIKVEYVPTEDQLADILTKPLPRP 140
Retrotran_gag_2 pfam14223
gag-polypeptide of LTR copia-type; This family is found in plants and fungi, and contains ...
72-196 3.09e-20

gag-polypeptide of LTR copia-type; This family is found in plants and fungi and contains the polyproteins of copia-type LTR retrotransposons.


Pssm-ID: 464108  Cd Length: 130  Bit Score: 88.06  E-value: 3.09e-20
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 12321254     72 LIYQGLDEDTFEKVVEATSAKEAWEKLRTSYKGADQVKKVrlqTLRGEFEALQMKEGELVSDYFSRVLTVTNNLKRNGEK 151
Cdd:pfam14223    3 LIVLSLSDSLLRLVRNADTAKEAWDKLESTYERKSPANKL---TLRRQLHSLKMKEGESVLEHINKFEELVNKLSALGVE 79
                           90       100       110       120
                   ....*....|....*....|....*....|....*....|....*
gi 12321254    152 LDDVRIMEKVLRSLDLKFEHIVTVIEETKDLeaMTIEQLLGSLQA 196
Cdd:pfam14223   80 ISDEDLVVKLLRSLPESYENFVTAIESSSDK--ITLEELISKLLD 122
transpos_IS481 NF033577
IS481 family transposase
523-684 1.92e-18

IS481 family transposase


Pssm-ID: 468094 [Multi-domain]  Cd Length: 283  Bit Score: 87.26  E-value: 1.92e-18
                          10        20        30        40        50        60        70        80
                  ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 12321254   523 AQKPLELIHTDVC--GPIKPKslGKSnYFLLFIDDFSRktWVYFLKEKSEVFEIFKKFKAHVEKESGLVIKTMRSDRGGE 600
Cdd:NF033577  124 RAHPGELWHIDIKklGRIPDV--GRL-YLHTAIDDHSR--FAYAELYPDETAETAADFLRRAFAEHGIPIRRVLTDNGSE 198
                          90       100       110       120       130       140       150       160
                  ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 12321254   601 FTSK--EFLKYCEDNGIRRQLTVPRSPQQNGVAERKNRTILE--MARsmlkskRLPKELWAEAVACAVYLL---NRSPTK 673
Cdd:NF033577  199 FRSRahGFELALAELGIEHRRTRPYHPQTNGKVERFHRTLKDefAYA------RPYESLAELQAALDEWLHhynHHRPHS 272
                         170
                  ....*....|.
gi 12321254   674 SVSGKTPQEAW 684
Cdd:NF033577  273 ALGGKTPAERF 283
gag_pre-integrs pfam13976
GAG-pre-integrase domain; This domain is found associated with retroviral insertion elements ...
443-510 2.09e-16

GAG-pre-integrase domain; This domain is found associated with retroviral insertion elements and lies just upstream of the integrase region on the polyproteins.


Pssm-ID: 372857  Cd Length: 67  Bit Score: 74.71  E-value: 2.09e-16
                           10        20        30        40        50        60
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....
gi 12321254    443 RMFVLNIRNDIAQCLKMCYKE-ESWLWHLRFGHLNFGGLELLSRKEMVRGLPCINhpNQVCEGCLLGKQ 510
Cdd:pfam13976    1 GLYLLDLSSVANSSIAVASKDdETWLWHRRLGHPSFKGLKKLVKKGLLPGLPISK--DLVCESCQLGKQ 67
rve pfam00665
Integrase core domain; Integrase mediates integration of a DNA copy of the viral genome into ...
526-625 1.94e-14

Integrase core domain; Integrase mediates integration of a DNA copy of the viral genome into the host chromosome. Integrase is composed of three domains. The amino-terminal domain is a zinc-binding domain (pfam02022). This domain is the central catalytic domain. The carboxy-terminal domain is a non-specific DNA-binding domain (pfam00552). The catalytic domain acts as an endonuclease that removes two nucleotides from the 3' ends of the blunt-ended viral DNA made by reverse transcription. This domain also catalyzes the DNA strand transfer reaction that joins the 3' ends of the viral DNA to the 5' ends of the integration site.


Pssm-ID: 459897 [Multi-domain]  Cd Length: 98  Bit Score: 70.42  E-value: 1.94e-14
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 12321254    526 PLELIHTDVCgPIKPKSLGKSNYFLLFIDDFSRKTWVYFLKE---KSEVFEIFKKFKAHVekesGLVIKTMRSDRGGEFT 602
Cdd:pfam00665    1 PNQLWQGDFT-YIRIPGGGGKLYLLVIVDDFSREILAWALSSemdAELVLDALERAIAFR----GGVPLIIHSDNGSEYT 75
                           90       100
                   ....*....|....*....|...
gi 12321254    603 SKEFLKYCEDNGIRRQLTVPRSP 625
Cdd:pfam00665   76 SKAFREFLKDLGIKPSFSRPGNP 98
Tra5 COG2801
Transposase InsO and inactivated derivatives [Mobilome: prophages, transposons]
523-638 1.09e-12

Transposase InsO and inactivated derivatives [Mobilome: prophages, transposons]


Pssm-ID: 442053 [Multi-domain]  Cd Length: 309  Bit Score: 70.57  E-value: 1.09e-12
                         10        20        30        40        50        60        70        80
                 ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 12321254  523 AQKPLELIHTDV-CGPIKPKSLgksnYFLLFIDDFSRK--TWVYFLKEKSE-VFEIFKKFKAHVEKESGLVIktmRSDRG 598
Cdd:COG2801  145 ATAPNQVWVTDItYIPTAEGWL----YLAAVIDLFSREivGWSVSDSMDAElVVDALEMAIERRGPPKPLIL---HSDNG 217
                         90       100       110       120
                 ....*....|....*....|....*....|....*....|
gi 12321254  599 GEFTSKEFLKYCEDNGIRRQLTVPRSPQQNGVAERKNRTI 638
Cdd:COG2801  218 SQYTSKAYQELLKKLGITQSMSRPGNPQDNAFIESFFGTL 257
transpos_IS3 NF033516
IS3 family transposase
548-657 3.48e-09

IS3 family transposase


Pssm-ID: 468052 [Multi-domain]  Cd Length: 369  Bit Score: 60.27  E-value: 3.48e-09
                          10        20        30        40        50        60        70        80
                  ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 12321254   548 YFLLFIDDFSRK--TWVYFLKEKSE-VFEIFKKFKAHVEKESGLVIktmRSDRGGEFTSKEFLKYCEDNGIRRQLTVPRS 624
Cdd:NF033516  234 YLAVVLDLFSREivGWSVSTSMSAElVLDALEMAIEWRGKPEGLIL---HSDNGSQYTSKAYREWLKEHGITQSMSRPGN 310
                          90       100       110
                  ....*....|....*....|....*....|...
gi 12321254   625 PQQNGVAERKNRTilemarsmLKSKRLPKELWA 657
Cdd:NF033516  311 CWDNAVAESFFGT--------LKRECLYRRRFR 335
DUF4219 pfam13961
Domain of unknown function (DUF4219); This domain is very short and is found at the N-terminus ...
13-39 2.28e-08

Domain of unknown function (DUF4219); This domain is very short and is found at the N-terminus of many Gag-Pol polyproteins and related proteins. It contains a highly conserved YxxWxxxM sequence motif.


Pssm-ID: 433608  Cd Length: 27  Bit Score: 50.97  E-value: 2.28e-08
                           10        20
                   ....*....|....*....|....*..
gi 12321254     13 LTKSNYDNWSLRMKAILGAHDVWEIVE 39
Cdd:pfam13961    1 LDGDNYETWKLRMKLYLQAQDLWEVVE 27
Tra8 COG2826
Transposase and inactivated derivatives, IS30 family [Mobilome: prophages, transposons]
552-684 1.56e-07

Transposase and inactivated derivatives, IS30 family [Mobilome: prophages, transposons]


Pssm-ID: 442074 [Multi-domain]  Cd Length: 325  Bit Score: 54.89  E-value: 1.56e-07
                         10        20        30        40        50        60        70        80
                 ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 12321254  552 FIDDFSRKTWVYFLKEKS-----EVF-EIFKKFKAHVekesglvIKTMRSDRGGEFTskEFLKYCEDNGIRRQLTVPRSP 625
Cdd:COG2826  192 LVERKSRFVILLKLPDKTaesvaDALiRLLRKLPAFL-------RKSITTDNGKEFA--DHKEIEAALGIKVYFADPYSP 262
                         90       100       110       120       130
                 ....*....|....*....|....*....|....*....|....*....|....*....
gi 12321254  626 QQNGVAERKNRTIlemARSMLKSKRLPKELwAEAVACAVYLLNRSPTKSVSGKTPQEAW 684
Cdd:COG2826  263 WQRGTNENTNGLL---RQYFPKGTDFSTVT-QEELDAIADRLNNRPRKCLGYKTPAEVF 317
ZnF_C2HC smart00343
zinc finger
280-295 1.58e-04

zinc finger


Pssm-ID: 197667 [Multi-domain]  Cd Length: 17  Bit Score: 39.73  E-value: 1.58e-04
                            10
                    ....*....|....*.
gi 12321254     280 KCYNCGKFGHYASECK 295
Cdd:smart00343    1 KCYNCGKEGHIARDCP 16
zf-CCHC pfam00098
Zinc knuckle; The zinc knuckle is a zinc-binding motif composed of the following ...
279-295 2.50e-04

Zinc knuckle; The zinc knuckle is a zinc-binding motif with the consensus CX2CX4HX4C, where X can be any amino acid. Most of these motifs come from retroviral gag proteins (nucleocapsid), and the prototype structure is from HIV. The family also contains members involved in eukaryotic gene regulation, such as C. elegans GLH-1. The structure is an 18-residue zinc finger.


Pssm-ID: 395050 [Multi-domain]  Cd Length: 18  Bit Score: 39.43  E-value: 2.50e-04
                           10
                   ....*....|....*..
gi 12321254    279 VKCYNCGKFGHYASECK 295
Cdd:pfam00098    1 GKCYNCGEPGHIARDCP 17
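The zinc knuckle consensus CX2CX4HX4C fixes the spacing of three cysteines and one histidine. Writing it as a regular expression with one capture group per conserved residue turns a match directly into residue numbers. Those positions are the usual candidates for the zinc ligands, though this page does not state that explicitly. The sketch assumes the query segment is ungapped, which holds for the 279-295 alignment above. `zinc_knuckle_ligands` is a hypothetical helper.

```python
import re

# CX2CX4HX4C from the pfam00098 description, capturing each conserved residue.
ZINC_KNUCKLE = re.compile(r"(C).{2}(C).{4}(H).{4}(C)")


def zinc_knuckle_ligands(fragment, first_residue):
    """For each match, list (residue number, amino acid) of the C, C, H, C."""
    return [
        [(first_residue + m.start(g), m.group(g)) for g in range(1, 5)]
        for m in ZINC_KNUCKLE.finditer(fragment)
    ]


# Ungapped query segment 279-295 and the pfam00098 consensus from the alignment above.
print(zinc_knuckle_ligands("VKCYNCGKFGHYASECK", 279))
print(zinc_knuckle_ligands("GKCYNCGEPGHIARDCP", 1))
```

In the query this places the conserved residues at Cys281, Cys284, His289 and Cys294. These fall inside both the zf-CCHC (279-295) and ZnF_C2HC (280-295) intervals.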
PTZ00368 PTZ00368
universal minicircle sequence binding protein (UMSBP); Provisional
268-294 2.59e-03

universal minicircle sequence binding protein (UMSBP); Provisional


Pssm-ID: 173561 [Multi-domain]  Cd Length: 148  Bit Score: 39.79  E-value: 2.59e-03
                          10        20        30
                  ....*....|....*....|....*....|..
gi 12321254   268 GH-----PKSRYDKSSVKCYNCGKFGHYASEC 294
Cdd:PTZ00368   62 GHlsrecPEAPPGSGPRSCYNCGQTGHISREC 93
 
BLAST search parameters
Data source: Precalculated data, version = cdd.v.3.21
Preset options:
  Database: CDSEARCH/cdd
  Low-complexity filter: no
  Composition-based adjustment: yes
  E-value threshold: 0.01
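Several hits in the full list cover overlapping stretches of the query, notably the transposase and integrase models between residues 523 and 684. One simple way to reduce such a list to a non-overlapping domain architecture works in two steps. First, discard hits above the E-value threshold (0.01 here). Then walk the remaining hits from best to worst E-value, keeping each hit only if it overlaps no hit already kept.

This greedy pass is an illustrative assumption, not NCBI's documented procedure for building its summary views. On this query it happens to recover the same seven domains as the first list of domain hits. The `Hit` type and function names are hypothetical; the intervals and E-values are copied from the list above.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    name: str
    start: int    # first query residue (inclusive)
    end: int      # last query residue (inclusive)
    evalue: float


HITS = [
    Hit("RVT_2", 832, 1075, 1.61e-97),
    Hit("RNase_HI_RT_Ty1", 1159, 1298, 1.30e-81),
    Hit("Retrotran_gag_2", 72, 196, 3.09e-20),
    Hit("transpos_IS481", 523, 684, 1.92e-18),
    Hit("gag_pre-integrs", 443, 510, 2.09e-16),
    Hit("rve", 526, 625, 1.94e-14),
    Hit("Tra5", 523, 638, 1.09e-12),
    Hit("transpos_IS3", 548, 657, 3.48e-09),
    Hit("DUF4219", 13, 39, 2.28e-08),
    Hit("Tra8", 552, 684, 1.56e-07),
    Hit("ZnF_C2HC", 280, 295, 1.58e-04),
    Hit("zf-CCHC", 279, 295, 2.50e-04),
    Hit("PTZ00368", 268, 294, 2.59e-03),
]


def overlaps(a, b):
    """True if two inclusive query intervals share at least one residue."""
    return a.start <= b.end and b.start <= a.end


def greedy_architecture(hits, evalue_threshold=0.01):
    """Drop hits above the threshold, then keep best-E-value hits that overlap nothing kept."""
    kept = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        if hit.evalue > evalue_threshold:
            continue
        if not any(overlaps(hit, k) for k in kept):
            kept.append(hit)
    return sorted(kept, key=lambda h: h.start)


for hit in greedy_architecture(HITS):
    print(f"{hit.start:>5}-{hit.end:<5} {hit.evalue:.2e}  {hit.name}")
```

Tightening the threshold removes the weakest hits first. For example, a hypothetical cutoff of 1e-5 drops all three zinc-finger-region hits (ZnF_C2HC, zf-CCHC and PTZ00368) before the overlap check runs.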
