IgBLAST tool


Enter Query Sequence

Enter sequence(s)

The sequence can be entered as a raw sequence, an accession number, or a GI number (example accession: Y14934). You may optionally include a definition line starting with ">" at the top, in conformance with FASTA format. You can also load your sequences from a local file (make sure it is a plain text file). If the sequence is already in GenBank, you can simply enter its accession or GI number.

Multiple query sequences may be submitted. Each sequence must have a unique identifier. We suggest that you do not use white space in the identifier, because any characters after the first white space are excluded.
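To illustrate the identifier rule above, here is a minimal Python sketch that extracts each FASTA identifier the way the text describes, keeping only the characters before the first white space, and then flags duplicates. It is not IgBLAST's own parser, and the function names `fasta_identifiers` and `duplicate_ids` are hypothetical.

```python
def fasta_identifiers(text: str) -> list[str]:
    """Return the identifier of each FASTA record: the text after '>' up to
    the first white space (anything after the white space is dropped)."""
    ids = []
    for line in text.splitlines():
        if line.startswith(">"):
            header = line[1:].strip()
            ids.append(header.split()[0] if header else "")
    return ids


def duplicate_ids(ids: list[str]) -> list[str]:
    """Return identifiers that occur more than once (empty list = all unique)."""
    seen, dups = set(), []
    for i in ids:
        if i in seen and i not in dups:
            dups.append(i)
        seen.add(i)
    return dups


queries = """>clone 1 heavy chain
CAGGTGCAGCTGGTGCAGTCTGG
>clone 2 heavy chain
GAGGTGCAGCTGGTGGAGTCTGG
"""
ids = fasta_identifiers(queries)
print(ids)                 # ['clone', 'clone']
print(duplicate_ids(ids))  # ['clone']
```

The two headers differ only after the space, so both queries collapse to the identifier `clone`. Writing `>clone_1` and `>clone_2` instead keeps them distinct.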

Or, upload local sequence file

Germline gene databases

Organism for query sequence

Specify the organism from which the query sequence comes. This allows the program to correctly report the V domain delineation, the V-J frame status (i.e., in-frame, out-of-frame, etc.), and the translation of the query nucleotide sequence.
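The following simplified Python sketch shows what the V-J frame status refers to. It treats a rearrangement as in-frame when the reading frame implied by the V gene and the reading frame implied by the J gene fall on the same frame of the query. The sketch assumes you already know one codon start position for each gene in query coordinates, derived from the organism-specific germline annotation. Those inputs and the function name are hypothetical. IgBLAST's actual report may include further categories (the "etc." above) that this sketch does not model.

```python
def vj_frame_status(v_codon_start: int, j_codon_start: int) -> str:
    """Classify V-J frame status from two 1-based query positions.

    v_codon_start: query position of the first base of a codon in the V gene's
        annotated reading frame (hypothetical input).
    j_codon_start: query position of the first base of a codon in the J gene's
        annotated reading frame (hypothetical input).
    Both positions in the same reading frame of the query => in-frame.
    """
    return "In-frame" if (j_codon_start - v_codon_start) % 3 == 0 else "Out-of-frame"


print(vj_frame_status(1, 331))  # In-frame (330 bases apart)
print(vj_frame_status(1, 332))  # Out-of-frame
```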

Germline V gene Database


All IMGT germline databases are from the IMGT/V-QUEST reference directory sets. Sequences from several categories are available, including functional genes (F), open reading frame genes (ORF), pseudogenes whose protein translation frames are intact (in-frame P), and orphon genes located outside the normal immunoglobulin or T cell receptor gene loci.

AIRR-C V genes are from the AIRR Community.

NCBI human V genes: This database consists of the "IMGT human V genes (F+ORF+in-frame P) including orphons" database plus a few pseudogenes that the IMGT database does not include. It contains the same human sequences as the "Ig germline V genes" database in the previous version of IgBLAST.

NCBI human V genes (old): This is our earliest version of the human Ig germline V gene database, predating the addition of human germline sequences from the IMGT database. It is the same as the "Ig germline V genes (old)" database in the previous version of IgBLAST.


NCBI mouse V genes, NCBI mouse D genes and NCBI mouse J genes: These are mouse germline sequences independently collected by NCBI.

Rhesus monkey germline V, D, and J genes are Macaca mulatta germline sequence collections from KIMDB.

See NCBI germline genes for details on NCBI germline gene collections.

Custom: You can search your own database. Your database should contain sequences in FASTA format.

Germline D gene Database


Germline J gene Database


Germline C gene Database


C gene matching is intended for rearranged Ig sequences, which means your Ig sequence needs to contain the V(D)J region preceding the C region.

NCBI human C gene database:

This database contains the constant region genes annotated on the NCBI reference genome. Because it is derived from the reference genome only, additional alleles are not included. Note that, to be consistent with IMGT's reference C gene sequences, an extra base (from the end of the J gene) is added to the start of each C gene sequence so that the first codon of the C gene is complete. As a result, IgBLAST results typically show a one-base overlap between the J and C genes.

Search Parameters

Program


Choose blastp for protein sequences and blastn for nucleotide sequences.

V gene mismatch penalty


A higher mismatch penalty (for example, -3) favors detecting V gene matches with higher similarity to the query sequence, but such matched regions are not necessarily long. Conversely, a lower mismatch penalty (for example, -1) favors detecting longer V gene matches that do not necessarily have high similarity to the query sequence. In general, a higher penalty works better if your sequence has few or no somatic mutations. If your sequence has significant mutations (>5%), choose a lower penalty to accommodate the reduced similarity introduced by the mutations.
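The trade-off can be seen in a toy model. Score a gapless alignment with a match reward of +1 (an assumption of this sketch) and the chosen mismatch penalty, then find the highest-scoring contiguous segment, as a local alignment would. This Python sketch uses Kadane's maximum-subarray method rather than BLAST's gapped extension, so it only illustrates the effect of the penalty. `best_local_segment` and the 'M'/'X' pattern encoding are hypothetical.

```python
def best_local_segment(pattern: str, mismatch_penalty: int, match_reward: int = 1):
    """Highest-scoring contiguous segment of a gapless alignment.

    pattern: one character per aligned position, 'M' = match, 'X' = mismatch.
    Returns (start, end, score) with 0-based start and exclusive end.
    """
    best = (0, 0, 0)
    cur_score, cur_start = 0, 0
    for i, c in enumerate(pattern):
        if cur_score <= 0:  # a non-positive prefix never helps; start afresh here
            cur_score, cur_start = 0, i
        cur_score += match_reward if c == "M" else mismatch_penalty
        if cur_score > best[2]:
            best = (cur_start, i + 1, cur_score)
    return best


pattern = "MMXMMMMMM"  # one mismatch near the 5' end
for penalty in (-3, -1):
    start, end, score = best_local_segment(pattern, penalty)
    print(penalty, end - start, score)
# -3 6 6   (starts after the mismatch: shorter, 100% identity)
# -1 9 7   (spans the mismatch: longer, lower identity)
```

With -3, the mismatch costs more than the two preceding matches earn, so the best segment starts after it. With -1, the segment absorbs the mismatch and covers the whole region. The same reasoning applies to the D and J gene mismatch penalties.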

Min D gene nucleotide matches


This controls the threshold for D gene detection. You can set the minimal number of required consecutive nucleotide matches between the query sequence and the D genes based on your own criteria. Note that the matches do not include overlapping matches at V-D or D-J junctions. The default value is 5 nucleotides.
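Here is a minimal Python sketch of this threshold test. It assumes a gapless alignment of the D region of the query against one D gene, with bases that overlap the V-D or D-J junction already excluded, as the note above requires. The helper names are hypothetical; only the default of 5 comes from the text.

```python
def longest_identical_run(query: str, germline: str) -> int:
    """Longest stretch of consecutive identical bases in a gapless alignment
    of two equal-length strings."""
    best = run = 0
    for q, g in zip(query, germline):
        run = run + 1 if q == g else 0
        best = max(best, run)
    return best


def passes_d_threshold(query: str, germline: str, min_matches: int = 5) -> bool:
    # 5 mirrors the documented default for "Min D gene nucleotide matches"
    return longest_identical_run(query, germline) >= min_matches


print(passes_d_threshold("GTATTACGAT", "GTATTACTAT"))  # True  (run of 7)
print(passes_d_threshold("ACTGAAGT", "ACTGTAGA"))      # False (run of 4)
```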

D gene mismatch penalty


A higher mismatch penalty (for example, -4) favors detecting D gene matches with higher similarity to the query sequence, but such matched regions are not necessarily long. Conversely, a lower mismatch penalty (for example, -1) favors detecting longer D gene matches that do not necessarily have high similarity to the query sequence. In general, a higher penalty works better if your sequence has few or no somatic mutations. If your sequence has significant mutations (>5%), choose a lower penalty to accommodate the reduced similarity introduced by the mutations.

Alignment extension

Extend alignment at 5' end

If your sequence differs too much from the germline sequence at the 5' end of the V gene (due to mutations or other reasons), the IgBLAST result may not show that part of the V gene, because IgBLAST uses a local alignment algorithm. If you would like to see those missed bases/residues anyway, enable this option to direct IgBLAST to perform a simple gapless alignment extension (up to 30 bases/residues) into that region. Note that this extension is not a BLAST alignment and should not be used to infer homology between the query and subject sequences; it is intended only as a convenience to show the missed part in the context of the germline sequence. If you want a true BLAST alignment that covers as much of your sequence as possible, select the lowest mismatch penalty (i.e., -1).
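The following Python sketch shows what a simple gapless extension amounts to. It copies the unaligned query and germline bases that precede the local alignment, up to the 30-base limit, and marks which positions agree. No scoring is involved, which is consistent with the warning that the extension is not a BLAST alignment. The function name, the 0-based `q_start`/`g_start` inputs, and the dot notation for identical bases are assumptions of this sketch.

```python
MAX_5P_EXTENSION = 30  # documented limit for the 5' extension


def extend_5prime(query: str, q_start: int, germline: str, g_start: int,
                  max_ext: int = MAX_5P_EXTENSION):
    """Gapless extension to the left of an existing alignment.

    q_start / g_start: 0-based positions where the local alignment begins in
    the query and in the germline V gene. Returns the extra query segment, the
    extra germline segment, and a match line ('.' = identical, else germline base).
    """
    n = min(max_ext, q_start, g_start)  # cannot run past either sequence start
    q_ext = query[q_start - n:q_start]
    g_ext = germline[g_start - n:g_start]
    marks = "".join("." if a == b else b for a, b in zip(q_ext, g_ext))
    return q_ext, g_ext, marks


print(extend_5prime("ACGTACGT", 4, "TTACGAACGT", 6))  # ('ACGT', 'ACGA', '...A')
```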


Extend alignment at 3' end

This is the same as "Extend alignment at 5' end", except that it extends the alignment at the 3' end of the J gene by up to 15 bases.

V(D)J genes overlap

Allow V(D)J genes to overlap

Enabling this option allows V(D)J genes to overlap at the rearrangement junctions (i.e., a stretch of nucleotides at the ends of adjacent V(D)J gene segments may match both segments, which mostly happens when there are no N nucleotide additions). By default, the program does not allow V, D, and J genes to overlap when assigning V, D, and J gene matches. While this option may change D gene match results in some cases, it has no effect on V gene match results (or on related match statistics). Its effect on J gene matches is minimal and occurs only in rare cases. Note that this option is active only when the D and J gene mismatch penalties are set to -4 and -3, respectively.
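In coordinate terms, overlap means that the end of one segment lies at or beyond the start of the next. The Python sketch below measures that overlap for 1-based, inclusive query coordinates. To mimic the default no-overlap behavior, it gives the shared bases to the V gene and moves the D start to the right. Which segment keeps the shared bases is an assumption of this sketch, chosen because the text says the option does not change V gene results. It is not a description of IgBLAST's internal rule, and the function names are hypothetical.

```python
def overlap_length(a_end: int, b_start: int) -> int:
    """Bases shared by two adjacent segments in 1-based inclusive coordinates
    (segment a ends at a_end, segment b starts at b_start)."""
    return max(0, a_end - b_start + 1)


def trim_d_after_v(v_end: int, d_start: int, d_end: int):
    """Hypothetical no-overlap rule: V keeps the shared bases, D start shifts right.

    Returns the trimmed (d_start, d_end), or None if no D bases remain.
    """
    shared = overlap_length(v_end, d_start)
    new_d_start = d_start + shared
    if new_d_start > d_end:
        return None
    return new_d_start, d_end


print(overlap_length(296, 295))       # 2
print(trim_d_after_v(296, 295, 310))  # (297, 310)
```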

Min required V gene length


Only shows results if the query sequence matches a germline V gene over at least the specified minimum length (i.e., number of bases or amino acids). Otherwise, "No hits found" is reported.

Min required J gene length


Only shows results if the query sequence matches a germline J gene over at least the specified minimum length (i.e., number of bases). Otherwise, "No hits found" is reported.
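Both minimum-length settings act as a simple filter on aligned lengths. Here is a minimal Python sketch, assuming each query's result is represented by a dictionary with hypothetical `v_align_len` and `j_align_len` keys:

```python
def apply_min_lengths(result: dict, min_v_length: int, min_j_length: int = 0):
    """Return the result unchanged if both alignments meet their minimum
    lengths, otherwise the string "No hits found".

    result: hypothetical dict with 'v_align_len' and 'j_align_len' keys
    (aligned length in bases, or amino acids for V on protein queries).
    """
    if result.get("v_align_len", 0) < min_v_length:
        return "No hits found"
    if result.get("j_align_len", 0) < min_j_length:
        return "No hits found"
    return result


print(apply_min_lengths({"v_align_len": 250, "j_align_len": 40}, 100, 20))
print(apply_min_lengths({"v_align_len": 80, "j_align_len": 40}, 100, 20))  # No hits found
```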

J gene mismatch penalty


A higher mismatch penalty (for example, -3) favors detecting J gene matches with higher similarity to the query sequence, but such matched regions are not necessarily long. Conversely, a lower mismatch penalty (for example, -1) favors detecting longer J gene matches that do not necessarily have high similarity to the query sequence. In general, a higher penalty works better if your sequence has few or no somatic mutations. If your sequence has significant mutations (>5%), choose a lower penalty to accommodate the reduced similarity introduced by the mutations.

Note: Parameter values that differ from the default are highlighted in yellow and marked with ♦.


Formatting Options

Number of germline genes: V gene, D gene, J gene, C gene

Amino acid translation

Show amino acid translation

This option translates your query as well as the top germline sequence, aligning each amino acid with the second base of its codon. Amino acids in the germline sequence that do not match the query are colored.
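Here is a minimal Python sketch of this display convention, using the standard genetic code. Each amino acid is written under the middle base of its codon, and the sketch collects the positions where the query and germline translations differ (the positions the web page would color). The helper names, the 0-based `frame` offset, and the use of `X` for codons with ambiguous bases are assumptions of this sketch.

```python
from itertools import product

# Standard genetic code; codons enumerated with the first base varying slowest
# in T, C, A, G order. '*' marks stop codons.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)}


def aligned_translation(seq: str, frame: int = 0) -> str:
    """Translate seq starting at the 0-based offset `frame`, placing each amino
    acid under the second base of its codon; other positions are spaces."""
    seq = seq.upper()
    out = [" "] * len(seq)
    for i in range(frame, len(seq) - 2, 3):  # only complete codons
        out[i + 1] = CODON_TABLE.get(seq[i:i + 3], "X")  # X: ambiguous base(s)
    return "".join(out)


def mismatched_positions(query_aa: str, germline_aa: str) -> list[int]:
    """Positions where both rows carry an amino acid and the two differ."""
    return [i for i, (q, g) in enumerate(zip(query_aa, germline_aa))
            if q != " " and g != " " and q != g]


q = aligned_translation("ATGGCC")
g = aligned_translation("ATGGAC")
print(repr(q), repr(g), mismatched_positions(q, g))  # ' M  A ' ' M  D ' [4]
```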

V domain delineation system


The V domain can be delineated using either the IMGT system (Lefranc et al., 2003) or the Kabat system (Kabat et al., 1991, Sequences of Proteins of Immunological Interest, National Institutes of Health Publication No. 91-3242, 5th ed., United States Department of Health and Human Services, Bethesda, MD). Domain annotation of the query sequence is based on pre-annotated domain information for the best-matched germline hit.

Number of clonotypes to show


Number of top clonotypes to show. Note that this option applies only when multiple queries are submitted.
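Ranking the top clonotypes amounts to counting queries per clonotype and keeping the most frequent. For illustration only, this Python sketch assumes that a clonotype is keyed by V gene, J gene, and CDR3 amino acid sequence. The text does not state how IgBLAST defines a clonotype, and the record keys and CDR3 values are hypothetical.

```python
from collections import Counter


def top_clonotypes(records, n):
    """Return the n most frequent clonotypes as ((v, j, cdr3_aa), count) pairs.

    Assumption for illustration: a clonotype is the combination of V gene,
    J gene, and CDR3 amino acid sequence.
    """
    counts = Counter((r["v_gene"], r["j_gene"], r["cdr3_aa"]) for r in records)
    return counts.most_common(n)


records = [
    {"v_gene": "IGHV1-2*02", "j_gene": "IGHJ4*02", "cdr3_aa": "ARDYW"},
    {"v_gene": "IGHV1-2*02", "j_gene": "IGHJ4*02", "cdr3_aa": "ARDYW"},
    {"v_gene": "IGHV3-23*01", "j_gene": "IGHJ6*02", "cdr3_aa": "AKGGMDV"},
]
print(top_clonotypes(records, 1))
# [(('IGHV1-2*02', 'IGHJ4*02', 'ARDYW'), 2)]
```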

Alignment format


AIRR rearrangement tabular:
This is a tab-separated format developed by the AIRR Community for describing Ig/TCR sequences. It consists of a header line with the field names, followed by one line of tab-separated fields per query sequence. All sequence positions are now 1-based. (Previous AIRR specifications used a 0-based start and a 1-based end coordinate system, but starting on November 27, 2018, all sequence positions were changed to 1-based per the new AIRR specification.)
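Because positions are 1-based and inclusive, you can cut a region out of the `sequence` field by subtracting one from the start only. This Python sketch reads an AIRR table with the standard library and extracts the V-aligned part of each query. It uses the AIRR fields `sequence_id`, `sequence`, `v_sequence_start`, and `v_sequence_end`; the small inline table is made up for the example. Under the pre-2018 convention (0-based start, 1-based end), the slice would instead be `sequence[start:end]`.

```python
import csv
import io


def v_region(row: dict) -> str:
    """Extract the V-aligned part of the query from an AIRR row using the
    1-based, inclusive v_sequence_start / v_sequence_end fields."""
    start, end = int(row["v_sequence_start"]), int(row["v_sequence_end"])
    return row["sequence"][start - 1:end]  # to Python's 0-based, end-exclusive slice


tsv = ("sequence_id\tsequence\tv_sequence_start\tv_sequence_end\n"
       "q1\tNNACGTACGT\t3\t8\n")
for row in csv.DictReader(io.StringIO(tsv), delimiter="\t"):
    print(row["sequence_id"], v_region(row))  # q1 ACGTAC
```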

Additional databases

Database


nr: GenBank+EMBL+DDBJ+PDB+RefSeq sequences, but excludes EST, STS, GSS, WGS, TSA, patent sequences as well as phase 0, 1, and 2 HTGS sequences.

refseq_genomes: NCBI RefSeq genome sequences.

pdb: Sequences from the Protein Data Bank (PDB).

patent: Nucleotide sequences derived from the Patent division of GenBank.

mammalian genomes: NCBI mammal genomes.

Organism limit
Optional

Enter an organism common name, binomial name, or taxonomy ID. Only the top 20 taxa will be shown.

Start typing in the text box, then select your taxon. The search will be restricted to the database sequences that belong to the selected taxon.

V gene only

Search V gene only

This option lets you find the best matches for the V gene in your query sequence among additional non-germline databases (e.g., nr, genomes). It has NO effect on searches against germline gene databases (see the explanation below).

A typical rearranged query sequence includes a leader and the V, D, and J genes (sometimes the C region is also included). When a sequence is submitted for a BLAST search, similarity matching is performed over the entire query sequence. Unlike the germline V gene database, which contains only V gene sequences, other databases such as nr contain many rearranged sequences that also include a leader and the V, D, J, and C genes. As a result, the best hit from these databases does not necessarily have the best match to the query V gene; rather, it has the best match over the entire query sequence (for example, it may be highly similar to the leader, D, J, or C genes in the query sequence but only weakly similar to the V gene). This is not a problem if the goal is to find the best overall matches to a query sequence. However, if the goal is to find the best matches to the V gene of a query sequence, you need to manually isolate the V gene part of the query sequence and use it for the search.

With this option on, the V gene part from a query sequence is automatically isolated (based on comparison to hits from the germline V gene database) and then used for search against additional databases like nr. This option should be disabled, however, if the search intention is to find best hits based on overall matches.

Number of alignments


Expect


This is the statistical significance threshold for reporting matches against database sequences. Lower EXPECT thresholds are more stringent and report only high-similarity matches. Choose a higher EXPECT value (for example, 1 or more) if you expect low identity between your query sequence and the targets. Note that this option applies only to the additional database search; it has no effect on the germline gene database search.
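BLAST's expectation value follows the Karlin-Altschul form E = K * m * n * exp(-lambda * S), where S is the raw alignment score and m and n are the query and database lengths. The Python sketch below computes that simplified estimate (real BLAST also applies effective-length corrections) and filters hits by an EXPECT threshold. The K and lambda values passed in, the function names, and the example hits are placeholders, not IgBLAST parameters.

```python
import math


def expect_value(score: float, query_len: int, db_len: int,
                 k: float, lam: float) -> float:
    """Simplified Karlin-Altschul estimate E = K * m * n * exp(-lambda * S).

    Real BLAST uses effective lengths; k and lam here are placeholder inputs.
    """
    return k * query_len * db_len * math.exp(-lam * score)


def passing_hits(hits, expect_threshold: float):
    """Keep (name, evalue) pairs whose E-value is at or below the threshold."""
    return [h for h in hits if h[1] <= expect_threshold]


# A higher score gives an exponentially smaller E-value (placeholder K and lambda)
print(expect_value(50, 400, 10**9, k=0.1, lam=0.6) <
      expect_value(30, 400, 10**9, k=0.1, lam=0.6))  # True

hits = [("hit_a", 1e-10), ("hit_b", 0.5), ("hit_c", 5.0)]
print(passing_hits(hits, 1))     # [('hit_a', 1e-10), ('hit_b', 0.5)]
print(passing_hits(hits, 1e-5))  # [('hit_a', 1e-10)]
```

Lowering the threshold from 1 to 1e-5 drops `hit_b`, which matches the text: lower EXPECT values are more stringent.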
