How are orthologs calculated?

NCBI's Eukaryotic Genome Annotation Pipeline identifies ortholog gene groups for the NCBI Gene dataset using a combination of protein sequence similarity and local synteny information.

Orthology is determined between a genome being annotated and a reference genome (for example, human or zebrafish), and pairs of orthologs are tracked as groups. Transitive relationships are inferred within each group, for example medaka <-> zebrafish <-> human <-> mouse. Only genes in the NCBI Gene database are eligible for ortholog calculation. With a few exceptions, ortholog calculation is currently limited to vertebrates and arthropods.
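The way pairwise calls can be merged into groups with transitive relationships might be sketched as follows. This is only an illustration using a standard union-find structure; the function name `group_orthologs` and the gene labels are hypothetical and do not describe NCBI's actual implementation.

```python
def group_orthologs(pairs):
    """Merge pairwise ortholog calls into groups (connected components).

    `pairs` is an iterable of (gene_a, gene_b) tuples. Hypothetical helper,
    not part of any NCBI tool.
    """
    parent = {}

    def find(gene):
        # Register unseen genes as their own root, then walk up to the root,
        # shortening the path along the way.
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # union the two groups

    groups = {}
    for gene in list(parent):
        groups.setdefault(find(gene), set()).add(gene)
    return sorted(sorted(members) for members in groups.values())


# Illustrative labels only; these are not real Gene records.
pairs = [
    ("medaka:geneA", "zebrafish:geneA"),
    ("zebrafish:geneA", "human:GENEA"),
    ("human:GENEA", "mouse:GeneA"),
    ("human:GENEB", "mouse:GeneB"),
]
ortholog_groups = group_orthologs(pairs)
```

In this toy input, medaka and mouse are never compared directly, yet they end up in the same group because each is linked through the zebrafish and human genes.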

For each protein from the genome being annotated, the reference genome is searched for best and near-best matches based on protein sequence similarity. Candidates are then analyzed further for nucleotide sequence similarity across all exons (including UTRs) plus an additional 2 kb of sequence on either side of the gene, and for microsynteny within the local genomic neighborhood (+/- 10 genes). Orthology relationships are assigned only when there is a clear 1:1 relationship, using the microsynteny information to help resolve closely related paralogs. The resulting set may be reviewed by a RefSeq curator for further refinement.
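To make the role of microsynteny concrete, here is a minimal sketch of how a neighborhood check could separate two closely related paralogs. The scoring rule (counting flanking genes whose known orthologs also flank the candidate) and the "strictly beats every rival" acceptance rule are assumptions made for illustration; the pipeline's real scoring is not described above. All function names, the `anchors` mapping, and the gene labels are hypothetical.

```python
def neighborhood(order, gene, flank=10):
    """Genes within `flank` positions on either side of `gene` in `order`."""
    i = order.index(gene)
    return set(order[max(0, i - flank):i]) | set(order[i + 1:i + 1 + flank])


def microsynteny_score(q_gene, r_gene, q_order, r_order, anchors, flank=10):
    """Count query neighbors whose anchor ortholog is a neighbor of r_gene.

    `anchors` maps query genes to already-accepted reference orthologs
    (an assumed input for this sketch).
    """
    r_neighbors = neighborhood(r_order, r_gene, flank)
    q_neighbors = neighborhood(q_order, q_gene, flank)
    return sum(1 for g in q_neighbors if anchors.get(g) in r_neighbors)


def assign_one_to_one(candidates, q_order, r_order, anchors, flank=10):
    """Keep a (query, reference) pair only if it strictly outscores every
    rival pair that shares its query gene or its reference gene."""
    scored = {
        (q, r): microsynteny_score(q, r, q_order, r_order, anchors, flank)
        for q, refs in candidates.items()
        for r in refs
    }
    assigned = {}
    for (q, r), score in scored.items():
        rivals = [
            s for (q2, r2), s in scored.items()
            if (q2 == q) != (r2 == r)  # shares exactly one member of the pair
        ]
        if all(score > s for s in rivals):
            assigned[q] = r
    return assigned


# Toy gene orders: qA/qB are paralogs, as are rA/rB. flank=2 keeps it small;
# the text above describes a window of +/- 10 genes.
q_order = ["q1", "q2", "qA", "q3", "q4", "q5", "qB", "q6"]
r_order = ["r1", "r2", "rA", "r3", "r4", "r5", "rB", "r6"]
anchors = {f"q{i}": f"r{i}" for i in range(1, 7)}
candidates = {"qA": ["rA", "rB"], "qB": ["rA", "rB"]}

calls = assign_one_to_one(candidates, q_order, r_order, anchors, flank=2)
no_support = assign_one_to_one({"qA": ["rA", "rB"]}, q_order, r_order, {}, flank=2)
```

With these toy inputs, each paralog is matched to the reference gene that shares its flanking genes, while the second call, which has no anchor genes to break the tie, assigns nothing. That mirrors the principle that an ortholog is called only when the 1:1 relationship is clear.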