NCBI Haliotis rufescens Annotation Release 100

The RefSeq genome records for Haliotis rufescens were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. This report presents statistics on the annotation products, the input data used in the pipeline and intermediate alignment results.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
BUSCO results: Annotation completeness assessed with BUSCO
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as NCBI Haliotis rufescens Annotation Release 100.

Annotation release ID: 100
Date of Entrez queries for transcripts and proteins: Jan 28 2022
Date of submission of annotation to the public databases: Feb 3 2022
Software version: 9.0

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
H.ruf_v1.0	GCF_003343065.1	Iowa State University	07-26-2018	Reference	1 assembled chromosome; unplaced scaffolds

Gene and feature statistics

Counts and lengths of annotated features are provided below for each assembly.

Feature counts

Feature	H.ruf_v1.0
Genes and pseudogenes	42,870
  protein-coding	35,454
  non-coding	7,051
  Transcribed pseudogenes	1
  Non-transcribed pseudogenes	362
  genes with variants	9,964
  Immunoglobulin/T-cell receptor gene segments	0
  other	2
mRNAs	56,741
  fully-supported	48,692
  with > 5% ab initio	5,555
  partial	905
  with filled gap(s)	0
  known RefSeq (NM_)	0
  model RefSeq (XM_)	56,741
non-coding RNAs	9,370
  fully-supported	5,849
  with > 5% ab initio	0
  partial	0
  with filled gap(s)	0
  known RefSeq (NR_)	0
  model RefSeq (XR_)	6,364
pseudo transcripts	1
  fully-supported	0
  with > 5% ab initio	0
  partial	0
  with filled gap(s)	0
  known RefSeq (NR_)	0
  model RefSeq (XR_)	1
CDSs	56,754
  fully-supported	48,692
  with > 5% ab initio	5,945
  partial	905
  with major correction(s)	125
  known RefSeq (NP_)	0
  model RefSeq (XP_)	56,754
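
As a quick consistency check on the gene counts, the short Python sketch below adds up the gene categories and compares the result with the "Genes and pseudogenes" total. It assumes that the five categories listed in the dictionary are mutually exclusive and that "genes with variants" and the immunoglobulin/T-cell receptor row are subsets rather than additional categories; the report does not state this explicitly. The variable names are illustrative.

```python
# Gene counts copied from the "Feature counts" table for H.ruf_v1.0.
# Assumption: these five categories partition the "Genes and pseudogenes" total.
gene_counts = {
    "protein-coding": 35_454,
    "non-coding": 7_051,
    "transcribed pseudogenes": 1,
    "non-transcribed pseudogenes": 362,
    "other": 2,
}
reported_total = 42_870  # "Genes and pseudogenes" row

computed_total = sum(gene_counts.values())
print(f"computed {computed_total:,} vs reported {reported_total:,}")
print("consistent" if computed_total == reported_total else "mismatch")
```

Under that assumption the categories add up to the reported total.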

Detailed reports

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	42,507	18,772	8,409	63	968,501
All transcripts	66,111	2,987	2,158	51	42,154
  mRNA	56,741	3,299	2,409	84	42,154
  misc_RNA	1,442	3,382	2,748	77	15,030
  tRNA	3,004	75	73	63	87
  lncRNA	4,408	1,134	684	51	11,015
  snoRNA	157	125	81	64	274
  snRNA	286	156	164	106	199
  rRNA	71	1,449	154	119	3,872
Single-exon transcripts	2,507	1,483	1,140	264	16,429
  coding transcripts (NM_/XM_)	2,507	1,483	1,140	264	16,429
CDSs	56,754	1,848	1,305	84	41,208
Exons	321,890	338	141	1	16,429
  in coding transcripts (NM_/XM_)	307,940	332	141	1	16,429
  in non-coding transcripts (NR_/XR_)	20,333	378	137	2	10,259
Introns	281,543	3,074	1,026	30	486,037
  in coding transcripts (NM_/XM_)	272,191	3,079	1,035	30	486,037
  in non-coding transcripts (NR_/XR_)	15,528	3,044	881	30	188,733

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	1.6	1	1	50
Number of exons per transcript	9.61	6	1	182
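
The mean number of transcripts per gene can be reproduced from counts reported elsewhere in this section. The Python sketch below assumes that the gene count used here is the "Genes and pseudogenes" total minus the two pseudogene rows, which matches the 42,507 genes in the feature lengths table. The variable names are illustrative.

```python
# Counts copied from the tables in this report.
genes_and_pseudogenes = 42_870
pseudogenes = 1 + 362            # transcribed + non-transcribed pseudogenes
all_transcripts = 66_111         # "All transcripts" row, pseudogenes excluded

genes = genes_and_pseudogenes - pseudogenes
mean_transcripts_per_gene = all_transcripts / genes

print(f"genes (excluding pseudogenes): {genes:,}")
print(f"mean transcripts per gene: {mean_transcripts_per_gene:.1f}")
```

The same approach does not apply to exons per transcript, because the exon count in the feature lengths table is not a per-transcript total.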

BUSCO analysis of gene annotation

BUSCO v4.1.4 (Simão et al. 2015, PMID: 26059717) was run in "protein" mode on the annotated gene set, using the longest protein of each gene and the mollusca_odb10 lineage dataset. Results are reported for the gene set from the primary assembly unit and are presented in BUSCO notation (C: complete [S: single-copy, D: duplicated], F: fragmented, M: missing, n: number of genes used).
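
For readers who want to process a summary written in this notation, the Python sketch below parses one such string into its components. The example string is a made-up placeholder and is not the result for this annotation; the function name `parse_busco` and the regular expression are illustrative assumptions about how the one-line summary is laid out.

```python
import re

# Pattern for a one-line summary such as
# "C:90.0%[S:88.5%,D:1.5%],F:4.0%,M:6.0%,n:1000" (assumed layout).
BUSCO_PATTERN = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco(summary: str) -> dict:
    """Return the percentages (floats) and n (int) from a BUSCO summary line."""
    match = BUSCO_PATTERN.search(summary.replace(" ", ""))
    if match is None:
        raise ValueError(f"not a BUSCO summary: {summary!r}")
    fields = match.groupdict()
    result = {key: float(value) for key, value in fields.items() if key != "n"}
    result["n"] = int(fields["n"])
    return result


# Placeholder values for illustration only, not taken from this report.
example = "C:90.0%[S:88.5%,D:1.5%],F:4.0%,M:6.0%,n:1000"
parsed = parse_busco(example)
print(parsed)
```

In this notation the complete fraction is the sum of the single-copy and duplicated fractions, which gives a simple sanity check on any parsed summary.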

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the UniProtKB/Swiss-Prot curated proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 35,441 coding genes, 22,447 genes had a protein with an alignment covering 50% or more of the query and 4,445 had an alignment covering 95% or more of the query.

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.
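
The two definitions can be illustrated with a small Python sketch. It assumes 1-based, inclusive alignment coordinates and measures coverage as the span between the first and last aligned residue, ignoring internal gaps; the function name and the example coordinates are hypothetical and are not taken from this annotation.

```python
def coverage_percent(aligned_start: int, aligned_end: int, sequence_length: int) -> float:
    """Percentage of a sequence spanned by an alignment (1-based, inclusive)."""
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    aligned_length = aligned_end - aligned_start + 1
    return 100.0 * aligned_length / sequence_length


# Hypothetical alignment: residues 21-380 of a 400-residue annotated protein
# (query) aligned to residues 101-450 of a 600-residue curated protein (target).
query_coverage = coverage_percent(21, 380, 400)
target_coverage = coverage_percent(101, 450, 600)

print(f"query coverage:  {query_coverage:.1f}%")
print(f"target coverage: {target_coverage:.1f}%")
```

In this hypothetical case the alignment would count toward the 50% query-coverage threshold but not toward the 95% threshold. For scale, the counts reported in this section correspond to roughly 63% and 13% of the coding genes, respectively.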

Below is a cumulative graph displaying the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline were added to the graph.

Query: annotated proteins
Target: UniProtKB/Swiss-Prot curated proteins

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker (if calculated), for each assembly. RepeatMasker results are only calculated for organisms with complete Dfam HMM model collections.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with WindowMasker
H.ruf_v1.0	GCF_003343065.1	37.16%
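
A masked percentage of this kind can be computed from a soft-masked sequence, where masked bases are written in lowercase. The Python sketch below is a minimal illustration under that assumption; the report does not say how the masked genome is stored, and the function name and toy sequence are hypothetical.

```python
def percent_masked(sequence: str) -> float:
    """Percentage of bases written in lowercase (soft-masked) in a sequence."""
    bases = [base for base in sequence if base.isalpha()]
    if not bases:
        return 0.0
    masked = sum(1 for base in bases if base.islower())
    return 100.0 * masked / len(bases)


# Toy sequence for illustration: 8 of 20 bases are lowercase.
toy_sequence = "ACGTacgtACGTACGTacgt"
print(f"{percent_masked(toy_sequence):.2f}% masked")
```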

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez, aligned to the genome by Splign, minimap2, or ProSplign and passed to Gnomon, NCBI's gene prediction software.

Transcript alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species Genbank	148	146 (98.65%)	135 (91.22%)	99.23%	98.59%
Same-species EST	354	177 (50.00%)	163 (46.05%)	98.53%	98.91%
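
The percentages in this table can be recomputed from the counts. The Python sketch below shows that both the aligned and the passed-to-Gnomon percentages are relative to the number of sequences retrieved from Entrez, not to the number aligned. The helper name `percent_of` is illustrative.

```python
def percent_of(part: int, whole: int) -> str:
    """Format part/whole as a percentage with two decimals."""
    return f"{100.0 * part / whole:.2f}%"


# (retrieved from Entrez, aligned by Splign, passed to Gnomon), from the table.
transcript_rows = {
    "Same-species Genbank": (148, 146, 135),
    "Same-species EST": (354, 177, 163),
}

for source, (retrieved, aligned, passed) in transcript_rows.items():
    print(
        f"{source}: aligned {percent_of(aligned, retrieved)}, "
        f"passed to Gnomon {percent_of(passed, retrieved)}"
    )
```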

RNA-Seq alignments

The following RNA-Seq reads from the Sequence Read Archive were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Publication	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	NA	Aggregate of all aligned samples	3,245,082,649	74%	21%	342,505
SAMN01931999	NA	Testis RNA-seq from the Red Abalone, Haliotis rufescens (Haliotis rufescens, SAMN01931999)	127,060,147	73%	22%	73,090
SAMN02639871	NA	Mantle (Haliotis rufescens, 8, SAMN02639871)	18,237,514	82%	8%	139,158
SAMN02639872	NA	Mantle (Haliotis rufescens, 8, SAMN02639872)	19,359,222	82%	8%	139,249
SAMN02639873	NA	Mantle (Haliotis rufescens, 8, SAMN02639873)	20,715,303	82%	8%	139,831
SAMN02639874	NA	Mantle (Haliotis rufescens, 8, SAMN02639874)	20,891,586	82%	9%	152,115
SAMN02639875	NA	Mantle (Haliotis rufescens, 8, SAMN02639875)	18,314,555	79%	9%	136,448
SAMN02639876	NA	Mantle (Haliotis rufescens, 8, SAMN02639876)	17,884,911	82%	8%	135,672
SAMN02639877	NA	Mantle (Haliotis rufescens, 8, SAMN02639877)	21,600,143	81%	7%	149,801
SAMN02639878	NA	Mantle (Haliotis rufescens, 8, SAMN02639878)	55,461,647	83%	2%	144,138
SAMN02639879	NA	Mantle (Haliotis rufescens, 8, SAMN02639879)	25,806,887	81%	7%	148,292
SAMN02639880	NA	Mantle (Haliotis rufescens, 8, SAMN02639880)	9,540,897	82%	8%	108,395
SAMN02639881	NA	Mantle (Haliotis rufescens, 8, SAMN02639881)	16,034,700	81%	8%	137,539
SAMN02639940	NA	Mantle (Haliotis rufescens, 8, SAMN02639940)	38,863,081	82%	8%	170,972
SAMN02639941	NA	Mantle (Haliotis rufescens, 8, SAMN02639941)	20,813,607	82%	7%	143,267
SAMN02639942	NA	Mantle (Haliotis rufescens, 8, SAMN02639942)	56,542,540	81%	8%	183,292
SAMN02639944	NA	Mantle (Haliotis rufescens, 8, SAMN02639944)	19,562,555	81%	8%	138,676
SAMN02639945	NA	Mantle (Haliotis rufescens, 8, SAMN02639945)	14,386,277	81%	7%	128,131
SAMN02639946	NA	Mantle (Haliotis rufescens, 8, SAMN02639946)	16,062,896	81%	7%	129,987
SAMN02639947	NA	Mantle (Haliotis rufescens, 8, SAMN02639947)	19,493,129	80%	7%	134,050
SAMN02639948	NA	Mantle (Haliotis rufescens, 8, SAMN02639948)	14,751,891	81%	7%	124,916
SAMN02639949	NA	Mantle (Haliotis rufescens, 8, SAMN02639949)	13,782,992	80%	7%	122,025
SAMN02639950	NA	Mantle (Haliotis rufescens, 8, SAMN02639950)	12,519,874	82%	7%	118,182
SAMN02639952	NA	Mantle (Haliotis rufescens, 8, SAMN02639952)	21,521,422	81%	7%	143,102
SAMN02639953	NA	Mantle (Haliotis rufescens, 8, SAMN02639953)	20,184,711	81%	7%	139,850
SAMN02639954	NA	Mantle (Haliotis rufescens, 8, SAMN02639954)	44,254,173	80%	7%	175,154
SAMN02639971	NA	Mantle (Haliotis rufescens, 8, SAMN02639971)	24,463,643	82%	7%	147,472
SAMN02639972	NA	Mantle (Haliotis rufescens, 8, SAMN02639972)	23,620,142	80%	7%	143,217
SAMN02639973	NA	Mantle (Haliotis rufescens, 8, SAMN02639973)	16,631,384	81%	7%	138,512
SAMN02639974	NA	Mantle (Haliotis rufescens, 8, SAMN02639974)	23,501,344	82%	8%	142,533
SAMN02639975	NA	Mantle (Haliotis rufescens, 8, SAMN02639975)	18,788,376	80%	7%	140,389
SAMN02639976	NA	Mantle (Haliotis rufescens, 8, SAMN02639976)	18,876,860	80%	7%	135,262
SAMN02639977	NA	Mantle (Haliotis rufescens, 8, SAMN02639977)	13,731,439	81%	7%	126,227
SAMN02639978	NA	Mantle (Haliotis rufescens, 8, SAMN02639978)	20,482,188	81%	7%	140,176
SAMN02639979	NA	Mantle (Haliotis rufescens, 8, SAMN02639979)	52,703,533	81%	8%	179,993
SAMN02639981	NA	Mantle (Haliotis rufescens, 8, SAMN02639981)	18,719,311	81%	8%	139,559
SAMN02639982	NA	Mantle (Haliotis rufescens, 8, SAMN02639982)	21,938,198	82%	8%	142,390
SAMN02639983	NA	Mantle (Haliotis rufescens, 8, SAMN02639983)	19,756,794	82%	8%	137,435
SAMN02639984	NA	Mantle (Haliotis rufescens, 8, SAMN02639984)	11,206,121	80%	8%	112,378
SAMN02639985	NA	Mantle (Haliotis rufescens, 8, SAMN02639985)	7,581,379	81%	6%	95,201
SAMN02639986	NA	Mantle (Haliotis rufescens, 8, SAMN02639986)	25,503,910	82%	7%	146,922
SAMN02639987	NA	Mantle (Haliotis rufescens, 8, SAMN02639987)	15,676,708	81%	7%	131,668
SAMN02639988	NA	Mantle (Haliotis rufescens, 8, SAMN02639988)	12,682,115	81%	8%	123,314
SAMN02639989	NA	Mantle (Haliotis rufescens, 8, SAMN02639989)	17,740,589	81%	7%	133,124
SAMN02639990	NA	Mantle (Haliotis rufescens, 8, SAMN02639990)	18,900,103	81%	8%	143,169
SAMN02639991	NA	Mantle (Haliotis rufescens, 8, SAMN02639991)	15,523,840	81%	7%	132,461
SAMN02641394	NA	Mantle (Haliotis rufescens, 8, SAMN02641394)	19,410,184	82%	8%	140,088
SAMN02641395	NA	Mantle (Haliotis rufescens, 8, SAMN02641395)	31,951,594	82%	8%	160,915
SAMN03444423	NA	Muscle (Haliotis rufescens, SAMN03444423)	764,686	86%	13%	62,611
SAMN11492961	30657886	Pooled Larvae (Haliotis rufescens, SAMN11492961)	53,495,900	46%	12%	73,951
SAMN11492962	30657886	Pooled Larvae (Haliotis rufescens, SAMN11492962)	38,510,082	56%	38%	94,480
SAMN11492963	30657886	Pooled Larvae (Haliotis rufescens, SAMN11492963)	60,903,467	57%	39%	122,502
SAMN11492964	30657886	Pooled Larvae (Haliotis rufescens, SAMN11492964)	40,394,747	78%	30%	188,545
SAMN11492965	30657886	Pooled Larvae (Haliotis rufescens, SAMN11492965)	48,413,604	51%	28%	83,893
SAMN11492966	30657886	Pooled Larvae (Haliotis rufescens, SAMN11492966)	51,848,457	78%	26%	97,364
SAMN11492967	30657886	Pooled Larvae (Haliotis rufescens, SAMN11492967)	61,986,501	47%	27%	82,049
SAMN11492968	30657886	Pooled Larvae (Haliotis rufescens, SAMN11492968)	49,959,588	53%	26%	85,003
SAMN11492970	30657886	Pooled Larvae (Haliotis rufescens, SAMN11492970)	52,640,520	70%	39%	107,296
SAMN11492971	30657886	Cephalic Tentacle (Haliotis rufescens, SAMN11492971)	35,350,653	68%	25%	107,582
SAMN11492972	30657886	Cephalic Tentacle (Haliotis rufescens, SAMN11492972)	51,926,203	80%	26%	204,222
SAMN11492973	30657886	Pooled Eggs (Haliotis rufescens, SAMN11492973)	63,565,753	59%	1%	77,485
SAMN11492974	30657886	Pooled Eggs (Haliotis rufescens, SAMN11492974)	50,388,138	52%	24%	132,390
SAMN11492975	30657886	Epipodium (Haliotis rufescens, SAMN11492975)	86,475,895	75%	30%	225,900
SAMN11492976	30657886	Epipodium (Haliotis rufescens, SAMN11492976)	40,584,989	70%	28%	128,565
SAMN11492977	30657886	Epipodal Tentacle (Haliotis rufescens, SAMN11492977)	29,894,910	67%	26%	115,466
SAMN11492978	30657886	Epipodal Tentacle (Haliotis rufescens, SAMN11492978)	38,295,258	80%	30%	209,632
SAMN11492979	30657886	Female Gonad (Haliotis rufescens, SAMN11492979)	46,335,377	78%	48%	136,715
SAMN11492980	30657886	Female Gonad (Haliotis rufescens, SAMN11492980)	48,438,630	81%	47%	184,759
SAMN11492981	30657886	Foot (Haliotis rufescens, SAMN11492981)	45,404,365	80%	32%	138,048
SAMN11492982	30657886	Foot (Haliotis rufescens, SAMN11492982)	46,711,683	77%	30%	101,292
SAMN11492983	30657886	Ganglion (Haliotis rufescens, SAMN11492983)	42,253,601	75%	29%	151,047
SAMN11492984	30657886	Ganglion (Haliotis rufescens, SAMN11492984)	51,929,689	76%	31%	168,132
SAMN11492985	30657886	Light Receptor (Haliotis rufescens, SAMN11492985)	38,790,227	76%	29%	205,156
SAMN11492986	30657886	Light Receptor (Haliotis rufescens, SAMN11492986)	38,453,939	78%	28%	198,227
SAMN11492987	30657886	Male Gonad (Haliotis rufescens, SAMN11492987)	47,156,331	73%	17%	153,313
SAMN11492988	30657886	Male Gonad (Haliotis rufescens, SAMN11492988)	52,211,146	71%	9%	137,323
SAMN11492989	30657886	Gill (Haliotis rufescens, SAMN11492989)	36,998,288	61%	25%	124,779
SAMN11492990	30657886	Gill (Haliotis rufescens, SAMN11492990)	40,201,035	57%	26%	146,660
SAMN11492991	30657886	Heart (Haliotis rufescens, SAMN11492991)	33,406,354	71%	36%	127,876
SAMN11492992	30657886	Heart (Haliotis rufescens, SAMN11492992)	36,800,084	52%	37%	177,764
SAMN11492993	30657886	Kidney (Haliotis rufescens, SAMN11492993)	50,215,273	62%	35%	142,502
SAMN11492994	30657886	Kidney (Haliotis rufescens, SAMN11492994)	35,983,497	70%	36%	204,631
SAMN11492995	30657886	Liver (Haliotis rufescens, SAMN11492995)	39,455,503	80%	45%	108,878
SAMN11492996	30657886	Liver (Haliotis rufescens, SAMN11492996)	22,896,311	81%	41%	172,644
SAMN11492997	30657886	Mantle (Haliotis rufescens, SAMN11492997)	46,670,189	76%	30%	117,283
SAMN11492998	30657886	Mantle (Haliotis rufescens, SAMN11492998)	48,539,426	76%	31%	207,621
SAMN11492999	30657886	Post-Esophagus (Haliotis rufescens, SAMN11492999)	42,019,862	79%	34%	203,422
SAMN11493000	30657886	Post-Esophagus (Haliotis rufescens, SAMN11493000)	54,439,427	79%	34%	203,224
SAMN13810026	NA	whole animal (Haliotis rufescens, SAMN13810026)	62,025,210	83%	25%	246,431
SAMN13810027	NA	whole animal (Haliotis rufescens, SAMN13810027)	46,064,086	82%	24%	232,849
SAMN13810028	NA	whole animal (Haliotis rufescens, SAMN13810028)	49,695,354	83%	25%	237,170
SAMN13810029	NA	whole animal (Haliotis rufescens, SAMN13810029)	56,364,104	83%	24%	247,288
SAMN13810030	NA	whole animal (Haliotis rufescens, SAMN13810030)	47,832,006	83%	23%	244,173
SAMN13810031	NA	whole animal (Haliotis rufescens, SAMN13810031)	49,355,886	83%	24%	231,750

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
SRR768363	SRX246491	SRP018917	SAMN01931999	47,713,677	85%	22%
SRR770266	SRX246491	SRP018917	SAMN01931999	79,346,470	65%	22%
SRR1169755	SRX471241	SRP037568	SAMN02639871	18,237,514	82%	8%
SRR1168464	SRX470212	SRP037568	SAMN02639872	19,359,222	82%	8%
SRR1168465	SRX470213	SRP037568	SAMN02639873	20,715,303	82%	8%
SRR1168485	SRX470233	SRP037568	SAMN02639874	20,891,586	82%	9%
SRR1168488	SRX470235	SRP037568	SAMN02639875	18,314,555	79%	9%
SRR1168489	SRX470236	SRP037568	SAMN02639876	17,884,911	82%	8%
SRR1168490	SRX470237	SRP037568	SAMN02639877	21,600,143	81%	7%
SRR1168507	SRX470247	SRP037568	SAMN02639878	55,461,647	83%	2%
SRR1168714	SRX470398	SRP037568	SAMN02639879	25,806,887	81%	7%
SRR1168778	SRX470405	SRP037568	SAMN02639880	9,540,897	82%	8%
SRR1168786	SRX470406	SRP037568	SAMN02639881	16,034,700	81%	8%
SRR1168787	SRX470410	SRP037568	SAMN02639940	38,863,081	82%	8%
SRR1168791	SRX470414	SRP037568	SAMN02639941	20,813,607	82%	7%
SRR1168793	SRX470415	SRP037568	SAMN02639942	56,542,540	81%	8%
SRR1169691	SRX471185	SRP037568	SAMN02639944	19,562,555	81%	8%
SRR1169694	SRX471187	SRP037568	SAMN02639945	14,386,277	81%	7%
SRR1169695	SRX471189	SRP037568	SAMN02639946	16,062,896	81%	7%
SRR1169696	SRX471190	SRP037568	SAMN02639947	19,493,129	80%	7%
SRR1169698	SRX471191	SRP037568	SAMN02639948	14,751,891	81%	7%
SRR1169699	SRX471193	SRP037568	SAMN02639949	13,782,992	80%	7%
SRR1169701	SRX471194	SRP037568	SAMN02639950	12,519,874	82%	7%
SRR1169702	SRX471195	SRP037568	SAMN02639952	21,521,422	81%	7%
SRR1169704	SRX471196	SRP037568	SAMN02639953	20,184,711	81%	7%
SRR1169708	SRX471198	SRP037568	SAMN02639954	44,254,173	80%	7%
SRR1169750	SRX471237	SRP037568	SAMN02639971	24,463,643	82%	7%
SRR1169751	SRX471238	SRP037568	SAMN02639972	23,620,142	80%	7%
SRR1169752	SRX471239	SRP037568	SAMN02639973	16,631,384	81%	7%
SRR1169753	SRX471240	SRP037568	SAMN02639974	23,501,344	82%	8%
SRR1170975	SRX472065	SRP037568	SAMN02639975	18,788,376	80%	7%
SRR1170976	SRX472066	SRP037568	SAMN02639976	18,876,860	80%	7%
SRR1170977	SRX472067	SRP037568	SAMN02639977	13,731,439	81%	7%
SRR1170978	SRX472068	SRP037568	SAMN02639978	20,482,188	81%	7%
SRR1170980	SRX472070	SRP037568	SAMN02639979	52,703,533	81%	8%
SRR1170981	SRX472071	SRP037568	SAMN02639981	18,719,311	81%	8%
SRR1170982	SRX472072	SRP037568	SAMN02639982	21,938,198	82%	8%
SRR1170983	SRX472073	SRP037568	SAMN02639983	19,756,794	82%	8%
SRR1170984	SRX472074	SRP037568	SAMN02639984	11,206,121	80%	8%
SRR1170985	SRX472075	SRP037568	SAMN02639985	7,581,379	81%	6%
SRR1170986	SRX472076	SRP037568	SAMN02639986	25,503,910	82%	7%
SRR1170987	SRX472077	SRP037568	SAMN02639987	15,676,708	81%	7%
SRR1170988	SRX472078	SRP037568	SAMN02639988	12,682,115	81%	8%
SRR1170989	SRX472079	SRP037568	SAMN02639989	17,740,589	81%	7%
SRR1170990	SRX472080	SRP037568	SAMN02639990	18,900,103	81%	8%
SRR1170991	SRX472081	SRP037568	SAMN02639991	15,523,840	81%	7%
SRR1168492	SRX470240	SRP037568	SAMN02641394	19,410,184	82%	8%
SRR1169692	SRX471186	SRP037568	SAMN02641395	31,951,594	82%	8%
SRR1926333	SRX965945	SRP056469	SAMN03444423	764,686	86%	13%
SRR8956803	SRX5736361	SRP193878	SAMN11492961	53,495,900	46%	12%
SRR8956802	SRX5736362	SRP193878	SAMN11492962	38,510,082	56%	38%
SRR8956805	SRX5736359	SRP193878	SAMN11492963	60,903,467	57%	39%
SRR8956804	SRX5736360	SRP193878	SAMN11492964	40,394,747	78%	30%
SRR8956799	SRX5736365	SRP193878	SAMN11492965	48,413,604	51%	28%
SRR8956798	SRX5736366	SRP193878	SAMN11492966	51,848,457	78%	26%
SRR8956801	SRX5736363	SRP193878	SAMN11492967	61,986,501	47%	27%
SRR8956800	SRX5736364	SRP193878	SAMN11492968	49,959,588	53%	26%
SRR8956796	SRX5736368	SRP193878	SAMN11492970	52,640,520	70%	39%
SRR8956781	SRX5736383	SRP193878	SAMN11492971	35,350,653	68%	25%
SRR8956780	SRX5736384	SRP193878	SAMN11492972	51,926,203	80%	26%
SRR8956779	SRX5736385	SRP193878	SAMN11492973	63,565,753	59%	1%
SRR8956778	SRX5736386	SRP193878	SAMN11492974	50,388,138	52%	24%
SRR8956777	SRX5736387	SRP193878	SAMN11492975	86,475,895	75%	30%
SRR8956776	SRX5736388	SRP193878	SAMN11492976	40,584,989	70%	28%
SRR8956775	SRX5736389	SRP193878	SAMN11492977	29,894,910	67%	26%
SRR8956774	SRX5736390	SRP193878	SAMN11492978	38,295,258	80%	30%
SRR8956773	SRX5736391	SRP193878	SAMN11492979	46,335,377	78%	48%
SRR8956772	SRX5736392	SRP193878	SAMN11492980	48,438,630	81%	47%
SRR8956790	SRX5736374	SRP193878	SAMN11492981	45,404,365	80%	32%
SRR8956791	SRX5736373	SRP193878	SAMN11492982	46,711,683	77%	30%
SRR8956788	SRX5736376	SRP193878	SAMN11492983	42,253,601	75%	29%
SRR8956789	SRX5736375	SRP193878	SAMN11492984	51,929,689	76%	31%
SRR8956786	SRX5736378	SRP193878	SAMN11492985	38,790,227	76%	29%
SRR8956787	SRX5736377	SRP193878	SAMN11492986	38,453,939	78%	28%
SRR8956784	SRX5736380	SRP193878	SAMN11492987	47,156,331	73%	17%
SRR8956785	SRX5736379	SRP193878	SAMN11492988	52,211,146	71%	9%
SRR8956793	SRX5736371	SRP193878	SAMN11492989	36,998,288	61%	25%
SRR8956794	SRX5736370	SRP193878	SAMN11492990	40,201,035	57%	26%
SRR8956771	SRX5736393	SRP193878	SAMN11492991	33,406,354	71%	36%
SRR8956770	SRX5736394	SRP193878	SAMN11492992	36,800,084	52%	37%
SRR8956783	SRX5736381	SRP193878	SAMN11492993	50,215,273	62%	35%
SRR8956782	SRX5736382	SRP193878	SAMN11492994	35,983,497	70%	36%
SRR8956767	SRX5736397	SRP193878	SAMN11492995	39,455,503	80%	45%
SRR8956766	SRX5736398	SRP193878	SAMN11492996	22,896,311	81%	41%
SRR8956769	SRX5736395	SRP193878	SAMN11492997	46,670,189	76%	30%
SRR8956768	SRX5736396	SRP193878	SAMN11492998	48,539,426	76%	31%
SRR8956795	SRX5736369	SRP193878	SAMN11492999	42,019,862	79%	34%
SRR8956792	SRX5736372	SRP193878	SAMN11493000	54,439,427	79%	34%
SRR10858543	SRX7528633	SRP240927	SAMN13810026	62,025,210	83%	25%
SRR10858542	SRX7528634	SRP240927	SAMN13810027	46,064,086	82%	24%
SRR10858539	SRX7528637	SRP240927	SAMN13810028	49,695,354	83%	25%
SRR10858538	SRX7528638	SRP240927	SAMN13810029	56,364,104	83%	24%
SRR10858537	SRX7528639	SRP240927	SAMN13810030	47,832,006	83%	23%
SRR10858536	SRX7528640	SRP240927	SAMN13810031	49,355,886	83%	24%
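
The by-run and by-sample tables are related: a sample with several runs appears once in the by-sample table with the combined read count. The Python sketch below rolls up a few rows copied from the by-run table. It assumes that the sample-level aligned percentage is a read-weighted combination of the run-level values; because the report rounds percentages to whole numbers, the result can only be approximate. The function name is illustrative.

```python
from collections import defaultdict

# (run, sample, number of reads, percent aligned reads), from the by-run table.
run_rows = [
    ("SRR768363", "SAMN01931999", 47_713_677, 85),
    ("SRR770266", "SAMN01931999", 79_346_470, 65),
    ("SRR1169755", "SAMN02639871", 18_237_514, 82),
]


def aggregate_by_sample(rows):
    """Sum reads per sample and compute a read-weighted aligned percentage."""
    totals = defaultdict(lambda: [0, 0.0])  # sample -> [reads, aligned reads]
    for _run, sample, reads, percent_aligned in rows:
        totals[sample][0] += reads
        totals[sample][1] += reads * percent_aligned / 100
    return {
        sample: (reads, 100 * aligned / reads)
        for sample, (reads, aligned) in totals.items()
    }


by_sample = aggregate_by_sample(run_rows)
for sample, (reads, percent_aligned) in by_sample.items():
    print(f"{sample}: {reads:,} reads, about {percent_aligned:.0f}% aligned")
```

For SAMN01931999 the two runs add up to the 127,060,147 reads listed in the by-sample table, and the weighted percentage is close to the reported 73%.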

Protein alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Octopus sinensis high-quality model RefSeq (XP_)	13,472	10,141 (75.27%)	10,141 (75.27%)	61.60%	51.94%
Crassostrea gigas high-quality model RefSeq (XP_)	28,029	18,611 (66.40%)	18,611 (66.40%)	57.72%	40.53%
Mollusca GenBank	15,725	8,681 (55.21%)	8,681 (55.21%)	72.74%	74.78%
Mollusca known RefSeq (NP_)	484	426 (88.02%)	426 (88.02%)	73.54%	75.32%
Same-species GenBank	107	98 (91.59%)	98 (91.59%)	84.80%	93.02%
Aplysia californica high-quality model RefSeq (XP_)	9,849	7,604 (77.21%)	7,604 (77.21%)	61.12%	53.98%
Pecten maximus high-quality model RefSeq (XP_)	18,685	14,027 (75.07%)	14,027 (75.07%)	60.26%	49.52%
Homo sapiens known RefSeq (NP_)	63,715	41,927 (65.80%)	41,927 (65.80%)	58.98%	42.55%

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
Minimap2: Li H. Bioinformatics 2018, 34(18):3094-100
