NCBI Anopheles coluzzii Annotation Release 101

The RefSeq genome records for Anopheles coluzzii were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. This report presents statistics on the annotation products, the input data used in the pipeline and intermediate alignment results.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
BUSCO results: Annotation completeness assessed with BUSCO
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction
Similarity of current and previous assembly: How much of each assembly is covered by alignments to the other, and the identity of those alignments
Comparison of the current and previous annotations: What proportion of the genes changed in this annotation

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as NCBI Anopheles coluzzii Annotation Release 101.

Annotation release ID: 101
Date of Entrez queries for transcripts and proteins: Jul 15 2022
Date of submission of annotation to the public databases: Jul 26 2022
Software version: 10.0

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
AcolN3	GCF_943734685.1	WELLCOME SANGER INSTITUTE	06-24-2022	Reference	4 assembled chromosomes; unplaced scaffolds

Gene and feature statistics

Counts and length of annotated features are provided below for each assembly.

Feature counts

Feature	AcolN3
Genes and pseudogenes	14,579
protein-coding	12,410
non-coding	1,951
Transcribed pseudogenes	0
Non-transcribed pseudogenes	218
genes with variants	3,816
Immunoglobulin/T-cell receptor gene segments	0
other	0
mRNAs	24,012
fully-supported	23,244
with > 5% ab initio	502
partial	50
with filled gap(s)	0
known RefSeq (NM_)	0
model RefSeq (XM_)	24,012
non-coding RNAs	2,557
fully-supported	1,176
with > 5% ab initio	0
partial	5
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	2,161
pseudo transcripts	0
fully-supported	0
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	0
CDSs	24,012
fully-supported	23,244
with > 5% ab initio	525
partial	50
with major correction(s)	48
known RefSeq (NP_)	0
model RefSeq (XP_)	24,012
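The rows of the feature-count table are nested subtotals, so several identities should hold. The sketch below checks them using numbers copied from this report. The dictionary keys are hypothetical names, and the tRNA check encodes an observation from this report, not a documented rule: the 396 non-coding RNAs without an XR_ accession match the tRNA count in the feature-length table.

```python
# Counts copied from the AcolN3 feature-count and feature-length tables.
# Key names are illustrative only, not an NCBI file format.
counts = {
    "genes_and_pseudogenes": 14_579,
    "protein_coding": 12_410,
    "non_coding": 1_951,
    "transcribed_pseudogenes": 0,
    "non_transcribed_pseudogenes": 218,
    "mrnas": 24_012,
    "non_coding_rnas": 2_557,
    "model_xr": 2_161,
    "trnas": 396,
    "cdss": 24_012,
    "model_xp": 24_012,
}


def check_feature_counts(c: dict) -> list:
    """Return descriptions of failed consistency checks (empty if all pass)."""
    failures = []
    gene_sum = (c["protein_coding"] + c["non_coding"]
                + c["transcribed_pseudogenes"] + c["non_transcribed_pseudogenes"])
    if gene_sum != c["genes_and_pseudogenes"]:
        failures.append(f"gene categories sum to {gene_sum}, "
                        f"not {c['genes_and_pseudogenes']}")
    # In this report each model mRNA has one CDS, and each CDS one XP_ protein.
    if not c["mrnas"] == c["cdss"] == c["model_xp"]:
        failures.append("mRNA, CDS and XP_ counts differ")
    # Assumption: tRNAs are the only non-coding RNAs without an XR_ accession.
    if c["non_coding_rnas"] - c["model_xr"] != c["trnas"]:
        failures.append("non-coding RNAs without XR_ do not match the tRNA count")
    return failures


print(check_feature_counts(counts) or "all checks pass")
```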

Detailed reports

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	14,361	12,233	2,888	56	600,156
All transcripts	26,569	4,155	3,122	56	57,046
mRNA	24,012	4,406	3,317	199	57,046
misc_RNA	408	4,875	3,675	248	26,781
tRNA	396	74	73	71	87
lncRNA	776	2,087	1,468	137	16,140
snoRNA	10	111	106	72	203
snRNA	35	132	127	56	195
rRNA	932	1,010	158	119	4,110
Single-exon transcripts	1,127	1,971	1,366	237	19,028
coding transcripts (NM_/XM_)	1,127	1,971	1,366	237	19,028
CDSs	24,012	2,439	1,686	111	55,746
Exons	73,326	649	299	4	23,700
in coding transcripts (NM_/XM_)	70,827	645	298	4	23,700
in non-coding transcripts (NR_/XR_)	4,008	648	265	10	17,480
Introns	59,058	3,400	144	30	582,452
in coding transcripts (NM_/XM_)	57,401	3,346	145	30	489,093
in non-coding transcripts (NR_/XR_)	3,094	4,354	144	31	582,452

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	1.87	1	1	50
Number of exons per transcript	6.73	5	1	76
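Per-gene and per-transcript statistics like these can be derived from the parent-child links in a GFF3 annotation file. The sketch below is an illustration under stated assumptions, not the pipeline's own code: the set of transcript feature types is assumed (in particular, that misc_RNA appears as "transcript" in NCBI GFF3), each feature is assumed to have a single Parent, and the toy records are hypothetical.

```python
from collections import Counter
from statistics import mean, median

# Assumed GFF3 transcript types; check them against the file you use.
TRANSCRIPT_TYPES = {"mRNA", "lnc_RNA", "tRNA", "rRNA", "snoRNA", "snRNA", "transcript"}


def parse_attributes(column9: str) -> dict:
    """Split a GFF3 attribute column such as 'ID=x;Parent=y' into a dict."""
    return dict(kv.split("=", 1) for kv in column9.strip().split(";") if "=" in kv)


def count_children(gff_lines):
    """Count transcripts per gene and exons per transcript from GFF3 lines."""
    transcript_parent = {}  # transcript ID -> gene ID
    exon_parents = []       # Parent attribute of every exon
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attributes(cols[8])
        if cols[2] in TRANSCRIPT_TYPES:
            transcript_parent[attrs["ID"]] = attrs["Parent"]
        elif cols[2] == "exon":
            exon_parents.append(attrs["Parent"])
    transcripts_per_gene = Counter(transcript_parent.values())
    # Ignore exons attached directly to genes (e.g. non-transcribed pseudogenes).
    exons_per_transcript = Counter(p for p in exon_parents if p in transcript_parent)
    return transcripts_per_gene, exons_per_transcript


def summarize(counter):
    """Mean (rounded to 2 decimals), median, min and max of the counts."""
    values = list(counter.values())
    return round(mean(values), 2), median(values), min(values), max(values)


def row(ftype, attrs):
    """Build a hypothetical tab-separated GFF3 line for the toy example."""
    return "\t".join(["chr2", "demo", ftype, "1", "100", ".", "+", ".", attrs])


toy = [
    row("gene", "ID=g1"), row("mRNA", "ID=t1;Parent=g1"), row("mRNA", "ID=t2;Parent=g1"),
    *([row("exon", "Parent=t1")] * 3), *([row("exon", "Parent=t2")] * 2),
    row("gene", "ID=g2"), row("lnc_RNA", "ID=t3;Parent=g2"), row("exon", "Parent=t3"),
]
tpg, ept = count_children(toy)
print("transcripts per gene:", summarize(tpg))
print("exons per transcript:", summarize(ept))
```

Genes without any transcript feature do not appear in the counter, so the denominator here is transcribed genes only.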

BUSCO analysis of gene annotation

BUSCO v4.1.4 was run in "protein" mode with the diptera_odb10 lineage dataset on the annotated gene set, using the longest protein of each gene. Results are reported for the gene set from the primary assembly unit, in BUSCO notation.
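Selecting one protein per gene keeps genes with many isoforms from being counted several times. A minimal sketch of a longest-isoform filter, assuming proteins are already available as (gene, protein, sequence) tuples; the identifiers are hypothetical, the tie-breaking rule is an assumption, and the BUSCO invocation itself is not shown:

```python
def longest_protein_per_gene(records):
    """Keep the longest protein for each gene.

    records: iterable of (gene_id, protein_id, sequence) tuples.
    On ties the first protein seen is kept (an assumed tie-breaking rule).
    """
    best = {}
    for gene_id, protein_id, seq in records:
        kept = best.get(gene_id)
        if kept is None or len(seq) > len(kept[1]):
            best[gene_id] = (protein_id, seq)
    return best


def to_fasta(best):
    """Render the selected proteins as FASTA text, one record per gene."""
    return "".join(f">{pid}\n{seq}\n" for pid, seq in best.values())


# Hypothetical gene and protein identifiers.
records = [
    ("LOC1", "XP_0001.1", "MKTAYIAKQR"),
    ("LOC1", "XP_0002.1", "MKTAYIAKQRQISFVKSHFSRQ"),
    ("LOC2", "XP_0003.1", "MSDNE"),
]
print(to_fasta(longest_protein_per_gene(records)), end="")
```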

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the Drosophila melanogaster known RefSeq proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 12,410 coding genes, 9,745 genes had a protein with an alignment covering 50% or more of the query, and 3,035 had an alignment covering 95% or more of the query.

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.
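As a worked example of these definitions, the sketch below computes coverage from the start and end of a single alignment. It assumes 1-based, inclusive coordinates such as the qstart/qend and sstart/send columns of BLAST+ tabular output, and a single HSP; alignments with several HSPs would need their intervals merged first. The protein lengths are hypothetical.

```python
def coverage(start: int, end: int, length: int) -> float:
    """Percent of a sequence of `length` residues spanned by one alignment.

    `start` and `end` are 1-based and inclusive; their order does not matter.
    """
    lo, hi = min(start, end), max(start, end)
    return 100.0 * (hi - lo + 1) / length


# Hypothetical hit: a 400-aa annotated protein (query) aligned over residues
# 21-400 to a 500-aa Drosophila protein (target) over residues 1-480.
print(coverage(21, 400, 400))  # query coverage: 95.0
print(coverage(1, 480, 500))   # target coverage: 96.0
```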

Below is a cumulative graph displaying the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline were added to the graph.

[Graph not reproduced: cumulative number of genes by coverage threshold. Query: annotated proteins; target: Drosophila melanogaster known RefSeq proteins.]
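Each point on such a cumulative curve is a count of genes whose best alignment reaches a threshold; the 9,745 and 3,035 figures above are the values at 50% and 95% query coverage. A minimal sketch, assuming per-protein coverages have already been computed and keyed by gene (the gene names and values are hypothetical):

```python
def best_coverage_per_gene(hits):
    """hits: iterable of (gene_id, coverage_percent); keep each gene's best."""
    best = {}
    for gene_id, cov in hits:
        best[gene_id] = max(cov, best.get(gene_id, 0.0))
    return best


def cumulative_counts(best, thresholds):
    """Number of genes whose best coverage is at or above each threshold."""
    return {t: sum(1 for cov in best.values() if cov >= t) for t in thresholds}


hits = [("geneA", 40.0), ("geneA", 60.0), ("geneB", 96.0), ("geneC", 10.0)]
print(cumulative_counts(best_coverage_per_gene(hits), [50, 95]))  # {50: 2, 95: 1}
```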

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker (if calculated), for each assembly. RepeatMasker results are only calculated for organisms with complete Dfam HMM model collections.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with WindowMasker
AcolN3	GCF_943734685.1	25.86%
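A masked percentage like this can be estimated from a soft-masked FASTA file, where masked bases are written in lowercase. This is a sketch under two assumptions not stated in the report: that the masked genome is soft-masked, and that the denominator includes every sequence character, N gaps included. The toy sequences are hypothetical.

```python
def percent_soft_masked(fasta_lines):
    """Percent of sequence characters that are lowercase (soft-masked).

    Assumes masked bases are lowercase; every sequence character,
    including N gaps, counts toward the denominator.
    """
    total = masked = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        total += len(line)
        masked += sum(1 for base in line if base.islower())
    return 100.0 * masked / total if total else 0.0


toy = [">chr2 toy", "ACGTacgtAC", ">chr3 toy", "GGccccccTT"]
print(f"{percent_soft_masked(toy):.2f}%")
```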

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez Nucleotide, Entrez Protein, and SRA, and aligned to the genome.

Transcript alignments

The alignments of the following transcripts with Splign were used for gene prediction:

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species GenBank	9	9 (100.00%)	9 (100.00%)	99.66%	100.00%

RNA-Seq alignments

The alignments of the following RNA-Seq reads with STAR were also used for gene prediction:

Alignment statistics by sample (SAME, SAMN, SAMD, DRS)

Sample ID	Publication	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	NA	Aggregate of all aligned samples	4,620,287,806	65%	11%	76,510
SAMEA104557110	NA	whole organism (Anopheles coluzzii, female, SAMEA104557110)	44,807,656	23%	3%	31,628
SAMEA104557111	NA	whole organism (Anopheles coluzzii, female, SAMEA104557111)	57,263,783	21%	3%	30,368
SAMEA104557112	NA	whole organism (Anopheles coluzzii, female, SAMEA104557112)	55,824,736	14%	3%	26,430
SAMEA104557113	NA	whole organism (Anopheles coluzzii, female, SAMEA104557113)	48,011,734	19%	3%	28,377
SAMEA104557114	NA	whole organism (Anopheles coluzzii, female, SAMEA104557114)	46,572,148	23%	3%	29,407
SAMEA104557115	NA	whole organism (Anopheles coluzzii, female, SAMEA104557115)	57,675,707	15%	3%	27,922
SAMEA104557116	NA	whole organism (Anopheles coluzzii, female, SAMEA104557116)	46,370,406	18%	3%	26,400
SAMEA104557118	NA	whole organism (Anopheles coluzzii, female, SAMEA104557118)	61,379,306	16%	3%	30,481
SAMEA104557120	NA	whole organism (Anopheles coluzzii, female, SAMEA104557120)	65,998,823	13%	3%	29,405
SAMEA104557121	NA	whole organism (Anopheles coluzzii, female, SAMEA104557121)	51,947,039	14%	3%	27,586
SAMEA104557122	NA	whole organism (Anopheles coluzzii, female, SAMEA104557122)	48,182,249	16%	3%	29,182
SAMEA104557123	NA	whole organism (Anopheles coluzzii, female, SAMEA104557123)	66,458,476	20%	3%	32,912
SAMN00009554	NA	gastric caeca (Anopheles coluzzii, SAMN00009554)	218,732	49%	13%	2,990
SAMN00009555	NA	malpighian tubules (Anopheles coluzzii, SAMN00009555)	202,442	3%	19%	743
SAMN02928775	26945667	18 hr larvae, distilled water exposed (Anopheles coluzzii, SAMN02928775)	14,305,962	60%	10%	46,745
SAMN02928776	26945667	18 hr larvae, distilled water exposed (Anopheles coluzzii, SAMN02928776)	12,522,440	85%	11%	48,483
SAMN02928777	26945667	18 hr larvae, distilled water exposed (Anopheles coluzzii, SAMN02928777)	21,602,803	79%	11%	52,567
SAMN02928778	26945667	18 hr larvae, distilled water exposed (Anopheles coluzzii, SAMN02928778)	27,724,958	69%	11%	53,276
SAMN02928779	26945667	18 hr larvae, saltwater exposed (Anopheles coluzzii, SAMN02928779)	15,409,037	76%	12%	49,893
SAMN02928780	26945667	18 hr larvae, saltwater exposed (Anopheles coluzzii, SAMN02928780)	12,483,546	82%	12%	49,298
SAMN02928781	26945667	18 hr larvae, saltwater exposed (Anopheles coluzzii, SAMN02928781)	14,726,767	79%	10%	48,972
SAMN02928782	26945667	18 hr larvae, saltwater exposed (Anopheles coluzzii, SAMN02928782)	18,791,839	39%	9%	44,415
SAMN02928791	26945667	66 hr larvae, distilled water exposed (Anopheles coluzzii, SAMN02928791)	15,556,301	84%	13%	47,987
SAMN02928792	26945667	66 hr larvae, distilled water exposed (Anopheles coluzzii, SAMN02928792)	8,323,916	77%	12%	41,016
SAMN02928793	26945667	66 hr larvae, distilled water exposed (Anopheles coluzzii, SAMN02928793)	10,596,461	83%	12%	43,568
SAMN02928794	26945667	66 hr larvae, distilled water exposed (Anopheles coluzzii, SAMN02928794)	22,211,086	80%	12%	48,807
SAMN02928795	26945667	66 hr larvae, saltwater exposed (Anopheles coluzzii, SAMN02928795)	18,579,233	51%	11%	44,048
SAMN02928796	26945667	66 hr larvae, saltwater exposed (Anopheles coluzzii, SAMN02928796)	31,150,673	80%	12%	53,029
SAMN02928797	26945667	66 hr larvae, saltwater exposed (Anopheles coluzzii, SAMN02928797)	21,697,148	84%	12%	50,036
SAMN02928798	26945667	66 hr larvae, saltwater exposed (Anopheles coluzzii, SAMN02928798)	20,089,404	79%	12%	48,684
SAMN04252494	26945667	pool of whole body of 35-50 larvae (Anopheles coluzzii, 18 h post-hatch, SAMN04252494)	52,146,790	89%	13%	57,852
SAMN04252495	26945667	pool of whole body of 35-50 larvae (Anopheles coluzzii, 18 h post-hatch, SAMN04252495)	46,397,470	90%	14%	56,652
SAMN04252496	26945667	pool of whole body of 35-50 larvae (Anopheles coluzzii, 18 h post-hatch, SAMN04252496)	58,317,152	89%	14%	58,099
SAMN04252497	26945667	pool of whole body of 35-50 larvae (Anopheles coluzzii, 18 h post-hatch, SAMN04252497)	46,305,624	88%	14%	56,807
SAMN04252498	26945667	pool of whole body of 35-50 larvae (Anopheles coluzzii, 18 h post-hatch, SAMN04252498)	45,953,814	89%	14%	56,382
SAMN04252499	26945667	pool of whole body of 35-50 larvae (Anopheles coluzzii, 18 h post-hatch, SAMN04252499)	43,938,968	86%	14%	55,589
SAMN04252506	26945667	pool of whole body of 35-50 larvae (Anopheles coluzzii, 18 h post-hatch, SAMN04252506)	51,829,120	85%	13%	57,029
SAMN04252507	26945667	pool of whole body of 35-50 larvae (Anopheles coluzzii, 18 h post-hatch, SAMN04252507)	48,379,230	88%	13%	57,561
SAMN04252508	26945667	pool of whole body of 35-50 larvae (Anopheles coluzzii, 18 h post-hatch, SAMN04252508)	71,779,478	85%	14%	59,066
SAMN04252515	26945667	pool of whole body of 35-50 larvae (Anopheles coluzzii, 18 h post-hatch, SAMN04252515)	66,870,612	89%	13%	58,711
SAMN04252516	26945667	pool of whole body of 35-50 larvae (Anopheles coluzzii, 18 h post-hatch, SAMN04252516)	66,583,032	89%	14%	58,721
SAMN04252517	26945667	pool of whole body of 35-50 larvae (Anopheles coluzzii, 18 h post-hatch, SAMN04252517)	51,591,496	87%	14%	55,814
SAMN04252519	26945667	pool of whole body of 35-50 larvae (Anopheles coluzzii, 18 h post-hatch, SAMN04252519)	53,025,418	89%	14%	58,132
SAMN04252520	26945667	pool of whole body of 35-50 larvae (Anopheles coluzzii, 18 h post-hatch, SAMN04252520)	59,871,106	84%	14%	58,245
SAMN04252521	26945667	pool of whole body of 35-50 larvae (Anopheles coluzzii, 18 h post-hatch, SAMN04252521)	59,517,230	89%	13%	58,450
SAMN04252522	26945667	pool of whole body of 35-50 larvae (Anopheles coluzzii, 18 h post-hatch, SAMN04252522)	59,531,320	89%	13%	58,293
SAMN04252523	26945667	pool of whole body of 35-50 larvae (Anopheles coluzzii, 18 h post-hatch, SAMN04252523)	46,363,570	90%	14%	56,606
SAMN04252524	26945667	pool of whole body of 35-50 larvae (Anopheles coluzzii, 18 h post-hatch, SAMN04252524)	52,344,298	83%	14%	56,826
SAMN06920435	NA	Midgut (Anopheles coluzzii, Adults, 3 days post emergence, female, SAMN06920435)	2,322,892	83%	12%	17,959
SAMN06920436	NA	Midgut (Anopheles coluzzii, Adults, 3 days post emergence, female, SAMN06920436)	24,984,208	83%	13%	32,348
SAMN06920437	NA	Midgut (Anopheles coluzzii, Adults, 3 days post emergence, female, SAMN06920437)	114,818,402	18%	7%	32,764
SAMN06920438	NA	Midgut (Anopheles coluzzii, Adults, 3 days post emergence, female, SAMN06920438)	21,692,192	85%	12%	35,104
SAMN06920439	NA	Midgut (Anopheles coluzzii, Adults, 3 days post emergence, female, SAMN06920439)	50,918,068	76%	12%	35,072
SAMN06920440	NA	Midgut (Anopheles coluzzii, Adults, 3 days post emergence, female, SAMN06920440)	19,679,280	86%	12%	34,061
SAMN06920441	NA	Midgut (Anopheles coluzzii, Adults, 3 days post emergence, female, SAMN06920441)	61,476,208	66%	11%	40,660
SAMN06920442	NA	Midgut (Anopheles coluzzii, Adults, 3 days post emergence, female, SAMN06920442)	26,355,322	86%	12%	36,093
SAMN06920443	NA	Midgut (Anopheles coluzzii, Adults, 3 days post emergence, female, SAMN06920443)	27,767,002	84%	11%	31,833
SAMN06920444	NA	Midgut (Anopheles coluzzii, Adults, 3 days post emergence, female, SAMN06920444)	6,249,572	80%	10%	20,060
SAMN06920445	NA	Midgut (Anopheles coluzzii, Adults, 3 days post emergence, female, SAMN06920445)	25,152,226	83%	10%	22,021
SAMN06920446	NA	Midgut (Anopheles coluzzii, Adults, 3 days post emergence, female, SAMN06920446)	19,143,760	82%	10%	19,982
SAMN06920447	NA	Midgut (Anopheles coluzzii, Adults, 3 days post emergence, female, SAMN06920447)	109,381,210	83%	11%	40,146
SAMN06920448	NA	Midgut (Anopheles coluzzii, Adults, 3 days post emergence, female, SAMN06920448)	17,256,910	81%	10%	27,907
SAMN06920449	NA	Midgut (Anopheles coluzzii, Adults, 3 days post emergence, female, SAMN06920449)	84,947,290	86%	10%	37,056
SAMN06920450	NA	Midgut (Anopheles coluzzii, Adults, 3 days post emergence, female, SAMN06920450)	56,605,264	77%	11%	30,299
SAMN06920451	NA	Midgut (Anopheles coluzzii, Adults, 4 days post emergence, female, SAMN06920451)	85,436,812	84%	11%	43,865
SAMN06920452	NA	Midgut (Anopheles coluzzii, Adults, 4 days post emergence, female, SAMN06920452)	92,561,464	90%	11%	40,914
SAMN06920453	NA	Midgut (Anopheles coluzzii, Adults, 4 days post emergence, female, SAMN06920453)	14,790,060	85%	11%	26,518
SAMN06920454	NA	Midgut (Anopheles coluzzii, Adults, 4 days post emergence, female, SAMN06920454)	36,437,358	87%	11%	32,159
SAMN06920455	NA	Midgut (Anopheles coluzzii, Adults, 4 days post emergence, female, SAMN06920455)	94,178,870	82%	12%	41,918
SAMN06920456	NA	Midgut (Anopheles coluzzii, Adults, 4 days post emergence, female, SAMN06920456)	112,859,658	80%	9%	32,131
SAMN06920457	NA	Midgut (Anopheles coluzzii, Adults, 4 days post emergence, female, SAMN06920457)	145,618,408	82%	11%	42,230
SAMN06920458	NA	Midgut (Anopheles coluzzii, Adults, 4 days post emergence, female, SAMN06920458)	32,264,254	87%	10%	29,880
SAMN06920459	NA	Midgut (Anopheles coluzzii, Adults, 6 days post emergence, female, SAMN06920459)	9,599,708	85%	12%	29,697
SAMN06920460	NA	Midgut (Anopheles coluzzii, Adults, 6 days post emergence, female, SAMN06920460)	8,811,814	90%	11%	30,127
SAMN06920461	NA	Midgut (Anopheles coluzzii, Adults, 6 days post emergence, female, SAMN06920461)	19,410,728	84%	11%	35,591
SAMN06920462	NA	Midgut (Anopheles coluzzii, Adults, 6 days post emergence, female, SAMN06920462)	19,172,728	87%	11%	34,265
SAMN06920463	NA	Midgut (Anopheles coluzzii, Adults, 6 days post emergence, female, SAMN06920463)	1,654,810	84%	11%	19,155
SAMN06920464	NA	Midgut (Anopheles coluzzii, Adults, 6 days post emergence, female, SAMN06920464)	26,437,064	85%	12%	34,408
SAMN06920465	NA	Midgut (Anopheles coluzzii, Adults, 6 days post emergence, female, SAMN06920465)	10,709,506	84%	11%	31,514
SAMN06920466	NA	Midgut (Anopheles coluzzii, Adults, 6 days post emergence, female, SAMN06920466)	76,569,666	87%	11%	43,370
SAMN06920467	NA	Midgut (Anopheles coluzzii, Adults, 7 days post emergence, female, SAMN06920467)	7,916,350	90%	10%	23,221
SAMN06920468	NA	Midgut (Anopheles coluzzii, Adults, 7 days post emergence, female, SAMN06920468)	19,637,792	75%	10%	18,864
SAMN06920469	NA	Midgut (Anopheles coluzzii, Adults, 7 days post emergence, female, SAMN06920469)	11,523,930	87%	11%	26,343
SAMN06920470	NA	Midgut (Anopheles coluzzii, Adults, 7 days post emergence, female, SAMN06920470)	23,898,618	87%	11%	28,007
SAMN06920471	NA	Midgut (Anopheles coluzzii, Adults, 7 days post emergence, female, SAMN06920471)	98,209,324	90%	11%	45,068
SAMN06920472	NA	Midgut (Anopheles coluzzii, Adults, 7 days post emergence, female, SAMN06920472)	142,449,520	89%	11%	43,255
SAMN06920473	NA	Midgut (Anopheles coluzzii, Adults, 7 days post emergence, female, SAMN06920473)	13,250,016	87%	11%	29,285
SAMN06920474	NA	Midgut (Anopheles coluzzii, Adults, 7 days post emergence, female, SAMN06920474)	14,347,338	88%	11%	20,631
SAMN07222166	NA	malpighian tubules (Anopheles coluzzii, female, SAMN07222166)	191,639,262
SAMN07222587	NA	malpighian tubules (Anopheles coluzzii, female, SAMN07222587)	188,024,114
SAMN07571904	NA	Antennae (Anopheles coluzzii, 5-7 day, female, SAMN07571904)	53,462,674	80%	4%	45,037
SAMN07571905	NA	Antennae (Anopheles coluzzii, 5-7 day, female, SAMN07571905)	35,277,933	82%	17%	48,946
SAMN07571906	NA	Maxillary Palp (Anopheles coluzzii, 5-7 day, female, SAMN07571906)	49,088,124	77%	4%	45,378
SAMN07571907	NA	Maxillary Palp (Anopheles coluzzii, 5-7 day, female, SAMN07571907)	46,810,073	81%	4%	44,166
SAMN09384610	29860336	whole pupae (Anopheles coluzzii, female, SAMN09384610)	54,677,678	80%	16%	58,361
SAMN09384611	29860336	whole pupae (Anopheles coluzzii, female, SAMN09384611)	54,525,428	83%	16%	58,532
SAMN09384612	29860336	whole pupae (Anopheles coluzzii, male, SAMN09384612)	49,031,431	85%	16%	58,526
SAMN09384613	29860336	whole pupae (Anopheles coluzzii, male, SAMN09384613)	52,125,424	80%	16%	59,525
SAMN09714764	30623175	Testes (Anopheles coluzzii, SAMN09714764)	9,484,578	73%	10%	40,072
SAMN09714765	30623175	Testes (Anopheles coluzzii, SAMN09714765)	24,534,146	72%	10%	46,037
SAMN09714766	30623175	Testes (Anopheles coluzzii, SAMN09714766)	48,246,186	72%	9%	52,049
SAMN09714767	30623175	Male Accessory Glands (MAGs) (Anopheles coluzzii, SAMN09714767)	22,581,554	69%	12%	45,248
SAMN09714768	30623175	Male Accessory Glands (MAGs) (Anopheles coluzzii, SAMN09714768)	1,340,512	70%	12%	17,439
SAMN09714769	30623175	Male Accessory Glands (MAGs) (Anopheles coluzzii, SAMN09714769)	35,512,548	67%	11%	49,126

Alignment statistics by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
ERR2275802	ERX2327184	ERP106575	SAMEA104557110	44,807,656	23%	3%
ERR2275803	ERX2327185	ERP106575	SAMEA104557111	57,263,783	21%	3%
ERR2275804	ERX2327186	ERP106575	SAMEA104557112	55,824,736	14%	3%
ERR2275805	ERX2327187	ERP106575	SAMEA104557113	48,011,734	19%	3%
ERR2275806	ERX2327188	ERP106575	SAMEA104557114	46,572,148	23%	3%
ERR2275807	ERX2327189	ERP106575	SAMEA104557115	57,675,707	15%	3%
ERR2275808	ERX2327190	ERP106575	SAMEA104557116	46,370,406	18%	3%
ERR2275810	ERX2327192	ERP106575	SAMEA104557118	61,379,306	16%	3%
ERR2275812	ERX2327194	ERP106575	SAMEA104557120	65,998,823	13%	3%
ERR2275813	ERX2327195	ERP106575	SAMEA104557121	51,947,039	14%	3%
ERR2275814	ERX2327196	ERP106575	SAMEA104557122	48,182,249	16%	3%
ERR2275815	ERX2327197	ERP106575	SAMEA104557123	66,458,476	20%	3%
SRR037070	SRX017282	SRP002055	SAMN00009554	218,732	49%	13%
SRR037071	SRX017287	SRP002055	SAMN00009555	202,442	3%	19%
SRR1521746	SRX658333	SRP044683	SAMN02928775	14,305,962	60%	10%
SRR1521747	SRX658334	SRP044683	SAMN02928776	12,522,440	85%	11%
SRR1521748	SRX658335	SRP044683	SAMN02928777	21,602,803	79%	11%
SRR1521749	SRX658336	SRP044683	SAMN02928778	27,724,958	69%	11%
SRR1521750	SRX658337	SRP044683	SAMN02928779	15,409,037	76%	12%
SRR1521751	SRX658338	SRP044683	SAMN02928780	12,483,546	82%	12%
SRR1521752	SRX658339	SRP044683	SAMN02928781	14,726,767	79%	10%
SRR1521753	SRX658340	SRP044683	SAMN02928782	18,791,839	39%	9%
SRR1521762	SRX658349	SRP044683	SAMN02928791	15,556,301	84%	13%
SRR1521763	SRX658350	SRP044683	SAMN02928792	8,323,916	77%	12%
SRR1521764	SRX658351	SRP044683	SAMN02928793	10,596,461	83%	12%
SRR1521765	SRX658352	SRP044683	SAMN02928794	22,211,086	80%	12%
SRR1521766	SRX658353	SRP044683	SAMN02928795	18,579,233	51%	11%
SRR1521767	SRX658354	SRP044683	SAMN02928796	31,150,673	80%	12%
SRR1521768	SRX658355	SRP044683	SAMN02928797	21,697,148	84%	12%
SRR1521769	SRX658356	SRP044683	SAMN02928798	20,089,404	79%	12%
SRR2932595	SRX1424945	SRP065966	SAMN04252494	52,146,790	89%	13%
SRR2932596	SRX1424946	SRP065966	SAMN04252495	46,397,470	90%	14%
SRR2932597	SRX1424947	SRP065966	SAMN04252496	58,317,152	89%	14%
SRR2932598	SRX1424948	SRP065966	SAMN04252497	46,305,624	88%	14%
SRR2932599	SRX1424949	SRP065966	SAMN04252498	45,953,814	89%	14%
SRR2932600	SRX1424950	SRP065966	SAMN04252499	9,751,376	85%	14%
SRR2932601	SRX1424950	SRP065966	SAMN04252499	8,938,844	86%	14%
SRR2932602	SRX1424950	SRP065966	SAMN04252499	7,735,704	86%	14%
SRR2932603	SRX1424950	SRP065966	SAMN04252499	9,239,328	86%	14%
SRR2932604	SRX1424950	SRP065966	SAMN04252499	8,273,716	86%	14%
SRR2932571	SRX1424921	SRP065966	SAMN04252506	51,829,120	85%	13%
SRR2932572	SRX1424922	SRP065966	SAMN04252507	48,379,230	88%	13%
SRR2932573	SRX1424923	SRP065966	SAMN04252508	71,779,478	85%	14%
SRR2932574	SRX1424924	SRP065966	SAMN04252515	66,870,612	89%	13%
SRR2932575	SRX1424925	SRP065966	SAMN04252516	66,583,032	89%	14%
SRR2932576	SRX1424926	SRP065966	SAMN04252517	51,591,496	87%	14%
SRR2932583	SRX1424933	SRP065966	SAMN04252519	53,025,418	89%	14%
SRR2932584	SRX1424934	SRP065966	SAMN04252520	59,871,106	84%	14%
SRR2932585	SRX1424935	SRP065966	SAMN04252521	59,517,230	89%	13%
SRR2932586	SRX1424936	SRP065966	SAMN04252522	59,531,320	89%	13%
SRR2932587	SRX1424937	SRP065966	SAMN04252523	46,363,570	90%	14%
SRR2932588	SRX1424938	SRP065966	SAMN04252524	52,344,298	83%	14%
SRR5526041	SRX2795484	SRP106793	SAMN06920435	2,322,892	83%	12%
SRR5526042	SRX2795483	SRP106793	SAMN06920436	24,984,208	83%	13%
SRR5526039	SRX2795486	SRP106793	SAMN06920437	114,818,402	18%	7%
SRR5526040	SRX2795485	SRP106793	SAMN06920438	21,692,192	85%	12%
SRR5526045	SRX2795480	SRP106793	SAMN06920439	50,918,068	76%	12%
SRR5526046	SRX2795479	SRP106793	SAMN06920440	19,679,280	86%	12%
SRR5526043	SRX2795482	SRP106793	SAMN06920441	61,476,208	66%	11%
SRR5526044	SRX2795481	SRP106793	SAMN06920442	26,355,322	86%	12%
SRR5526048	SRX2795477	SRP106793	SAMN06920443	27,767,002	84%	11%
SRR5526049	SRX2795476	SRP106793	SAMN06920444	6,249,572	80%	10%
SRR5526053	SRX2795472	SRP106793	SAMN06920445	25,152,226	83%	10%
SRR5526054	SRX2795471	SRP106793	SAMN06920446	19,143,760	82%	10%
SRR5526051	SRX2795474	SRP106793	SAMN06920447	109,381,210	83%	11%
SRR5526052	SRX2795473	SRP106793	SAMN06920448	17,256,910	81%	10%
SRR5526057	SRX2795468	SRP106793	SAMN06920449	84,947,290	86%	10%
SRR5526058	SRX2795467	SRP106793	SAMN06920450	56,605,264	77%	11%
SRR5526055	SRX2795470	SRP106793	SAMN06920451	85,436,812	84%	11%
SRR5526056	SRX2795469	SRP106793	SAMN06920452	92,561,464	90%	11%
SRR5526059	SRX2795466	SRP106793	SAMN06920453	14,790,060	85%	11%
SRR5526060	SRX2795465	SRP106793	SAMN06920454	36,437,358	87%	11%
SRR5526068	SRX2795457	SRP106793	SAMN06920455	94,178,870	82%	12%
SRR5526067	SRX2795458	SRP106793	SAMN06920456	112,859,658	80%	9%
SRR5526065	SRX2795460	SRP106793	SAMN06920457	145,618,408	82%	11%
SRR5526066	SRX2795459	SRP106793	SAMN06920458	32,264,254	87%	10%
SRR5526072	SRX2795453	SRP106793	SAMN06920459	9,599,708	85%	12%
SRR5526071	SRX2795454	SRP106793	SAMN06920460	8,811,814	90%	11%
SRR5526070	SRX2795455	SRP106793	SAMN06920461	19,410,728	84%	11%
SRR5526069	SRX2795456	SRP106793	SAMN06920462	19,172,728	87%	11%
SRR5526064	SRX2795461	SRP106793	SAMN06920463	1,654,810	84%	11%
SRR5526063	SRX2795462	SRP106793	SAMN06920464	26,437,064	85%	12%
SRR5526078	SRX2795447	SRP106793	SAMN06920465	10,709,506	84%	11%
SRR5526076	SRX2795449	SRP106793	SAMN06920466	76,569,666	87%	11%
SRR5526047	SRX2795478	SRP106793	SAMN06920467	7,916,350	90%	10%
SRR5526050	SRX2795475	SRP106793	SAMN06920468	19,637,792	75%	10%
SRR5526073	SRX2795452	SRP106793	SAMN06920469	11,523,930	87%	11%
SRR5526074	SRX2795451	SRP106793	SAMN06920470	23,898,618	87%	11%
SRR5526075	SRX2795450	SRP106793	SAMN06920471	98,209,324	90%	11%
SRR5526077	SRX2795448	SRP106793	SAMN06920472	142,449,520	89%	11%
SRR5526061	SRX2795464	SRP106793	SAMN06920473	13,250,016	87%	11%
SRR5526062	SRX2795463	SRP106793	SAMN06920474	14,347,338	88%	11%
SRR5988577	SRX3144345	SRP116401	SAMN07571904	53,462,674	80%	4%
SRR5988576	SRX3144346	SRP116401	SAMN07571905	35,277,933	82%	17%
SRR5988575	SRX3144347	SRP116401	SAMN07571906	49,088,124	77%	4%
SRR5988574	SRX3144348	SRP116401	SAMN07571907	46,810,073	81%	4%
SRR6880004	SRX3832578	SRP136253	SAMN07222166	14,648,277
SRR6880003	SRX3832579	SRP136253	SAMN07222166	13,983,690
SRR6880002	SRX3832580	SRP136253	SAMN07222166	15,262,844
SRR6880001	SRX3832581	SRP136253	SAMN07222166	15,300,447
SRR6880000	SRX3832582	SRP136253	SAMN07222166	18,939,338
SRR6879999	SRX3832583	SRP136253	SAMN07222166	18,990,510
SRR6879998	SRX3832584	SRP136253	SAMN07222166	15,475,847
SRR6879997	SRX3832585	SRP136253	SAMN07222166	14,843,235
SRR6879994	SRX3832588	SRP136253	SAMN07222166	14,713,534
SRR6879993	SRX3832589	SRP136253	SAMN07222166	23,566,494
SRR6879992	SRX3832590	SRP136253	SAMN07222166	14,545,609
SRR6879991	SRX3832591	SRP136253	SAMN07222166	11,369,437
SRR6879996	SRX3832586	SRP136253	SAMN07222587	18,214,218
SRR6879995	SRX3832587	SRP136253	SAMN07222587	17,847,367
SRR6879990	SRX3832592	SRP136253	SAMN07222587	16,761,453
SRR6879989	SRX3832593	SRP136253	SAMN07222587	15,532,723
SRR6879988	SRX3832594	SRP136253	SAMN07222587	18,818,339
SRR6879987	SRX3832595	SRP136253	SAMN07222587	16,667,545
SRR6879986	SRX3832596	SRP136253	SAMN07222587	16,778,108
SRR6879985	SRX3832597	SRP136253	SAMN07222587	22,014,330
SRR6879984	SRX3832598	SRP136253	SAMN07222587	15,946,996
SRR6879983	SRX3832599	SRP136253	SAMN07222587	9,761,894
SRR6879982	SRX3832600	SRP136253	SAMN07222587	7,130,795
SRR6879981	SRX3832601	SRP136253	SAMN07222587	12,550,346
SRR7345498	SRX4218907	SRP150554	SAMN09384610	54,677,678	80%	16%
SRR7345497	SRX4218908	SRP150554	SAMN09384611	54,525,428	83%	16%
SRR7345500	SRX4218905	SRP150554	SAMN09384612	49,031,431	85%	16%
SRR7345499	SRX4218906	SRP150554	SAMN09384613	52,125,424	80%	16%
SRR7591647	SRX4456862	SRP155215	SAMN09714764	9,484,578	73%	10%
SRR7591646	SRX4456861	SRP155215	SAMN09714765	24,534,146	72%	10%
SRR7591645	SRX4456860	SRP155215	SAMN09714766	48,246,186	72%	9%
SRR7591644	SRX4456859	SRP155215	SAMN09714767	22,581,554	69%	12%
SRR7591643	SRX4456858	SRP155215	SAMN09714768	1,340,512	70%	12%
SRR7591642	SRX4456857	SRP155215	SAMN09714769	35,512,548	67%	11%
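The per-sample rows are aggregates of the per-run rows: for example, the five runs of SAMN04252499 sum to the 43,938,968 reads listed for that sample. The sketch below reproduces that roll-up from values copied from the run table. The read-weighted percent aligned is an estimate because the table's percentages are rounded, and runs without alignment statistics (such as those of SAMN07222166) would need to be left out of the weighted average.

```python
from collections import defaultdict

# The five runs of sample SAMN04252499, copied from the run table above:
# (run, sample, number of reads, percent aligned reads)
RUNS = [
    ("SRR2932600", "SAMN04252499", 9_751_376, 85),
    ("SRR2932601", "SAMN04252499", 8_938_844, 86),
    ("SRR2932602", "SAMN04252499", 7_735_704, 86),
    ("SRR2932603", "SAMN04252499", 9_239_328, 86),
    ("SRR2932604", "SAMN04252499", 8_273_716, 86),
]


def aggregate_by_sample(runs):
    """Sum reads per sample and compute a read-weighted percent aligned."""
    reads = defaultdict(int)
    aligned = defaultdict(float)
    for _run, sample, n_reads, pct_aligned in runs:
        reads[sample] += n_reads
        # Percentages in the table are rounded, so this is an estimate.
        aligned[sample] += n_reads * pct_aligned / 100.0
    return {s: (reads[s], 100.0 * aligned[s] / reads[s]) for s in reads}


for sample, (n, pct) in aggregate_by_sample(RUNS).items():
    print(f"{sample}\t{n:,}\t{pct:.0f}%")  # SAMN04252499	43,938,968	86%
```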

Protein alignments

The alignments of the following proteins with ProSplign were used for gene prediction:

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species GenBank	9	9 (100.00%)	9 (100.00%)	98.77%	99.77%
Zeugodacus cucurbitae high-quality model RefSeq (XP_)	11,469	9,188 (80.11%)	9,188 (80.11%)	61.67%	54.89%
Insecta GenBank	117,727	91,407 (77.64%)	91,407 (77.64%)	67.78%	67.87%
Insecta known RefSeq (NP_)	39,186	31,600 (80.64%)	31,600 (80.64%)	66.39%	61.28%
Tribolium castaneum high-quality model RefSeq (XP_)	11,487	8,762 (76.28%)	8,762 (76.28%)	60.77%	55.31%
Aedes aegypti high-quality model RefSeq (XP_)	13,217	12,014 (90.90%)	12,014 (90.90%)	76.63%	79.32%
Apis mellifera high-quality model RefSeq (XP_)	8,879	7,014 (79.00%)	7,014 (79.00%)	62.67%	58.92%
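The percentages in the "Number (%)" columns are the aligned counts divided by the number of sequences retrieved from Entrez. The sketch below recomputes them from the table values to two decimal places; the function name is illustrative.

```python
# (source, retrieved from Entrez, aligned by ProSplign, reported percent)
ROWS = [
    ("Same-species GenBank", 9, 9, "100.00"),
    ("Zeugodacus cucurbitae high-quality model RefSeq (XP_)", 11_469, 9_188, "80.11"),
    ("Insecta GenBank", 117_727, 91_407, "77.64"),
    ("Insecta known RefSeq (NP_)", 39_186, 31_600, "80.64"),
    ("Tribolium castaneum high-quality model RefSeq (XP_)", 11_487, 8_762, "76.28"),
    ("Aedes aegypti high-quality model RefSeq (XP_)", 13_217, 12_014, "90.90"),
    ("Apis mellifera high-quality model RefSeq (XP_)", 8_879, 7_014, "79.00"),
]


def percent_aligned(aligned: int, retrieved: int) -> str:
    """Aligned sequences as a percent of retrieved ones, two decimals."""
    return f"{100.0 * aligned / retrieved:.2f}"


for source, retrieved, aligned, reported in ROWS:
    computed = percent_aligned(aligned, retrieved)
    status = "ok" if computed == reported else "MISMATCH"
    print(f"{status}\t{computed}%\t{source}")
```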

Assembly-assembly alignments of current to previous assembly

When the assembly changes between two rounds of annotation, genes in the current and the previous annotation are mapped to each other using the genomic alignments of the current assembly to the previous assembly so that gene identifiers can be preserved. The success of the remapping depends largely on how well the two assembly versions align to each other.

Below are the percentages of each assembly covered by alignments to the other assembly, and the average percent identity of those alignments. The 'First pass' alignments are reciprocal best hits, while the 'Total' alignments also include 'Second pass' (non-reciprocal best) alignments. For more information about the assembly-assembly alignment process, please visit the NCBI Genome Remapping Service page.
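Coverage of one assembly by the other can be read as the fraction of its bases that fall inside at least one alignment. A minimal sketch, assuming the alignment intervals on the assembly being measured have already been extracted; the tuple layout and toy data are hypothetical, and the Genome Remapping Service's actual computation may differ. Under this reading, the 'First pass' and 'Total' columns would correspond to running it on different sets of alignments.

```python
def percent_covered(alignments, assembly_length):
    """Percent of an assembly covered by the union of aligned intervals.

    alignments: (sequence_id, start, end) tuples with 0-based, half-open
    coordinates. Overlapping intervals on the same sequence are merged
    so that no base is counted twice.
    """
    covered = 0
    current = None  # (sequence_id, start, end) of the interval being merged
    for seq_id, start, end in sorted(alignments):
        if current and seq_id == current[0] and start <= current[2]:
            current = (seq_id, current[1], max(current[2], end))
        else:
            if current:
                covered += current[2] - current[1]
            current = (seq_id, start, end)
    if current:
        covered += current[2] - current[1]
    return 100.0 * covered / assembly_length


# Hypothetical alignments onto a toy 200-bp assembly with two sequences.
toy = [("chr2", 0, 30), ("chr2", 20, 50), ("chr2", 70, 80), ("chr3", 0, 40)]
print(percent_covered(toy, 200))  # 50.0
```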

	First pass	Total
AcolN3 (current) coverage	84.10%	87.21%
AcolMOP1 (previous) coverage	80.77%	84.16%
Percent identity	91.64%	92.03%

Comparison of the current and previous annotations

The annotation produced for this release (101) was compared to the annotation in the previous release (100) for each assembly annotated in both releases. Scores for current and previous gene and transcript features were calculated based on overlap in exon sequence and matches in exon boundaries. Pairs of current and previous features were categorized based on these scores, whether they are reciprocal best matches, and changes in attributes (gene biotype, completeness, etc.). If the assembly was updated between the two releases, alignments between the current and the previous assembly were used to match the current and previous gene and transcript features in mapped regions.
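To make "overlap in exon sequence and matches in exon boundaries" concrete, the sketch below scores a current and a previous model on the same sequence and strand. The scoring (a Jaccard overlap of exon bases plus a count of shared exon starts and ends) is a hypothetical illustration, not NCBI's actual formula, and the exon coordinates are invented.

```python
def exon_positions(exons):
    """Set of positions covered by (start, end) exons, 1-based inclusive."""
    positions = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions


def compare_models(current, previous):
    """Return (exon-base overlap, matched boundaries, identical flag).

    Hypothetical scoring for illustration only; both models are assumed
    to lie on the same sequence and strand.
    """
    cur, prev = exon_positions(current), exon_positions(previous)
    union = cur | prev
    overlap = len(cur & prev) / len(union) if union else 0.0
    matched = (len({s for s, _ in current} & {s for s, _ in previous})
               + len({e for _, e in current} & {e for _, e in previous}))
    identical = sorted(current) == sorted(previous)
    return overlap, matched, identical


# The second exon's end moved from 450 (previous) to 400 (current).
overlap, matched, identical = compare_models([(100, 200), (300, 400)],
                                             [(100, 200), (300, 450)])
print(f"overlap={overlap:.2f} matched_boundaries={matched} identical={identical}")
```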

The table below summarizes the changes in the gene set for each assembly as a percent of the number of genes in the current annotation release, and provides links to the details of the comparison in tabular format and in a Genome Workbench project.

	AcolN3 (Current) to AcolMOP1 (Previous)
Identical	6%
Minor changes	69%
Major changes	10%
New	14%
Deprecated	15%
Other	1%
Download the report	tabular, Genome Workbench
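Because every row is a percent of the number of genes in the current release, the column need not total 100%. In this table the five categories other than Deprecated total exactly 100%, which is consistent with deprecated genes being present only in the previous release while still being expressed relative to the current gene count. A short check using the reported values (the function name is illustrative):

```python
# Percentages from the AcolN3 (Current) to AcolMOP1 (Previous) table above.
SUMMARY = {"Identical": 6, "Minor changes": 69, "Major changes": 10,
           "New": 14, "Deprecated": 15, "Other": 1}


def split_summary(summary):
    """Return (total of categories describing current genes, deprecated %)."""
    current = sum(v for k, v in summary.items() if k != "Deprecated")
    return current, summary.get("Deprecated", 0)


current, deprecated = split_summary(SUMMARY)
print(f"current-gene categories: {current}%; deprecated: {deprecated}% of current")
```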

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
BUSCO: Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. Molecular Biology and Evolution 2021, 38(10):4647-4654
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
STAR: Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. Bioinformatics 2013, 29(1):15-21
Minimap2: Li H. Bioinformatics 2018, 34(18):3094-3100
