NCBI Bombyx mori Annotation Release GCF_030269925.1-RS_2024_01

The genome sequence records for Bombyx mori RefSeq assembly GCF_030269925.1 (ASM3026992v2) were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, and the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
BUSCO results: Annotation completeness assessed with BUSCO
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction
Similarity of current and previous assembly: How well the current and previous assemblies align to each other
Comparison of the current and previous annotations: What proportion of the genes changed in this annotation

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as "GCF_030269925.1-RS_2024_01".

Date of Entrez queries for transcripts and proteins: Jan 25 2024
Date of submission of annotation to the public databases: Feb 5 2024
Software version: 10.2

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
ASM3026992v2	GCF_030269925.1	Shimada-lab Faculty of Science Department of Life Science, Gakushuin University	07-06-2023	Reference	30 assembled chromosomes

Gene and feature statistics

Counts and lengths of annotated features are provided below for each assembly.

Feature counts

Feature	ASM3026992v2
Genes and pseudogenes	18,210
protein-coding	13,459
non-coding	4,662
Transcribed pseudogenes	2
Non-transcribed pseudogenes	81
genes with variants	6,009
Immunoglobulin/T-cell receptor gene segments	0
other	6
mRNAs	28,334
fully-supported	27,365
with > 5% ab initio	802
partial	313
with filled gap(s)	2
known RefSeq (NM_)	2,349
model RefSeq (XM_)	25,985
non-coding RNAs	6,516
fully-supported	5,731
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	412
model RefSeq (XR_)	5,570
pseudo transcripts	2
fully-supported	2
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	1
model RefSeq (XR_)	1
CDSs	28,341
fully-supported	27,365
with > 5% ab initio	828
partial	72
with major correction(s)	161
known RefSeq (NP_)	2,356
model RefSeq (XP_)	25,985
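The gene sub-categories add up to the "Genes and pseudogenes" total, and the known plus model RefSeq rows add up to the mRNA and CDS totals. Dropping the 83 pseudogenes leaves the 18,127 genes listed in the feature-length table. A quick Python check, with counts copied from the table above (variable names are illustrative only):

```python
# Gene sub-categories for ASM3026992v2, copied from the feature counts table.
gene_categories = {
    "protein-coding": 13_459,
    "non-coding": 4_662,
    "transcribed pseudogenes": 2,
    "non-transcribed pseudogenes": 81,
    "other": 6,
}
genes_and_pseudogenes = sum(gene_categories.values())
pseudogenes = (gene_categories["transcribed pseudogenes"]
               + gene_categories["non-transcribed pseudogenes"])
# The feature-length tables exclude pseudogenes.
genes_only = genes_and_pseudogenes - pseudogenes

# Known (NM_/NP_) plus model (XM_/XP_) RefSeq records.
mrnas = 2_349 + 25_985
cdss = 2_356 + 25_985

print(genes_and_pseudogenes, genes_only, mrnas, cdss)  # 18210 18127 28334 28341
```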

Detailed reports

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	18,127	18,675	5,902	59	960,210
All transcripts	34,850	3,439	2,483	18	55,048
mRNA	28,334	3,825	2,903	192	55,048
misc_RNA	990	4,111	3,014	194	37,453
miRNA	508	23	22	18	29
tRNA	431	74	73	62	84
lncRNA	4,339	1,663	1,010	131	26,214
snoRNA	16	123	110	71	202
snRNA	100	132	112	90	199
rRNA	132	851	119	118	4,118
Single-exon transcripts	1,222	1,913	1,380	218	36,344
coding transcripts (NM_/XM_)	1,218	1,911	1,380	218	36,344
non-coding transcripts (NR_/XR_)	4	2,344	3,434	659	3,982
CDSs	28,347	2,063	1,464	33	54,378
Exons	130,829	437	176	4	37,474
in coding transcripts (NM_/XM_)	119,215	414	171	4	37,474
in non-coding transcripts (NR_/XR_)	15,585	563	222	9	21,996
Introns	109,232	3,479	784	30	922,858
in coding transcripts (NM_/XM_)	102,203	3,483	796	30	922,858
in non-coding transcripts (NR_/XR_)	10,983	3,199	655	30	479,197
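The "All transcripts" row is the sum of the per-type rows, and subtracting the mRNAs recovers the 6,516 non-coding RNAs reported in the feature counts. The exon and intron sub-rows do not sum to their totals in the same way (119,215 + 15,585 exceeds 130,829), presumably because an exon shared by a coding and a non-coding transcript is counted in both sub-rows but only once overall; that reading is an assumption. The transcript arithmetic, with counts copied from the table:

```python
# Transcript counts by feature type for ASM3026992v2, from the feature-length table.
transcripts_by_type = {
    "mRNA": 28_334,
    "misc_RNA": 990,
    "miRNA": 508,
    "tRNA": 431,
    "lncRNA": 4_339,
    "snoRNA": 16,
    "snRNA": 100,
    "rRNA": 132,
}
all_transcripts = sum(transcripts_by_type.values())
# Everything that is not an mRNA is a non-coding RNA.
non_coding = all_transcripts - transcripts_by_type["mRNA"]

print(all_transcripts, non_coding)  # 34850 6516
```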

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	1.94	1	1	51
Number of exons per transcript	9.4	6	1	160

BUSCO analysis of gene annotation

BUSCO v4.1.4 was run in "protein" mode with the lepidoptera_odb10 lineage dataset on the annotated gene set, using the longest protein for each gene. Results are reported for the gene set from the primary assembly unit and are presented in BUSCO notation.
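Selecting one protein per gene before running BUSCO can be sketched as follows. The input tuples and identifiers are hypothetical, and the tie-breaking rule (the lexicographically smallest protein ID wins when lengths are equal) is an assumption; the report does not say how ties are resolved.

```python
def longest_protein_per_gene(proteins):
    """Keep the longest protein for each gene.

    proteins: iterable of (gene_id, protein_id, sequence) tuples.
    Ties in length go to the smallest protein_id (an assumption).
    """
    best = {}
    for gene_id, protein_id, seq in proteins:
        current = best.get(gene_id)
        if (current is None
                or len(seq) > len(current[1])
                or (len(seq) == len(current[1]) and protein_id < current[0])):
            best[gene_id] = (protein_id, seq)
    return best

# Hypothetical input: two isoforms of gene_A and one protein for gene_B.
proteins = [
    ("gene_A", "XP_000001.1", "MKTAYIAK"),
    ("gene_A", "XP_000002.1", "MKTAYIAKQRQ"),
    ("gene_B", "XP_000003.1", "MSSH"),
]
for gene_id, (protein_id, seq) in sorted(longest_protein_per_gene(proteins).items()):
    print(gene_id, protein_id, len(seq))
# gene_A XP_000002.1 11
# gene_B XP_000003.1 4
```

The selected sequences would then be written to a protein FASTA file as the BUSCO input.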

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the Drosophila melanogaster known RefSeq proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 13,446 coding genes, 9,589 had a protein with an alignment covering 50% or more of the query, and 2,885 had a protein with an alignment covering 95% or more of the query.

Query and target coverage are defined as follows: query coverage is the percentage of the annotated protein's length that is included in the alignment, and target coverage is the percentage of the target protein's length that is included in the alignment.
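A minimal sketch of the two coverage definitions, assuming 1-based inclusive alignment coordinates and a single aligned span per protein pair (BLASTP can report several HSPs, which this ignores). The protein lengths and coordinates are made up for illustration:

```python
def percent_covered_by_span(start, end, length):
    """Percent of a protein of `length` residues covered by a 1-based, inclusive span."""
    return 100.0 * (end - start + 1) / length

# Hypothetical alignment: residues 21-400 of a 400-aa annotated protein (query)
# aligned to residues 101-480 of a 500-aa Drosophila protein (target).
query_cov = percent_covered_by_span(21, 400, 400)
target_cov = percent_covered_by_span(101, 480, 500)

print(query_cov, target_cov)              # 95.0 76.0
print(query_cov >= 50, query_cov >= 95)   # True True
```

The same alignment can count toward the 95% query-coverage threshold while covering only 76% of the target, which is why the two measures are tracked separately.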

Below is a cumulative graph displaying the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline were added to the graph.

[Figure: cumulative number of genes with alignments at or above a given coverage threshold. Query: annotated proteins; target: Drosophila melanogaster known RefSeq proteins.]
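Because a gene is counted when any of its proteins reaches a threshold, each gene contributes its best coverage, and the cumulative curve is the number of genes at or above each threshold. A sketch with hypothetical gene and protein IDs and coverage values:

```python
def best_coverage_per_gene(hits):
    """hits: iterable of (gene_id, protein_id, coverage_percent); keep the max per gene."""
    best = {}
    for gene_id, _protein_id, coverage in hits:
        best[gene_id] = max(best.get(gene_id, 0.0), coverage)
    return best

def cumulative_counts(best, thresholds):
    """Number of genes whose best coverage is at or above each threshold."""
    return {t: sum(1 for c in best.values() if c >= t) for t in thresholds}

hits = [
    ("gene1", "prot1", 97.0),
    ("gene1", "prot2", 40.0),
    ("gene2", "prot3", 60.0),
    ("gene3", "prot4", 20.0),
]
best = best_coverage_per_gene(hits)
print(cumulative_counts(best, [0, 25, 50, 75, 95]))
# {0: 3, 25: 2, 50: 2, 75: 1, 95: 1}
```

Genes with no BLASTP hit at all never appear in `hits`, so they count toward the total number of coding genes but not toward any point on the curve.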

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker (if calculated), for each assembly. RepeatMasker results are only calculated for organisms with complete Dfam HMM model collections.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with WindowMasker
ASM3026992v2	GCF_030269925.1	46.69%
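If the masked genome is stored as soft-masked FASTA (masked bases in lowercase), the masked percentage is the lowercase fraction of all sequence characters. Both the soft-masking convention and counting N characters in the denominator are assumptions here, not details taken from the pipeline:

```python
def percent_soft_masked(fasta_lines):
    """Percent of sequence characters that are lowercase (soft-masked)."""
    masked = total = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA headers
        total += len(line)
        masked += sum(1 for base in line if base.islower())
    return 100.0 * masked / total if total else 0.0

toy_genome = [">chr_toy", "ACGTacgtNN", "acgtACGT"]
print(f"{percent_soft_masked(toy_genome):.2f}%")  # 44.44%
```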

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez Nucleotide, Entrez Protein, and SRA, and aligned to the genome.

Transcript alignments

The following transcripts were aligned to the genome with Splign, and the alignments were used for gene prediction:

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species known RefSeq (NM_/NR_)	2,813	2,800 (99.54%)	2,729 (97.01%)	99.55%	99.25%
Same-species GenBank	17,073	16,512 (96.71%)	15,420 (90.32%)	99.43%	98.79%
Same-species TSA	111,769	106,292 (95.10%)	97,166 (86.93%)	99.89%	99.72%
Same-species EST	568,813	515,828 (90.68%)	500,491 (87.99%)	99.45%	99.12%
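Both percentage columns are relative to the number of sequences retrieved from Entrez, not to the number aligned: 2,729 passed out of 2,813 retrieved is 97.01%, whereas 2,729 out of the 2,800 aligned would be 97.46%. The arithmetic for two rows, rounding to two decimals, which reproduces the values in this table:

```python
def pct(part, whole):
    """Percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)

# (retrieved, aligned by Splign, passed to Gnomon), copied from the table above.
rows = {
    "Same-species known RefSeq (NM_/NR_)": (2_813, 2_800, 2_729),
    "Same-species GenBank": (17_073, 16_512, 15_420),
}
for source, (retrieved, aligned, passed) in rows.items():
    print(f"{source}: aligned {pct(aligned, retrieved):.2f}%, "
          f"passed {pct(passed, retrieved):.2f}%")
```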

RefSeq transcript alignment quality report

The known RefSeq transcripts (NM_ and NR_ accessions) are a set of high-quality transcripts maintained by the RefSeq group at NCBI. Alignment statistics for this set, such as the number and percentage of sequences that do not align at all, whose best alignments are split between multiple scaffolds, or whose alignments do not cover the full CDS, are indicative of genome quality and are provided below.

	ASM3026992v2 Primary Assembly
Number of sequences retrieved from Entrez	2,813
Number (%) of sequences not aligning	13 (0.46%)
Number (%) of sequences with multiple best alignments (split genes)	2 (0.07%)
Number (%) of sequences with CDS coverage < 95%	24 (1.01%)

RNA-Seq alignments

The following RNA-Seq reads were also aligned with STAR, and the alignments were used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Publication (PubMed ID)	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	NA	Aggregate of all aligned samples	4,543,401,370	81%	40%	128,320
SAMD00506766	NA	Anterior silk gland (Bombyx mori, 3rd-Day fifth-instar larva, male, SAMD00506766)	39,885,710	79%	38%	72,303
SAMD00506767	NA	Anterior part of the middle silk gland (Bombyx mori, 3rd-Day fifth-instar larva, male, SAMD00506767)	38,795,670	75%	41%	64,680
SAMD00506768	NA	Middle part of the middle silk gland (Bombyx mori, 3rd-Day fifth-instar larva, male, SAMD00506768)	45,288,676	73%	38%	67,745
SAMD00506769	NA	Posterior part of the middle silk gland (Bombyx mori, 3rd-Day fifth-instar larva, male, SAMD00506769)	48,918,064	78%	36%	68,225
SAMD00506770	NA	Posterior silk gland (Bombyx mori, 3rd-Day fifth-instar larva, male, SAMD00506770)	56,581,696	78%	49%	66,188
SAMD00506771	NA	Fat body (Bombyx mori, 3rd-Day fifth-instar larva, male, SAMD00506771)	42,028,122	80%	29%	65,384
SAMD00506772	NA	Midgut (Bombyx mori, 3rd-Day fifth-instar larva, male, SAMD00506772)	40,563,848	81%	49%	67,363
SAMD00506773	NA	Malpighian tubule (Bombyx mori, 3rd-Day fifth-instar larva, male, SAMD00506773)	42,257,296	76%	37%	71,734
SAMD00506774	NA	Testis (Bombyx mori, 3rd-Day fifth-instar larva, male, SAMD00506774)	47,331,412	80%	37%	92,273
SAMD00506775	NA	Ovary (Bombyx mori, 3rd-Day fifth-instar larva, female, SAMD00506775)	37,401,666	79%	38%	83,921
SAMD00506776	NA	Anterior silk gland (Bombyx mori, 3rd-Day fifth-instar larva, male, SAMD00506776)	35,175,928	81%	37%	65,408
SAMD00506777	NA	Anterior part of the middle silk gland (Bombyx mori, 3rd-Day fifth-instar larva, male, SAMD00506777)	42,155,964	75%	41%	64,679
SAMD00506778	NA	Middle part of the middle silk gland (Bombyx mori, 3rd-Day fifth-instar larva, male, SAMD00506778)	36,506,690	74%	37%	64,618
SAMD00506779	NA	Posterior part of the middle silk gland (Bombyx mori, 3rd-Day fifth-instar larva, male, SAMD00506779)	40,630,934	78%	36%	67,318
SAMD00506780	NA	Posterior silk gland (Bombyx mori, 3rd-Day fifth-instar larva, male, SAMD00506780)	48,801,442	77%	49%	65,466
SAMD00506781	NA	Fat body (Bombyx mori, 3rd-Day fifth-instar larva, male, SAMD00506781)	35,604,522	81%	30%	63,716
SAMD00506782	NA	Midgut (Bombyx mori, 3rd-Day fifth-instar larva, male, SAMD00506782)	42,543,900	82%	43%	68,626
SAMD00506783	NA	Malpighian tubule (Bombyx mori, 3rd-Day fifth-instar larva, male, SAMD00506783)	37,197,106	76%	37%	72,032
SAMD00506784	NA	Testis (Bombyx mori, 3rd-Day fifth-instar larva, male, SAMD00506784)	34,425,902	80%	37%	91,070
SAMD00506785	NA	Ovary (Bombyx mori, 3rd-Day fifth-instar larva, female, SAMD00506785)	36,998,012	80%	37%	83,798
SAMD00506786	NA	Anterior silk gland (Bombyx mori, 3rd-Day fifth-instar larva, male, SAMD00506786)	45,775,886	79%	38%	70,811
SAMD00506787	NA	Anterior part of the middle silk gland (Bombyx mori, 3rd-Day fifth-instar larva, male, SAMD00506787)	39,049,890	75%	41%	64,273
SAMD00506788	NA	Middle part of the middle silk gland (Bombyx mori, 3rd-Day fifth-instar larva, male, SAMD00506788)	55,658,206	72%	39%	66,376
SAMD00506789	NA	Posterior part of the middle silk gland (Bombyx mori, 3rd-Day fifth-instar larva, male, SAMD00506789)	53,003,350	79%	36%	68,175
SAMD00506790	NA	Posterior silk gland (Bombyx mori, 3rd-Day fifth-instar larva, male, SAMD00506790)	36,673,608	77%	49%	62,781
SAMD00506791	NA	Fat body (Bombyx mori, 3rd-Day fifth-instar larva, male, SAMD00506791)	35,314,954	81%	29%	63,572
SAMD00506792	NA	Midgut (Bombyx mori, 3rd-Day fifth-instar larva, male, SAMD00506792)	40,971,400	82%	49%	66,947
SAMD00506793	NA	Malpighian tubule (Bombyx mori, 3rd-Day fifth-instar larva, male, SAMD00506793)	44,800,698	76%	34%	71,827
SAMD00506794	NA	Testis (Bombyx mori, 3rd-Day fifth-instar larva, male, SAMD00506794)	40,776,322	80%	37%	91,338
SAMD00506795	NA	Ovary (Bombyx mori, 3rd-Day fifth-instar larva, female, SAMD00506795)	40,141,466	80%	38%	87,876
SAMD00631318	NA	testis (Bombyx mori, male, SAMD00631318)	21,476,966	64%	13%	39,284
SAMD00631319	NA	testis (Bombyx mori, male, SAMD00631319)	19,973,349	61%	13%	35,723
SAMD00631320	NA	testis (Bombyx mori, male, SAMD00631320)	15,512,445	61%	13%	33,486
SAMD00631321	NA	testis (Bombyx mori, male, SAMD00631321)	14,271,374	60%	15%	30,804
SAMD00631322	NA	testis (Bombyx mori, male, SAMD00631322)	19,880,625	60%	14%	32,919
SAMD00631323	NA	testis (Bombyx mori, male, SAMD00631323)	26,262,301	61%	14%	36,819
SAMN18603838	NA	male fat body (Bombyx mori, SAMN18603838)	58,001,004	91%	30%	72,678
SAMN18603839	NA	anterior gut (Bombyx mori, SAMN18603839)	53,382,676	91%	47%	74,210
SAMN18603840	NA	anterior silk gland (Bombyx mori, SAMN18603840)	56,133,626	90%	37%	75,515
SAMN18603841	NA	brain (Bombyx mori, SAMN18603841)	55,400,324	88%	33%	89,899
SAMN18603842	NA	epidermis (Bombyx mori, SAMN18603842)	52,296,792	86%	36%	75,269
SAMN18603843	NA	female fat body (Bombyx mori, SAMN18603843)	56,628,376	92%	32%	73,842
SAMN18603844	NA	middle gut (Bombyx mori, SAMN18603844)	51,124,350	85%	43%	76,370
SAMN18603845	NA	malpighian tubule (Bombyx mori, SAMN18603845)	58,143,876	82%	39%	73,246
SAMN18603846	NA	middle silk gland (Bombyx mori, SAMN18603846)	58,679,904	86%	37%	79,004
SAMN18603847	NA	nerve (Bombyx mori, SAMN18603847)	56,360,866	88%	34%	89,724
SAMN18603848	NA	ovary (Bombyx mori, SAMN18603848)	54,633,964	90%	38%	93,091
SAMN18603849	NA	posterior gut (Bombyx mori, SAMN18603849)	50,663,632	86%	34%	79,938
SAMN18603850	NA	posterior silk gland (Bombyx mori, SAMN18603850)	53,808,646	74%	43%	70,379
SAMN18603851	NA	testis (Bombyx mori, SAMN18603851)	55,497,260	90%	38%	100,576
SAMN18603852	NA	trachea (Bombyx mori, SAMN18603852)	56,686,020	90%	35%	81,103
SAMN18603853	NA	wing disc (Bombyx mori, SAMN18603853)	52,618,846	90%	38%	83,215
SAMN28535297	NA	midgut (Bombyx mori, Five-year-old silkworm, male and female, SAMN28535297)	53,688,380	85%	48%	72,302
SAMN28535298	NA	midgut (Bombyx mori, Five-year-old silkworm, male and female, SAMN28535298)	52,235,536	84%	46%	74,469
SAMN28535299	NA	midgut (Bombyx mori, Five-year-old silkworm, male and female, SAMN28535299)	49,275,636	85%	47%	73,565
SAMN28535300	NA	midgut (Bombyx mori, Five-year-old silkworm, male and female, SAMN28535300)	48,649,046	85%	47%	70,125
SAMN28535301	NA	midgut (Bombyx mori, Five-year-old silkworm, male and female, SAMN28535301)	50,566,774	84%	46%	71,135
SAMN28535302	NA	midgut (Bombyx mori, Five-year-old silkworm, male and female, SAMN28535302)	56,010,756	86%	46%	71,688
SAMN32018985	NA	hemocytes (Bombyx mori, SAMN32018985)	47,488,424	79%	44%	70,917
SAMN32018986	NA	hemocytes (Bombyx mori, SAMN32018986)	47,445,542	81%	44%	70,597
SAMN32018987	NA	hemocytes (Bombyx mori, SAMN32018987)	46,265,884	80%	44%	70,888
SAMN32018988	NA	hemocytes (Bombyx mori, SAMN32018988)	42,386,092	79%	47%	72,242
SAMN32018989	NA	hemocytes (Bombyx mori, SAMN32018989)	45,993,854	82%	47%	71,542
SAMN32018990	NA	hemocytes (Bombyx mori, SAMN32018990)	42,483,856	82%	47%	72,015
SAMN32018991	NA	hemocytes (Bombyx mori, SAMN32018991)	49,769,268	78%	46%	73,823
SAMN32018992	NA	hemocytes (Bombyx mori, SAMN32018992)	49,033,332	79%	45%	74,494
SAMN32018993	NA	hemocytes (Bombyx mori, SAMN32018993)	46,052,374	80%	46%	75,446
SAMN35672750	NA	midgut (Bombyx mori, SAMN35672750)	58,567,396	80%	45%	66,073
SAMN35672751	NA	midgut (Bombyx mori, SAMN35672751)	58,514,782	78%	43%	67,698
SAMN35672752	NA	midgut (Bombyx mori, SAMN35672752)	52,388,054	81%	45%	65,644
SAMN35672753	NA	midgut (Bombyx mori, SAMN35672753)	45,214,514	79%	39%	50,133
SAMN35672754	NA	midgut (Bombyx mori, SAMN35672754)	45,626,162	80%	41%	59,880
SAMN35672755	NA	midgut (Bombyx mori, SAMN35672755)	51,514,796	79%	40%	62,397
SAMN35672756	NA	midgut (Bombyx mori, SAMN35672756)	54,009,396	76%	30%	62,651
SAMN35672757	NA	midgut (Bombyx mori, SAMN35672757)	49,876,556	81%	44%	68,445
SAMN35672758	NA	midgut (Bombyx mori, SAMN35672758)	54,791,680	79%	37%	60,335
SAMN36719181	NA	Silk gland (Bombyx mori, SAMN36719181)	54,795,286	73%	42%	70,474
SAMN36719182	NA	Silk gland (Bombyx mori, SAMN36719182)	55,492,440	72%	41%	70,591
SAMN36719183	NA	Silk gland (Bombyx mori, SAMN36719183)	59,691,038	72%	41%	70,865
SAMN36719184	NA	Silk gland (Bombyx mori, SAMN36719184)	63,721,280	75%	38%	70,113
SAMN36719185	NA	Silk gland (Bombyx mori, SAMN36719185)	54,862,670	74%	39%	69,601
SAMN36719186	NA	Silk gland (Bombyx mori, SAMN36719186)	58,940,334	75%	38%	70,006
SAMN36719187	NA	Silk gland (Bombyx mori, SAMN36719187)	55,378,608	77%	40%	68,997
SAMN36719188	NA	Silk gland (Bombyx mori, SAMN36719188)	58,142,882	77%	40%	69,209
SAMN36719189	NA	Silk gland (Bombyx mori, SAMN36719189)	57,560,334	78%	40%	69,054
SAMN36871147	NA	midgut (Bombyx mori, 5th, SAMN36871147)	47,748,912	88%	46%	68,183
SAMN36871149	NA	midgut (Bombyx mori, 5th, SAMN36871149)	42,933,292	87%	51%	65,942
SAMN36871151	NA	midgut (Bombyx mori, 5th, SAMN36871151)	59,453,582	88%	49%	70,779
SAMN36871153	NA	midgut (Bombyx mori, 5th, SAMN36871153)	48,386,032	88%	52%	65,167
SAMN36871155	NA	midgut (Bombyx mori, 5th, SAMN36871155)	48,045,684	87%	48%	68,765
SAMN36871157	NA	midgut (Bombyx mori, 5th, SAMN36871157)	46,742,290	86%	45%	71,324
SAMN37233678	37752953	ovary (Bombyx mori, SAMN37233678)	22,636,718	91%	41%	72,495
SAMN37233679	37752953	ovary (Bombyx mori, SAMN37233679)	22,022,363	93%	43%	68,397
SAMN37233680	37752953	ovary (Bombyx mori, SAMN37233680)	22,044,867	93%	41%	68,241
SAMN37233681	37752953	ovary (Bombyx mori, SAMN37233681)	22,293,704	93%	42%	68,893
SAMN37233682	37752953	ovary (Bombyx mori, SAMN37233682)	20,969,552	93%	40%	68,521
SAMN37233683	37752953	ovary (Bombyx mori, SAMN37233683)	21,848,927	93%	43%	70,044
SAMN37233684	37752953	ovary (Bombyx mori, SAMN37233684)	23,209,872	91%	43%	71,423
SAMN37233685	37752953	ovary (Bombyx mori, SAMN37233685)	24,364,368	92%	44%	73,233
SAMN37233686	37752953	ovary (Bombyx mori, SAMN37233686)	26,257,592	90%	41%	73,832
SAMN37233687	37752953	ovary (Bombyx mori, SAMN37233687)	23,069,750	91%	42%	74,209
SAMN37233688	37752953	ovary (Bombyx mori, SAMN37233688)	21,285,739	90%	40%	62,655
SAMN37233689	37752953	ovary (Bombyx mori, SAMN37233689)	24,959,602	91%	43%	77,436

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
DRR384268	DRX370144	DRP009116	SAMD00506766	39,885,710	79%	38%
DRR384269	DRX370145	DRP009116	SAMD00506767	38,795,670	75%	41%
DRR384270	DRX370146	DRP009116	SAMD00506768	45,288,676	73%	38%
DRR384271	DRX370147	DRP009116	SAMD00506769	48,918,064	78%	36%
DRR384272	DRX370148	DRP009116	SAMD00506770	56,581,696	78%	49%
DRR384273	DRX370149	DRP009116	SAMD00506771	42,028,122	80%	29%
DRR384274	DRX370150	DRP009116	SAMD00506772	40,563,848	81%	49%
DRR384275	DRX370151	DRP009116	SAMD00506773	42,257,296	76%	37%
DRR384276	DRX370152	DRP009116	SAMD00506774	47,331,412	80%	37%
DRR384277	DRX370153	DRP009116	SAMD00506775	37,401,666	79%	38%
DRR384278	DRX370154	DRP009116	SAMD00506776	35,175,928	81%	37%
DRR384279	DRX370155	DRP009116	SAMD00506777	42,155,964	75%	41%
DRR384280	DRX370156	DRP009116	SAMD00506778	36,506,690	74%	37%
DRR384281	DRX370157	DRP009116	SAMD00506779	40,630,934	78%	36%
DRR384282	DRX370158	DRP009116	SAMD00506780	48,801,442	77%	49%
DRR384283	DRX370159	DRP009116	SAMD00506781	35,604,522	81%	30%
DRR384284	DRX370160	DRP009116	SAMD00506782	42,543,900	82%	43%
DRR384285	DRX370161	DRP009116	SAMD00506783	37,197,106	76%	37%
DRR384286	DRX370162	DRP009116	SAMD00506784	34,425,902	80%	37%
DRR384287	DRX370163	DRP009116	SAMD00506785	36,998,012	80%	37%
DRR384288	DRX370164	DRP009116	SAMD00506786	45,775,886	79%	38%
DRR384289	DRX370165	DRP009116	SAMD00506787	39,049,890	75%	41%
DRR384290	DRX370166	DRP009116	SAMD00506788	55,658,206	72%	39%
DRR384291	DRX370167	DRP009116	SAMD00506789	53,003,350	79%	36%
DRR384292	DRX370168	DRP009116	SAMD00506790	36,673,608	77%	49%
DRR384293	DRX370169	DRP009116	SAMD00506791	35,314,954	81%	29%
DRR384294	DRX370170	DRP009116	SAMD00506792	40,971,400	82%	49%
DRR384295	DRX370171	DRP009116	SAMD00506793	44,800,698	76%	34%
DRR384296	DRX370172	DRP009116	SAMD00506794	40,776,322	80%	37%
DRR384297	DRX370173	DRP009116	SAMD00506795	40,141,466	80%	38%
DRR492946	DRX477145	DRP010247	SAMD00631318	21,476,966	64%	13%
DRR492947	DRX477146	DRP010247	SAMD00631319	19,973,349	61%	13%
DRR492948	DRX477147	DRP010247	SAMD00631320	15,512,445	61%	13%
DRR492949	DRX477148	DRP010247	SAMD00631321	14,271,374	60%	15%
DRR492950	DRX477149	DRP010247	SAMD00631322	19,880,625	60%	14%
DRR492951	DRX477150	DRP010247	SAMD00631323	26,262,301	61%	14%
SRR14134621	SRX10504550	SRP313218	SAMN18603838	58,001,004	91%	30%
SRR14134620	SRX10504551	SRP313218	SAMN18603839	53,382,676	91%	47%
SRR14134613	SRX10504558	SRP313218	SAMN18603840	56,133,626	90%	37%
SRR14134612	SRX10504559	SRP313218	SAMN18603841	55,400,324	88%	33%
SRR14134611	SRX10504560	SRP313218	SAMN18603842	52,296,792	86%	36%
SRR14134610	SRX10504561	SRP313218	SAMN18603843	56,628,376	92%	32%
SRR14134609	SRX10504562	SRP313218	SAMN18603844	51,124,350	85%	43%
SRR14134608	SRX10504563	SRP313218	SAMN18603845	58,143,876	82%	39%
SRR14134607	SRX10504564	SRP313218	SAMN18603846	58,679,904	86%	37%
SRR14134606	SRX10504565	SRP313218	SAMN18603847	56,360,866	88%	34%
SRR14134619	SRX10504552	SRP313218	SAMN18603848	54,633,964	90%	38%
SRR14134618	SRX10504553	SRP313218	SAMN18603849	50,663,632	86%	34%
SRR14134617	SRX10504554	SRP313218	SAMN18603850	53,808,646	74%	43%
SRR14134616	SRX10504555	SRP313218	SAMN18603851	55,497,260	90%	38%
SRR14134615	SRX10504556	SRP313218	SAMN18603852	56,686,020	90%	35%
SRR14134614	SRX10504557	SRP313218	SAMN18603853	52,618,846	90%	38%
SRR19279539	SRX15339805	SRP375991	SAMN28535297	53,688,380	85%	48%
SRR19279538	SRX15339806	SRP375991	SAMN28535298	52,235,536	84%	46%
SRR19279537	SRX15339807	SRP375991	SAMN28535299	49,275,636	85%	47%
SRR19279536	SRX15339808	SRP375991	SAMN28535300	48,649,046	85%	47%
SRR19279535	SRX15339809	SRP375991	SAMN28535301	50,566,774	84%	46%
SRR19279534	SRX15339810	SRP375991	SAMN28535302	56,010,756	86%	46%
SRR22514248	SRX18478670	SRP411146	SAMN32018985	47,488,424	79%	44%
SRR22514247	SRX18478671	SRP411146	SAMN32018986	47,445,542	81%	44%
SRR22514246	SRX18478672	SRP411146	SAMN32018987	46,265,884	80%	44%
SRR22514245	SRX18478673	SRP411146	SAMN32018988	42,386,092	79%	47%
SRR22514244	SRX18478674	SRP411146	SAMN32018989	45,993,854	82%	47%
SRR22514243	SRX18478675	SRP411146	SAMN32018990	42,483,856	82%	47%
SRR22514242	SRX18478676	SRP411146	SAMN32018991	49,769,268	78%	46%
SRR22514241	SRX18478677	SRP411146	SAMN32018992	49,033,332	79%	45%
SRR22514240	SRX18478678	SRP411146	SAMN32018993	46,052,374	80%	46%
SRR24907600	SRX20669057	SRP443126	SAMN35672750	58,567,396	80%	45%
SRR24907599	SRX20669058	SRP443126	SAMN35672751	58,514,782	78%	43%
SRR24907598	SRX20669059	SRP443126	SAMN35672752	52,388,054	81%	45%
SRR24907597	SRX20669060	SRP443126	SAMN35672753	45,214,514	79%	39%
SRR24907596	SRX20669061	SRP443126	SAMN35672754	45,626,162	80%	41%
SRR24907595	SRX20669062	SRP443126	SAMN35672755	51,514,796	79%	40%
SRR24907594	SRX20669063	SRP443126	SAMN35672756	54,009,396	76%	30%
SRR24907593	SRX20669064	SRP443126	SAMN35672757	49,876,556	81%	44%
SRR24907592	SRX20669065	SRP443126	SAMN35672758	54,791,680	79%	37%
SRR25433531	SRX21167733	SRP451653	SAMN36719181	54,795,286	73%	42%
SRR25433532	SRX21167732	SRP451653	SAMN36719182	55,492,440	72%	41%
SRR25433533	SRX21167731	SRP451653	SAMN36719183	59,691,038	72%	41%
SRR25433534	SRX21167730	SRP451653	SAMN36719184	63,721,280	75%	38%
SRR25433535	SRX21167729	SRP451653	SAMN36719185	54,862,670	74%	39%
SRR25433536	SRX21167728	SRP451653	SAMN36719186	58,940,334	75%	38%
SRR25433528	SRX21167736	SRP451653	SAMN36719187	55,378,608	77%	40%
SRR25433529	SRX21167735	SRP451653	SAMN36719188	58,142,882	77%	40%
SRR25433530	SRX21167734	SRP451653	SAMN36719189	57,560,334	78%	40%
SRR25551244	SRX21280440	SRP453749	SAMN36871147	47,748,912	88%	46%
SRR25551242	SRX21280442	SRP453749	SAMN36871149	42,933,292	87%	51%
SRR25551241	SRX21280443	SRP453749	SAMN36871151	59,453,582	88%	49%
SRR25551240	SRX21280444	SRP453749	SAMN36871153	48,386,032	88%	52%
SRR25551239	SRX21280445	SRP453749	SAMN36871155	48,045,684	87%	48%
SRR25551243	SRX21280441	SRP453749	SAMN36871157	46,742,290	86%	45%
SRR25875463	SRX21596114	SRP458048	SAMN37233678	22,636,718	91%	41%
SRR25875464	SRX21596113	SRP458048	SAMN37233679	22,022,363	93%	43%
SRR25875465	SRX21596112	SRP458048	SAMN37233680	22,044,867	93%	41%
SRR25875466	SRX21596111	SRP458048	SAMN37233681	22,293,704	93%	42%
SRR25875470	SRX21596110	SRP458048	SAMN37233682	20,969,552	93%	40%
SRR25875467	SRX21596109	SRP458048	SAMN37233683	21,848,927	93%	43%
SRR25875468	SRX21596108	SRP458048	SAMN37233684	23,209,872	91%	43%
SRR25875469	SRX21596107	SRP458048	SAMN37233685	24,364,368	92%	44%
SRR25875471	SRX21596106	SRP458048	SAMN37233686	26,257,592	90%	41%
SRR25875472	SRX21596105	SRP458048	SAMN37233687	23,069,750	91%	42%
SRR25875473	SRX21596104	SRP458048	SAMN37233688	21,285,739	90%	40%
SRR25875474	SRX21596103	SRP458048	SAMN37233689	24,959,602	91%	43%
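The two per-sample percentages can be illustrated with SAM records. In SAM, the CIGAR operation N marks a skipped reference region, which spliced aligners such as STAR use to represent introns. The sketch below counts each primary record once and treats a read as spliced if its CIGAR contains N; how the pipeline counts paired mates or multi-mapping reads is not stated in the report, so these choices are assumptions, and the records are made up:

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def has_intron(cigar):
    """True if the CIGAR string contains an N (skipped reference) operation."""
    return any(op == "N" for _length, op in CIGAR_OP.findall(cigar))

def alignment_percentages(records):
    """records: iterable of (flag, cigar) pairs from SAM lines.

    Returns (% of reads aligned, % of aligned reads with at least one intron).
    """
    total = aligned = spliced = 0
    for flag, cigar in records:
        if flag & 0x900:  # secondary (0x100) or supplementary (0x800): not a new read
            continue
        total += 1
        if flag & 0x4:  # read is unmapped
            continue
        aligned += 1
        if has_intron(cigar):
            spliced += 1
    pct_aligned = 100.0 * aligned / total if total else 0.0
    pct_spliced = 100.0 * spliced / aligned if aligned else 0.0
    return pct_aligned, pct_spliced

# Made-up records: two spliced, one unspliced, one unmapped, one secondary.
records = [(0, "100M"), (16, "40M500N60M"), (4, "*"), (256, "100M"), (0, "30M1000N70M")]
aligned_pct, spliced_pct = alignment_percentages(records)
print(f"{aligned_pct:.1f}% aligned, {spliced_pct:.1f}% of aligned reads with introns")
```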

SRA long-read alignments

The following long RNA-Seq reads (PacBio, Oxford Nanopore, 454, or other long-read sequencing technologies) from the Sequence Read Archive were aligned with minimap2, and the alignments were used for gene prediction:

Run	Sample	Number of reads	Number (%) of sequences aligned by minimap2	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
All	NA	243,704,859	235,922,960 (96.80%)	181,380,482 (74.42%)	93.98%	88.35%
DRR453261	SAMD00588812	10,276,258	9,387,974 (91.35%)	5,538,578 (53.89%)	90.15%	87.49%
SRR19628959	SAMN28985739	38,089,398	37,035,363 (97.23%)	29,055,745 (76.28%)	94.24%	87.88%
SRR19628960	SAMN28985738	26,974,498	26,210,591 (97.16%)	20,384,902 (75.57%)	94.27%	88.66%
SRR19628961	SAMN28985737	30,343,004	29,413,858 (96.93%)	22,713,648 (74.85%)	94.13%	88.66%
SRR19628962	SAMN28985736	30,518,025	29,556,899 (96.85%)	22,907,893 (75.06%)	94.21%	88.36%
SRR19628963	SAMN28985735	25,677,112	25,029,097 (97.47%)	19,552,872 (76.14%)	94.26%	88.74%
SRR19628964	SAMN28985734	24,637,908	23,969,119 (97.28%)	18,637,610 (75.64%)	94.28%	88.64%
SRR19628965	SAMN28985733	25,496,865	24,668,040 (96.74%)	18,955,878 (74.34%)	94.15%	88.62%
SRR19628966	SAMN28985732	31,691,791	30,652,019 (96.71%)	23,633,356 (74.57%)	94.20%	87.98%
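minimap2 writes alignments in PAF format, where columns 10 and 11 hold the number of matching bases and the alignment block length, and columns 2 to 4 hold the query length, start and end (0-based, end-exclusive). One common way to derive identity and coverage from a PAF line is shown below; whether the pipeline uses exactly these definitions is an assumption, and the example line is made up:

```python
def paf_identity_and_coverage(paf_line):
    """Identity and query coverage (both in percent) from one PAF line."""
    fields = paf_line.rstrip("\n").split("\t")
    query_len, query_start, query_end = int(fields[1]), int(fields[2]), int(fields[3])
    n_match, block_len = int(fields[9]), int(fields[10])
    identity = 100.0 * n_match / block_len
    coverage = 100.0 * (query_end - query_start) / query_len  # end is exclusive
    return identity, coverage

# Made-up PAF line for a 2,000-base read aligned over bases 100-1900.
line = "read1\t2000\t100\t1900\t+\tchr_toy\t500000\t10000\t11800\t1710\t1800\t60"
print(paf_identity_and_coverage(line))  # (95.0, 90.0)
```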

Protein alignments

The following proteins were aligned with ProSplign, and the alignments were used for gene prediction:

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Insecta GenBank	132,991	103,551 (77.86%)	103,551 (77.86%)	70.13%	73.09%
Insecta known RefSeq (NP_)	37,964	26,232 (69.10%)	26,232 (69.10%)	65.62%	58.81%
Same-species GenBank	3,497	3,455 (98.80%)	3,455 (98.80%)	86.19%	91.21%
Same-species known RefSeq (NP_)	2,391	2,371 (99.16%)	2,371 (99.16%)	85.18%	89.70%

Assembly-assembly alignments of current to previous assembly

When the assembly changes between two rounds of annotation, genes in the current and the previous annotation are mapped to each other using the genomic alignments of the current assembly to the previous assembly so that gene identifiers can be preserved. The success of the remapping depends largely on how well the two assembly versions align to each other.

Below are the percentage of each assembly covered by alignments to the other assembly and the average percent identity of those alignments. 'First pass' alignments are reciprocal best hits, while 'Total' alignments also include 'second pass', non-reciprocal best alignments.

	First pass	Total
ASM3026992v2 (Current) coverage	96.34%	97.59%
Bmori_2016v1.0 (Previous) coverage	96.62%	98.44%
Percent identity	99.10%	99.06%
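Coverage here can be read as the fraction of an assembly's bases that fall inside at least one alignment, with overlapping alignments counted once. It is computed in each assembly's own coordinates, which is why the two assemblies get different values. A sketch under that assumption, with made-up intervals:

```python
def percent_covered(intervals, assembly_length):
    """Percent of an assembly covered by (seq_id, start, end) intervals.

    Coordinates are 0-based and half-open; overlapping or touching intervals
    on the same sequence are merged so each base is counted once.
    """
    covered = 0
    current = None  # merged interval being built: (seq_id, start, end)
    for seq_id, start, end in sorted(intervals):
        if current and seq_id == current[0] and start <= current[2]:
            current = (seq_id, current[1], max(current[2], end))
        else:
            if current:
                covered += current[2] - current[1]
            current = (seq_id, start, end)
    if current:
        covered += current[2] - current[1]
    return 100.0 * covered / assembly_length

# Toy assembly: chr1 is 1,000 bp and chr2 is 500 bp.
alignments = [("chr1", 0, 600), ("chr1", 500, 900), ("chr2", 100, 400)]
print(percent_covered(alignments, 1_500))  # 80.0
```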

Comparison of the current and previous annotations

The annotations produced for this release were compared to the annotations in the previous release for each assembly annotated in both releases. Scores for current and previous gene and transcript features were calculated based on overlap in exon sequence and matches in exon boundaries. Pairs of current and previous features were categorized based on these scores, whether they are reciprocal best matches, and changes in attributes (gene biotype, completeness, etc.). If the assembly was updated between the two releases, alignments between the current and the previous assembly were used to match the current and previous gene and transcript features in mapped regions.
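The report does not give the scoring formulas. As a hypothetical illustration of "overlap in exon sequence" and "matches in exon boundaries", the sketch below scores a pair of transcripts by the shared fraction of exonic bases and the shared fraction of exon boundary coordinates. Both formulas are assumptions, not the pipeline's, and the transcripts are assumed to lie on the same sequence and strand:

```python
def exon_bases(exons):
    """Genomic positions covered by exons given as 0-based, half-open (start, end) pairs."""
    return {pos for start, end in exons for pos in range(start, end)}

def exon_boundaries(exons):
    """All exon start and end coordinates."""
    return {coord for exon in exons for coord in exon}

def compare_transcripts(current, previous):
    """Hypothetical scores: (exonic-base overlap, boundary agreement), each in [0, 1]."""
    cur_bases, prev_bases = exon_bases(current), exon_bases(previous)
    base_score = len(cur_bases & prev_bases) / len(cur_bases | prev_bases)
    cur_bounds, prev_bounds = exon_boundaries(current), exon_boundaries(previous)
    boundary_score = len(cur_bounds & prev_bounds) / len(cur_bounds | prev_bounds)
    return base_score, boundary_score

# Same first exon; the second exon ends at 400 now but ended at 450 before.
current = [(100, 200), (300, 400)]
previous = [(100, 200), (300, 450)]
print(compare_transcripts(current, previous))  # (0.8, 0.6)
```

Under these definitions, a pair scoring (1.0, 1.0) has identical exon structure; any lower score indicates some change.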

The table below summarizes the changes in the gene set for each assembly as a percent of the number of genes in the current annotation release, and provides links to the details of the comparison in tabular format and in a Genome Workbench project.

	ASM3026992v2 (Current) to Bmori_2016v1.0 (Previous)
Identical	15%
Minor changes	56%
Major changes	10%
New	18%
Deprecated	12%
Other	1%
Download the report	tabular, Genome Workbench
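The categories total 112%, but dropping "Deprecated" leaves exactly 100%. That fits deprecated genes existing only in the previous annotation while still being expressed as a percent of the current gene count. This is an inference from the numbers, and because each value is rounded, an exact 100% is partly coincidental:

```python
# Percent of genes in the current annotation, by change category.
changes = {
    "Identical": 15,
    "Minor changes": 56,
    "Major changes": 10,
    "New": 18,
    "Deprecated": 12,
    "Other": 1,
}
all_categories = sum(changes.values())
without_deprecated = all_categories - changes["Deprecated"]
print(all_categories, without_deprecated)  # 112 100
```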

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, DiCuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
BUSCO: Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. Molecular Biology and Evolution 2021, 38(10):4647-4654
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
STAR: Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. Bioinformatics 2013, 29(1):15-21
Minimap2: Li H. Bioinformatics 2018, 34(18):3094-3100
