NCBI Drosophila simulans Annotation Release 103

The RefSeq genome records for Drosophila simulans were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. This report presents statistics on the annotation products, the input data used in the pipeline and intermediate alignment results.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
BUSCO results: Annotation completeness assessed with BUSCO
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction
Similarity of current and previous assembly: The similarity of the current and previous assembly
Comparison of the current and previous annotations: What proportion of the genes changed in this annotation

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as NCBI Drosophila simulans Annotation Release 103

Annotation release ID: 103
Date of Entrez queries for transcripts and proteins: Oct 28 2021
Date of submission of annotation to the public databases: Nov 3 2021
Software version: 9.0

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
Prin_Dsim_3.1	GCF_016746395.2	Princeton University	10-05-2021	Reference	6 assembled chromosomes; unplaced scaffolds

Gene and feature statistics

Counts and lengths of annotated features are provided below for each assembly.

Feature counts

Feature	Prin_Dsim_3.1
Genes and pseudogenes	15,845
protein-coding	14,256
non-coding	1,432
Transcribed pseudogenes	3
Non-transcribed pseudogenes	154
genes with variants	4,441
Immunoglobulin/T-cell receptor gene segments	0
other	0
mRNAs	26,225
fully-supported	25,925
with > 5% ab initio	162
partial	31
with filled gap(s)	0
known RefSeq (NM_)	0
model RefSeq (XM_)	26,225
non-coding RNAs	2,562
fully-supported	1,914
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	2,273
pseudo transcripts	3
fully-supported	3
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	3
CDSs	26,225
fully-supported	25,925
with > 5% ab initio	174
partial	31
with major correction(s)	147
known RefSeq (NP_)	0
model RefSeq (XP_)	26,225
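
The gene-level rows of the table above can be cross-checked with a little arithmetic. The following is a minimal Python sketch, assuming that the six gene categories listed under "Genes and pseudogenes" are mutually exclusive and exhaustive; the dictionary and function names are illustrative and are not part of the annotation pipeline.

```python
# Gene-level counts copied from the feature counts table for Prin_Dsim_3.1.
GENE_CATEGORY_COUNTS = {
    "protein-coding": 14_256,
    "non-coding": 1_432,
    "transcribed pseudogenes": 3,
    "non-transcribed pseudogenes": 154,
    "immunoglobulin/T-cell receptor gene segments": 0,
    "other": 0,
}
REPORTED_GENES_AND_PSEUDOGENES = 15_845


def total_genes_and_pseudogenes(counts):
    """Sum the per-category counts (assumes the categories do not overlap)."""
    return sum(counts.values())


def genes_without_pseudogenes(counts):
    """Sum only the categories whose name does not mention pseudogenes."""
    return sum(n for name, n in counts.items() if "pseudogene" not in name)


if __name__ == "__main__":
    print(total_genes_and_pseudogenes(GENE_CATEGORY_COUNTS))
    print(genes_without_pseudogenes(GENE_CATEGORY_COUNTS))
```

Under that assumption the first sum equals the reported total of 15,845, and the second equals the 15,688 genes listed in the feature lengths table, which excludes pseudogenes.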

Detailed reports

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	15,688	5,889	2,017	46	530,661
All transcripts	28,787	3,080	2,157	46	59,419
mRNA	26,225	3,196	2,260	93	59,419
misc_RNA	652	3,415	2,521	110	16,979
tRNA	289	74	73	70	87
lncRNA	1,269	1,812	1,289	132	16,458
snoRNA	82	106	87	46	315
snRNA	34	149	165	65	271
guide_RNA	5	151	142	137	178
rRNA	231	1,236	179	118	3,970
Single-exon transcripts	2,313	1,223	999	93	12,684
coding transcripts (NM_/XM_ )	2,313	1,223	999	93	12,684
CDSs	26,225	2,238	1,527	93	56,001
Exons	77,295	514	278	1	18,101
in coding transcripts (NM_/XM_ )	73,401	510	278	1	18,101
in non-coding transcripts (NR_/XR_ )	5,997	501	239	3	15,233
Introns	59,269	1,507	91	30	175,284
in coding transcripts (NM_/XM_ )	56,860	1,500	90	30	175,284
in non-coding transcripts (NR_/XR_ )	4,460	1,792	116	34	175,284

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	1.85	1	1	50
Number of exons per transcript	6.47	5	1	79

BUSCO analysis of gene annotation

BUSCO v4.1.4 (Simão et al. 2015, PMID: 26059717) was run in "protein" mode with the diptera_odb10 lineage dataset on the annotated gene set, using the longest protein of each gene. Results are reported for the gene set from the primary assembly unit, and presented in BUSCO notation (C:complete [S:single-copy, D:duplicated], F:fragmented, M:missing, n:number of genes used).
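
For readers who want to process a result written in this notation, the sketch below parses a one-line BUSCO summary into its fields. It is a minimal illustration only: the function name and the regular expression are assumptions, and the example string is invented to show the format. It is not the BUSCO result for this release.

```python
import re

# Matches the compact notation C:..%[S:..%,D:..%],F:..%,M:..%,n:...
BUSCO_PATTERN = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(summary):
    """Return a dict with percentages C, S, D, F, M (floats) and n (int)."""
    match = BUSCO_PATTERN.search(summary.replace(" ", ""))
    if match is None:
        raise ValueError(f"not a BUSCO summary line: {summary!r}")
    fields = {key: float(value) for key, value in match.groupdict().items() if key != "n"}
    fields["n"] = int(match.group("n"))
    return fields


if __name__ == "__main__":
    # Invented values, for illustration of the format only.
    example = "C:98.0%[S:97.0%,D:1.0%],F:0.5%,M:1.5%,n:1000"
    print(parse_busco_summary(example))
```

In this notation the single-copy and duplicated percentages add up to the complete percentage, which gives a simple sanity check on any parsed line.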

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the Drosophila melanogaster known RefSeq proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 14,256 coding genes, 13,914 genes had a protein with an alignment covering 50% or more of the query and 13,068 had an alignment covering 95% or more of the query.

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.
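
The two coverage definitions reduce to the same ratio applied to different sequence lengths. The sketch below is a minimal Python illustration: the function name and the alignment lengths in the example are hypothetical, while the gene counts are the ones quoted above for this release.

```python
def coverage_percent(aligned_length, sequence_length):
    """Percentage of a sequence's length that is included in an alignment."""
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    return 100.0 * aligned_length / sequence_length


# Hypothetical alignment: 380 residues of a 400-residue annotated protein (query)
# aligned to 390 residues of a 500-residue high-quality protein (target).
query_coverage = coverage_percent(380, 400)
target_coverage = coverage_percent(390, 500)

# Counts quoted in this report for the query-coverage thresholds.
CODING_GENES = 14_256
GENES_WITH_50_PERCENT_QUERY_COVERAGE = 13_914
GENES_WITH_95_PERCENT_QUERY_COVERAGE = 13_068

share_at_50 = round(100 * GENES_WITH_50_PERCENT_QUERY_COVERAGE / CODING_GENES, 1)
share_at_95 = round(100 * GENES_WITH_95_PERCENT_QUERY_COVERAGE / CODING_GENES, 1)

if __name__ == "__main__":
    print(query_coverage, target_coverage)
    print(share_at_50, share_at_95)
```

With the counts above, about 97.6% of coding genes reach the 50% query-coverage threshold and about 91.7% reach the 95% threshold.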

Below is a cumulative graph displaying the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline were added to the graph.

Query: annotated proteins
Target: Drosophila melanogaster known RefSeq proteins

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker (if calculated), for each assembly. RepeatMasker results are only calculated for organisms with complete Dfam HMM model collections.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with RepeatMasker	% Masked with WindowMasker
Prin_Dsim_3.1	GCF_016746395.2		22.68%

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez, aligned to the genome by Splign, minimap2, or ProSplign and passed to Gnomon, NCBI's gene prediction software.

Transcript alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species Genbank	1,343	1,332 (99.18%)	1,320 (98.29%)	99.08%	99.90%
Same-species EST	118,739	113,969 (95.98%)	109,494 (92.21%)	98.96%	99.46%

RNA-Seq alignments

The following RNA-Seq reads from the Sequence Read Archive were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Publication	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	NA	Aggregate of all aligned samples	6,172,327,162	75%	14%	77,853
SAMD00115811	NA	abdominal tip (Drosophila simulans, male, SAMD00115811)	72,493,740	88%	12%	45,651
SAMD00115812	NA	abdominal tip (Drosophila simulans, male, SAMD00115812)	75,230,314	88%	12%	45,472
SAMD00115813	NA	abdominal tip (Drosophila simulans, male, SAMD00115813)	64,829,404	88%	12%	45,721
SAMD00115814	NA	abdominal tip (Drosophila simulans, male, SAMD00115814)	65,816,134	89%	13%	45,525
SAMD00115815	NA	abdominal tip (Drosophila simulans, male, SAMD00115815)	77,315,346	87%	13%	46,191
SAMD00115816	NA	abdominal tip (Drosophila simulans, male, SAMD00115816)	39,047,752	89%	14%	44,035
SAMN02179238	24515119	Whole fly (Drosophila simulans, 2d, female, SAMN02179238)	36,013,346	39%	5%	34,811
SAMN02593614	24651406	ovary (Drosophila simulans, female, SAMN02593614)	37,295,665	85%	5%	33,925
SAMN02593615	24651406	ovary (Drosophila simulans, female, SAMN02593615)	38,153,168	86%	5%	33,926
SAMN02593616	24651406	ovary (Drosophila simulans, female, SAMN02593616)	44,430,048	78%	5%	34,203
SAMN02593617	24651406	ovary (Drosophila simulans, female, SAMN02593617)	37,669,083	84%	5%	33,597
SAMN02593618	24651406	ovary (Drosophila simulans, female, SAMN02593618)	37,838,101	84%	5%	33,649
SAMN02593619	24651406	ovary (Drosophila simulans, female, SAMN02593619)	39,354,961	83%	5%	33,742
SAMN02593628	24651406	whole_animal (Drosophila simulans, male, SAMN02593628)	17,669,464	74%	12%	37,914
SAMN02593629	24651406	whole_animal (Drosophila simulans, male, SAMN02593629)	23,246,647	73%	12%	39,414
SAMN02713493	NA	Whole body (Drosophila simulans, Adult, female, SAMN02713493)	369,512,350	80%	16%	65,669
SAMN02934562	NA	eye-antennal imaginal disc (Drosophila simulans, SAMN02934562)	13,140,413	53%	5%	31,442
SAMN02934571	NA	wing imaginal disc (Drosophila simulans, SAMN02934571)	10,302,158	81%	5%	30,810
SAMN02934580	NA	brain (Drosophila simulans, SAMN02934580)	15,137,242	77%	5%	38,708
SAMN03067555	25950438	Whole body (Drosophila simulans, 48 hours after eclosion, male, SAMN03067555)	61,626,294	86%	11%	57,764
SAMN03067556	25950438	Whole body (Drosophila simulans, 48 hours after eclosion, male, SAMN03067556)	63,247,458	86%	11%	57,563
SAMN03067557	25950438	Whole body (Drosophila simulans, 48 hours after eclosion, male, SAMN03067557)	58,902,114	84%	11%	56,520
SAMN03067558	25950438	Whole body (Drosophila simulans, 48 hours after eclosion, male, SAMN03067558)	56,045,466	87%	12%	56,833
SAMN03067559	25950438	Whole body (Drosophila simulans, 48 hours after eclosion, male, SAMN03067559)	53,232,010	86%	11%	56,645
SAMN03067560	25950438	Whole body (Drosophila simulans, 48 hours after eclosion, male, SAMN03067560)	58,651,590	86%	12%	56,984
SAMN03067561	25950438	Whole body (Drosophila simulans, 48 hours after eclosion, male, SAMN03067561)	58,155,902	86%	11%	57,816
SAMN03067562	25950438	Whole body (Drosophila simulans, 48 hours after eclosion, male, SAMN03067562)	52,472,170	87%	11%	56,890
SAMN03067563	25950438	Whole body (Drosophila simulans, 48 hours after eclosion, male, SAMN03067563)	50,898,584	81%	11%	55,961
SAMN03067564	25950438	Whole body (Drosophila simulans, 48 hours after eclosion, male, SAMN03067564)	68,073,070	87%	11%	58,159
SAMN03067565	25950438	Whole body (Drosophila simulans, 48 hours after eclosion, male, SAMN03067565)	75,661,706	85%	11%	58,887
SAMN03067566	25950438	Whole body (Drosophila simulans, 48 hours after eclosion, male, SAMN03067566)	63,029,540	86%	11%	57,489
SAMN03465302	NA	whole organism (Drosophila simulans, pooled male and female, SAMN03465302)	92,949,544	56%	13%	58,076
SAMN03481968	26430061	antenna (Drosophila simulans, within 1hr of eclosion, male, SAMN03481968)	56,450,834	77%	13%	50,022
SAMN03481969	26430061	antenna (Drosophila simulans, within 1hr of eclosion, male, SAMN03481969)	52,016,642	77%	13%	49,382
SAMN03481970	26430061	antenna (Drosophila simulans, within 1hr of eclosion, male, SAMN03481970)	54,276,344	79%	13%	49,787
SAMN03481971	26430061	antenna (Drosophila simulans, within 1hr of eclosion, female, SAMN03481971)	76,681,368	80%	14%	49,070
SAMN03481972	26430061	antenna (Drosophila simulans, within 1hr of eclosion, female, SAMN03481972)	70,491,240	80%	14%	48,551
SAMN03481973	26430061	antenna (Drosophila simulans, within 1hr of eclosion, female, SAMN03481973)	73,661,726	81%	14%	48,981
SAMN04112806	NA	ovaries (Drosophila simulans, SAMN04112806)	126,248,932	76%	16%	42,829
SAMN04112807	NA	ovaries (Drosophila simulans, SAMN04112807)	137,006,732	78%	16%	43,657
SAMN04112808	NA	ovaries (Drosophila simulans, SAMN04112808)	130,942,616	81%	15%	44,872
SAMN04112809	NA	ovaries (Drosophila simulans, SAMN04112809)	176,309,186	84%	15%	44,494
SAMN04112810	NA	ovaries (Drosophila simulans, SAMN04112810)	160,362,164	85%	15%	45,863
SAMN04361918	27220689	eye-antennal imaginal disc (Drosophila simulans, female, SAMN04361918)	94,988,342	87%	14%	48,742
SAMN04361919	27220689	eye-antennal imaginal disc (Drosophila simulans, female, SAMN04361919)	119,486,598	84%	14%	50,784
SAMN04361920	27220689	eye-antennal imaginal disc (Drosophila simulans, female, SAMN04361920)	147,975,110	79%	13%	50,968
SAMN05645889	NA	D. sim female 1 (Drosophila simulans, SAMN05645889)	2,020,834	92%	15%	196
SAMN05645893	NA	D. sim male 2 (Drosophila simulans, SAMN05645893)	1,406,852	93%	14%	174
SAMN05645894	NA	D. sim male 1 (Drosophila simulans, SAMN05645894)	1,541,210	92%	15%	170
SAMN05645895	NA	D. sim female 2 (Drosophila simulans, SAMN05645895)	4,726,342	95%	14%	202
SAMN05954844	29260710	Dsim_05 (Drosophila simulans, SAMN05954844)	8,883,190	72%	18%	9,873
SAMN05954845	29260710	Dsim_04 (Drosophila simulans, SAMN05954845)	21,309,126	67%	17%	15,458
SAMN05954846	29260710	Dsim_03 (Drosophila simulans, SAMN05954846)	14,779,996	74%	15%	12,979
SAMN05954847	29260710	Dsim_02 (Drosophila simulans, SAMN05954847)	11,299,658	74%	14%	14,224
SAMN05954848	29260710	Dsim_01 (Drosophila simulans, SAMN05954848)	30,527,098	73%	15%	18,128
SAMN05954855	29260710	Dsim_14 (Drosophila simulans, SAMN05954855)	5,405,836	69%	22%	9,866
SAMN05954856	29260710	Dsim_13 (Drosophila simulans, SAMN05954856)	1,130,880	63%	21%	3,761
SAMN05954857	29260710	Dsim_12 (Drosophila simulans, SAMN05954857)	6,189,934	70%	19%	10,367
SAMN05954858	29260710	Dsim_11 (Drosophila simulans, SAMN05954858)	11,326,784	71%	20%	13,997
SAMN05954859	29260710	Dsim_10 (Drosophila simulans, SAMN05954859)	51,899,670	68%	19%	25,611
SAMN05954860	29260710	Dsim_09 (Drosophila simulans, SAMN05954860)	45,265,654	73%	17%	23,185
SAMN05954861	29260710	Dsim_08 (Drosophila simulans, SAMN05954861)	7,634,394	79%	17%	10,126
SAMN05954862	29260710	Dsim_07 (Drosophila simulans, SAMN05954862)	26,537,658	71%	18%	17,640
SAMN05954863	29260710	Dsim_06 (Drosophila simulans, SAMN05954863)	5,322,294	67%	18%	8,399
SAMN05954864	29260710	Dsim_19 (Drosophila simulans, SAMN05954864)	4,280,630	73%	35%	6,511
SAMN05954865	29260710	Dsim_18 (Drosophila simulans, SAMN05954865)	5,645,720	69%	40%	8,091
SAMN05954866	29260710	Dsim_17 (Drosophila simulans, SAMN05954866)	29,525,720	73%	33%	19,412
SAMN05954867	29260710	Dsim_16 (Drosophila simulans, SAMN05954867)	8,447,570	74%	37%	9,884
SAMN05954870	29260710	Dsim_15 (Drosophila simulans, SAMN05954870)	3,568,130	70%	27%	6,955
SAMN05954871	29260710	Dsim_22 (Drosophila simulans, SAMN05954871)	8,222,462	73%	32%	8,596
SAMN05954872	29260710	Dsim_21 (Drosophila simulans, SAMN05954872)	7,068,212	73%	32%	8,340
SAMN05954873	29260710	Dsim_20 (Drosophila simulans, SAMN05954873)	3,483,996	73%	31%	5,938
SAMN06318509	28695823	Dsim-ref-2 (Drosophila simulans, SAMN06318509)	23,514,676	54%	15%	34,032
SAMN06318510	28695823	Dsim-ref-1 (Drosophila simulans, SAMN06318510)	51,795,326	83%	14%	42,323
SAMN07447608	30383747	simXsim_cyc14C_sl21 (Drosophila simulans, SAMN07447608)	15,968,908	54%	9%	24,435
SAMN07447618	30383747	simXsim_cyc14C_sl20 (Drosophila simulans, SAMN07447618)	20,043,084	60%	10%	25,930
SAMN07447619	30383747	simXsim_cyc14C_sl19 (Drosophila simulans, SAMN07447619)	503,586	47%	6%	5,156
SAMN07447620	30383747	simXsim_cyc14C_sl18 (Drosophila simulans, SAMN07447620)	16,061,016	70%	9%	24,639
SAMN07447621	30383747	simXsim_cyc14C_sl17 (Drosophila simulans, SAMN07447621)	13,942,014	77%	9%	25,014
SAMN07447622	30383747	simXsim_cyc14C_sl16 (Drosophila simulans, SAMN07447622)	16,390,312	71%	9%	24,940
SAMN07447623	30383747	simXsim_cyc14C_sl15 (Drosophila simulans, SAMN07447623)	13,471,222	56%	9%	23,964
SAMN07447624	30383747	simXsim_cyc14C_sl14 (Drosophila simulans, SAMN07447624)	20,808,318	54%	9%	25,889
SAMN07447625	30383747	simXsim_cyc14C_sl13 (Drosophila simulans, SAMN07447625)	20,841,918	56%	9%	25,498
SAMN07447626	30383747	simXsim_cyc14C_sl12 (Drosophila simulans, SAMN07447626)	15,150,704	68%	10%	25,103
SAMN07447627	30383747	simXsim_cyc14C_sl11 (Drosophila simulans, SAMN07447627)	11,338,150	66%	9%	23,434
SAMN07447628	30383747	simXsim_cyc14C_sl10 (Drosophila simulans, SAMN07447628)	12,569,270	77%	9%	24,360
SAMN07447629	30383747	simXsim_cyc14C_sl09 (Drosophila simulans, SAMN07447629)	15,366,066	64%	9%	24,987
SAMN07447630	30383747	simXsim_cyc14C_sl08 (Drosophila simulans, SAMN07447630)	8,510,442	63%	9%	22,549
SAMN07447631	30383747	simXsim_cyc14C_sl07 (Drosophila simulans, SAMN07447631)	10,222,700	61%	9%	22,783
SAMN07447632	30383747	simXsim_cyc14C_sl06 (Drosophila simulans, SAMN07447632)	10,556,752	66%	9%	23,745
SAMN07447633	30383747	simXsim_cyc14C_sl05 (Drosophila simulans, SAMN07447633)	16,487,758	68%	9%	25,489
SAMN07447634	30383747	simXsim_cyc14C_sl04 (Drosophila simulans, SAMN07447634)	15,629,498	54%	9%	24,063
SAMN07447635	30383747	simXsim_cyc14C_sl03 (Drosophila simulans, SAMN07447635)	12,654,666	78%	9%	23,885
SAMN07447636	30383747	simXsim_cyc14C_sl02 (Drosophila simulans, SAMN07447636)	8,561,906	57%	9%	22,217
SAMN07447637	30383747	simXsim_cyc14C_sl01 (Drosophila simulans, SAMN07447637)	9,955,642	72%	9%	23,298
SAMN07447650	30383747	simXsim_cyc14C_sl27 (Drosophila simulans, SAMN07447650)	16,711,952	61%	9%	24,817
SAMN07447651	30383747	simXsim_cyc14C_sl26 (Drosophila simulans, SAMN07447651)	15,712,506	47%	10%	23,629
SAMN07447652	30383747	simXsim_cyc14C_sl25 (Drosophila simulans, SAMN07447652)	10,926,312	58%	9%	22,929
SAMN07447653	30383747	simXsim_cyc14C_sl24 (Drosophila simulans, SAMN07447653)	10,363,606	73%	9%	23,412
SAMN07447654	30383747	simXsim_cyc14C_sl23 (Drosophila simulans, SAMN07447654)	12,564,474	47%	9%	22,743
SAMN07447655	30383747	simXsim_cyc14C_sl22 (Drosophila simulans, SAMN07447655)	13,981,008	71%	9%	22,556
SAMN12261853	32554780	brain (Drosophila simulans, 4-day-old, female, SAMN12261853)	37,555,116	75%	19%	47,645
SAMN12261854	32554780	brain (Drosophila simulans, 4-day-old, female, SAMN12261854)	29,666,692	76%	19%	46,286
SAMN12261855	32554780	brain (Drosophila simulans, 4-day-old, female, SAMN12261855)	40,594,386	77%	19%	48,306
SAMN12261856	32554780	brain (Drosophila simulans, 4-day-old, female, SAMN12261856)	30,397,292	78%	19%	46,552
SAMN12261857	32554780	brain (Drosophila simulans, 4-day-old, female, SAMN12261857)	38,099,520	78%	19%	47,777
SAMN12261858	32554780	brain (Drosophila simulans, 4-day-old, female, SAMN12261858)	30,267,692	79%	19%	46,593
SAMN12261859	32554780	brain (Drosophila simulans, 4-day-old, male, SAMN12261859)	73,757,884	70%	19%	48,831
SAMN12261860	32554780	brain (Drosophila simulans, 4-day-old, male, SAMN12261860)	65,976,396	75%	20%	50,579
SAMN12261861	32554780	brain (Drosophila simulans, 4-day-old, male, SAMN12261861)	61,909,354	76%	20%	50,513
SAMN12261919	32554780	brain (Drosophila simulans, 4-day-old, female, SAMN12261919)	66,012,596	80%	19%	51,715
SAMN12261920	32554780	brain (Drosophila simulans, 4-day-old, female, SAMN12261920)	70,512,746	81%	18%	51,831
SAMN12261921	32554780	brain (Drosophila simulans, 4-day-old, female, SAMN12261921)	62,563,728	79%	18%	50,783
SAMN12261922	32554780	brain (Drosophila simulans, 4-day-old, male, SAMN12261922)	84,146,658	75%	20%	52,782
SAMN12261923	32554780	brain (Drosophila simulans, 4-day-old, male, SAMN12261923)	67,924,298	74%	19%	51,212
SAMN12261924	32554780	brain (Drosophila simulans, 4-day-old, male, SAMN12261924)	81,585,160	75%	20%	52,205
SAMN12915177	NA	accessory gland (Drosophila simulans, male, SAMN12915177)	41,460,846	79%	14%	39,807
SAMN12915188	NA	head (Drosophila simulans, female, SAMN12915188)	70,629,484	84%	14%	51,587
SAMN12915189	NA	head (Drosophila simulans, female, SAMN12915189)	99,816,528	85%	14%	53,598
SAMN12915201	NA	whole body (Drosophila simulans, male and female, SAMN12915201)	85,797,964	85%	15%	52,646
SAMN12915202	NA	whole body (Drosophila simulans, male and female, SAMN12915202)	55,110,586	84%	15%	50,849
SAMN12915214	NA	salivary gland (Drosophila simulans, male and female, SAMN12915214)	69,733,286	78%	19%	36,268
SAMN12915215	NA	salivary gland (Drosophila simulans, male and female, SAMN12915215)	76,339,672	72%	17%	37,878
SAMN15927682	NA	Ovaries (Drosophila simulans, female, SAMN15927682)	36,075,582	62%	22%	30,925
SAMN19655201	NA	testes (Drosophila simulans, 3-5d, male, SAMN19655201)	624,001,436	45%	17%	61,441

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
DRR128292	DRX121035	DRP005373	SAMD00115811	72,493,740	88%	12%
DRR128293	DRX121036	DRP005373	SAMD00115812	75,230,314	88%	12%
DRR128294	DRX121037	DRP005373	SAMD00115813	64,829,404	88%	12%
DRR128295	DRX121038	DRP005373	SAMD00115814	65,816,134	89%	13%
DRR128296	DRX121039	DRP005373	SAMD00115815	77,315,346	87%	13%
DRR128297	DRX121040	DRP005373	SAMD00115816	39,047,752	89%	14%
SRR869573	SRX287392	SRP023274	SAMN02179238	15,739,368	34%	5%
SRR869575	SRX287392	SRP023274	SAMN02179238	20,273,978	44%	5%
SRR1159284	SRX463183	SRP036420	SAMN02593614	37,295,665	85%	5%
SRR1159513	SRX463340	SRP036420	SAMN02593615	38,153,168	86%	5%
SRR1159912	SRX463661	SRP036420	SAMN02593616	44,430,048	78%	5%
SRR1159987	SRX463725	SRP036420	SAMN02593617	37,669,083	84%	5%
SRR1159988	SRX463726	SRP036420	SAMN02593618	37,838,101	84%	5%
SRR1159990	SRX463727	SRP036420	SAMN02593619	39,354,961	83%	5%
SRR1161127	SRX464651	SRP036420	SAMN02593628	17,669,464	74%	12%
SRR1161139	SRX464704	SRP036420	SAMN02593629	23,246,647	73%	12%
SRR1212716	SRX506978	SRP040757	SAMN02713493	369,512,350	80%	16%
SRR1523558	SRX660278	SRP044761	SAMN02934562	13,140,413	53%	5%
SRR1523567	SRX660287	SRP044761	SAMN02934571	10,302,158	81%	5%
SRR1523576	SRX660296	SRP044761	SAMN02934580	15,137,242	77%	5%
SRR1576517	SRX702216	SRP047141	SAMN03067555	61,626,294	86%	11%
SRR1576518	SRX702217	SRP047141	SAMN03067556	63,247,458	86%	11%
SRR1576519	SRX702218	SRP047141	SAMN03067557	58,902,114	84%	11%
SRR1576520	SRX702219	SRP047141	SAMN03067558	56,045,466	87%	12%
SRR1576522	SRX702221	SRP047141	SAMN03067559	53,232,010	86%	11%
SRR1576523	SRX702222	SRP047141	SAMN03067560	58,651,590	86%	12%
SRR1576524	SRX702223	SRP047141	SAMN03067561	58,155,902	86%	11%
SRR1576525	SRX702224	SRP047141	SAMN03067562	52,472,170	87%	11%
SRR1576526	SRX702225	SRP047141	SAMN03067563	50,898,584	81%	11%
SRR1576527	SRX702226	SRP047141	SAMN03067564	68,073,070	87%	11%
SRR1576528	SRX702227	SRP047141	SAMN03067565	75,661,706	85%	11%
SRR1576529	SRX702228	SRP047141	SAMN03067566	63,029,540	86%	11%
SRR1956911	SRX982352	SRP056962	SAMN03465302	92,949,544	56%	13%
SRR1973492	SRX994777	SRP057154	SAMN03481968	56,450,834	77%	13%
SRR1973493	SRX994778	SRP057154	SAMN03481969	52,016,642	77%	13%
SRR1973494	SRX994779	SRP057154	SAMN03481970	54,276,344	79%	13%
SRR1973495	SRX994780	SRP057154	SAMN03481971	76,681,368	80%	14%
SRR1973496	SRX994781	SRP057154	SAMN03481972	70,491,240	80%	14%
SRR1973497	SRX994782	SRP057154	SAMN03481973	73,661,726	81%	14%
SRR2568366	SRX1287831	SRP064193	SAMN04112806	65,259,274	77%	16%
SRR2644786	SRX1287831	SRP064193	SAMN04112806	60,989,658	75%	16%
SRR2568367	SRX1287832	SRP064193	SAMN04112807	71,648,994	78%	16%
SRR2644787	SRX1287832	SRP064193	SAMN04112807	65,357,738	79%	16%
SRR2568365	SRX1287833	SRP064193	SAMN04112808	74,158,274	80%	15%
SRR2644782	SRX1287833	SRP064193	SAMN04112808	56,784,342	81%	15%
SRR2568374	SRX1287834	SRP064193	SAMN04112809	74,453,842	80%	15%
SRR2644790	SRX1287834	SRP064193	SAMN04112809	101,855,344	87%	15%
SRR2568364	SRX1287843	SRP064193	SAMN04112810	83,871,434	84%	15%
SRR2644779	SRX1287843	SRP064193	SAMN04112810	76,490,730	85%	15%
SRR3045231	SRX1496474	SRP067685	SAMN04361918	94,988,342	87%	14%
SRR3045232	SRX1496475	SRP067685	SAMN04361919	119,486,598	84%	14%
SRR3082394	SRX1496477	SRP067685	SAMN04361920	147,975,110	79%	13%
SRR4064364	SRX2053350	SRP082966	SAMN05645889	2,020,834	92%	15%
SRR4064367	SRX2053353	SRP082966	SAMN05645893	1,406,852	93%	14%
SRR4064366	SRX2053352	SRP082966	SAMN05645894	1,541,210	92%	15%
SRR4064365	SRX2053351	SRP082966	SAMN05645895	4,726,342	95%	14%
SRR4733646	SRX2310923	SRP092332	SAMN05954844	8,883,190	72%	18%
SRR4733645	SRX2310922	SRP092332	SAMN05954845	21,309,126	67%	17%
SRR4733644	SRX2310921	SRP092332	SAMN05954846	14,779,996	74%	15%
SRR4733643	SRX2310920	SRP092332	SAMN05954847	11,299,658	74%	14%
SRR4733642	SRX2310919	SRP092332	SAMN05954848	30,527,098	73%	15%
SRR4733655	SRX2310932	SRP092332	SAMN05954855	5,405,836	69%	22%
SRR4733654	SRX2310931	SRP092332	SAMN05954856	1,130,880	63%	21%
SRR4733653	SRX2310930	SRP092332	SAMN05954857	6,189,934	70%	19%
SRR4733652	SRX2310929	SRP092332	SAMN05954858	11,326,784	71%	20%
SRR4733651	SRX2310928	SRP092332	SAMN05954859	51,899,670	68%	19%
SRR4733650	SRX2310927	SRP092332	SAMN05954860	45,265,654	73%	17%
SRR4733649	SRX2310926	SRP092332	SAMN05954861	7,634,394	79%	17%
SRR4733648	SRX2310925	SRP092332	SAMN05954862	26,537,658	71%	18%
SRR4733647	SRX2310924	SRP092332	SAMN05954863	5,322,294	67%	18%
SRR4733660	SRX2310937	SRP092332	SAMN05954864	4,280,630	73%	35%
SRR4733659	SRX2310936	SRP092332	SAMN05954865	5,645,720	69%	40%
SRR4733658	SRX2310935	SRP092332	SAMN05954866	29,525,720	73%	33%
SRR4733657	SRX2310934	SRP092332	SAMN05954867	8,447,570	74%	37%
SRR4733656	SRX2310933	SRP092332	SAMN05954870	3,568,130	70%	27%
SRR4733663	SRX2310940	SRP092332	SAMN05954871	8,222,462	73%	32%
SRR4733662	SRX2310939	SRP092332	SAMN05954872	7,068,212	73%	32%
SRR4733661	SRX2310938	SRP092332	SAMN05954873	3,483,996	73%	31%
SRR5241765	SRX2548615	SRP099257	SAMN06318509	8,625,772	54%	15%
SRR5241766	SRX2548615	SRP099257	SAMN06318509	14,888,904	54%	15%
SRR5241763	SRX2548614	SRP099257	SAMN06318510	16,765,618	83%	14%
SRR5241764	SRX2548614	SRP099257	SAMN06318510	35,029,708	83%	14%
SRR5893934	SRX3059514	SRP114787	SAMN07447608	15,968,908	54%	9%
SRR5893933	SRX3059513	SRP114787	SAMN07447618	20,043,084	60%	10%
SRR5893932	SRX3059512	SRP114787	SAMN07447619	503,586	47%	6%
SRR5893931	SRX3059511	SRP114787	SAMN07447620	16,061,016	70%	9%
SRR5893930	SRX3059510	SRP114787	SAMN07447621	13,942,014	77%	9%
SRR5893929	SRX3059509	SRP114787	SAMN07447622	16,390,312	71%	9%
SRR5893928	SRX3059508	SRP114787	SAMN07447623	13,471,222	56%	9%
SRR5893927	SRX3059507	SRP114787	SAMN07447624	20,808,318	54%	9%
SRR5893926	SRX3059506	SRP114787	SAMN07447625	20,841,918	56%	9%
SRR5893925	SRX3059505	SRP114787	SAMN07447626	15,150,704	68%	10%
SRR5893924	SRX3059504	SRP114787	SAMN07447627	11,338,150	66%	9%
SRR5893923	SRX3059503	SRP114787	SAMN07447628	12,569,270	77%	9%
SRR5893922	SRX3059502	SRP114787	SAMN07447629	15,366,066	64%	9%
SRR5893921	SRX3059501	SRP114787	SAMN07447630	8,510,442	63%	9%
SRR5893920	SRX3059500	SRP114787	SAMN07447631	10,222,700	61%	9%
SRR5893919	SRX3059499	SRP114787	SAMN07447632	10,556,752	66%	9%
SRR5893918	SRX3059498	SRP114787	SAMN07447633	16,487,758	68%	9%
SRR5893917	SRX3059497	SRP114787	SAMN07447634	15,629,498	54%	9%
SRR5893916	SRX3059496	SRP114787	SAMN07447635	12,654,666	78%	9%
SRR5893915	SRX3059495	SRP114787	SAMN07447636	8,561,906	57%	9%
SRR5893914	SRX3059494	SRP114787	SAMN07447637	9,955,642	72%	9%
SRR5893940	SRX3059520	SRP114787	SAMN07447650	16,711,952	61%	9%
SRR5893939	SRX3059519	SRP114787	SAMN07447651	15,712,506	47%	10%
SRR5893938	SRX3059518	SRP114787	SAMN07447652	10,926,312	58%	9%
SRR5893937	SRX3059517	SRP114787	SAMN07447653	10,363,606	73%	9%
SRR5893936	SRX3059516	SRP114787	SAMN07447654	12,564,474	47%	9%
SRR5893935	SRX3059515	SRP114787	SAMN07447655	13,981,008	71%	9%
SRR9678434	SRX6438775	SRP214528	SAMN12261853	37,555,116	75%	19%
SRR9678435	SRX6438774	SRP214528	SAMN12261854	29,666,692	76%	19%
SRR9678452	SRX6438757	SRP214528	SAMN12261855	40,594,386	77%	19%
SRR9678453	SRX6438756	SRP214528	SAMN12261856	30,397,292	78%	19%
SRR9678454	SRX6438755	SRP214528	SAMN12261857	38,099,520	78%	19%
SRR9678455	SRX6438754	SRP214528	SAMN12261858	30,267,692	79%	19%
SRR9678456	SRX6438753	SRP214528	SAMN12261859	73,757,884	70%	19%
SRR9678457	SRX6438752	SRP214528	SAMN12261860	65,976,396	75%	20%
SRR9678458	SRX6438751	SRP214528	SAMN12261861	61,909,354	76%	20%
SRR9678417	SRX6438792	SRP214528	SAMN12261919	66,012,596	80%	19%
SRR9678469	SRX6438740	SRP214528	SAMN12261920	70,512,746	81%	18%
SRR9678416	SRX6438793	SRP214528	SAMN12261921	62,563,728	79%	18%
SRR9678423	SRX6438786	SRP214528	SAMN12261922	84,146,658	75%	20%
SRR9678467	SRX6438742	SRP214528	SAMN12261923	67,924,298	74%	19%
SRR9678465	SRX6438744	SRP214528	SAMN12261924	81,585,160	75%	20%
SRR10253130	SRX6971179	SRP224973	SAMN12915177	41,460,846	79%	14%
SRR10253166	SRX6971143	SRP224973	SAMN12915188	70,629,484	84%	14%
SRR10253165	SRX6971144	SRP224973	SAMN12915189	99,816,528	85%	14%
SRR10253152	SRX6971157	SRP224973	SAMN12915201	85,797,964	85%	15%
SRR10253151	SRX6971158	SRP224973	SAMN12915202	55,110,586	84%	15%
SRR10253138	SRX6971171	SRP224973	SAMN12915214	69,733,286	78%	19%
SRR10253137	SRX6971172	SRP224973	SAMN12915215	76,339,672	72%	17%
SRR12536529	SRX9026270	SRP279112	SAMN15927682	36,075,582	62%	22%
SRR14777837	SRX11111155	SRP323555	SAMN19655201	324,063,272	49%	17%
SRR14777836	SRX11111156	SRP323555	SAMN19655201	299,938,164	42%	17%
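
Several samples in the by-sample table are backed by more than one run, and their read counts are the sums of the corresponding rows in the by-run table. The sketch below illustrates that relationship in Python for two such samples; the tuple layout and function name are illustrative assumptions, and only the read counts are taken from the tables.

```python
from collections import defaultdict

# (run, sample, number of reads) copied from the by-run table.
RUNS = [
    ("SRR869573", "SAMN02179238", 15_739_368),
    ("SRR869575", "SAMN02179238", 20_273_978),
    ("SRR14777837", "SAMN19655201", 324_063_272),
    ("SRR14777836", "SAMN19655201", 299_938_164),
]


def reads_per_sample(runs):
    """Sum the read counts of all runs belonging to each sample."""
    totals = defaultdict(int)
    for _run, sample, n_reads in runs:
        totals[sample] += n_reads
    return dict(totals)


if __name__ == "__main__":
    for sample, total in reads_per_sample(RUNS).items():
        print(sample, f"{total:,}")
```

The per-sample alignment percentages are not recomputed here, because the by-run percentages are already rounded and would only give an approximate result.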

Protein alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Insecta GenBank	85,119	61,711 (72.50%)	61,711 (72.50%)	66.36%	67.16%
Drosophila melanogaster GenBank	27,949	13,352 (47.77%)	13,352 (47.77%)	89.01%	94.41%
Drosophila melanogaster known RefSeq (NP_)	30,157	21,484 (71.24%)	21,484 (71.24%)	90.70%	95.75%
Same-species GenBank	333	275 (82.58%)	275 (82.58%)	93.67%	98.35%
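
The percentages in the table above are the number of aligned sequences divided by the number retrieved from Entrez. The following minimal Python sketch recomputes them from the counts in the table; the data structure and function name are illustrative only.

```python
# (source, sequences retrieved from Entrez, sequences aligned by ProSplign)
PROTEIN_SOURCES = [
    ("Insecta GenBank", 85_119, 61_711),
    ("Drosophila melanogaster GenBank", 27_949, 13_352),
    ("Drosophila melanogaster known RefSeq (NP_)", 30_157, 21_484),
    ("Same-species GenBank", 333, 275),
]


def percent_aligned(retrieved, aligned):
    """Aligned sequences as a percentage of retrieved sequences, to 2 decimals."""
    if retrieved <= 0:
        raise ValueError("retrieved must be positive")
    return round(100 * aligned / retrieved, 2)


if __name__ == "__main__":
    for source, retrieved, aligned in PROTEIN_SOURCES:
        print(f"{source}: {percent_aligned(retrieved, aligned):.2f}%")
```

For these four sources every aligned sequence was also passed to Gnomon, so the same ratio applies to both percentage columns.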

Assembly-assembly alignments of current to previous assembly

When the assembly changes between two rounds of annotation, genes in the current and the previous annotation are mapped to each other using the genomic alignments of the current assembly to the previous assembly so that gene identifiers can be preserved. The success of the remapping depends largely on how well the two assembly versions align to each other.

Below are the percent coverage of one assembly by the other and the average percent identity of the alignments. The 'First pass' alignments are reciprocal best hits, while the 'Total' alignments also include 'Second pass' or non-reciprocal best alignments. For more information about the assembly-assembly alignment process, please visit the NCBI Genome Remapping Service page.

First Pass	Total
Prin_Dsim_3.1 (Current) Coverage: 99.55%	Prin_Dsim_3.1 (Current) Coverage: 99.55%
Prin_Dsim_3.0 (Previous) Coverage: 99.54%	Prin_Dsim_3.0 (Previous) Coverage: 99.54%
Percent Identity: 99.99%	Percent Identity: 99.99%
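
To illustrate what "reciprocal best hits" means for the 'First pass' alignments, here is a toy Python sketch. It is not the NCBI assembly-assembly alignment procedure: the region identifiers, the scores, and the function name are all hypothetical, and the real process works on genomic alignments rather than a small score table.

```python
def reciprocal_best_hits(scores):
    """Return (current, previous) pairs that are each other's best-scoring hit.

    scores maps (current_id, previous_id) to an alignment score.
    Ties are resolved in favour of the pair seen first.
    """
    best_for_current = {}
    best_for_previous = {}
    for (cur, prev), score in scores.items():
        if cur not in best_for_current or score > scores[(cur, best_for_current[cur])]:
            best_for_current[cur] = prev
        if prev not in best_for_previous or score > scores[(best_for_previous[prev], prev)]:
            best_for_previous[prev] = cur
    return sorted(
        (cur, prev)
        for cur, prev in best_for_current.items()
        if best_for_previous[prev] == cur
    )


if __name__ == "__main__":
    # Hypothetical scores between two regions of each assembly.
    toy_scores = {
        ("cur_1", "prev_1"): 90,
        ("cur_1", "prev_2"): 50,
        ("cur_2", "prev_1"): 80,
        ("cur_2", "prev_2"): 40,
    }
    print(reciprocal_best_hits(toy_scores))
```

In this toy example only cur_1 and prev_1 are reciprocal best hits. The best hit of cur_2 is prev_1, which prefers cur_1, so that pairing is the kind of non-reciprocal alignment that the report counts only under 'Total'.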

Comparison of the current and previous annotations

The annotation produced for this release (103) was compared to the annotation in the previous release (102) for each assembly annotated in both releases. Scores for current and previous gene and transcript features were calculated based on overlap in exon sequence and matches in exon boundaries. Pairs of current and previous features were categorized based on these scores, whether they are reciprocal best matches, and changes in attributes (gene biotype, completeness, etc.). If the assembly was updated between the two releases, alignments between the current and the previous assembly were used to match the current and previous gene and transcript features in mapped regions.

The table below summarizes the changes in the gene set for each assembly as a percent of the number of genes in the current annotation release, and provides links to the details of the comparison in tabular format and in a Genome Workbench project.

	Prin_Dsim_3.1 (Current) to Prin_Dsim_3.0 (Previous)
Identical	14%
Minor changes	80%
Major changes	3%
New	3%
Deprecated	2%
Other	<1%
Download the report	tabular, Genome Workbench

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
Minimap2: Li H. Bioinformatics 2018 Sep 15;34(18):3094-3100
