NCBI Aegilops tauschii Annotation Release 102

The RefSeq genome records for Aegilops tauschii were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. This report presents statistics on the annotation products, the input data used in the pipeline and intermediate alignment results.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
BUSCO results: Annotation completeness assessed with BUSCO
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction
Similarity of current and previous assembly: How well the current assembly aligns to the previous one
Comparison of the current and previous annotations: What proportion of the genes changed in this annotation

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as NCBI Aegilops tauschii Annotation Release 102.

Annotation release ID: 102
Date of Entrez queries for transcripts and proteins: Nov 17 2021
Date of submission of annotation to the public databases: Dec 2 2021
Software version: 9.0

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
Aet v5.0	GCF_002575655.2	Johns Hopkins University	09-30-2021	Reference	8 assembled chromosomes; unplaced scaffolds

Gene and feature statistics

Counts and lengths of annotated features are provided below for each assembly.

Feature counts

Feature	Aet v5.0
Genes and pseudogenes	62,260
  protein-coding	45,657
  non-coding	11,408
  Transcribed pseudogenes	1
  Non-transcribed pseudogenes	5,194
  genes with variants	9,833
  Immunoglobulin/T-cell receptor gene segments	0
  other	0
mRNAs	58,575
  fully-supported	45,112
  with > 5% ab initio	12,224
  partial	583
  with filled gap(s)	204
  known RefSeq (NM_)	5
  model RefSeq (XM_)	58,570
non-coding RNAs	26,843
  fully-supported	22,755
  with > 5% ab initio	0
  partial	14
  with filled gap(s)	4
  known RefSeq (NR_)	0
  model RefSeq (XR_)	25,687
pseudo transcripts	1
  fully-supported	1
  with > 5% ab initio	0
  partial	0
  with filled gap(s)	0
  known RefSeq (NR_)	1
  model RefSeq (XR_)	0
CDSs	58,655
  fully-supported	45,112
  with > 5% ab initio	12,456
  partial	521
  with major correction(s)	975
  known RefSeq (NP_)	5
  model RefSeq (XP_)	58,650

Detailed reports

The counts below do not include pseudogenes.
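As a quick consistency check, the totals in the "Feature counts" table can be reproduced from their components, and the "Genes" row below equals the gene total minus pseudogenes. The short Python sketch assumes the gene biotype categories are mutually exclusive ("genes with variants" overlaps them and is left out); the variable names are illustrative only.

```python
# Gene counts by biotype, copied from the "Feature counts" table (Aet v5.0).
# Assumption: these categories are mutually exclusive; "genes with variants"
# overlaps them and is therefore not included.
gene_biotypes = {
    "protein-coding": 45_657,
    "non-coding": 11_408,
    "transcribed pseudogenes": 1,
    "non-transcribed pseudogenes": 5_194,
}
genes_and_pseudogenes = sum(gene_biotypes.values())

# The "Feature lengths" table excludes pseudogenes.
genes_only = gene_biotypes["protein-coding"] + gene_biotypes["non-coding"]

# Non-coding RNA counts by type, copied from the "Feature lengths" table.
ncrna_types = {
    "misc_RNA": 6_804,
    "tRNA": 1_152,
    "lncRNA": 15_980,
    "snoRNA": 907,
    "snRNA": 291,
    "rRNA": 1_709,
}
ncrnas = sum(ncrna_types.values())
all_transcripts = 58_575 + ncrnas  # mRNAs + non-coding RNAs

print(genes_and_pseudogenes)  # 62260, the "Genes and pseudogenes" count
print(genes_only)             # 57065, the "Genes" row of "Feature lengths"
print(ncrnas)                 # 26843, the "non-coding RNAs" count
print(all_transcripts)        # 85418, the "All transcripts" row
```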

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	57,065	3,740	2,226	57	192,072
All transcripts	85,418	1,904	1,605	56	27,061
  mRNA	58,575	1,842	1,546	170	18,059
  misc_RNA	6,804	2,545	2,209	105	18,174
  tRNA	1,152	75	73	57	89
  lncRNA	15,980	2,312	2,004	56	27,061
  snoRNA	907	110	97	63	228
  snRNA	291	142	124	98	202
  rRNA	1,709	152	119	95	3,402
Single-exon transcripts	11,518	1,143	905	170	18,059
  coding transcripts (NM_/XM_)	11,492	1,142	904	170	18,059
  non-coding transcripts (NR_/XR_)	26	1,306	1,236	630	2,505
CDSs	58,655	1,262	1,041	90	16,071
Exons	262,059	404	217	1	23,508
  in coding transcripts (NM_/XM_)	215,552	380	203	1	18,059
  in non-coding transcripts (NR_/XR_)	55,325	477	265	2	23,508
Introns	194,821	744	159	30	191,186
  in coding transcripts (NM_/XM_)	164,360	742	148	30	191,186
  in non-coding transcripts (NR_/XR_)	38,849	713	233	30	93,582
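Exon and intron lengths in this table both follow from the exon structure of each transcript: an n-exon transcript has n-1 introns, so single-exon transcripts contribute no introns. The sketch below illustrates that relationship; it assumes 1-based, fully closed (start, end) exon coordinates for a single transcript, which is a hypothetical input format rather than an NCBI file layout.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based and inclusive on the genome


def exon_lengths(exons: List[Exon]) -> List[int]:
    """Length of each exon in bp."""
    return [end - start + 1 for start, end in exons]


def intron_lengths(exons: List[Exon]) -> List[int]:
    """Lengths of the gaps between consecutive exons of one transcript."""
    ordered = sorted(exons)
    return [
        # Bases strictly between the end of one exon and the start of the next.
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]


exons = [(1_000, 1_200), (1_400, 1_450), (1_700, 2_000)]
print(exon_lengths(exons))       # [201, 51, 301]
print(sum(exon_lengths(exons)))  # 553, the spliced transcript length
print(intron_lengths(exons))     # [199, 249]
```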

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	1.51	1	1	50
Number of exons per transcript	5.23	4	1	78

BUSCO analysis of gene annotation

BUSCO v4.1.4 (Simão et al. 2015, PMID: 26059717) was run in "protein" mode with the poales_odb10 lineage dataset on the annotated gene set, using the longest protein of each gene. Results are reported for the gene set from the primary assembly unit and presented in BUSCO notation (C:complete [S:single-copy, D:duplicated], F:fragmented, M:missing, n:number of genes used).
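The two steps described above, keeping one longest protein per gene and reading the summary in BUSCO notation, can be sketched in Python as follows. The (gene_id, protein_id, sequence) input layout and the function names are hypothetical, and keeping the first protein seen when two have equal length is an assumption rather than a documented pipeline rule. The parser expects the compact summary string BUSCO prints and ignores whitespace.

```python
import re
from typing import Dict, Iterable, Tuple


def longest_protein_per_gene(
    proteins: Iterable[Tuple[str, str, str]]
) -> Dict[str, str]:
    """Map each gene ID to the ID of its longest protein.

    On equal lengths the first protein seen is kept (an assumption).
    """
    best: Dict[str, Tuple[str, int]] = {}
    for gene_id, protein_id, sequence in proteins:
        if gene_id not in best or len(sequence) > best[gene_id][1]:
            best[gene_id] = (protein_id, len(sequence))
    return {gene_id: protein_id for gene_id, (protein_id, _) in best.items()}


# C:complete [S:single-copy, D:duplicated], F:fragmented, M:missing, n:total count.
_BUSCO = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_notation(summary: str) -> Dict[str, float]:
    """Parse a BUSCO summary string into its C, S, D, F, M percentages and n."""
    match = _BUSCO.search("".join(summary.split()))  # drop all whitespace first
    if match is None:
        raise ValueError(f"not a BUSCO summary: {summary!r}")
    values: Dict[str, float] = {
        key: float(value) for key, value in match.groupdict().items() if key != "n"
    }
    values["n"] = int(match.group("n"))
    return values
```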

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the Arabidopsis thaliana known RefSeq proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 45,577 coding genes, 32,749 genes had a protein with an alignment covering 50% or more of the query, and 8,066 had an alignment covering 95% or more of the query.

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.
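Under these definitions, each coverage is the aligned span divided by the length of the corresponding sequence. The sketch below assumes a single alignment (HSP) with 1-based inclusive coordinates; how the pipeline combines several HSPs for one protein pair is not described in this report. With BLAST+ tabular output, the needed values are available through the `qstart qend qlen sstart send slen` format specifiers; the example hit is invented.

```python
def percent_coverage(aligned_start: int, aligned_end: int, length: int) -> float:
    """Percent of a sequence covered by one alignment span (1-based, inclusive)."""
    if not 1 <= aligned_start <= aligned_end <= length:
        raise ValueError("alignment span must lie within the sequence")
    return 100.0 * (aligned_end - aligned_start + 1) / length


# Hypothetical hit: a 400-aa annotated protein (query) aligned over residues
# 21-400 to a 500-aa Arabidopsis protein (target) over residues 101-480.
query_coverage = percent_coverage(21, 400, 400)    # 95.0
target_coverage = percent_coverage(101, 480, 500)  # 76.0
print(query_coverage, target_coverage)
```

A hit like this one counts toward both the 50% and the 95% query-coverage figures even though it covers only 76% of the target.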

Below is a cumulative graph displaying the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline were added to the graph.

(Graph not reproduced. Query: annotated proteins. Target: Arabidopsis thaliana known RefSeq proteins.)
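The cumulative graph plots, for each coverage threshold, the number of genes with at least one protein alignment at or above that threshold, so the 50% and 95% counts given earlier are two points on the query-coverage curve. A minimal sketch, assuming the input is a list of (gene_id, coverage) pairs with one entry per protein alignment (a hypothetical layout); genes without any alignment never appear and are never counted.

```python
from typing import Dict, Iterable, List, Tuple


def best_coverage_per_gene(hits: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Keep the highest coverage seen for each gene."""
    best: Dict[str, float] = {}
    for gene_id, coverage in hits:
        best[gene_id] = max(coverage, best.get(gene_id, 0.0))
    return best


def cumulative_counts(best: Dict[str, float], thresholds: List[float]) -> List[int]:
    """Number of genes whose best coverage is >= each threshold."""
    return [sum(1 for cov in best.values() if cov >= t) for t in thresholds]


hits = [("g1", 97.0), ("g1", 40.0), ("g2", 60.0), ("g3", 30.0)]
best = best_coverage_per_gene(hits)
print(cumulative_counts(best, [0, 50, 95, 100]))  # [3, 2, 1, 0]
```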

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker (if calculated), for each assembly. RepeatMasker results are only calculated for organisms with complete Dfam HMM model collections.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with WindowMasker
Aet v5.0	GCF_002575655.2	71.10%
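The masked percentage is the number of genomic bases covered by at least one masked interval divided by the total assembly length, so overlapping intervals have to be merged before summing. The sketch below assumes the masked regions are available as 0-based, half-open (start, end) intervals per sequence; this input format and the sequence names are hypothetical and do not describe WindowMasker's own output.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def merged_length(intervals: List[Interval]) -> int:
    """Bases covered by the union of the intervals, counting overlaps once."""
    total = 0
    run_start = run_end = None
    for start, end in sorted(intervals):
        if run_end is not None and start <= run_end:
            run_end = max(run_end, end)  # overlaps or abuts the current run: extend it
        else:
            if run_end is not None:
                total += run_end - run_start  # close the previous run
            run_start, run_end = start, end
    if run_end is not None:
        total += run_end - run_start
    return total


def percent_masked(masked: Dict[str, List[Interval]], seq_lengths: Dict[str, int]) -> float:
    """Masked bases as a percentage of the total assembly length."""
    masked_bases = sum(merged_length(masked.get(seq_id, [])) for seq_id in seq_lengths)
    return 100.0 * masked_bases / sum(seq_lengths.values())


lengths = {"chr1": 1_000, "chr2": 1_000}
masked = {"chr1": [(0, 100), (50, 300)], "chr2": [(400, 500)]}
print(percent_masked(masked, lengths))  # 20.0
```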

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez, aligned to the genome by Splign, minimap2, or ProSplign and passed to Gnomon, NCBI's gene prediction software.

Transcript alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species known RefSeq (NM_/NR_)	6	6 (100.00%)	6 (100.00%)	100.00%	100.00%
Same-species GenBank	160	157 (98.13%)	141 (88.13%)	99.42%	99.51%
Same-species EST	192	114 (59.38%)	93 (48.44%)	98.41%	93.84%
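Both percentage columns are relative to the number of sequences retrieved from Entrez, not to the number aligned (141 of 160 GenBank transcripts, 88.13%, were passed to Gnomon). The sketch below reproduces the columns; the figures in this table match half-up rounding to two decimals (157/160 = 98.125%, shown as 98.13%), a rule inferred from these numbers rather than from pipeline documentation.

```python
from decimal import Decimal, ROUND_HALF_UP


def rounded_percent(part: int, whole: int) -> Decimal:
    """part/whole as a percentage, rounded half-up to two decimals."""
    return (Decimal(part) * 100 / Decimal(whole)).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )


# (retrieved from Entrez, aligned by Splign, passed to Gnomon), from the table above.
TRANSCRIPT_SOURCES = {
    "Same-species known RefSeq (NM_/NR_)": (6, 6, 6),
    "Same-species GenBank": (160, 157, 141),
    "Same-species EST": (192, 114, 93),
}

for source, (retrieved, aligned, passed) in TRANSCRIPT_SOURCES.items():
    print(
        f"{source}: {aligned} ({rounded_percent(aligned, retrieved)}%) aligned, "
        f"{passed} ({rounded_percent(passed, retrieved)}%) passed to Gnomon"
    )
```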

RNA-Seq alignments

The following RNA-Seq reads from the Sequence Read Archive were also used for gene prediction:

Alignment statistics by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Publication	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	NA	Aggregate of all aligned samples	8,182,868,509	89%	21%	224,032
SAMD00049228	27142109	leaf (Aegilops tauschii, 10 days, SAMD00049228)	9,692,744	80%	57%	120,371
SAMD00049229	27142109	leaf (Aegilops tauschii, 10 days, SAMD00049229)	10,503,508	80%	59%	135,270
SAMD00049230	27142109	leaf (Aegilops tauschii, 10 days, SAMD00049230)	11,567,626	85%	38%	106,959
SAMD00049231	27142109	leaf (Aegilops tauschii, 10 days, SAMD00049231)	10,764,372	85%	58%	135,577
SAMD00049232	27142109	leaf (Aegilops tauschii, 10 days, SAMD00049232)	9,647,536	86%	57%	133,029
SAMD00049233	27142109	leaf (Aegilops tauschii, 10 days, SAMD00049233)	11,389,522	83%	56%	135,165
SAMD00049234	27142109	leaf (Aegilops tauschii, 10 days, SAMD00049234)	9,527,716	85%	58%	134,642
SAMD00049235	27142109	leaf (Aegilops tauschii, 10 days, SAMD00049235)	11,295,380	82%	57%	139,230
SAMD00049236	27142109	leaf (Aegilops tauschii, 10 days, SAMD00049236)	10,456,226	85%	58%	136,016
SAMD00049237	27142109	leaf (Aegilops tauschii, 10 days, SAMD00049237)	10,517,910	83%	51%	134,205
SAMD00192745	31986184	leaf (Aegilops tauschii, 1 week after sowing, SAMD00192745)	9,945,864	84%	58%	116,530
SAMEA104368388	NA	TauRoot3 (Aegilops tauschii, SAMEA104368388)	106,730,222	96%	21%	150,434
SAMEA104368389	NA	TauRoot1 (Aegilops tauschii, SAMEA104368389)	105,354,020	96%	22%	158,219
SAMEA104368390	NA	TauRoot2 (Aegilops tauschii, SAMEA104368390)	110,032,168	96%	22%	159,890
SAMEA104368391	NA	TauLeaf1 (Aegilops tauschii, SAMEA104368391)	101,355,206	96%	23%	155,766
SAMEA104368392	NA	TauLeaf2 (Aegilops tauschii, SAMEA104368392)	110,754,106	96%	23%	157,034
SAMEA104368393	NA	TauLeaf3 (Aegilops tauschii, SAMEA104368393)	115,268,828	96%	23%	157,616
SAMEA104368394	NA	TauSeedling1 (Aegilops tauschii, SAMEA104368394)	149,942,740	96%	23%	171,933
SAMEA104368395	NA	TauSeedling2 (Aegilops tauschii, SAMEA104368395)	143,728,780	96%	25%	173,068
SAMEA104368396	NA	TauSeedling3 (Aegilops tauschii, SAMEA104368396)	114,864,860	96%	26%	170,400
SAMEA104368397	NA	TauPoolDevGrain (Aegilops tauschii, SAMEA104368397)	121,810,076	87%	15%	166,754
SAMEA104368398	NA	TauDevGrain10dd1 (Aegilops tauschii, SAMEA104368398)	162,314,940	95%	24%	173,841
SAMEA104368399	NA	TauDevGrain10dd2 (Aegilops tauschii, SAMEA104368399)	155,331,158	95%	23%	173,686
SAMEA104368400	NA	TauDevGrain10dd3 (Aegilops tauschii, SAMEA104368400)	138,851,246	94%	23%	169,863
SAMEA104368401	NA	TauDevGrain27dd1 (Aegilops tauschii, SAMEA104368401)	180,033,810	96%	21%	173,762
SAMEA104368402	NA	TauDevGrain27dd2 (Aegilops tauschii, SAMEA104368402)	152,516,570	92%	13%	156,836
SAMEA104368403	NA	TauDevGrain27dd3 (Aegilops tauschii, SAMEA104368403)	162,122,742	88%	12%	158,748
SAMEA5096368	NA	leaf (Aegilops tauschii, SAMEA5096368)	59,718,326	91%	27%	121,856
SAMEA5096369	NA	leaf (Aegilops tauschii, SAMEA5096369)	79,394,648	91%	30%	138,781
SAMEA5096370	NA	leaf (Aegilops tauschii, SAMEA5096370)	71,910,866	91%	29%	132,616
SAMN00013421	NA	Aegilops tauschii (Aegilops tauschii, SAMN00013421)	57,109	33%	40%	8,068
SAMN05162574	NA	Endosperm (Aegilops tauschii, MIX of 15, 20DAP, SAMN05162574)	217,696,190	82%	8%	144,233
SAMN05162575	NA	Endosperm (Aegilops tauschii, MIX of 15, 20DAP, SAMN05162575)	262,664,586	83%	10%	145,823
SAMN05162576	NA	Endosperm (Aegilops tauschii, 15DAP, SAMN05162576)	206,133,704	84%	9%	143,596
SAMN05162577	NA	Endosperm (Aegilops tauschii, 20DAP, SAMN05162577)	205,315,916	84%	7%	140,733
SAMN05162578	NA	Endosperm (Aegilops tauschii, 15DAP, SAMN05162578)	235,988,388	83%	8%	141,345
SAMN05162579	NA	Endosperm (Aegilops tauschii, 20DAP, SAMN05162579)	206,276,784	79%	7%	135,777
SAMN08013125	NA	coleoptile (Aegilops tauschii, 10 days, SAMN08013125)	61,726,210	89%	37%	149,629
SAMN08013127	NA	coleoptile (Aegilops tauschii, 10 days, SAMN08013127)	39,383,984	89%	36%	143,158
SAMN11400242	31628162,33706417	SC (Pericarp) (Aegilops tauschii, SAMN11400242)	79,234,778	90%	19%	137,418
SAMN11400243	31628162,33706417	endorsperm (Aegilops tauschii, SAMN11400243)	196,067,336	90%	12%	132,947
SAMN11400244	31628162,33706417	endorsperm (Aegilops tauschii, SAMN11400244)	127,399,644	69%	22%	132,521
SAMN11400245	31628162,33706417	embryo (Aegilops tauschii, SAMN11400245)	133,707,334	92%	23%	141,849
SAMN11400246	31628162,33706417	embryo (Aegilops tauschii, SAMN11400246)	113,003,076	91%	23%	136,339
SAMN11400247	31628162,33706417	embryo (Aegilops tauschii, SAMN11400247)	117,075,800	77%	23%	135,231
SAMN11400248	31628162,33706417	embryo (Aegilops tauschii, SAMN11400248)	126,250,950	91%	20%	130,159
SAMN11400249	31628162,33706417	embryo (Aegilops tauschii, SAMN11400249)	127,497,324	78%	8%	87,538
SAMN11400250	31628162,33706417	embryo (Aegilops tauschii, SAMN11400250)	92,774,324	67%	8%	75,322
SAMN11400251	31628162,33706417	embryo (Aegilops tauschii, SAMN11400251)	92,287,542	70%	8%	76,758
SAMN11400326	31628162,33706417	SC (Pericarp) (Aegilops tauschii, SAMN11400326)	226,925,444	92%	14%	151,015
SAMN11400327	31628162,33706417	SC (Pericarp) (Aegilops tauschii, SAMN11400327)	169,928,052	92%	14%	149,359
SAMN11400328	31628162,33706417	endorsperm (Aegilops tauschii, SAMN11400328)	87,249,058	89%	8%	110,261
SAMN11400329	31628162,33706417	endorsperm (Aegilops tauschii, SAMN11400329)	113,792,834	93%	19%	135,093
SAMN11400330	31628162,33706417	embryo (Aegilops tauschii, SAMN11400330)	106,003,060	92%	17%	133,884
SAMN11400331	31628162,33706417	embryo (Aegilops tauschii, SAMN11400331)	80,745,372	88%	17%	126,762
SAMN11400332	31628162,33706417	embryo (Aegilops tauschii, SAMN11400332)	117,483,562	92%	19%	139,551
SAMN11400333	31628162,33706417	embryo (Aegilops tauschii, SAMN11400333)	110,913,320	92%	16%	126,432
SAMN11400334	31628162,33706417	embryo (Aegilops tauschii, SAMN11400334)	63,745,124	75%	5%	67,280
SAMN11400335	31628162,33706417	embryo (Aegilops tauschii, SAMN11400335)	116,966,724	75%	8%	76,668
SAMN12097637	NA	leaf (Aegilops tauschii, Third leaf stage, SAMN12097637)	55,898,374	93%	36%	133,725
SAMN12097638	NA	leaf (Aegilops tauschii, Third leaf stage, SAMN12097638)	53,711,756	93%	37%	140,431
SAMN12097639	NA	leaf (Aegilops tauschii, Third leaf stage, SAMN12097639)	55,127,340	93%	36%	141,706
SAMN12097649	NA	root (Aegilops tauschii, Third leaf stage, SAMN12097649)	53,360,304	93%	33%	144,439
SAMN12097650	NA	root (Aegilops tauschii, Third leaf stage, SAMN12097650)	54,484,104	93%	32%	145,788
SAMN12097651	NA	root (Aegilops tauschii, Third leaf stage, SAMN12097651)	54,877,658	93%	32%	139,249
SAMN12234475	NA	leaf (Aegilops tauschii, 10 days, SAMN12234475)	62,306,736	96%	23%	145,430
SAMN12234476	NA	root (Aegilops tauschii, 10 days, SAMN12234476)	74,107,356	90%	29%	145,457
SAMN13956086	NA	leaf (Aegilops tauschii subsp. strangulata, SAMN13956086)	90,326,078	92%	21%	117,192
SAMN13956087	NA	leaf (Aegilops tauschii subsp. strangulata, SAMN13956087)	91,678,162	92%	21%	124,257
SAMN13956088	NA	leaf (Aegilops tauschii subsp. strangulata, SAMN13956088)	86,647,302	94%	21%	121,538
SAMN13956089	NA	leaf (Aegilops tauschii subsp. strangulata, SAMN13956089)	96,182,796	94%	24%	141,889
SAMN14589774	NA	tiller bud (Aegilops tauschii, SAMN14589774)	71,602,376	90%	38%	141,351
SAMN14589775	NA	tiller bud (Aegilops tauschii, SAMN14589775)	84,448,020	89%	36%	143,246
SAMN14589776	NA	tiller bud (Aegilops tauschii, SAMN14589776)	71,002,932	89%	37%	144,944
SAMN14589777	NA	tiller bud (Aegilops tauschii, SAMN14589777)	62,269,278	90%	39%	140,317
SAMN14589778	NA	tiller bud (Aegilops tauschii, SAMN14589778)	52,574,058	90%	38%	132,768
SAMN14589779	NA	tiller bud (Aegilops tauschii, SAMN14589779)	72,120,394	90%	38%	140,791
SAMN14589780	NA	tiller bud (Aegilops tauschii, SAMN14589780)	50,013,520	89%	37%	137,831
SAMN14589781	NA	tiller bud (Aegilops tauschii, SAMN14589781)	50,694,154	89%	36%	142,823
SAMN14589782	NA	tiller bud (Aegilops tauschii, SAMN14589782)	54,550,414	89%	37%	140,873
SAMN14589783	NA	tiller bud (Aegilops tauschii, SAMN14589783)	84,108,896	90%	39%	137,810
SAMN14589784	NA	tiller bud (Aegilops tauschii, SAMN14589784)	46,154,886	90%	38%	131,868
SAMN14589785	NA	tiller bud (Aegilops tauschii, SAMN14589785)	54,763,528	87%	38%	132,939
SAMN14589786	NA	tiller bud (Aegilops tauschii, SAMN14589786)	51,046,354	89%	38%	135,962
SAMN14589787	NA	tiller bud (Aegilops tauschii, SAMN14589787)	53,358,836	89%	37%	137,041
SAMN14589788	NA	tiller bud (Aegilops tauschii, SAMN14589788)	50,675,400	90%	37%	134,823
SAMN14589789	NA	tiller bud (Aegilops tauschii, SAMN14589789)	51,773,468	89%	39%	134,474
SAMN14589790	NA	tiller bud (Aegilops tauschii, SAMN14589790)	47,993,666	89%	39%	131,318
SAMN14589791	NA	tiller bud (Aegilops tauschii, SAMN14589791)	53,373,118	89%	39%	135,547

Alignment statistics by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
DRR058959	DRX053538	DRP003112	SAMD00049228	9,692,744	80%	57%
DRR058960	DRX053539	DRP003112	SAMD00049229	10,503,508	80%	59%
DRR058961	DRX053540	DRP003112	SAMD00049230	11,567,626	85%	38%
DRR058962	DRX053541	DRP003112	SAMD00049231	10,764,372	85%	58%
DRR058963	DRX053542	DRP003112	SAMD00049232	9,647,536	86%	57%
DRR058964	DRX053543	DRP003112	SAMD00049233	11,389,522	83%	56%
DRR058965	DRX053544	DRP003112	SAMD00049234	9,527,716	85%	58%
DRR058966	DRX053545	DRP003112	SAMD00049235	11,295,380	82%	57%
DRR058967	DRX053546	DRP003112	SAMD00049236	10,456,226	85%	58%
DRR058968	DRX053547	DRP003112	SAMD00049237	10,517,910	83%	51%
DRR197485	DRX187902	DRP005901	SAMD00192745	9,945,864	84%	58%
ERR2190530	ERX2246645	ERP105061	SAMEA104368388	106,730,222	96%	21%
ERR2190531	ERX2246646	ERP105061	SAMEA104368389	105,354,020	96%	22%
ERR2190532	ERX2246647	ERP105061	SAMEA104368390	110,032,168	96%	22%
ERR2190533	ERX2246648	ERP105061	SAMEA104368391	101,355,206	96%	23%
ERR2190534	ERX2246649	ERP105061	SAMEA104368392	110,754,106	96%	23%
ERR2190535	ERX2246650	ERP105061	SAMEA104368393	115,268,828	96%	23%
ERR2190536	ERX2246651	ERP105061	SAMEA104368394	149,942,740	96%	23%
ERR2190537	ERX2246652	ERP105061	SAMEA104368395	143,728,780	96%	25%
ERR2190538	ERX2246653	ERP105061	SAMEA104368396	114,864,860	96%	26%
ERR2190539	ERX2246654	ERP105061	SAMEA104368397	121,810,076	87%	15%
ERR2190540	ERX2246655	ERP105061	SAMEA104368398	162,314,940	95%	24%
ERR2190541	ERX2246656	ERP105061	SAMEA104368399	155,331,158	95%	23%
ERR2190542	ERX2246657	ERP105061	SAMEA104368400	138,851,246	94%	23%
ERR2190543	ERX2246658	ERP105061	SAMEA104368401	180,033,810	96%	21%
ERR2190544	ERX2246659	ERP105061	SAMEA104368402	152,516,570	92%	13%
ERR2190545	ERX2246660	ERP105061	SAMEA104368403	162,122,742	88%	12%
ERR2919966	ERX2923126	ERP112210	SAMEA5096368	59,718,326	91%	27%
ERR2919967	ERX2923127	ERP112210	SAMEA5096369	79,394,648	91%	30%
ERR2919968	ERX2923128	ERP112210	SAMEA5096370	71,910,866	91%	29%
SRR043335	SRX020418	SRP002455	SAMN00013421	57,109	33%	40%
SRR3569947	SRX1791722	SRP075528	SAMN05162574	41,409,976	85%	9%
SRR6281335	SRX1791722	SRP075528	SAMN05162574	78,152,980	80%	9%
SRR5337643	SRX2635032	SRP075528	SAMN05162574	98,133,234	83%	7%
SRR3569948	SRX1791723	SRP075528	SAMN05162575	43,354,524	86%	11%
SRR6281363	SRX1791723	SRP075528	SAMN05162575	116,302,692	79%	11%
SRR5337644	SRX2635033	SRP075528	SAMN05162575	103,007,370	84%	10%
SRR3569949	SRX1791724	SRP075528	SAMN05162576	26,611,632	85%	8%
SRR6281365	SRX1791724	SRP075528	SAMN05162576	80,276,028	82%	10%
SRR5337647	SRX2635036	SRP075528	SAMN05162576	99,246,044	85%	9%
SRR3569950	SRX1791725	SRP075528	SAMN05162577	24,164,808	85%	7%
SRR6281367	SRX1791725	SRP075528	SAMN05162577	78,732,480	82%	8%
SRR5337648	SRX2635037	SRP075528	SAMN05162577	102,418,628	84%	7%
SRR3569951	SRX1791726	SRP075528	SAMN05162578	26,548,860	84%	8%
SRR6281369	SRX1791726	SRP075528	SAMN05162578	98,975,312	82%	9%
SRR5337645	SRX2635034	SRP075528	SAMN05162578	110,464,216	84%	8%
SRR3569952	SRX1791727	SRP075528	SAMN05162579	26,374,624	85%	8%
SRR6281370	SRX1791727	SRP075528	SAMN05162579	87,181,500	76%	8%
SRR5337646	SRX2635035	SRP075528	SAMN05162579	92,720,660	80%	7%
SRR6282641	SRX3384757	SRP124831	SAMN08013125	61,726,210	89%	37%
SRR6282642	SRX3384756	SRP124831	SAMN08013127	39,383,984	89%	36%
SRR9332407	SRX6098929	SRP152508	SAMN12097637	55,898,374	93%	36%
SRR9332406	SRX6098930	SRP152508	SAMN12097638	53,711,756	93%	37%
SRR9332409	SRX6098927	SRP152508	SAMN12097639	55,127,340	93%	36%
SRR9332421	SRX6098915	SRP152508	SAMN12097649	53,360,304	93%	33%
SRR9332420	SRX6098916	SRP152508	SAMN12097650	54,484,104	93%	32%
SRR9332419	SRX6098917	SRP152508	SAMN12097651	54,877,658	93%	32%
SRR8885594	SRX5671321	SRP192163	SAMN11400242	79,234,778	90%	19%
SRR8885593	SRX5671320	SRP192163	SAMN11400243	196,067,336	90%	12%
SRR8885592	SRX5671319	SRP192163	SAMN11400244	127,399,644	69%	22%
SRR8885591	SRX5671318	SRP192163	SAMN11400245	133,707,334	92%	23%
SRR8885590	SRX5671317	SRP192163	SAMN11400246	113,003,076	91%	23%
SRR8885589	SRX5671316	SRP192163	SAMN11400247	117,075,800	77%	23%
SRR8885588	SRX5671315	SRP192163	SAMN11400248	126,250,950	91%	20%
SRR8885587	SRX5671314	SRP192163	SAMN11400249	127,497,324	78%	8%
SRR8885586	SRX5671313	SRP192163	SAMN11400250	92,774,324	67%	8%
SRR8885585	SRX5671312	SRP192163	SAMN11400251	92,287,542	70%	8%
SRR8885584	SRX5671311	SRP192163	SAMN11400326	226,925,444	92%	14%
SRR8885583	SRX5671310	SRP192163	SAMN11400327	169,928,052	92%	14%
SRR8885582	SRX5671309	SRP192163	SAMN11400328	87,249,058	89%	8%
SRR8885581	SRX5671308	SRP192163	SAMN11400329	113,792,834	93%	19%
SRR8885580	SRX5671307	SRP192163	SAMN11400330	106,003,060	92%	17%
SRR8885579	SRX5671306	SRP192163	SAMN11400331	80,745,372	88%	17%
SRR8885578	SRX5671305	SRP192163	SAMN11400332	117,483,562	92%	19%
SRR8885577	SRX5671304	SRP192163	SAMN11400333	110,913,320	92%	16%
SRR8885576	SRX5671303	SRP192163	SAMN11400334	63,745,124	75%	5%
SRR8885575	SRX5671302	SRP192163	SAMN11400335	116,966,724	75%	8%
SRR9657453	SRX6418662	SRP213797	SAMN12234475	32,965,010	96%	22%
SRR9657452	SRX6418663	SRP213797	SAMN12234475	29,341,726	95%	23%
SRR9657463	SRX6418652	SRP213797	SAMN12234476	35,309,162	90%	30%
SRR9657462	SRX6418653	SRP213797	SAMN12234476	38,798,194	90%	29%
SRR10996756	SRX7657837	SRP246360	SAMN13956086	90,326,078	92%	21%
SRR10996755	SRX7657838	SRP246360	SAMN13956087	91,678,162	92%	21%
SRR10996754	SRX7657839	SRP246360	SAMN13956088	86,647,302	94%	21%
SRR10996753	SRX7657840	SRP246360	SAMN13956089	96,182,796	94%	24%
SRR11531683	SRX8102749	SRP256104	SAMN14589774	71,602,376	90%	38%
SRR11531682	SRX8102750	SRP256104	SAMN14589775	84,448,020	89%	36%
SRR11531691	SRX8102741	SRP256104	SAMN14589776	71,002,932	89%	37%
SRR11531690	SRX8102742	SRP256104	SAMN14589777	62,269,278	90%	39%
SRR11531689	SRX8102743	SRP256104	SAMN14589778	52,574,058	90%	38%
SRR11531688	SRX8102744	SRP256104	SAMN14589779	72,120,394	90%	38%
SRR11531687	SRX8102745	SRP256104	SAMN14589780	50,013,520	89%	37%
SRR11531686	SRX8102746	SRP256104	SAMN14589781	50,694,154	89%	36%
SRR11531685	SRX8102747	SRP256104	SAMN14589782	54,550,414	89%	37%
SRR11531684	SRX8102748	SRP256104	SAMN14589783	84,108,896	90%	39%
SRR11531681	SRX8102751	SRP256104	SAMN14589784	46,154,886	90%	38%
SRR11531680	SRX8102752	SRP256104	SAMN14589785	54,763,528	87%	38%
SRR11531697	SRX8102735	SRP256104	SAMN14589786	51,046,354	89%	38%
SRR11531696	SRX8102736	SRP256104	SAMN14589787	53,358,836	89%	37%
SRR11531695	SRX8102737	SRP256104	SAMN14589788	50,675,400	90%	37%
SRR11531694	SRX8102738	SRP256104	SAMN14589789	51,773,468	89%	39%
SRR11531693	SRX8102739	SRP256104	SAMN14589790	47,993,666	89%	39%
SRR11531692	SRX8102740	SRP256104	SAMN14589791	53,373,118	89%	39%
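Sample-level rows in the first table combine the run-level rows in the second. Read counts add exactly (SAMN05162574: 41,409,976 + 78,152,980 + 98,133,234 = 217,696,190), but the percentages can only be approximated because the run-level values are already rounded. The sketch weights "percent aligned" by reads and "percent of aligned reads with introns" by aligned reads, since the latter is a fraction of aligned reads; this weighting is an assumption, not a documented NCBI method.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (run, sample, reads, % aligned, % of aligned reads with introns)
RunRow = Tuple[str, str, int, float, float]


def aggregate_by_sample(rows: Iterable[RunRow]) -> Dict[str, Tuple[int, int, int]]:
    """Roll run-level rows up to (reads, % aligned, % with introns) per sample."""
    reads: Dict[str, int] = defaultdict(int)
    aligned: Dict[str, float] = defaultdict(float)
    spliced: Dict[str, float] = defaultdict(float)
    for _run, sample, n_reads, pct_aligned, pct_introns in rows:
        n_aligned = n_reads * pct_aligned / 100
        reads[sample] += n_reads
        aligned[sample] += n_aligned
        # "% with introns" is relative to aligned reads, not to all reads.
        spliced[sample] += n_aligned * pct_introns / 100
    return {
        sample: (
            reads[sample],
            round(100 * aligned[sample] / reads[sample]) if reads[sample] else 0,
            round(100 * spliced[sample] / aligned[sample]) if aligned[sample] else 0,
        )
        for sample in reads
    }


runs = [
    ("SRR3569947", "SAMN05162574", 41_409_976, 85, 9),
    ("SRR6281335", "SAMN05162574", 78_152_980, 80, 9),
    ("SRR5337643", "SAMN05162574", 98_133_234, 83, 7),
]
print(aggregate_by_sample(runs))  # {'SAMN05162574': (217696190, 82, 8)}
```

For this sample the result matches the sample-level row (217,696,190 reads, 82% aligned, 8% with introns).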

SRA Long Read Alignment Statistics

The following long-read RNA-Seq runs (PacBio, Oxford Nanopore, 454, or other long-read sequencing technologies) from the Sequence Read Archive were also used for gene prediction:

Run	Sample	Number of reads	Number (%) of sequences aligned by Minimap2	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
All	NA	3112377	3029619 (97.34%)	2198134 (70.62%)	99.15	98.88
DRR001933	SAMD00002830	700124	684218 (97.72%)	413715 (59.09%)	99.21	99.06
DRR001934	SAMD00002829	669383	653145 (97.57%)	382059 (57.07%)	99.08	99.01
DRR012598	SAMD00008956	893917	868448 (97.15%)	729485 (81.60%)	99.21	98.81
DRR012599	SAMD00008957	848953	823808 (97.03%)	672875 (79.25%)	99.09	98.78
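The "All" row is the sum of the four runs. Unlike the transcript table, the percentages here are consistent with truncation to two decimals rather than rounding (for DRR001933, 684,218 / 700,124 = 97.728%, shown as 97.72%), so the sketch below truncates; that rule is inferred from these numbers, not taken from pipeline documentation.

```python
from decimal import Decimal, ROUND_DOWN

# (run, reads, aligned by Minimap2, passed to Gnomon), copied from the table above.
LONG_READ_RUNS = [
    ("DRR001933", 700_124, 684_218, 413_715),
    ("DRR001934", 669_383, 653_145, 382_059),
    ("DRR012598", 893_917, 868_448, 729_485),
    ("DRR012599", 848_953, 823_808, 672_875),
]


def truncated_percent(part: int, whole: int) -> Decimal:
    """part/whole as a percentage, truncated (not rounded) to two decimals."""
    return (Decimal(part) * 100 / Decimal(whole)).quantize(
        Decimal("0.01"), rounding=ROUND_DOWN
    )


reads = sum(row[1] for row in LONG_READ_RUNS)
aligned = sum(row[2] for row in LONG_READ_RUNS)
passed = sum(row[3] for row in LONG_READ_RUNS)
print(reads, aligned, passed)  # 3112377 3029619 2198134
print(truncated_percent(aligned, reads), truncated_percent(passed, reads))  # 97.34 70.62
```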

Protein alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Pooideae GenBank	36,123	28,838 (79.83%)	28,838 (79.83%)	74.61%	85.61%
Pooideae known RefSeq (NP_)	111	109 (98.20%)	109 (98.20%)	74.21%	82.15%
Arabidopsis thaliana GenBank	53,371	12,178 (22.82%)	12,178 (22.82%)	66.69%	73.08%
Arabidopsis thaliana known RefSeq (NP_)	48,148	32,845 (68.22%)	32,845 (68.22%)	64.87%	66.96%
Same-species GenBank	157	129 (82.17%)	129 (82.17%)	76.91%	84.90%
Same-species known RefSeq (NP_)	5	5 (100.00%)	5 (100.00%)	81.07%	90.28%
Oryza sativa GenBank	20,750	7,910 (38.12%)	7,910 (38.12%)	69.83%	78.87%
Oryza sativa high-quality model RefSeq (XP_)	16,838	15,933 (94.63%)	15,933 (94.63%)	69.43%	80.85%
Zea mays GenBank	50,545	22,015 (43.56%)	22,015 (43.56%)	71.29%	79.21%
Zea mays known RefSeq (NP_)	20,122	18,895 (93.90%)	18,895 (93.90%)	70.52%	79.85%

Assembly-assembly alignments of current to previous assembly

When the assembly changes between two rounds of annotation, genes in the current and the previous annotation are mapped to each other using the genomic alignments of the current assembly to the previous assembly so that gene identifiers can be preserved. The success of the remapping depends largely on how well the two assembly versions align to each other.

Below are the percent coverage of one assembly by the other and the average percent identity of the alignments. The 'First pass' alignments are reciprocal best hits, while the 'Total' alignments also include 'Second pass' or non-reciprocal best alignments. For more information about the assembly-assembly alignment process, please visit the NCBI Genome Remapping Service page.

	First pass	Total
Aet v5.0 (Current) coverage	93.53%	93.90%
Aet v4.0 (Previous) coverage	93.41%	93.70%
Percent identity	99.99%	99.97%
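As a toy illustration of the difference between the two passes, the sketch below treats each alignment as a scored (current_region, previous_region) pair: a pair is first-pass if each region is the other's highest-scoring partner, and every other pair is left for the second pass. The region names and scores are hypothetical, and the Genome Remapping Service works on alignment blocks with its own scoring, which this does not reproduce.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]
ScoredPair = Tuple[str, str, float]  # (current_region, previous_region, score)


def reciprocal_best_hits(alignments: List[ScoredPair]) -> Set[Pair]:
    """Pairs in which each region is the other's highest-scoring partner."""
    best_cur: Dict[str, Tuple[float, str]] = {}
    best_prev: Dict[str, Tuple[float, str]] = {}
    for cur, prev, score in alignments:
        if cur not in best_cur or score > best_cur[cur][0]:
            best_cur[cur] = (score, prev)
        if prev not in best_prev or score > best_prev[prev][0]:
            best_prev[prev] = (score, cur)
    return {(cur, prev) for cur, (_, prev) in best_cur.items() if best_prev[prev][1] == cur}


def split_passes(alignments: List[ScoredPair]) -> Tuple[Set[Pair], Set[Pair]]:
    """Split pairs into first pass (reciprocal best) and second pass (all others)."""
    first = reciprocal_best_hits(alignments)
    second = {(cur, prev) for cur, prev, _ in alignments} - first
    return first, second


alignments = [("A", "a", 100.0), ("A", "b", 40.0), ("B", "b", 90.0), ("C", "b", 95.0)]
first, second = split_passes(alignments)
print(sorted(first))   # [('A', 'a'), ('C', 'b')]
print(sorted(second))  # [('A', 'b'), ('B', 'b')]
```

In this toy model the "Total" set is the union of both passes, which is why total coverage can only be equal to or higher than first-pass coverage.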

Comparison of the current and previous annotations

The annotation produced for this release (102) was compared to the annotation in the previous release (101) for each assembly annotated in both releases. Scores for current and previous gene and transcript features were calculated based on overlap in exon sequence and matches in exon boundaries. Pairs of current and previous features were categorized based on these scores, whether they are reciprocal best matches, and changes in attributes (gene biotype, completeness, etc.). If the assembly was updated between the two releases, alignments between the current and the previous assembly were used to match the current and previous gene and transcript features in mapped regions.
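The report does not give the scoring formulas or cutoffs, so the following is only a hypothetical sketch of how an exon-overlap score and an exon-boundary score could drive such a categorization. The overlap score (shared exonic bases over the longer spliced length), the boundary score (shared exon start/end coordinates over all distinct ones), both cutoffs, and the category names are assumptions chosen for illustration, not the pipeline's rules.

```python
from typing import List, Set, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based and inclusive


def spliced_length(exons: List[Exon]) -> int:
    return sum(end - start + 1 for start, end in exons)


def shared_bases(a: List[Exon], b: List[Exon]) -> int:
    """Exonic bases common to two transcripts (exons within each set must not overlap)."""
    return sum(max(0, min(e1, e2) - max(s1, s2) + 1) for s1, e1 in a for s2, e2 in b)


def boundaries(exons: List[Exon]) -> Set[int]:
    """All exon start and end coordinates."""
    return {pos for exon in exons for pos in exon}


def classify(current: List[Exon], previous: List[Exon],
             min_overlap: float = 0.9, min_boundary: float = 0.5) -> str:
    """Hypothetical categorization; the cutoffs are invented for illustration."""
    if sorted(current) == sorted(previous):
        return "identical"
    shared = shared_bases(current, previous)
    if shared == 0:
        return "no match"
    overlap = shared / max(spliced_length(current), spliced_length(previous))
    cur_b, prev_b = boundaries(current), boundaries(previous)
    boundary = len(cur_b & prev_b) / len(cur_b | prev_b)
    if overlap >= min_overlap and boundary >= min_boundary:
        return "minor change"
    return "major change"


previous = [(100, 200), (300, 400)]
print(classify([(100, 200), (300, 400)], previous))  # identical
print(classify([(100, 200), (300, 420)], previous))  # minor change
print(classify([(100, 200)], previous))              # major change
print(classify([(900, 950)], previous))              # no match
```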

The table below summarizes the changes in the gene set for each assembly as a percent of the number of genes in the current annotation release, and provides links to the details of the comparison in tabular format and in a Genome Workbench project.

	Aet_v5.0 (Current) to Aet_v4.0 (Previous)
Identical	30%
Minor changes	55%
Major changes	8%
New	7%
Deprecated	10%
Other	<1%
Download the report	tabular, Genome Workbench

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, DiCuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
Minimap2: Li H. Bioinformatics 2018, 34(18):3094-100
