NCBI Nicotiana sylvestris Annotation Release 100

The RefSeq genome records for Nicotiana sylvestris were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. This report presents statistics on the annotation products, the input data used in the pipeline and intermediate alignment results.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, and the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as NCBI Nicotiana sylvestris Annotation Release 100.

Annotation release ID: 100
Date of Entrez queries for transcripts and proteins: Oct 16 2014
Date of submission of annotation to the public databases: Oct 22 2014
Software version: 6.1

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
Nsyl	GCF_000393655.1	Philip Morris International R&D	05-16-2013	Reference	1 assembled chromosome; unplaced scaffolds

Gene and feature statistics

Counts and lengths of annotated features are provided below for each assembly.

Feature counts

Feature	Nsyl
Genes and pseudogenes	40,317
protein-coding	33,678
non-coding	4,667
pseudogenes	1,972
genes with variants	9,033
mRNAs	48,059
fully-supported	39,821
with > 5% ab initio	7,546
partial	1,408
with filled gap(s)	2
known RefSeq (NM_)	0
model RefSeq (XM_)	48,059
Other RNAs	10,984
fully-supported	10,212
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	10,212
CDSs	48,059
fully-supported	39,821
with > 5% ab initio	7,642
partial	1,408
with major correction(s)	233
known RefSeq (NP_)	0
model RefSeq (XP_)	48,059
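
The category counts above can be cross-checked with simple arithmetic. The following is a minimal sketch in Python, using values copied from the table; the dictionary keys and function names are illustrative and are not identifiers used by the annotation pipeline.

```python
# Values copied from the Feature counts table; key names are illustrative.
counts = {
    "genes_and_pseudogenes": 40_317,
    "protein_coding": 33_678,
    "non_coding": 4_667,
    "pseudogenes": 1_972,
    "mrnas": 48_059,
    "other_rnas": 10_984,
}


def gene_total(c):
    """Sum the three gene categories listed under 'Genes and pseudogenes'."""
    return c["protein_coding"] + c["non_coding"] + c["pseudogenes"]


def transcript_total(c):
    """Sum mRNAs and other RNAs to get the number of annotated transcripts."""
    return c["mrnas"] + c["other_rnas"]


if __name__ == "__main__":
    # True when the three categories add up to the reported total.
    print(gene_total(counts) == counts["genes_and_pseudogenes"])
    print(transcript_total(counts))
```

The transcript total computed this way matches the "All transcripts" count in the Feature lengths table (59,043).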

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	38,345	4,604	3,127	51	99,873
All transcripts	59,043	1,749	1,466	33	17,384
mRNA	48,059	1,778	1,525	33	16,037
misc_RNA	3,055	2,368	1,989	129	14,364
tRNA	772	74	73	70	88
lncRNA	7,157	1,467	966	79	17,384
Single-exon transcripts	5,692	1,082	854	51	8,148
coding transcripts (NM_/XM_ )	5,692	1,082	854	51	8,148
CDSs	48,059	1,280	1,044	33	15,309
Exons	208,262	330	177	1	12,180
in coding transcripts (NM_/XM_ )	187,231	320	172	1	9,314
in non-coding transcripts (NR_/XR_ )	27,795	380	194	1	12,180
Introns	162,596	868	307	30	78,537
in coding transcripts (NM_/XM_ )	148,429	816	293	30	78,537
in non-coding transcripts (NR_/XR_ )	20,463	1,228	427	30	62,432

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	1.52	1	1	40
Number of exons per transcript	5.75	4	1	79

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker for each assembly. RepeatMasker results are only used for organisms for which a comprehensive repeat library is available.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with RepeatMasker	% Masked with WindowMasker
Nsyl	GCF_000393655.1	1.33%	48.35%

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez, aligned to the genome by Splign or ProSplign and passed to Gnomon, NCBI's gene prediction software.

Depending on the other evidence available, long 454 reads (with average length above 250 nt) may be aligned as traditional evidence and reported in the Transcript alignments section, or aligned with short reads and reported in the Short read transcript alignments section.

Transcript alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species Genbank	121	117 (96.69%)	112 (92.56%)	99.69%	96.88%
Same-species EST	8,582	6,389 (74.45%)	4,242 (49.43%)	99.41%	98.34%
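
As a reading aid, the percentages in this table can be reproduced from the raw counts. The sketch below is an assumption about how the figures relate, not a description of the pipeline's code: the reported values are consistent with both percentages using the number of sequences retrieved from Entrez as the denominator. The helper name `percent` is hypothetical.

```python
def percent(part, whole):
    """Return part/whole as a percentage string with two decimals."""
    return f"{100 * part / whole:.2f}%"


# (source, retrieved from Entrez, aligned by Splign, passed to Gnomon),
# copied from the Transcript alignments table.
rows = [
    ("Same-species Genbank", 121, 117, 112),
    ("Same-species EST", 8_582, 6_389, 4_242),
]

if __name__ == "__main__":
    for source, retrieved, aligned, passed in rows:
        # Both percentages are taken relative to the retrieved count.
        print(source, percent(aligned, retrieved), percent(passed, retrieved))
```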

Short read transcript alignments

The following short reads (RNA-Seq) from the Sequence Read Archive were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Track name	Number of reads	Number (%) of aligned reads	Number (%) spliced reads	Number of introns
All	Aggregate of all aligned samples	6,750,310,717	4,848,001,804 (71.82%)	1,311,132,375 (19.42%)	220,854
SAMEA1904860	root (Nicotiana sylvestris, SAMEA1904860)	204,986,464	181,399,669 (88.49%)	45,547,520 (22.22%)	157,902
SAMEA1904862	root (Nicotiana sylvestris, SAMEA1904862)	180,674,672	161,438,650 (89.35%)	40,642,887 (22.50%)	156,521
SAMEA1904864	leaf (Nicotiana sylvestris, SAMEA1904864)	171,464,930	154,660,679 (90.20%)	38,990,126 (22.74%)	155,200
SAMEA1904866	flower (Nicotiana sylvestris, SAMEA1904866)	142,195,408	129,390,704 (91.00%)	35,349,396 (24.86%)	167,025
SAMEA1904868	leaf (Nicotiana sylvestris, SAMEA1904868)	117,850,840	104,746,792 (88.88%)	26,905,983 (22.83%)	148,434
SAMEA1904871	leaf (Nicotiana sylvestris, SAMEA1904871)	192,406,910	173,602,748 (90.23%)	43,705,237 (22.72%)	155,689
SAMEA1904874	root (Nicotiana sylvestris, SAMEA1904874)	176,123,002	156,867,308 (89.07%)	40,435,548 (22.96%)	157,855
SAMEA1904877	flower (Nicotiana sylvestris, SAMEA1904877)	251,716,202	228,805,156 (90.90%)	61,267,361 (24.34%)	173,983
SAMEA1904880	flower (Nicotiana sylvestris, SAMEA1904880)	225,059,290	204,030,108 (90.66%)	55,280,727 (24.56%)	172,063
SAMEA1904896	leaf (Nicotiana tomentosiformis, SAMEA1904896)	142,479,598	83,411,576 (58.54%)	25,234,329 (17.71%)	114,357
SAMEA1904903	root (Nicotiana tomentosiformis, SAMEA1904903)	212,731,912	119,268,575 (56.07%)	35,246,282 (16.57%)	121,838
SAMEA1904916	root (Nicotiana tomentosiformis, SAMEA1904916)	210,025,788	121,146,912 (57.68%)	36,736,261 (17.49%)	123,194
SAMEA1904929	flower (Nicotiana tomentosiformis, SAMEA1904929)	206,891,652	120,475,314 (58.23%)	36,646,996 (17.71%)	128,120
SAMEA1904940	flower (Nicotiana tomentosiformis, SAMEA1904940)	128,117,186	74,105,911 (57.84%)	20,704,686 (16.16%)	118,923
SAMEA1904945	leaf (Nicotiana tomentosiformis, SAMEA1904945)	167,391,118	94,361,438 (56.37%)	28,236,327 (16.87%)	115,928
SAMEA1904950	flower (Nicotiana tomentosiformis, SAMEA1904950)	335,358,960	192,739,591 (57.47%)	59,217,412 (17.66%)	126,845
SAMEA1904961	root (Nicotiana tomentosiformis, SAMEA1904961)	168,099,458	98,137,279 (58.38%)	29,140,067 (17.34%)	120,635
SAMEA1904966	leaf (Nicotiana tomentosiformis, SAMEA1904966)	191,495,418	108,409,892 (56.61%)	32,249,785 (16.84%)	116,432
SAMEA1904968	flower (Nicotiana tomentosiformis, SAMEA1904968)	285,604,086	166,930,790 (58.45%)	50,856,579 (17.81%)	125,817
SAMN01090744	leaves (Nicotiana benthamiana, SAMN01090744)	172,019	83,372 (48.47%)	43,850 (25.49%)	32,124
SAMN01090745	roots (Nicotiana benthamiana, SAMN01090745)	137,710	50,147 (36.41%)	29,629 (21.52%)	25,057
SAMN01090746	floral corollas (Nicotiana benthamiana, SAMN01090746)	147,183	43,739 (29.72%)	20,578 (13.98%)	18,878
SAMN01090747	developing seed capsules (Nicotiana benthamiana, SAMN01090747)	334,026	121,315 (36.32%)	65,839 (19.71%)	33,997
SAMN01911394	Apex (Nicotiana benthamiana, SAMN01911394)	13,830,706	8,217,972 (59.42%)	2,352,936 (17.01%)	101,807
SAMN01911398	Capsule (Nicotiana benthamiana, SAMN01911398)	29,627,948	18,916,731 (63.85%)	5,015,115 (16.93%)	108,065
SAMN01911402	Leaf drought stressed (Nicotiana benthamiana, SAMN01911402)	22,876,338	13,072,864 (57.15%)	3,694,763 (16.15%)	98,165
SAMN01911406	Flower (Nicotiana benthamiana, SAMN01911406)	24,450,508	13,844,604 (56.62%)	3,761,173 (15.38%)	101,543
SAMN01911410	Leaf (Nicotiana benthamiana, SAMN01911410)	27,172,778	17,163,842 (63.17%)	4,602,458 (16.94%)	99,795
SAMN01911414	Roots (Nicotiana benthamiana, SAMN01911414)	25,915,042	17,078,160 (65.90%)	4,622,199 (17.84%)	102,605
SAMN01911418	Seedling (Nicotiana benthamiana, SAMN01911418)	157,542,674	108,016,658 (68.56%)	29,317,690 (18.61%)	116,549
SAMN01911422	Stem (Nicotiana benthamiana, SAMN01911422)	24,677,876	15,688,274 (63.57%)	4,443,985 (18.01%)	105,361
SAMN01911426	Tissue culture (Nicotiana benthamiana, SAMN01911426)	19,014,746	12,250,460 (64.43%)	3,460,265 (18.20%)	103,988
SAMN02316609	Leaf (Nicotiana tabacum, SAMN02316609)	109,139,490	77,418,246 (70.94%)	20,873,972 (19.13%)	153,369
SAMN02316610	Leaf (Nicotiana tabacum, SAMN02316610)	164,957,186	114,221,987 (69.24%)	28,931,263 (17.54%)	157,338
SAMN02316611	Leaf (Nicotiana tabacum, SAMN02316611)	119,836,094	87,421,696 (72.95%)	23,800,150 (19.86%)	153,470
SAMN02316612	Root (Nicotiana tabacum, SAMN02316612)	98,714,710	68,753,268 (69.65%)	17,892,701 (18.13%)	154,563
SAMN02316613	Root (Nicotiana tabacum, SAMN02316613)	91,788,176	62,634,053 (68.24%)	15,501,606 (16.89%)	156,458
SAMN02316614	Root (Nicotiana tabacum, SAMN02316614)	104,367,224	72,576,415 (69.54%)	18,836,250 (18.05%)	155,524
SAMN02429707	Whole Plant (Nicotiana benthamiana, SAMN02429707)	175,488,499	111,301,494 (63.42%)	25,293,966 (14.41%)	117,376
SAMN02645674	Immature Flower (Nicotiana tabacum, SAMN02645674)	97,903,872	73,584,259 (75.16%)	20,477,454 (20.92%)	162,419
SAMN02645675	Mature Flower (Nicotiana tabacum, SAMN02645675)	100,846,380	76,162,495 (75.52%)	19,956,325 (19.79%)	160,132
SAMN02645676	Senescent Flower (Nicotiana tabacum, SAMN02645676)	57,299,122	43,177,219 (75.35%)	10,106,850 (17.64%)	141,980
SAMN02645677	Dry Capsule (Nicotiana tabacum, SAMN02645677)	43,539,758	31,560,078 (72.49%)	8,026,790 (18.44%)	128,760
SAMN02645678	Stem (Nicotiana tabacum, SAMN02645678)	52,035,046	37,640,379 (72.34%)	9,354,468 (17.98%)	135,936
SAMN02645679	Root (Nicotiana tabacum, SAMN02645679)	77,425,142	56,187,779 (72.57%)	13,735,568 (17.74%)	144,168
SAMN02645680	Young Leaf (Nicotiana tabacum, SAMN02645680)	54,262,284	41,505,297 (76.49%)	11,415,589 (21.04%)	137,105
SAMN02645681	Mature Leaf (Nicotiana tabacum, SAMN02645681)	69,984,574	52,992,658 (75.72%)	14,125,381 (20.18%)	139,693
SAMN02645682	Senescent Leaf (Nicotiana tabacum, SAMN02645682)	80,142,054	59,739,638 (74.54%)	15,964,598 (19.92%)	144,958
SAMN02645683	Immature Flower (Nicotiana tabacum, SAMN02645683)	92,768,572	69,578,466 (75.00%)	19,362,677 (20.87%)	159,656
SAMN02645684	Mature Flower (Nicotiana tabacum, SAMN02645684)	86,486,460	65,008,254 (75.17%)	16,802,014 (19.43%)	154,475
SAMN02645685	Senescent Flower (Nicotiana tabacum, SAMN02645685)	46,017,156	31,329,789 (68.08%)	8,286,768 (18.01%)	140,349
SAMN02645686	Dry Capsule (Nicotiana tabacum, SAMN02645686)	57,343,858	40,683,686 (70.95%)	10,463,697 (18.25%)	135,552
SAMN02645687	Stem (Nicotiana tabacum, SAMN02645687)	54,634,530	38,857,206 (71.12%)	9,255,496 (16.94%)	139,088
SAMN02645688	Root (Nicotiana tabacum, SAMN02645688)	23,162,812	16,524,813 (71.34%)	4,208,148 (18.17%)	127,906
SAMN02645689	Young Leaf (Nicotiana tabacum, SAMN02645689)	69,627,388	51,681,804 (74.23%)	13,809,131 (19.83%)	139,570
SAMN02645690	Mature Leaf (Nicotiana tabacum, SAMN02645690)	57,260,944	42,978,617 (75.06%)	11,910,326 (20.80%)	138,072
SAMN02645691	Senescent Leaf (Nicotiana tabacum, SAMN02645691)	79,186,800	59,506,811 (75.15%)	15,964,552 (20.16%)	143,009
SAMN02645692	Immature Flower (Nicotiana tabacum, SAMN02645692)	81,592,536	61,561,887 (75.45%)	17,767,580 (21.78%)	156,824
SAMN02645693	Mature Flower (Nicotiana tabacum, SAMN02645693)	88,309,184	65,544,532 (74.22%)	17,478,726 (19.79%)	158,226
SAMN02645694	Dry Capsule (Nicotiana tabacum, SAMN02645694)	43,118,862	30,675,685 (71.14%)	7,671,744 (17.79%)	126,365
SAMN02645695	Stem (Nicotiana tabacum, SAMN02645695)	86,850,900	63,677,803 (73.32%)	16,372,208 (18.85%)	150,891
SAMN02645696	Root (Nicotiana tabacum, SAMN02645696)	86,243,630	61,493,720 (71.30%)	14,220,972 (16.49%)	149,485
SAMN02645697	Young Leaf (Nicotiana tabacum, SAMN02645697)	73,650,890	55,884,662 (75.88%)	15,438,825 (20.96%)	143,003
SAMN02645698	Mature Leaf (Nicotiana tabacum, SAMN02645698)	76,538,320	57,740,258 (75.44%)	16,228,659 (21.20%)	143,088
SAMN02645699	Senescent Leaf (Nicotiana tabacum, SAMN02645699)	93,215,816	69,919,620 (75.01%)	17,773,932 (19.07%)	139,751
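
The per-sample rows mix several Nicotiana species, so it can help to group them by the species named in the track column. The following is a minimal sketch, assuming the rows are available as tab-separated text in the layout shown above; the function names, regular expressions and the two-row `EXAMPLE` list are illustrative. The aggregate "All" row has no species in its track name and should be excluded before parsing.

```python
import re
from collections import defaultdict

# Cell such as "181,399,669 (88.49%)": a count followed by a percentage.
COUNT_PCT = re.compile(r"([\d,]+)\s*\(([\d.]+)%\)")
# Track name such as "root (Nicotiana sylvestris, SAMEA1904860)".
TRACK = re.compile(r"(.+?) \((.+), (\S+)\)$")


def to_int(text):
    """Convert a number with thousands separators, e.g. '204,986,464'."""
    return int(text.replace(",", ""))


def parse_row(line):
    """Parse one tab-separated per-sample row into a dictionary."""
    sample, track, reads, aligned, spliced, introns = line.rstrip("\n").split("\t")
    tissue, species, _ = TRACK.match(track).groups()
    return {
        "sample": sample,
        "tissue": tissue,
        "species": species,
        "reads": to_int(reads),
        "aligned": to_int(COUNT_PCT.match(aligned).group(1)),
        "spliced": to_int(COUNT_PCT.match(spliced).group(1)),
        "introns": to_int(introns),
    }


def aligned_rate_by_species(rows):
    """Return the percentage of aligned reads, pooled per species."""
    totals = defaultdict(lambda: [0, 0])  # species -> [aligned, reads]
    for r in rows:
        totals[r["species"]][0] += r["aligned"]
        totals[r["species"]][1] += r["reads"]
    return {sp: 100 * a / n for sp, (a, n) in totals.items()}


# Two rows copied from the table above, used only as sample input.
EXAMPLE = [
    "SAMEA1904860\troot (Nicotiana sylvestris, SAMEA1904860)\t204,986,464\t"
    "181,399,669 (88.49%)\t45,547,520 (22.22%)\t157,902",
    "SAMEA1904896\tleaf (Nicotiana tomentosiformis, SAMEA1904896)\t142,479,598\t"
    "83,411,576 (58.54%)\t25,234,329 (17.71%)\t114,357",
]

if __name__ == "__main__":
    parsed = [parse_row(line) for line in EXAMPLE]
    for species, rate in aligned_rate_by_species(parsed).items():
        print(f"{species}: {rate:.2f}% aligned")
```

Pooling counts before dividing, rather than averaging the per-sample percentages, weights each sample by its read depth.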

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Number (%) of aligned reads	Number (%) spliced reads
ERR274394	ERX248670	ERP002501	SAMEA1904860	204,986,464	181,399,669 (88.49%)	45,547,520 (22.22%)
ERR274393	ERX248669	ERP002501	SAMEA1904862	180,674,672	161,438,650 (89.35%)	40,642,887 (22.50%)
ERR274391	ERX248667	ERP002501	SAMEA1904864	171,464,930	154,660,679 (90.20%)	38,990,126 (22.74%)
ERR274389	ERX248665	ERP002501	SAMEA1904866	142,195,408	129,390,704 (91.00%)	35,349,396 (24.86%)
ERR274390	ERX248666	ERP002501	SAMEA1904868	117,850,840	104,746,792 (88.88%)	26,905,983 (22.83%)
ERR274392	ERX248668	ERP002501	SAMEA1904871	192,406,910	173,602,748 (90.23%)	43,705,237 (22.72%)
ERR274395	ERX248671	ERP002501	SAMEA1904874	176,123,002	156,867,308 (89.07%)	40,435,548 (22.96%)
ERR274388	ERX248664	ERP002501	SAMEA1904877	251,716,202	228,805,156 (90.90%)	61,267,361 (24.34%)
ERR274387	ERX248663	ERP002501	SAMEA1904880	225,059,290	204,030,108 (90.66%)	55,280,727 (24.56%)
ERR274400	ERX248676	ERP002502	SAMEA1904896	142,479,598	83,411,576 (58.54%)	25,234,329 (17.71%)
ERR274404	ERX248680	ERP002502	SAMEA1904903	212,731,912	119,268,575 (56.07%)	35,246,282 (16.57%)
ERR274405	ERX248681	ERP002502	SAMEA1904916	210,025,788	121,146,912 (57.68%)	36,736,261 (17.49%)
ERR274396	ERX248672	ERP002502	SAMEA1904929	206,891,652	120,475,314 (58.23%)	36,646,996 (17.71%)
ERR274399	ERX248675	ERP002502	SAMEA1904940	128,117,186	74,105,911 (57.84%)	20,704,686 (16.16%)
ERR274401	ERX248677	ERP002502	SAMEA1904945	167,391,118	94,361,438 (56.37%)	28,236,327 (16.87%)
ERR274398	ERX248674	ERP002502	SAMEA1904950	335,358,960	192,739,591 (57.47%)	59,217,412 (17.66%)
ERR274403	ERX248679	ERP002502	SAMEA1904961	168,099,458	98,137,279 (58.38%)	29,140,067 (17.34%)
ERR274402	ERX248678	ERP002502	SAMEA1904966	191,495,418	108,409,892 (56.61%)	32,249,785 (16.84%)
ERR274397	ERX248673	ERP002502	SAMEA1904968	285,604,086	166,930,790 (58.45%)	50,856,579 (17.81%)
SRR574606	SRX713889	SRP014556	SAMN01090744	172,019	83,372 (48.47%)	43,850 (25.49%)
SRR574608	SRX713888	SRP014556	SAMN01090745	137,710	50,147 (36.41%)	29,629 (21.52%)
SRR574607	SRX179858	SRP014556	SAMN01090746	147,183	43,739 (29.72%)	20,578 (13.98%)
SRR574609	SRX713890	SRP014556	SAMN01090747	334,026	121,315 (36.32%)	65,839 (19.71%)
SRR685298	SRX228494	SRP018508	SAMN01911394	13,830,706	8,217,972 (59.42%)	2,352,936 (17.01%)
SRR696884	SRX231910	SRP018508	SAMN01911398	29,627,948	18,916,731 (63.85%)	5,015,115 (16.93%)
SRR696915	SRX231916	SRP018508	SAMN01911402	22,876,338	13,072,864 (57.15%)	3,694,763 (16.15%)
SRR696938	SRX231918	SRP018508	SAMN01911406	24,450,508	13,844,604 (56.62%)	3,761,173 (15.38%)
SRR696940	SRX231920	SRP018508	SAMN01911410	27,172,778	17,163,842 (63.17%)	4,602,458 (16.94%)
SRR696961	SRX231925	SRP018508	SAMN01911414	25,915,042	17,078,160 (65.90%)	4,622,199 (17.84%)
SRR696988	SRX231941	SRP018508	SAMN01911418	157,542,674	108,016,658 (68.56%)	29,317,690 (18.61%)
SRR696992	SRX231951	SRP018508	SAMN01911422	24,677,876	15,688,274 (63.57%)	4,443,985 (18.01%)
SRR697013	SRX231953	SRP018508	SAMN01911426	19,014,746	12,250,460 (64.43%)	3,460,265 (18.20%)
SRR1043177	SRX387250	SRP018508	SAMN02429707	53,598,244	33,797,206 (63.06%)	9,102,347 (16.98%)
SRR1043178	SRX387250	SRP018508	SAMN02429707	81,262,130	50,947,232 (62.69%)	13,730,296 (16.90%)
SRR1043179	SRX387266	SRP018508	SAMN02429707	40,628,125	26,557,056 (65.37%)	2,461,323 (6.06%)
SRR955761	SRX338101	SRP029183	SAMN02316609	109,139,490	77,418,246 (70.94%)	20,873,972 (19.13%)
SRR955762	SRX338102	SRP029183	SAMN02316610	164,957,186	114,221,987 (69.24%)	28,931,263 (17.54%)
SRR955763	SRX338103	SRP029183	SAMN02316611	119,836,094	87,421,696 (72.95%)	23,800,150 (19.86%)
SRR955765	SRX338104	SRP029183	SAMN02316612	98,714,710	68,753,268 (69.65%)	17,892,701 (18.13%)
SRR955766	SRX338105	SRP029183	SAMN02316613	91,788,176	62,634,053 (68.24%)	15,501,606 (16.89%)
SRR955767	SRX338106	SRP029183	SAMN02316614	104,367,224	72,576,415 (69.54%)	18,836,250 (18.05%)
SRR1199197	SRX495602	SRP029183	SAMN02645674	97,903,872	73,584,259 (75.16%)	20,477,454 (20.92%)
SRR1199069	SRX495520	SRP029183	SAMN02645675	100,846,380	76,162,495 (75.52%)	19,956,325 (19.79%)
SRR1199124	SRX495530	SRP029183	SAMN02645676	57,299,122	43,177,219 (75.35%)	10,106,850 (17.64%)
SRR1199063	SRX495517	SRP029183	SAMN02645677	43,539,758	31,560,078 (72.49%)	8,026,790 (18.44%)
SRR1199130	SRX495598	SRP029183	SAMN02645678	52,035,046	37,640,379 (72.34%)	9,354,468 (17.98%)
SRR1199121	SRX495526	SRP029183	SAMN02645679	77,425,142	56,187,779 (72.57%)	13,735,568 (17.74%)
SRR1199200	SRX495606	SRP029183	SAMN02645680	54,262,284	41,505,297 (76.49%)	11,415,589 (21.04%)
SRR1199072	SRX495523	SRP029183	SAMN02645681	69,984,574	52,992,658 (75.72%)	14,125,381 (20.18%)
SRR1199127	SRX495532	SRP029183	SAMN02645682	80,142,054	59,739,638 (74.54%)	15,964,598 (19.92%)
SRR1199198	SRX495603	SRP029183	SAMN02645683	92,768,572	69,578,466 (75.00%)	19,362,677 (20.87%)
SRR1199070	SRX495521	SRP029183	SAMN02645684	86,486,460	65,008,254 (75.17%)	16,802,014 (19.43%)
SRR1199125	SRX495531	SRP029183	SAMN02645685	46,017,156	31,329,789 (68.08%)	8,286,768 (18.01%)
SRR1199066	SRX495518	SRP029183	SAMN02645686	57,343,858	40,683,686 (70.95%)	10,463,697 (18.25%)
SRR1199132	SRX495600	SRP029183	SAMN02645687	54,634,530	38,857,206 (71.12%)	9,255,496 (16.94%)
SRR1199122	SRX495527	SRP029183	SAMN02645688	23,162,812	16,524,813 (71.34%)	4,208,148 (18.17%)
SRR1199202	SRX495607	SRP029183	SAMN02645689	69,627,388	51,681,804 (74.23%)	13,809,131 (19.83%)
SRR1199073	SRX495524	SRP029183	SAMN02645690	57,260,944	42,978,617 (75.06%)	11,910,326 (20.80%)
SRR1199128	SRX495534	SRP029183	SAMN02645691	79,186,800	59,506,811 (75.15%)	15,964,552 (20.16%)
SRR1199199	SRX495605	SRP029183	SAMN02645692	81,592,536	61,561,887 (75.45%)	17,767,580 (21.78%)
SRR1199071	SRX495522	SRP029183	SAMN02645693	88,309,184	65,544,532 (74.22%)	17,478,726 (19.79%)
SRR1199068	SRX495519	SRP029183	SAMN02645694	43,118,862	30,675,685 (71.14%)	7,671,744 (17.79%)
SRR1199135	SRX495601	SRP029183	SAMN02645695	86,850,900	63,677,803 (73.32%)	16,372,208 (18.85%)
SRR1199123	SRX495529	SRP029183	SAMN02645696	86,243,630	61,493,720 (71.30%)	14,220,972 (16.49%)
SRR1199203	SRX495608	SRP029183	SAMN02645697	73,650,890	55,884,662 (75.88%)	15,438,825 (20.96%)
SRR1199074	SRX495525	SRP029183	SAMN02645698	76,538,320	57,740,258 (75.44%)	16,228,659 (21.20%)
SRR1199129	SRX495535	SRP029183	SAMN02645699	93,215,816	69,919,620 (75.01%)	17,773,932 (19.07%)
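
Most samples correspond to a single run, but SAMN02429707 has three. The sketch below, with values copied from the by-run table, illustrates how the run-level counts add up to the figures reported for that sample in the by-sample table; the function name `aggregate` is hypothetical.

```python
# Three runs of sample SAMN02429707, copied from the by-run table:
# (run, reads, aligned reads, spliced reads)
runs = [
    ("SRR1043177", 53_598_244, 33_797_206, 9_102_347),
    ("SRR1043178", 81_262_130, 50_947_232, 13_730_296),
    ("SRR1043179", 40_628_125, 26_557_056, 2_461_323),
]


def aggregate(run_rows):
    """Sum reads, aligned reads and spliced reads over a sample's runs."""
    reads = sum(r[1] for r in run_rows)
    aligned = sum(r[2] for r in run_rows)
    spliced = sum(r[3] for r in run_rows)
    return reads, aligned, spliced


if __name__ == "__main__":
    reads, aligned, spliced = aggregate(runs)
    print(reads, aligned, spliced)
    print(f"{100 * aligned / reads:.2f}% aligned, {100 * spliced / reads:.2f}% spliced")
```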

Protein alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Arabidopsis thaliana known RefSeq (NP_)	35,173	30,201 (85.86%)	30,201 (85.86%)	66.78%	67.76%
Solanaceae GenBank	8,537	8,318 (97.43%)	8,318 (97.43%)	73.70%	80.40%
Solanaceae known RefSeq (NP_)	2,085	2,069 (99.23%)	2,069 (99.23%)	75.37%	83.07%
Same-species GenBank	116	111 (95.69%)	111 (95.69%)	79.45%	83.57%
Nicotiana tabacum GenBank	2,674	2,577 (96.37%)	2,577 (96.37%)	77.62%	86.35%

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
