NCBI Canis lupus dingo Annotation Release 100

The RefSeq genome records for Canis lupus dingo were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. This report presents statistics on the annotation products, the input data used in the pipeline and intermediate alignment results.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as NCBI Canis lupus dingo Annotation Release 100.

Annotation release ID: 100
Date of Entrez queries for transcripts and proteins: Jun 26 2018
Date of submission of annotation to the public databases: Jul 5 2018
Software version: 8.1

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
ASM325472v1	GCF_003254725.1	James Cook University	06-18-2018	Reference	unplaced scaffolds

Gene and feature statistics

Counts and lengths of annotated features are provided below for each assembly.

Feature counts

Feature	ASM325472v1
Genes and pseudogenes	39,198
protein-coding	20,248
non-coding	12,981
transcribed pseudogenes	0
non-transcribed pseudogenes	5,749
genes with variants	14,703
immunoglobulin/T-cell receptor gene segments	220
other	0
mRNAs	62,946
fully-supported	61,388
with > 5% ab initio	796
partial	194
with filled gap(s)	0
known RefSeq (NM_)	0
model RefSeq (XM_)	62,946
non-coding RNAs	24,799
fully-supported	22,586
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	24,368
pseudo transcripts	0
fully-supported	0
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	0
CDSs	63,166
fully-supported	61,388
with > 5% ab initio	911
partial	209
with major correction(s)	1,016
known RefSeq (NP_)	0
model RefSeq (XP_)	62,946

Detailed reports

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	33,229	39,519	12,398	37	2,430,655
All transcripts	87,745	3,066	2,442	37	104,600
mRNA	62,946	3,528	2,888	135	104,600
misc_RNA	3,803	2,884	2,460	171	25,228
tRNA	431	74	73	71	84
lncRNA	18,783	1,899	1,340	106	20,887
snoRNA	541	109	106	47	328
snRNA	1,205	115	107	37	197
guide_RNA	28	162	136	83	417
rRNA	8	922	120	119	4,757
Single-exon transcripts	1,742	1,470	978	135	9,311
coding transcripts (NM_/XM_)	1,742	1,470	978	135	9,311
CDSs	62,946	2,047	1,524	96	103,335
Exons	302,945	337	142	1	24,775
in coding transcripts (NM_/XM_)	249,802	304	138	1	24,775
in non-coding transcripts (NR_/XR_)	72,084	408	158	2	16,400
Introns	266,770	6,680	1,595	30	1,102,313
in coding transcripts (NM_/XM_)	227,341	6,486	1,515	30	1,102,313
in non-coding transcripts (NR_/XR_)	57,705	7,121	1,993	30	428,658

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	2.66	1	1	50
Number of exons per transcript	11.22	8	1	316

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the UniProtKB/Swiss-Prot curated proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 20,248 coding genes, 19,724 genes had a protein with an alignment covering 50% or more of the query and 16,736 had an alignment covering 95% or more of the query.

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.
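The coverage definitions above can be illustrated with a minimal sketch. It assumes a single contiguous alignment given in 1-based, inclusive coordinates; the report does not describe how the pipeline combines multiple alignment segments, so that case is not handled. The function name and the example coordinates are hypothetical. The last lines turn the gene counts quoted above into percentages of all coding genes.

```python
def coverage_percent(aligned_start, aligned_end, sequence_length):
    """Percent of a sequence included in one alignment.

    Coordinates are assumed to be 1-based and inclusive.
    """
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    aligned_length = aligned_end - aligned_start + 1
    return 100.0 * aligned_length / sequence_length


# Hypothetical example: a 400-residue annotated protein (query) aligned
# over residues 21-400 to residues 1-380 of a 500-residue target.
query_coverage = coverage_percent(21, 400, 400)
target_coverage = coverage_percent(1, 380, 500)

# Counts quoted in the report text.
coding_genes = 20_248
share_50 = 100.0 * 19_724 / coding_genes  # genes with >= 50% query coverage
share_95 = 100.0 * 16_736 / coding_genes  # genes with >= 95% query coverage
print(f"{query_coverage:.1f} {target_coverage:.1f} {share_50:.1f} {share_95:.1f}")
```

By these counts, roughly 97% of coding genes reach the 50% threshold and roughly 83% reach the 95% threshold.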

Figure (not reproduced here): cumulative graph displaying the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline were added to the graph.

Query: annotated proteins
Target: UniProtKB/Swiss-Prot curated proteins

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker for each assembly. RepeatMasker results are only used for organisms for which a comprehensive repeat library is available.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with RepeatMasker	% Masked with WindowMasker
ASM325472v1	GCF_003254725.1	42.86%	32.96%
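A masked percentage like those in the table can be sketched as the share of masked bases over all bases. The example below assumes soft-masking, where masked positions are written in lowercase; the report does not state how the pipeline represents masked sequence, and the function name and toy sequences are hypothetical.

```python
def masked_percent(sequences):
    """Percent of bases that are lowercase (assumed soft-masked)."""
    total = sum(len(seq) for seq in sequences)
    if total == 0:
        return 0.0
    masked = sum(1 for seq in sequences for base in seq if base.islower())
    return 100.0 * masked / total


# Toy input: two short sequences, 6 of 20 bases in lowercase.
toy_sequences = ["ACGTacgtAC", "ttGGCCAAGT"]
print(masked_percent(toy_sequences))
```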

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez, aligned to the genome by Splign or ProSplign and passed to Gnomon, NCBI's gene prediction software.

Depending on the other evidence available, long 454 reads (with average length above 250 nt) may be aligned as traditional evidence and reported in the Transcript alignments section or aligned with RNA-Seq reads and reported in the RNA-Seq alignments section.

Transcript alignments

No transcript evidence was used in this annotation.

RNA-Seq alignments

The following RNA-Seq reads from the Sequence Read Archive were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Publication	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	NA	Aggregate of all aligned samples	9,110,032,756	83%	29%	382,977
SAMEA103936001	NA	LE2041_SP (Canis lupus familiaris, SAMEA103936001)	227,475,976	89%	33%	251,374
SAMEA103936002	NA	LA2307_SP (Canis lupus familiaris, SAMEA103936002)	192,326,480	89%	34%	242,869
SAMEA2240594	24202129	Dog cerebral cortex (Canis lupus familiaris, SAMEA2240594)	178,091,798	74%	21%	221,318
SAMEA2240903	24202129	dog hypothalamus (Canis lupus familiaris, SAMEA2240903)	199,904,092	75%	21%	224,968
SAMEA4547738	NA	testes (Canis lupus familiaris, male, SAMEA4547738)	171,661,718	74%	43%	300,830
SAMEA4547739	NA	testes (Canis lupus familiaris, male, SAMEA4547739)	191,253,070	71%	37%	250,137
SAMEA4547740	NA	testes (Canis lupus familiaris, male, SAMEA4547740)	151,171,998	72%	43%	296,439
SAMEA4547741	NA	testes (Canis lupus familiaris, male, SAMEA4547741)	180,589,362	73%	44%	301,310
SAMEA4547742	NA	testes (Canis lupus familiaris, male, SAMEA4547742)	201,026,848	74%	40%	306,784
SAMEA4547743	NA	testes (Canis lupus familiaris, male, SAMEA4547743)	215,604,092	74%	41%	304,550
SAMEA4547744	NA	testes (Canis lupus familiaris, male, SAMEA4547744)	205,052,488	75%	40%	289,397
SAMEA4547745	NA	testes (Canis lupus familiaris, male, SAMEA4547745)	193,790,086	77%	44%	295,508
SAMEA4547746	NA	testes (Canis lupus familiaris, male, SAMEA4547746)	206,519,314	72%	37%	248,701
SAMN00013513	NA	skin (Canis lupus familiaris, SAMN00013513)	57,659,686	61%	13%	93,500
SAMN00013514	NA	brain (Canis lupus familiaris, SAMN00013514)	55,584,860	72%	19%	195,753
SAMN00013515	NA	liver (Canis lupus familiaris, SAMN00013515)	61,802,444	74%	26%	152,506
SAMN00013596	NA	heart (Canis lupus familiaris, SAMN00013596)	61,373,812	62%	27%	167,502
SAMN00013597	NA	ovary (Canis lupus familiaris, SAMN00013597)	62,529,276	72%	21%	198,257
SAMN00013598	NA	lung (Canis lupus familiaris, SAMN00013598)	56,686,478	75%	22%	194,481
SAMN00013599	NA	testis (Canis lupus familiaris, SAMN00013599)	68,310,732	74%	27%	261,915
SAMN00013600	NA	skeletal muscle (Canis lupus familiaris, SAMN00013600)	62,756,710	67%	29%	153,511
SAMN00013601	NA	blood (Canis lupus familiaris, SAMN00013601)	66,745,220	72%	19%	153,071
SAMN00013638	NA	kidney (Canis lupus familiaris, SAMN00013638)	69,418,268	69%	23%	180,517
SAMN00991609	NA	brain (Canis lupus familiaris, SAMN00991609)	121,022,150	83%	16%	210,727
SAMN00991610	NA	skin (Canis lupus familiaris, SAMN00991610)	106,210,838	82%	18%	189,431
SAMN00991611	NA	kidney (Canis lupus familiaris, SAMN00991611)	129,466,750	81%	20%	190,686
SAMN02460683	NA	Healthy (Canis lupus familiaris, 1-2 years old, Sexual equality, SAMN02460683)	75,544,588	78%	14%	178,868
SAMN07419779	NA	Head (Canis lupus familiaris, pooled male and female, SAMN07419779)	157,778,296	91%	29%	217,441
SAMN07419780	NA	Liver (Canis lupus familiaris, male, SAMN07419780)	165,204,016	87%	30%	199,444
SAMN07419781	NA	Heart (Canis lupus familiaris, male, SAMN07419781)	169,920,048	86%	28%	212,074
SAMN07419782	NA	Kidney (Canis lupus familiaris, male, SAMN07419782)	165,247,040	89%	27%	214,549
SAMN07419783	NA	Head (Canis lupus familiaris, male, SAMN07419783)	155,385,498	90%	24%	215,888
SAMN07419784	NA	Lung (Canis lupus familiaris, male, SAMN07419784)	149,765,290	89%	26%	211,679
SAMN07419785	NA	Liver (Canis lupus familiaris, female, SAMN07419785)	157,467,942	88%	31%	201,729
SAMN07419786	NA	Heart (Canis lupus familiaris, female, SAMN07419786)	138,571,030	86%	28%	205,300
SAMN07419787	NA	Lung (Canis lupus familiaris, female, SAMN07419787)	163,079,184	90%	28%	207,732
SAMN07419788	NA	Head (Canis lupus familiaris, female, SAMN07419788)	152,655,090	89%	24%	219,004
SAMN07419789	NA	Lung (Canis lupus familiaris, female, SAMN07419789)	162,889,082	89%	25%	212,809
SAMN07419790	NA	Liver (Canis lupus familiaris, female, SAMN07419790)	169,983,036	89%	30%	195,152
SAMN07419791	NA	Kidney (Canis lupus familiaris, female, SAMN07419791)	174,828,412	85%	25%	220,613
SAMN07419792	NA	Heart (Canis lupus familiaris, female, SAMN07419792)	189,567,254	89%	27%	215,417
SAMN07419793	NA	Head (Canis lupus familiaris, pooled male and female, SAMN07419793)	224,062,418	97%	25%	173,482
SAMN07419794	NA	Liver (Canis lupus familiaris, male, SAMN07419794)	162,730,406	89%	29%	195,354
SAMN07419795	NA	Lung (Canis lupus familiaris, male, SAMN07419795)	160,527,218	88%	25%	212,465
SAMN07419796	NA	Kidney (Canis lupus familiaris, male, SAMN07419796)	153,233,292	86%	25%	219,654
SAMN07419797	NA	Heart (Canis lupus familiaris, male, SAMN07419797)	68,761,628	87%	29%	193,322
SAMN07419798	NA	Stomach (Canis lupus familiaris, 16, male, SAMN07419798)	114,902,274	85%	25%	199,063
SAMN07419799	NA	Adult, Skin (Canis lupus familiaris, 16, male, SAMN07419799)	78,589,654	89%	29%	181,311
SAMN07419800	NA	Occipital Cortex (Canis lupus familiaris, 16, male, SAMN07419800)	74,136,114	85%	21%	193,244
SAMN07419801	NA	Right Atrium (Canis lupus familiaris, 16, male, SAMN07419801)	79,407,696	77%	28%	182,932
SAMN07419802	NA	Kidney Cortex (Canis lupus familiaris, 16, male, SAMN07419802)	77,772,854	83%	27%	191,123
SAMN07419803	NA	Spleen (Canis lupus familiaris, 16, male, SAMN07419803)	74,406,422	90%	23%	174,926
SAMN07419804	NA	Frontal Cortex (Canis lupus familiaris, 16, male, SAMN07419804)	118,597,168	88%	21%	158,739
SAMN07419805	NA	Right Ventricle (Canis lupus familiaris, 16, male, SAMN07419805)	79,060,938	67%	27%	175,511
SAMN07419806	NA	Colon (Canis lupus familiaris, 16, male, SAMN07419806)	71,697,528	85%	28%	182,606
SAMN07419807	NA	Left Atrium (Canis lupus familiaris, 16, male, SAMN07419807)	77,800,242	74%	27%	180,189
SAMN07419808	NA	Small Intestine (Canis lupus familiaris, 16, male, SAMN07419808)	88,452,288	81%	28%	184,113
SAMN07419809	NA	Thyroid Gland (Canis lupus familiaris, 16, male, SAMN07419809)	85,485,012	91%	30%	176,814
SAMN07419810	NA	Cartilage (Canis lupus familiaris, 16, male, SAMN07419810)	78,694,816	92%	32%	179,724
SAMN07419811	NA	Salivary Gland (Canis lupus familiaris, 16, male, SAMN07419811)	85,081,976	91%	33%	171,803
SAMN07419812	NA	Pituitary Gland (Canis lupus familiaris, 16, male, SAMN07419812)	75,351,736	89%	24%	208,566
SAMN07419813	NA	Lung (Canis lupus familiaris, 16, male, SAMN07419813)	63,353,092	89%	24%	183,606
SAMN07419814	NA	Bladder (Canis lupus familiaris, 16, male, SAMN07419814)	86,175,718	90%	29%	188,676
SAMN07419815	NA	Lymph Node (Canis lupus familiaris, 16, male, SAMN07419815)	88,454,666	88%	27%	195,080
SAMN07419816	NA	Cerebellum (Canis lupus familiaris, 16, male, SAMN07419816)	79,488,254	87%	22%	201,494
SAMN07419817	NA	Liver (Canis lupus familiaris, 16, male, SAMN07419817)	74,103,610	87%	34%	157,282
SAMN07419818	NA	Kidney Medulla (Canis lupus familiaris, 16, male, SAMN07419818)	79,617,748	78%	26%	178,652
SAMN07419819	NA	Adrenal Gland (Canis lupus familiaris, 16, male, SAMN07419819)	70,692,408	84%	25%	175,347
SAMN07419820	NA	Skeletal Muscle (Canis lupus familiaris, 16, male, SAMN07419820)	77,832,814	78%	35%	153,895
SAMN07419821	NA	Bone Marrow (Canis lupus familiaris, 16, male, SAMN07419821)	91,095,534	87%	25%	184,332
SAMN07419822	NA	Pancreas (Canis lupus familiaris, 16, male, SAMN07419822)	79,202,590	87%	44%	153,826
SAMN07419823	NA	Left Ventricle (Canis lupus familiaris, 16, male, SAMN07419823)	72,481,928	71%	28%	165,900
SAMN07419824	NA	Adipose Tissue (Canis lupus familiaris, 16, male, SAMN07419824)	78,553,054	90%	27%	180,120
SAMN09425111	NA	muscle (Canis lupus dingo, 8, female, SAMN09425111)	84,409,946	87%	17%	158,248
SAMN09425112	NA	ovary (Canis lupus dingo, 8, female, SAMN09425112)	107,854,300	86%	17%	146,725
SAMN09425113	NA	blood (Canis lupus dingo, 8, female, SAMN09425113)	106,045,062	86%	16%	164,922

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
ERR351173	ERX324009	ERP003979	SAMEA2240594	178,091,798	74%	21%
ERR348232	ERX321068	ERP003979	SAMEA2240903	199,904,092	75%	21%
ERR1948878	ERX2014665	ERP019830	SAMEA4547738	171,661,718	74%	43%
ERR1948875	ERX2014662	ERP019830	SAMEA4547739	191,253,070	71%	37%
ERR1948879	ERX2014666	ERP019830	SAMEA4547740	151,171,998	72%	43%
ERR1948882	ERX2014669	ERP019830	SAMEA4547741	180,589,362	73%	44%
ERR1948880	ERX2014667	ERP019830	SAMEA4547742	201,026,848	74%	40%
ERR1948881	ERX2014668	ERP019830	SAMEA4547743	215,604,092	74%	41%
ERR1948883	ERX2014670	ERP019830	SAMEA4547744	205,052,488	75%	40%
ERR1948876	ERX2014663	ERP019830	SAMEA4547745	193,790,086	77%	44%
ERR1948877	ERX2014664	ERP019830	SAMEA4547746	206,519,314	72%	37%
ERR1894930	ERX1955326	ERP022240	SAMEA103936001	227,475,976	89%	33%
ERR1894929	ERX1955325	ERP022240	SAMEA103936002	192,326,480	89%	34%
SRR388738	SRX111064	SRP009687	SAMN00013513	19,142,014	61%	13%
SRR388751	SRX111064	SRP009687	SAMN00013513	19,248,310	61%	13%
SRR388760	SRX111064	SRP009687	SAMN00013513	19,269,362	61%	13%
SRR388737	SRX111063	SRP009687	SAMN00013514	18,561,120	73%	19%
SRR388740	SRX111063	SRP009687	SAMN00013514	18,534,772	72%	19%
SRR388766	SRX111063	SRP009687	SAMN00013514	18,488,968	72%	19%
SRR388736	SRX111062	SRP009687	SAMN00013515	20,610,750	75%	26%
SRR388741	SRX111062	SRP009687	SAMN00013515	20,625,952	74%	26%
SRR388752	SRX111062	SRP009687	SAMN00013515	20,565,742	74%	26%
SRR388743	SRX111067	SRP009687	SAMN00013596	20,459,720	62%	27%
SRR388744	SRX111067	SRP009687	SAMN00013596	20,484,644	63%	27%
SRR388761	SRX111067	SRP009687	SAMN00013596	20,429,448	62%	27%
SRR388742	SRX111066	SRP009687	SAMN00013597	20,855,202	72%	21%
SRR388756	SRX111066	SRP009687	SAMN00013597	20,767,224	71%	21%
SRR388764	SRX111066	SRP009687	SAMN00013597	20,906,850	71%	21%
SRR388754	SRX111071	SRP009687	SAMN00013598	18,843,580	75%	22%
SRR388763	SRX111071	SRP009687	SAMN00013598	18,919,384	76%	22%
SRR388765	SRX111071	SRP009687	SAMN00013598	18,923,514	75%	22%
SRR388746	SRX111069	SRP009687	SAMN00013599	22,808,422	74%	27%
SRR388755	SRX111069	SRP009687	SAMN00013599	22,766,292	75%	27%
SRR388758	SRX111069	SRP009687	SAMN00013599	22,736,018	74%	27%
SRR388745	SRX111068	SRP009687	SAMN00013600	20,922,610	67%	29%
SRR388750	SRX111068	SRP009687	SAMN00013600	20,953,276	68%	29%
SRR388762	SRX111068	SRP009687	SAMN00013600	20,880,824	67%	29%
SRR388749	SRX111070	SRP009687	SAMN00013601	22,272,552	72%	19%
SRR388753	SRX111070	SRP009687	SAMN00013601	22,162,088	72%	19%
SRR388757	SRX111070	SRP009687	SAMN00013601	22,310,580	71%	19%
SRR388734	SRX111061	SRP009687	SAMN00013638	23,137,592	69%	23%
SRR388735	SRX111061	SRP009687	SAMN00013638	23,190,688	69%	23%
SRR388747	SRX111061	SRP009687	SAMN00013638	23,089,988	69%	23%
SRR536881	SRX146606	SRP009687	SAMN00991609	41,554,898	83%	16%
SRR536883	SRX146606	SRP009687	SAMN00991609	39,271,576	83%	16%
SRR543733	SRX146606	SRP009687	SAMN00991609	40,195,676	83%	16%
SRR536884	SRX146607	SRP009687	SAMN00991610	36,522,130	82%	18%
SRR543732	SRX146607	SRP009687	SAMN00991610	35,260,454	82%	18%
SRR543734	SRX146607	SRP009687	SAMN00991610	34,428,254	82%	18%
SRR536882	SRX146608	SRP009687	SAMN00991611	42,052,242	82%	20%
SRR536885	SRX146608	SRP009687	SAMN00991611	43,037,592	82%	20%
SRR543735	SRX146608	SRP009687	SAMN00991611	44,376,916	81%	20%
SRR1051352	SRX393125	SRP034544	SAMN02460683	25,465,463	80%	6%
SRR1057059	SRX393125	SRP034544	SAMN02460683	50,079,125	77%	18%
SRR5889321	SRX3055167	SRP114662	SAMN07419779	157,778,296	91%	29%
SRR5889322	SRX3055166	SRP114662	SAMN07419780	165,204,016	87%	30%
SRR5889319	SRX3055169	SRP114662	SAMN07419781	169,920,048	86%	28%
SRR5889320	SRX3055168	SRP114662	SAMN07419782	165,247,040	89%	27%
SRR5889317	SRX3055171	SRP114662	SAMN07419783	155,385,498	90%	24%
SRR5889318	SRX3055170	SRP114662	SAMN07419784	149,765,290	89%	26%
SRR5889315	SRX3055173	SRP114662	SAMN07419785	157,467,942	88%	31%
SRR5889316	SRX3055172	SRP114662	SAMN07419786	138,571,030	86%	28%
SRR5889323	SRX3055165	SRP114662	SAMN07419787	163,079,184	90%	28%
SRR5889324	SRX3055164	SRP114662	SAMN07419788	152,655,090	89%	24%
SRR5889333	SRX3055155	SRP114662	SAMN07419789	162,889,082	89%	25%
SRR5889334	SRX3055154	SRP114662	SAMN07419790	169,983,036	89%	30%
SRR5889331	SRX3055157	SRP114662	SAMN07419791	174,828,412	85%	25%
SRR5889332	SRX3055156	SRP114662	SAMN07419792	189,567,254	89%	27%
SRR5889329	SRX3055159	SRP114662	SAMN07419793	224,062,418	97%	25%
SRR5889330	SRX3055158	SRP114662	SAMN07419794	162,730,406	89%	29%
SRR5889327	SRX3055161	SRP114662	SAMN07419795	160,527,218	88%	25%
SRR5889328	SRX3055160	SRP114662	SAMN07419796	153,233,292	86%	25%
SRR5889325	SRX3055163	SRP114662	SAMN07419797	68,761,628	87%	29%
SRR5889350	SRX3055138	SRP114662	SAMN07419798	114,902,274	85%	25%
SRR5889345	SRX3055143	SRP114662	SAMN07419799	78,589,654	89%	29%
SRR5889306	SRX3055182	SRP114662	SAMN07419800	74,136,114	85%	21%
SRR5889342	SRX3055146	SRP114662	SAMN07419801	79,407,696	77%	28%
SRR5889314	SRX3055174	SRP114662	SAMN07419802	77,772,854	83%	27%
SRR5889347	SRX3055141	SRP114662	SAMN07419803	74,406,422	90%	23%
SRR5889305	SRX3055183	SRP114662	SAMN07419804	118,597,168	88%	21%
SRR5889343	SRX3055145	SRP114662	SAMN07419805	79,060,938	67%	27%
SRR5889307	SRX3055181	SRP114662	SAMN07419806	71,697,528	85%	28%
SRR5889335	SRX3055153	SRP114662	SAMN07419807	77,800,242	74%	27%
SRR5889348	SRX3055140	SRP114662	SAMN07419808	88,452,288	81%	28%
SRR5889349	SRX3055139	SRP114662	SAMN07419809	85,485,012	91%	30%
SRR5889309	SRX3055179	SRP114662	SAMN07419810	78,694,816	92%	32%
SRR5889344	SRX3055144	SRP114662	SAMN07419811	85,081,976	91%	33%
SRR5889341	SRX3055147	SRP114662	SAMN07419812	75,351,736	89%	24%
SRR5889338	SRX3055150	SRP114662	SAMN07419813	63,353,092	89%	24%
SRR5889311	SRX3055177	SRP114662	SAMN07419814	86,175,718	90%	29%
SRR5889339	SRX3055149	SRP114662	SAMN07419815	88,454,666	88%	27%
SRR5889308	SRX3055180	SRP114662	SAMN07419816	79,488,254	87%	22%
SRR5889337	SRX3055151	SRP114662	SAMN07419817	74,103,610	87%	34%
SRR5889313	SRX3055175	SRP114662	SAMN07419818	79,617,748	78%	26%
SRR5889312	SRX3055176	SRP114662	SAMN07419819	70,692,408	84%	25%
SRR5889346	SRX3055142	SRP114662	SAMN07419820	77,832,814	78%	35%
SRR5889310	SRX3055178	SRP114662	SAMN07419821	91,095,534	87%	25%
SRR5889340	SRX3055148	SRP114662	SAMN07419822	79,202,590	87%	44%
SRR5889336	SRX3055152	SRP114662	SAMN07419823	72,481,928	71%	28%
SRR5889326	SRX3055162	SRP114662	SAMN07419824	78,553,054	90%	27%
SRR7334234	SRX4213532	SRP150475	SAMN09425111	84,409,946	87%	17%
SRR7334235	SRX4213531	SRP150475	SAMN09425112	107,854,300	86%	17%
SRR7334236	SRX4213530	SRP150475	SAMN09425113	106,045,062	86%	16%
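The per-run rows above can be rolled up into the per-sample rows of the previous table. The sketch below does this for sample SAMN02460683, assuming that read counts add up, that the aligned percentage is weighted by read count, and that the intron percentage is weighted by aligned reads. This weighting is an assumption rather than a documented pipeline rule, and because the per-run percentages are already rounded the result is only approximate. The function name is hypothetical.

```python
# (run, sample, reads, % aligned reads, % of aligned reads with introns)
# Values copied from the by-run table for sample SAMN02460683.
runs = [
    ("SRR1051352", "SAMN02460683", 25_465_463, 80, 6),
    ("SRR1057059", "SAMN02460683", 50_079_125, 77, 18),
]


def aggregate_runs(rows):
    """Combine per-run statistics into one (reads, % aligned, % introns) tuple."""
    reads = sum(row[2] for row in rows)
    # Estimated aligned reads per run, from the rounded percentage.
    aligned = sum(row[2] * row[3] / 100 for row in rows)
    # Estimated aligned reads with introns per run.
    with_introns = sum(row[2] * row[3] / 100 * row[4] / 100 for row in rows)
    return reads, round(100 * aligned / reads), round(100 * with_introns / aligned)


print(aggregate_runs(runs))  # (75544588, 78, 14)
```

Under these assumptions the result agrees with the SAMN02460683 row of the by-sample table: 75,544,588 reads, 78% aligned, 14% of aligned reads with introns.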

Protein alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Homo sapiens known RefSeq (NP_)	50,982	49,937 (97.95%)	49,937 (97.95%)	76.54%	83.95%

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
