NCBI Sapajus apella Annotation Release 100

The RefSeq genome records for Sapajus apella were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. This report presents statistics on the annotation products, the input data used in the pipeline and intermediate alignment results.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as NCBI Sapajus apella Annotation Release 100.

Annotation release ID: 100
Date of Entrez queries for transcripts and proteins: Jan 9 2020
Date of submission of annotation to the public databases: Feb 2 2020
Software version: 8.3

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
GSC_monkey_1.0	GCF_009761245.1	Canada's Genomic Enterprise	12-17-2019	Reference	unplaced scaffolds
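The assembly accession in the table follows the usual NCBI pattern: a three-letter prefix (GCF for RefSeq, GCA for GenBank), a nine-digit identifier and a version suffix. The sketch below splits an accession into those parts. The function name and the dictionary keys are illustrative assumptions, not part of any NCBI tool.

```python
import re

# Pattern for NCBI assembly accessions such as GCF_009761245.1.
ACCESSION_RE = re.compile(r"^(GCF|GCA)_(\d{9})\.(\d+)$")


def parse_assembly_accession(accession):
    """Split an assembly accession into database, numeric id and version.

    Hypothetical helper for illustration only.
    """
    match = ACCESSION_RE.match(accession)
    if match is None:
        raise ValueError(f"not an assembly accession: {accession!r}")
    prefix, number, version = match.groups()
    return {
        "database": "RefSeq" if prefix == "GCF" else "GenBank",
        "number": number,
        "version": int(version),
    }


parsed = parse_assembly_accession("GCF_009761245.1")
print(parsed)
```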

Gene and feature statistics

Counts and lengths of annotated features are provided below for each assembly.

Feature counts

Feature	GSC_monkey_1.0
Genes and pseudogenes	42,719
  protein-coding	20,564
  non-coding	13,219
  transcribed pseudogenes	2,438
  non-transcribed pseudogenes	6,350
  genes with variants	12,090
  immunoglobulin/T-cell receptor gene segments	148
  other	0
mRNAs	62,192
  fully-supported	61,057
  with > 5% ab initio	542
  partial	426
  with filled gap(s)	1
  known RefSeq (NM_)	0
  model RefSeq (XM_)	62,192
non-coding RNAs	16,740
  fully-supported	12,608
  with > 5% ab initio	0
  partial	2
  with filled gap(s)	0
  known RefSeq (NR_)	0
  model RefSeq (XR_)	16,397
pseudo transcripts	2,449
  fully-supported	2,159
  with > 5% ab initio	0
  partial	0
  with filled gap(s)	0
  known RefSeq (NR_)	0
  model RefSeq (XR_)	2,449
CDSs	62,340
  fully-supported	61,057
  with > 5% ab initio	656
  partial	437
  with major correction(s)	873
  known RefSeq (NP_)	0
  model RefSeq (XP_)	62,192
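As a quick sanity check on the gene counts, the categories under "Genes and pseudogenes" can be summed and compared with the reported total. This is a minimal sketch, assuming the listed categories are mutually exclusive (the report does not say so explicitly) and that "genes with variants" is a property of genes rather than a separate category, so it is left out of the sum.

```python
# Gene categories copied from the feature counts table.
# "genes with variants" is excluded on the assumption that it overlaps
# with the other categories rather than forming its own class.
gene_categories = {
    "protein-coding": 20564,
    "non-coding": 13219,
    "transcribed pseudogenes": 2438,
    "non-transcribed pseudogenes": 6350,
    "immunoglobulin/T-cell receptor gene segments": 148,
    "other": 0,
}
reported_total = 42719  # "Genes and pseudogenes"

computed_total = sum(gene_categories.values())
print(computed_total, computed_total == reported_total)
```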

Detailed reports

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	33,783	38,604	10,524	49	3,057,282
All transcripts	78,932	3,264	2,596	49	103,945
  mRNA	62,192	3,767	3,056	153	103,945
  misc_RNA	2,332	3,242	2,732	117	21,500
  tRNA	343	74	73	71	84
  lncRNA	10,277	1,489	755	73	27,234
  snoRNA	1,154	112	104	49	330
  snRNA	2,538	117	107	59	200
  guide_RNA	43	160	136	83	421
  rRNA	53	274	119	118	4,765
Single-exon transcripts	1,904	1,653	1,165	207	16,628
  coding transcripts (NM_/XM_)	1,900	1,653	1,165	207	16,628
  non-coding transcripts (NR_/XR_)	4	1,502	1,686	607	2,889
CDSs	62,192	2,161	1,554	75	103,704
Exons	274,829	358	142	1	22,479
  in coding transcripts (NM_/XM_)	245,179	339	142	1	20,762
  in non-coding transcripts (NR_/XR_)	45,977	402	134	2	22,479
Introns	237,946	7,083	1,771	30	1,094,810
  in coding transcripts (NM_/XM_)	218,863	6,984	1,719	30	1,094,810
  in non-coding transcripts (NR_/XR_)	34,869	7,000	2,061	30	775,343
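The transcript rows of the feature lengths table can be cross-checked against each other: the mRNA count plus the individual non-coding RNA classes should add up to the "All transcripts" count. A minimal sketch of that check, assuming each transcript is counted in exactly one class:

```python
# Transcript counts copied from the feature lengths table.
mrna_count = 62192
noncoding_counts = {
    "misc_RNA": 2332,
    "tRNA": 343,
    "lncRNA": 10277,
    "snoRNA": 1154,
    "snRNA": 2538,
    "guide_RNA": 43,
    "rRNA": 53,
}
all_transcripts = 78932  # "All transcripts" row

noncoding_total = sum(noncoding_counts.values())
transcript_total = mrna_count + noncoding_total
print(noncoding_total, transcript_total, transcript_total == all_transcripts)
```

The non-coding total computed this way also matches the "non-coding RNAs" row of the feature counts table (16,740).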

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	2.35	1	1	50
Number of exons per transcript	11.8	8	1	321

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the UniProtKB/Swiss-Prot curated proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 20,564 coding genes, 20,109 had a protein with an alignment covering 50% or more of the query, and 17,577 had an alignment covering 95% or more of the query.
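Expressed as fractions of the coding genes, these counts are roughly 98% and 85%. The short sketch below does the arithmetic; the variable and function names are chosen here for illustration.

```python
# Counts taken from the paragraph above.
coding_genes = 20564
genes_cov_ge_50 = 20109  # alignment covers >= 50% of the query
genes_cov_ge_95 = 17577  # alignment covers >= 95% of the query


def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


pct_50 = percent(genes_cov_ge_50, coding_genes)
pct_95 = percent(genes_cov_ge_95, coding_genes)
print(pct_50, pct_95)
```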

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.
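The two coverage definitions can be illustrated with a small calculation. This is a sketch under simplifying assumptions: a single contiguous alignment with 1-based inclusive coordinates and no gaps. The pipeline may combine several alignment segments per protein, which this example does not attempt to reproduce, and the coordinates and lengths used are hypothetical.

```python
def coverage_percent(aligned_start, aligned_end, sequence_length):
    """Percentage of a sequence included in an alignment.

    Coordinates are 1-based and inclusive; gaps are ignored.
    """
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    aligned_length = aligned_end - aligned_start + 1
    return 100.0 * aligned_length / sequence_length


# Hypothetical alignment: residues 11-410 of a 420-aa annotated protein
# (the query) align to residues 1-400 of a 500-aa Swiss-Prot protein
# (the target).
query_cov = coverage_percent(11, 410, 420)
target_cov = coverage_percent(1, 400, 500)
print(f"query coverage {query_cov:.1f}%, target coverage {target_cov:.1f}%")
```

In this made-up case the annotated protein would count toward the 95% query-coverage threshold, even though only 80% of the target is covered.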

[Figure: cumulative graph of the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline are included. Query: annotated proteins. Target: UniProtKB/Swiss-Prot curated proteins.]

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker for each assembly. RepeatMasker results are only used for organisms for which a comprehensive repeat library is available.

For this annotation run, transcripts and proteins were aligned to the genome masked with RepeatMasker only.

Assembly name	Assembly accession	% Masked with RepeatMasker	% Masked with WindowMasker
GSC_monkey_1.0	GCF_009761245.1	47.40%	35.39%

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez, aligned to the genome by Splign or ProSplign and passed to Gnomon, NCBI's gene prediction software.

Depending on the other evidence available, long 454 reads (with average length above 250 nt) may be aligned as traditional evidence and reported in the Transcript alignments section or aligned with RNA-Seq reads and reported in the RNA-Seq alignments section.

Transcript alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species GenBank	36	36 (100.00%)	28 (77.78%)	98.92%	99.43%
Homo sapiens known RefSeq (NM_/NR_)	73,048	71,644 (98.08%)	32,771 (44.86%)	93.01%	96.29%
Homo sapiens GenBank	306,035	269,271 (87.99%)	149,485 (48.85%)	92.35%	87.92%
Homo sapiens EST	8,647,230	7,081,520 (81.89%)	5,706,346 (65.99%)	92.76%	95.40%
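The percentages in parentheses are each count divided by the number of sequences retrieved from Entrez. A minimal sketch that recomputes them from the raw counts in the table (the tuple layout and helper name are illustrative choices):

```python
# (source, retrieved from Entrez, aligned by Splign, passed to Gnomon),
# copied from the transcript alignments table.
rows = [
    ("Same-species GenBank", 36, 36, 28),
    ("Homo sapiens known RefSeq (NM_/NR_)", 73048, 71644, 32771),
    ("Homo sapiens GenBank", 306035, 269271, 149485),
    ("Homo sapiens EST", 8647230, 7081520, 5706346),
]


def pct(count, total):
    """Format count/total as a percentage with two decimals."""
    return f"{100.0 * count / total:.2f}%"


recomputed = {
    source: (pct(aligned, retrieved), pct(passed, retrieved))
    for source, retrieved, aligned, passed in rows
}
for source, (aligned_pct, passed_pct) in recomputed.items():
    print(f"{source}: aligned {aligned_pct}, passed to Gnomon {passed_pct}")
```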

RNA-Seq alignments

The following RNA-Seq reads from the Sequence Read Archive were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Publication	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	NA	Aggregate of all aligned samples	7,171,560,003	96%	15%	365,452
SAMN00000006	NA	Generic sample from Callithrix jacchus (Callithrix jacchus, SAMN00000006)	264,850	88%	40%	35,996
SAMN01823467	NA	Sample from Callithrix jacchus (Callithrix jacchus, SAMN01823467)	269,344	88%	41%	36,283
SAMN01823468	NA	Sample from Callithrix jacchus (Callithrix jacchus, SAMN01823468)	56,674,708	76%	7%	138,655
SAMN01823469	NA	Sample from Callithrix jacchus (Callithrix jacchus, SAMN01823469)	58,460,210	70%	7%	129,763
SAMN01823470	NA	Sample from Callithrix jacchus (Callithrix jacchus, SAMN01823470)	59,237,818	75%	7%	139,826
SAMN01823471	NA	Sample from Callithrix jacchus (Callithrix jacchus, SAMN01823471)	54,911,808	78%	8%	137,182
SAMN01823472	NA	Sample from Callithrix jacchus (Callithrix jacchus, SAMN01823472)	444,125	88%	21%	36,907
SAMN01823473	NA	Sample from Callithrix jacchus (Callithrix jacchus, SAMN01823473)	501,731	89%	26%	33,808
SAMN01823474	NA	Sample from Callithrix jacchus (Callithrix jacchus, SAMN01823474)	1,160,916	49%	22%	30,329
SAMN01823475	NA	Sample from Callithrix jacchus (Callithrix jacchus, SAMN01823475)	531,896	85%	21%	38,876
SAMN01823476	NA	Sample from Callithrix jacchus (Callithrix jacchus, SAMN01823476)	473,972	86%	25%	42,111
SAMN02055992	23203872	Generic sample from Callithrix jacchus (Callithrix jacchus, SAMN02055992)	269,969,905	74%	11%	142,017
SAMN02055993	23203872	Generic sample from Callithrix jacchus (Callithrix jacchus, SAMN02055993)	395,566,949	176%	15%	272,349
SAMN02055994	23203872	Generic sample from Callithrix jacchus (Callithrix jacchus, SAMN02055994)	482,802,297	177%	16%	278,776
SAMN03282360	25392405	Bone Marrow (Callithrix jacchus, female, SAMN03282360)	158,879,166	81%	9%	233,216
SAMN03282361	25392405	Brain Left Hemisphere (Callithrix jacchus, female, SAMN03282361)	101,794,204	83%	8%	227,440
SAMN03282362	25392405	Brain Pituitary (Callithrix jacchus, female, SAMN03282362)	123,148,862	80%	7%	238,664
SAMN03282363	25392405	Brain Right Hemisphere (Callithrix jacchus, female, SAMN03282363)	139,067,092	83%	8%	237,260
SAMN03282364	25392405	Colon (Callithrix jacchus, female, SAMN03282364)	132,811,174	79%	7%	217,779
SAMN03282365	25392405	Heart (Callithrix jacchus, female, SAMN03282365)	257,194,760	84%	11%	250,479
SAMN03282366	25392405	Kidney (Callithrix jacchus, female, SAMN03282366)	115,575,908	81%	10%	227,436
SAMN03282367	25392405	Liver (Callithrix jacchus, female, SAMN03282367)	116,244,712	85%	17%	192,932
SAMN03282368	25392405	Lung (Callithrix jacchus, female, SAMN03282368)	148,277,628	80%	9%	244,856
SAMN03282369	25392405	Lymph Node (Callithrix jacchus, female, SAMN03282369)	109,162,936	80%	8%	213,292
SAMN03282370	25392405	Skeletal Muscle (Callithrix jacchus, female, SAMN03282370)	112,432,592	86%	17%	186,804
SAMN03282371	25392405	Spleen (Callithrix jacchus, female, SAMN03282371)	115,423,250	82%	9%	223,598
SAMN03282407	25392405	Bone Marrow (Saimiri sciureus, SAMN03282407)	131,200,476	84%	11%	181,002
SAMN03282408	25392405	Brain Cerebellum (Saimiri sciureus, SAMN03282408)	131,643,354	81%	6%	198,434
SAMN03282409	25392405	Brain Frontal Cortex (Saimiri sciureus, SAMN03282409)	99,238,558	83%	6%	197,956
SAMN03282410	25392405	Brain Pituitary (Saimiri sciureus, SAMN03282410)	122,245,386	82%	6%	210,552
SAMN03282411	25392405	Brain Temporal Lobe (Saimiri sciureus, SAMN03282411)	104,310,768	83%	8%	200,791
SAMN03282412	25392405	Colon (Saimiri sciureus, SAMN03282412)	145,946,540	81%	10%	209,412
SAMN03282413	25392405	Heart (Saimiri sciureus, SAMN03282413)	118,758,772	86%	12%	182,344
SAMN03282414	25392405	Kidney (Saimiri sciureus, SAMN03282414)	138,529,882	82%	8%	208,397
SAMN03282415	25392405	Liver (Saimiri sciureus, SAMN03282415)	98,945,884	87%	20%	169,142
SAMN03282416	25392405	Lung (Saimiri sciureus, SAMN03282416)	127,306,832	82%	8%	208,411
SAMN03282417	25392405	Lymph Node (Saimiri sciureus, SAMN03282417)	165,719,916	81%	8%	202,397
SAMN03282418	25392405	Skeletal Muscle (Saimiri sciureus, SAMN03282418)	122,574,814	89%	18%	168,936
SAMN03282419	25392405	Spleen (Saimiri sciureus, SAMN03282419)	158,894,592	80%	7%	202,823
SAMN03659570	NA	brain (Sapajus apella, <1, male, SAMN03659570)	63,399,438	85%	13%	163,207
SAMN04875999	NA	blood (Cebus capucinus imitator, adult, male, SAMN04875999)	305,158,820	82%	13%	171,080
SAMN07495411	NA	Brain (Callithrix jacchus, neonate, female, SAMN07495411)	94,129,620	91%	20%	209,253
SAMN07495412	NA	Eye (Callithrix jacchus, neonate, female, SAMN07495412)	89,640,366	91%	25%	216,293
SAMN07495413	NA	Heart (Callithrix jacchus, neonate, female, SAMN07495413)	101,140,056	91%	24%	196,628
SAMN07495414	NA	Intestine (Callithrix jacchus, neonate, female, SAMN07495414)	121,097,052	90%	23%	208,759
SAMN07495415	NA	Kidney (Callithrix jacchus, neonate, female, SAMN07495415)	103,310,278	90%	22%	204,313
SAMN07495416	NA	Liver (Callithrix jacchus, neonate, female, SAMN07495416)	109,289,548	91%	29%	180,075
SAMN07495417	NA	Lung (Callithrix jacchus, neonate, female, SAMN07495417)	104,807,704	90%	22%	204,664
SAMN07495418	NA	Muscle (Callithrix jacchus, neonate, female, SAMN07495418)	99,785,662	91%	30%	185,057
SAMN07495419	NA	Retina (Callithrix jacchus, neonate, female, SAMN07495419)	93,759,298	91%	21%	200,977
SAMN07495420	NA	Skin (Callithrix jacchus, neonate, female, SAMN07495420)	108,943,906	91%	25%	205,996
SAMN13295021	NA	Brain Stem (Callithrix jacchus, 2 years, female, SAMN13295021)	137,357,146	91%	20%	224,181
SAMN13295022	NA	Cerebellum (Callithrix jacchus, 2 years, female, SAMN13295022)	150,874,144	90%	19%	222,280
SAMN13295023	NA	Hippocampus (Callithrix jacchus, 2 years, female, SAMN13295023)	164,254,222	91%	20%	221,242
SAMN13295024	NA	Prefrontal Cortex (Callithrix jacchus, 2 years, female, SAMN13295024)	122,424,552	91%	18%	210,434
SAMN13295025	NA	Spinal Cord (Callithrix jacchus, 2 years, female, SAMN13295025)	128,072,738	91%	21%	222,078
SAMN13295026	NA	Striatum (Callithrix jacchus, 2 years, female, SAMN13295026)	130,238,674	91%	18%	216,856
SAMN13295027	NA	Thalamus (Callithrix jacchus, 2 years, female, SAMN13295027)	111,527,404	91%	19%	219,233
SAMN13295028	NA	Visual Cortex (Callithrix jacchus, 2 years, female, SAMN13295028)	155,750,788	91%	19%	220,334

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
SRR000078	SRX000012	SRP000006	SAMN00000006	136,244	88%	42%
SRR000079	SRX000012	SRP000006	SAMN00000006	128,606	88%	39%
SRR629458	SRX208975	SRP000006	SAMN01823467	1,907	88%	43%
SRR629464	SRX208975	SRP000006	SAMN01823467	1,282	88%	45%
SRR629467	SRX208975	SRP000006	SAMN01823467	110	94%	32%
SRR629476	SRX208975	SRP000006	SAMN01823467	136,244	88%	42%
SRR629477	SRX208975	SRP000006	SAMN01823467	547	89%	35%
SRR629480	SRX208975	SRP000006	SAMN01823467	648	85%	44%
SRR629483	SRX208975	SRP000006	SAMN01823467	128,606	88%	39%
SRR629515	SRX208980	SRP000006	SAMN01823468	56,674,708	76%	7%
SRR629519	SRX208981	SRP000006	SAMN01823469	58,460,210	70%	7%
SRR629517	SRX208976	SRP000006	SAMN01823470	59,237,818	75%	7%
SRR629520	SRX208977	SRP000006	SAMN01823471	54,911,808	78%	8%
SRR629456	SRX208979	SRP000006	SAMN01823472	1,223	88%	23%
SRR629463	SRX208979	SRP000006	SAMN01823472	217,532	87%	21%
SRR629469	SRX208979	SRP000006	SAMN01823472	1,579	88%	21%
SRR629470	SRX208979	SRP000006	SAMN01823472	215,800	88%	21%
SRR629478	SRX208979	SRP000006	SAMN01823472	3,637	85%	21%
SRR629479	SRX208979	SRP000006	SAMN01823472	4,354	87%	21%
SRR629459	SRX208978	SRP000006	SAMN01823473	2,227	89%	27%
SRR629465	SRX208978	SRP000006	SAMN01823473	3,974	89%	25%
SRR629466	SRX208978	SRP000006	SAMN01823473	242,220	89%	26%
SRR629474	SRX208978	SRP000006	SAMN01823473	242,520	89%	26%
SRR629482	SRX208978	SRP000006	SAMN01823473	5,435	88%	26%
SRR629485	SRX208978	SRP000006	SAMN01823473	5,355	88%	25%
SRR629453	SRX208973	SRP000006	SAMN01823474	524,450	48%	20%
SRR629454	SRX208973	SRP000006	SAMN01823474	636,466	49%	23%
SRR629457	SRX208974	SRP000006	SAMN01823475	3,036	88%	23%
SRR629468	SRX208974	SRP000006	SAMN01823475	298,468	83%	21%
SRR629473	SRX208974	SRP000006	SAMN01823475	4,680	89%	21%
SRR629475	SRX208974	SRP000006	SAMN01823475	2,975	88%	23%
SRR629481	SRX208974	SRP000006	SAMN01823475	1,445	87%	24%
SRR629484	SRX208974	SRP000006	SAMN01823475	221,292	88%	21%
SRR629455	SRX208972	SRP000006	SAMN01823476	5,403	86%	23%
SRR629460	SRX208972	SRP000006	SAMN01823476	2,107	88%	20%
SRR629461	SRX208972	SRP000006	SAMN01823476	210,678	87%	25%
SRR629462	SRX208972	SRP000006	SAMN01823476	4,702	87%	23%
SRR629471	SRX208972	SRP000006	SAMN01823476	4,408	87%	24%
SRR629472	SRX208972	SRP000006	SAMN01823476	246,674	86%	25%
SRR850167	SRX277320	SRP021223	SAMN02055992	269,969,905	74%	11%
SRR850168	SRX277321	SRP021223	SAMN02055993	395,566,949	176%	15%
SRR850169	SRX277322	SRP021223	SAMN02055994	482,802,297	177%	16%
SRR1758976	SRX843207	SRP051959	SAMN03282360	158,879,166	81%	9%
SRR1758977	SRX843208	SRP051959	SAMN03282361	101,794,204	83%	8%
SRR1758978	SRX843209	SRP051959	SAMN03282362	123,148,862	80%	7%
SRR1758979	SRX843210	SRP051959	SAMN03282363	139,067,092	83%	8%
SRR1758980	SRX843211	SRP051959	SAMN03282364	132,811,174	79%	7%
SRR1758981	SRX843212	SRP051959	SAMN03282365	116,932,626	84%	12%
SRR1758982	SRX843213	SRP051959	SAMN03282365	140,262,134	83%	11%
SRR1758983	SRX843214	SRP051959	SAMN03282366	115,575,908	81%	10%
SRR1758984	SRX843215	SRP051959	SAMN03282367	116,244,712	85%	17%
SRR1758985	SRX843216	SRP051959	SAMN03282368	148,277,628	80%	9%
SRR1758986	SRX843217	SRP051959	SAMN03282369	109,162,936	80%	8%
SRR1758987	SRX843218	SRP051959	SAMN03282370	112,432,592	86%	17%
SRR1758988	SRX843219	SRP051959	SAMN03282371	115,423,250	82%	9%
SRR1759033	SRX843264	SRP051959	SAMN03282407	131,200,476	84%	11%
SRR1759034	SRX843265	SRP051959	SAMN03282408	131,643,354	81%	6%
SRR1759035	SRX843266	SRP051959	SAMN03282409	99,238,558	83%	6%
SRR1759036	SRX843267	SRP051959	SAMN03282410	122,245,386	82%	6%
SRR1759037	SRX843268	SRP051959	SAMN03282411	104,310,768	83%	8%
SRR1759038	SRX843269	SRP051959	SAMN03282412	145,946,540	81%	10%
SRR1759039	SRX843270	SRP051959	SAMN03282413	118,758,772	86%	12%
SRR1759040	SRX843271	SRP051959	SAMN03282414	138,529,882	82%	8%
SRR1759041	SRX843272	SRP051959	SAMN03282415	98,945,884	87%	20%
SRR1759042	SRX843273	SRP051959	SAMN03282416	127,306,832	82%	8%
SRR1759043	SRX843274	SRP051959	SAMN03282417	165,719,916	81%	8%
SRR1759044	SRX843275	SRP051959	SAMN03282418	122,574,814	89%	18%
SRR1759045	SRX843276	SRP051959	SAMN03282419	86,751,964	80%	7%
SRR1759046	SRX843277	SRP051959	SAMN03282419	72,142,628	80%	7%
SRR2048502	SRX1046601	SRP058420	SAMN03659570	35,856,088	85%	13%
SRR2048505	SRX1046603	SRP058420	SAMN03659570	27,543,350	85%	13%
SRR3412937	SRX1718385	SRP073676	SAMN04875999	305,158,820	82%	13%
SRR5928357	SRX3088670	SRP115291	SAMN07495411	94,129,620	91%	20%
SRR5928356	SRX3088671	SRP115291	SAMN07495412	89,640,366	91%	25%
SRR5928355	SRX3088672	SRP115291	SAMN07495413	101,140,056	91%	24%
SRR5928354	SRX3088673	SRP115291	SAMN07495414	121,097,052	90%	23%
SRR5928361	SRX3088666	SRP115291	SAMN07495415	103,310,278	90%	22%
SRR5928360	SRX3088667	SRP115291	SAMN07495416	109,289,548	91%	29%
SRR5928359	SRX3088668	SRP115291	SAMN07495417	104,807,704	90%	22%
SRR5928358	SRX3088669	SRP115291	SAMN07495418	99,785,662	91%	30%
SRR5928353	SRX3088674	SRP115291	SAMN07495419	93,759,298	91%	21%
SRR5928352	SRX3088675	SRP115291	SAMN07495420	108,943,906	91%	25%
SRR10466929	SRX7158187	SRP115291	SAMN13295021	137,357,146	91%	20%
SRR10466928	SRX7158188	SRP115291	SAMN13295022	150,874,144	90%	19%
SRR10466927	SRX7158189	SRP115291	SAMN13295023	164,254,222	91%	20%
SRR10466926	SRX7158190	SRP115291	SAMN13295024	122,424,552	91%	18%
SRR10466925	SRX7158191	SRP115291	SAMN13295025	128,072,738	91%	21%
SRR10466924	SRX7158192	SRP115291	SAMN13295026	130,238,674	91%	18%
SRR10466923	SRX7158193	SRP115291	SAMN13295027	111,527,404	91%	19%
SRR10466922	SRX7158194	SRP115291	SAMN13295028	155,750,788	91%	19%
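The by-sample read counts are the sums of the by-run counts for each sample. The sketch below groups a few runs from the table by sample accession and totals their reads; only samples with more than one run are included, and the data layout is an illustrative choice.

```python
from collections import defaultdict

# (run, sample, number of reads) for a few runs copied from the by-run table.
runs = [
    ("SRR1758981", "SAMN03282365", 116932626),
    ("SRR1758982", "SAMN03282365", 140262134),
    ("SRR1759045", "SAMN03282419", 86751964),
    ("SRR1759046", "SAMN03282419", 72142628),
    ("SRR2048502", "SAMN03659570", 35856088),
    ("SRR2048505", "SAMN03659570", 27543350),
]

# Sum the reads of all runs that belong to the same sample.
reads_per_sample = defaultdict(int)
for _run, sample, reads in runs:
    reads_per_sample[sample] += reads

for sample, total in sorted(reads_per_sample.items()):
    print(f"{sample}\t{total:,}")
```

The totals agree with the by-sample table: 257,194,760 reads for SAMN03282365, 158,894,592 for SAMN03282419 and 63,399,438 for SAMN03659570.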

Protein alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Primates GenBank	20,953	14,294 (68.22%)	14,294 (68.22%)	79.73%	90.96%
Primates known RefSeq (NP_)	14,610	11,843 (81.06%)	11,843 (81.06%)	85.71%	92.19%
Same-species GenBank	28	26 (92.86%)	26 (92.86%)	87.44%	95.20%
Homo sapiens GenBank	144,411	82,996 (57.47%)	82,996 (57.47%)	80.02%	84.06%
Homo sapiens known RefSeq (NP_)	56,065	42,750 (76.25%)	42,750 (76.25%)	86.47%	90.43%

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22:134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
