Schema docsum_2005.xsd


schema location:  ftp://ftp.ncbi.nlm.nih.gov/snp/specs/docsum_2005.xsd
attribute form default:  unqualified
element form default:  qualified
targetNamespace:  http://www.ncbi.nlm.nih.gov/SNP/docsum
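
Because the element form default is qualified and the attribute form default is unqualified, elements in an instance document carry the target namespace while attributes are looked up by their bare names. The following is a minimal sketch of reading such a document with the Python standard library. The sample fragment is invented for illustration, is heavily abbreviated, and is not schema-valid; the helper name rs_ids is hypothetical.

```python
import xml.etree.ElementTree as ET

# Target namespace as given in the schema header.
NS = {"ds": "http://www.ncbi.nlm.nih.gov/SNP/docsum"}

# Hypothetical, abbreviated instance fragment (illustration only, not schema-valid).
SAMPLE = """<ExchangeSet xmlns="http://www.ncbi.nlm.nih.gov/SNP/docsum">
  <Rs rsId="3"/>
  <Rs rsId="4"/>
</ExchangeSet>"""

def rs_ids(xml_text):
    """Return the rsId of every Rs child of the root ExchangeSet."""
    root = ET.fromstring(xml_text)
    # Elements are namespace-qualified; attributes use their unqualified names.
    return [int(rs.get("rsId")) for rs in root.findall("ds:Rs", NS)]

print(rs_ids(SAMPLE))  # prints [3, 4]
```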
 
Elements 
Assay 
Assembly 
BaseURL 
Component 
ExchangeSet 
FxnSet 
MapLoc 
PrimarySequence 
Rs 
RsLinkout 
RsStruct 
Ss 


element Assay
diagram
namespace http://www.ncbi.nlm.nih.gov/SNP/docsum
properties
content complex
children Method Taxonomy Strains Comment Citation
used by
element ExchangeSet
attributes
Name  Type  Use  Default  Fixed  Annotation
handle  xsd:string      
batch  xsd:string      
batchId  xsd:int      
batchType  derived by: xsd:string      
molType  derived by: xsd:string      
sampleSize  xsd:int      
population  xsd:string      
linkoutUrl  xsd:string      

element Assay/Method
diagram
namespace http://www.ncbi.nlm.nih.gov/SNP/docsum
properties
isRef 0
content complex
children Exception
attributes
Name  Type  Use  Default  Fixed  Annotation
name  xsd:string      
documentation
Submitter's method identifier
Id  xsd:string      
documentation
dbSNP method identifier

element Assay/Method/Exception
diagram
namespace http://www.ncbi.nlm.nih.gov/SNP/docsum
type xsd:string
properties
isRef 0
content simple
annotation
documentation
description of deviation from/addition to given method

element Assay/Taxonomy
diagram
namespace http://www.ncbi.nlm.nih.gov/SNP/docsum
properties
isRef 0
content complex
attributes
Name  Type  Use  Default  Fixed  Annotation
id  xsd:int  required
documentation
NCBI taxonomy ID for variation
organism  xsd:string      

element Assay/Strains
diagram
namespace http://www.ncbi.nlm.nih.gov/SNP/docsum
type xsd:string
properties
isRef 0
minOcc 0
maxOcc unbounded
content simple

element Assay/Comment
diagram
namespace http://www.ncbi.nlm.nih.gov/SNP/docsum
type xsd:string
properties
isRef 0
minOcc 0
maxOcc 1
content simple

element Assay/Citation
diagram
namespace http://www.ncbi.nlm.nih.gov/SNP/docsum
type xsd:string
properties
isRef 0
minOcc 0
maxOcc unbounded
content simple

element Assembly
diagram
namespace http://www.ncbi.nlm.nih.gov/SNP/docsum
properties
content complex
children Component SnpStat
used by
element Rs
attributes
Name  Type  Use  Default  Fixed  Annotation
dbSnpBuild  xsd:int  required
documentation
dbSNP build number defining the rsid set aligned to this assembly
genomeBuild  xsd:string  required
documentation
assembly build number with possible 'subbuild' version numbers to reflect updates in gene annotation (human e.g. 34_3, 35_1, 36_1)
groupLabel  xsd:string  optional
documentation
High-level classification of the assembly to distinguish reference projects from alternate solutions. GroupLabel field from organism/build-specific ContigInfo tables. "reference" is occasionally used as the preferred assembly; standards will converge as additional organism genome projects are finished. Note that some organism assembly names include extended characters like '~' and '/' that may be incompatible with OS filename conventions.
assemblySource  xsd:string  optional
documentation
Name of the group(s) or organization(s) that generated the assembly
current  xsd:boolean      
documentation
Marks the current genomic assembly
reference  xsd:boolean      
annotation
documentation
A collection of genome sequence records (curated gene regions (NG's), contigs (NW/NT's) and chromosomes (NC/AC's)) produced by a genome sequence project. Structure is populated from ContigInfo tables.
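
The genomeBuild documentation gives values such as 34_3, 35_1 and 36_1, where the part after the underscore is a 'subbuild' number. A minimal sketch of splitting such a value is shown below. It assumes the underscore layout of those examples holds in general, which the schema itself does not enforce, and the function name split_genome_build is hypothetical. The parts are kept as strings because nothing in the schema guarantees they are numeric.

```python
def split_genome_build(genome_build):
    """Split a genomeBuild value such as '36_1' into (build, subbuild).

    Assumes the 'build_subbuild' layout seen in the documented examples;
    subbuild is None when the value contains no underscore.
    """
    build, sep, subbuild = genome_build.partition("_")
    return build, (subbuild if sep else None)

print(split_genome_build("36_1"))  # prints ('36', '1')
```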

element Assembly/SnpStat
diagram
namespace http://www.ncbi.nlm.nih.gov/SNP/docsum
properties
isRef 0
content complex
attributes
Name  Type  Use  Default  Fixed  Annotation
mapWeight  derived by: xsd:string  required
documentation
summary measure of placement precision in the assembly
chromCount  xsd:int  optional
documentation
number of distinct chromosomes in the mapset
placedContigCount  xsd:int  optional
documentation
number of distinct contigs [ gi | accession[.version] ] in the mapset
unplacedContigCount  xsd:int  optional
documentation
number of sequence positions to a contig with unknown chromosomal assignment
seqlocCount  xsd:int  optional
documentation
total number of sequence positions in the mapset
hapCount  xsd:int  optional
documentation
Number of hits to alternative genomic haplotypes (e.g. HLA DR region, KIR, or pseudo-autosomal regions like PAR) within the assembly mapset. Note that positions on haplotypes defined in other assemblies (a different assembly_group_label value) will not be counted in this value.

element BaseURL
diagram
namespace http://www.ncbi.nlm.nih.gov/SNP/docsum
type extension of xsd:string
properties
content complex
used by
element ExchangeSet
attributes
Name  Type  Use  Default  Fixed  Annotation
urlId  xsd:int      
documentation
Resource identifier from dbSNP_main.baseURL.
resourceName  xsd:string      
documentation
Name of linked resource
resourceId  xsd:string      
documentation
identifier expected by resource for URL
annotation
documentation
URL value from dbSNP_main.BaseURL links table. Attributes provide context information and the URL id that is referenced within individual refSNP objects.

element Component
diagram
namespace http://www.ncbi.nlm.nih.gov/SNP/docsum
properties
content complex
children MapLoc
used by
element Assembly
attributes
Name  Type  Use  Default  Fixed  Annotation
componentType  derived by: xsd:string      
documentation
type of component: chromosome, contig, gene_region, etc.
ctgId  xsd:int      
documentation
dbSNP contig_id used to join on contig hit / mapset data to these assembly properties
accession  xsd:string      
documentation
Accession[.version] for the sequence component
name  xsd:string      
documentation
contig name defined as either a submitter local id, element of a whole genome assembly set, or internal NCBI local id
chromosome  xsd:string      
documentation
Organism appropriate chromosome tag, 'Un' reserved for default case of unplaced components
start  xsd:int      
documentation
component starting position on the chromosome (base 0 inclusive)
end  xsd:int      
documentation
component ending position on the chromosome (base 0 inclusive)
orientation  derived by: xsd:string      
documentation
orientation of this component to chromosome, forward (fwd) = 0, reverse (rev) = 1, unknown = NULL in ContigInfo.orient.
gi  xsd:string      
documentation
NCBI gi for component sequence (equivalent to accession.version) for nucleotide sequence.
groupTerm  xsd:string      
documentation
Identifier label for the genome assembly that defines the contigs in this mapset and their placement within the organism genome.
contigLabel  xsd:string      
documentation
Display label for component
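
The start and end attributes are documented as base 0 and inclusive at both ends, so the length of a component is end - start + 1 rather than end - start. A small sketch of that arithmetic follows; the function names are hypothetical and the coordinates in the usage line are made up.

```python
def component_length(start, end):
    """Length in bases of a component with 0-based, inclusive start and end."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1

def to_one_based(start, end):
    """Convert a 0-based inclusive interval to 1-based inclusive coordinates."""
    return start + 1, end + 1

print(component_length(0, 99))  # prints 100
print(to_one_based(0, 99))      # prints (1, 100)
```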

element ExchangeSet
diagram
namespace http://www.ncbi.nlm.nih.gov/SNP/docsum
properties
content complex
children SourceDatabase Rs Assay Query Summary BaseURL
attributes
Name  Type  Use  Default  Fixed  Annotation
setType  xsd:string      
documentation
set-type: full dump; from query; single refSNP
setDepth  xsd:string  optional
documentation
content depth: brief XML (only refSNP properties and summary subSNP element content); full XML (full refSNP, full subSNP content; all flanking sequences)
specVersion  xsd:string      
documentation
version number of docsum.asn/docsum.dtd specification
dbSnpBuild  xsd:int      
documentation
build number of database for this export
generated  xsd:string      
documentation
Generated date
annotation
documentation
Set of dbSNP refSNP docsums

element ExchangeSet/SourceDatabase
diagram
namespace http://www.ncbi.nlm.nih.gov/SNP/docsum
properties
isRef 0
content complex
attributes
Name  Type  Use  Default  Fixed  Annotation
taxId  xsd:int  required
documentation
NCBI taxonomy ID for variation
organism  xsd:string  required
documentation
common name for species used as part of database name.
dbSnpOrgAbbr  xsd:string      
documentation
organism abbreviation used in dbSNP.
gpipeOrgAbbr  xsd:string      
documentation
organism abbreviation used within NCBI genome pipeline data dumps.

element ExchangeSet/Query
diagram
namespace http://www.ncbi.nlm.nih.gov/SNP/docsum
properties
isRef 0
minOcc 0
maxOcc 1
content complex
attributes
Name  Type  Use  Default  Fixed  Annotation
date  xsd:string      
documentation
yyyy-mm-dd
string  xsd:string      
documentation
Query terms or search constraints

element ExchangeSet/Summary
diagram
namespace http://www.ncbi.nlm.nih.gov/SNP/docsum
properties
isRef 0
content complex
attributes
Name  Type  Use  Default  Fixed  Annotation
numRsIds  xsd:integer  optional
documentation
Total number of refsnp-ids in this exchange set
totalSeqLength  xsd:integer  optional
documentation
Total length of exemplar flanking sequences
numContigHits  xsd:integer  optional
documentation
Total number of contig locations from SNPContigLoc
numGeneHits  xsd:integer  optional
documentation
Total number of locus ids from SNPContigLocusId
numGiHits  xsd:integer  optional
documentation
Total number of gi hits from MapLink
num3dStructs  xsd:integer  optional
documentation
Total number of 3D structures from SNP3D
numAlleleFreqs  xsd:integer  optional
documentation
Total number of allele frequencies from SubPopAllele
numStsHits  xsd:integer  optional
documentation
Total number of STS hits from SnpInSts
numUnigeneCids  xsd:integer  optional
documentation
Total number of unigene cluster ids from UnigeneSnp

element FxnSet
diagram
namespace http://www.ncbi.nlm.nih.gov/SNP/docsum
properties
content complex
used by
element MapLoc
attributes
Name  Type  Use  Default  Fixed  Annotation
geneId  xsd:int      
documentation
gene-id of gene as aligned to contig
symbol  xsd:string      
documentation
symbol (official if present in Entrez Gene) of gene
mrnaAcc  xsd:string      
documentation
mRNA accession if variation in transcript
mrnaVer  xsd:int      
documentation
mRNA sequence version if variation is in transcript
protAcc  xsd:string      
documentation
protein accession if variation in protein
protVer  xsd:int      
documentation
protein version if variation is in protein
fxnClass  derived by: xsd:string      
readingFrame  xsd:int      
allele  xsd:string      
documentation
variation allele: * suffix indicates allele of contig at this location
residue  xsd:string      
documentation
translated amino acid residue for allele
aaPosition  xsd:int      
documentation
position of the variant residue in peptide sequence
annotation
documentation
functional relationship of SNP (and possibly alleles) to genes at contig location as defined in organism-specific bxxx_SNPContigLocusId_xxx tables.
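
The allele attribute uses a trailing * to mark the allele that matches the contig at this location. A minimal sketch of separating the allele from that marker is given below; the function name parse_fxn_allele is hypothetical and the input values are illustrative.

```python
def parse_fxn_allele(allele):
    """Return (allele, is_contig_allele) for an FxnSet allele attribute value.

    A trailing '*' marks the allele of the contig at this location.
    """
    if allele.endswith("*"):
        return allele[:-1], True
    return allele, False

print(parse_fxn_allele("A*"))  # prints ('A', True)
print(parse_fxn_allele("G"))   # prints ('G', False)
```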

element MapLoc
diagram
namespace http://www.ncbi.nlm.nih.gov/SNP/docsum
properties
content complex
children FxnSet
used by
elements Component PrimarySequence
attributes
Name  Type  Use  Default  Fixed  Annotation
asnFrom  xsd:integer  required
documentation
beginning of variation as feature on contig
asnTo  xsd:integer  required
documentation
end position of variation as feature on contig
locType  derived by: xsd:string  required
documentation
defines the seq-loc symbol if asn_from != asn_to
alnQuality  xsd:double  optional
documentation
alignment quality
orient  derived by: xsd:string  optional
documentation
orientation of refSNP sequence to contig sequence
physMapInt  xsd:int      
documentation
chromosome position as integer for sorting
leftFlankNeighborPos  xsd:int      
documentation
nearest aligned position in 5' flanking sequence of snp
rightFlankNeighborPos  xsd:int      
documentation
nearest aligned position in 3' flanking sequence of snp
leftContigNeighborPos  xsd:int      
documentation
nearest aligned position in 5' contig alignment of snp
rightContigNeighborPos  xsd:int      
documentation
nearest aligned position in 3' contig alignment of snp
numberOfMismatches  xsd:int      
documentation
number of Mismatched positions in this alignment
numberOfDeletions  xsd:int      
documentation
number of deletions in this alignment
numberOfInsertions  xsd:int      
documentation
number of insertions in this alignment
annotation
documentation
Position of a single hit of a variation on a contig

element PrimarySequence
diagram
namespace http://www.ncbi.nlm.nih.gov/SNP/docsum
properties
content complex
children MapLoc
used by
element Rs
attributes
Name  Type  Use  Default  Fixed  Annotation
dbSnpBuild  xsd:int  required
gi  xsd:int  required
source  derived by: xsd:string      
accession  xsd:string      

element Rs
diagram
namespace http://www.ncbi.nlm.nih.gov/SNP/docsum
properties
content complex
children Het Validation Create Update Sequence Ss Assembly PrimarySequence RsStruct RsLinkout MergeHistory
used by
element ExchangeSet
attributes
Name  Type  Use  Default  Fixed  Annotation
rsId  xsd:int  required
documentation
refSNP (rs) number
snpClass  derived by: xsd:string  required
snpType  derived by: xsd:string  required
molType  derived by: xsd:string  required
validProbMin  xsd:integer  optional
documentation
minimum reported success rate of all submissions in cluster
validProbMax  xsd:integer  optional
documentation
maximum reported success rate of all submissions in cluster
genotype  xsd:boolean  optional
documentation
at least one genotype reported for this refSNP
annotation
documentation
defines the docsum structure for refSNP clusters, where a refSNP cluster (rs) is a grouping of individual dbSNP submissions that all refer to the same variation. The refsnp provides a single unified record for annotation of NCBI resources such as reference genome sequence.

element Rs/Het
diagram
namespace http://www.ncbi.nlm.nih.gov/SNP/docsum
properties
isRef 0
minOcc 0
maxOcc 1
content complex
attributes
Name  Type  Use  Default  Fixed  Annotation
type  derived by: xsd:string  required
documentation
Est=Estimated average het from allele frequencies, Obs=Observed from genotype data
value  xsd:float  required
documentation
Heterozygosity
stdError  xsd:float      
documentation
Standard error of Het estimate

element Rs/Validation
diagram
namespace http://www.ncbi.nlm.nih.gov/SNP/docsum
properties
isRef 0
content complex
children otherPopBatchId twoHit2AlleleBatchId
attributes
Name  Type  Use  Default  Fixed  Annotation
byCluster  xsd:boolean      
documentation
at least one subsnp in cluster has frequency data submitted
byFrequency  xsd:boolean      
documentation
cluster has 2+ submissions, with 1+ submissions assayed with a non-computational method
byOtherPop  xsd:boolean      
by2Hit2Allele  xsd:boolean      
documentation
cluster has 2+ submissions, with 1+ submissions assayed with a non-computational method
byHapMap  xsd:boolean      
documentation
TBD

element Rs/Validation/otherPopBatchId
diagram
namespace http://www.ncbi.nlm.nih.gov/SNP/docsum
type xsd:int
properties
isRef 0
minOcc 0
maxOcc unbounded
content simple
annotation
documentation
dbSNP batch-id's for other pop snp validation data.

element Rs/Validation/twoHit2AlleleBatchId
diagram
namespace http://www.ncbi.nlm.nih.gov/SNP/docsum
type xsd:int
properties
isRef 0
minOcc 0
maxOcc unbounded
content simple
annotation
documentation
dbSNP batch-id's for double-hit snp validation data. Use batch-id to get methods, etc.

element Rs/Create
diagram
namespace http://www.ncbi.nlm.nih.gov/SNP/docsum
properties
isRef 0
content complex
attributes
Name  Type  Use  Default  Fixed  Annotation
build  xsd:int      
documentation
build number when the cluster was created
date  xsd:string      
documentation
yyyy-mm-dd
annotation
documentation
date the refsnp cluster was instantiated

element Rs/Update
diagram
namespace http://www.ncbi.nlm.nih.gov/SNP/docsum
properties
isRef 0
minOcc 0
maxOcc 1
content complex
attributes
Name  Type  Use  Default  Fixed  Annotation
build  xsd:int      
documentation
build number when the cluster was updated
date  xsd:string      
documentation
yyyy-mm-dd
annotation
documentation
most recent date the cluster was updated (member added or deleted)
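
The date attributes of Create and Update are typed as xsd:string and documented only by the pattern yyyy-mm-dd, so a consumer has to parse them itself. A minimal sketch follows, assuming values match that pattern exactly; the function name parse_docsum_date and the sample date are hypothetical.

```python
from datetime import date, datetime

def parse_docsum_date(value):
    """Parse a yyyy-mm-dd attribute value into a datetime.date.

    Raises ValueError if the value does not match the documented pattern.
    """
    return datetime.strptime(value, "%Y-%m-%d").date()

print(parse_docsum_date("2005-06-01"))  # prints 2005-06-01
```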

element Rs/Sequence
diagram
namespace http://www.ncbi.nlm.nih.gov/SNP/docsum
properties
isRef 0
content complex
children Seq5 Observed Seq3
attributes
Name  Type  Use  Default  Fixed  Annotation
exemplarSs  xsd:int  required
documentation
dbSNP ss# selected as source of refSNP flanking sequence, ss# part of ss-list below

element Rs/Sequence/Seq5
diagram
namespace http://www.ncbi.nlm.nih.gov/SNP/docsum
type xsd:string
properties
isRef 0
minOcc 0
maxOcc 1
content simple
annotation
documentation
5' sequence that flanks the variation

element Rs/Sequence/Observed
diagram
namespace http://www.ncbi.nlm.nih.gov/SNP/docsum
type xsd:string
properties
isRef 0
content simple
annotation
documentation
list of all nucleotide alleles observed in ss-list members, correcting for reverse complementation of members reported in reverse orientation
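
The correction for reverse complementation mentioned above can be sketched as follows. This is an illustration only: the schema types Observed as a plain xsd:string, so the '/' separator between alleles and the use of '-' for a deletion are assumptions, and the function names are hypothetical.

```python
# Complement table for unambiguous nucleotides; other characters (such as '-')
# pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the reverse complement of a nucleotide string."""
    return seq.translate(_COMPLEMENT)[::-1]

def flip_observed(observed, sep="/"):
    """Reverse-complement each allele in an allele list such as 'A/G'.

    The separator is an assumption; it is not defined by the schema.
    """
    return sep.join(reverse_complement(a) for a in observed.split(sep))

print(flip_observed("A/G"))   # prints T/C
print(flip_observed("-/AC"))  # prints -/GT
```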

element Rs/Sequence/Seq3
diagram
namespace http://www.ncbi.nlm.nih.gov/SNP/docsum
type xsd:string
properties
isRef 0
minOcc 0
maxOcc 1
content simple
annotation
documentation
3' sequence that flanks the variation

element Rs/MergeHistory
diagram
namespace http://www.ncbi.nlm.nih.gov/SNP/docsum
properties
isRef 0
minOcc 0
maxOcc unbounded
content complex
attributes
Name  Type  Use  Default  Fixed  Annotation
rsId  xsd:int  required
documentation
previously issued rs id whose member assays have now been merged
buildId  xsd:int      
documentation
build id when rs id was merged into parent rs
orientFlip  xsd:boolean      
documentation
TRUE if strand of rs id is reverse to parent object's current strand

element RsLinkout
diagram
namespace http://www.ncbi.nlm.nih.gov/SNP/docsum
properties
content complex
used by
element Rs
attributes
Name  Type  Use  Default  Fixed  Annotation
resourceId  xsd:string  required
documentation
BaseURLList.url_id
linkValue  xsd:string  required
documentation
value to append to ResourceURL.base-url for complete link
annotation
documentation
link data for another resource
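
A complete link is formed by appending linkValue to a base URL. The sketch below assumes that the resourceId of an RsLinkout corresponds to the urlId of a BaseURL element in the same ExchangeSet, which is how the url_id reference above reads but is not enforced by the schema. The function name, the mapping, and the example URL are hypothetical.

```python
def build_linkout(base_urls, resource_id, link_value):
    """Return the full link for an RsLinkout.

    base_urls maps a BaseURL urlId (kept as a string, since resourceId is
    typed xsd:string) to the text content of that BaseURL element.
    """
    return base_urls[resource_id] + link_value

# Hypothetical base URL table and linkout values.
urls = {"1": "http://example.org/lookup?id="}
print(build_linkout(urls, "1", "rs3"))  # prints http://example.org/lookup?id=rs3
```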

element RsStruct
diagram
namespace http://www.ncbi.nlm.nih.gov/SNP/docsum
properties
content complex
used by
element Rs
attributes
Name  Type  Use  Default  Fixed  Annotation
protAcc  xsd:string      
documentation
accession of the protein with variation
protGi  xsd:int      
documentation
GI of the protein with variation
protLoc  xsd:int      
documentation
position of the residue for the protein GI
protResidue  xsd:string      
documentation
residue specified for protein at prot-loc location
rsResidue  xsd:string      
documentation
alternative residue specified by variation sequence
structGi  xsd:int      
documentation
GI of the structure neighbor
structLoc  xsd:int      
documentation
position of the residue for the structure GI
structResidue  xsd:string      
documentation
residue specified for protein at struct-loc location
annotation
documentation
structure information for SNP

element Ss
diagram
namespace http://www.ncbi.nlm.nih.gov/SNP/docsum
properties
content complex
children Sequence
used by
element Rs
attributes
Name  Type  Use  Default  Fixed  Annotation
ssId  xsd:int  required
documentation
dbSNP accession number for submission
handle  xsd:string  required
documentation
Tag for the submitting laboratory
batchId  xsd:int  required
documentation
dbSNP number for batch submission
locSnpId  xsd:string      
documentation
submission (ss#) submitter ID
subSnpClass  derived by: xsd:string      
documentation
SubSNP classification by type of variation
orient  derived by: xsd:string      
documentation
orientation of refsnp cluster members to refsnp cluster sequence
strand  derived by: xsd:string      
documentation
strand is defined as TOP/BOTTOM by nature of flanking nucleotide sequence
molType  derived by: xsd:string      
documentation
moltype from Batch table
buildId  xsd:int      
documentation
dbSNP build number when ss# was added to a refSNP (rs#) cluster
methodClass  derived by: xsd:string      
documentation
class of method used to assay for the variation
validated  derived by: xsd:string      
linkoutUrl  xsd:string      
documentation
append loc-snp-id to this base URL to construct a pointer to submitter data.
annotation
documentation
data for an individual submission to dbSNP

element Ss/Sequence
diagram
namespace http://www.ncbi.nlm.nih.gov/SNP/docsum
properties
isRef 0
content complex
children Seq5 Observed Seq3

element Ss/Sequence/Seq5
diagram
namespace http://www.ncbi.nlm.nih.gov/SNP/docsum
type xsd:string
properties
isRef 0
minOcc 0
maxOcc 1
content simple
annotation
documentation
5' sequence that flanks the variation

element Ss/Sequence/Observed
diagram
namespace http://www.ncbi.nlm.nih.gov/SNP/docsum
type xsd:string
properties
isRef 0
content simple
annotation
documentation
list of all nucleotide alleles observed in ss-list members, correcting for reverse complementation of members reported in reverse orientation

element Ss/Sequence/Seq3
diagram
namespace http://www.ncbi.nlm.nih.gov/SNP/docsum
type xsd:string
properties
isRef 0
minOcc 0
maxOcc 1
content simple
annotation
documentation
3' sequence that flanks the variation


XML Schema documentation generated by
XMLSpy Schema Editor http://www.altova.com/xmlspy