A single nucleotide polymorphism object. These are unique SNP clusters mapping
to the human genome assembly, corresponding to dbSNP rs# records which are
submitted to the DCC by the dbSNP group. The XML is based largely on the dbSNP
reference SNP records, but significant parts of dbSNP's model are omitted because
they are not directly relevant to this project (e.g. dbSNP internal tracking) and/or are
redundant and can be fetched on demand if needed. Main child elements and attributes:
-lsid: The Life Science Identifier assigned to the SNP by NCBI's dbSNP, the authority
on the reference SNP records selected for genotyping. See the LSID details page
(/docs/lsid_details.html) for more on LSIDs and their role in the project. Note that refSNP
records have version numbers corresponding to dbSNP builds; these version numbers will be
tracked in the DCC database.
-snp_class: The category or class assigned to the SNP based on analysis done by the
dbSNP group. See the SNP categories page (/docs/snp_categories.html) for details on each
category. The class signifies how desirable it is to genotype this particular SNP as opposed
to a SNP in another category (e.g. a group would much rather genotype a 'verified' SNP than
a 'bac-overlap' one). Note that only the 'highest' category is assigned to a SNP, even if
it falls in more than one category.
-sequence: 5' and 3' sequences flanking the variation and the variation itself. The flank
sequences have standard IUPAC ambiguity codes embedded in them to signify the presence
of neighbor SNPs, if any. These SNPs, plus any others whose presence cannot easily be
indicated in the sequence, appear in the neighbor_snps list (see below).
-genomic_locations: One or more sets of coordinates on the NCBI reference genome assembly.
The notation for each set closely follows the attributes NCBI uses for a SNP in its FTP dataset
releases: the kinds of location that can appear ('exact', 'range', etc.), strandedness, and so on.
-neighbor_snps: A list of LSIDs that point to other SNP records lying in either flank of the
current SNP record. These can be either SNPs selected for the HapMap project or other SNPs
(category 'non-validated'). The purpose of the neighbor list is to enable groups to design assays
from the SNPs while taking into account other SNPs that may reside in the flanks (i.e. the 'neighbors').
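
As an illustration of how the elements above fit together, the following is a minimal
Python sketch, not part of the schema itself. The SnpRecord class, its field names for the
two flanks and the variation, the ambiguous_positions helper, and the LSID string are all
hypothetical, chosen only for this example. The only external fact relied on is the standard
IUPAC nucleotide ambiguity table. The helper scans a flank sequence for ambiguity codes,
which, per the description of the sequence element, mark the presence of neighbor SNPs.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Standard IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC_AMBIGUITY = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


@dataclass
class SnpRecord:
    """Hypothetical in-memory view of one SNP object (field layout is assumed)."""
    lsid: str                     # identifier assigned by dbSNP
    snp_class: str                # the single 'highest' category, e.g. 'verified'
    five_prime_flank: str         # assumed name for the 5' flank sequence
    variation: str                # assumed name for the variation itself
    three_prime_flank: str        # assumed name for the 3' flank sequence
    neighbor_snps: List[str] = field(default_factory=list)  # LSIDs of neighbors


def ambiguous_positions(flank: str) -> List[Tuple[int, str, str]]:
    """Return (0-based offset, code, possible bases) for each ambiguity code."""
    return [
        (offset, base, IUPAC_AMBIGUITY[base])
        for offset, base in enumerate(flank.upper())
        if base in IUPAC_AMBIGUITY
    ]


# Usage with made-up data; the LSID below is a placeholder, not a real record.
record = SnpRecord(
    lsid="urn:lsid:example.org:snp:placeholder",
    snp_class="verified",
    five_prime_flank="ACGRTTYA",
    variation="A/G",
    three_prime_flank="TTGACC",
)
hits = ambiguous_positions(record.five_prime_flank)
```

An assay designer could compare the number of hits in each flank against the length of the
neighbor_snps list; since some neighbors cannot easily be indicated in the sequence, the
list may legitimately be longer than the number of ambiguity codes found.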
Minimal snp container