Chromosome coordinates for this commercial array were determined independently using Sanger and NCBI. About 500 of the 5,000 spots are not used in our final calculations. In both datasets, about 170 spots, the majority of which were the same spots, were flagged during the data extraction process. About 330 further spots are either controls or have names that cannot be matched in the pombe gene database. Because pombe genes contain a significant number of introns, the pombe gene database gives coordinates for each exon. However, the database does not distinguish between different exons in its nomenclature. Consequently, a given ORF can have multiple coordinate entries sharing the same name. For example, the following ORF is listed twice in the pombe genome database with different coordinates (we calculated the midpoint ourselves).
Our program uses the first entry (39.6395) as the coordinate for a given ORF. In hindsight, we should have taken the average of all the exons. As it stands, our coordinates contain a 5' bias for every ORF. However, we assessed the difference between the midpoint of the first exon and the midpoint of the whole ORF: the overwhelming majority of ORFs differ by less than 2 kb, with only 34 out of 4,500 differing by more than 2 kb.
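The comparison between the first-entry midpoint and the whole-ORF midpoint can be illustrated with a minimal sketch. This is not the program used for the analysis. The record layout (ORF name, start, end, in base pairs), the function names, and the example records are all hypothetical and chosen only to show the calculation. The whole-ORF midpoint is assumed here to be the centre of the span from the lowest to the highest exon coordinate.

```python
from collections import defaultdict

def group_exons(entries):
    """Group (name, start, end) exon records by ORF name, keeping input order."""
    exons = defaultdict(list)
    for name, start, end in entries:
        exons[name].append((start, end))
    return exons

def first_entry_midpoint(exon_list):
    """Midpoint of the first listed entry only (the approach described above)."""
    start, end = exon_list[0]
    return (start + end) / 2

def whole_orf_midpoint(exon_list):
    """Midpoint of the span covering every entry listed for the ORF (assumed definition)."""
    lo = min(min(s, e) for s, e in exon_list)
    hi = max(max(s, e) for s, e in exon_list)
    return (lo + hi) / 2

def orfs_exceeding(entries, threshold_bp=2000):
    """Names of ORFs whose two midpoints differ by more than threshold_bp."""
    flagged = []
    for name, exon_list in group_exons(entries).items():
        diff = abs(first_entry_midpoint(exon_list) - whole_orf_midpoint(exon_list))
        if diff > threshold_bp:
            flagged.append(name)
    return flagged

# Made-up records for illustration only; not taken from any database.
entries = [
    ("orfA", 1000, 1200),
    ("orfA", 1400, 2000),
    ("orfB", 10000, 10100),
    ("orfB", 15000, 16000),
    ("orfC", 5000, 6000),
]

print(orfs_exceeding(entries))
```

In this toy input only the ORF with a long intron between its two entries exceeds the 2 kb threshold. Single-entry ORFs always give a difference of zero.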