GEOarchive submission instructions
Starting in January 2025, GEO will no longer accept SAGE submissions. Please contact GEO if you have any questions.
- GEOarchive format
- How to submit
- GEOarchive templates and examples
  - Microarray
    - Affymetrix
    - Agilent
    - Nimblegen
    - Illumina
    - Generic
    - Platform only
  - High-throughput sequencing
  - Other data types
    - NanoString (nCounter: RCC raw data files)
    - Xenium
    - MERFISH
    - RT-PCR
    - Traditional SAGE
- Notes for Microsoft Excel users
GEOarchive format
GEOarchive is a flexible, spreadsheet-based submission format useful for batch deposit of experiments. GEOarchive submissions can be created with any spreadsheet software, most commonly Microsoft Excel.
A GEOarchive submission consists of the following parts:
Part | Description |
---|---|
Metadata spreadsheet | 'Metadata' refers to descriptive information and protocols for the overall experiment and for individual Samples. Supply this information by completing all fields of the appropriate metadata spreadsheet template, which can be downloaded from the GEOarchive templates and examples section below. |
Matrix table | The matrix table is a spreadsheet containing the final, normalized values that are comparable across rows and Samples, preferably processed as described in any accompanying manuscript. Supply a complete data matrix, not a summary subset. Additional data columns may be included in the table, for example, Affymetrix Detection calls and P-values, or background or flag columns. See the Affymetrix template for an example. |
Raw data files | In addition to the normalized data provided in the matrix table, submitters are required to provide raw data, usually in the form of supplementary raw data files. This facilitates unambiguous interpretation of the data and potential verification of the conclusions, as described in the MIAME and MINSEQE standards. Affymetrix submissions must include CEL files. Non-Affymetrix GEOarchive submissions should include the original software-generated scan quantification files, for example, GenePix GPR files. Next-generation sequence submissions must include files containing reads and quality scores. |
Platform | If your experiments were performed using a commercial array (e.g., Affymetrix GeneChip) or another array already deposited in GEO, please use the FIND PLATFORM tool to find its GEO accession number (GPLxxxx) and enter it in the 'platform' column of the SAMPLES section of the metadata spreadsheet. If your array does not already exist in GEO, please include a PLATFORM section in your metadata spreadsheet and Platform annotation columns in your matrix table. The Platform data must include meaningful, trackable sequence identifiers (e.g., GenBank/RefSeq accessions, locus tags, clone IDs, oligo sequences, chromosome locations; see the Platform content guidelines for the full list). References to in-house databases or top BLAST hits are not sufficient. Platform submission is not necessary for SAGE or next-generation sequence submissions. |
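Before bundling, it can be worth checking the matrix table for two of the problems described above: Samples with no value column, and a summary subset instead of the complete matrix. The sketch below is a minimal Python example under stated assumptions: the matrix table has been exported as tab-delimited text, its first column is named `ID_REF`, its Sample columns are named exactly as in the metadata spreadsheet, and the expected row count (for example, the number of features on the Platform) is known. The function `check_matrix` and the sample data are hypothetical and not part of any GEO tool.

```python
import csv
import io

# Hypothetical tab-delimited export of a matrix table. Assumption: the first
# column is "ID_REF" and the remaining columns are named after the Samples.
MATRIX_TSV = (
    "ID_REF\tSample_A\tSample_B\n"
    "probe_1\t7.81\t8.02\n"
    "probe_2\t5.13\t5.40\n"
    "probe_3\t9.44\t9.10\n"
)


def check_matrix(matrix_text, sample_names, expected_rows, id_column="ID_REF"):
    """Return a list of problems found in a tab-delimited matrix table."""
    reader = csv.reader(io.StringIO(matrix_text), delimiter="\t")
    header = next(reader, [])
    rows = [row for row in reader if row]  # skip blank lines
    problems = []
    if not header or header[0] != id_column:
        problems.append(f"first column should be {id_column!r}")
    # Every Sample described in the metadata spreadsheet needs a value column.
    for name in sample_names:
        if name not in header:
            problems.append(f"no column for Sample {name!r}")
    # Assumption: identifiers in the first column should be unique.
    ids = [row[0] for row in rows]
    if len(ids) != len(set(ids)):
        problems.append("duplicate identifiers in the first column")
    # A summary subset has fewer rows than the full set of features.
    if len(rows) < expected_rows:
        problems.append(f"only {len(rows)} of {expected_rows} rows present")
    return problems


if __name__ == "__main__":
    for problem in check_matrix(MATRIX_TSV, ["Sample_A", "Sample_B", "Sample_C"], expected_rows=5):
        print(problem)
```

Run against the hypothetical data, the demo reports the missing `Sample_C` column and that only 3 of the 5 expected rows are present.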
How to submit
Bundle all parts (the Excel file containing the metadata and matrix spreadsheets, plus the raw data files) into a single .zip, .rar, or .tar archive using a program such as WinZip. There are two options for transferring the resulting archive to GEO:
- Use the 'Submit microarray or additional files to GEO' web form.
- Use FTP for large submissions; see GEO's detailed FTP instructions.
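The bundling step can also be scripted. The following is a minimal Python sketch using only the standard library, assuming all submission files are available locally; `bundle_submission` and the file names in the demo are hypothetical. Python's standard library cannot write .rar archives, so only the .zip and .tar options are shown.

```python
import tarfile
import tempfile
import zipfile
from pathlib import Path


def bundle_submission(archive_path, parts):
    """Bundle the submission parts into one .zip or .tar archive."""
    archive_path = Path(archive_path)
    parts = [Path(p) for p in parts]
    missing = [str(p) for p in parts if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"missing submission files: {missing}")
    if archive_path.suffix == ".zip":
        with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            for part in parts:
                zf.write(part, arcname=part.name)  # store bare file names, no local folders
    elif archive_path.suffix == ".tar":
        with tarfile.open(archive_path, "w") as tf:
            for part in parts:
                tf.add(part, arcname=part.name)
    else:
        raise ValueError("archive name must end in .zip or .tar")
    return archive_path


if __name__ == "__main__":
    # Demo with placeholder files; a real submission would list the Excel
    # workbook plus every raw data file (e.g. CEL or GPR files).
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        names = ["geoarchive_metadata_matrix.xlsx", "sample1.CEL", "sample2.CEL"]
        for name in names:
            (tmp / name).write_bytes(b"placeholder")
        archive = bundle_submission(tmp / "submission.zip", [tmp / n for n in names])
        with zipfile.ZipFile(archive) as zf:
            print(zf.namelist())
```

Storing bare file names keeps the archive flat, which is simple to inspect but means two files with the same name in different local folders would collide; rename such files before bundling.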
GEOarchive templates and examples
The first step in creating your GEOarchive submission is to download the appropriate template (Excel spreadsheet) from the list below. Each Excel file consists of several worksheets, including a metadata template and examples of metadata and matrix tables. Click the tabs at the bottom of the worksheet window to switch between worksheets. Hover the mouse pointer over field names in the templates to view content guidelines.
Microarray
For the following microarray vendors, please download templates from the vendor-specific instructions pages:
- Affymetrix
- Agilent
- Nimblegen
- Illumina
For microarrays not from the vendors above, please use a 'Generic' template. For generic microarray submissions where the Platform is already deposited in GEO, please download the most appropriate template:
- Generic single channel submission template
- Generic dual channel submission template
- Generic merged dye-swap submission template
- Generic tiling ChIP-chip submission template
For generic microarray submissions where the Platform is not deposited in GEO, please download the most appropriate template:
- Generic single channel submission template, including Platform
- Generic dual channel submission template, including Platform
- Generic merged dye-swap submission template, including Platform
- Generic tiling ChIP-chip submission template, including Platform
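The template choice above comes down to three questions: is the array from one of the listed vendors, which experimental design applies, and is the Platform already in GEO. The minimal Python sketch below encodes that decision. The function name and the design keys (`single`, `dual`, `dye_swap`, `tiling`) are hypothetical labels; the returned template names are the ones listed above.

```python
VENDORS_WITH_OWN_TEMPLATES = {"affymetrix", "agilent", "nimblegen", "illumina"}

# Hypothetical design keys mapped to the generic template names listed above.
GENERIC_TEMPLATES = {
    "single": "Generic single channel submission template",
    "dual": "Generic dual channel submission template",
    "dye_swap": "Generic merged dye-swap submission template",
    "tiling": "Generic tiling ChIP-chip submission template",
}


def choose_template(vendor, design, platform_in_geo):
    """Return the name of the template to download for a microarray study."""
    if vendor.lower() in VENDORS_WITH_OWN_TEMPLATES:
        return f"{vendor} template from the vendor-specific instructions page"
    if design not in GENERIC_TEMPLATES:
        raise ValueError(f"unknown design {design!r}; expected one of {sorted(GENERIC_TEMPLATES)}")
    name = GENERIC_TEMPLATES[design]
    # Platforms not yet in GEO need the variant that includes a PLATFORM section.
    return name if platform_in_geo else f"{name}, including Platform"


if __name__ == "__main__":
    print(choose_template("Agilent", "dual", platform_in_geo=True))
    print(choose_template("in-house", "tiling", platform_in_geo=False))
```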
To submit only a Platform, please download the following template (this option is appropriate only if you have no hybridization or sequence data to deposit):
High-throughput sequencing
For high-throughput sequence submissions, please refer to GEO's full submission instructions for sequence data.
Other data types
For Xenium, MERFISH, NanoString nCounter®, or NanoString GeoMx® Digital Spatial Profiling studies with raw data in RCC format, please use one of the 'Generic single channel' templates as appropriate:
- Generic single channel submission template
- Generic single channel submission template, including Platform
For NanoString GeoMx® Digital Spatial Profiling studies with raw data in fastq format, please submit your study using GEO's high-throughput sequence data submission instructions.
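The routing rules for these data types can be summarized in a few lines of Python. This is a minimal sketch, not a GEO tool: the function name and technology labels are hypothetical, and it assumes that "as appropriate" means choosing the "including Platform" template when the Platform is not yet deposited in GEO, by analogy with the microarray templates.

```python
GENERIC_SINGLE = "Generic single channel submission template"


def route_other_data_type(technology, raw_format, platform_in_geo):
    """Return the submission route for the non-microarray data types above."""
    tech = technology.lower()
    fmt = raw_format.lower()
    # GeoMx studies with fastq raw data follow the sequence submission route.
    if tech == "geomx" and fmt == "fastq":
        return "high-throughput sequence submission instructions"
    covered = tech in {"xenium", "merfish"} or (tech in {"ncounter", "geomx"} and fmt == "rcc")
    if not covered:
        raise ValueError(f"no rule above covers {technology!r} with {raw_format!r} raw data")
    # Assumption: "as appropriate" is read as choosing the "including Platform"
    # variant when the Platform is not yet deposited in GEO.
    return GENERIC_SINGLE if platform_in_geo else f"{GENERIC_SINGLE}, including Platform"


if __name__ == "__main__":
    print(route_other_data_type("GeoMx", "fastq", platform_in_geo=False))
    print(route_other_data_type("nCounter", "RCC", platform_in_geo=True))
```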
For high-throughput RT-PCR submissions, please refer to GEO's full RT-PCR submission instructions.
For traditional SAGE submissions, please refer to GEO's full SAGE submission instructions.
Notes for Microsoft Excel users
The following notes draw attention to common Excel-related problems.
- Please be aware that Excel may automatically apply irreversible formatting to your data. According to Microsoft support:
  - If a number contains a slash mark (/) or hyphen (-), it may be converted to a date format.
  - If a number contains a colon (:), or is followed by a space and the letter A or P, it may be converted to a time format.
  - If a number contains the letter E (in uppercase or lowercase letters; for example, 10e5), or the number contains more characters than can be displayed based on the column width and font, the number may be converted to scientific notation, or exponential, format.
  - If a number contains leading zeros, the leading zeros are dropped.

  Certain clone identifiers, gene names, and plate coordinates are particularly susceptible to these issues. To avoid the problem, select the whole spreadsheet and apply Format -> Cells -> Number -> Text before pasting data into Excel (the default format is "General"). For more information, see Zeeberg et al., 2004.
- If you apply Format -> Cells -> Number -> Text as described above, very long data strings (e.g., sequence data) may be converted to hash (#) characters. If this occurs, switch these cells back to "General" format.
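To screen a column of identifiers before pasting it into Excel, a simple check can flag values that match the conversion rules above. The Python sketch below is a heuristic, not a reimplementation of Excel's parser: the regular expressions are assumptions derived from the rules quoted from Microsoft support, plus month-name gene symbols such as SEPT2 and MARCH1, the kind of case discussed by Zeeberg et al., 2004. It does not cover the column-width case, which depends on display settings. The function names are hypothetical.

```python
import re

# Heuristic patterns (assumptions) approximating the Excel conversions listed above.
RISK_PATTERNS = {
    "date": re.compile(r"^\d{1,4}[/-]\d{1,2}([/-]\d{1,4})?$"),
    "date (month name)": re.compile(
        r"^(jan|feb|mar|march|apr|may|jun|jul|aug|sep|sept|oct|nov|dec)-?\d{1,2}$",
        re.IGNORECASE,
    ),
    "time": re.compile(r"^\d{1,2}(:\d{1,2}){1,2}$|^\d{1,2} [ap]$", re.IGNORECASE),
    "scientific notation": re.compile(r"^\d+(\.\d+)?e\d+$", re.IGNORECASE),
    "leading zeros dropped": re.compile(r"^0\d+$"),
}


def excel_risks(value):
    """Return the kinds of automatic conversion Excel might apply to value."""
    text = value.strip()
    return [kind for kind, pattern in RISK_PATTERNS.items() if pattern.match(text)]


def flag_column(values):
    """Map each at-risk identifier to the conversions it may be exposed to."""
    flagged = {}
    for value in values:
        risks = excel_risks(value)
        if risks:
            flagged[value] = risks
    return flagged


if __name__ == "__main__":
    identifiers = ["TP53", "SEPT2", "MARCH1", "2310009E13", "00123", "12:30"]
    for identifier, risks in flag_column(identifiers).items():
        print(identifier, risks)
```

Flagged values are candidates for the Text-format workaround described above; an empty result does not guarantee that Excel will leave every value unchanged.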