Get Started in Athena

Setup

To get started with Athena, you will need an AWS account. Please follow the AWS-provided tutorial to become familiar with the service.

Please make sure you create your bucket for saving results in the us-east-1 region.
We recommend using AWS Glue to create the tables from the bucket. To create the tables, you need to provide the S3 location of the metadata. SRA provides metadata in two locations:

  1. Coronaviridae dataset in the AWS Public Dataset Program: s3://sra-pub-sars-cov2-metadata-us-east-1/
  2. Entire SRA metadata: s3://sra-pub-metadata-us-east-1

AWS Glue does have a small charge associated with it, based on the number of tables in the catalog and the amount of time it takes to run the crawler to find all the datasets. The crawler charge will generally be less than $1.
With the AWS Glue Data Catalog, you can store up to a million objects for free. An object in the AWS Glue Data Catalog is a table, table version, partition, or database. The first million access requests to the AWS Glue Data Catalog per month are free.
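As a rough illustration of the Glue route, the sketch below creates and starts a crawler over the full SRA metadata location using the AWS CLI. The crawler name, IAM role name, and database name are hypothetical placeholders, not values from NCBI; the role must already exist in your account with Glue and S3 read permissions. By default the script only prints the commands so you can review them first.

```shell
# Hypothetical names (assumptions): replace with your own crawler, IAM role, and database.
CRAWLER_NAME="${CRAWLER_NAME:-sra-metadata-crawler}"
GLUE_ROLE="${GLUE_ROLE:-AWSGlueServiceRole-sra}"
GLUE_DATABASE="${GLUE_DATABASE:-sra}"
# S3 location of the entire SRA metadata, as listed above.
S3_PATH="${S3_PATH:-s3://sra-pub-metadata-us-east-1/}"

# Print the command when DRY_RUN=1 (the default here); execute it when DRY_RUN=0.
# The printed form drops shell quoting, so it is for review rather than copy-paste.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Create a crawler that targets the S3 location, then start it.
create_and_start_crawler() {
  run aws glue create-crawler \
    --region us-east-1 \
    --name "$CRAWLER_NAME" \
    --role "$GLUE_ROLE" \
    --database-name "$GLUE_DATABASE" \
    --targets "{\"S3Targets\":[{\"Path\":\"$S3_PATH\"}]}"
  run aws glue start-crawler --region us-east-1 --name "$CRAWLER_NAME"
}

create_and_start_crawler
```

Set DRY_RUN=0 to actually create and run the crawler. To catalog the Coronaviridae dataset instead, set S3_PATH to s3://sra-pub-sars-cov2-metadata-us-east-1/.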
Alternatively, you can opt to manually create a database yourself and add the tables.
You can find the table definitions here: SRA Cloud Based Table Definitions.

The table S3 locations

These data can be accessed using the AWS command line interface with the --no-sign-request option; see the examples below.

For all SRA metadata

  • metadata: aws s3 ls s3://sra-pub-metadata-us-east-1/sra/metadata/ --no-sign-request
  • taxonomy analysis: aws s3 ls s3://sra-pub-metadata-us-east-1/sra_tax_analysis_tool/tax_analysis/ --no-sign-request
  • tax analysis info: aws s3 ls s3://sra-pub-metadata-us-east-1/sra_tax_analysis_tool/tax_analysis_info/ --no-sign-request
  • taxonomy: aws s3 ls s3://sra-pub-metadata-us-east-1/sra_tax_analysis_tool/taxonomy/ --no-sign-request
  • kmer: aws s3 ls s3://sra-pub-metadata-us-east-1/sra_tax_analysis_tool/kmer/ --no-sign-request

For the Coronaviridae specific dataset

  • annotated variations: aws s3 ls s3://sra-pub-sars-cov2-metadata-us-east-1/annotated_variations/ --no-sign-request

Access methods

We recommend first using the Athena query editor to become familiar with writing queries before attempting to use the command line tools or client libraries.

Athena can be accessed through a web browser query editor:
https://console.aws.amazon.com/athena/

Athena client library documentation is also available for reference if you plan to access it through the AWS CLI or one of the supported SDKs:
https://docs.aws.amazon.com/cli/latest/reference/athena/

AWS command line tools can be downloaded and set up from here:
https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-welcome.html

Please see the AWS documentation for more information on these options.
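For readers who move on to the command line, a minimal sketch of submitting a query with the AWS CLI might look like the following. The database name, table name (metadata), and results bucket are assumptions for illustration; substitute the names from your own Glue catalog and your own bucket in us-east-1. By default the script only prints the commands instead of running them.

```shell
# Hypothetical values (assumptions): replace with your own database, table, and results bucket.
ATHENA_DATABASE="${ATHENA_DATABASE:-sra}"
ATHENA_OUTPUT="${ATHENA_OUTPUT:-s3://my-athena-results-example/}"
QUERY="${QUERY:-SELECT * FROM metadata LIMIT 10}"

# Print the command when DRY_RUN=1 (the default here); execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Submit the query; when executed, this prints the QueryExecutionId.
start_query() {
  run aws athena start-query-execution \
    --region us-east-1 \
    --query-string "$QUERY" \
    --query-execution-context "Database=$ATHENA_DATABASE" \
    --result-configuration "OutputLocation=$ATHENA_OUTPUT" \
    --query QueryExecutionId --output text
}

# Report the query state: QUEUED, RUNNING, SUCCEEDED, FAILED, or CANCELLED.
query_state() {
  run aws athena get-query-execution \
    --region us-east-1 \
    --query-execution-id "$1" \
    --query QueryExecution.Status.State --output text
}

# Fetch the result rows once the state is SUCCEEDED.
query_results() {
  run aws athena get-query-results --region us-east-1 --query-execution-id "$1"
}

start_query
```

With DRY_RUN=0, a typical session would capture the ID with id="$(start_query)", call query_state "$id" until it reports SUCCEEDED, and then call query_results "$id". The LIMIT clause caps the rows returned; it does not necessarily cap the data scanned.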

Payment

Users pay for the queries they run against public datasets and for storing the results in S3. We recommend that all users review the Athena pricing for on-demand queries.
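To get a feel for query costs, the sketch below estimates the charge for a single on-demand query from the number of bytes scanned. The price of 5.00 USD per TB, the 10 MB per-query minimum, and the binary definition of a terabyte are assumptions made for this example; check the current Athena pricing page for your region before relying on the numbers. The function name estimate_cost is hypothetical.

```shell
# Estimate the on-demand cost of one Athena query.
#   $1 = bytes scanned (for example, Statistics.DataScannedInBytes from get-query-execution)
#   $2 = price in USD per TB scanned (assumed default: 5.00)
estimate_cost() {
  awk -v bytes="$1" -v price="${2:-5.00}" 'BEGIN {
    min = 10 * 1024 * 1024        # assumed 10 MB minimum billed per query
    if (bytes < min) bytes = min
    tb = 1024 ^ 4                 # assumed binary terabyte (2^40 bytes)
    printf "%.6f\n", bytes / tb * price
  }'
}

# Example: a query that scans 250 GB (250,000,000,000 bytes).
estimate_cost 250000000000
```

Storage of the query results in your S3 bucket is billed separately and is not included in this estimate.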

Engage

NCBI wants your feedback on SRA in the Cloud. Contact sra@ncbi.nlm.nih.gov with questions or if you would like to provide input on new functionality.


Last updated: 2023-07-25T17:42:28Z