Support for Genome Workbench will end on March 31, 2024. You may still use the application, but supporting documentation will not be available after this date.
Working with BAM Files
Step 1: Introduction
BAM files can be opened from remote locations (ftp, http) and from local computers. To view a BAM file, an index file must be present in the same directory as the BAM file. The index must be named by appending “.bai” to the BAM file name. If there is no index file, you can use SAMTools to create one (please download SAMTools from http://samtools.sourceforge.net and install it locally).
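Before opening a local BAM file, it can help to confirm that the index is where it is expected. The following is a minimal Python sketch of the naming rule described above (".bai" appended to the full BAM file name). The function names are illustrative and are not part of Genome Workbench or SAMTools.

```python
import os


def expected_index_path(bam_path: str) -> str:
    """Return the index path expected for a BAM file: '<name>.bam.bai'."""
    return bam_path + ".bai"


def has_index(bam_path: str) -> bool:
    """Return True if the expected index file exists next to the BAM file."""
    return os.path.isfile(expected_index_path(bam_path))


if __name__ == "__main__":
    bam = "mapt.NA12156.altex.bam"
    print(expected_index_path(bam))
    if has_index(bam):
        print("index found")
    else:
        # 'samtools index' builds the index; the BAM file must be coordinate-sorted first.
        print("no index; for a sorted file, try: samtools index " + bam)
```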
BAM data that is aligned to an assembly can be viewed as run accessions from the SRA database. To find aligned data, search the SRA database with a query that includes the parameter "aligned data"[Properties], for example: ((("mus musculus"[Organism]) AND BALB/c) AND "lymph") AND "rna seq"[Strategy] AND "aligned data"[Properties].
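The same kind of query can also be assembled programmatically. The sketch below joins search terms with AND, appends the "aligned data"[Properties] filter, and builds a URL for the NCBI E-utilities esearch endpoint with db=sra. It only constructs the URL and sends no request. The helper names are illustrative, and pasting the query into the SRA search box works just as well.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def aligned_data_query(*terms: str) -> str:
    """Join search terms with AND and require aligned data."""
    parts = list(terms) + ['"aligned data"[Properties]']
    return " AND ".join(parts)


def esearch_url(term: str, retmax: int = 20) -> str:
    """Build an esearch URL against the SRA database (no request is sent)."""
    return ESEARCH_URL + "?" + urlencode({"db": "sra", "term": term, "retmax": retmax})


query = aligned_data_query(
    '"mus musculus"[Organism]', "BALB/c", '"lymph"', '"rna seq"[Strategy]'
)
print(query)
print(esearch_url(query))
```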
This tutorial will take you through several scenarios to view BAM files in Genome Workbench:
- A sorted BAM file with index file
- An unsorted BAM file with no index file (requires SAMTools)
- SRR13020989 – SRA run accession for data originated from GenBank
Since BAM files can be very large, they are not loaded in full into the Genome Workbench project as other types of data are. Instead, they are accessed externally. Example files for this tutorial can be downloaded here (note the file is large, ~356MB):
To see an example of a BAM file from a remote (ftp) location, see the BAM haplotype filtering tutorial. For more information on BAM files, see the BAM file FAQ section and the Import BAMs video tutorial. For more information on SRR run accessions of GenBank data (and ERR/DRR run accessions for data originating from EMBL-EBI/DDBJ, respectively), see the SRA knowledge base page. See this video for information about viewing BAM files in the web GDV browser (note that this data will not be secure).
If you want to display de-novo BAM files that are aligned to novel (non-NCBI) genomes, you will first need to import a FASTA file with the sequences referred to in the BAM file and then import the BAM file into the same project. For more information, please refer to the Displaying de-novo BAM files tutorial.
Step 2: BAM file with index file
From the File menu choose Open and select BAM/CSRA files from the left side.
Click the button on the right labeled Add BAM/CSRA file. Navigate to the BAM Test Files folder you downloaded, select scenario1_with_index, select the file mapt.NA12156.altex.bam, and click Open. Click Next three times (skip the mapping dialog, since this data is already mapped) and then click Finish.
Now there is a 'New Project' in the Project Tree View. Double-click the NT_ accession to open the Open View dialog. Select the Graphical Sequence View and see that the graphical view opens in a new tab (if the record has been updated, you might see a warning message; click the OK button to close it). You can optionally open the NC_ accession to see the BAM file mapped to the whole chromosome.
If you do not see the alignments in the graphical view tab, you will have to turn them on by clicking on the Context Menu (see figure below) and choosing Alignments. You might also select Graphs to see the coverage graph of this data listed among the graph tracks as well. Another way to find the track you just uploaded is to open the Configure Tracks dialog using the Gear icon and search by the track name/partial name (mapt.NA12156.altex) among All Tracks.
Step 3: Viewing BAM Data
In the graphical view, in the alignment tracks section, you should have a track titled "mapt.NA12156.altex, Coverage graph – log 2 scaled". All the standard Genome Workbench navigation tools are available for panning and zooming (see Basic Operation tutorial).
Double clicking on the coverage graph track will open the Graph Rendering Options dialog where the rendering style, graph scale, color, etc. can be adjusted.
If you zoom in far enough, you will see the coverage graph for the alignment track turn into a pileup graph, and individual alignment features will become visible. (Note: the graph shown in the Graph tracks always remains a coverage graph and is not very informative at high zoom levels.)
Mouse over the track name to make track settings located at the right of the track visible. You can adjust these settings if you wish. If you zoom in to the sequence level, you will see reads aligned to the anchor sequence with insertions and mismatches highlighted.
Hovering the mouse over an individual alignment feature opens a tooltip with useful information about the alignment, including the CIGAR string, percent identity, and coverage.
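To help read the CIGAR string shown in the tooltip, here is a small Python sketch that splits a CIGAR string into its operations and computes how many reference and read bases the alignment covers. It follows the operation semantics of the SAM format (M, D, N, = and X consume the reference; M, I, S, = and X consume the read). The example string is made up for illustration, and this is not how Genome Workbench itself computes the values it displays.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
CIGAR_FULL = re.compile(r"(\d+[MIDNSHP=X])+")
REF_CONSUMING = set("MDN=X")    # operations that advance along the reference
QUERY_CONSUMING = set("MIS=X")  # operations that advance along the read


def parse_cigar(cigar: str):
    """Split a CIGAR string into (length, operation) pairs."""
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError("not a valid CIGAR string: " + repr(cigar))
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def reference_span(cigar: str) -> int:
    """Number of reference bases covered by the alignment."""
    return sum(n for n, op in parse_cigar(cigar) if op in REF_CONSUMING)


def query_length(cigar: str) -> int:
    """Number of read bases described by the CIGAR (including soft clips)."""
    return sum(n for n, op in parse_cigar(cigar) if op in QUERY_CONSUMING)


example = "5S20M2I30M1D10M"  # illustrative CIGAR string
print(parse_cigar(example))
print(reference_span(example), query_length(example))
```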
Step 4: BAM file with no index file
This exercise requires SAMTools, a freely available package for working with BAM files. Download and unpack the package and put it in a convenient folder/directory.
The steps are then similar to scenario 1. From the File menu, choose Open and select BAM files from the left side of the dialog. Click the button on the right labeled Add a BAM file. Navigate to the BAM Test Files folder you have downloaded, select scenario2_no_index_unsorted_need_id_mapping and the file GSM409307_UCSD.H3K4me1.bam, and click Next. A dialog appears in which Genome Workbench asks where to find the SAMTools executable.
Navigate to the SAMTools executable on your computer, click Open, and then click Next. A new dialog appears asking about mapping the file to sequences.
In order to view the BAM file, the project must contain the sequences (e.g. accessions or chromosomes/scaffolds) that are referred to in the BAM file. Genome Workbench automatically finds sequences from NCBI that are referenced by GenBank or RefSeq accessions. If the BAM file uses a different style of sequence identifiers, the map assembly function allows Genome Workbench to convert them into NCBI assembly identifiers.
This example requires mapping, since the reference accessions in the BAM file are not typical GenBank/RefSeq accessions. (Note: if you want to check which sequences are referenced in the BAM file, click the Next button in the mapping dialog to see them, and then use the Back button to return to the mapping dialog.)
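Outside Genome Workbench, the referenced sequences can also be read from the BAM header, which `samtools view -H file.bam` prints as text. The Python sketch below extracts the sequence names (SN) and lengths (LN) from the @SQ lines of such header text. The header shown is a short illustrative sample, not the actual header of the tutorial file.

```python
def reference_names(header_text: str):
    """Return (name, length) pairs from the @SQ lines of a SAM/BAM header."""
    refs = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type looks like TAG:value.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        refs.append((fields["SN"], int(fields["LN"])))
    return refs


# Illustrative header text, as 'samtools view -H' might print it.
sample_header = (
    "@HD\tVN:1.0\tSO:unsorted\n"
    "@SQ\tSN:chr1\tLN:247249719\n"
    "@SQ\tSN:chr2\tLN:242951149\n"
)
for name, length in reference_names(sample_header):
    print(name, length)
```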
In our test file, the chromosomes are named chr1, chr2, etc. (the UCSC style of sequence identifiers), so we need to map them to the corresponding GenBank/RefSeq accessions for the particular assembly.
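Conceptually, this mapping is a lookup from one naming style to another. The Python sketch below shows the idea with a small hand-written table. The two accessions are given as examples of RefSeq chromosome accessions believed to correspond to NCBI36 (hg18) and should be verified against the assembly report before use. The function is illustrative and does not reflect how Genome Workbench implements the mapping.

```python
def map_identifiers(names, mapping):
    """Translate sequence names via a lookup table.

    Returns (mapped, unmapped): a dict of translated names and a list of
    names that have no entry in the table.
    """
    mapped, unmapped = {}, []
    for name in names:
        if name in mapping:
            mapped[name] = mapping[name]
        else:
            unmapped.append(name)
    return mapped, unmapped


# Example table (assumed values; check the assembly report for the real ones).
EXAMPLE_MAPPING = {"chr1": "NC_000001.9", "chr2": "NC_000002.10"}

mapped, unmapped = map_identifiers(["chr1", "chr2", "chrUn"], EXAMPLE_MAPPING)
print(mapped)
print(unmapped)
```

Names left in the unmapped list would correspond to sequences that cannot be displayed until a suitable assembly or sequence is supplied.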
Add a checkmark to the Use Mapping check box, click the Find Assembly button, and in the Select Assembly dialog type hg18. Then click the Find Assembly button, select the RefSeq radio button, select NCBI36 (hg18) in the table, and click the OK button.
See that mapping information was added to the Open BAM dialog.
Click Next and see RefSeq accessions instead of chr1, chr2, etc:
All accessions are selected by default, but you can unselect/select any of them and only the selected accessions will be added to the new project. Click Next and Finish.
SAMTools can take a couple of minutes to process this data. You can follow the progress in the task view window.
Once it is finished, a new project with BAM data will be created in the Project Tree View.
Open any molecule in the project in the Graphical Sequence View and see the BAM alignment track among the Alignments tracks. All the standard Genome Workbench track settings and navigation tools are available (see the Basic Operation tutorial and scenario 1/step 3 of this tutorial). If you zoom in far enough, you will see the pileup graph and individual reads aligned to the anchored reference sequence.
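For reference, an unsorted, unindexed BAM file can also be prepared by hand before loading it. The Python sketch below builds the two SAMTools commands typically used for this, sort and then index, and runs them only if samtools is found on the PATH. It assumes SAMTools 1.x command syntax (older 0.1.x releases use a different form of sort). The exact commands Genome Workbench runs internally are not stated in this tutorial, so treat this as an approximation.

```python
import shutil
import subprocess


def sort_and_index_commands(bam_path: str, sorted_path: str):
    """Return the samtools commands that sort a BAM file and index the result."""
    return [
        ["samtools", "sort", "-o", sorted_path, bam_path],
        ["samtools", "index", sorted_path],
    ]


def run_commands(commands):
    """Run each command in order, stopping at the first failure."""
    if shutil.which("samtools") is None:
        raise FileNotFoundError("samtools not found on PATH")
    for cmd in commands:
        subprocess.run(cmd, check=True)


commands = sort_and_index_commands(
    "GSM409307_UCSD.H3K4me1.bam", "GSM409307_UCSD.H3K4me1.sorted.bam"
)
for cmd in commands:
    print(" ".join(cmd))
# To execute: run_commands(commands)
```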
Step 5: BAM data for SRA run accessions
Run accessions from the SRA database can be visualized in Genome Workbench if they are aligned to a GenBank or RefSeq sequence. To find such data, search the NCBI SRA page with a query such as ("SARS-CoV-2"[Organism]) AND "aligned data"[Properties].
You can load an SRA run accession (SRR, ERR, or DRR) into Genome Workbench. As an example, we selected experiment SRX9471388, run SRR13020989.
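Because experiment (SRX) and run (SRR) accessions look alike, it is easy to paste the wrong one. The following Python sketch checks that a string looks like a run accession and reports the source database implied by its prefix, as described in the introduction. The pattern (prefix followed by digits) is an assumption about the accession format, and the function name is illustrative.

```python
import re

RUN_ACCESSION = re.compile(r"[SED]RR\d+")
PREFIX_SOURCE = {"SRR": "GenBank (NCBI)", "ERR": "EMBL-EBI", "DRR": "DDBJ"}


def run_accession_source(accession: str):
    """Return the source database for a run accession, or None if it is not one."""
    acc = accession.strip().upper()
    if not RUN_ACCESSION.fullmatch(acc):
        return None
    return PREFIX_SOURCE[acc[:3]]


print(run_accession_source("SRR13020989"))  # a run accession
print(run_accession_source("SRX9471388"))   # an experiment accession, not a run
```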
Paste the SRA run accession into the Open BAM dialog:
Click Next and see that Genome Workbench has detected that the BAM file references MN908947 (a GenBank accession).
Click Next and Finish. See that a new Genome Workbench project has been created. Open the Graphical View and see the BAM alignment track. Zoom in/out, zoom to the sequence level, and adjust track settings if desired.
Step 6: Export BAM file as a table
From the Graphical Sequence Viewer, zoom to the desired location and select a range of interest.
Right-click the selected range and click the Export Data option in the context menu.
An alignment export menu will be opened. Note that BAM files are stored as alignments, so you need to select “Alignment Table File” in the list on the left. Select the desired location in the main section. Name the target file. If you need to change the default export location use the folder button. Click the Next button.
In the next screen, select fields from the alignment file to export.
Click the Finish button. Your file will be exported.
The exported file can be opened in a spreadsheet program like Excel for further use.
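The exported table can also be read with a script. The sketch below assumes the export is a tab-delimited text file with a header row, which should be checked against the actual file. The column names in the sample (read_id, cigar, identity) are hypothetical placeholders that stand in for whichever fields were selected during export.

```python
import csv
import io


def read_table(handle, delimiter="\t"):
    """Read a delimited table with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter=delimiter))


# Hypothetical sample standing in for an exported alignment table.
sample = io.StringIO(
    "read_id\tcigar\tidentity\n"
    "read1\t50M\t100\n"
    "read2\t48M2I\t96\n"
)
rows = read_table(sample)
for row in rows:
    print(row["read_id"], row["cigar"], row["identity"])
```

For a real file, pass an open file handle instead, for example `open(path, newline="")`.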
Step 7: Finished!
Congratulations! You now know how to open and manipulate several different flavors of BAM files in Genome Workbench.
Current Version is 3.8.2 (released December 12, 2022)