Support for Genome Workbench will end on March 31, 2024. You may still use the application, but supporting documentation will not be available after this date.
Displaying new non-NCBI molecules with annotations
Introduction
Sometimes when trying to display BAM or GFF3 files on non-NCBI molecules, users receive the following error message: "graphical view failed to retrieve sequence for id lcl".
De-novo data (sequences and annotations) are genomic molecules and assemblies that have no public NCBI accessions.
These may be data not yet submitted to NCBI (pre-submission) or private data (no submission planned). Such data can have identifiers that are not NCBI accessions. Very often, such identifiers start with the prefix 'lcl', and the sequences are referred to as 'local data'.
Genome Workbench organizes its data into projects. A project can be inspected using the Project Tree View. Almost no validation is performed when data is loaded into a project, so you have maximum flexibility in the order in which you load it. For example, it is possible to first load a GFF3 file with annotations, then load BAM files, and only then load a FASTA file with the reference molecules (for example, "lcl|chr1234").
It is important that every project with loaded annotations or connected BAM files have access to all sequences that the sequence ids in the annotations refer to. Genome Workbench automatically loads sequences from NCBI that are referenced by GenBank or RefSeq accessions, but de-novo unsubmitted cases do not work this way. Either the sequences should be loaded into the project using FASTA files, or sequences should be made available via a BLAST database.
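The requirement above can be checked before anything is loaded. The following is a minimal sketch, not part of Genome Workbench: it collects the sequence IDs used in column 1 of a GFF3 file and reports those that have no matching FASTA header. The function names are hypothetical, and treating "lcl|chr1234" and "chr1234" as the same ID is an assumption made only for this sketch; it is not a statement about how Genome Workbench matches identifiers.

```python
from typing import Iterable, Set


def normalize_id(seq_id: str) -> str:
    """Drop an optional 'lcl|' prefix (an assumption of this sketch)."""
    return seq_id[4:] if seq_id.startswith("lcl|") else seq_id


def fasta_ids(lines: Iterable[str]) -> Set[str]:
    """Collect the first word of every FASTA header line ('>id description')."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(normalize_id(fields[0]))
    return ids


def gff3_seqids(lines: Iterable[str]) -> Set[str]:
    """Collect column 1 (seqid) of every GFF3 feature line."""
    ids = set()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no feature lines follow
        if not line or line.startswith("#"):
            continue  # blank lines, directives, and comments
        ids.add(normalize_id(line.split("\t", 1)[0]))
    return ids


def missing_sequences(gff3_lines: Iterable[str],
                      fasta_lines: Iterable[str]) -> Set[str]:
    """Return the IDs that the annotation uses but the FASTA does not provide."""
    return gff3_seqids(gff3_lines) - fasta_ids(fasta_lines)


if __name__ == "__main__":
    # Made-up example data: two annotated molecules, one sequence.
    gff3 = [
        "##gff-version 3\n",
        "lcl|chr1234\texample\tgene\t100\t900\t.\t+\t.\tID=gene1\n",
        "lcl|chr5678\texample\tgene\t50\t400\t.\t-\t.\tID=gene2\n",
    ]
    fasta = [">lcl|chr1234 example contig\n", "ACGTACGT\n"]
    print(sorted(missing_sequences(gff3, fasta)))
```

With real files, the same functions accept open file objects, for example `missing_sequences(open("annot.gff3"), open("ref.fa"))`, where both file names are placeholders. An empty result suggests that every annotated molecule has a sequence to load alongside it.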
Let us discuss a few typical scenarios in which things may go wrong.
Typical Scenarios
The user imports a de-novo GFF3 file into a project and immediately tries to open a Graphical Sequence View to look at the molecule and its annotation. The Graphical Sequence View will not have access to the sequence content, only the annotation, and will fail with the error message.
How to fix:
- Import a FASTA file containing the sequence into the same project as the GFF3 file
The user imports a FASTA file, then imports a GFF3 file into the same project. The Graphical Sequence View displays the sequence and its annotation. Then the user imports another GFF3 file, but accidentally adds it to a different project. The Graphical Sequence View opened for the sequence in the first project will not show the newly imported track. If the user attempts to open a new Graphical Sequence View for the GFF3 in the second project, it will fail with the error message.
How to fix:
- Move the second GFF3 file into the same project as the FASTA file, or
- Import the FASTA file again into the second project (this is less efficient, but it also works)
The user imports a de-novo BAM file into a project and attempts to open a Graphical Sequence View to look at it. The error “failed to retrieve sequence” is displayed.
How to fix:
- Import a FASTA file with the sequences referred to in the BAM file.
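To find out which sequences a BAM file refers to, one option is to inspect its header: `samtools view -H` prints the header as text, and each `@SQ` line names one reference sequence in its `SN:` field. The sketch below is a hedged illustration rather than a Genome Workbench feature. It takes that header text and the text of a FASTA file and lists the reference names that the FASTA does not cover. The function names are hypothetical, and ignoring an optional "lcl|" prefix when comparing names is an assumption of this sketch only.

```python
from typing import List, Set


def strip_local_prefix(seq_id: str) -> str:
    """Drop an optional 'lcl|' prefix (an assumption of this sketch)."""
    return seq_id[4:] if seq_id.startswith("lcl|") else seq_id


def sam_reference_names(header_text: str) -> List[str]:
    """Return the SN: value of every @SQ line, in header order."""
    names = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def fasta_ids(fasta_text: str) -> Set[str]:
    """Return the first word of every FASTA header line."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">") and line[1:].split():
            ids.add(strip_local_prefix(line[1:].split()[0]))
    return ids


def references_missing_from_fasta(header_text: str, fasta_text: str) -> List[str]:
    """List BAM reference names that have no sequence in the FASTA text."""
    available = fasta_ids(fasta_text)
    return [name for name in sam_reference_names(header_text)
            if strip_local_prefix(name) not in available]


if __name__ == "__main__":
    # Made-up header, as `samtools view -H` might print it.
    header = (
        "@HD\tVN:1.6\tSO:coordinate\n"
        "@SQ\tSN:chr1234\tLN:5000\n"
        "@SQ\tSN:chr5678\tLN:3200\n"
    )
    fasta = ">lcl|chr1234 example contig\nACGTACGT\n"
    print(references_missing_from_fasta(header, fasta))
```

Any name reported here is a sequence that would still need to be imported from a FASTA file into the same project as the BAM file.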
Conclusion
Genome Workbench requires that annotations and the sequences to which they refer be imported into the same project in order to display the data in the Graphical Sequence View.
Why can’t Genome Workbench automatically find sequences from separate projects?
Sequence IDs are known only within a specific project, not across different projects.
In Genome Workbench, it is possible to have two different molecules that both use the same sequence ID, but only if these two variants are loaded into different projects. The user can then graphically compare the two different molecules, which is useful if these two molecules are variants of one another. The annotation for these two molecules will also use the same sequence IDs. Genome Workbench uses the project to determine which of these molecules the annotation refers to. Allowing annotation and sequence data to exist in separate projects would lead to inconsistency and potential conflicts in this situation. The current mechanism of using the project to set the scope for data resolution allows the user to do comparative visualization of de-novo molecules with the same sequence IDs without conflict.
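The scoping rule described above can be pictured with a toy model. This is a conceptual sketch only, not Genome Workbench's actual implementation: the `Project` class, its methods, and the error text are all hypothetical. Each project keeps its own table of sequence IDs, so the same ID can name different molecules in different projects, and an annotation is resolved only against the project that holds it.

```python
class Project:
    """Toy model: a project owns its own ID-to-sequence table."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.sequences = {}    # seq_id -> residues, private to this project
        self.annotations = []  # (seq_id, label) pairs

    def add_sequence(self, seq_id: str, residues: str) -> None:
        self.sequences[seq_id] = residues

    def add_annotation(self, seq_id: str, label: str) -> None:
        # Like the loading behavior described earlier, nothing is validated here.
        self.annotations.append((seq_id, label))

    def resolve(self, seq_id: str) -> str:
        """Look the ID up in this project only, never in other projects."""
        try:
            return self.sequences[seq_id]
        except KeyError:
            raise LookupError(
                f"no sequence for id {seq_id!r} in project {self.name!r}"
            ) from None


if __name__ == "__main__":
    # Two variants of a molecule share one ID but live in separate projects.
    first = Project("variant A")
    first.add_sequence("lcl|chr1234", "ACGTACGT")
    first.add_annotation("lcl|chr1234", "gene1")

    second = Project("variant B")
    second.add_sequence("lcl|chr1234", "ACGTTCGT")
    second.add_annotation("lcl|chr1234", "gene1")

    for project in (first, second):
        for seq_id, label in project.annotations:
            print(project.name, label, project.resolve(seq_id))

    # A project holding only the annotation cannot resolve the ID.
    orphan = Project("annotation only")
    orphan.add_annotation("lcl|chr1234", "gene1")
    try:
        orphan.resolve("lcl|chr1234")
    except LookupError as err:
        print(err)
```

If lookups fell through to other projects, the annotation in the third project would have two candidate molecules with the same ID and no way to choose between them, which is the conflict that project-level scoping avoids.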
For more information, please refer to the Working with Non-Public Data tutorial.
Current Version is 3.8.2 (released December 12, 2022)