U.S. flag

An official website of the United States government

Support for Genome Workbench will end on March 31 2024. You may still use the application, but supporting documentation will not be available after this date. Read more.

Query Syntax in Genome Workbench and Tree Viewer

Queries in Genome Workbench and Tree Viewer are designed to work with structured data that is organized into a set of records with named fields. Two types of queries are supported, character string based search and query-based search. String based search searches all the fields for a single character string value. Query-based search is more like SQL and uses true/false (Boolean) expressions as the basis for selection.

Character String Search:

To perform a string-based search enter a single word with no blanks, or separated words in quotes, e.g.:

Listeria “Listeria monocytogenes”

The above queries work similarly to how a search in Google works – it searches all the fields in all the records of the data source and returns any entries where the above character string(s) appear. Note that, unlike Google, only one character string may be provided. The following queries are not valid:

Listeria blood “GCA_000810445.1” “Bodily Fluid”

Queries:

To search based on multiple criteria an SQL-like syntax is also supported. Queries are made up of one or more connected true/false (Boolean) expressions. These expressions are built up from comparison operators that compare fields in the data source to other fields or values provided in the query. The below comparison operators will work with either character, numeric or Boolean (true/false) values:

  • =             (equals)
  • <             (less than)
  • >             (greater than)
  • <=           (less than or equal to)
  • >=           (greater than or equal to)
  • Like        (equals where ‘?’ matches any single character and ‘*’ matches zero or more characters)
  • Between  (check if a value is between two other values)
  • In            (check if a value is equal to any in a list of values)

Any comparison in the query returns a Boolean value. A query can combine comparisons into more complex expressions using the following Boolean operators, which only work with Boolean values:

  • AND         (True only if both expressions are true)
  • OR           (True if either or both expressions are true)
  • XOR         (True if one expression is true and the other one is false)
  • NOT         (True if the associated expression is false)

Note that the keywords Like, Between, In, AND, XOR and NOT are not case sensitive – they can be lower case, upper case, or a mixture of both. Also expressions can be grouped together using parentheses to create different logical expressions, e.g.:

A and (B or C) True if A is true and B or C is true (A and B) or C True if both A and B are true or C is true

Example Queries:

For our example we will assume we have a phylogenetic tree with six leaf nodes and each of these nodes has properties (fields) that match one of the rows in the following table. The queries below show how to use the query language to select specific nodes.

row dist asm_acc collection_date geo_loc_name isolation_source scientific_name
1 0.0946 GCA_000833875.1 2009-05-27 Italy: Rimini blood Haemophilus influenzae
2 0.1008 GCA_000633915.1 2009-10-02 Indian Ocean deep sea sediment Pseudomonas monteilii
3 0.1396 GCA_000409525.1 2009-08-28 India wetland sediment Bacillus sp. Aph1
4 0.0007 GCA_000810445.1 2009-06-08 USA Bodily fluid Acinetobacter baumannii
5 0.0419 GCA_000729665.1 2009-06-12 USA Food Associated Environment Listeria monocytogenes
6 0.03 GCA_000823885.1 2009-06-19 United Kingdom: England N/A Legionella pneumophila
Example 1: All entries that have distance to parent is > 0.09 and were found in blood dist > 0.09 AND isolation_source = blood Result: (row 1)

Example 2: All entries that were taken in part of the UK: geo_loc_name like United Kingdom:* Result: (row 6)

Example 3: All entries that were collected from the USA or India geo_loc_name in (India, USA) Result: (rows 3,4,5)

Example 4: All entries for samples taken from sediment: isolation_source LIKE *sediment* Result: (rows 2,3)

Example 5: All entries whose distance to their parent is between 0.09 and 0.11 dist Between 0.09 and 0.11 Result: (rows 1,2)

Example 6: All entries with samples not from the USA and whose isolation source was not a sediment not geo_loc_name = USA and not isolation_source LIKE *sediment* Result: (rows 1,6)

Example 7: All entries that were collected in June 2009 or are of species Bacillus collection_date like 2009-06* OR scientific_name like Bacillus* Result: (rows 3,4,5,6)

Example 8: All entries with samples from the US or of species Acinetobacter baumannii but not both geo_loc_name = USA XOR scientific_name = “Acinetobacter baumannii” Result: (row 1)

Example 9: All entries for Listeria or Bacillus that were collected in June 2009 (scientific_name like Listeria* OR scientific_name like Bacillus*) AND collection_date like 2009-06* Result: (row 5)

Current Version is 3.8.2 (released December 12, 2022)

Release Notes

Downloads

General


Help


Tutorials


General use Manuals


NCBI GenBank Submissions Manuals


Other Resources


Support Center

Last updated: 2017-11-04T03:25:26Z