Maintaining the integrity of human immunodeficiency virus sequence databases

G H Learn Jr; B T Korber; B Foley; B H Hahn; S M Wolinsky; J I Mullins

doi:10.1128/JVI.70.8.5720-5730.1996


J Virol. 1996 Aug;70(8):5720-30. doi: 10.1128/JVI.70.8.5720-5730.1996.

Affiliation

Department of Microbiology, University of Washington, Seattle 98195-7740, USA (G H Learn Jr).

Abstract

Human immunodeficiency virus type 1 (HIV-1) sequences are accumulating in the literature at a rapid pace. For this ever-expanding resource to be maximally useful, researchers must strive to maintain a high level of quality assurance, both in the design and conduct of experiments and in the analysis of their results. Here we present detailed analyses of problematic sets of HIV-1 sequences in the database that contain anomalies suggestive of sample mislabeling or contamination. These data are examined in the context of currently available HIV-1 sequence information to illustrate how potentially flawed data can be identified. Indicators of potential problems are (i) nearly identical sequences that are supposed to derive from unlinked individuals yet are markedly distinct from other sequences from the putative source or (ii) sequences that are nearly identical to those of laboratory strains. We outline methods that researchers can use to perform preliminary laboratory and computational analyses that could help identify problematic data and thus help ensure the integrity of sequence databases.
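The two indicators above lend themselves to a simple automated screen. The following is a minimal sketch, not the authors' published procedure: it assumes the sequences are already aligned to a common length, uses uncorrected pairwise p-distances, and applies an arbitrary illustrative cutoff of 1% divergence. The function names, the `threshold` value, and the toy sequences are hypothetical; in practice the laboratory-strain references (for example HXB2 or NL4-3) would come from a curated alignment.

```python
from itertools import combinations

NUCLEOTIDES = set("ACGT")


def p_distance(a: str, b: str) -> float:
    """Uncorrected proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap or an ambiguity code are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differ = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in NUCLEOTIDES or y not in NUCLEOTIDES:
            continue
        compared += 1
        if x != y:
            differ += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differ / compared


def flag_suspicious(samples, lab_strains, individual_of, threshold=0.01):
    """Screen aligned sequences for the two warning signs.

    samples:       name -> aligned sequence from a study
    lab_strains:   name -> aligned laboratory-strain reference
    individual_of: sample name -> id of the person it was sampled from
    threshold:     hypothetical cutoff; anything closer than this is flagged
    """
    # (i) near-identical sequences attributed to different individuals
    unlinked_pairs = [
        (n1, n2)
        for (n1, s1), (n2, s2) in combinations(samples.items(), 2)
        if individual_of[n1] != individual_of[n2]
        and p_distance(s1, s2) < threshold
    ]
    # (ii) sequences nearly identical to a laboratory strain
    lab_hits = [
        (name, lab)
        for name, seq in samples.items()
        for lab, ref in lab_strains.items()
        if p_distance(seq, ref) < threshold
    ]
    return unlinked_pairs, lab_hits


# Toy example with made-up 100-site "alignments"
samples = {
    "p1_clone1": "ACGTACGTAC" * 10,
    "p2_clone1": "ACGTACGTAC" * 10,  # identical to p1 but from another person
    "p3_clone1": "TCGTACGTAC" * 10,  # differs from p1/p2 at 10% of sites
}
lab_strains = {"LAB_REF": "TCGTACGTAC" * 10}
individual_of = {"p1_clone1": "P1", "p2_clone1": "P2", "p3_clone1": "P3"}

pairs, hits = flag_suspicious(samples, lab_strains, individual_of)
print(pairs)  # [('p1_clone1', 'p2_clone1')]
print(hits)   # [('p3_clone1', 'LAB_REF')]
```

Flagged pairs are only candidates for follow-up, such as re-checking sample labels or re-amplifying from the original specimen; a low distance between samples from different people can also reflect genuine transmission between epidemiologically linked individuals.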

Publication types

Research Support, U.S. Gov't, P.H.S.

MeSH terms

Base Sequence
Databases, Factual*
Gene Library*
HIV-1 / genetics*
Humans
Molecular Sequence Data
Phylogeny
Sequence Alignment
