
National Research Council (US) Committee on Computer-Assisted Modeling. Computer-Assisted Modeling: Contributions of Computational Approaches to Elucidating Macromolecular Structure and Function. Washington (DC): National Academies Press (US); 1987.


9. Hardware

Four functions are essential to computer modeling of molecules:

  • molecular energy computation
  • configurational control
  • graphics
  • reasoning

Until recently, the standard hardware configuration of a VAX and an Evans and Sutherland display terminal could support only the second and third of these functions. Molecular energy calculation on a VAX is very slow, although these computers were used to develop the programs. The advent of Cray-type supercomputers connected by national communications networks has given scientists access to more computer power for molecular energy calculations. More recently, the development of special purpose array processors has made it possible to have in the laboratory computational power roughly comparable to that of the supercomputers. Until recently, reasoning about molecular structure could be done only on special purpose machines that run the programming language LISP.

As the power of computers available to individual scientists increases, we expect that these four functions will be brought together. The early VAX computers (for example, the 11/780) typically provide 0.5 megaflop (million floating-point operations per second) and 1.0 MIPS (million instructions per second). Typical array processors provide 100 megaflops, while typical LISP machines provide 2.0 MIPS. Until recently it was necessary to have one of each of these types of machines in order to have reasonable amounts of computational power for the four molecular modeling functions. The next generation of computer, described as a personal supercomputer (PSC), will have between 40 and 60 megaflops of number-crunching power and between 15 and 20 MIPS of general (i.e., logical) computational power. With this level of numeric and logical computational power available at a scientific workstation within the next year, there will be little need for separate machines to perform special functions.
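The relative magnitudes involved can be illustrated with a short calculation. The figures below are those quoted above; the PSC numbers are projections (midpoints of the quoted ranges), and the comparison is only a rough sketch, not a benchmark.

    # Rough comparison of the machine classes quoted in the text.
    # The PSC figures are projections (midpoints of the 40-60 megaflop
    # and 15-20 MIPS ranges), not measurements.
    machines = {
        "VAX 11/780":      {"megaflops": 0.5,  "mips": 1.0},
        "array processor": {"megaflops": 100,  "mips": None},
        "LISP machine":    {"megaflops": None, "mips": 2.0},
        "PSC (projected)": {"megaflops": 50,   "mips": 17.5},
    }

    baseline = machines["VAX 11/780"]
    for name, specs in machines.items():
        flops, mips = specs["megaflops"], specs["mips"]
        numeric = f"{flops / baseline['megaflops']:.0f}x" if flops else "n/a"
        logical = f"{mips / baseline['mips']:.0f}x" if mips else "n/a"
        print(f"{name:18s}  numeric: {numeric:>6s}  logical: {logical:>6s}  (vs. VAX 11/780)")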

The national supercomputers, however, already in place and operational, constitute a very real scientific resource. As scientists learn that the supercomputers can effectively carry out molecular energy calculations, these machines will be used to their fullest capacity. However, the technology of the supercomputers is advancing rapidly, and the manufacturers promise that systems with three orders of magnitude more computational power will be available in the next few years.

While the supercomputers grow more powerful, the power of workstations and the PSCs is also increasing. Current workstations have the power of VAXs, but lack the capacity to run all four functions simultaneously. As the PSCs emerge, they will offer a combination of capabilities that will make it possible to run all four functions at once. The PSCs should create the possibility of a new computational and graphic plateau:

1988 - 1995: personal supercomputer

1977 - 1987: E & S display coupled to a MicroVAX II

1970 - 1976: Tektronix display coupled to a DEC system-10.

The Tektronix display and a scientific mainframe gave us the first plateau seventeen years ago. On this plateau it was possible for many scientists to view and manipulate molecules. The VAX computers, and more recently the even less expensive MicroVAX II computers, coupled to an Evans and Sutherland display, have established over the last ten years a plateau of graphic capability that has enabled scientists to move from the physical modeling of macromolecules to completely electronic modeling. The PSCs expected to emerge in the next few years will permit scientists to compute and to visualize molecules in much more powerful ways.

Using the PSCs, it should be possible to shape molecular models easily with joystick controls and to create stereo color graphics in multiple modes of representation, while simultaneously performing energy calculations and molecular reasoning. The only foreseeable problem is that scientists' appetites for energy calculations may exceed the computational capacities of the PSCs. Configurational control should make it possible to sketch protein models. Using collections of rules, we should be able to apply molecular reasoning to generate and evaluate large numbers of possible model states.

Because of the rapidly changing technology of computers, displays, workstations, and PSCs, a national effort should be directed toward guaranteeing that these devices conform to the various levels of standards of the International Organization for Standardization (ISO).

Standardization in the United States is achieved by interested parties working together in committees under the auspices of agencies and organizations such as the National Bureau of Standards, the American Society for Testing and Materials (ASTM), the Institute of Electrical and Electronics Engineers (IEEE), and ISO. Considerable standardization at the level of the computer operating system must be done to make the ISO model work. Hardware vendors must choose between product uniqueness, for sales and market development, and intervendor product compatibility. Compatibility has many benefits: adherence to the standards will make it possible to move programs quickly and easily from one device to another, as well as to construct a complete system from components supplied by many vendors. The ISO model has several levels, represented below:

1. Ethernet

2. TCP/IP communications protocol

3. NFS - Network File System

4. UNIX operating system

5. VAX/VMS and Cray FORTRAN compatibility

6. X-windows

7. DIALOG-like application program window and functionality specification

The Ethernet originated at the Xerox Palo Alto Research Center. The TCP/IP protocol was developed for the DARPAnet, operated for the Department of Defense, and so is in the public domain. The NFS was developed by Sun Microsystems and placed in the public domain. Bell Laboratories developed UNIX. VAX/VMS FORTRAN was originated by Digital Equipment Corporation (DEC). X-windows originated at the Massachusetts Institute of Technology, where it was developed to specify a machine-independent windowing system. DIALOG is an Apollo product that is a first attempt to answer the question of how to write high-level, mouse-driven application programs in a high-level specification language.

Standards are really the key to future progress in molecular modeling. If all investigators adhere to the ISO standards, then it will be possible to mix various workstations and special purpose computers on a laboratory network. Adherence to standards should also lower the price of equipment to end users by enlarging the market. Similarly, with adherence to the standards, it will be possible to send and receive molecular structure data sets all over the world using global communications networks such as BITNET, CSnet, DARPAnet, the Japan Universities net (JUnet), and the Commonwealth Scientific and Industrial Research Organization net (CSIROnet) in Australia.
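As an illustration of what adherence to a common protocol stack buys the end user, the sketch below requests a structure data set from a remote machine using nothing but ordinary TCP/IP sockets; any two machines that honor the lower levels of the model can exchange such data regardless of vendor. The host name, port number, and request format are hypothetical placeholders, not a description of any existing service.

    # Minimal sketch: fetch one structure data set over a plain TCP connection.
    # HOST, PORT, and the "SEND <id>" request line are hypothetical.
    import socket

    HOST = "structures.example.edu"   # hypothetical data base server
    PORT = 4000                       # hypothetical service port

    def fetch_structure(entry_id):
        """Request one structure data set and return its raw bytes."""
        with socket.create_connection((HOST, PORT)) as conn:
            conn.sendall(f"SEND {entry_id}\n".encode("ascii"))
            chunks = []
            while True:
                data = conn.recv(4096)
                if not data:          # server closed the connection: transfer done
                    break
                chunks.append(data)
        return b"".join(chunks)

    # Example use (would only work against a real server):
    # coordinates = fetch_structure("1CRN")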

Special purpose computers offer many possibilities for molecular modeling. Over the years, the National Institutes of Health (NIH) has funded facilities that developed molecular graphics, computation, and control devices. The control systems laboratory at Washington University Medical School developed the MMSX molecular display. The molecular graphics laboratory at the University of North Carolina at Chapel Hill has been instrumental in exploring the development of a variety of stereo, configurational control, and display devices. The molecular graphics laboratory at Columbia University is developing FASTRUN, a special purpose computer attached to an ST-100 array processor that boosts its molecular dynamics power by a factor of 10. The molecular graphics laboratory at the University of California at San Francisco Medical School has developed stereo and color representation techniques.

Special and general purpose graphics devices are increasingly easy to produce. General Electric in Research Triangle, North Carolina has produced a very fast surface graphics processor that can be used to display different types of objects, including molecules. At least one of the PSCs will have a sphere graphics primitive embedded in a silicon chip. Every effort should be made to encourage the development of special purpose processors. However, these processors should be required to adhere to the emerging computer standards, so that they can be easily integrated into existing laboratory networks.

The last few years have seen the emergence of array processors for laboratory use. The ST-100 array processor from Star Technologies, Inc. has been programmed by microcoding to perform molecular dynamics calculations at a rate comparable to a Cray XMP. The ST-100 is rated at a peak of 100 megaflops, while its sustained calculation rate is about 30 megaflops. The ST-100 costs about one-thirtieth as much as the Cray XMP-48. The FASTRUN device currently under development in the laboratory of Cyrus Levinthal at Columbia University will increase the power of the ST-100 by a factor of 10, from 30 average megaflops to 300 average megaflops. Floating Point Systems Inc. is discussing the delivery of a 10-processor FPS-264 system with a peak of 1 gigaflops. Multiprocessor machines could be added to this list, including the hypercube machines from Intel and NCUBE. All are laboratory machines. The power of supercomputers will obviously be increasing at approximately the same rate.

A very strong relationship exists between the architecture of a special purpose computer and the structure of the scientific problem to be solved. The question is, how much computational power does molecular modeling really need? The protein folding problem seems to be the gauge of this question, since molecular dynamics programs calculate atom position changes in 10⁻¹⁵-second time steps. If proteins really take minutes to fold, then computation will have to span from 10⁻¹⁵ to 10² seconds. The most powerful array processors available today make it possible to calculate and examine molecular trajectories three orders of magnitude longer than hitherto possible. Extending these trajectories an additional three orders of magnitude might bring us to the range where appropriate protein-folding actions can take place. There is some indication that if amino acids were synthesized at the rate of one per microsecond, then folding would be possible. Computation would then only have to range from 10⁻¹⁵ to 10⁻⁵ seconds, which would be seven orders of magnitude less computing. If this estimate is close to correct and computing power increases at a rate of 50 percent per year, then current computer processor development will give us the necessary amount of power in 5 to 10 years.
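The order-of-magnitude arithmetic behind this estimate can be made explicit; the sketch below uses only the 10⁻¹⁵-second step size and the two target time scales quoted above.

    # Time-step arithmetic for the protein folding estimates in the text.
    import math

    STEP = 1e-15  # seconds covered by one molecular dynamics time step

    targets = [("folding on a scale of minutes", 1e2),
               ("synthesis-paced folding (one residue per microsecond)", 1e-5)]

    for label, folding_time in targets:
        steps = folding_time / STEP
        print(f"{label}: {folding_time:g} s of trajectory "
              f"is about 1e{math.log10(steps):.0f} time steps")

    # 17 versus 10 orders of magnitude: the faster scenario needs
    # seven orders of magnitude less computing.
    print(f"difference: {math.log10(1e2 / 1e-5):.0f} orders of magnitude")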

CENTRAL VERSUS DISTRIBUTED COMPUTING

The National Science Foundation (NSF) supercomputer initiative again brings to the forefront the relationship between central computational services and distributed or personal services. Proponents of centralization argue that certain types of very large calculations are available only on centralized machines. The personal computer revolution showed how profoundly scientists respond to decentralized computation. The capabilities of personal machines increase at the same pace as those of the supercomputers, but the baseline machines are a market of 10⁵ to 10⁶ machines, whereas the supercomputers are a market of 10² to 10³. Special purpose boards added to a baseline machine can raise its capabilities for specific functions (e.g., energy calculation, sequence comparison, or graphics) to levels approaching those of supercomputers.

The distribution of personal computation is driven totally by market forces and is not subject to centralized planning. Scientists buy laboratory computers with funds previously allocated for glassware. Postdoctoral students returning to their countries of origin bring their personal computers with them. Floppy disks containing data files, and even whole books, form a new type of currency in countries with centrally planned economies.

These modes of behavior form a valuable dichotomy. We need a balance between centralizing and decentralizing efforts. Individual scientists can participate in the planning and use of national supercomputers, while simultaneously helping to specify and buy smaller machines for their personal and laboratory use.

COMPUTER UTILIZATION IN THE NEXT 5 TO 10 YEARS

In the next 10 years, workstations will become ordinary scientific tools, like pocket calculators and balances. The workstations will become more popular with scientists as they acquire larger, faster, and more complex working programs; better graphics; more storage and access to other computers; and new data sources. A few years ago, only specialists searched DNA sequence data bases; now, because many workers have PCs in their laboratories, almost all molecular biologists search these data bases.

Workstation use is likely to follow the same pattern. Now, molecular graphics techniques are used only by departmental or laboratory specialists. In years to come, as all workstations begin to acquire adequate graphics capabilities, all scientists will routinely do molecular graphics, modeling, and energy calculations.

One of the strongest effects in the computer marketplace is the trade-off between constant dollar and constant performance. Because computer power is doubling every two to three years, the manufacturers tend to supply their customers with new models that cost the same but have increasing computational power. A customer, then, can expect to purchase a given level of computational power for a decreasing amount of money.
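That trade-off can be put in numbers. The sketch below assumes a doubling interval of 2.5 years (the midpoint of the two-to-three-year range quoted above) and a purely hypothetical $100,000 starting price.

    # Constant-performance price decline, assuming power at a fixed price
    # doubles every `doubling_years` (2.5 years = midpoint of the quoted range).
    def price_of_fixed_performance(initial_price, years, doubling_years=2.5):
        """Projected price of today's performance level after `years` years."""
        return initial_price * 0.5 ** (years / doubling_years)

    for years in (0, 2.5, 5, 10):
        price = price_of_fixed_performance(100_000, years)   # hypothetical $100,000 start
        print(f"after {years:4.1f} years: about ${price:,.0f} for the same performance")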

Twenty years ago, one needed a DEC PDP-10 to search protein or DNA data bases, while 10 years ago one used DEC PDP-11s or DEC VAXs. Now, one can use an IBM PC or one of its many clones to do the same job. In several years, one should be able to do DNA sequence searches on a pocket machine.

The brevity of the computer design and manufacture cycles has begun to overtake our ability to use these machines adequately. Twenty years ago, both manufacturers and consumers could reasonably expect a computer to sell and be worth buying for about 10 years; today, a given level of computational power has a life cycle of 3 years. The cycle length appears to be shortening even further in the sense that special purpose boards can be added to a small general purpose machine to make it functionally equivalent to a machine that costs up to 100 times as much. Why buy a Cray when a PC with a special purpose board will do the same thing? The cure for this problem will probably be a balance of market forces favoring the small mass distribution computers. PCs will rise in power to be general purpose workstations.

THE NATIONAL SUPERCOMPUTER NETWORK

The national supercomputer initiative sponsored by NSF allocates available computer time by a peer-review process. Individual scientists' requests for time must meet granting requirements for the quality of the proposed work and the size of the allocation. From the scientist's viewpoint, the supercomputer network must perform tasks that cannot be done either in the laboratory or at local institutions. Since the network communication rates are 9,600 baud, only a limited amount of data can be passed between the scientist and the supercomputer. Essentially, this means that only batch computing can be run on the supercomputers. Large jobs run in batch mode are only one form of computing. The highly interactive forms of computing and graphics available on workstations will be even more competitive with the supercomputer network when the next generation of high-performance workstation, the PSC, becomes available.
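The arithmetic behind the 9,600-baud constraint is straightforward; the data set sizes in the sketch below are illustrative assumptions rather than figures from the text.

    # Rough transfer times at the 9,600-baud line rate quoted above.
    LINE_RATE = 9_600        # bits per second
    BITS_PER_BYTE = 10       # 8 data bits plus start/stop framing, roughly

    def transfer_minutes(size_bytes):
        return size_bytes * BITS_PER_BYTE / LINE_RATE / 60

    examples = [("100-kilobyte coordinate file", 100_000),
                ("10-megabyte trajectory segment", 10_000_000)]
    for label, size in examples:
        print(f"{label}: about {transfer_minutes(size):.0f} minutes at 9,600 baud")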

The use of national supercomputers can be left to the discretion of individual scientists, as it is in this country, or the use of these resources can be mandated. The ability to mandate use depends on the type of economy and the pattern of interaction between scientists and the government. Australian scientists are in the midst of this type of central planning (personal communication, 1987, trip to Australia). The government wants scientists throughout Australia to use the centralized supercomputer, paying for that use with funds from the scientists' grants; the scientists see this as a form of taxation. Market forces in Australia will probably dominate when the scientists realize that superior computing and graphics performance can be obtained by purchasing a machine. Once a machine is in a department or laboratory, the problem of centralized national supercomputer access and allocation is essentially ended.

LOCAL AREA NETWORKS

Molecular modeling in the future will probably be done on local networks of computers and displays. For the past 5 to 10 years, advanced scientific laboratories have had one or more minicomputers. Five years ago, laboratory officials, for the most part, took the first hesitant steps to link these computers in a network. In the last two or three years, networking of laboratory computers has become much more common. Laboratory networks contain computers acting as terminal hosts and as computational servers for other workstations. The workstations range in power from the smallest PC to powerful PSCs. As these computers age and either stop working or become too expensive to maintain, they will be replaced by networks of a variety of computers and displays.

DATA BASE USE

Access to molecular structure and sequence data bases through global communications networks is an opportunity that will be available in the near future. Currently, most data bases are updated by magnetic tape every three to six months, including the DNA sequence data bases at the Los Alamos National Laboratory and at the European Molecular Biology Laboratory (EMBL) in Heidelberg, the protein sequence data base at the National Biomedical Research Foundation (NBRF) in Washington, D.C., the protein structure data base at the Brookhaven National Laboratory, and the small organic molecule crystal structure data base at Cambridge University. Generating tapes for institutional and individual scientific users is becoming an increasing burden for the data base operators. The global scientific networks are organized in such a way that the data base operators can send out one copy of an update and have that copy spread throughout the entire scientific community.

For those scientific users who need a particular molecular structure data set for display or further modeling, the global scientific networks are ideal sources of information. Only recently, the Brookhaven protein structure file was tested at the National Research Council in Ottawa. A simple mail request to a BITNET server at the National Research Council produced one or more of the protein structure data sets in a few minutes.

The small organic molecule crystal structure file from Cambridge University in England is being used by scientists for molecular modeling and calculation. The Cambridge crystal file provides an ideal data source for ligand conformations. The data file and a search program have been available on the international commercial computer network for the past 15 years. Technology moves so fast that even while this report was being prepared, the picture of data base distribution changed. For several years, 5¼-inch laser disks have been on the market for audio. Now this highly developed consumer technology has been applied to the storage and retrieval of molecular structure data. Each laser disk, which costs about $2,000 to master and $10 to reproduce, can hold a complete update of the DNA sequence, protein sequence, protein structure, and small molecule data files. The laser disk and associated software will be produced by a small start-up company associated with the University of Wisconsin (Fred Blattner, DNAstar, Inc. at the University of Wisconsin, 1987, personal communication).
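Using the mastering and reproduction figures quoted above (the subscriber counts are illustrative assumptions), the per-copy economics of laser disk distribution look like this:

    # Per-copy cost of a laser disk update, from the figures in the text:
    # about $2,000 to master a disk and about $10 per reproduced copy.
    MASTERING_COST = 2_000   # dollars, one-time
    COPY_COST = 10           # dollars per copy

    def cost_per_copy(n_copies):
        return (MASTERING_COST + COPY_COST * n_copies) / n_copies

    for n in (10, 100, 1_000):       # illustrative subscriber counts
        print(f"{n:5d} copies: ${cost_per_copy(n):7.2f} per subscriber")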

COMPETITIVENESS

America has a world-recognized ability to transfer ideas from their development in an academic setting into practice through the formation of small commercial enterprises. Then, by the infusion of capital in several stages, these small companies can be transformed into stable industrial corporations. These corporations are then able to absorb the supply of trained scientific personnel produced by the universities. The position of the United States in the world economy is changing very dramatically at present, and it certainly will continue to change in the next 5 to 10 years. Our overall competitiveness will be determined by our ability to form links between previously separate activities. It is already clear that the success of biotechnology, as an offshoot of our national expertise in molecular biology, will be increasingly determined by the way we use computers in computational chemistry, macromolecular modeling, and the design of proteins. We are in the midst of two revolutionary tendencies: genetics and silicon. Computational chemistry is the glue that will bring these tendencies together in a stable form.

Copyright © National Academy of Sciences.