O PRIMER

A MANUAL FOR BI/CH 170: PRINCIPLES OF PROTEIN STRUCTURE

by

Jed Pitera and Pamela Bjorkman (1994)

with comments and assistance from

Art Chirino,

Dan Vaughn,

Yonglin Hu,

and David Mathog

and advice from Andrew Huber and Wim Burmeister

based on the 1993 TOM Primer by David Taub and Pamela Bjorkman (1990)

with additions and improvements by Atiya Hakeem (1991),

Jonathan Bradley (1992), Andrew Huber (1992), and Michael Stowell (1992)

and much advice from Art Chirino (1990-92)

Division of Biology

Caltech

"What is the secret of life?", I asked.

"Protein," the bartender declared. "They found out something about protein."

Kurt Vonnegut, Cat's Cradle

Love hides in molecular structure.

Jim Morrison, Love Hides; from Waiting for the Sun

Table of Contents

INTRODUCTION

Chapter 1 -- Basics of the Silicon Graphics Iris

1.1) Logging on

1.2) Setting up your window

1.3) Opening windows

1.4) Changing your password

1.5) Logging on from other computers

1.6) More UNIX help

Chapter 2 -- Using O to look at a model structure

2.1) Before Starting O

2.2) Loading a database containing a sample peptide structure

2.3) Displaying your peptide

2.4) Using the mouse in the graphics window

2.5) Viewing your protein in stereo

2.6) Using the Move commands for docking

2.7) Databases, Saving, Stopping and Starting

2.8) Trouble-shooting: what to do if O won't run

Chapter 3 -- Brookhaven coordinate files and UNIX commands

3.1) UNIX commands for file manipulation

3.2) Let me out!

3.3) Special commands for Brookhaven coordinate files

3.4) Loading Brookhaven files using the Sam commands

3.5) Renumbering a coordinate file

3.6) Writing out a pdb file

Chapter 4 -- Displaying a protein structure using O

4.1) Selecting a molecule and building an object

4.1a) Carbon-α objects.

4.1b) Cover_sphere and Sphere_centre.

4.1c) Deleting objects

4.2) Making more complex objects by selection

4.3) Colo(u)ring things

4.3a) Coloring by object.

4.3b) Coloring by molecule.

Chapter 5 -- More O Tricks

5.1) Graphics shortcuts

5.2) Command shortcuts

5.3) Macros

5.4) Some important macros

5.4a) Making van der Waals and solvent-accessible surfaces.

5.4b) Quickly loading and displaying molecules.

5.4c) A rainbow Cα object.

5.4d) Coloring by standard atom colors.

5.4e) Secondary structure analysis.

5.4f) Changing the Save filename.

5.4g) Making a Ramachandran plot.

5.5) Making pretty pictures using the Sketch commands

5.5a) Drawing ball-and-stick objects.

5.5b) Drawing solid objects.

Chapter 6 -- Superimposing Structures by Least Squares

6.1) Generating a transformation with Lsq_explicit and Lsq_improve

6.2) Transforming objects and molecules

Chapter 7 -- Modifying Structures

7.1) Mutating, inserting and deleting

7.2) Cleaning up mutations with refi_zone and the Lego commands

7.2a) Lego_side_chain and lego_loop.

7.2b) Refi_zone.

7.3) Building a hypothetical structure

7.3a) Using Sam_Init_DB to create space for a de novo structure.

7.3b) Assigning de novo coordinates.

Chapter 8 -- Displaying DNA

Chapter 9 -- Saving Pictures as "SNAPSHOTS"

9.1) Using the snapshot program

9.2) Plotting your image from O

9.3) Taking photos of the screen

Appendix A -- Example

Appendix B -- Description of some of the files in your directory

Appendix C -- Some useful UNIX commands

INTRODUCTION

Greetings, welcome to O, a program that allows you to display, rotate, and manipulate three dimensional structures of proteins and DNA. An ancestor of O was "Frodo," a modeling program that used the Evans and Sutherland PS300 series of graphics terminals supported by a VMS VAX. Purportedly, O is so called because O is the last (letter) of "Frodo." The version of O you will use for this course runs on a Silicon Graphics personal IRIS computer that uses the UNIX operating system. Previously, this course used the software TOM, another "Frodo" derivative substantially modified and enhanced by Art Chirino and Mark Israel.

This manual has been written assuming you know very little about computers. If you are familiar with UNIX computers, you can skip chapter one. If you have used O before, you can skip this whole manual, except for section 3.3 about Brookhaven files.

The commands for displaying structures that you will read about have been simplified for the computer impaired, so if you presently know nothing about computers, after reading this manual you will still know nothing. For those of you who know something about UNIX, and are a little more interested in what is going on "behind the scenes", you will find most of the non-standard commands described in section 3.3 set up in a file called .alias.

For the sophisticated, a manual for O was written by its programmers, and you can find it somewhere in room 158 Braun. You can also access it on the computer with the UNIX command "help". The manual you are reading now is not meant as a replacement for the "real" O manual. It's meant to teach a novice what he/she needs to know to look at protein and DNA structures. Another source of information about O is the tutorial, O for Morons, written by Gerhard Kleywegt and also present somewhere in 158 Braun. That tutorial, like the primer you hold in your hands, provides an introduction to some of the display and manipulation capabilities of O.

Chapter 1

BASICS OF THE SILICON GRAPHICS IRIS

1.1) Logging on

You may use any of the four Silicon Graphics Iris (SGI) computers in 158 Braun: "citpig", "covalent", "goose", or "pi". Don't use the machine named "howie", as it does not have all of the hardware necessary to run O. This manual usually refers to citpig, but the other SGIs work the same way. You are now sitting in front of the computer terminal. You should see before you:

1) A big screen, that is probably blank.

2) A keyboard

3) A strange looking thing with eight large dials on it, in two rows of four, running vertically. This should be near the screen, probably just to the left of it.

4) Another strange looking thing lying flat, with countless buttons on it. This should also be near and left of the screen.

5) The mouse. No, this actually looks nothing like a mouse, but it is called a mouse. It should be just to the right of the keyboard, on a small reflective or rubber mat. It's small, about hand sized, and should have three long thin buttons on it.

If you cannot identify any of these objects, just ask anyone else nearby.

Now, gently move the mouse around on the reflective mat. The screen should no longer be blank. What you should see is a box in the center of the screen containing a rather nice picture of some engrailed homeodomains associated with DNA and the words "Structure Analysis Facility." If you move the mouse around a little more, you should see a red arrow moving around the screen. The mouse controls this arrow, and this arrow will be one of your main methods of selecting options. Take a few seconds right now to just move the arrow around to get familiar with controlling it. You will notice that if you pick the mouse up off the mat, the arrow no longer moves.

Now that you are a mouse professional, take a look at the box to the right of the word USERNAME. You will notice there is a vertical line or "cursor" in the box. Type in your USERNAME, all in lower case (UNIX is case sensitive, so "Anybody" is not the same as "anybody"). Usually your USERNAME will just be your last name, unless you have a really common surname, in which case it will be supplemented by an initial or two. Once finished typing your username, press the key on the keyboard marked "Enter". Under USERNAME should appear PASSWORD followed by another box. If this doesn't happen, repeat the above process until you get it right.

Now you should type your password on the keyboard. Your password will be some randomly assigned unpronounceable concatenation of symbols, numbers, and upper and lower case letters. If you forget your password, talk to the system administrator for these machines, David Mathog. His office is in a little room just to the right as you enter 158 Braun.

Now, one of two things should happen. Either the computer will start doing things, and you have successfully logged on, or the PASSWORD box and your name will disappear, and you have failed to successfully log on. It will be very obvious to you whether or not you have been successful. If you fail, then try the above steps again; you may have just mistyped your password. Also, make sure you type your password with letters in the appropriate case.

If you still cannot log in, then complain to David Mathog, or to a TA, who will go and complain to David Mathog. Eventually the problem will be solved.

1.2) Setting up your window

When the computer is done logging you in, the screen should be mostly empty except for the upper left hand corner. There should be several greyish horizontal bars, labeled SYSTEM, WINDOWS, TOOLS, DEMOS, and OVERVIEW. These bars should be left alone unless you know what you are doing. Next to the bars there will be a small detailed picture of a computer with the word CONSOLE written across it. This is called the console window icon. First try opening your console window. To do this, place the red arrow on the console icon, and press the left button on the mouse. A window should appear in the lower left part of the screen. This will be a large grey rectangle containing the word "Console" at the top. In your window, you will notice that the last line is:

citpig%

This is known as your prompt. To the right of the percent sign, there will be a green rectangular cursor. If the red arrow is in the window, the cursor will be solid green, and if it is outside of the window, the cursor will only be outlined in green. The red arrow must be somewhere in the window for you to be able to type in it.

You can move your window to the place on the screen that makes you happiest. To do this, simply move the red arrow to the window. You can now "grab" the window by pressing the center button on the mouse. Then, while holding the button down, you can move the window to a more convenient position. When you release the center button, the window will be placed at the new position.

You also have the ability to change the size and shape of your window. To resize your window, move the red arrow to one corner of the window. When the arrow is in the correct position it will become a smaller arrow plus a corner-shaped symbol. You can "grab" the corner of the window now by pressing the left button on the mouse. The window will change size as you move the mouse while holding down the button. When you have selected a more suitable size to fit your needs, release the button. You should now practice moving and resizing the window, just to become familiar with it. Gee, wasn't that fun.

There is a way to look at text that has been pushed off the top of the screen. If text has scrolled off the top of the window, then when you look at the left vertical border of your window, near the bottom you will notice two things. The first is a pair of arrows pointing up and down, and the second is a vertical bar directly above these arrows. If you position the red arrow on the left vertical bar and press the left button on the mouse, you will "grab" the bar. If you now move the bar towards the top of the window, while holding the left button, you will notice that you move all of the text back down the window, and you are now able to go back to things that were printed a while ago. When you let go of the button the bar, and text, will stay where they are. But, in order to see what you are typing, you will need to return the bar to the bottom position.

You can also cut and paste text in a window. Move the mouse pointer to the beginning of the text you want to copy, then press and hold the left mouse button. Drag the mouse pointer to highlight the text, then let go of the left mouse button. A press of the middle mouse button will now paste a copy of the highlighted text at the position of the solid green cursor. This is a good way to avoid typing the same thing over and over again -- just type it once, then cut and paste!

In the upper right corner of the window, you will notice a tiny outlined square, to the left of a larger outlined square. If you click the left mouse button in this tiny square, your window will turn back into the console icon. You can then turn the icon back into your window the same way you did before.

To log out, first type "logout" or "exit". Either will close your window. Then, simply position the red arrow against any background space, not in a window nor on any icons, and click the right button on the mouse. A small window will appear with the word logout in it. While holding the button, move the red arrow onto the word logout, which should then become highlighted. If you then release the button, another small window will appear with two options in it. One is "yes, really" and the other is "no, not really!". Move the arrow onto the choice you wish to make, either "yes" to complete the logout, or "no" to cancel the logout (which will return you to where you were before you clicked on logout). Moving the arrow over the appropriate phrase will highlight it, and pressing the right button will select it. This is the only way you can log out from this terminal. Some of you may try to just type "logout"; this will close your window, but will not completely log you out. To completely log out, you must use the above method. Please do not leave the terminal without completely logging out. Failure to do this may result in your computer privileges being revoked.

You can be sure that you have successfully logged out once the "Structure Analysis Facility" screen that you saw when you first sat down reappears.

1.3) Opening windows

The console window is where the system often writes error and information messages. It is best if you do your work in a different window. Also, you will sometimes want more than one window to work in. Perhaps you want to display a structure with O while simultaneously paging through a text file. You can get an additional window to type commands in, called a "shell," by moving the red arrow to the bar marked "Tools" in the upper-left corner of the screen. Press and hold the left mouse button, and a box with multiple entries appears. Move the arrow through this menu, still holding down the mouse button, until the entry marked "Shell" is highlighted. Release the mouse button to select a "Shell" and the menu disappears. Shortly thereafter, the red outlines of a window appear at the location of the red arrow. Move the mouse around, and the outline window follows. Get this outline where you want it, and press the left mouse button to place the new window there. The outline disappears and a new window of similar dimensions pops up in its place. This window, titled "Winterm," contains a citpig% prompt from which commands can be typed just like the console window. The number of shells you can have open at one time is much larger than the number you can use effectively. You can move, resize, and iconify or de-iconify these windows to control desktop clutter, just like the console window.

1.4) Changing your password

To change your password, talk to David Mathog in 158 Braun. You should not change your password yourself. Be forewarned that your new password will still be randomly assigned, and probably just as hard to remember as the last one. DO NOT FORGET YOUR PASSWORD!

1.5) Logging on from other computers

For security reasons, we request that you not log into the SGI computers over the campus network. As the course is set up, you shouldn't actually have any reason to do so. In particular, O will not run across the network. If you need to transfer files to or from these machines, do so by connecting from the SGIs to the remote computer, and not vice versa. Alternatively, you can transfer files via e-mail.

1.6) More UNIX help

If you have never used UNIX before, or are not very familiar with it, there is a helpful UNIX tutorial in the back pages of O for Morons. You should be able to find a copy of O for Morons in 158 Braun.

Chapter 2

USING O TO LOOK AT A MODEL STRUCTURE

2.1) Before Starting O

The main program that you will be using is called O. This program takes a list of x,y,z coordinates for the atoms in a protein or DNA molecule and displays images of them. In this chapter, you will be introduced to the program with an artificial peptide structure consisting of fifty alanines (in an idealized α-helix) and twenty leucines (as a β-strand). You will also use O to examine and display two α-helices, a parallel β-sheet and an anti-parallel β-sheet.

The very first time you want to run O, some files O needs have to be copied to your directory. The command "setup O_files" does this automatically. To execute a command, type it in at the appropriate prompt and press the Enter key (in this primer, a press of the Enter key is written as <Enter> when it is not obvious). If you do not follow the command with an Enter, it will not execute. Just so:

citpig% setup O_files<Enter>

You should only need to run the above command once. However, O isn't ready to run yet. Before starting O, some variables directing the software to look in the appropriate directory for its data files need to be defined. This is done by executing the "setup O" command. Unlike the prior "setup" command, you need to execute this command once in each window where you wish to run O.

citpig% setup O<Enter>

typing ono will start O

citpig%

Unfortunately, the "setup" commands can't tell O everything it needs to know, so you will need to start O with another file called a saved database -- just typing ono will not work.

2.2) Loading a database containing a sample peptide structure

There should be a file in your directory titled 170_example1.o. It contains a saved O session where the coordinates describing a model peptide were loaded into O and then used to make a number of display "objects." An object is a visual representation of a molecule whose coordinates have been loaded into O. A file containing a saved O session is referred to as a saved database.

First, make a backup copy of this file using the cp command. You should always be careful to make backup copies of any important files you use. Then go ahead and start O using this saved database:

citpig% cp 170_example1.o 170_ex1_backup.o

citpig% ono 170_example1.o

O > Use of this program implies acceptance of conditions

O > described in Appendix 1 of the O manual

O > O version 5.9.1 , Tue Aug 10 18:33:27 MET DST

O > Loading 170_example1.o

O > Maximum inter-residue link distance = 2.00

O > There were 23 residues.

O > 175 atoms.

O > Do you want to use the display? [Yes]: Yes

O > Graphics board GL4DPIT-4.0

O > O > Trackball on (F7KEY)

After you request to use the display, a large window will appear on the screen. This is called the graphics window. All of the objects you create with O will be displayed here.

The 170_example1.o database contains one "molecule", a poly-alanine peptide named "ALA" and numerous objects created from that peptide (ALACA, ALALL, YASSPA). The molecule cannot be displayed on the screen; the objects can. The object is effectively a "snapshot" of the molecule -- it includes all the atoms and color information possessed by the molecule at the moment of the object's creation. Later modification of the object -- moving it, changing its color, and so forth -- will not modify the molecule. Modifying the molecule itself makes the object "out of date" -- the snapshot no longer shows the current state of the molecule. Usually, after making any major modifications to a molecule, you should re-define all of the objects affected by the changes. Failing to do so will result in a misleading display. The objects in question are also often strangely distorted. This dichotomy between the "molecule" (coordinates describing the structure) and the "object" (a three-dimensional graphical image, usually created using a "molecule's" coordinates) is rigidly adhered to in O, and takes a little getting used to.
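The molecule/object split described above behaves much like taking a copy of a data structure. The following Python sketch is only an analogy and makes no claim about how O is implemented; the Molecule class, the make_object function, and the coordinates are all hypothetical.

```python
import copy


class Molecule:
    """Working coordinates keyed by atom name (hypothetical layout)."""

    def __init__(self, name, coords):
        self.name = name
        self.coords = dict(coords)


def make_object(molecule):
    """Return a display 'object': a frozen copy of the molecule's coordinates."""
    return copy.deepcopy(molecule.coords)


ala = Molecule("ALA", {"CA": (0.0, 0.0, 0.0), "CB": (1.5, 0.0, 0.0)})
alall = make_object(ala)            # snapshot taken at this moment

ala.coords["CB"] = (1.5, 1.0, 0.0)  # the molecule is modified afterwards

# The object still shows the old CB position, so it is out of date.
stale = alall["CB"] != ala.coords["CB"]
print("object out of date:", stale)

alall = make_object(ala)            # re-define the object to catch up
print("object out of date:", alall["CB"] != ala.coords["CB"])
```

The second print reports that the rebuilt object agrees with the molecule again, which is the reason for re-defining objects after any major change.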

Take a look at the menu on the right hand side of the graphics window. Towards the end of the menu, below the On_Off command, are the names of the objects, in uppercase and preceded by a caret (^) sign. Try using the leftmost button to select one of these items: ^ALACA, ^ALALL, ^YASSPA. The name changes color, shifts to lowercase, and the associated object disappears. Clicking on the object name in the menu toggles display of the object on or off. Equivalently, you can type "^object-name." The three objects available are ALACA, a carbon-α trace, colored in a gradient of red to blue; ALALL, an object containing all the atoms in the peptide colored by atom type; and YASSPA, a carbon-α representation color-coded by the secondary structure of the peptide.

2.3) Displaying your peptide

You are now ready to look at your peptide. Clear away all the objects except the ALALL object. Now take a closer look at the graphics screen. Somewhere near the center you should see a multicolored object. That is your peptide. On the right edge of the screen you should see a long list of words. This list is known as the menu and consists of a series of interactive commands which are chosen by the mouse. More about them later. The upper left corner of the screen contains several lines of text. The first is a reminder (in yellow) that you're using O, including the date and time (in case you work so hard you forget what day it is). Beneath that, there is the word "Prompt:" in red. If a command requires you to carry out some action, a message will appear on this line. Next is a line labeled "Info:." This is where information about the currently selected atom will appear. The names of currently active commands will appear beneath this, in purple. In the lower left corner of the screen you should see something that looks like this:

Zoom Slab

Rot z Trans z

Rot y Trans y

Rot x Trans x

These correspond to the eight dials on that thing with dials mentioned earlier (savvy folk tag this the "dial box"). The dials are all pretty easy to figure out. Right now you should turn the zoom dial until your peptide is a better size for viewing. When the protein is bigger and easier to see, you should notice that all of the carbon atoms are yellow, the oxygens are red, and the nitrogens are blue. This is one coloring scheme for protein displays, but later you will learn to change the colors to suit your needs. Feel free to play around with the three view angles you can adjust. These dials will rotate the peptide in space. Sometimes, various commands (like those for moving around a protein or peptide relative to another) will assign different functions to these dials. We'll cover that when we get to it.
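For readers curious about what a rotation or zoom dial does to the coordinates being drawn, here is a minimal Python sketch. It assumes a plain rotation about the y axis and a uniform scale factor; the function names and the sample point are made up for illustration, and O's own viewing code may work quite differently.

```python
import math


def rotate_y(point, degrees):
    """Rotate an (x, y, z) point about the y axis by the given angle."""
    a = math.radians(degrees)
    x, y, z = point
    return (x * math.cos(a) + z * math.sin(a),
            y,
            -x * math.sin(a) + z * math.cos(a))


def zoom(point, factor):
    """Scale a point about the origin; factor > 1 makes the picture bigger."""
    return tuple(factor * c for c in point)


atom = (1.0, 2.0, 0.0)           # hypothetical atom position
turned = rotate_y(atom, 90.0)    # a quarter turn of the "Rot y" dial
bigger = zoom(atom, 2.0)         # the "Zoom" dial turned up

print([round(c, 3) for c in turned])
print(bigger)
```

Note that only the view changes in this picture of things: turning the dials does not rewrite the atom coordinates stored in the molecule.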

It is usually most convenient to have the text window where you're running O sticking out from behind the graphics window so you can click with the left mouse button on either window "frame" to jump between them. If your graphics window covers your text window, don't panic. With the mouse pointer somewhere in the graphics window, press the "Alt" (left of the space bar) and "F3" (top row) keys at the same time. Try this combination now, and you will see your window pop up over the graphics screen. This trick should work with any window. One nice feature of O is that it doesn't care where you type your commands, so long as it's in one of the O windows. The graphics window is equivalent to the text window for typing, although it's much easier to tell what you're doing in the text window. Just like UNIX, you need to follow each O command you type with a press of the Enter key, or it will not execute.

Sometimes O commands can accept either typed input or input from the mouse. In these cases, you have to type all the input on one line, before pressing Enter, or O will assume you want to use mouse input. Whenever O expects mouse input, the red "Prompt:" line in the upper left corner of the graphics window will show a message telling you so.

2.4) Using the mouse in the graphics window

While in graphics display mode, the mouse can be used for a number of functions. If you orient the red arrow over any particular atom, and press the left button on the mouse, that atom will be identified. A small letter and number will simultaneously appear in red beside the atom, and in yellow at the top left of the screen. These will correspond to the atom type, and the residue number to which it belongs. This is called "IDing" the atom. Also note that following the residue number at the top of the screen is some additional information, including the molecule name, residue name, residue type, current atom, XYZ coordinates, temperature factor and Z values. (The temperature factor of an atom is a crystallographically determined value related to the atom's mobility; the Z value is the occupancy, another crystallographic value.) Some commands will request input by asking you to "ID" an atom.

All along the right edge of the screen are the menu commands. This list of commands was set up specially for the course -- other users of O may have entirely different menus. The menu commands may be selected by placing the red arrow over one, then pressing the center button on the mouse. This initiates the selected command. However, many of the commands require you to do something further before they complete their functions.

Towards the bottom of the menu, there are the previously mentioned commands that correspond to the current display objects. These consist of the object name, prefaced by a caret (^YASSPA, for example) and provide an easy way to turn the display of objects on and off. When one of these entries is capitalized and colored green, the associated object is being displayed by O. When it is lowercase and colored red, the object is not being displayed. Clicking on the command toggles it between these two states. Be aware that you may not see a displayed object because it is not in view of the screen (not centered) or because it is hidden by another object overlapping it (which often produces strange, mixed colors). Try it. Turn the various representations of ALA (ALACA, ALALL, YASSPA) on and off to get a good look at each one individually. Later, when you are building objects yourself, they will show up in this list. Occasionally, O commands will create objects on their own which will also show up here (and behave normally).

Also, there are several commands in the menu that will give you information about the relations of two or more atoms. These are:

Dist_Define

This will print the distance in angstroms between the next two atoms chosen by the mouse.

Angle_Define

This will print the angle defined by the next three atoms chosen by the mouse.

Phi_Psi

This will print the phi and psi angles for the residue containing the next atom chosen by the mouse.
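The distance and angle measurements above are simple vector arithmetic on the atoms' x,y,z coordinates. As a rough illustration (not O's actual code), the Python sketch below computes both; the function names and the three atom positions are invented for the example.

```python
import math


def distance(a, b):
    """Distance between two points, in the same units as the coordinates."""
    return math.dist(a, b)


def angle(a, b, c):
    """Angle in degrees at the middle atom b, for the atoms a-b-c."""
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    dot = sum(p * q for p, q in zip(v1, v2))
    cos_t = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp against rounding error before taking the arc cosine.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


# Hypothetical coordinates in angstroms, chosen to give round answers.
n = (0.0, 0.0, 0.0)
ca = (1.46, 0.0, 0.0)
c = (1.46, 1.52, 0.0)

print("N-CA distance:", round(distance(n, ca), 2))
print("N-CA-C angle:", round(angle(n, ca, c), 1))
```

The order of picking matters for the angle: the second atom chosen is the vertex.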

These commands only measure the distance or angles once -- if you move atoms around relative to one another (e.g., during docking of two molecules), the value doesn't get measured again unless you select the following command:

Trig_Refresh

This will activate continuous updating of trigonometric (distances, angles) measurements.

Trig_Refresh is a good example of a special type of command called a flag -- the command stays active once selected until it is cleared. You can tell what commands are active as their names appear in purple in the upper left-hand corner of the graphics window, beneath the yellow information line.

There are several ways of clearing command selections and their effects. If you wish to unselect active commands, or clear the protein of the information from certain commands, select:

Clear_Flags

This will turn off all active commands, but leave the atom IDs.

Clear_ID

This will clear all atom IDs.

Trig_Reset

This will clear all trigonometric objects (distance and angle measurements and labels).

Other useful commands on the menu include:

Centre_ID

This command will recenter your protein on the next atom you select with the mouse.

Save_DB

Saves the present database under the appropriate filename, as discussed in Section 2.7.

Yes/No

Some commands will prompt for yes or no responses. In these cases, you can click on yes or no instead of typing.

Also, other trigonometry commands are:

Hbond_all

Creates an object named HBOND showing hydrogen bonds between all atoms in the selected zone.

Hbond_mc

Creates an object named HBOND showing hydrogen bonds between main chain atoms in the selected zone.

Neighbour_atom

Shows all atoms within 3.5 angstroms of the picked atom (and prints distances).

Neighbour_residue

Shows the neighbour atoms of all atoms in a residue, and associated distances. Clutters display.
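A neighbour search of the kind Neighbour_atom performs can be pictured as a distance test against every other atom. The Python sketch below is a guess at the idea rather than a description of O's implementation; the atom labels and coordinates are hypothetical, and treating the 3.5 angstrom limit as inclusive is an assumption.

```python
import math

CUTOFF = 3.5  # angstroms, the radius quoted for Neighbour_atom


def neighbours(picked, atoms, cutoff=CUTOFF):
    """Return (label, distance) pairs for atoms within cutoff of the picked atom."""
    centre = atoms[picked]
    hits = []
    for label, xyz in atoms.items():
        if label == picked:
            continue  # an atom is not its own neighbour
        d = math.dist(centre, xyz)
        if d <= cutoff:
            hits.append((label, round(d, 2)))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical atoms, labelled "residue atom".
atoms = {
    "A12 CA": (0.0, 0.0, 0.0),
    "A12 CB": (1.5, 0.0, 0.0),
    "A13 N": (0.0, 2.4, 0.0),
    "A16 N": (0.0, 0.0, 5.0),
}

near = neighbours("A12 CA", atoms)
for label, d in near:
    print(label, d)
```

Running the same test for every atom of a residue, as Neighbour_residue does, produces many more lines, which is why it clutters the display.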

We now want you to study the helix in more detail, so center the display around residue 12 using Centre_ID. What you see is an idealized α-helix with all alanine side chains. Draw in some of the main-chain hydrogen bonds. To do this, use the menu command Hbond_mc and pick residues 1 and 25 to get the lines between the appropriate atoms. Then select Clear_ID to remove the atom IDs and make the idealized hydrogen bonds easier to see. O automatically finds all the main-chain hydrogen bonds in the zone you have selected. Notice that all of the hydrogen bonds have the same polarity. What consequence does this have for the helix? Notice also that all of the side chains point towards the N-terminus of the helix (in this case the side chains consist only of the carbon β atom (ID'd as CB) of the alanine methyl group).

As another way of studying the helix, you can examine its torsion angles, as well as those of the idealized β-strand. You can do this by using the following menu commands, Phi_Psi and Tor_Residue:

Phi_Psi

Selecting this command and then selecting a residue will cause the values for the phi-psi angles to appear in red in the upper left corner of the screen, above the residue information. Phi_Psi is a flag like Trig_Refresh -- it stays on until you issue a Clear_Flags command. You can use this command to check how the phi-psi angles differ for residues in a helix versus an extended strand, and later on when you look at a display of a real protein, you can look at phi-psi angles in turns. In Chapter 5, we will cover how to generate a Ramachandran plot for a structure.

Tor_Residue

This command allows you to rotate about a specified torsion angle, but can also be used to examine the values of all torsion angles within a residue. (In addition to phi and psi angles, you may be interested in the chi angles of the side chains, and/or main chain omega angles to check the planarity of the peptide bond and if the bond is cis or trans.) To use, select Tor_Residue from the menu, then ID the residue of interest. The values of torsion angles associated with the residue will be displayed next to their respective bonds. Note that the dials' definitions change when you're using Tor_Residue to allow you to rotate around any selected angle. If you do rotations, select No to cancel any changes you have made, or Yes to rewrite your molecule to include these changes. Tor_Residue will only change the torsions of the selected residue in isolation: to cause larger scale rotations and translations involving multiple residues, you need to use the Move commands. One warning -- a bug in O prevents Tor_Residue from working properly with the carboxy-terminal amino acid of a polypeptide.
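Every torsion angle mentioned above (phi, psi, omega, chi) is a dihedral angle defined by four consecutive atoms. The Python sketch below shows one common way to compute such an angle from coordinates; it is an illustration, not O's own routine, and the helper names and test points are invented.

```python
import math


def sub(a, b):
    return tuple(p - q for p, q in zip(a, b))


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion(p1, p2, p3, p4):
    """Dihedral angle in degrees about the p2-p3 bond, in the range -180 to 180."""
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# The four atoms that define each backbone torsion of residue i:
#   phi:   C(i-1), N(i),  CA(i),  C(i)
#   psi:   N(i),   CA(i), C(i),   N(i+1)
#   omega: CA(i),  C(i),  N(i+1), CA(i+1)

# Made-up points: the fourth atom is placed cis, then at a right angle.
p1, p2, p3 = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
print(round(torsion(p1, p2, p3, (1.0, 0.0, 1.0)), 1))   # cis arrangement
print(round(torsion(p1, p2, p3, (0.0, 1.0, 1.0)), 1))   # quarter turn
```

An omega value near 180 degrees indicates a trans peptide bond, and a value near 0 a cis one.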

2.5) Viewing your protein in stereo

It is possible to view your protein in stereo. This works by displaying two views of the protein, like those pictures you may have seen in some of your biology textbooks. If you cannot see in stereo by crossing your eyes or by looking "wall-eyed", there are special glasses lying around the room to help you. These look like large black welding goggles with a dial on top. To switch back and forth between stereo mode and normal view, press the key in the top row of keys on your keyboard marked "F9."

This should produce two copies of your protein on the screen. They may both be controlled by dials the same way your single protein was. The dial on top of the glasses is used to bring the two images into focus. You can increase or decrease the stereo separation with the up and down arrow keys, respectively. Also, you can alter the depth of the stereo image with the left and right arrow keys. Movement is often helpful in seeing the image three dimensionally. You can set the image spinning for a short period of time by using the command spin.

If, after some practice, you still cannot see the image three dimensionally, ask someone who has gotten the hang of it for some help. If that still fails, then you may try a TA.

2.6) Using the Move commands for docking

When looking at real proteins (as opposed to this example peptide), you may sometimes want to move part of the protein with respect to another part. For example, you may be working with a protein known to bind peptides, so you could try to dock a peptide of a given sequence into the protein's peptide binding site. To dock the peptide into the binding site, you need to move it independently from the rest of the protein. The interactive option that allows you to move some of the atoms independently from the rest is called Move_zone. You will now be taken through a small exercise to use Move_zone and the other Move commands with your example peptide.

Take a look at the peptide, and choose a part of it you would like to move around. Once you have decided on a fragment, click the mouse on the interactive command Move_zone. Then ID atoms at either end of the fragment you want to move. The bonds on either side of the fragment will disappear. While they are gone, you can translate and rotate this piece independently of the rest of the molecule. Do not hit Yes or No until you are done moving the peptide. You should notice that the dial settings change while the fragment is highlighted. The left hand dials remain the same, but the lower 3 dials on the right side are now labeled FragMove Z, Y, and X. These will translate only the selected zone. How can you rotate the fragment? There are other assignments for the dials that you can choose from, and the menu commands Dial_next and Dial_previous let you cycle through them. Select these a couple of times, and watch the dial assignments in the lower left hand corner of the graphics window change. When you have moved the zone to a place you like, click the mouse on Yes to accept the change and write out the new coordinates, or No to return the protein to its original position. WARNING: You will over-write your O database file with new coordinates if you select Yes and then Save_DB. This is fine if you really want to make a permanent change in your coordinate file, but you may want to keep a back-up copy of your saved database file if you are going to make changes. If you select Yes without ever selecting Save_DB (or quitting with Stop, which also saves the database), the coordinates will not be permanently altered.

Now take a look at the alpha-helical portion of the peptide. Orient it vertically, and zoom in until only the helical section is visible. Now move the bottom half of the helix so that it is alongside the top half of the helix. With some skill and practice, you could orient the two helices at the angles at which two helices commonly cross each other in proteins. Or you could try to make a parallel coiled coil. Later on, when you learn to make substitutions in files, you may want to go back to this sample peptide and change residues so that the two helices could be part of a leucine zipper, then try to move them with respect to each other to model a leucine zipper.

Once you feel quite satisfied with α-helices, zoom in on the β-sheet residues. Now this next part is a little complicated, so don't worry if you can't seem to quite get it. Orient the β-strand vertically, N terminus up, such that all the carbonyls are in the plane of the screen, and every other side chain points in the opposite direction. Select the bottom half of the strand with Move_zone. Do not use "FragRot X" or "FragRot Y"; use only "FragRot Z" to flip the fragment 180 degrees. Use "FragMove X" and "FragMove Y" to move the fragment next to the other one, and orient it in such a way as to make the correct hydrogen bonds in an antiparallel β-sheet. If you succeed in making the two-stranded antiparallel β-sheet, try to make a two-stranded parallel sheet, and notice that the hydrogen bonding pattern is quite different. If you can't do either, at least notice that all the carbonyls and nitrogens are in one plane, with the side chains (all leucines in this case) pointing either towards you or away from you (perpendicular to the plane of the carbonyls and nitrogens). Remember that a real β-sheet is twisted, due to the inherent twist in an individual β-strand. Immunoglobulins consist almost entirely of β-sheet structure, and later on you can look at the sheets in an immunoglobulin domain to observe their twist.
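To see why a 180 degree turn about Z alone reverses the direction of a vertical strand while keeping it in the plane of the screen, here is a small Python sketch of the geometry. It is only an illustration, not what O does internally: the function name rotate_about_z is hypothetical, the three positions are made up, and the choice to rotate about the fragment's own centroid is an assumption.

```python
import math

def rotate_about_z(points, degrees):
    """Rotate (x, y, z) points about a z-parallel axis through their centroid.

    Assumption: the rotation centre is the centroid of the fragment.
    """
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    theta = math.radians(degrees)
    c, s = math.cos(theta), math.sin(theta)
    rotated = []
    for x, y, z in points:
        dx, dy = x - cx, y - cy
        # Standard 2D rotation in the screen plane; z is left untouched.
        rotated.append((cx + c * dx - s * dy, cy + s * dx + c * dy, z))
    return rotated

# Made-up positions along a vertical strand, listed from bottom to top.
strand = [(0.0, 0.0, 0.0), (0.0, 3.3, 0.0), (0.0, 6.6, 0.0)]
flipped = rotate_about_z(strand, 180)
```

After the rotation the first point sits where the last one was, so the fragment now runs in the opposite direction, and every z value is unchanged. Rotating about X or Y instead would tip the fragment out of the screen plane, which is why the exercise tells you to avoid those dials.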

Other Move commands are Move_atom, which permits translation of the selected atom; Move_object, which permits rotation and translation of an entire object without distorting the structure (and without changing the coordinates of the underlying molecule); and Move_fragment, a command normally only used in model building and manipulation. A final command is Flip_peptide, which rotates the selected peptide bond 180 degrees about the Cα-Cα axis. This command is also normally only used in model building, but fun to experiment with nevertheless. For further descriptions of these commands, consult the O manual.

2.7) Databases, Saving, Stopping and Starting

O stores everything you load into it, and all the objects and information you have it generate, in a structure called the database. Everything important is there, from the peptide sequence and xyz coordinates for every protein you load, to descriptions of all of the graphical objects you have O generate, to directions telling O where to look for important files. You have been working with a database specially prepared for you, containing the molecule ALA and some objects made from it.

You can take a look at the entries in the database, called "datablocks," by using the directory command. Remember, your red mouse arrow needs to be in the window where you started O in order for what you type to show up. This command works a lot like the UNIX command "ls" -- to get a listing of all datablocks in the database, you would type

O > directory *

You may need to use the scroll bar on the left side of your window to look at all the lines of such a listing. It can get huge, especially if you've loaded many different molecules into the program. You can also pare down the listing by judicious use of that *, or wildcard. Maybe you've loaded a molecule named ala. To find all the entries associated with ala, you would type

O > directory ala*

which yields a listing of all the database entries that have names starting with ala:

O > directory ala*

Heap> ALA_ATOM_XYZ R W 5439

Heap> ALA_ATOM_B R W 1813

Heap> ALA_ATOM_WT R W 1813

Heap> ALA_ATOM_Z I W 1813

Heap> ALA_ATOM_VISIBLE I W 1813

Heap> ALA_ATOM_SELECT I W 1813

Heap> ALA_RESIDUE_NAME C W 416

Heap> ALA_RESIDUE_TYPE C W 416

Heap> ALA_ATOM_NAME C W 1813

&c.
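If the wildcard is new to you, the following Python sketch mimics how a pattern such as ala* picks names out of a list. It uses Python's standard fnmatch module rather than anything from O. The function name directory is borrowed only for illustration, the last name in the list is an unrelated entry added for contrast, and the case-insensitive comparison is an assumption based on the fact that typing ala* above matched the upper-case ALA_ entries.

```python
from fnmatch import fnmatchcase

# A few datablock names from the listing above, plus one unrelated name.
datablocks = [
    "ALA_ATOM_XYZ",
    "ALA_ATOM_B",
    "ALA_RESIDUE_NAME",
    "UDPA_RESIDUE_TYPE",
]

def directory(pattern, names):
    """Return the names matching a wildcard pattern, ignoring case."""
    pat = pattern.upper()
    return [name for name in names if fnmatchcase(name.upper(), pat)]

ala_entries = directory("ala*", datablocks)            # names starting with ALA
residue_entries = directory("*_residue_*", datablocks)  # names containing _RESIDUE_
```

The * stands for any run of characters, including none at all, so a bare * matches every entry.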

To save your work in O, you simply save this database. The command save_DB is used to do this. The first time you save any database, save_DB will ask you for a filename to save the database under. Further uses of the save_DB command will automatically use the same filename. For example:

O > save_DB

As1> File_O_save is not defined.

As1> Enter file name [ binary.o]: ala.o

O > save_DB

O >

Be careful -- if you save your database, then manage to mess it up and save it a second time, the second, messed-up database will clobber the first one and you'll lose your work. The best way to avoid this is to regularly make a copy of your saved database under a different name before you execute save_DB again. This is done with the UNIX command cp (CoPy):

citpig%: cp ala.o ala11-11-94.o

This will make a copy of the file ala.o named ala11-11-94.o. Remember, this is a UNIX command, not an O command -- it will not work from the O > prompt. To get a citpig%: (UNIX) prompt, just open another window, as described in Section 1.3. This is exactly what we did to make a backup copy of the 170_example1.o database in Section 2.2.

Judicious use of save_DB and cp should let you avoid having to do too much over again if you mess up. One suggestion for saving files is to end all the filenames of databases you save via save_DB with ".o", i.e. ala.o, ala_backup1.o, &c. This will help to differentiate these saved databases from all the other files in your (probably cluttered) directory. It's also probably best to use a different database for each problem set, so the files don't get completely unwieldy.
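If you would rather not type a new date into the cp command each time, the same backup habit can be scripted. The Python sketch below is one possible way to do it and is not something installed on citpig; the function name backup_database and the naming pattern (name, underscore, date, then ".o") are assumptions chosen for illustration.

```python
import shutil
from datetime import date
from pathlib import Path

def backup_database(path, today=None):
    """Copy a saved database, e.g. ala.o, to ala_YYYY-MM-DD.o.

    Returns the path of the copy. The original file is left untouched.
    """
    src = Path(path)
    stamp = (today or date.today()).isoformat()
    dest = src.with_name(f"{src.stem}_{stamp}{src.suffix}")
    shutil.copy2(src, dest)   # like cp: the source file stays in place
    return dest
```

Calling backup_database("ala.o") from the directory holding ala.o would leave ala.o alone and create a dated copy beside it. Like cp, it will overwrite an existing copy that has the same name, so two backups made on the same day end up as one file.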

If you wish to exit O and return to the citpig prompt for any reason (such as wanting to logout and leave), just type stop at the prompt. This will automatically save the current database (containing all your work so far) in the file you specified by the save_DB command, or prompt you for a filename to save the current database under if you haven't specified one yet. If you want to quit O without saving at any time, just press the control (CTRL) key and the C key at the same time. This will force O to terminate abruptly.

A convenient feature of O is that you can tell it to open a saved database as it starts up. This lets you jump right back in to your work at the point where you saved it. To start O with a particular database, simply type "ono <name-of-database>" instead of just "ono."

citpig%: ono ala.o

You did this when you started O for the first time, back in Section 2.2. It's very convenient, but what about when you don't have a saved database and need to start O? There should be a file called 170.o in your home directory. This is a saved database that doesn't have any molecules loaded into it. It's a perfect blank slate for starting O -- you can just type "ono 170.o" and get right to work.

Go ahead and use save_DB to save your work normally (just don't call the file 170.o): the database 170.o has been modified so that save_DB should never automatically write over it. If this happens somehow, or if you forget and save a file as "170.o", you can copy a new version of the 170.o file to your directory with the "setup O_files" command.

2.8) Trouble-shooting: what to do if O won't run

Occasionally, right after you try to start O, it displays a line such as "O>File does not exist, try again:", and you will find that you cannot start the program. O needs some variables set up so it knows where to look for certain files. These variables are defined when you "setup O", and without them O will not run properly. In cases when O will not start, you should exit from O using control-c (if the program has not already stopped itself), and try executing "setup O", then restarting O. If it still doesn't work, find a TA.

If O does not start readily with the blank saved database (170.o) provided, or if that database is messed up somehow, delete the file 170.o from your directory. Copy a new version of this file to your directory with

citpig%: setup O_files

CHAPTER SUMMARY

Setting up files O needs:

citpig%: setup O_files

Setting up variables O needs:

citpig% setup O<Enter>

typing ono will start O

citpig%

Starting O:

citpig % ono 170.o

O > Use of this program implies acceptance of conditions

O > described in Appendix 1 of the O manual

O > O version 5.9.1 , Tue Aug 10 18:33:27 MET DST

O > Loading 170.o

O > Maximum inter-residue link distance = 2.00

O > There were 23 residues.

O > 175 atoms.

O > Do you want to use the display? [Yes]: Yes

O > Graphics board GL4DPIT-4.0

O > O > Trackball on (F7KEY)

Centering The Display On a Residue or Zone:

O > centre_zone {molecule name} {zone of residues}

Normal Dial Settings:

Zoom Slab

Rot z Trans z

Rot y Trans y

Rot x Trans x

Changing the Dial Settings:

The display menu commands Dial_next and Dial_previous will cycle through the allowed dial settings. The current dial settings are always shown in the lower left hand of the graphics window.

Display Menu Commands:

Dist_Define Gives distance between next two atoms selected.

Angle_Define Gives angle between next three atoms selected.

Trig_Refresh Causes all trigonometric measurements to be updated dynamically.

Trig_Reset Turns off all trigonometric measurements. Does not clear IDs.

Clear_Flags Clears active commands. Clears most effects except ID.

Clear_ID Clears all atom IDs from display.

Yes/No Used to respond to yes/no prompts.

Centre_ID Centers display around the next atom selected.

Phi_Psi Displays phi-psi angles for the next picked residue.

Tor_Residue Displays torsion angles defined by selected atoms and allows rotations about these angles.

Viewing your Protein in Stereo

F9 Toggles stereo viewing on and off.

Spin Causes display to spin. Makes stereo viewing easier.

Saving and Backing up your Database

save_DB O command that saves the current database under the specified filename

cp name1.o name2.o UNIX command that makes a copy of the file name1.o named name2.o

Trouble-shooting: If O won't start: Exit from O using control C. Type "setup O", then restart O. If your copy of the 170.o database is corrupted, copy the file from the student directory using "setup O_files."

Chapter 3

BROOKHAVEN COORDINATE FILES AND UNIX COMMANDS

3.1) UNIX commands for file manipulation

Now that you have learned how to display a peptide on the screen and move it around, you are ready to learn how to select a protein structure from the protein structure data base and look at it. First, however, there are a few basic UNIX commands that you will need to know. You need to exit O and return to your citpig prompt to do the next exercise. To do this use the command stop (or press CTRL-C). Alternatively, you can just open another window. The important thing to remember is that these are UNIX commands, not O commands -- they will not work from within O.

The computer is divided into subsections called directories. You do not need to know what these are, simply that they exist. You have your own subdirectory, referred to as your home directory. You will be storing "files" in your home directory, and you will want to be able to look at the names of these files. To list the files in your directory, simply type ls after the citpig prompt, and you will see a listing of the files in your directory. (Note: the ls command does not list files in your directory that begin with a dot or period. The command ls -al will list all files in your directory.)

citpig% ls

170.o 170.omac 170menu.o 170_example1.o 170udp.seq 170test.txt

citpig%

It is customary to name files with a relevant name, then a period followed by the type of file it is. In the above example, one of the files is a test file of the type "txt", which stands for text, so it is called "170test.txt". The file named "170_example1.o" is the database file containing the peptide you just worked with in the prior chapter. The file "170.o" is the standard blank saved database you will usually use to start O. The other files are special files that O uses to run. Feel free to actually do the following examples to help you learn the commands.

You may rename a file with the command mv (MoVe):

citpig% mv 170test.txt 170test.test

citpig%

The above example would rename the file "170test.txt" to "170test.test". You can now check to make sure you really did rename the file by using ls. Be careful using mv -- if a file with the new name already exists, mv will replace it, and the old contents of that file will be lost!

You may also create a backup copy of a file with the cp command, as we showed last chapter:

citpig% cp 170test.test 170test.txt

citpig%

This can be useful for making backup copies if you want to change a file, but still want the original around in case you do irreparable damage while changing it. If you use the ls command, you should see two test files now: one called "170test.test" and one called "170test.txt".

You can delete files if you want, but many of the files in your directory are there because they are necessary to run O, or because you will use them later in the manual. If you delete a necessary file accidentally, you can just "setup O_files" again.

To delete a file, use rm:

citpig% rm 170test.test

citpig%

(rm stands for "ReMove".) If you type ls now, you will notice that "170test.test" is gone. You should also erase 170test.txt now.

You may print the contents of a file to the screen by typing more {filename}. If the file cannot completely fit in your window, it will fill the window, and the word "--More--" will appear highlighted at the bottom of your window, along with the percentage of the file shown so far. One screen worth of text is referred to as one "page" of text. If you press the space bar, the next page of text will fill the screen. If you press "q" on the keyboard, you will return to your prompt. Several of the commands you will use will print text in the "more" format. Anytime you see the highlighted "--More--" at the bottom of the window, hit space to continue, or "q" to quit. You can get a listing of other commands more accepts by typing a question mark ("?") while it is running. Here is an example:

citpig % more 170udp.seq

UDPA_RESIDUE_TYPE C 338 (1x,5a)

MET ARG VAL LEU VAL

THR GLY GLY SER GLY

TYR ILE GLY SER HIS

THR CYS VAL GLN LEU

LEU GLN ASN GLY HIS

ASP VAL ILE ILE LEU

ASP ASN LEU CYS ASN

SER LYS ARG SER VAL

LEU PRO VAL ILE GLU

ARG LEU GLY GLY LYS

HIS PRO THR PHE VAL

GLU GLY ASP ILE ARG

ASN GLU ALA LEU MET

THR GLU ILE LEU HIS

ASP HIS ALA ILE ASP

--More--(35%)

Go ahead and look through the 170udp.seq file using "more". Don't worry if it doesn't make any sense to you, just get a little practice using "more".

3.2) Let me out!

Another way to exit from printing a file is to hold down the key labeled "control" (it should be on the lower left of the keyboard) and press "c". This is written as "Control-c". Hitting Control-c will exit you from most things. It is your best emergency exit in case something is going very wrong. It's also the best way to quit O if you don't want to save the current database, as we mentioned earlier.

3.3) Special commands for Brookhaven coordinate files

All of the above commands are general to all computers that use the UNIX operating system. Do not worry if you don't understand what that means; just realize that those commands will work on other UNIX computers also. The commands explained next have been implemented on this computer system to help you examine the protein data base. They are unique to the computers in 158 Braun -- you will not find them on other UNIX systems.

When the structure of a protein or DNA molecule is solved by x-ray crystallographic techniques, the researchers involved usually deposit its coordinates into a public access data base called the Brookhaven data base. (Some crystallographers don't deposit coordinates, but that is considered bad form.) Within the Brookhaven data base, every protein has its own coordinate file (called a "pdb" file, for "Protein Data Base"). At the beginning of each coordinate file is information about the protein, such as journal references, people involved in solving the structure, comments about how the structure was solved, and sometimes a listing of the residue ranges of individual secondary structural elements. The x,y,z coordinates of every atom in the structure are then listed, one atom per line. (Almost none of the proteins in the data base were solved at high enough resolution to allow accurate placement of hydrogen atoms, so the structures you look at will generally be missing hydrogen atoms.) Your mission, should you choose to accept it (and pass the class), is, at first, to be able to find the correct protein file to display. Since the coordinate file names consist of an obscure four or five character alpha-numeric code, this is not straightforward. Luckily for you, there are some convenient commands to help you out.

In order to find the coordinate file of a protein that interests you, it is possible for you to search the Brookhaven data base for key words. The command to do this is:

finp "{key phrase}"

This command stands for "FINd Protein", and will search for all occurrences of what appears in the quotes, and print them to the screen in the "more" format. On the far right of each line will be the protein code for the file that line came from. This command is slow sometimes, so don't be worried if it takes up to a minute to respond. Also, if there are no matches for your key phrase, you will simply be returned to the prompt. Test this function now by typing:

finp "trypsin"

citpig% finp trypsin

COMPND MODIFIED BETA TRYPSIN (MONOISOPROPYLPHOSPHORYL INHIBITED) 1NTP 4

REMARK 1 TITL 3 OF TRYPSIN 1NTP 14

REMARK 1 TITL 4 NEUTRON STRUCTURE OF TRYPSIN 1NTP 28

REMARK 1 TITL 2 CATALYTIC BASE IN TRYPSIN 1NTP 34

REMARK 1 TITL 3 BOVINE TRYPSIN 1NTP 41

REMARK 1 TITL 2 OF /DIP$-TRYPSIN AT 1.5 ANGSTROMS USING A 1NTP 47

REMARK 1 TITL 2 CRYSTALLOGRAPHIC STUDY OF SILVER-TRYPSIN 1NTP 65

REMARK 1 TITL THE STRUCTURE OF BOVINE TRYPSIN,ELECTRON DENSITY 1NTP 70

REMARK 1 TITL STRUCTURE AND SPECIFIC BINDING OF TRYPSIN, 1NTP 77

REMARK 1 TITL 2 /DIP$-INHIBITED BOVINE TRYPSIN AT 2.7 ANGSTROMS 1NTP 85

REMARK 4 RESIDUES 48, 95, AND 115 ARE ASN IN THE NATIVE TRYPSIN 1NTP 105

REMARK 7 CHARGED SIDE CHAIN OF A SPECIFIC SUBSTRATE, GIVING TRYPSIN 1NTP 130

REMARK 8 AS THE PRIMARY CA2+ BINDING SITE OF TRYPSIN BY BODE AND 1NTP 139

REMARK 1 TITL 2 /BPN$* AND ITS RESEMBLANCE TO CHYMOTRYPSIN 1SBTB 45

REMARK 4 ALIGNMENT WITH CHYMOTRYPSINOGEN. IN THIS ALIGNMENT THE 1SGC 36

COMPND TRYPSIN (/SGT$) (E.C.3.4.21.4) 1SGT 4

JRNL TITL 2 TRYPSIN AT 1.7 ANGSTROMS RESOLUTION 1SGT 10

REMARK 1 TITL 2 OF STREPOYCES $GRISEUS TRYPSIN 1SGT 17

REMARK 4 WITH CHYMOTRYPSIN. SEE THE REFERENCE CITED IN THE *JRNL* 1SGT 64

COMPND TRYPSINOGEN-CA FROM PEG 1TGBE 1

JRNL TITL CRYSTAL STRUCTURE OF BOVINE TRYPSINOGEN AT 1.8 1TGB 8

JRNL TITL 4 COMPARISON WITH BOVINE TRYPSIN 1TGB 11

--More--

The above is a direct example from the screen. Depending on how large your window is, you may have more or fewer lines on the screen. You should now practice a little getting around in the "more" format. Hit the space bar to show the next page of text. If you continue to the end, you will be returned to the prompt; if you wish to stop early, you may press "q" or Control-c.

You should notice that each line has the phrase "trypsin" somewhere in it. You should also notice that the protein codes are listed on the far right, followed by the line number within the file. One of these files is of trypsin from Streptomyces, for which the protein code is "1sgt" (and we're pretty sure you wouldn't have guessed that was the file for trypsin without help). This protein will be our example protein. The lines above that we are interested in are the ones carrying the code 1SGT.
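Writing codes down by hand gets tedious when a search returns many lines. The Python sketch below shows one way to pull the distinct protein codes out of lines laid out like the finp output above. It is not part of the course software: the function name codes_in is hypothetical, and it simply assumes that the code is the second-to-last item on each line and the line number is the last.

```python
def codes_in(lines):
    """Return the distinct protein codes, in order of first appearance.

    Assumes each line ends with '<code> <line number>', as in the
    finp listing shown above.
    """
    seen = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue              # skip blank or malformed lines
        code = fields[-2]
        if code not in seen:
            seen.append(code)
    return seen

# Three lines copied from the trypsin search above.
sample = [
    "COMPND MODIFIED BETA TRYPSIN (MONOISOPROPYLPHOSPHORYL INHIBITED) 1NTP 4",
    "REMARK 1 TITL 2 /BPN$* AND ITS RESEMBLANCE TO CHYMOTRYPSIN 1SBTB 45",
    "COMPND TRYPSIN (/SGT$) (E.C.3.4.21.4) 1SGT 4",
]
codes = codes_in(sample)
```

Note that the second sample line matched only because "chymotrypsin" contains "trypsin", so a list of codes produced this way still needs to be checked by eye.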

This "FINP" command will search for all exact matches of the phrase within the quotation marks. So, if one of your friends solved a structure, but you forgot what it was, you could search the data base with their name, since it should be included in the header information. As another example, if you were trying to find the bee venom protein, melittin, but couldn't remember the name, you might think of just searching for "bee". However, if you were to type:

finp "bee"

the computer would display all lines that contained "bee", including all lines that contained the word "been", which occurs rather frequently. If you wanted to search for just "bee", you would need to put a space on either side of the word. That way the computer would search for "bee" as an isolated word, and not as part of another word. It should look like this:

finp " bee "
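The difference between the two searches can be seen in a few lines of Python. This sketch only imitates the behaviour described above and is not the finp program itself: the function name finp_like is hypothetical, both sample lines are made up, and ignoring upper versus lower case is an assumption based on the trypsin example, where a lower-case search found upper-case text.

```python
def finp_like(phrase, lines):
    """Return the lines containing the phrase, ignoring case."""
    wanted = phrase.upper()
    return [line for line in lines if wanted in line.upper()]

# Two made-up header lines for illustration.
lines = [
    "REMARK 4 THE STRUCTURE HAS BEEN REFINED",
    "COMPND MELITTIN FROM HONEY BEE VENOM",
]

loose = finp_like("bee", lines)     # matches BEEN as well as BEE
strict = finp_like(" bee ", lines)  # matches only the isolated word
```

One limitation of padding with spaces is that it misses the word when it sits at the very start or end of a line, or is followed by punctuation, so it is worth trying the search both ways.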

You should write down the protein code for all proteins you see that you think you may be interested in. The next command will allow you to look at any single protein file. It will print the contents of the named protein file to the screen in the "more" format. This will be useful for making sure that you have the correct protein, and also for finding out any other useful information about the protein. The format for this command is:

lk {protein code}

citpig% lk 1sgt

HEADER HYDROLASE (SERINE PROTEINASE) 13-APR-88 1SGT 1SGT 3

COMPND TRYPSIN (/SGT$) (E.C.3.4.21.4) 1SGT 4

SOURCE (STREPOYCES $GRISEUS, STRAIN K1) 1SGT 5

AUTHOR R.J.READ,M.N.G.JAMES 1SGT 6

REVDAT 1 16-JUL-88 1SGT 0 1SGT 7

JRNL AUTH R.J.READ,M.N.G.JAMES 1SGT 8

JRNL TITL REFINED CRYSTAL STRUCTURE OF STREPOYCES $GRISEUS 1SGT 9

JRNL TITL 2 TRYPSIN AT 1.7 ANGSTROMS RESOLUTION 1SGT 10

JRNL REF J.MOL.BIOL. V. 200 523 1988 1SGT 11

JRNL REFN ASTM JMOBAK UK ISSN 0022-2836 070 1SGT 12

REMARK 1 1SGT 13

REMARK 1 REFERENCE 1 1SGT 14

REMARK 1 AUTH R.J.READ,G.D.BRAYER,L.JURASEK,M.N.G.JAMES 1SGT 15

REMARK 1 TITL CRITICAL COMPARISON OF COMPARATIVE MODEL BUILDING 1SGT 16

REMARK 1 TITL 2 OF STREPOYCES $GRISEUS TRYPSIN 1SGT 17

REMARK 1 REF BIOCHEMISTRY V. 23 6570 1984 1SGT 18

REMARK 1 REFN ASTM BICHAW US ISSN 0006-2960 033 1SGT 19

REMARK 2 1SGT 20

REMARK 2 RESOLUTION. 1.7 ANGSTROMS. 1SGT 21

REMARK 3 1SGT 22

REMARK 3 REFINEMENT. BY THE RESTRAINED LEAST SQUARES PROCEDURE OF J.1SGT 23

--More--(1%)

The above is just an example from the beginning of the file. Feel free to page through the file, reading information on the protein and getting familiar with the file structure. Remember, anytime you see "--More--" at the bottom of the screen, it means you are in the standard "more" format mentioned several times before. WARNING: Sometimes the Brookhaven entry for a protein will consist only of information about the protein, and will contain no coordinates. (This is the crystallographer's way of announcing that someday soon, he/she is planning to deposit coordinates for that molecule.) Obviously O cannot display a protein from such a file. It is a good idea at this point to look through any Brookhaven file to make sure that it contains coordinates and is not all header. The coordinate lines will each begin with the word ATOM or HETATM.
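Paging through a long file to look for ATOM lines can be slow, so here is a Python sketch of the same check done automatically. It is not one of the commands installed in 158 Braun; the function names are hypothetical, and the single coordinate line in the example is made up to show the idea rather than copied from a real entry.

```python
def count_coordinate_lines(lines):
    """Count the lines that begin with ATOM or HETATM."""
    return sum(1 for line in lines if line.startswith(("ATOM", "HETATM")))

def has_coordinates(lines):
    """Return True if at least one coordinate line is present."""
    return count_coordinate_lines(lines) > 0

header_only = [
    "HEADER HYDROLASE (SERINE PROTEINASE) 13-APR-88 1SGT",
    "COMPND TRYPSIN (/SGT$) (E.C.3.4.21.4)",
]
# The ATOM line below is invented for illustration only.
with_atoms = header_only + ["ATOM 1 N VAL 16 0.000 0.000 0.000 1.00 10.00"]
```

Here has_coordinates(header_only) is False and has_coordinates(with_atoms) is True. To use it on a real file you would pass it the lines read from that file.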

Sometimes you will want to know exactly where within the protein certain secondary structures occur. The different things you can look for are: 1) the occurrence of α-helices, 2) the occurrence of β-turns, 3) the occurrence of β-sheets, and 4) the occurrence of disulfide bonds. In addition, you can search for the occurrence of cis-prolines, metals, and carbohydrates. If you search a protein for a structure that it either doesn't contain, or doesn't list (not all files list all secondary structures), then you will be returned to the prompt. Here are the commands and examples:

helix {protein code} (use lower case for protein code)

citpig% helix 1sgt

HELIX 1 A ALA 56 CYS 58 5 1SGT 154

HELIX 2 B1 ASP 165 TYR 172 1 1SGT 155

HELIX 3 B2 GLY 173 GLU 175 5 1SGT 156

HELIX 4 C1 VAL 231 ARG 243 1 1SGT 157

HELIX 5 C2 ALA 242 THR 244 5 1SGT 158

citpig%

The number right after HELIX usually identifies which helix the rest of the line describes, so this protein has five helices listed. The numbers following the three letter codes usually give the first and last residues of the helix. So helix two starts at residue 165, which is ASP, and continues through residue 172, which is TYR.
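The reading of a HELIX line described above can be written out as a short Python sketch. It is only an illustration: the function name parse_helix is hypothetical, and splitting on spaces is an assumption that fits the lines as printed here. A line carrying extra items, such as a chain letter, would need different handling.

```python
def parse_helix(line):
    """Split a HELIX line, as printed above, into its parts."""
    f = line.split()
    return {
        "serial": int(f[1]),          # which helix this is
        "id": f[2],                   # the label given by the authors
        "start": (f[3], int(f[4])),   # first residue: name and number
        "end": (f[5], int(f[6])),     # last residue: name and number
    }

helix = parse_helix("HELIX 2 B1 ASP 165 TYR 172 1 1SGT 155")
# Number of residues spanned, counting both ends.
span = helix["end"][1] - helix["start"][1] + 1
```

For helix two this gives a span of 8 residues. Treat such a count as approximate, since residue numbering in a coordinate file is not guaranteed to run without gaps.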

Now we'll look for the occurrence of β-turns:

turn {protein code} (use lower case for protein code)

citpig% turn 1sgt

TURN 1 T1 ALA 23 GLU 26 1SGT 173

TURN 2 T2 PHE 27 MET 30 1SGT 174

TURN 3 T3 LEU 33 GLY 41 1SGT 175

TURN 4 T4 ALA 48 ILE 51 1SGT 176

TURN 5 T5 ASP 72 SER 76 1SGT 177

TURN 6 T6 SER 76 ALA 80 1SGT 178

TURN 7 T7 ALA 91 TYR 94 1SGT 179

TURN 8 T8 THR 129 ASN 132 1SGT 180

TURN 9 T9 ARG 145 GLY 149 1SGT 181

TURN 10 T10 CYS 168 ALA 171 1SGT 182

TURN 11 T11 VAL 177 GLU 179 1SGT 183

TURN 12 T12 CYS 191 ASP 194 1SGT 184

TURN 13 T13 ASP 194 GLY 197 1SGT 185

TURN 14 T14 ASP 203 ASP 205 1SGT 186

TURN 15 T15 ARG 222 TYR 224 1SGT 187

citpig%

This works the same as helix. So turn two goes from residue 27, which is PHE, to residue 30, which is MET.

You can also look for β-strand structure within a protein:

sheet {protein code} (use lower case for protein code)

citpig% sheet 1sgt

SHEET 1 S1 7 MET 30 LEU 33 0 1SGT 159

SHEET 2 S1 7 CYS 42 ALA 48 -1 O GLY 44 N VAL 31 1SGT 160

SHEET 3 S1 7 ILE 51 THR 54 -1 O LEU 53 N ALA 45 1SGT 161

SHEET 4 S1 7 ALA 104 LEU 108 -1 O ILE 106 N VAL 52 1SGT 162

SHEET 5 S1 7 VAL 81 GLN 90 -1 O LEU 89 N LEU 105 1SGT 163

SHEET 6 S1 7 THR 65 GLY 68 -1 O GLY 68 N VAL 81 1SGT 164

SHEET 7 S1 7 MET 30 LEU 33 -1 O ARG 32 N THR 67 1SGT 165

SHEET 1 S2 7 THR 135 GLY 140 0 1SGT 166

SHEET 2 S2 7 LEU 156 VAL 163 -1 O VAL 160 N PHE 136 1SGT 167

SHEET 3 S2 7 GLU 180 ALA 183 -1 O CYS 182 N VAL 163 1SGT 168

SHEET 4 S2 7 GLY 226 GLU 230 -1 O TYR 228 N ILE 181 1SGT 169

SHEET 5 S2 7 TRP 207 TRP 215 -1 O TRP 215 N VAL 227 1SGT 170

SHEET 6 S2 7 PRO 198 LYS 202 -1 O ARG 201 N ILE 208 1SGT 171

SHEET 7 S2 7 THR 135 GLY 140 -1 O THR 137 N PHE 200 1SGT 172

citpig%

This information is formatted in a very confusing manner. In the above example, there are two sheets, with seven strands listed in each of them. Each strand is labeled "S1" or "S2", depending on which sheet it belongs to. Then come the first and last amino acids in each strand, written the same way as in the other listings: three letter code followed by residue number. The listings after the -1 are not really important, but if you want to know what they mean there is an explanation in the appendix. In this example there are actually only six strands per sheet, but since they form a beta barrel, the first strand is listed again as the seventh. Do not ask why they did this, unless you would like to call up the crystallographers who submitted the information and ask them. It was their choice for some reason only they fully understand.
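Because the SHEET listing is hard to read by eye, it can help to see it sorted into sheets. The Python sketch below groups strands by their sheet label. It is not a course command: the function name strands_by_sheet is hypothetical, and splitting on spaces is an assumption that fits the lines as printed above.

```python
from collections import defaultdict

def strands_by_sheet(lines):
    """Group SHEET lines by sheet label.

    Each strand is stored as
    (strand number, first residue name, first number,
     last residue name, last number).
    """
    sheets = defaultdict(list)
    for line in lines:
        f = line.split()
        if f and f[0] == "SHEET":
            sheets[f[2]].append((int(f[1]), f[4], int(f[5]), f[6], int(f[7])))
    return dict(sheets)

# Four lines copied from the listing above.
sample = [
    "SHEET 1 S1 7 MET 30 LEU 33 0 1SGT 159",
    "SHEET 2 S1 7 CYS 42 ALA 48 -1 O GLY 44 N VAL 31 1SGT 160",
    "SHEET 7 S1 7 MET 30 LEU 33 -1 O ARG 32 N THR 67 1SGT 165",
    "SHEET 1 S2 7 THR 135 GLY 140 0 1SGT 166",
]
sheets = strands_by_sheet(sample)
# True when the first and last listed strands of S1 cover the same residues.
barrel_closed = sheets["S1"][0][1:] == sheets["S1"][-1][1:]
```

With these lines barrel_closed is True, which is the repeated first strand that closes the barrel.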

The location of cysteine residues involved in disulfide bonds can also be searched for:

ss {protein code} (use lower case for protein code)

citpig% ss 1sgt

SSBOND 1 CYS 42 CYS 58 1SGT 188

SSBOND 2 CYS 168 CYS 182 1SGT 189

SSBOND 3 CYS 191 CYS 220 1SGT 190

citpig%
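If you want the disulfide partners as a simple list of residue number pairs, for example to centre the display on each one in turn, the SSBOND lines can be read with a few lines of Python. As before this is a sketch and not a course command: the function name disulfide_pairs is hypothetical, and it assumes lines laid out exactly as printed above, with no chain letters.

```python
def disulfide_pairs(lines):
    """Return (residue number, residue number) for each SSBOND line."""
    pairs = []
    for line in lines:
        f = line.split()
        if f and f[0] == "SSBOND":
            pairs.append((int(f[3]), int(f[5])))
    return pairs

# The three lines from the listing above.
sample = [
    "SSBOND 1 CYS 42 CYS 58 1SGT 188",
    "SSBOND 2 CYS 168 CYS 182 1SGT 189",
    "SSBOND 3 CYS 191 CYS 220 1SGT 190",
]
pairs = disulfide_pairs(sample)
```

For 1sgt this gives three pairs: 42 with 58, 168 with 182, and 191 with 220.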

A list of any cis-prolines in the protein can be obtained:

cpro {protein code} (use lower case for protein code)

citpig % cpro 2hfl

FTNOTE 1 2HFL 124

FTNOTE 1 RESIDUES PRO L 8, PRO L 139, PRO H 150, AND PRO H 152 ARE 2HFL 125

FTNOTE 1 CIS-PROLINES. 2HFL 126

citpig %

A list of any metals or groups containing metals associated with the protein can be displayed using the "metal" command:

metal {protein code} (use lower case for protein code)

citpig % metal 6tmn

HET CA E 1 1 CALCIUM(II) ION 6TMN 215

HET CA E 2 1 CALCIUM(II) ION 6TMN 216

HET CA E 3 1 CALCIUM(II) ION 6TMN 217

HET CA E 4 1 CALCIUM(II) ION 6TMN 218

HET ZN E 5 1 ZINC(II) ION 6TMN 219

citpig %

A list of any carbohydrates associated with the protein can be displayed using the "carb" command:

carb {protein code} (use lower case for protein code)

citpig % carb 1ige

HET NAG A 1 14 N-ACETYL-D-GLUCOSAMINE 1IGE 498

HET NAG A 3 14 N-ACETYL-D-GLUCOSAMINE 1IGE 500

HET NAG A 6 14 N-ACETYL-D-GLUCOSAMINE 1IGE 503

HET NAG A 9 14 N-ACETYL-D-GLUCOSAMINE 1IGE 506

HET NAG B 1 14 N-ACETYL-D-GLUCOSAMINE 1IGE 507

HET NAG B 3 14 N-ACETYL-D-GLUCOSAMINE 1IGE 509

HET NAG B 6 14 N-ACETYL-D-GLUCOSAMINE 1IGE 512

HET NAG B 9 14 N-ACETYL-D-GLUCOSAMINE 1IGE 515

HET FUC A 2 10 FUCOSE 1IGE 499

HET FUC B 2 10 FUCOSE 1IGE 508

HET MAN A 4 11 ALPHA-D-MANNOSE 1IGE 501

HET MAN A 5 11 ALPHA-D-MANNOSE 1IGE 502

HET MAN A 8 11 ALPHA-D-MANNOSE 1IGE 505

HET MAN B 4 11 ALPHA-D-MANNOSE 1IGE 510

HET MAN B 5 11 ALPHA-D-MANNOSE 1IGE 511

HET MAN B 8 11 ALPHA-D-MANNOSE 1IGE 514

HET GAL A 7 11 D-GALACTOSE 1IGE 504

HET GAL B 7 11 D-GALACTOSE 1IGE 513

citpig %

If you are interested in unusual residues or associated molecules in a protein which are neither carbohydrates nor metals, you can use the "hetatm" command to display the entire section of the Brookhaven file that deals with such things. This will also give you any metals or carbohydrates present, and often some solvent molecules. Below is an example:

hetatm {protein code} (use lower case for protein code)

citpig % hetatm 6tmn

HET CA E 1 1 CALCIUM(II) ION 6TMN 215

HET CA E 2 1 CALCIUM(II) ION 6TMN 216

HET CA E 3 1 CALCIUM(II) ION 6TMN 217

HET CA E 4 1 CALCIUM(II) ION 6TMN 218

HET ZN E 5 1 ZINC(II) ION 6TMN 219

HET CBZ I 1 10 CARBOBENZOXY GROUP 6TMN 220

HET PGL I 2 5 MODIFIED GLY WITH C=O REPLACED BY PO2 6TMN 221

HET OLE I 3 7 MODIFIED LEU WITH N REPLACED BY O 6TMN 222

citpig%

All of the above information can also be found by reading the contents of the header records using "lk", but these commands are often more convenient.

It will often be useful for you to know the number of residues in your protein. To find this out, as well as the actual sequence, type:

res {protein code} (use lower case for protein code)

citpig% res 1sgt

SEQRES 1 223 VAL VAL GLY GLY THR ARG ALA ALA GLN GLY GLU PHE PRO 1SGT 92

SEQRES 2 223 PHE MET VAL ARG LEU SER MET GLY CYS GLY GLY ALA LEU 1SGT 93

SEQRES 3 223 TYR ALA GLN ASP ILE VAL LEU THR ALA ALA HIS CYS VAL 1SGT 94

SEQRES 4 223 SER GLY SER GLY ASN ASN THR SER ILE THR ALA THR GLY 1SGT 95

SEQRES 5 223 GLY VAL VAL ASP LEU GLN SER GLY ALA ALA VAL LYS VAL 1SGT 96

SEQRES 6 223 ARG SER THR LYS VAL LEU GLN ALA PRO GLY TYR ASN GLY 1SGT 97

SEQRES 7 223 THR GLY LYS ASP TRP ALA LEU ILE LYS LEU ALA GLN PRO 1SGT 98

SEQRES 8 223 ILE ASN GLN PRO THR LEU LYS ILE ALA THR THR THR ALA 1SGT 99

SEQRES 9 223 TYR ASN GLN GLY THR PHE THR VAL ALA GLY TRP GLY ALA 1SGT 100

SEQRES 10 223 ASN ARG GLU GLY GLY SER GLN GLN ARG TYR LEU LEU LYS 1SGT 101

SEQRES 11 223 ALA ASN VAL PRO PHE VAL SER ASP ALA ALA CYS ARG SER 1SGT 102

SEQRES 12 223 ALA TYR GLY ASN GLU LEU VAL ALA ASN GLU GLU ILE CYS 1SGT 103

SEQRES 13 223 ALA GLY TYR PRO ASP THR GLY GLY VAL ASP THR CYS GLN 1SGT 104

SEQRES 14 223 GLY ASP SER GLY GLY PRO MET PHE ARG LYS ASP ASN ALA 1SGT 105

SEQRES 15 223 ASP GLU TRP ILE GLN VAL GLY ILE VAL SER TRP GLY TYR 1SGT 106

SEQRES 16 223 GLY CYS ALA ARG PRO GLY TYR PRO GLY VAL TYR THR GLU 1SGT 107

SEQRES 17 223 VAL SER THR PHE ALA SER ALA ILE ALA SER ALA ALA ARG 1SGT 108

SEQRES 18 223 THR LEU 1SGT 109

citpig%

The number that precedes the residue names on each line of the sequence is the total number of amino acids, which in this case is 223. This number will determine how much room you need when you create your own protein files. Unfortunately, this information is not always completely accurate, nor is it all you need to know. It is often more useful to use "lk" to search the beginning of your protein file for relevant structure information. In the case of this protein, an extra 192 solvent water molecules are included in the structure information, and each one counts as one residue when determining how much space you need. Also, the numbering of this protein actually begins at 16 and ends at 245, so it is always a good idea to scan the protein information provided.
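As a quick sanity check on a SEQRES listing, you can confirm that the number of lines agrees with the stated residue count. The Python sketch below does the arithmetic for the 1sgt listing above; the only inputs are the two numbers read off that listing (223 residues, 13 residue names on each full line), and the variable names are simply chosen for this example.

```python
import math

# Values read off the "res 1sgt" listing above.
total_residues = 223      # the count repeated on every SEQRES line
residues_per_line = 13    # residue names printed on each full line

# Number of SEQRES lines needed, and how many residues spill onto the last one.
lines_needed = math.ceil(total_residues / residues_per_line)
residues_on_last_line = total_residues - residues_per_line * (lines_needed - 1)

print(lines_needed, residues_on_last_line)
```

The result should match what you see in the listing: the final SEQRES line carries the line number computed here and holds only the leftover residues.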

There is one more important command you will need to know. Once you have decided on your protein, and looked at all relevant structural information that you might want, you must copy the coordinate file to your directory so that O will be able to find it. To copy the coordinate file, just type:

get {protein code} (use lower case for protein code)

get 1sgt

Don't worry if this command takes a little while. Once you have returned to your prompt, type "ls" to make sure the file is in your home directory now. It should be called "1sgt.pdb".

citpig% ls

170.o 170.omac 170menu.o 170_example1.o 170udp.seq 170test.txt 1sgt.pdb

citpig%

You now know everything that you will ever need to type from your prompt. Next you will learn to load a pdb file into O.

3.4) Loading Brookhaven files using the Sam commands

Whenever you wish to display a protein on the screen, you have to load the coordinates for the structure into O. In your first example using O to look at a peptide, the molecule was already loaded into O as part of a saved database. Now you will have to do it yourself. Once you have discovered the protein code for your protein, and have a copy of its pdb file in your home directory (by using "get"), you are ready to load the molecule into O. The Sam commands in O carry out operations on coordinate files. These include loading pdb files into O, making new pdb files for molecules you've changed with O, preparing space in the O database for building a structure, and reading the sequence of a molecule from within O.

The first Sam command you need is Sam_atom_in. This is the command used for loading a molecule into O. Sam_atom_in will first prompt you for the name of the coordinate file containing the molecule. If the filename ends in ".pdb", O assumes the file is a pdb file, and interprets it appropriately. Otherwise, you will be asked to tell O what type of coordinate file you're loading (supported coordinate file types are listed in the O manual). Once all this is established, O asks you to supply a molecule name. This is the name that will be associated with the molecule in the O database. It's best to keep names short, as O only recognizes the first 5 characters. You will find that the protein code (first part of the pdb file name) is often a convenient molecule name. However, feel free to use whatever molecule names you want -- just remember what they are!

For our sample trypsin structure, you would load it into O thus:

First, have O running. If O is not already running, start O using a saved database (like 170.o):

citpig% ono 170.o

After agreeing to use the graphics display, enter the Sam_atom_in command. When prompted for the filename, type the code for the protein you used with "get", followed by ".pdb". Press Enter, and choose something appropriate for the molecule name. O will ignore anything beyond the first five characters of a name, so keep it short.

O > sam_atom_in

Sam> Name of input file: 1sgt.pdb

Sam> O associated molecule name: 1sgt

Sam> File type is PDB

Sam> Database compressed.

Sam> Space for 142658 atoms

Sam> Space for 10000 residues

Sam> Molecule 1SGT contained 416 residues and 1813 atoms

Your molecule is now loaded into the O database. Nothing will show up in the graphics window, because you have not created any objects from this molecule. In O, the molecule serves as an invisible "template" for the creation of visible objects. Creating and manipulating objects will be covered in the next chapter.

Well, how do you know your molecule got loaded? You can use the directory command to search the database -- all entries associated with a molecule start with the molecule name, so the command directory <molecule name>* yields a list of all datablocks associated with a molecule:

O > directory

Heap> Which param blocks : 1sgt*

Heap> 1SGT_ATOM_XYZ R W 5439

Heap> 1SGT_ATOM_B R W 1813

Heap> 1SGT_ATOM_WT R W 1813

Heap> 1SGT_ATOM_Z I W 1813

Heap> 1SGT_ATOM_VISIBLE I W 1813

Heap> 1SGT_ATOM_SELECT I W 1813

Heap> 1SGT_RESIDUE_NAME C W 416

Heap> 1SGT_RESIDUE_TYPE C W 416

Heap> 1SGT_ATOM_NAME C W 1813

Heap> 1SGT_RESIDUE_POINTERS I W 832

Heap> 1SGT_RESIDUE_CG R W 1664

Heap> 1SGT_PDB_HEADER T W 61

Heap> 1SGT_PDB_COMPND C W 10

Heap> 1SGT_PDB_SOURCE T W 65

Heap> 1SGT_SSBOND C W 6

Heap> 1SGT_CELL R W 6

Heap> 1SGT_SPACEGROUP T W 13

Heap> 1SGT_PDB_SCALE R W 12

Heap> 1SGT_DATE T W 25
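The sizes in the last column are less mysterious once you compare them with the atom and residue counts reported by sam_atom_in. The Python sketch below checks one plausible reading of four of the datablocks. The interpretation in the comments is an assumption made for this example (suggested by the From, To, Centre, and Radius columns of the sam_list_sequence listing later in this section), not something taken from the O manual.

```python
# Counts copied from the sam_atom_in output above.
n_atoms = 1813
n_residues = 416

# Sizes copied from the directory listing above.
reported_sizes = {
    "1SGT_ATOM_XYZ": 5439,
    "1SGT_ATOM_B": 1813,
    "1SGT_RESIDUE_POINTERS": 832,
    "1SGT_RESIDUE_CG": 1664,
}

# Assumed meaning of each block: (number of entries, values stored per entry).
assumed_layout = {
    "1SGT_ATOM_XYZ": (n_atoms, 3),             # x, y, z for each atom
    "1SGT_ATOM_B": (n_atoms, 1),               # one temperature factor per atom
    "1SGT_RESIDUE_POINTERS": (n_residues, 2),  # first and last atom of each residue
    "1SGT_RESIDUE_CG": (n_residues, 4),        # centre x, y, z plus a radius
}

consistent = {}
for name, (entries, per_entry) in assumed_layout.items():
    consistent[name] = (entries * per_entry == reported_sizes[name])
    print(name, entries * per_entry, consistent[name])
```

If a block size does not match the product you expect, that is a hint that the molecule did not load the way you think it did.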

Well, the molecule's there, but all those entries seem a little cryptic. Examining the contents of a datablock (using the write_formatted command) is discussed in Chapter 7 of this primer or in O for Morons, if you are curious. Another way to get more information about your protein is by using another Sam command, sam_list_sequence. Sam_list_sequence will prompt you for the molecule name, and will then list the sequence of the molecule on the terminal:

O > sam_list_sequence

Sam> Molecule name []: 1sgt

Sam> Name Type From To Centre Radius

Sam> 16 VAL 1 7 30.50 20.97 39.30 2.92

Sam> 17 VAL 8 14 32.71 21.46 35.31 2.69

Sam> 18 GLY 15 18 35.12 23.75 37.68 1.81

Sam> 19 GLY 19 22 35.33 23.25 40.45 1.66

Sam> 20 THR 23 29 35.62 24.71 44.03 2.70

Sam> 21 ARG 30 40 35.55 21.25 48.86 4.20

Sam> 22 ALA 41 45 31.75 22.97 49.73 1.88

Sam> 23 ALA 46 50 31.84 24.32 52.72 2.00

Sam> 24 GLN 51 59 29.90 21.78 56.31 3.43

Sam> 25 GLY 60 63 27.86 24.97 56.10 1.89

Sam> 26 GLU 64 72 28.93 27.11 52.29 3.72

Sam> 27 PHE 73 83 26.19 26.03 50.29 3.85

Sam> 28 PRO 84 90 22.65 24.23 54.74 2.59

Sam> 29 PHE 91 101 20.58 24.90 50.96 3.76

Sam> 30 MET 102 109 22.31 20.98 48.87 3.22

Sam> 31 VAL 110 116 19.29 17.87 49.44 2.69

Sam> 32 ARG 117 127 22.38 13.63 45.81 4.40

Sam> 33 LEU 128 135 18.13 13.00 46.10 3.07

Sam> 34 SER 136 141 18.46 8.75 45.43 2.38

Sam> 35 MET 142 149 17.21 9.91 41.99 3.25

Sam> 41 GLY 150 153 20.86 11.30 41.19 1.78

Sam> 42 CYS 154 159 19.27 14.07 41.93 2.69

This is a convenient way to find out how many polypeptide chains are in your structure, how many residues there are in each, and the naming convention used. While looking at this listing, you should make a note of the starting and ending numbers for each chain, since the amino acids are not always numbered starting with one as you can see with this example. Sometimes they are numbered with letters also. Melittin is stored as a dimer, with the first subunit being labeled "A1-A26", and the second subunit being labeled "B1-B26". Remember you can use the scroll bar on the left side of the window to scroll through the listing. The coordinates of ordered water molecules within the structure are often stored at the end of the protein, and these are usually labeled with an "H" somewhere in their number. Our example protein starts at residue 16 and ends at residue 245. At the end there are 192 water molecules labeled H1-H192. You will need to know this information for displaying the protein, so spend a little time getting familiar with the sam_list_sequence command and sequence listing. Whenever you load a molecule into O, make a habit of checking the starting and ending residues of each chain.

3.5) Renumbering a coordinate file

If you ever need to change the way residues are numbered in a molecule you have loaded into O, you can use the sam_rename command. Sam_rename accepts as input a molecule name, a zone of residues, and a new number for the first residue. The first residue in the specified zone is given this new number, and subsequent residues are named by incrementing the supplied number (e.g., a first residue number of A12 will rename the first residue in the zone A12, the second A13, and so on). Let's renumber the residues in our protein molecule 1-230 instead of 16-245:

O > sam_rename

Sam> What molecule []: 1sgt

Sam> Residue range [all molecule]: 16 245

Sam> NEW name of FIRST residue [16 ]: 1

O >
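To see what "incrementing the supplied number" implies, here is a small Python sketch of that renaming rule. It is a model of the description given above, not of O's actual code: the function rename_zone is hypothetical, and whether O closes gaps in the original numbering exactly this way is an assumption you should check with sam_list_sequence after renaming.

```python
import re


def rename_zone(old_names, first_new_name):
    """Name residues sequentially, starting from first_new_name.

    first_new_name may carry a letter prefix, e.g. "A12" gives A12, A13, ...
    Returns a dict mapping each old residue name to its new name.
    """
    match = re.fullmatch(r"([A-Za-z]*)(\d+)", first_new_name)
    if match is None:
        raise ValueError(f"cannot increment residue name {first_new_name!r}")
    prefix, start = match.group(1), int(match.group(2))
    return {old: f"{prefix}{start + offset}"
            for offset, old in enumerate(old_names)}


# Residue names from the sam_list_sequence listing in section 3.4
# (note the jump from 35 to 41 in the original numbering).
old_names = [str(n) for n in range(16, 36)] + ["41", "42"]
new_names = rename_zone(old_names, "1")
print(new_names["16"], new_names["35"], new_names["41"], new_names["42"])
```

Under this rule the new names depend only on a residue's position in the zone, so any breaks in the old numbering disappear.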

3.6) Writing out a pdb file

There is one other important Sam command we will cover right now. In the course of using O, you may make some modifications to the structure of a molecule, such as residue substitutions, deletions, insertions, or distortions. Alternatively, you may construct a hypothetical structure de novo for a peptide or protein. Most of these procedures will be covered in Chapter 7 of this primer. In any case, you will eventually want to create a pdb file from a molecule in the O database. The command sam_atom_out is used for this. You will be prompted for a molecule name and file name (end the filename with .pdb, so O knows what format to use). The values for other inputs will usually be correct, so you can accept them by pressing Enter at each prompt. Note that you can choose to write only the currently selected atoms to the pdb file. This option permits you to make a pdb file for part of a molecule, using the Select commands described in Section 4.2.

This is all you will need to know for now, so once you are done writing the renumbered trypsin structure to a pdb file, type "Control-c" and you will be returned to your "citpig%" prompt. Take a look at the new pdb file using more to verify the new numbering. Remember, you will need to repeat the process of loading a molecule into O each time you want to look at a new structure.

CHAPTER SUMMARY

Useful UNIX Commands:

cp {oldfilename} {newfilename} -- Makes a copy of a file with a new name.

ls -al -- Lists all files in current directory.

more {filename} -- Lists contents of file to the screen.

mv {oldfilename} {newfilename} -- Changes a file's name.

rm {filename} -- Deletes file.

Commands for manipulating Brookhaven files:

carb {protein code} -- Lists carbohydrates in specified protein file.

cpro {protein code} -- Lists cis-prolines in specified protein file.

finp "{phrase}" -- Lists all protein files that contain {phrase}.

get {protein code} -- Creates a pdb file from the specified protein.

helix {protein code} -- Lists alpha-helix ranges in specified protein file.

hetatm {protein code} -- Lists any unusual residues or any associated molecules in specified protein file.

lk {protein code} -- Lists specified protein file.

metal {protein code} -- Lists metals in specified protein file.

sheet {protein code} -- Lists beta-sheet ranges in specified protein file.

ss {protein code} -- Lists disulfide bonds in specified protein file.

turn {protein code} -- Lists beta-turn ranges in specified protein file.

Reading In Your Protein From A ".pdb" File:

O > sam_atom_in

Sam> Name of input file: 1sgt.pdb

Sam> O associated molecule name: 1sgt

Sam> File type is PDB

Sam> Database compressed.

Sam> Space for 142658 atoms

Sam> Space for 10000 residues

Sam> Molecule 1SGT contained 416 residues and 1813 atoms

Other Sam Commands:

sam_list_sequence {molecule name} -- Lists residues of selected molecule

sam_rename {molecule name} {residue range} {new first residue name} -- Renames/numbers selected residues

sam_atom_out {molecule name} {file name} -- Writes coordinates of selected atoms out as a pdb file.

Chapter 4

DISPLAYING A PROTEIN STRUCTURE USING O

4.1) Selecting a molecule and building an object

Again, start O using the 170.o database. Use sam_atom_in to load the trypsin pdb file "1sgt.pdb" into O. Once you have loaded your pdb file into O, you are ready to take a look at it. Since you can have multiple molecules loaded into O at any one time, you will need to tell O the name of the molecule you want to work with. Once you select a molecule, O automatically assumes you will be building objects from it until you choose another. Recall also that a saved database includes all of the molecules and objects loaded into O when the database was saved. This means that whenever you leave O and come back, it starts you where you left off. To select the current molecule for O, type:

O > molecule_name

O > Current molecule has not been loaded.

Mol> Molecule code name []: 1sgt

O >

O will now be working with the molecule whose code name you have supplied (in this case, 1sgt). If you want to see a list of all the molecules presently loaded into O, type:

O > directory *residue_type

Heap> ALPHA_RESIDUE_TYPE C W 5

Heap> BETA_RESIDUE_TYPE C W 5

Heap> DI_RESIDUE_TYPE C W 2

Heap> 1SGT_RESIDUE_TYPE C W 416

O >

Each molecule loaded into O will have its own {molecule name}_residue_type entry in the database. The only molecule we have loaded so far is 1sgt. The other names (alpha, beta, and di) are template molecules that O uses as standards for secondary structure assignment. Any other molecules you load will be added to the list.

In order to look at the trypsin molecule, you will need to build an object. First, you need to specify the name of the object to build using the Object_name command. Like molecule names, object names can be up to 5 characters long. Once the object is named, you add to it by issuing commands like zone, ca, cover, and sphere. These commands are added to the definition of the object until the end_object command is issued. Once end_object is typed, the object is completed and displayed. Since all the display commands between an object_name and an end_object command are part of the resulting object, it is possible to create highly complex displays with objects built from several display commands. You are not restricted to just display commands, however -- you can use any O commands while in the process of building an object. For now, we'll keep things simple. One common display command is zone. Zone adds all of the atoms in a specified zone of residues to a certain object.

I would recommend setting the zone to display residues 16 through 245 for starters (1 through 230 if you renumbered the molecule); this will leave off the water molecules. You also need to center the object on the screen somehow. If part of the object is visible onscreen, you can use Centre_ID (center around the next atom picked). However, you will often encounter molecules whose coordinates are not initially onscreen at all. In these cases, the centre_zone command is useful -- specifying a molecule name and starting and ending residues centers the view on the center of gravity of all the specified residues. Note the spelling of Centre. O will not recognize "center" (or "color", for that matter -- those crazy Swedes!). Putting all this together, we can build and display an object containing all of the protein atoms in our trypsin structure, and center it in the display window:

O > object_name

Mol> Name of the new object [1SGT ]: all

O > zone

Mol> Zone [all molecule]: 16 245

O > end_object

O > centre_zone 16 245

As4> No object defined.

As4> 1SGT 16 245 ALL

As4> Centering on zone from 16 to 245

O >

The graphics display now contains an object named all. You can see a ^ALL entry added to the bottom of the menu. You can try IDing atoms in the structure using the mouse, or you can look at the phi and psi angles or the torsion angles of any residue using the menu commands you learned about in Chapter 2.

Some of you may notice that the protein you are looking at has several odd features. The most obvious problem is that there are some breaks in the peptide backbone. In addition, several of the cysteine and methionine residues (we'll let you find which ones) in this structure contain atoms that are not connected. This occurs because some of the atoms in these residues were too far away for O to consider them part of the same residue. (Obviously, this structure was not well refined, or the residues would conform more to ideal bond distances.) You can fix this permanently in your file by using the regularization routine refi_zone. Refi_zone gradually alters the coordinates of the selected zone to more closely conform to ideal bond angles, torsions, and lengths. Enter the command, then pick a zone to regularize in the vicinity of the offending residue (include a few residues on either side) by selecting with the mouse. When prompted to keep or discard the refined coordinates, click Yes on the menu. Be aware that this is regularization, not refinement -- you are not improving the model based on any experimental data.

Manipulations such as refi_zone or move_zone (and others we will see later) change the coordinates of the molecule underlying the visible objects. Sometimes these changes are not accurately reflected by the objects associated with the molecule, or the changes result in wildly distorted objects. Consequently, it is usually a good idea to re-build all affected objects after carrying out such commands. Unfortunately, the only way to do this is to build each object all over again. You can use the same name as the prior object -- the new object will just replace the old one.

4.1a) Carbon-[[alpha]] objects. You are now looking at all the main and side chain atoms of trypsin and you probably won't be able to pick out where the helices and strands are. (You can try IDing atoms at the start and ends of what you think are helices or strands. Remember, the residue ranges of the secondary structural elements in trypsin were listed in the header of the Brookhaven file 1sgt.)

In order to follow the chain trace of the protein, you can create a simpler object that contains only the main chain carbon-[[alpha]] atoms. In doing this, you will be eliminating all side chain atoms, as well as the main chain carbonyl carbons and oxygens and the amide nitrogens, and drawing bonds (<=4.2Å in length) between carbon-[[alpha]] atoms. A poorly refined structure may have distances of >4.2Å between carbon-[[alpha]]s, which will result in gaps in the object. This is because O applies a strict upper bound on the permitted "length" of a peptide bond. To make this carbon-[[alpha]] object, use the ca_zone command instead of the zone command in building your object. The input to ca_zone is a zone of residues, exactly like zone. Here is an example of such an object containing only the carbon [[alpha]] backbone.

O > object_name

Mol> Name of the new object [1SGT ]: trace

O > ca_zone

Mol> Ca zone [all molecule]: 16 245

O > end_object
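The gaps described above come from a simple distance test between consecutive carbon-alpha atoms. The Python sketch below applies the 4.2 Å limit quoted in the text to a toy chain. The coordinates are made up for illustration (they are not from 1sgt.pdb), and find_trace_gaps is a hypothetical helper, not an O command.

```python
import math

MAX_CA_CA = 4.2  # upper limit (in Å) quoted in the text for drawing a CA-CA bond


def find_trace_gaps(ca_atoms, cutoff=MAX_CA_CA):
    """Return pairs of consecutive residues whose CA atoms are too far apart.

    ca_atoms is a list of (residue_name, (x, y, z)) tuples in chain order.
    """
    gaps = []
    for (name_a, xyz_a), (name_b, xyz_b) in zip(ca_atoms, ca_atoms[1:]):
        if math.dist(xyz_a, xyz_b) > cutoff:
            gaps.append((name_a, name_b))
    return gaps


# Made-up coordinates: three normal 3.8 Å steps would draw fine, but the
# last step here is about 4.5 Å, so no bond would be drawn across it.
toy_chain = [
    ("16", (0.0, 0.0, 0.0)),
    ("17", (3.8, 0.0, 0.0)),
    ("18", (7.6, 0.0, 0.0)),
    ("19", (12.1, 0.0, 0.0)),
]
print(find_trace_gaps(toy_chain))
```

Note that a real break in the chain, such as a missing loop or a jump in residue numbering, would show up in exactly the same way as a poorly refined region.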

If you want, try using zone or ca_zone to produce an object that only includes one of the alpha helices (find the names of residues at either end of the helix by IDing with the mouse). Experiment with displaying all the atoms of one helix at the same time as a C[[alpha]] trace of the entire protein.

4.1b) Cover_sphere and Sphere_centre. There are two other basic commands for adding residues to an object. They are cover_sphere and sphere_centre. Both work in a relatively similar fashion, adding a group of atoms or residues close together in the molecule to the current object. Cover_sphere first asks for a residue, or residue and atom. Next, a radius (in Å) is requested. The command then adds to the object all the residues that have at least one atom within the selected radius of the input residue or atom. The following example uses this effect to display all the residues contacting His57, one of the catalytic triad of residues in the active site of trypsin.

O > object_name

Mol> Name of the new object [1SGT ]: by57

O > cover_sphere

Mol> Specifying just a residue forces residue covering.

Mol> Specify sphere centre by residue and atom : 57

Mol> Residues will be chosen if within a radius of [ 0.00] : 1.0

O > end_object

Sphere_centre also adds a sphere of elements to the object. In this case, however, the user only specifies a radius (again in Å) and O proceeds to add all of the atoms in the molecule that are within that radius of the current display center (as set by Centre_ID, Centre_zone, etc.) to the object. This is a good way of visualizing all of the atoms in a close area, like a pocket or active site of an enzyme. An object that shows the active site of trypsin (including the His57, Asp102, Ser195 catalytic triad) is built below:

O > object_name

Mol> Name of the new object [1SGT ]: active

O > centre_atom

As3> Define molecule [1SGT], residue, and atom [CA]: 1sgt 57 cb

O > sphere_centre

Mol> Residues will be chosen if within a radius of [ 0.00] : 8

O > end_object
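The selection made by sphere_centre, as described above, amounts to a distance test against the display centre. The Python sketch below shows that test on a handful of made-up atoms. The coordinates and the helper name atoms_in_sphere are invented for this example, and the sketch follows the description in this primer (atoms within the radius are chosen) rather than O's actual code, which may choose whole residues.

```python
import math


def atoms_in_sphere(atoms, centre, radius):
    """Return (residue, atom_name) for every atom within radius Å of centre.

    atoms is a list of (residue, atom_name, (x, y, z)) tuples.
    """
    return [(residue, atom_name)
            for residue, atom_name, xyz in atoms
            if math.dist(xyz, centre) <= radius]


# Made-up coordinates, with the display centre placed on the CB of residue 57.
centre = (0.0, 0.0, 0.0)
toy_atoms = [
    ("57", "CB", (0.0, 0.0, 0.0)),
    ("102", "OD1", (3.0, 4.0, 0.0)),   # 5 Å from the centre
    ("195", "OG", (0.0, 0.0, 7.5)),    # 7.5 Å from the centre
    ("16", "N", (20.0, 0.0, 0.0)),     # far away
]
print(atoms_in_sphere(toy_atoms, centre, 8.0))
```

Shrinking the radius trims the object from the outside in, which is a quick way to find the smallest sphere that still shows everything you care about.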

4.1c) Deleting objects. At some time, you will want to delete an object you have created. This is accomplished by the command delete_object. O will ask you which object you want to delete. Type the name of the object, and press return. O will delete that object, and prompt you for the name of another object to delete. Once you've removed all the objects you want to, press return when asked for an object name to get back to the O prompt.

O > delete_object

Mol> Objects = ALL TRACE BY57 ACTIVE

Mol> Object name ( <CR> = exit ) : by57

Mol> Objects = ALL TRACE ACTIVE

Mol> Object name ( <CR> = exit ) : <Enter>

O >

4.2) Making more complex objects by selection

There are some other techniques for adding atoms to an object. The majority of these involve the Select commands: select_on, select_off, select_property, select_visible, and select_invert. In O, selection is a molecule property where each atom can have a value of "on" or "off" associated with it. These values can then be used to draw objects using only the "on" atoms.

The general technique for producing an object that only includes selected atoms is to start building the object, select only those atoms you want to be visible, apply a display command to the entire molecule (which will only display the selected atoms) and then execute end_object.

There are four commands that let you select atoms on and off. The two most obvious are select_on and select_off -- these both accept a molecule and a zone of residues as input, and then select or deselect the specified zone, respectively. It's usually best to first select the entire molecule off and then select specified atoms and residues back on when building a complex object.

O > select_off

Sel> What molecule [1SGT ]: 1sgt

Sel> Residue range [all molecule]: 16 245

Select_property is more sophisticated than select_on or select_off. This command will select on or off those atoms defined by a logical expression. The inputs, in order, are <property>, <operator>, <value>, and <on/off>. <Property> is the property used for your selection criteria -- examples (and corresponding values) are:

residue_type: three-letter amino acid code

atom_name: c, ca, n, o, etc.

other, numeric properties like

pep_flip: measure of correct peptide bond orientation

atom_b: temperature factor for each atom, as specified in the PDB file

The operator can be an equality (=, equal, or ^=, not equal) or any standard inequality (<, >, <=, or >=). Residues are selected to be on or off, whichever is input, based on whether or not they fulfill the logical expression <property> <operator> <value>. For example,

O > select_property

Sel> Property? [atom_name]: residue_type

Sel> Operator (< > <= >= ^= [=]): =

Sel> Value? []: tyr

Sel> [On]/off: on

selects on all tyrosine residues in the molecule, while

O > select_property

Sel> Property? [atom_name]: residue_type

Sel> Operator (< > <= >= ^= [=]): ^=

Sel> Value? []: trp

Sel> [On]/off: off

turns off all non-tryptophan residues in the molecule. Similarly,

O > select_property

Sel> Property? [atom_name]: atom_b

Sel> Operator (< > <= >= ^= [=]): >

Sel> Value? []: 5

Sel> [On]/off: on

selects on all atoms with temperature factors greater than 5.
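If it helps to think of selection as a list of on/off flags, one per atom, the Python sketch below models the three select_property examples above. Everything in it is illustrative: the atom records are made up, the function is a hypothetical stand-in for the O command, and it assumes that atoms which fail the test keep whatever state they already had.

```python
import operator

# O's operator spellings mapped onto Python comparisons ("^=" is "not equal").
OPERATORS = {
    "=": operator.eq, "^=": operator.ne,
    "<": operator.lt, ">": operator.gt,
    "<=": operator.le, ">=": operator.ge,
}


def select_property(atoms, selected, prop, op, value, state):
    """Switch atoms satisfying <prop> <op> <value> to state ("on" or "off").

    Assumption: atoms that fail the test are left unchanged.
    """
    test = OPERATORS[op]
    for index, atom in enumerate(atoms):
        if test(atom[prop], value):
            selected[index] = (state == "on")
    return selected


# Made-up atoms; start with everything selected off.
atoms = [
    {"residue_type": "tyr", "atom_name": "ca", "atom_b": 4.0},
    {"residue_type": "trp", "atom_name": "ca", "atom_b": 12.5},
    {"residue_type": "gly", "atom_name": "n", "atom_b": 3.1},
]
selected = [False, False, False]

after_tyr = list(select_property(atoms, selected, "residue_type", "=", "tyr", "on"))
after_b = list(select_property(atoms, selected, "atom_b", ">", 5, "on"))
after_trp = list(select_property(atoms, selected, "residue_type", "^=", "trp", "off"))
print(after_tyr, after_b, after_trp)
```

Because each call only touches the atoms that match, the order of the calls matters, which is why it is safest to begin from a known state (everything off).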

Select_invert flip-flops the selection data, selecting all previously unselected atoms and deselecting all previously selected atoms. It's like making a photographic negative of your previous selection. After you have used these four commands to select only the atoms you want in your object, you must issue the command select_visible. This converts the selection information into visibility, making selected atoms visible and deselected atoms invisible. Remember, selection is a property of the underlying molecule. In order to have this property affect objects you build, you have to select_visible. Only the currently selected residues will be displayed in any object you build until the next select_visible is executed.

Once you have finished selecting atoms and adjusting their visibility, issue the display commands defining your object. It is not possible to mix selected and unselected display commands in a single object. You must use multiple objects to achieve such effects. The following example uses the select commands to produce an object containing all the tyrosines in trypsin:

O> object_name tyr

O> select_off 1sgt 16 245

O> select_property residue_type = tyr on

O> select_visible

O> zone 16 245

O> end_object

O> select_on 1sgt 16 245

O> select_visible

In general, you should make all the atoms visible again (by selecting the entire molecule with select_on and then running select_visible) once the object is complete to avoid accidentally messing up the next object you build.

4.3) Colo(u)ring things

Well, you can now build objects pretty competently, but you're still at the mercy of O when it comes to coloring them. O provides two mechanisms for coloring things. Both molecules and objects can be colored. When an object is created, it takes its coloration from the underlying molecule. This is similar to the way that an object initially takes its coordinates from the associated molecule. After creation, the object itself can be re-colored, but this coloration has no effect on the parent molecule. Also, changing the coloration of a molecule after an object has been created will not change the colors of that object.

4.3a) Coloring by object. To color by object, you first have to select a color. This is done by using the paint_colour command, followed by the color you want to use. O accepts a broad range of named colors, but will also accept color descriptions in integer values. See the O manual for details. Some of the color names accepted by O include red, green, blue, yellow, orange, cyan, white, black, light_blue, medium_blue, dark_blue, magenta, gray, grey, and wheat. A full listing of the named colors accepted by O is found in the Appendix of the O Manual, or you can get it by typing a question mark (?) as input for paint_colour. Paint_colour is very straightforward to use, provided you spell it properly:

O > paint_colour

Paint> Colour? [orange]: medium_aquamarine

Once a color is selected, there are three paint commands that will apply that color to an object. They are paint_object, paint_obj_zo, and paint_obj_at. Paint_object will apply the current color to every atom in the object you either specify by name or select with the mouse. Either supply the name of the object you want to color when you type the command (paint_object {object name}) or start the command, then ID an atom in a visible object with the mouse. The picked object will be colored the selected color.

Paint_obj_zo(ne) will apply the current color to a zone of residues defined either with the mouse or by text input. If you just type paint_obj_zo<Enter>, O will prompt you to ID the first and last atoms of the zone you want to color. Make sure the object you want to color is the only one visible, then make your selections. Otherwise, O might color some other object with similar coordinates.

To use paint_obj_zo without the mouse, you have to follow the command with the molecule name, starting and finishing residues, and name of the object to color before you press Enter. This is good for highlighting secondary structural features in a hurry, for example:

O > paint_obj_zo 1sgt 45 65 all

Paint> 1SGT 45 65 ALL

This will quickly apply the current color to the residues from 45 to 65 in the object named all, built from the molecule 1sgt. Also note that if you set the beginning and end of the zone to the same residue, you will paint that residue only. Using this feature, you can quickly highlight residues of interest in an object:

O > paint_colour green

O > paint_obj_zo 1sgt 57 57 all

The final object-painting command is paint_obj_at(om). This permits the user to change the color of a single atom in an object by either picking it with the mouse or specifying it by molecule, residue, atom and object. You probably won't use this command very much -- it's usually better to color by molecule when generating atom-by-atom color schemes.

4.3b) Coloring by molecule. When coloring a molecule, you no longer use the paint_colour command. Instead, the commands paint_property, paint_ramp, paint_case and paint_zone are used to apply color directly to the molecule based on some atom or residue property. With all of these commands, the last input you give is always the color to use with the command.

The simplest of these commands to use is paint_zone. Paint_zone requires that you specify a molecule name, start and end residues for the zone, and a color. It then colors all atoms in the zone the specified color.

Paint_ramp is slightly more complex. It applies a smooth gradation of colors to atoms based on some property that has a numeric value for each atom. These properties are the same as some of those listed for the select_property command in section 4.2. Paint_ramp requires the name of the property, a range of values for the property, and two colors to shade between. Choosing red and blue for colors will give you a full spectrum of gradation. There are only two properties commonly used with paint_ramp: residue_irc and atom_b. Residue_irc is O's internal notation for the order of residues in the polymer. The first residue has a residue_irc of 1, the second 2, and so on, in the exact order they are listed in the pdb file. Consequently, the most common use of paint_ramp is to produce a c[[alpha]] object that smoothly shades from red at the N-terminus to blue at the C-terminus:

O > paint_ramp

Paint> Colour-ramp a property in molecule 1SGT

Paint> Property [residue_irc] : residue_irc

Paint> Minimum and maximum value of property [1 416] : 16 245

Paint> First colour [red] : red

Paint> Second colour [blue] : blue

O> object_name hued

O> ca_zone 16 245

O> end_object

Using paint_ramp to color residues based on their temperature factor is also useful. It helps identify more disordered areas of the structure, or questionably refined residues (residues with B-values much higher than their neighbors).
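If it helps to picture what a colour ramp does, the following Python sketch maps a property value onto a hue between red and blue. It is only an illustration: the function name ramp_colour is made up, and the linear hue interpolation and clamping are assumptions, not a description of how O computes its ramp internally.

```python
import colorsys

def ramp_colour(value, vmin, vmax, hue_start=0.0, hue_end=240.0):
    """Return an (r, g, b) tuple for value on a hue ramp.

    Assumption: the ramp is linear in hue, from red (0 degrees)
    to blue (240 degrees). O's real interpolation may differ.
    """
    if vmax == vmin:
        fraction = 0.0
    else:
        fraction = (value - vmin) / (vmax - vmin)
    fraction = max(0.0, min(1.0, fraction))  # clamp values outside the range
    hue = hue_start + fraction * (hue_end - hue_start)
    return colorsys.hsv_to_rgb(hue / 360.0, 1.0, 1.0)

# residue_irc range 16 to 245, as in the paint_ramp example above
first = ramp_colour(16, 16, 245)      # N-terminal end of the ramp
middle = ramp_colour(130.5, 16, 245)  # halfway along the chain
last = ramp_colour(245, 16, 245)      # C-terminal end of the ramp
```

With a hue ramp like this one, the midpoint of the range lands on green, which is why a red-to-blue ramp appears to run through the full spectrum.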

Paint_property also makes use of the idea of residue or atom properties. However, it is much more like select_property than paint_ramp is. In fact, paint_property requires a similar <property> <operator> <value> expression as input, followed by the color to paint atoms that fulfill the expression. This can be readily used to highlight all residues of a certain type in a molecule:

O > paint_zone

Paint> What molecule [1SGT ]:

Paint> Residue range [all molecule]: 16 245

Paint> Colour? [medium_aquamarine]: yellow

O > paint_property

Paint> Property? [atom_name]: residue_type

Paint> Operator (< > <= >= ^= [=]): =

Paint> Value? []: phe

Paint> Colour? [yellow]: blue

O > paint_property

Paint> Property? [atom_name]: residue_type

Paint> Operator (< > <= >= ^= [=]): =

Paint> Value? []: trp

Paint> Colour? [blue]: violet

The above will color all the residues in the molecule yellow, then change all phenylalanines to blue and all tryptophans to violet. Remember, you won't see these changes until an object is built from the molecule!
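The order of the painting commands matters, because each one overwrites the colors set before it. The Python sketch below mimics the sequence above on a toy list of atoms. The atom records and the function name paint_property are hypothetical stand-ins, not O's data structures, and the ^= operator is left out.

```python
import operator

# The comparison operators offered at the Operator prompt (except ^=).
OPERATORS = {
    "=": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}

def paint_property(atoms, prop, op, value, colour):
    """Colour every atom whose property satisfies <prop> <op> <value>."""
    test = OPERATORS[op]
    for atom in atoms:
        if test(atom[prop], value):
            atom["colour"] = colour

# Hypothetical three-atom "molecule"; these are not real 1SGT values.
atoms = [
    {"residue_type": "PHE", "atom_b": 12.0, "colour": None},
    {"residue_type": "TRP", "atom_b": 30.5, "colour": None},
    {"residue_type": "GLY", "atom_b": 18.2, "colour": None},
]

for atom in atoms:  # stands in for paint_zone over the whole range
    atom["colour"] = "yellow"
paint_property(atoms, "residue_type", "=", "PHE", "blue")
paint_property(atoms, "residue_type", "=", "TRP", "violet")

colours = [atom["colour"] for atom in atoms]
```

Swapping the order, so that the yellow paint_zone step comes last, would leave every atom yellow.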

The final molecule painting command is paint_case. Again, this command uses O's notion of properties to select colors for atoms. However, paint_case allows you to pick a property, and then assign different colors to multiple different values of that property all at once. This is very useful if you want to color a molecule by atom type. The paint_case command can easily color all carbons white, nitrogens blue, oxygens red, sulfurs yellow and phosphorus green.

O > paint_case

Paint> Colour-case a property in molecule 1SGT

Paint> Property [atom_z] : atom_name

Paint> How many cases [8] ? 5

Paint> Enter property values [ 1 2 3 4 5] : Paint> Property value 1 : c*

Paint> Property value 2 : n*

Paint> Property value 3 : o*

Paint> Property value 4 : s*

Paint> Property value 5 : p*

Paint> Enter 5 colour names: Paint> Colour? [blue]: white

Paint> Colour? [white]: blue

Paint> Colour? [blue]: red

Paint> Colour? [red]: yellow

Paint> Colour? [yellow]: green

O >

As you can see, paint_case first requires the property in question, then an integer defining the number of different values accepted for that property (in the above case, it's 5: c*, n*, o*, s*, and p*). You are then required to input each value (in our example, the wildcards (*) are included so atoms like c[[alpha]] and c[[gamma]] are the same color) and then the color to associate with each value.
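The wildcard matching in paint_case can be imitated in a few lines of Python using the standard fnmatch module. This is a sketch only: the function colour_for is hypothetical, and taking the first matching case is an assumption about how overlapping patterns would be resolved.

```python
from fnmatch import fnmatchcase

# (pattern, colour) pairs in the order entered in the paint_case example
CASES = [
    ("c*", "white"),
    ("n*", "blue"),
    ("o*", "red"),
    ("s*", "yellow"),
    ("p*", "green"),
]

def colour_for(atom_name, cases=CASES, default=None):
    """Return the colour of the first case whose pattern matches atom_name.

    Names are lower-cased first, so CA and ca are treated alike.
    """
    name = atom_name.lower()
    for pattern, colour in cases:
        if fnmatchcase(name, pattern):
            return colour
    return default  # no case matched; the atom keeps whatever colour it had

backbone = [colour_for(name) for name in ("N", "CA", "C", "O")]
```

Because the pattern c* matches any name beginning with c, atoms such as CA, CB and CG all come out the same color, which is exactly the point of the wildcard.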

CHAPTER SUMMARY

Changing The Current Molecule:

O > molecule_name

O > Current molecule has not been loaded.

Mol> Molecule code name []: 1sgt

Centering The Display On A Zone Of Residues

O > centre_zone 16 245

As4> No object defined.

As4> 1SGT 16 245 ALL

As4> Centering on zone from 16 to 245

Building An Object:

O > object_name

Mol> Name of the new object [1SGT ]: all

O > {issue any number of display commands to add atoms to the object here}

O > end_object

Adding A Range Of Residues To An Object:

O > zone

Mol> Zone [all molecule]: 16 245

Adding Only Carbon-[[alpha]] Atoms To An Object

O > ca_zone

Mol> Ca zone [all molecule]: 16 245

Deleting An Object

O > delete_object

Mol> Objects = ALL TRACE BY57 ACTIVE

Mol> Object name ( <CR> = exit ) : by57

Mol> Objects = ALL TRACE ACTIVE

Mol> Object name ( <CR> = exit ) : <Enter>

O >

Adding Only Selected Atoms To An Object:

O > select_off

Sel> What molecule [1SGT ]: 1sgt

Sel> Residue range [all molecule]: 16 245

O > {issue any number of selection commands here}

O > select_visible

O > {now build your object}

Selecting Residues By Zone:

O > select_on

Sel> What molecule [1SGT ]: 1sgt

Sel> Residue range [all molecule]: 16 35

Selecting Residues By Property:

O > select_property

Sel> Property? [atom_name]: residue_type

Sel> Operator (< > <= >= ^= [=]): =

Sel> Value? []: tyr

Sel> [On]/off: on

Coloring by Object:

O > paint_colour

Paint> Colour? [orange]: medium_aquamarine

O > paint_obj_zo 1sgt 45 65 all

Paint> 1SGT 45 65 ALL

Coloring by Molecule:

O > paint_ramp

Paint> Colour-ramp a property in molecule 1SGT

Paint> Property [residue_irc] : residue_irc

Paint> Minimum and maximum value of property [1 416] : 16 245

Paint> First colour [red] : red

Paint> Second colour [blue] : blue

O > paint_zone

Paint> What molecule [1SGT ]:

Paint> Residue range [all molecule]: 16 245

Paint> Colour? [medium_aquamarine]: yellow

O > paint_property

Paint> Property? [atom_name]: residue_type

Paint> Operator (< > <= >= ^= [=]): =

Paint> Value? []: phe

Paint> Colour? [yellow]: blue

Remember that changes to the molecule's coloration will not be visible until the next object is built from the molecule.

Chapter 5

More O Tricks

5.1) Graphics shortcuts

The dials are good for carefully manipulating the objects in the graphics window. However, they can be a little cumbersome if you want to manipulate the view quickly. As an alternative, O permits you to carry out the commands normally associated with the dials by using the mouse. When the red mouse pointer is in the graphics window, you can move the mouse

while holding the:                           to cause:

right mouse button                           xyz rotation

right mouse button + shift key               x/y translation

right + middle mouse buttons + shift key     z translation

right + middle mouse buttons                 zoom

right + left mouse buttons                   slab

You will find that some of these are useful, and some are just a pain. Use whatever combination of mouse and dials you find comfortable.

5.2) Command shortcuts

There are numerous shortcuts to typing commands in O. The first of these is abbreviation. You do not have to type the entire name of a command, just enough to uniquely specify it. Throughout this primer we spell out every command in full for the sake of readability, but abbreviating makes entering commands much faster. Since the only command that starts with "object" is object_name, you can type

O> obj <name of object> instead of O> object_name <name of object>

"dir" can be substituted for "directory", "zo" for "zone", "ca" for "ca_zone", and so on. Also, O considers truncation word by word, so "pai_o_z" will replace "paint_obj_zone". Be careful, though; if you truncate a command too much, O will let you know that it cannot determine what you want it to do:

O > pai_obj

O > PAI_OBJ is not a unique keyword.

O > Paint_object is a possibility.

O > Paint_obj_zo is a possibility.

O > Paint_obj_at is a possibility.

O > PAI_OBJ is not a visible command.
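The word-by-word truncation rule can be sketched in Python as follows. The command list is a small hypothetical subset of O's commands, and the matching rule (each typed word must be a prefix of the corresponding word in the command) is an assumption inferred from the examples above, not taken from O's source.

```python
# Hypothetical subset of O's command table, for illustration only.
COMMANDS = [
    "object_name", "end_object", "zone", "ca_zone",
    "paint_colour", "paint_ramp",
    "paint_object", "paint_obj_zo", "paint_obj_at",
]

def matches(typed, command):
    """True if every typed word is a prefix of the matching command word."""
    typed_words = typed.lower().split("_")
    command_words = command.lower().split("_")
    if len(typed_words) > len(command_words):
        return False
    return all(c.startswith(t) for t, c in zip(typed_words, command_words))

def resolve(typed, commands=COMMANDS):
    """Return the list of commands that the abbreviation could mean."""
    if typed.lower() in commands:  # an exact name always wins
        return [typed.lower()]
    return [command for command in commands if matches(typed, command)]

ambiguous = resolve("pai_obj")   # three candidates, so not unique
unique = resolve("pai_o_z")      # only one command fits
```

An abbreviation is usable only when resolve returns exactly one candidate; anything longer than one entry corresponds to the "not a unique keyword" message.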

Another way to speed things up is to not bother waiting for O to prompt you for inputs to a command. Once you've used a command several times, you probably know exactly what it requests as input. You can then speed things up by typing the command and all of its inputs on the same line. The inputs following the command are read in the usual order. If you fail to supply all the inputs, O will request the remainder. Using this trick,

O > paint_ramp residue_irc 16 245 red blue

can replace

O > paint_ramp

Paint> Colour-ramp a property in molecule 1SGT

Paint> Property [residue_irc] : residue_irc

Paint> Minimum and maximum value of property [1 416] : 16 245

Paint> First colour [red] : red

Paint> Second colour [blue] : blue

Remember that when typing the command and inputs on the same line, you should not press <Enter> after the command. Instead, press <Enter> after the last input. One other thing you may have noticed when typing in all the inputs one-by-one is that most inputs have defaults. These are the bracketed values that O lists right before asking you to type an input -- they are usually the most common values for the input in question. When entering inputs one-by-one, you can just press <Enter> to accept the default value for any input.

O > paint_ramp

Paint> Colour-ramp a property in molecule 1SGT

Paint> Property [residue_irc] : <Enter>

Paint> Minimum and maximum value of property [1 416] : 16 245

Paint> First colour [red] : <Enter>

Paint> Second colour [blue] : <Enter>

This obviously won't work when you enter both command and inputs on the same line. In such a case, a semicolon (;) will have the same effect.

O> paint_ramp ; 16 245 ;;

is the same as the three prior instances of the paint_ramp command, and much faster to use.
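One way to see why these forms are equivalent is to walk through the inputs prompt by prompt. The Python sketch below does this for paint_ramp. The prompt table is copied from the transcript above, but the function fill_inputs and its tokenizing rule (one semicolon stands for one prompt, even a prompt that takes two values) are assumptions made for illustration.

```python
import re

# (prompt, number of values it reads, defaults shown in brackets by O)
PAINT_RAMP_PROMPTS = [
    ("property", 1, ["residue_irc"]),
    ("range", 2, ["1", "416"]),
    ("first_colour", 1, ["red"]),
    ("second_colour", 1, ["blue"]),
]

def fill_inputs(line, prompts):
    """Assign the inputs typed after a command to its prompts, in order."""
    # Split on whitespace, treating every semicolon as its own token.
    tokens = re.findall(r";|[^\s;]+", line)
    answers = {}
    for name, count, defaults in prompts:
        if not tokens:
            break  # inputs ran out; O would prompt for the remainder
        if tokens[0] == ";":
            tokens.pop(0)
            answers[name] = list(defaults)  # semicolon accepts the default
        else:
            answers[name] = tokens[:count]
            del tokens[:count]
    return answers

short_form = fill_inputs("; 16 245 ;;", PAINT_RAMP_PROMPTS)
long_form = fill_inputs("residue_irc 16 245 red blue", PAINT_RAMP_PROMPTS)
```

Both calls produce the same set of answers, which is the sense in which the semicolon version replaces the longer ones.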

There is one last technique for entering commands into O more quickly. In addition to putting both a command and its associated inputs on the same line, you can put multiple command+input pairs on the same line. For example, the following will quickly produce an object (imaginatively named ALL) containing all the atoms of the 1sgt molecule:

O> mol 1sgt obj all zo ; end

For further examples of these truncation techniques, see the example O sessions in Appendix A.

5.3) Macros

A macro is a text file, external to O, that consists of a series of O commands and inputs. Macros can be very sophisticated, including both prompts for the user and UNIX commands. It is relatively straightforward to construct a macro to do whatever you want, given a text editor (like vi) and a good working knowledge of O. However, the design and construction of macros is beyond the scope of this primer. If you are interested, check out O for Morons, which clearly explains the macro-creation process.

Macros are executed by typing the "at" sign (@) immediately followed by the filename of the macro. Since these are filenames and not commands, you cannot truncate macro names or use any of the shortcuts from section 5.2.

O> @<macro filename>

O comes with a host of macros stored in the directory /omac, so you will usually execute a macro with

O> @omac/<macro filename>

The next section lists some macros that will be very useful for the homework sets. Numerous other macros are listed in the O manual and O for Morons; you can also examine the contents of /usr/local/src/Ono/omac for yourself.

5.4) Some important macros

5.4a) Making van der Waals and solvent-accessible surfaces. Probably the most useful thing you can do with macros is to create van der Waals or solvent-accessible surfaces around the atoms of your molecule. There are several macros that carry out various stages of the process. The two main surface-building macros are @local/make_vdwmod.omac, for constructing a van der Waals surface around a molecule, and @local/make_surfmod.omac, for constructing a solvent-accessible surface around a molecule. For the sophisticated, neither routine permits you to define a context adequately. This is a substantial limitation, because a proper context is often necessary for a realistic surface. For instance, one atom in the context of just itself would be a closed sphere, but in the context of a larger molecule would just be the portions of that sphere exposed to solvent.

Once the surface has been generated, the macro @local/surfing.omac will redraw a portion of the surface near the current display center. You don't want to display the entire surface at one time -- it's impossible to see things, and the display slows down incredibly. For more information on manipulating surfaces, see Chapter 6 of O for Morons.

5.4b) Quickly loading and displaying molecules. The macro @local/load_pdb.omac will prompt for a molecule name and then a pdb filename. It will then load the molecule into O and use it to produce three objects: ALL (all the atoms), CA (a rainbow-colored trace), and TOON (a solid cartoon of the molecule, with pink spirals for helices, green ribbons for sheets, and yellow tubes for loops).

5.4c) A rainbow C[[alpha]] object. @omac/rainbow.omac will ask for a molecule name and then produce a C[[alpha]] object for that molecule, smoothly graded from red at the N-terminus to blue at the C-terminus.

5.4d) Coloring by standard atom colors. @omac/cnos_colours.omac will color the current molecule by atom. Carbons will be yellow, nitrogens blue, oxygens red and sulfurs green.

5.4e) Secondary structure analysis. The macro @omac/yasspa.omac will run a rough secondary structure assignment algorithm on a molecule chosen by the user, then produce a C[[alpha]] trace where [[alpha]]-helices are colored red, [[beta]]-sheets green, and loops yellow. The algorithm is imperfect, but will do a good job of broadly highlighting secondary structure elements.

5.4f) Changing the Save filename. The macro @omac/save_as will change the filename that your database is saved under. The next time you Save_DB it will use the new filename.

5.4g) Making a Ramachandran plot. While not strictly a macro, the UNIX command $ODAT/omac/make_rama.csh will ask you for a pdb file and produce a Ramachandran (Phi, Psi) plot from it. This file will be named "<protein code>_rama.ps". It is a PostScript file that can be viewed with the program xpsview or printed on the HP LaserJet next to howie. For more background on PostScript files and printing, see Chapter 9 or the "Computer Usage Notes."

5.5) Making pretty pictures using the Sketch commands

In general, the graphics that we have produced with O so far have been limited to wire-frame representations. There are two ways to produce prettier solid-surface graphics within O. The first is the command cpk_object, used to make ball-and-stick versions of preexisting objects. The second is to use O's powerful and flexible library of Sketch commands to produce a "sketch object." A sketch object differs from a normal O object primarily in that it is not saved as part of the O database. If you create a sketch object, save_DB, and restart O with the saved database, your sketch object will no longer be there. A ball-and-stick object produced with cpk_object, by contrast, will be saved in the database as usual. On the other hand, cpk_object objects cannot be plotted (Chapter 9), while most sketch objects can. One other warning about using sketch objects: the graphics board in the machine citpig is not capable of coloring sketch objects properly -- they will show up as solid black. If you want to use sketch objects, use goose, pi, or covalent instead.

5.5a) Drawing ball-and-stick objects. The O command cpk_object will produce a space-filling model of an object. The object usually comes out more ball-and-stick-like than space-filling, but it looks pretty good. This object will be saved as part of the O database, but cannot be plotted with the Plot commands.

To use this command, first build an object containing all the atoms you want in your space-filling model (all the atoms of your ligand, for instance). Once the object is built, issue the command cpk_object {object name} and wait -- it may take a little while for the space-filling representation to appear.

5.5b) Drawing solid objects. Solid and wireframe "sketch objects" are produced by the Sketch command. This is mainly useful for producing pretty pictures for plotting or capture with the snapshot program (see Chapter 9). This section will only cover some of the basic sketch objects that can be produced. For more complex sketch objects, see the O manual and Chapter 9 of O for Morons.

Similar to regular O objects, each sketch object is started by supplying its name to the sketch_object command:

O > sketch_object

Sketch> Sketch object name? [sketch]: hlx2

Sketch> Making visibility data structures.

Graphical elements can be added to the current sketch object with the sketch_add command. However, the type of graphical element needs to be set first with the command sketch_type. Sketch_type prompts the user to enter the name of the desired sketch type. Some sketch types are

ribbon: A ribbon of smooth lines guided by the main chain coordinates of the molecule -- good for showing an entire protein molecule, both [[beta]]-sheets and [[alpha]]-helices.

cylinder: A solid cylinder the approximate diameter of an alpha helix is drawn along the axis of the specified helix. Obviously, only good for [[alpha]]-helices.

spiral: A solid spiral is drawn tracing the backbone of a specified [[alpha]]-helical region.

arrow: Draws a broad arrow along a specified [[beta]]-sheet region, with an arrowhead at the C-terminus of the region.

rattler: Draws a smooth looking small diameter tube along the selected backbone. Good for loops between other sketch types, but ribbon is better if the whole protein is drawn in a single style.

Once the sketch type is chosen, the sketch_add command can be used. Sketch_add must be immediately followed by a molecule specification and a zone. The selected type is applied over the specified zone of the molecule and the result added to the current sketch object. The C-terminal helix of trypsin is drawn as a cylinder in the following example:

O > sketch_type cylinder

O > sketch_add 1sgt 231 245

Sketch> Define type cylinder

Sketch> 1SGT 231 245 YASSPA

Sketch> Finding segments in molecule.

Sketch> Central atom: CA

Sketch> Max distance between central atoms: 4.20

Sketch> done.

Sketch> Drawing cylinder from 231 to 245

Like the usual O objects, manipulations and colorations of a sketch object do not affect the molecule on which the object is based. Also, sketch objects take their coloration from the current (molecule) color of the central ([[alpha]]-carbon) atom of each residue. If you make a mistake with sketch_add, you can delete the last sketch_add command by executing sketch_undo. Unlike regular objects, however, you do not need to explicitly end a sketch object; sketch_add commands are drawn as they are issued.

One other useful sketch command is sketch_auto. If you have previously made secondary structure assignments for your molecule by running the yasspa.omac macro, sketch_auto can use that information to automatically create a detailed sketch object consisting of rattler loops, spiral helices, and arrow strands. The only information required is the molecule and zone:

O > sketch_auto 1sgt 16 245

Sketch> Making cartoon from 16 to 245

Sketch> Use the secondary stucture colouring scheme? [Yes]: yes

O > Macro in database.

Sketch> Making visibility data structures.

O > O > O > Sketch> Define type spiral

Sketch> 1SGT 165 171 YASSPA

Sketch> Drawing spiral from 165 to 171

. . .

O > Sketch> Define type arrow

Sketch> 1SGT 226 228 YASSPA

Sketch> Drawing arrow from 226 to 228

. . .

O > Sketch> Define type rattler

Sketch> 1SGT 244 245 YASSPA

Sketch> Drawing tube from 244 to 245 (C-term)

Sketch> Drawing tube from $246 to $246 (C-term)

O > O > Heap> Deleted .BUGS_BUNNY
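Conceptually, sketch_auto turns a per-residue secondary structure assignment into a list of zones, each drawn with one sketch type. The Python sketch below shows that grouping step only. The one-letter labels (H, E, L) and the function cartoon_segments are hypothetical; they are not the format in which yasspa.omac stores its results, and O may choose segment boundaries differently.

```python
from itertools import groupby

# Assumed mapping from a one-letter label to the sketch type named above.
SKETCH_TYPE = {"H": "spiral", "E": "arrow", "L": "rattler"}

def cartoon_segments(first_residue, labels):
    """Group consecutive identical labels into (type, start, end) zones."""
    segments = []
    position = first_residue
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((SKETCH_TYPE[label], position, position + length - 1))
        position += length
    return segments

# Made-up assignment for residues 10 to 25 of an imaginary protein.
segments = cartoon_segments(10, "LLEEEELLLHHHHHHL")
```

Each tuple in segments corresponds to one sketch_type plus sketch_add pair that you would otherwise have to type by hand.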

The previous four commands will permit you to create relatively complex representations of your proteins. Be aware that you can display sketch and regular objects simultaneously to good effect. A good example is the following construction of a ribbon and cylinder representation of 1sgt that includes the catalytic triad of residues in the active site. For more complex sketch objects, you should consult the other O manuals in 158 Braun.

O > paint_zone 1sgt 16 245 red

O > obj triad zone 57 57 zone 102 102 zone 195 195 end

O > O > sketch_object solid

Sketch> Making visibility data structures.

O > sketch_type cylinder

O > sketch_add 1sgt 164 171

Sketch> Define type cylinder

Sketch> 1SGT 164 171 TRIAD

Sketch> Drawing cylinder from 164 to 171

O > sketch_add 1sgt 231 245

Sketch> Define type cylinder

Sketch> 1SGT 231 245 TRIAD

Sketch> Drawing cylinder from 231 to 245

O > sketch_type ribbon

O > sketch_add 1sgt 16 245

Sketch> Define type ribbon

Sketch> 1SGT 16 245 TRIAD

Sketch> Drawing ribbon from 16 (N-term) to 245 (C-term)

Sketch> Drawing ribbon from $246 to $246 (C-term)

Sketch> Zone too short for ribbon

CHAPTER SUMMARY

Mouse Shortcuts:

move the mouse while holding the:            to cause:

right mouse button                           xyz rotation

right mouse button + shift key               x/y translation

right + middle mouse buttons + shift key     z translation

right + middle mouse buttons                 zoom

right + left mouse buttons                   slab

Running a Macro:

O > @<macro name>

Some Useful Macros:

@local/make_vdwmod.omac: makes a van der Waals surface for the chosen molecule

@local/make_surfmod.omac: makes a solvent-accessible surface for the chosen molecule

@local/surfing.omac: redraws the parts of the above two surfaces within 12.5 angstrom of the display center

@local/load_pdb.omac: loads the PDB file specified and builds useful objects from the loaded molecule

@omac/rainbow.omac: produces a rainbow-colored ca_zone object of the chosen molecule

@omac/cnos_colours.omac: colors selected molecule by atom

@omac/yasspa.omac: performs simple secondary structure analysis and produces a ca_zone object colored by secondary structure

@omac/save_as: changes the filename used by Save_DB

Starting A Sketch Object:

O > sketch_object

Sketch> Sketch object name? [sketch]: hlx2

Sketch> Making visibility data structures.

Setting The Sketch Type:

O > sketch_type cylinder

Adding Residues To A Sketch Object:

O > sketch_add 1sgt 231 245

Sketch> Define type cylinder

Sketch> 1SGT 231 245 YASSPA

Sketch> Finding segments in molecule.

Sketch> Central atom: CA

Sketch> Max distance between central atoms: 4.20

Sketch> done.

Sketch> Drawing cylinder from 231 to 245

Building A Secondary Structure Cartoon

O > @omac/yasspa.omac

. . .

O > sketch_auto 1sgt 16 245

Sketch> Making cartoon from 16 to 245

Sketch> Use the secondary stucture colouring scheme? [Yes]: yes

. . .

Chapter 6

Superimposing Structures by Least Squares

6.1) Generating a transformation with Lsq_explicit and Lsq_improve

The easiest way to compare two related protein structures is to superimpose one upon the other. O has facilities for easy comparison of similar structures in different proteins by rotating the coordinates for one protein to best line up with the other protein, using a least squares regression technique. The first part of this superimposition requires you to break each protein into individual residue ranges to be compared. Within each residue range specified, the total number of atoms must be the same in each protein. In general, superimposition is only carried out on carbon-[[alpha]] coordinates due to this requirement.

In the example below, we are going to superimpose an immunoglobulin constant domain from the Fc region upon one of the immunoglobulin-like domains of the histocompatibility molecule HLA-A2. As you will have learned in class, antibody constant domains are composed of 7 [[beta]]-strands organized in 2 [[beta]]-pleated sheets. HLA-A2 contains two domains with immunoglobulin folds in which individual [[beta]]-strands superimpose well upon the [[beta]]-strands of antibody constant domains, but the turns between individual [[beta]]-strands are different. Because most of the turns in HLA are shorter than those in an antibody constant region, there are fewer residues in the HLA immunoglobulin-like domains than in antibody constant domains.

First use the "get" command described in section 3.2 of this manual to get the coordinate files of both proteins (1fc2 and 3hla), and store them in your home directory. They will now be called 1fc2.pdb and 3hla.pdb.

Start O as usual. Load both molecules into O, and produce a C[[alpha]] trace for each molecule (you might want to color them differently). You will now use the Lsq commands to carry out the superimposition. O superimposes two structures by first generating and then applying a transformation to one of the molecules. This transformation consists of a translation and a rotation, and is stored in the database like everything else in O. Once a transformation is defined, it can be applied to either an object or a molecule. O reports the transformation as a 3x4 matrix of values in the following format:

X' = R1 * X + R4 * Y + R7 * Z + R10

Y' = R2 * X + R5 * Y + R8 * Z + R11

Z' = R3 * X + R6 * Y + R9 * Z + R12

where R1 to R9 define the rotation and R10 to R12 define the translation. The first step in generating such a transformation is to produce an initial superimposition more or less manually. This is done with the lsq_explicit command. You will be prompted for the name of the molecule you want to keep stationary, and the name of the molecule you want to rotate. (To superimpose the two structures, one will remain stationary, and the other will be rotated and translated. Here we will transform the immunoglobulin domain and keep the HLA molecule stationary.)

O > lsq_explicit

Lsq > Least squares match by explicit definition of atoms.

Lsq > Given 2 molecules A, B the transformation rotates B onto A

Lsq > What is the name of A (the not rotated molecule)? 3hla

Lsq > What is the name of B (the rotated molecule)? 1fc2

In this initial superimposition, you have to supply O with several zones of both proteins to superimpose. For the stationary molecule, O will ask for the beginning and ending residue numbers for each zone. This should be followed by the specification "ca" on the same line, to tell O that you only want to use carbon-[[alpha]] atoms for the superimposition. For the transformed molecule, you only have to supply the beginning residue number of the zone, as O will assume the zone has the same length in both molecules. Normally, prior to running lsq_explicit, you will have already decided which regions of the two proteins were similar, and noted their beginning and ending residue numbers. In this example, we've done this for you. The ranges below represent individual [[beta]]-strands in each protein. (NOTE: In the pdb files of each of these example proteins, the polypeptide chains are differentiated by letters in front of the residue number.) If you make a mistake and there aren't the same number of atoms in the zone in both molecules or you are using incorrect residue numbers, O will let you know. Just keep entering zones one by one. When all the zones are entered, hit <Enter> when asked for the next zone.

Lsq > Now define what atoms in A [=3HLA] are to be matched to B [=1FC2]

Lsq > Defining 3 names in 3HLA implies a zone and an atom name.

Lsq > Defining 2 names in 3HLA implies a zone and CA atoms.

Lsq > Defining 1 name in 3HLA implies the CA of that residue.

Lsq > Molecule 1FC2 just requires the start residue and atom name.

Lsq > A blank line terminates input.

Lsq > Define atoms from 3HLA (the not rotated molecule): b6 b11 ca

Lsq > Define atoms from 1FC2 (the rotated molecule): d347

Lsq > Define atoms from 3HLA (the not rotated molecule): b21 b28 ca

Lsq > Define atoms from 1FC2 (the rotated molecule): d363

Lsq > Define atoms from 3HLA (the not rotated molecule): b36 b40 ca

Lsq > Define atoms from 1FC2 (the rotated molecule): d378

Lsq > Define atoms from 3HLA (the not rotated molecule): b50 b57 ca

Lsq > Define atoms from 1FC2 (the rotated molecule): d392

Lsq > Define atoms from 3HLA (the not rotated molecule): b62 b70 ca

Lsq > Define atoms from 1FC2 (the rotated molecule): d404

Lsq > Define atoms from 3HLA (the not rotated molecule): b78 b83 ca

Lsq > Define atoms from 1FC2 (the rotated molecule): d423

Lsq > Define atoms from 3HLA (the not rotated molecule): b91 b95 ca

Lsq > Define atoms from 1FC2 (the rotated molecule): d437

Lsq > Define atoms from 3HLA (the not rotated molecule):
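Before typing in zones like these, it can be worth checking their lengths on paper or with a short script. The Python sketch below expands the zones from the transcript above, works out where each zone ends in 1FC2, and counts the carbon-[[alpha]] atoms being matched. The helper names are hypothetical, and the arithmetic assumes residues are numbered consecutively within each zone, with no gaps or insertion codes.

```python
import re

# (start in 3HLA, end in 3HLA, start in 1FC2), copied from the transcript
ZONES = [
    ("b6", "b11", "d347"), ("b21", "b28", "d363"), ("b36", "b40", "d378"),
    ("b50", "b57", "d392"), ("b62", "b70", "d404"), ("b78", "b83", "d423"),
    ("b91", "b95", "d437"),
]

def split_name(residue):
    """Split a residue name like 'b11' into chain letter and number."""
    chain, number = re.fullmatch(r"([A-Za-z]*)(\d+)", residue).groups()
    return chain, int(number)

def expand(zones):
    """Return (start_a, end_a, start_b, end_b, length) for each zone."""
    rows = []
    for start_a, end_a, start_b in zones:
        _, first = split_name(start_a)
        _, last = split_name(end_a)
        chain_b, first_b = split_name(start_b)
        length = last - first + 1  # one CA atom per residue
        end_b = f"{chain_b}{first_b + length - 1}"
        rows.append((start_a, end_a, start_b, end_b, length))
    return rows

rows = expand(ZONES)
total_atoms = sum(row[4] for row in rows)
```

For the seven zones above the total comes to 47 atoms, so the first zone, for example, pairs b6 to b11 with d347 to d352.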

Once the last zone is entered, O will start churning away and soon present you with a transformation. In addition, O will report an r.m.s. (root-mean-square) deviation for the zones considered. This is a measure of how closely the compared atoms overlap. If this number is low, the corresponding carbon-[[alpha]]'s are close together. You will now be asked for the name you want to give the transformation. Use something that tells you what the transformation does, like 1fc2_to_3hla.

Lsq > The 47 atoms have an r.m.s. fit of 1.038

Lsq > xyz(1) = -0.5423*x+ -0.7975*y+ 0.2642*z+ 20.0671

Lsq > xyz(2) = -0.7940*x+ 0.3837*y+ -0.4715*z+ 32.4128

Lsq > xyz(3) = 0.2747*x+ -0.4655*y+ -0.8413*z+ 108.5090

Lsq > The transformation can be stored in O.

Lsq > A blank is taken to mean do not store anything

Lsq > The transformation will be stored in .LSQ_RT_1fc2_to_3hla

This is just a preliminary superimposition. The O command lsq_improve will take this initial transformation and use it as a basis for automatically generating an improved transformation based on considering the maximum possible number of zones. The command requests the name of the initial transformation you generated with lsq_explicit. All of the other inputs (name of molecule to transform, name of molecule to hold rigid, zones to consider, atoms to superimpose) should default to correct values. In some instances, however, you want to superimpose one structure on a part of another structure. If this is the case, make sure lsq_improve only considers the partial zone rather than the whole molecule.

O > lsq_improve

Lsq > Least squares match by Semi Automatic Alignment.

Lsq > There are these transformations in the database

Lsq > 1FC2_TO_3HLA

Lsq > Which alignment ? 1fc2_to_3hla

Lsq > Given 2 molecules A,B the transformation rotates B onto A

Lsq > What is the name of molecule A [3HLA ]? <Enter>

Lsq > Zone to look for alignment [all molecule A] : <Enter>

Lsq > What is the name of molecule B [1FC2 ]? <Enter>

Lsq > Zone to look for alignment [all molecule B] : <Enter>

Lsq > What atom [CA] ? <Enter>

Lsq > Number of atoms in A/B to look for alignment 369 249

Once every input is entered, lsq_improve will cycle several times and eventually generate a final transformation. Again, the transformation and RMS fit will be reported. You may notice that the RMS fit is, if anything, slightly worse. This is largely because the RMS fit from lsq_explicit is only calculated for the explicitly superimposed zones; the RMS fit for lsq_improve is calculated for a much larger number of residues, where a precise overlap is harder to find. You will also be asked to name this new transformation (you can use the same name as the old one, if you wish).

Lsq > 0Search for connected fragments.

Lsq > A fragment of 52 residues located.

Lsq > A fragment of 14 residues located.

Lsq > A fragment of 14 residues located.

Lsq > A fragment of 6 residues located.

Lsq > Loop = 1 ,r.m.s. fit = 1.284 with 86 atoms

Lsq > x(1) = -0.5342*x+ -0.7958*y+ 0.2850*z+ 19.4118

Lsq > x(2) = -0.8019*x+ 0.3704*y+ -0.4687*z+ 32.4115

Lsq > x(3) = 0.2675*x+ -0.4790*y+ -0.8361*z+ 108.6804

Lsq > 0Search for connected fragments.

Lsq > A fragment of 52 residues located.

Lsq > A fragment of 14 residues located.

Lsq > A fragment of 14 residues located.

Lsq > A fragment of 6 residues located.

Lsq > Loop = 2 ,r.m.s. fit = 1.284 with 86 atoms

Lsq > x(1) = -0.5342*x+ -0.7958*y+ 0.2850*z+ 19.4118

Lsq > x(2) = -0.8019*x+ 0.3704*y+ -0.4687*z+ 32.4115

Lsq > x(3) = 0.2675*x+ -0.4790*y+ -0.8361*z+ 108.6804

Lsq > The transformation can be stored in O.

Lsq > A blank is taken to mean do not store anything

Lsq > The transformation will be stored in .LSQ_RT_1fc2_to_3hla_2

Lsq > Here are the fragments used in the alignment

Lsq > 0 B1 IQRTPKIQVYSRHP B14

Lsq > D342 QPREPQVYTLPPSR D355

Lsq > 0 B20 SNFLNCYVSGFHPSDIEVDLLKNGERIEKVEHSDLSFSKDWSFYLLYYTE

Lsq > D362 QVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLT

Lsq > 0 B70 FT B71

Lsq > D412 VD D413

Lsq > 0 B76 DEYACRVNHVTLSQ B89

Lsq > D421 NVFSCSVMHEALHN D434

Lsq > 0 B90 PKIVKW B95

Lsq > D436 YTQKSL D441

6.2) Transforming objects and molecules

Well, you've generated a transformation, but there certainly isn't any change in the graphics display. The superimposing transformation has not yet been applied. In general, you will want to use a transformation that has been generated by lsq_improve to achieve the best correspondence between the two molecules. O contains two commands to apply a transformation, one for objects and one for molecules. The command lsq_object will take as input an object name and a transformation name, and will then apply the specified transformation to the chosen object. Remember, this will not affect the underlying molecule. Subsequent objects built from the same molecule as the transformed object will show up at the old (untransformed) coordinates. This is good for taking a quick look at how well two structures superimpose, but not for detailed comparison.

If you're going to be building a lot of objects with the new coordinates, or you want to produce a pdb file containing the superimposed molecule in the new coordinates, you need to transform the molecule by using lsq_molecule. Lsq_molecule requires a transformation name and a molecule name, and actually changes the molecule coordinates stored in the database to the new transformed coordinates. Once the transformation is applied, it is not possible to regenerate the old coordinates unless you use lsq_explicit and lsq_improve to produce an inverse transform (3hla_to_1fc2, for example) before applying the forward transform to the molecule. Being enthusiastic students, you could of course calculate the inverse transform by hand, but that would be a pain.
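For the curious, the by-hand inverse is less painful than it sounds. If the forward transformation is x' = R x + t (the form of the x(1), x(2), x(3) lines printed by lsq_improve), then the inverse is x = R^T (x' - t), because the inverse of a rotation matrix is its transpose. The Python sketch below uses the rounded coefficients from the lsq_improve output shown earlier. The function names are illustrative and are not part of O, and the sketch says nothing about how O lays out the numbers inside its .LSQ_RT_ datablocks.

```python
# Rounded coefficients copied from the lsq_improve output (x(1), x(2), x(3)).
R = [
    [-0.5342, -0.7958,  0.2850],
    [-0.8019,  0.3704, -0.4687],
    [ 0.2675, -0.4790, -0.8361],
]
T = [19.4118, 32.4115, 108.6804]


def apply_transform(rot, trans, point):
    """Return rot * point + trans for one (x, y, z) point."""
    return [
        sum(rot[i][j] * point[j] for j in range(3)) + trans[i]
        for i in range(3)
    ]


def invert_transform(rot, trans):
    """Invert a rigid-body transform: R_inv = R^T, t_inv = -R^T * t."""
    rot_inv = [[rot[j][i] for j in range(3)] for i in range(3)]
    trans_inv = [
        -sum(rot_inv[i][j] * trans[j] for j in range(3)) for i in range(3)
    ]
    return rot_inv, trans_inv


R_INV, T_INV = invert_transform(R, T)

# Round trip: move a point with the forward transform, then bring it back.
start = [10.0, 20.0, 30.0]
moved = apply_transform(R, T, start)
back = apply_transform(R_INV, T_INV, moved)
```

Because the printed coefficients are rounded to four decimal places, the round trip recovers the starting point only to within a few hundredths of an Angstrom.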

To produce a pdb file of the superimposed molecule in the transformed coordinates, simply use sam_atom_out after applying the transformation with lsq_molecule.

Chapter 7

Modifying Structures

7.1) Mutating, inserting and deleting

It is often useful to change the side chains of certain residues to see what effect the change would have on the structure. For example, you may be doing site-directed mutagenesis to change your favorite protein, and you then want to see what your mutated protein looks like compared to the original protein. You can do this very easily within O by using the Mutate commands, and the protein will be modified in the way you specify. However, changing side chains can affect the conformation of nearby side chains or even the main chain, and any residues you mutate or insert will lack proper coordinates. Refinement and database searches will permit you to assign plausible coordinates for your changes, but you won't necessarily come up with the correct structure. Only experimental techniques like x-ray crystallography will give you that.

Since we're not teaching you crystallography, we will teach you how to make changes within a protein and "regularize" or clean up the resulting changes. "Regularize", in this instance, means that O will assign correct bond lengths and torsion angles to the residues within a user-specified zone. Although the regularized residues have correct geometry, they do not necessarily have correct van der Waals contacts (i.e., in the worst case, you could end up with the new side chain hitting some other part of the structure). You therefore need to follow the regularization step with some tweaking by hand. Ideally, you would use some energy minimization procedure to produce the best possible new structure. Energy minimization algorithms work by adding up sources of potential energy in a protein, and minimizing them by incremental changes in the coordinates. In doing this, you are assuming that the correct structure represents an absolute energy minimum (does it?). You can only hope to arrive at a meaningful structure if you start with a structure that is close to being correct. (E.g., you can't substitute all the residues in trypsin for the residues found in lysozyme, energy minimize the mutant trypsin and expect to arrive at the lysozyme structure.) However, O does not contain any energy minimization routines -- if you want to make use of energy minimization, you will have to use another program and should probably go talk to a TA.
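The dependence on the starting structure can be seen in a toy calculation. The Python sketch below is not a force field and has nothing to do with any real minimization program: it invents a one-dimensional "energy" with two minima and walks downhill in small steps from two different starting points. All names and numbers are made up for illustration.

```python
def energy(x):
    """Toy one-dimensional 'energy' with two minima (not a real force field)."""
    return x**4 - 3.0 * x**2 + x


def gradient(x):
    """Analytical derivative of energy(x)."""
    return 4.0 * x**3 - 6.0 * x + 1.0


def minimize(x, step=0.01, max_steps=10000, tol=1e-8):
    """Steepest descent: move downhill in small increments until the slope is ~0."""
    for _ in range(max_steps):
        g = gradient(x)
        if abs(g) < tol:
            break
        x -= step * g
    return x


left = minimize(-2.0)   # ends in the deeper minimum
right = minimize(2.0)   # ends in the shallower, local minimum
```

Both runs stop where the slope is zero, but only the run started at -2.0 reaches the lower of the two minima. The other run is trapped in the nearest minimum, which is why a minimizer can only polish a structure that is already close to correct.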

As an example, you will change some of the residues found in your sample peptide. First we will need to start O back up with the database containing the peptide (170_example1.o). For our example we will replace residue 10 with a valine. To make the changes we just issue the mutate_replace command. (Remember that any changes you make in your database file are permanent as soon as you save_DB or stop, so you may want to save a copy of your original database file before making mutations.) Note that the mutations will only change the database file. To produce a modified pdb file, you have to use sam_atom_out. You will be asked for the molecule name, residue name, and new residue type. You can mutate multiple residues with one mutate_replace command -- just hit Enter at the prompt to terminate the command. Each new residue type must be given as its three-letter code. (An example of how to do this will be shown below.)

Before building an object from your mutated peptide, you need to regularize it in order to allow the valine to fit into the helical structure of the peptide. (If you don't do this, the new side chain will not be visible.) We will discuss regularization in section 7.2.

O> mutate_replace

Mut> Mutate a molecule by replacing one residue type

Mut> by another.

Mut> Molecule ([ALA ]) : <Enter>

Mut> Residue name and new type (<cr> to end) : 10 val

Mut> Residue name and new type (<cr> to end) : <Enter>

Mut> There are 1 mutations

Mut> The Rotamer_DB is now being loaded.

O > yes

Other mutation routines are mutate_delete and mutate_insert. Mutate_delete deletes the selected residue in the molecule specified.

O > mutate_delete

Mut> Mutate a molecule by deleting residues

Mut> Molecule ([ALA ]) : <Enter>

Mut> Residue name (<cr> to end) : 2

Mut> Residue name (<cr> to end) : <Enter>

Mut> There are 1 mutations

Mutate_insert requires specification of a molecule, the residue before the inserted residue, and the new residue name (it's best to use the number of the preceding residue plus a letter, like inserting 102a after 102) and three-letter amino acid code.

O > mutate_insert

Mut> Mutate a molecule by inserting residues.

Mut> Molecule ([ALA ]) : <Enter>

Mut> After which residue: 1

Mut> New residue name and type (<cr> to end) : 1a tyr

Mut> New residue name and type (<cr> to end) : <Enter>

Mut> There are 1 mutations

Both these commands cause substantial distortions in the carbon-[[alpha]] backbone, and the molecule will therefore need to first be grossly repaired with lego_loop, then regularized with refi_zone (see next section). Remember, there will be no visible effects from either of these commands until you finish repairing and regularizing and build a new object. All residues changed by a mutate_replace or mutate_delete command will be colored purple in the new object.

7.2) Cleaning up mutations with refi_zone and the Lego commands

The Mutate commands will produce several problems in your molecule, from residues without coordinates to gaps and insertions in the backbone. To fix these problems and achieve some sort of appropriate geometry, O has two mechanisms. The first is regularization. The refi_zone command will force residue geometries (bond lengths, angles) toward their ideal values. Regularization only works on residues that have coordinates, however. To assign coordinates, repair gaps or smooth insertions, the Lego commands are provided.

The program O contains a set of tools that pick out the best match between an input set of C[[alpha]] positions and C[[alpha]] coordinates from a database of well refined protein structures, thus allowing you to find regions of structural similarity. This can be done in real time at the graphics terminal using a very fast matching algorithm that works with diagonal plots (Jones, T.A. (1985) in Crystallography in Molecular Biology, eds. Moras, Drenth, Strandberg, Suck, Wilson, Plenum Press, pp. 125-130; Jones, T.A. and Thirup, S. (1986) EMBO J 5, 819-822). These commands are the lego commands, lego_side_chain, lego_loop, lego_auto_mc, lego_ca, and lego_auto_sc.

The lego commands search a database of 32 high-resolution structures for the best fit backbone coordinates to the atoms under current examination, then permit the user to assign those backbone coordinates to a molecule. Also, the lego commands use a database of common rotamers for each residue (except alanine and glycine) which can be used to assign coordinates to mutated side chains.

What you should learn from experimenting with the lego commands is that, at the level of 5-10 amino acid stretches, all parts of any protein main chain already exist in the protein data base. For example, Alwyn Jones found that the entire structure of retinol binding protein could be built from satellite tobacco necrosis virus, alcohol dehydrogenase and carbonic anhydrase C, such that the r.m.s. deviation for all main chain atoms was 1.0 Å after regularization. (These three proteins do not show sequence similarity or an overall structural similarity to retinol binding protein.)
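The flavor of such a search can be shown with a much-simplified Python sketch. It compares the pattern of C[[alpha]]-C[[alpha]] distances inside a short query fragment with every window of the same length along a "database" chain, and keeps the window with the smallest r.m.s. difference. This is not O's algorithm or its scoring function; the coordinates are made up and all function names are hypothetical.

```python
import math


def distance_matrix(coords):
    """All pairwise distances between a list of (x, y, z) C-alpha positions."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def distance_rms(frag_a, frag_b):
    """R.m.s. difference between the distance matrices of two equal-length fragments."""
    da, db = distance_matrix(frag_a), distance_matrix(frag_b)
    n = len(frag_a)
    total = sum(
        (da[i][j] - db[i][j]) ** 2 for i in range(n) for j in range(i + 1, n)
    )
    pairs = n * (n - 1) // 2
    return math.sqrt(total / pairs)


def best_window(query, chain):
    """Slide the query along the chain; return (score, start index) of the best match."""
    n = len(query)
    scores = [
        (distance_rms(query, chain[start:start + n]), start)
        for start in range(len(chain) - n + 1)
    ]
    return min(scores)


# Made-up, irregular 20-residue "database" chain (not real protein coordinates).
chain = [(3.0 * i, 2.0 * math.sin(i * i), 2.0 * math.cos(3.0 * i)) for i in range(20)]

# Query: a copy of chain positions 8-12 moved elsewhere in space.
# A pure shift leaves all internal distances unchanged.
query = [(x + 50.0, y - 7.0, z + 3.0) for (x, y, z) in chain[8:13]]
score, start = best_window(query, chain)
```

Comparing internal distances rather than raw coordinates means the query does not have to be superimposed on each candidate before scoring, which is part of what makes this kind of search fast.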

7.2a) Lego_side_chain and lego_loop. To fix an insertion or deletion by lego_loop, you first have to select_on all the residues of the molecule, then select_off the zone you want to modify. Lego_loop can then be executed. It takes a standard molecule and zone input (make the zone a little bigger than the deselected residues on either side) and produces a set of possible carbon-[[alpha]] coordinates from the database. Again, the lower right hand dial will choose between fragments, but the first selected fragment (lowest RMSD) is generally the best choice. Select Yes to accept the current fragment coordinates, or No to abort the routine. The following example applies lego_loop to the tyrosine inserted into the ALA peptide:

O > select_off

Sel> What molecule [ALA ]:

Sel> Residue range [all molecule]:

O > select_on

Sel> What molecule [ALA ]:

Sel> Residue range [all molecule]: 1 12

O > lego_loop ala 1 10

Lego> ALA 1 10 ALALL

Lego> Used.

Lego> Used.

Lego> Used.

. . .

Lego> Number of selected atoms in zone is 10

Lego> The DB is now being loaded.

Lego> Loading data for protein:HCAC

Lego> Loading data for protein:PA

Lego> Loading data for protein:51C_3

Lego> Loading data for protein:ACT_2

Lego> Loading data for protein:APP_2

. . .

Lego> DGNL> Top matches

Lego> Protein Start Res. Score Sequence

Lego> TLN_3 142 1.216 HELTHAVTDY

Lego> CPA_5 291 1.228 QETWLGVLTI

Lego> TLN_3 143 1.230 ELTHAVTDYT

Lego> DFR_3 23 1.241 LPDDLHYFRA

. . .

O > yes (after dialing through the possible structures -- or just choose the best match)

O > obj alall zo ; end

The use of lego_side_chain is a little simpler. Lego_side_chain is used after a mutate_insert or mutate_replace command to assign coordinates to the new side chain. After being given the molecule and residue names, lego_side_chain permits the user to graphically cycle through common rotamers for the residue in question. The lower right dial will select between rotamers, and selecting Yes will assign those rotamer coordinates to the residue. The routine can be terminated by entering No. To assign side chain coordinates to our new residue 1a and adjust the conformation of the mutated residue 10, use lego_side_chain twice:

O > lego_side_chain ala 1a

Lego> ALA 1A CA ALALL

O > yes (after dialing through rotamers to find the best one)

O > lego_side_chain ala 10

Lego> ALA 10 CA ALALL

O > yes (after dialing through rotamers to find the best one)

After any lego command, especially lego_loop, it's best to regularize the changes with refi_zone.

7.2b) Refi_zone. The use of refi_zone is straightforward. Input a molecule and zone, or pick two residues to define the zone in consideration. You should regularize all residues changed substantially by any command (not necessarily all with one refi_zone), plus a few residues on either side of each change in case they have been affected by it. If you were making changes in a real protein, you might need to regularize (or preferably energy minimize) non-contiguous (in terms of residue number) parts of the structure that are spatially close to the mutation.

At the end of the refi_zone run, a question asks if you want to save the coordinates. If you answer "yes", you will overwrite whatever is in your database file with the changes made. If you answer "no", the molecule will not be changed. Regularization is an iterative procedure. For major changes, like cleaning up an insertion or deletion, it's best to run refi_zone on the affected residues several times until the reported RMS deviation no longer decreases.
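The "run it again until the number stops improving" idea can be illustrated with a toy Python sketch. It restrains only the distances between consecutive points in a short chain, nudging each one halfway toward an ideal value on every pass. Refi_zone itself also handles angles and fixed dihedrals using its dictionary, and this sketch is not its algorithm; the ideal length of 1.5 and the starting coordinates are made up.

```python
import math

IDEAL = 1.5  # made-up ideal bond length for this toy example


def bond_rmsd(points):
    """R.m.s. deviation of consecutive point-to-point distances from IDEAL."""
    devs = [math.dist(a, b) - IDEAL for a, b in zip(points, points[1:])]
    return math.sqrt(sum(d * d for d in devs) / len(devs))


def regularize_once(points):
    """One pass: shift both ends of each bond to take up half of its length error."""
    pts = [list(p) for p in points]
    for i in range(len(pts) - 1):
        a, b = pts[i], pts[i + 1]
        length = math.dist(a, b)
        shift = 0.5 * (length - IDEAL) / length  # fraction of the bond vector
        for k in range(3):
            delta = (b[k] - a[k]) * shift * 0.5
            a[k] += delta
            b[k] -= delta
    return pts


def regularize(points, tol=1e-4, max_runs=100):
    """Repeat passes until the r.m.s.d. stops improving by more than tol."""
    history = [bond_rmsd(points)]
    for _ in range(max_runs):
        points = regularize_once(points)
        history.append(bond_rmsd(points))
        if history[-2] - history[-1] < tol:
            break
    return points, history


distorted = [(0.0, 0.0, 0.0), (2.1, 0.3, 0.0), (3.0, 1.2, 0.4), (5.2, 1.0, 0.1)]
fixed, history = regularize(distorted)
```

Fixing one bond disturbs its neighbors, so a single pass never finishes the job; the list in history shows the deviation shrinking pass by pass, much as the numbers reported by successive refi_zone runs do.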

Here we apply refi_zone to the residues of our peptide affected by the changes we've made:

O > refi_zo ala 1 15

Refi > ALA 1 15 ALALL

Refi > Refining zone 1 to 15 in molecule ALA , object ALALL

Refi > 563 lines read from dictionary

Refi > R.m.s.d. in bond lengths, angles, fixed diherals

Refi > 0.06 3.56 2.85

Refi > Accept new coordinates? Hit *Yes/*No

O > yes

O > refi_zone ala 1 15

Refi > ALA 1 15 ALALL

Refi > Refining zone 1 to 15 in molecule ALA , object ALALL

Refi > R.m.s.d. in bond lengths, angles, fixed diherals

Refi > 0.04 3.04 2.86

Refi > Accept new coordinates? Hit *Yes/*No

O > yes

O > obj alall zone ; end

7.3) Building a hypothetical structure

At some future time, you may wish to create a peptide of a certain sequence (or even a whole protein) and give it a three-dimensional model structure. The following sequence of commands will permit you to generate such a structure from within O.

7.3a) Using Sam_Init_DB to create space for a de novo structure. In order to do this in O, you first have to prepare space in the database for the coordinates of your new structure. This is done with the Sam_init_DB command. To use Sam_init_DB, you must first make a file containing the list of residues you want in your peptide. The best way to do this is to use the editor vi. (A summary card of common vi commands is in appendix A.) The first line of this file must consist of the word <molecule name>_RESIDUE_TYPE, all in capitals, followed by five spaces, then a capital "C", another five spaces, the total number of residues listed, a space, and the string "(1x,5a)." (This cryptic line tells O the format of each entry, and how many entries there are in the file.) The residues must be listed by their three-letter codes in the order you want them in the peptide, separated by blank spaces, with no more than five residues on each line. For the purpose of this example you have been given a file called 170udp.seq. It contains the sequence for UDP-galactose isomerase, in the proper format to be used by sam_init_DB. You can look at it using more (or change it using the vi or zip editors):

citpig % more 170udp.seq

UDPA_RESIDUE_TYPE C 338 (1x,5a)

MET ARG VAL LEU VAL

THR GLY GLY SER GLY

TYR ILE GLY SER HIS

THR CYS VAL GLN LEU

LEU GLN ASN GLY HIS

ASP VAL ILE ILE LEU

ASP ASN LEU CYS ASN

SER LYS ARG SER VAL

LEU PRO VAL ILE GLU

ARG LEU GLY GLY LYS

HIS PRO THR PHE VAL

GLU GLY ASP ILE ARG

ASN GLU ALA LEU MET

THR GLU ILE LEU HIS

ASP HIS ALA ILE ASP

THR VAL ILE HIS PHE

ALA GLY LEU LYS ALA

VAL GLY GLU SER VAL

GLN LYS PRO LEU GLU

. . .
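If you would rather generate a sequence file than type it in vi, the format described above is simple enough to script. The Python sketch below builds the header line and the five-per-line residue list from a list of three-letter codes. The header spacing follows the description in the text; the spacing of the residue lines (single blanks, no leading blank) is an assumption, so compare the result with 170udp.seq before feeding it to read_formatted. The function name is hypothetical.

```python
def format_sequence_block(molecule, residues, per_line=5):
    """Build the text of a <molecule>_RESIDUE_TYPE file as described above."""
    # Header layout follows the prose description: name, five spaces, "C",
    # five spaces, residue count, one space, and the format string.
    header = "{}_RESIDUE_TYPE{}C{}{} (1x,5a)".format(
        molecule.upper(), " " * 5, " " * 5, len(residues)
    )
    lines = [header]
    for first in range(0, len(residues), per_line):
        chunk = residues[first:first + per_line]
        # ASSUMPTION: single spaces between three-letter codes; check the
        # column layout against 170udp.seq before using this with O.
        lines.append(" ".join(code.upper() for code in chunk))
    return "\n".join(lines) + "\n"


# First ten residues of the example sequence shown above.
text = format_sequence_block(
    "udpa",
    ["MET", "ARG", "VAL", "LEU", "VAL", "THR", "GLY", "GLY", "SER", "GLY"],
)
```

The string in text can then be written to a file with the usual Python file commands and inspected with more.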

Now that you have your sequence file, start O and enter the command read_formatted. This command is used to read a formatted text file into O, where it will become a datablock in the database. (The companion command, write_formatted, can be used to write a datablock to a text file. Just supply the name of the datablock and file name, and accept the default format.) When asked for the name of the file to read, type the name of the UDP sequence file.

O > read_formatted

Heap> Filename of data block to be read: 170udp.seq

This just generates a datablock in O named UDPA_residue_type. Remember, each molecule in O needs a database entry like this -- the molecule you're going to create will be named UDPA. List the database entry with write_formatted:

O > write_formatted

Heap> Name of data block(s) to be written out: UDPA_residue_type

Heap> Name of file to be created: <Enter>

Heap> Format: <Enter>

UDPA_RESIDUE_TYPE C 338 (1x,5a)

MET ARG VAL LEU VAL

THR GLY GLY SER GLY

TYR ILE GLY SER HIS

THR CYS VAL GLN LEU

LEU GLN ASN GLY HIS

(&c.)

Once you're satisfied the entry is present, execute sam_init_DB. This command takes a molecule name (one for which the entry <molecule name>_residue_type exists in the database) and makes room in the database for all the information and entries associated with a molecular structure. You must use sam_init_DB before trying to construct any sort of structure from scratch, or later commands will have nowhere to store the coordinates they generate. Because the molecule has not been refined into a three-dimensional structure yet, all the coordinates will be set to 1500, the default value. Any objects you generate before assigning a reasonable number of these coordinates will be hopelessly bizarre.

O > sam_init_DB

Sam> This WILL initialise certain datablocks.

Sam> Molecule name ([] to exit): udpa

Sam> Database compressed.

Sam> Making residue names.

Sam> There are 338 residues, 2625 atoms.

7.3b) Assigning de novo coordinates. Before refi_zone or the Lego commands will work, you need to at least vaguely assign some of the coordinates of your structure. The best way to do this is to copy coordinates from some existing structure, ideally one with some homology to your protein of interest. If you need to form a structure completely from scratch (e.g. assign alpha-helical coordinates to a peptide of interest), you will have to use software other than O, so corner a TA and start asking questions. Assuming you have at least a partial structure to draw from (and it's loaded into O), you can transfer coordinates from that structure to all or part of your new molecule with the merge_atoms command.

For our example, we will use the C[[alpha]] coordinates for UDP as reported in 1udp.pdb. Copy the file to your directory using "get." Building a structure from a pdb file containing only carbon-[[alpha]] coordinates is perhaps not the most "real" use of these techniques. However, recognize that any hypothetical structure is just that -- hypothetical -- and only gross details have a hope of being correct. Slightly mutated structures like those of Section 7.1, on the other hand, are probably more realistic.

To use merge_atoms, you tell it the molecule you're copying from, and the zone of residues you want to copy, followed by the molecule you want to copy to, and the residue to start inserting the coordinates at. It is also possible to apply a transformation to the coordinates before they are copied, but we don't need to. Clearly, multiple merge_atoms commands would permit building a structure from coordinates of several real structures. In our case, we're using the coordinates of a single whole molecule:

O > sam_atom_in

Sam> Name of input file: 1udp.pdb

Sam> O associated molecule name: 1udp

Sam> File type is PDB

Sam> Database compressed.

Sam> Space for 137623 atoms

Sam> Space for 10000 residues

Sam> Molecule 1UDP contained 676 residues and 676 atoms

O > merge_atoms

Sam> Merge from molecule name, and zone: 1udp a1 a338

Sam> Merge to molecule name and start residue: udpa 1

Sam> Datablock containing transformation [<cr> identity]: <Enter>

Sam> 338 atoms

Sam> 338 updated.

Now you can build a ca_zone object from the udpa molecule and it will actually appear normal. Any other object will look really funny, as side chains and backbone atoms are still lacking coordinates. O has a very nice facility for automatically building the main chain from short fragments of its high-resolution structure database. The command is lego_auto_mc. You merely need to specify the molecule and zone to operate on (C[[alpha]] coordinates must exist), and it will assign the rest of the backbone coordinates. It's kind of neat to watch, too.

O > lego_auto_mc udpa 1 338

Lego> UDPA 1 338 YASSPA

Lego> reading db

Lego> The DB is now being loaded.

Lego> Loading data for protein:HCAC

Lego> Loading data for protein:PA

Lego> Loading data for protein:51C_3

Lego> Loading data for protein:ACT_2

Lego> Loading data for protein:APP_2

Lego> Loading data for protein:AZA_1

Lego> Loading data for protein:B5C_2

(&c.)

Following lego_auto_mc, lego_auto_sc will assign coordinates for each side chain. The inputs are the same as lego_auto_mc, and the resulting coordinates are those of the most common rotamer for each residue.

O > lego_auto_sc udpa 1 338

Lego> UDPA 1 338 YASSPA

Lego> SCGLY is missing.

Lego> Unable to draw the rotamers.

Lego> SCGLY is missing.

Lego> Unable to draw the rotamers.

Lego> SCGLY is missing.

Lego> Unable to draw the rotamers.

Lego> SCGLY is missing.

Lego> Unable to draw the rotamers.

Lego> SCGLY is missing.

Lego> Unable to draw the rotamers.

Lego> SCGLY is missing.

(&c.)

(The message "SCGLY is missing./Unable to draw the rotamers." is reported when lego_auto_sc encounters a residue that doesn't have any real rotamers, like glycine.) No bump checking is done to make sure that the assigned rotamers don't overlap with other atoms, so after the coordinates are assigned you should build an object containing all the atoms of the new structure and thoroughly examine each residue for overlaps or obvious errors. You can fix any you find with lego_side_chain or tor_residue. As always, you should probably regularize the structure with refi_zone before you consider yourself finished.

This procedure should produce a hypothetical structure of reasonable, but not high-resolution, accuracy. Be aware that there is a bug in this procedure -- it does not properly assign coordinates for the terminal residues of the protein. The chances of such coordinates being correct in a de novo structure are low, but if you need them for some reason, ask a TA about how to assign terminal coordinates correctly.

CHAPTER SUMMARY

Mutating A Residue:

O > mutate_replace

Mut> Mutate a molecule by replacing one residue type

Mut> by another.

Mut> Molecule ([ALA ]) : <Enter>

Mut> Residue name and new type (<cr> to end) : 10 val

Mut> Residue name and new type (<cr> to end) : <Enter>

Mut> There are 1 mutations

Mut> The Rotamer_DB is now being loaded.

O > yes

Inserting A Residue:

O > mutate_insert

Mut> Mutate a molecule by inserting residues.

Mut> Molecule ([ALA ]) : <Enter>

Mut> After which residue: 1

Mut> New residue name and type (<cr> to end) : 1a tyr

Mut> New residue name and type (<cr> to end) : <Enter>

Mut> There are 1 mutations

Deleting A Residue:

O > mutate_delete

Mut> Mutate a molecule by deleting residues

Mut> Molecule ([ALA ]) : <Enter>

Mut> Residue name (<cr> to end) : 2

Mut> Residue name (<cr> to end) : <Enter>

Mut> There are 1 mutations

Assigning Main Chain Coordinates After Insertion/Deletion:

O > select_off {molecule name} ;

O > select_on {molecule name} {zone including mutation}

O > lego_loop {molecule name} {smaller zone including mutation}

O > yes

Assigning Side Chain Coordinates From the Rotamer Database:

O > lego_side_chain {molecule name} {residue number}

O > yes (after dialing through rotamers to find the best one)

Regularizing A Region Of Structure:

O > refi_zone {molecule name} {zone}

O > yes

Reading A Formatted Sequence Into O:

O > read_formatted {name of sequence file}

For an example of sequence file formatting, more the file 170udp.seq.

Making Room For A Molecule In The Database:

O > sam_init_DB

Sam> This WILL initialise certain datablocks.

Sam> Molecule name ([] to exit): udpa

Sam> Database compressed.

Sam> Making residue names.

Sam> There are 338 residues, 2625 atoms.

Using One Molecule To Assign Coordinates To Another:

O > merge_atoms

Sam> Merge from molecule name, and zone: 1udp a1 a338

Sam> Merge to molecule name and start residue: udpa 1

Sam> Datablock containing transformation [<cr> identity]: <Enter>

Sam> 338 atoms

Sam> 338 updated.

Automatically Building Main and Side Chain Coordinates From C[[alpha]]s:

O > lego_auto_mc udpa 1 338

Lego> UDPA 1 338 YASSPA

Lego> reading db

Lego> The DB is now being loaded.

Lego> Loading data for protein:HCAC

Lego> Loading data for protein:PA

Lego> Loading data for protein:51C_3

(&c.)

O > lego_auto_sc udpa 1 338

Lego> UDPA 1 338 YASSPA

Lego> SCGLY is missing.

Lego> Unable to draw the rotamers.

Lego> SCGLY is missing.

Lego> Unable to draw the rotamers.

Lego> SCGLY is missing.

(&c.)

Chapter 8

DISPLAYING DNA

O also has the ability to display and manipulate DNA files, which are stored in the same manner as the protein files. In fact, these may be treated almost exactly like a protein file, with the bases being equivalent to amino acids. To display nucleic acids it is first necessary to change the way the linkage is set between residues. O stores all this connectivity information in the File_display_connectivity entry of its database. Normally, you are asked for this file when starting O the long way. The standard connectivity file is named all.dat, and only contains connectivity information for amino acids. Any polynucleotide structures loaded into O will show up as unconnected monomers. You can create a connectivity file specifying precisely how you want everything displayed.

O also comes with two connectivity files specifying correct display of certain nucleic acids: dna.dat, for DNA, and trna.dat, for tRNA. Just as all.dat leaves nucleic acids unconnected, using either of these connectivity files will result in proteins being displayed strangely, as neither contains information about peptide bonds. Fortunately for those who want to look at mixed structures, like protein-DNA complexes, these three files have been concatenated together to form a single file specifying correct connectivity for protein, DNA, and RNA. This file is named all_na.dat, and you should be sure to use this connectivity file when looking at any mixed structure. Aside from changing the linkage, all of the usual O commands apply in the same manner, except for creating a "ca" object. Creating a "ca" object of a nucleic acid will not work, as O won't find any [[alpha]]-carbons in the specified residues. It is possible to display a nucleic acid backbone -- refer to O for Morons, Section 10.3.9 for details.

To change the connectivity file used by O:

O> connect_file

Mol> Connectivity file? [/usr/local/src/Ono/data/all.dat]: {your filename here}

Mol> Maximum inter-residue link distance = 2.00

Mol> There were 23 residues.

Mol> 175 atoms.

Some connectivity files to use:

data/all.dat: Peptide connectivity only.

data/trna.dat: G, A, C, U, & modified nt connectivity only

data/dna.dat: G, A, C, T only

local/all_dna.dat: Peptide, G, A, C, & T connectivity only

local/all_na.dat: Peptide, DNA, and tRNA connectivity.

Find a molecule of DNA to look at, and examine it using different connectivity files. (Hint: search the Brookhaven data base using finp "DNA".) The following example shows the glucocorticoid receptor bound to DNA, with traces of the protein monomers in blue and green and the DNA strands drawn in yellow and orange (with proper connectivity). The helix of each monomer that lies in the major groove of the DNA is also drawn as a zone object.

covalent % get 1glu

covalent % ono 170.o

O > Use of this program implies acceptance of conditions

O > described in Appendix 1 of the O manual

O > O version 5.9.1 , Tue Aug 10 18:33:27 MET DST

O > Loading 170.o

O > Maximum inter-residue link distance = 2.00

O > There were 23 residues.

O > 175 atoms.

O > Do you want to use the display? [Yes]: yes

O > Graphics board GL4DXG-4.0.

O > Making visibility data structures.

O > Making visibility data structures.

O > O > Trackball on (F7KEY)

O > connect_file

O > Current molecule has not been loaded.

Mol> Connectivity file? [data/all.dat]: local/all_dna.dat

Mol> Maximum inter-residue link distance = 2.00

Mol> There were 27 residues.

Mol> 256 atoms.

O > sam_atom_in

Sam> Name of input file: 1glu.pdb

Sam> O associated molecule name: 1glu

Sam> File type is PDB

Sam> Database compressed.

Sam> Space for 142658 atoms

Sam> Space for 10000 residues

Sam> Molecule 1GLU contained 245 residues and 2071 atoms

O > mol 1glu

O > Current molecule has not been loaded.

O > sam_list_sequence

Sam> Molecule name [1GLU ]:

Sam> Name Type From To Centre Radius

Sam> A434 MET 1 8 -16.62 25.89 39.00 3.86

Sam> A435 LYS 9 17 -13.32 25.19 36.06 3.94

Sam> A436 PRO 18 24 -15.23 21.52 36.17 2.52

Sam> A437 ALA 25 29 -16.54 22.28 32.78 2.09

Sam> A438 ARG 30 40 -20.35 18.72 29.90 4.71

. . .

O > obj prot

O > ca_zone a434 b514

O > end

O > centre_zon a434 b514

As4> No object defined.

As4> 1GLU A434 B514 PROT

As4> Centering on zone from A434 to B514

O > obj dna

O > zone C-10 D9

O > end

O > paint_colour green

O > paint_object prot

Paint> PROT

O > paint_colour blue

O > pai_obj_zo 1glu a434 a514 prot

Paint> 1GLU A434 A514 PROT

O > paint_colour yellow

O > paint_object dna

Paint> DNA

O > paint_colour orange

O > pai_obj_zo 1glu c-10 c9 dna

Paint> 1GLU C-10 C9 DNA

O > obj hlxs

O > zone a457 a470

O > zone b457 b470

O > end

Chapter 9

SAVING PICTURES AS "SNAPSHOTS"

9.1) Using the snapshot program

Once you have created a picture that illustrates a point you would like to make, you would probably like to be able to save it so that you can look at it again. A program called SNAPSHOT allows you to save whatever onscreen image is enclosed in a user-chosen box so that it can be displayed again on the screen. This program only saves the image that is on the screen -- if you want to save the molecule and objects used to build the image, save the O database with save_DB instead. We ask that you learn to use this program so that you can save images for your projects in your directories as named files. We will then look at them while reading your papers.

First create a picture on the screen (however you want to, using objects or sketch objects or some combination thereof). Color it however you want to illustrate the point you are making. The picture will be saved exactly as it appears on the screen. Here is how you save that picture.

First, clear up other windows. To avoid getting them in your picture, make sure they don't overlap in front of the graphics window. (You can just lower each window behind the graphics window with the Alt+F3 key combination described earlier.) Remember to use the mouse to position the red arrow back into either O window after you are done.

You're now ready to start the SNAPSHOT program.

You can start the program from within O by typing a dollar sign followed by the word snapshot:

O> $ snapshot

Of course, this can be typed in either O window. (You can also start the program from within the window manager tools menu).

A small red rectangle will immediately appear. Without touching any of the mouse buttons, use the mouse to move the rectangle to the lower left corner of the graphics window.

Press the right mouse button. A rectangular box with the word "snapshot" will appear along with a small red icon that looks like a camera. Position the icon anywhere within the rectangular snapshot box, and use the left mouse button to create a box around the picture. (Hold the button down, release it when you're finished to create a box of the desired size. It should frame your picture.)

Move the mouse back to the snapshot box. Press the right mouse button to create a pull-down menu. Pick the option "Save and exit" and release the mouse button. Wait a little while. When the computer is done saving the image, the red box will disappear. The image will be saved as a file called snap.rgb.

Put the red arrow back in your O text window. You can check to see that the file was created by looking at the files in your directory, which can be done from within the O program by typing the usual UNIX command preceded by a dollar sign as shown below:

O> $ ls

You now want to rename that file to something that makes sense to you. Every time you make a snapshot image, it is named snap.rgb and will overwrite any old file with the same name, so you MUST rename your file if you want to save another image later on. To rename it, use the UNIX command "mv" preceded by a dollar sign. For example:

O> $ mv snap.rgb snap1.rgb

The file has now been renamed snap1.rgb. Alternatively, you can use the "new file name" option in the snapshot pull-down menu to choose a new file name for each image you save before selecting the "Save and exit" option.
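If you collect many images, the renaming step can be scripted. The following is only a sketch, not part of the course setup: keep_snapshot is a hypothetical helper name, and it is written for the Bourne-style shell (sh), not the csh you type at the prompt, so it would have to live in a script file run with sh. It moves snap.rgb to the first unused name of the form snapN.rgb, so an earlier image is never overwritten.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not a course-provided command):
# move snap.rgb to the first unused name snap1.rgb, snap2.rgb, ...
keep_snapshot() {
    if [ ! -f snap.rgb ]; then
        echo "no snap.rgb in this directory" >&2
        return 1
    fi
    n=1
    # step past every number that is already taken
    while [ -f "snap$n.rgb" ]; do
        n=$((n + 1))
    done
    mv snap.rgb "snap$n.rgb" && echo "saved as snap$n.rgb"
}
```

Calling keep_snapshot after each "Save and exit" has the same effect as the mv command above, except that you do not have to remember which numbers you have already used.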

We expect you to create a series of these images which should be referred to in the paper describing your project. (Alternatively, for those with a 35 mm camera, you can take pictures of the screen, and include them in your paper. This gets a bit expensive for you, so we devised this method to allow you to save images as "figures" on the computer.)

Here's how to look at an image you created: First exit from O. At the UNIX prompt, type

citpig% ipaste snap1.rgb

(or whatever you named the image). (Just to make sure there is no confusion: the command is "ipaste" using the small letter i.) A red box will appear which can be positioned anywhere. When you click the right mouse button, the picture will be drawn.

To remove the picture and return to the UNIX prompt, use the right mouse button in the title bar to pull down a menu. Choose the option quit snap.rgb.

9.2) Plotting your image from O

There is a color HP LaserJet 550c in 158 Braun, next to Howie. O contains its own screen-capture and plotting routines that permit you to generate color hardcopy of your display. However, the technique is a bit unwieldy and the output quality is limited, so expect to spend some time adjusting things before you produce an acceptable image. Be warned that some sketch objects and cpk (solid sphere) objects are not plotted, and that Van der Waals or Connolly surfaces usually come out as a solid blur.

Plotting directly from O involves using the Plot commands to generate an output file. This file then has to be converted to PostScript in order to be read by the plotter. There are two ways of generating this output file. The first, quick-and-dirty way is to just use the plot command. Plot dumps the current image displayed in the O graphics window into a file named "plot.o". Like snapshot, plot always uses the same file name, so rename plot.o before issuing a second plot command. Also, plot is limited to working in black and white.

To produce a color output file, you need to use the slightly more complex sequence of commands plot_setup, plot_on, and plot_off. Unlike plot, which just dumps the current display, you need to start up plotting with plot_setup and plot_on before any of the objects you want to display are visible. This trio of commands works by adding objects to the output file as they are drawn in the graphics window.

First, you need to execute plot_setup. You will first be asked for a "root name" for all your plot files. Each output file is named by inserting a number in place of the percent signs you put in the root name. For example, if I were plotting images of the molecule LPR, I might choose the root name LPR_%%%.o. My output files would therefore be LPR_001.o, LPR_002.o, etc., until I changed the root name with another plot_setup command. You will also be prompted for the first number to insert in the root name -- O will increment this number by 1 for each subsequent plot. Finally, you will be asked if you want to include the menu in the plot (usually no) and whether to plot in stereo or not (the eventual output will be a stereo pair instead of a single image, but it will be too small to look good). Here's a sample plot_setup:

O > O > plot_setup

Plot> Plot file template name? [o_%%%.plt]: 1sgt_%%%.plt

Plot> Number to replace % chars in template

Plot> Starting at [ 1] : 1

Plot> Name of next plot: 1sgt_001.plt

Plot> Plot the menu, too? [No]: <Enter>

Plot> Stereo plot? [No]: <Enter>
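To predict the file names a given template will produce, the naming rule can be mimicked outside O. This sketch is an illustration only: plot_name is a hypothetical function, written for a Bourne-style shell (sh), and the zero-padding rule is inferred from the examples in this section (1sgt_%%%.plt becoming 1sgt_001.plt), not taken from O itself. It assumes the percent signs form one unbroken run.

```shell
#!/bin/sh
# Illustration only (assumed rule, not O's own code): replace the run of
# % characters in a plot_setup template with a zero-padded plot number.
plot_name() {
    template=$1
    number=$2
    # keep only the % characters so their count gives the field width
    marks=$(printf '%s' "$template" | tr -cd '%')
    if [ -z "$marks" ]; then
        # no % in the template: nothing to substitute
        printf '%s\n' "$template"
        return 0
    fi
    padded=$(printf "%0${#marks}d" "$number")
    # swap the first run of % characters for the padded number
    printf '%s\n' "$template" | sed "s/%%*/$padded/"
}
```

For example, plot_name '1sgt_%%%.plt' 1 gives the name shown in the sample above, and a starting number of 2 would give 1sgt_002.plt.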

Once everything is set up, plotting can be turned on. Use plot_on to start copying all subsequent graphics commands to the output file. The dials are not active when plot_on is running, so make sure you have the view set perfectly before you execute plot_on. To get a reasonable resolution, you will want the desired image to take up as much as possible of the graphics window, so set the view with this in mind. After executing plot_on, issue commands such that the objects you want in the output file are displayed on the screen. Having the object displayed before starting the plot will not make it show up: you have to build every object you want to show up in the plot after you execute plot_on. You terminate the plot by issuing the plot_off command. After the plot_off, graphics commands are no longer copied to the output file. A subsequent plot_on and plot_off will produce another output file, but the number in its name will be incremented by one.

The best way to work with these two commands is to generate an object for the molecule you'll want to display, then use that object and the dials to set the view properly. Turn all the current objects off via the menu (to unclutter the display), plot_on, and build every object you want in the plot. Finish with plot_off. For example:

O > plot_on

Plot> Opening plotfile 1sgt_001.plt

O > plot_off

Plot> closing plotfile

O >

In general, the plot will be much smaller than the on-screen image. Make your image as large as possible in order to get anything useful. Once you have created all the plot files you want, quit O. You need to use the UNIX command o_convert to change the O output files to PostScript files. Typing "o_convert <filename> > <filename>.ps" will produce a PostScript version of <filename>, named <filename>.ps. You can examine your PostScript files by using the program xpsview. Type xpsview at a UNIX prompt, place the window in a convenient place on the screen, and choose the file to display by selecting "Choose File" from the "File" menu with the left mouse button. If the image looks acceptable, quit xpsview ("Quit" in the "File" menu) and print it with the command

citpig% lp < <filename>.ps

It may take a minute or so before the plotter starts. Plotting is not cheap, so please don't print images without previewing them with xpsview first.
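If you have produced several plot files, the conversion step can be repeated in a loop. This is a sketch under assumptions: convert_plots is a hypothetical function for a Bourne-style shell (sh), and the O_CONVERT variable is an invention of this example that lets the converter be swapped out. By default it runs o_convert exactly as in the single-file command above, writing <filename>.ps next to each input file.

```shell
#!/bin/sh
# Sketch (hypothetical helper): convert each named O plot file to
# PostScript, producing <filename>.ps for every <filename> given.
convert_plots() {
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "skipping $f: no such file" >&2
            continue
        fi
        # O_CONVERT is an assumption of this sketch; it defaults to o_convert
        if ! ${O_CONVERT:-o_convert} "$f" > "$f.ps"; then
            echo "conversion failed for $f" >&2
            rm -f "$f.ps"   # do not leave a partial PostScript file behind
        fi
    done
}
```

A call such as convert_plots 1sgt_*.plt would convert every plot in the series. Each resulting .ps file should still be previewed with xpsview before it is sent to the plotter.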

9.3) Taking photos of the screen

Should you decide you want high-quality hard copies of your images, you can produce slides and photos using the slide-making equipment in 158 Braun. You will need to collect your images as .rgb files with snapshot, convert them to a format readable by the Macintosh that controls the slide maker, transfer the converted files to the Mac, and finally set up and run the slide-making software. This is a complex and convoluted process. If you already know how to use the slide maker, David Mathog can help with the file conversion and transfer. If you have no idea how the slide maker works, talk to a TA before you do anything.

Appendix A

EXAMPLE

We will now show you a display that can be made with O.

Example: Highlight the combining site of an Fab. The V regions will be shown as the carbon-[[alpha]] backbone, and residues in the three hypervariable regions (CDRs) will be shown as highlighted side chains. For best results, change the colors of the objects so that each CDR is a different color.

goose % ono 170.o

O > Use of this program implies acceptance of conditions

O > described in Appendix 1 of the O manual

O > O version 5.9.1 , Tue Aug 10 18:33:27 MET DST

O > Loading 170.o

O > Maximum inter-residue link distance = 2.00

O > There were 23 residues.

O > 175 atoms.

O > Do you want to use the display? [Yes]:

O > Graphics board GL4DXG-4.0.

O > Making visibility data structures.

O > Making visibility data structures.

O > O > Trackball on (F7KEY)

O > s_a_i 2fbj.pdb 2fbj

Sam> File type is PDB

Sam> Database compressed.

Sam> Space for 142658 atoms

Sam> Space for 10000 residues

Sam> Molecule 2FBJ contained 814 residues and 3761 atoms

O > mol 2fbj

O > Current molecule has not been loaded.

O > paint_ramp ;;;;

O > obj light ca l1 l107 end

O > obj heavy ca h1 h115 end

O > ce_zo l1 l107

As4> No object defined.

As4> 2FBJ L1 L107 HEAVY

As4> Centering on zone from L1 to L107

O > paint_colour magenta

O > obj fbj.2

O > zone l26 l30 zone h27 h32 end

O > paint_object fbj.2

Paint> FBJ.2

O > obj fbj.3 zone l49 l55 zone h53 h56 end

O > paint_colour cyan paint_object fbj.3

Paint> FBJ.3

O > @omac/cnos_colours.omac

O > Macro in computer file-system.

O > ce_at

As3> Define molecule [2FBJ], residue, and atom [CA]: 2fbj h104 cb

The following is an example of the mutations and manipulations that can be carried out with O. One residue in the CDRs of 2fbj will be deleted, and one mutated. The structure will be regularized and compared against the original structure.

goose % ono 170.o

O > Use of this program implies acceptance of conditions

O > described in Appendix 1 of the O manual

O > O version 5.9.1 , Tue Aug 10 18:33:27 MET DST

O > Loading 170.o

O > Maximum inter-residue link distance = 2.00

O > There were 23 residues.

O > 175 atoms.

O > Do you want to use the display? [Yes]:

O > Graphics board GL4DXG-4.0.

O > Making visibility data structures.

O > Making visibility data structures.

O > O > Trackball on (F7KEY)

O > s_a_i 2fbj.pdb 2fbj

Sam> File type is PDB

Sam> Database compressed.

Sam> Space for 142658 atoms

Sam> Space for 10000 residues

Sam> Molecule 2FBJ contained 814 residues and 3761 atoms

O > mol 2fbj

O > Current molecule has not been loaded.

O > select_off ;;

O > select_on

Sel> What molecule [2FBJ ]:

Sel> Residue range [all molecule]: l90 l95

O > select_on

Sel> What molecule [2FBJ ]:

Sel> Residue range [all molecule]: h100 h105

O > s_a_o

Sam> Output file name: cdr3s.pdb

Sam> Coordinate file type assumed from file name is PDB

Sam> What molecule [2FBJ ]:

Sam> Residue range [all molecule]:

Sam> Define cell constants [ 54.02 74.29 131.35 90.00 90.00 90.00]:

Sam> Write out only selected atoms? [No]: yes

Sam> Use the B-factor? [Yes]:

Sam> Use the occupancy? [Yes]:

Sam> 114 atoms written out.

O > s_a_i cdr3s.pdb

Sam> O associated molecule name: cdr

Sam> File type is PDB

Sam> Nothing marked for deletion, so no compression.

Sam> Space for 137660 atoms

Sam> Space for 10000 residues

Sam> Molecule CDR contained 12 residues and 114 atoms

O > mol cdr obj all zone ; end

O > ce_zo l90 l95

As4> No object defined.

As4> CDR L90 L95 ALL

As4> Centering on zone from L90 to L95

O > mutate_delete

Mut> Mutate a molecule by deleting residues

Mut> Molecule ([CDR ]) :

Mut> Residue name (<cr> to end) : h102

Mut> Residue name (<cr> to end) :

Mut> There are 1 mutations

O > select_off ;;

O > select_on ; h100 h105

O > O > lego_loop cdr h100 h105

Lego> CDR H100 H105 ALL

Lego> Used.

Lego> Used.

Lego> Used.

Lego> Used.

Lego> Used.

Lego> Number of selected atoms in zone is 5

Lego> The DB is now being loaded.

Lego> Loading data for protein:HCAC

Lego> Loading data for protein:PA

Lego> Loading data for protein:51C_3

Lego> Loading data for protein:ACT_2

Lego> Loading data for protein:APP_2

Lego> Loading data for protein:AZA_1

Lego> Loading data for protein:B5C_2

Lego> Loading data for protein:BP2_1

Lego> Loading data for protein:C2C_3

Lego> Loading data for protein:CPA_5

Lego> Loading data for protein:CPV_1

Lego> Loading data for protein:CRN_1

Lego> Loading data for protein:CYT_4

Lego> Loading data for protein:DFR_3

Lego> Loading data for protein:ECD_1

Lego> Loading data for protein:FB4_1

Lego> Loading data for protein:FDX_1

Lego> Loading data for protein:FXN_3

Lego> Loading data for protein:HIP_1

Lego> Loading data for protein:INS_1

Lego> Loading data for protein:LH1_1

Lego> Loading data for protein:MBO_1

Lego> Loading data for protein:NXB_1

Lego> Loading data for protein:OVO_1

Lego> Loading data for protein:PCY_1

Lego> Loading data for protein:PPT_1

Lego> Loading data for protein:PTI_4

Lego> Loading data for protein:PTN_2

Lego> Loading data for protein:REI_1

Lego> Loading data for protein:RHD_1

Lego> Loading data for protein:SGA_2

Lego> Loading data for protein:SN3_1

Lego> Loading data for protein:SNS_2

Lego> Loading data for protein:TLN_3

Lego> DGNL> Top matches

Lego> Protein Start Res. Score Sequence

Lego> APP_2 23 0.521 IGGTT

Lego> APP_2 260 0.536 ISGYT

Lego> FB4_1 201 0.557 HEGST

Lego> PTN_2 183 0.558 CSGKL

Lego> TLN_3 250 0.561 HYGVS

Lego> ACT_2 172 0.574 EGGVD

Lego> SGA_2 22 0.576 VNGVA

Lego> B5C_2 23 0.581 LHYKV

Lego> ACT_2 190 0.584 EEGYM

Lego> BP2_1 78 0.588 SNNEI

Lego> APP_2 92 0.604 VGGVT

Lego> TLN_3 44 0.613 AKYRT

Lego> SNS_2 94 0.617 ADGKM

Lego> SN3_1 42 0.652 YAFAC

Lego> APP_2 158 0.681 KHQQP

Lego> AZA_1 70 0.689 QDYVK

Lego> HCAC 106 0.710 VDKKK

Lego> DFR_3 15 0.719 KDGHL

Lego> SNS_2 27 0.727 YKGQP

Lego> FXN_3 57 0.728 GDEVL

O > yes (after dialing through the possible structures -- or just choose the best match)

O > refi_zone h100 h105

Refi > No object defined.

Refi > CDR H100 H105 ALL

Refi > Refining zone H100 to H105 in molecule CDR , object ALL

Refi > R.m.s.d. in bond lengths, angles, fixed diherals

Refi > 0.04 2.81 3.91

Refi > Accept new coordinates? Hit *Yes/*No

O > yes

O > obj all zone ; end

O > mutate_replace

Mut> Mutate a molecule by replacing one residue type

Mut> by another.

Mut> Molecule ([CDR ]) :

Mut> Residue name and new type (<cr> to end) : l90 lys

Mut> Residue name and new type (<cr> to end) :

Mut> There are 1 mutations

Mut> The Rotamer_DB is now being loaded.

O > lego_si_ch

O > yes (after dialing through the rotamers)

O > obj all zone ; end

O > refi_zone cdr l90 l95

Refi > CDR L90 L95 ALL

Refi > Refining zone L90 to L95 in molecule CDR , object ALL

++++++++++

++++++++++

++++++++++

++++++++++

Refi > R.m.s.d. in bond lengths, angles, fixed diherals

Refi > 0.04 4.86 20.13

Refi > Accept new coordinates? Hit *Yes/*No

O > yes

O > s_a_i cdr3s.pdb cdro

Sam> File type is PDB

Sam> Nothing marked for deletion, so no compression.

Sam> Space for 94157 atoms

Sam> Space for 10000 residues

Sam> Molecule CDRO contained 12 residues and 114 atoms

O > mol cdro

O > Current molecule CDR3S has not been loaded.

O > obj allo zone ; end

O > paint_colour cyan paint_object allo

Paint> ALLO

Appendix B

DESCRIPTION OF SOME OF THE FILES IN YOUR DIRECTORY

The following files in your directory should not be altered or deleted.

.alias This file contains alias commands such as the commands to search and retrieve files from the Brookhaven Data Base. Do not delete any existing commands because the O manual assumes that they are defined in your file, and parts of the manual will not work as described if these commands are deleted.

.login This is a shell script that is executed every time you log in and contains information about what prompt symbol is used, paths, etc.

.cshrc This file is also executed when you log in. Your version of .cshrc simply activates your .alias file so that you can use your alias commands.

You should only touch any of the above three files if you really know what you're doing. If you happen to clobber, abuse, mangle or destroy any of the above files, you can copy new versions to your directory from the student directory located in /ulhhmi/student using the cp command (e.g., if you want to copy the .alias file from the student directory to your home directory, simply type "cp /ulhhmi/student/.alias ~"). Remember that in UNIX, this command will overwrite the existing .alias file!

The following files are copied to your directory by the "setup O_files" command.

170.o This is the standard ("blank") saved database used to start O.

170.omac This is a macro that sets up most of the major parameters (dictionary file locations, the menu) within O.

170menu.o This is an O database file that defines the standard menu. It is read into O by the 170.omac macro.

170_example1.o This is a saved O database containing the polyalanine/polyleucine peptide used in Chapter 2 of this primer.

If you happen to fold, spindle or mutilate these files, they can easily be restored by the "setup O_files" command.

Appendix C

SOME USEFUL UNIX COMMANDS

command               description

ls                    lists the files in the current directory
cd xyz                change directory to the xyz directory (relative path name)
cd ..                 change directory to the parent of the current directory
cd /usr/bin           change directory to /usr/bin (full path name)
more f1               shows one screen full of f1 and pauses for instructions
rm f1                 removes f1 from the current directory
cp f1 f2              makes a copy of f1 and names it f2
mv f1 f2              gives the new name f2 to f1
man c1                lists the manual page (description) of command c1
xman                  graphical tool for looking up manual pages
grep xx f1            displays all lines of the file f1 that contain the string xx
history               gives a list of the last 50 commands entered
!45                   re-executes the 45th command
!c                    re-executes the last command beginning with a c
!!                    re-executes the most recent command
vi                    archaic text editor -- help sheets available in 158 Braun
zip                   mouse-driven text editor -- help sheets available in 158 Braun
elm                   e-mail software -- do not use mail
help                  Mosaic/WWW software -- on-line access to the real O manual
left mouse button     click and drag to highlight text
middle mouse button   click to paste highlighted text at the cursor

More UNIX help is available in the back pages of O for Morons.