The purpose of this class is to provide enough training that those who complete it will be able to use and understand the tools the SAF has to offer.
This class is informal - it has no units and there will be no grades.
Homework will be assigned but not collected. Answers to each homework will be provided the following week.
Please interrupt me during the lecture if I say something that is confusing or unclear.
All class material will be provided through our Web server, located at http://seqaxp.bio.caltech.edu/. The FAQ and OVERVIEW sheets in the DOCUMENTATION sections will get you going faster than anything else - they contain the important points distilled from other manuals. You probably do not want to print these documents: they are tied together with hypertext links, and those links are lost on paper.
The last two lectures are special topics - RNA folding and Web based tools. If there are any special topics you want covered instead, please let me know.
The SAF has an assortment of computers, but the primary one for sequence analysis work is Seqaxp, a Digital 2100 4/200 server, running the OpenVMS operating system. We use this combination because it is fast, reliable and relatively easy to use.
To use SEQAXP you will need an account, which you may access via telnet by providing your Username and Password. You can submit a request for an account by filling out the form on our web site, at URL: http://seqaxp.bio.caltech.edu/www/mail_account.html
We have to charge for facility usage to recover costs. The current rate is $3.00/hour, or 5 cents/minute for connect time. There is an idle job killer that removes unused sessions - so you won't get charged a lot should you forget to logout. There is no charge for accessing our Web pages.
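The two rates quoted above are the same charge, stated once per hour and once per minute. Here is a small Python sketch of the arithmetic. It assumes billing by whole minutes (these notes do not say how partial minutes are rounded), and the function name is made up for illustration.

```python
RATE_CENTS_PER_MINUTE = 5  # from the notes: 5 cents/minute, i.e. $3.00/hour


def connect_charge_cents(minutes):
    """Return the connect-time charge in cents for a whole number of minutes."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * RATE_CENTS_PER_MINUTE


# One hour of connect time costs 300 cents, matching the $3.00/hour rate.
print(connect_charge_cents(60))
```

Working in integer cents avoids floating-point rounding on money.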
Macintosh: NCSA Telnet or BetterTelnet
Some programs make use of the keypad area on the keyboard and define a command for each key there. The editors do this, for instance, as does the MAIL utility. For these to work, your terminal emulator must be configured correctly: it should emulate a VT100 or VT200 series terminal, and should communicate this to Seqaxp when you connect. Use:
$ SHOW TERMINAL
to see what Seqaxp thinks your emulator is, and make sure it agrees with the emulator settings. The "delete previous character" symbol on OpenVMS is "del", not "backspace". Be sure your terminal emulator sends the former, or every time you try to delete a character you will instead move the cursor to the beginning of the line.
If there is one golden rule for using OpenVMS, it is "when in doubt, type HELP".
This will list the various help categories, and at the bottom, the available
help libraries. Move through this information by entering words at
the prompts, or by specifying the full path at the command line.
$ HELP HINTS

HINTS

  Type the name of one of the categories listed below to obtain a list
  of related commands and topics. To obtain detailed information on a
  topic, press the RETURN key until you reach the "Topic?" prompt and
  then type the name of the topic.

  Topics that appear in all uppercase are DCL commands.

  Additional information available:

  Batch_and_print_jobs   Command_procedures   Contacting_people
  Creating_processes     Developing_programs  Executing_programs
  Files_and_directories  Logical_names        Operators_in_expressions
  Physical_devices       Security             System_management
  Terminal_environment   User_environment

HINTS Subtopic? ^Z

$ HELP @LYNX LYNX

LYNX

  NAME
    lynx - a general purpose distributed information browser for the
    World Wide Web

  Additional information available:

  SYNOPSIS  DESCRIPTION  OPTIONS  COMMANDS  NOTES  ACKNOWLEDGMENTS  AUTHORS

@LYNX LYNX Subtopic?
The golden rule for using the SAF is "check the SAF Software Documentation web pages".
This is what an OpenVMS command looks like:
$ verb/qualifier parameter/qualifier
The VERB tends to be the English word you'd expect for a particular operation, like COPY or SEARCH. Commands are not case sensitive, that is, you can type them in upper or lower case. However, parts of parameters and qualifiers can be case sensitive. When that happens, enclose the case sensitive part in double quotes.
$ search/exact login.com "type"
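The reason for the quotes is that DCL converts unquoted text on the command line to upper case before the command sees it. The Python sketch below only models that one rule. The function name is made up, and it ignores finer points such as doubled quotes inside a quoted string.

```python
def dcl_upcase(line):
    """Upper-case everything outside double quotes; leave quoted text alone."""
    out = []
    in_quotes = False
    for ch in line:
        if ch == '"':
            in_quotes = not in_quotes  # entering or leaving a quoted string
            out.append(ch)
        elif in_quotes:
            out.append(ch)             # quoted text keeps its case
        else:
            out.append(ch.upper())     # unquoted text is upper-cased
    return "".join(out)


print(dcl_upcase('search/exact login.com "type"'))
```

In this model only the quoted search string keeps its lower case, which is what SEARCH/EXACT then compares against.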
It's a good idea to remember what the parts of the command line are called because the error messages use these terms. (These are all bad commands:)
$ foobar              bad verb
$ dir/foobar          bad qualifier
$ dir ^foobar         bad parameter
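To make the three terms concrete, here is a rough Python sketch that pulls a simple command line apart into verb, qualifiers and parameters. It is a deliberate simplification and not how DCL itself parses: it assumes tokens are separated by spaces and that every "/" starts a qualifier, and it does not handle quoted strings. The function name is invented for this example.

```python
def split_command(line):
    """Split a simple DCL-style command into verb, qualifiers and parameters."""
    tokens = line.lstrip("$ ").split()
    if not tokens:
        raise ValueError("empty command line")
    # The first token is the verb, possibly followed by /qualifiers.
    verb, *qualifiers = tokens[0].split("/")
    parameters = []
    for token in tokens[1:]:
        # Each later token is a parameter, possibly followed by /qualifiers.
        param, *param_quals = token.split("/")
        if param:
            parameters.append(param)
        qualifiers.extend(param_quals)
    return {"verb": verb, "qualifiers": qualifiers, "parameters": parameters}


print(split_command("$ dir/date *d*.*"))
print(split_command("$ reformat/infile=filename"))
```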
Here is another command example which shows the names and dates of any files having a D in the name:
$ dir/date *d*.*
CONFIDENCE.;1        22-APR-1998 09:12:10.24
DEC.KBD;3            24-AUG-1995 16:20:33.46
"*" is a wildcard that matches any string of characters; "%" matches exactly one character. Commands can be recalled and edited; use the arrow keys to do that. Use RECALL (only the first 4 letters are required) to recall commands by name, i.e.,
$ reca d              recall the most recent command beginning with D
$ reca/all            list all of the saved commands
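The wildcard rule can be modelled in a few lines of Python by translating "*" and "%" into a regular expression. This is only a sketch of the matching rule as described in these notes, not the OpenVMS implementation. The function names are invented, the version number after ";" is simply ignored, and LOGIN.COM;12 is a made-up file name added as a non-matching case.

```python
import re


def vms_wildcard_to_regex(pattern):
    """Translate "*" (any run of characters) and "%" (exactly one) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "%":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    # File names are not case sensitive, so match without regard to case.
    return re.compile("".join(parts), re.IGNORECASE)


def matches(pattern, filename):
    """Match name.type against the pattern, ignoring any ";version" suffix."""
    name_and_type = filename.split(";")[0]
    return vms_wildcard_to_regex(pattern).fullmatch(name_and_type) is not None


for name in ["CONFIDENCE.;1", "DEC.KBD;3", "LOGIN.COM;12"]:
    print(name, matches("*d*.*", name))
```

The first two names are the ones from the directory listing above and both match; the third has no D before the period and does not.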
To control your process or terminal use these control keys. In each case, hold down control (shown here as a caret) and the key. The most commonly used control keys are:
Everything defaults if not specified. We only have one node, and usually your files will all be on your login disk, so you can usually get by with one of these forms:
It is best to organize your files in directories. On OpenVMS you have a "default directory", which is "where you are":
$ show default
$ set default [.subdir]     move into the subdirectory
$ set default [-]           move up from a subdirectory
$ set default SYS$LOGIN     move to the login directory
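If the bracket notation is unfamiliar, the following Python sketch shows how the default directory changes for the relative forms above. It is a toy model under stated assumptions: it knows nothing about devices or logical names such as SYS$LOGIN, the directory name SMITH is hypothetical, and unlike the real system it simply refuses to move above the top-level directory.

```python
def set_default(current, spec):
    """Return the new default directory after a simplified SET DEFAULT.

    Handles "[.subdir]" (move down), "[-]" (move up) and an absolute
    form such as "[SMITH.DATA]".
    """
    parts = current.strip("[]").upper().split(".")
    inner = spec.strip("[]").upper()
    if inner == "-":
        if len(parts) == 1:
            raise ValueError("already at the top-level directory in this model")
        parts = parts[:-1]                       # drop the last directory level
    elif inner.startswith("."):
        parts = parts + inner[1:].split(".")     # append the subdirectory level(s)
    else:
        parts = inner.split(".")                 # absolute directory specification
    return "[" + ".".join(parts) + "]"


here = set_default("[SMITH]", "[.subdir]")
print(here)
print(set_default(here, "[-]"))
```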
$ dir/prot/owner                                    show file ownership and protection
$ set file/prot=(s:rwed,o:rwed,g:re,w) filename     set the protection on this file
Do the homework if you really want to understand these.
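As a reading aid for the protection string in the SET FILE command above, here is a Python sketch that expands it into one entry per user class. The letters are S, O, G, W for SYSTEM, OWNER, GROUP, WORLD, and R, W, E, D for read, write, execute, delete. A class named without any letters (the "w" above) gets no access. The function name is invented, and the sketch does no validation of the access letters.

```python
CLASSES = {"S": "SYSTEM", "O": "OWNER", "G": "GROUP", "W": "WORLD"}


def parse_protection(text):
    """Parse a string like "(s:rwed,o:rwed,g:re,w)" into {class: access letters}."""
    result = {}
    for item in text.strip("()").split(","):
        # "g:re" -> name "g", access "re"; a bare "w" -> name "w", access "".
        name, _, access = item.strip().partition(":")
        result[CLASSES[name.upper()[0]]] = access.upper()
    return result


print(parse_protection("(s:rwed,o:rwed,g:re,w)"))
```

So in that example the group may read and execute the file, and everyone else may do nothing with it.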
When you log in, a procedure called "LOGIN.COM" is run automatically if it is present in your home directory. You can use this to customize your environment and to define various commands and other information. This is described in some detail in the homework and the OpenVMS beginner's FAQ.
Use ASCII mode FTP for sequence files. If they originate on other systems make sure that they have been formatted as a series of short lines, less than 132 characters each. If the file contains only sequence (specifically, no comments), you can use the CHOPUP command, then the REFORMAT command to convert the result of just about any transfer mode into a valid GCG format file.
Use BINARY mode FTP transfer for a few things like ABI sequencer traces or CGM graphics files.
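If you want to check a sequence file on your own machine before an ASCII transfer, something like the Python sketch below would do it. It is not the CHOPUP program and makes no claim about how CHOPUP works. The function names and the 60-character wrap width are my own choices; only the 132-character limit comes from these notes.

```python
MAX_LINE = 132  # the notes ask for lines shorter than this


def long_lines(text, limit=MAX_LINE):
    """Return the 1-based numbers of lines that are not shorter than `limit`."""
    return [n for n, line in enumerate(text.splitlines(), start=1)
            if len(line) >= limit]


def wrap_sequence(seq, width=60):
    """Break a bare sequence string into lines of at most `width` characters."""
    return "\n".join(seq[i:i + width] for i in range(0, len(seq), width))


raw = "ACGT" * 100            # one 400-character line, too long to transfer safely
print(long_lines(raw))        # reports line 1
wrapped = wrap_sequence(raw)
print(long_lines(wrapped))    # no offending lines remain
```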
Make sure names are consistent with OpenVMS usage (not more than one period and one semicolon, best if one case). A PC or Mac FTP program, such as FETCH on the Mac, may let you enter any name, but the resulting name on the OpenVMS side will be horrific, full of dollar signs and 5n's and so forth.
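A quick way to screen names before you transfer anything is sketched below in Python. It checks the rule given above (at most one period and one semicolon) plus a conservative character set of letters, digits, underscore, hyphen and dollar sign. That character set is my assumption of what is safe, and the function name is invented. The sketch does not check case or name length.

```python
import re

# Conservative assumption: letters, digits, "_", "-" and "$" in name and type.
_SAFE = re.compile(r"[A-Za-z0-9_$-]*")


def is_vms_friendly(filename):
    """True if the name has at most one "." and one ";" and only safe characters."""
    if filename.count(".") > 1 or filename.count(";") > 1:
        return False
    body, _, version = filename.partition(";")
    name, _, ftype = body.partition(".")
    if version and not version.isdigit():
        return False                      # a version, if present, must be a number
    return bool(name) and all(_SAFE.fullmatch(p) for p in (name, ftype))


for candidate in ["myseq.txt", "my.seq.v2.txt", "my seq.txt", "DEC.KBD;3"]:
    print(candidate, is_vms_friendly(candidate))
```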
GCG stands for Genetics Computer Group, which is a small company that branched off from the University of Wisconsin, and was recently purchased by Oxford Molecular (http://www.gcg.com/). The GCG package is arguably the best of the commercial molecular biology software packages in terms of completeness, cost and support. Up through version 8.1 they also provided full source code so that local debugging and modification of their programs was possible, but at 9.0 they changed the terms, so we elected to stay at version 8.1.
EGCG is a set of programs that were written by an assortment of people, primarily in Europe. These programs are built on top of the GCG code. When GCG changed the license terms at 9.0 it made it impossible to upgrade the EGCG set, so that too is stuck at 8.1.
$ GENHELP        help on GCG programs by program name
$ GENMANUAL      help on GCG programs by topic
$ EGENHELP       help on EGCG programs by program name
$ EGENMANUAL     help on EGCG programs by topic
These are best accessed from the SAF Webserver Software Documentation page, which has links to these, as well as indices for each.
Use SETPLOT or the specific graphic command, usually one of TEKTRONIX/REGIS/POSTSCRIPT/XWINDOWS/CGM, to configure graphics BEFORE you use them. Confirm the setting with SHOWPLOT and test it with PLOTTEST.
$ setplot
+---------------------> displaying all of 12 option(s) <---------------------+
|ColorX Color X Windows Graphics Window                                      |
|Versaterm Tektronix 4105 mode on Versaterm                                  |
|Tek4014 Tektronix 4014, for NCSA Telnet                                     |
|PCSmartTermRegis VT340 mode for PC SmartTerm 340                            |
|DECTermRegisRegis VT340 mode for DECTerm                                    |
|PStoFILE Print postscript -> file                                           |
|PStoLaser Print postscript -> local laserwriter (no flag page)              |
|PStoMAIN Print postscript - > Braun 158 printer (flag page)                 |
|PS2toLaser Print postscript at 2/page -> local laserwriter(no flag page)    |
|PS2toMain Print postscript at 2/page -> Braun 158 printer (flag page)       |
|COLORPS Print color postscript -> Braun 158 (noflag page, _NOT_FREE_)       |
|CGM Print through the CGM driver to a file CGM.OUT                          |
+----------------------------------------------------------------------------+
Enter a command. Choices are:
   <up-arrow> and <down-arrow> scroll the list
   <return> makes GCG use the selected device
   Q quits without doing anything
   C creates and edits a new device (you can't delete from the site file)
   V views the selection (use C to edit a copy)
or use
$ TEKTRONIX VERSATERM-TEK4105 TT
$ showplot

Plotting Configuration set to:

     Language: TEKTRONIX
     Device: VERSATERM-TEK4105
     Port or Queue: TERM:

$ plottest

PlotTest plots a test pattern to see if your plotter is configured
properly. The test pattern uses every GCG graphics feature. It
should resemble the example test pattern in the PROGRAM MANUAL.

Process set to plot with VERSATERM-TEK4105 attached to TERM:
using the TEKTRONIX graphic interface.

When your VERSATERM-TEK4105 attached to _Seqaxp$Nty1166: is ready, press <Return>.
GCG commands look like any other OpenVMS command. Most of them have only qualifiers and no parameters, except that if there is a single input file, it can usually be passed either as a parameter or as a qualifier.
$ reformat filename
$ reformat/infile=filename
GCG and EGCG options are generally entered on the command line as qualifiers. Mandatory ones will be prompted for from within the program if they are not present on the command line. Optional ones will not be, that is, if you want to use an optional qualifier for a GCG program, it MUST go on the command line.
$ reformat                  it will prompt for a filename
The /CHECK qualifier may be used in any GCG or EGCG program to get a quick list of command line options.
Many local modifications are not documented in GENHELP etc., and are only evident if you do a /CHECK on the command!!! For instance, the /begin and /end qualifiers on REFORMAT are only present at our site.
The /DEFAULT qualifier may be used to force GCG or EGCG programs to supply default values for mandatory qualifiers. For instance, it forces /begin/end to be the start and end of the sequence.
The locations of some important files are pointed to by logical names, which are a sort of shortcut for pointing to directories. For instance, the logical names for databases are GB, PIR, SW, NRL_3D and EPD, and can be referenced like:
$ fetch gb:X02974           accession_number
$ fetch gb:dmwhite          name
Some of the matrices and other accessory data are in GENRUNDATA, GENMOREDATA, etc., all of which are subsumed under GCGDATADIRS. So to find all comparison matrices, for instance:
$ dir GCGDATADIRS:*.cmp
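In every one of these references the part before the colon is the logical name and the part after it is what you are asking for inside that location. A small Python sketch of that split follows. The function names are invented, and the set of database names is just the list given in these notes. It says nothing about where the logical names actually point on Seqaxp.

```python
DATABASES = {"GB", "PIR", "SW", "NRL_3D", "EPD"}  # logical names listed in the notes


def split_spec(spec):
    """Split "logical:name" into (LOGICAL, name); logical is None if absent."""
    logical, sep, name = spec.partition(":")
    if not sep:
        return None, spec
    return logical.upper(), name


def is_database_spec(spec):
    """True if the part before ":" is one of the database logical names."""
    return split_spec(spec)[0] in DATABASES


print(split_spec("gb:X02974"), is_database_spec("gb:X02974"))
print(split_spec("GCGDATADIRS:*.cmp"), is_database_spec("GCGDATADIRS:*.cmp"))
```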
There are a bunch of "gotchas" when using GCG, mostly having to do with syntax. Rather than repeat them here, have a look at the GCG beginner's FAQ (http://seqaxp.bio.caltech.edu/www/GCG_BEGINNERS_FAQ.HTML); see especially the section "Confusing parts of the GCG system".
nonGCG basics

The SAF also has dozens of other programs, many from Unix, each with its own type of interface. In general you have to read their documentation to use them properly. In addition, there are DCL scripts wrapped around many programs, so that you don't actually ever see the "real" program. For instance, when you run BLAST on seqaxp, the prompts you see all come from such a script.
Next week we'll cover sequence alignment.
Pico is the text editor which comes with the Pine mail program. It is very easy to use, but not particularly powerful. Start it like this:
$ pico killme.txt

  UW PICO(tm) 2.5              File: killme.txt

                             [ New file ]
^G Get Help  ^O WriteOut  ^R Read File ^Y Prev Pg   ^K Cut Text  ^C Cur Pos
^X Exit      ^J Justify   ^W Where is  ^V Next Pg   ^U UnCut Text^T To Spell
Then follow the directions on the screen to learn how to use the commands, which are mostly just control key combinations. For instance, press ^G to read the help file.
There are many editors on OpenVMS. Of these, EDT is somewhat easier to learn and use than TPU, which is actually the default editor. Here is how to use EDT:
$ edit/edt
[EOB]
Input file does not exist
The trick is to know how to get to the keypad help, which is accomplished by pushing the second key from the left on the top row of the numeric keypad. On a PC, it is labeled "/", on a Macintosh "=", and on a Digital keyboard "PF2". Doing so brings up this screen.

+-----------------------------------+   +-----------------------------------+
|   ^    |  DOWN  |        |        |   |        |        | FNDNXT | DEL L  |
|   |    |   |    | <----  | ---->  |   |  GOLD  |  HELP  |        |        |
|   |    |   |    |  LEFT  | RIGHT  |   |        |        |  FIND  | UND L  |
|   UP   |   v    |        |        |   +-----------------------------------+
+-----------------------------------+   |  PAGE  |  SECT  | APPEND | DEL W  |
DELETE    Delete character              |        |        |        |        |
LINEFEED  Delete to beginning of word   | COMMAND|  FILL  | REPLACE| UND W  |
BACKSPACE Backup to beginning of line   +-----------------------------------+
CTRL/A    Compute tab level             | ADVANCE| BACKUP |  CUT   | DEL C  |
CTRL/D    Decrease tab level            |        |        |        |        |
CTRL/E    Increase tab level            | BOTTOM |  TOP   | PASTE  | UND C  |
CTRL/K    Define key                    +-----------------------------------+
CTRL/R    Refresh screen                |  WORD  |  EOL   |  CHAR  |        |
CTRL/T    Adjust tabs                   |        |        |        | ENTER  |
CTRL/U    Delete to beginning of line   |CHNGCASE| DEL EOL| SPECINS|        |
CTRL/W    Refresh screen                +--------------------------|        |
CTRL/Z    Exit to line mode             |      LINE       | SELECT |        |
                                        |                 |        |  SUBS  |
Press a key for help on that key.       |    OPEN LINE    | RESET  |        |
To exit, press the spacebar.            +-----------------------------------+
If at this point you touch one of the keypad keys, more help will be shown. For instance, touch the key shown as DEL L/UND L to see:

DEL L - (PF4)

  Deletes text from the cursor position to the end of the current line,
  including the line terminator. If the cursor is positioned at the
  beginning of a line, the entire line is deleted. The deleted text is
  saved in the delete line buffer.

UND L - (GOLD PF4)

  Inserts the contents of the delete line buffer directly to the left
  of the cursor.

To return to the keypad diagram, press the return key
To exit from HELP, press the spacebar
For help on any other keypad key, press the key
Most keyboards currently on the market fuse the DEL W/DEL C keys, and on these keyboards, pressing that fused key usually results in a DEL C action.
To get out of EDT, press ^Z to get to line mode. There are a variety of commands available in line mode, but most of you will not use most of them. If you are interested, enter HELP to find out more. In any case, you will need to know exactly two line mode commands, which are:
EXIT     leave editor, save changes
QUIT     leave editor, do not save changes