Fundamentals of Sequence Analysis, 1998-1999

Problem set 1: Computing basics

If you get stuck, refer to the OpenVMS and GCG resources in the class home page.

Problem group 1. Logging in

(If you do not already have an account, fill out the account request form; within a day (excluding weekends) your account will be ready for use.)

Use a communications program on your PC or Mac, set to emulate a VT100 or higher, and log onto the SAF machine SEQAXP (for TELNET or RLOGIN use the name SEQAXP.BIO.CALTECH.EDU; for DECNET CTERM or LAT connections just use SEQAXP).

The information that rolls by on the computer screen is important - you should always read it. This is true even for the text you see when you log on. Test your understanding of what the login text said by answering the following questions:

1A. What version of the Genbank database is available locally?
1B. What does the GCG program "DIVERGE" do?
1C. A disk block is 512 bytes. How much disk space do you have available in bytes, and how many bytes can you put on disk before you run out of space?

Problem group 2. Commands

SEQAXP uses the OpenVMS operating system, which you will interact with through command lines. That is, you will tell the computer what to do by typing in commands, as opposed to selecting options from menus as on a Macintosh or Windows machine. The general form of an OpenVMS command is:

    $ verb/qualifier parameter/qualifier

    $             the prompt. The computer puts this on your screen to indicate that it is ready for another command. You type everything else, and in general what you type is not case sensitive.
    verb          the action to perform, such as COPY.
    /qualifier    modifies the action of a verb or parameter. Most qualifiers go on the verb, some on the parameter.
    parameter(s)  the object(s) of the verb. Parameters are separated by spaces from the verb and from each other.

The most important verb is probably HELP - it puts you into the online help system.
At the bottom of the help page you will see the list of help libraries that have been added to the local system; access these by preceding the name with an "@".

What do these commands do? (Go ahead and type them in; nothing bad will happen.)

2A. $ COPY/CONFIRM CLASS:generic_login.com
2B. $ DIR/SINCE
2C. $ TYPE/PAGE generic_login.com
2D. $ HELP @LOCAL GENER OPEN EDITORS
2E. $ HELP @LOCAL - GENER OPEN - EDITORS
2F. $ RECALL HE
2G. Try pushing the up/down/right/left arrows on the keyboard
2H. $ mytype :== type/page
    $ mytype generic_login.com

Problem group 3. Directories and files

Like most operating systems, OpenVMS stores data on disks in files, which are arranged hierarchically in directories. Here are the more common directory-related commands (go ahead and issue these commands, but leave off the description to the right of each one):

    $ show default              Give the current directory name
    $ create/dir [.subdir]      Create a subdirectory
    $ set default [.subdir]     Move into it
    $ set default [-]           Move up from it
    $ set default SYS$LOGIN     Move to your home directory

All files have this form:

    diskname:[directory.subdirectory]name.extension;version

Any field that you don't specify takes a default value. For instance, mygcgsequence.seq means the highest-numbered version of this file in your default directory. If you modify a file, a higher-numbered version will be created - this means that if you mess up, the original is still around. If you delete a file you will need to specify which version, or just use a trailing ";" to mean "the most recent one".

When doing sequence analysis, 99.9% of the files that you will use will consist of plain text, and so the most common operations that you will perform (and the names of the commands to do them) are:

    CREATE, COPY, DELETE, DIRECTORY, EDIT, PRINT, PURGE, RENAME, TYPE

Most of these are self-explanatory. PURGE is a form of delete that removes only lower-numbered versions of files. DIRECTORY tells you what files are in a directory.
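The file specification form described above (diskname:[directory.subdirectory]name.extension;version) can be illustrated with a short Python sketch. This is not part of OpenVMS or GCG and does not run on SEQAXP; it is only a rough model of how the fields fit together and of what "highest-numbered version" means. The function names parse_filespec and newest_version, and the example names DISK1 and USER, are made up for illustration, and the pattern ignores network node names and other details of real file specifications.

```python
import re

# Rough pattern for the form described in the text:
#   diskname:[directory.subdirectory]name.extension;version
# Every field is optional, mirroring the idea that unspecified fields default.
# Assumption: node names (NODE::) and other special syntax are not handled.
FILESPEC = re.compile(
    r"^(?:(?P<disk>[^:\[\];]+):)?"      # diskname:
    r"(?:\[(?P<directory>[^\]]*)\])?"   # [directory.subdirectory]
    r"(?P<name>[^.;\[\]:]*)"            # name
    r"(?:\.(?P<extension>[^.;]*))?"     # .extension
    r"(?:;(?P<version>\d*))?$"          # ;version (may be an empty ";")
)


def parse_filespec(spec):
    """Split a file specification into fields; fields left out are None."""
    match = FILESPEC.match(spec)
    if match is None:
        raise ValueError("not a recognizable file specification: " + spec)
    return match.groupdict()


def newest_version(specs):
    """Return the specification carrying the highest version number."""
    return max(specs, key=lambda s: int(parse_filespec(s)["version"] or 0))


if __name__ == "__main__":
    # Hypothetical example names; only the layout matters here.
    print(parse_filespec("DISK1:[USER.SUBDIR]login.com;3"))
    print(parse_filespec("mygcgsequence.seq"))
    print(newest_version(["login.com;1", "login.com;3", "login.com;2"]))
```

In this sketch, parsing "mygcgsequence.seq" leaves the disk, directory, and version fields as None, which corresponds to the fields that OpenVMS would fill in with defaults, and newest_version picks "login.com;3" from the three versions listed.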
You can use wildcards to match parts of filenames: "*" matches anything, "%" matches any single character.

3A. How many files are in your directory and how much space do they occupy?
3B. Print jobs can be directed to local laser printers. Issue the command:
    $ SHOW QUEUE *
    What is the name of the queue that goes to your local laser printer? (If you don't see one, and you have a networked printer, request that one be set up for you.)
3C. Do you have a LOGIN.COM file in your home directory? (The commands in this file run automatically when you log in to configure your process.) If not, rename generic_login.com (from 2A, above) and edit it (see 2D, above) to reflect the appropriate print queue for your lab. Invoke it with the command:
    $ @login
    then verify that print jobs come out on your printer with the command:
    $ print login.com
3D. What command do you use to clean out old versions of files that are in your directory? Try it now. Did it work?

Problem group 4. File protections

Files have access protections that can be set differently for each of four levels of users: System, Owner, Group, and World. System is the operating system or the system operator, Owner is the owner of the files, Group is other users in the same group (as in, the same lab), and World is everybody else. Access protections are allowed for Read, Write, Execute, and Delete. In general, the default protection is correct; it is equivalent to:

    $ set file/protection=(s:rwed,o:rwed,g:re,w) filename

Some files are protected more strictly. For instance, MAIL files are meant to be manipulated only from within MAIL utilities, so their protection is set so that you can't modify them by accident (and don't override this!). Directories default to a protection that prevents them from being deleted, so if you ever do need to delete a subdirectory, you must first issue a command similar to the one above (after removing all files within it).

4A. What is the protection on the files in your directory? (Hint: HELP DIR)
4B.
What happens when you try to read a file that you don't have access to? Try:
    $ COPY [-.MATHOG]login.com
4C. What do you think will happen when you block access to a file from the SYSTEM account? Daily backup tapes are made of all user files from the SYSTEM account. If the user disk fails, and the files are restored from tape onto a replacement drive, will a file that was protected from SYSTEM read access be restored to your directory?

Problem group 5. Data transfer

At some point you will all need to move files back and forth between SEQAXP and your own computers. The simplest method is to use FTP. One FTP program for Macintoshes is called FETCH (ftp://ftp.dartmouth.edu/pub/mac/Fetch_3.0.3.hqx). There is an assortment of FTP utilities for Windows machines too, for instance WSFTP (assorted download sites and http://www.ipswitch.com/downloads/ws_ftp_LE.html). There are many other possibilities - the "publish" feature in many programs will do FTP, but don't use it unless you can specify BINARY or ASCII type.

There are a couple of key points to remember when moving data to/from SEQAXP. Text files should go in ASCII mode, and if they originate on a PC or Macintosh they should consist of a series of short (<200 character) lines. Binary files should go in BINARY mode. Respect the naming convention on SEQAXP (see 3 above), or the file will come over with its name mangled, perhaps to the point where you don't recognize it. (In FETCH, turn off the option that automatically appends .txt or .bin!)

5A. Use FTP on your PC or Macintosh. Copy login.com from your account to your PC/Mac, then back to SEQAXP. (Remember, this is a text file.) Call the new copy "new_login.com". Did the transfer work correctly? Look for subtle errors with this command:
    $ DIFF login.com new_login.com
5B. Log in to SEQAXP, create a subdirectory called [.KILLME], and copy your login.com file into it. Repeat the transfer as in 5A, but this time against the file in the new subdirectory.
Again, check that the transfer did not change the file's content. Now remove any files in the [.KILLME] directory, and then delete the directory itself.
5C. There are numerous ways to mess up file transfers: sending ASCII as BINARY, or vice versa, or sending files with lines that are too long. If a file that you have loaded on SEQAXP misbehaves, you can analyze it to see what is wrong. Issue the command:
    $ ANALYZE/RMS CLASS:TOOLONG.TXT
    and look at the RMS FILE ATTRIBUTE section. Why might this file cause problems for some programs?

Some computers in the division have Pathworks installed. This commercial software provides a DECNET transport on Macintoshes. If you have it, there will be a program around called NCP. Do this next part only if you see this program on your computer.

5D. DECNET allows most OpenVMS commands to function over the net. This can be very convenient for moving text files to/from a Macintosh. (A version for PCs also exists, but I don't believe that anybody here uses it.) Let's assume that your hard disk is called "BIGDISK" on the machine "MACNAME", and that you have run the NCP program on your Macintosh and configured it to allow proxy connections from your SEQAXP account. Try this from your SEQAXP account:
    $ DEFINE DESKTOP MACNAME::BIGDISK:[DESKTOP_FOLDER]
    $ copy login.com desktop:
    If DECNET is working, you should see a file called "login.com" on your desktop. What command would you use *on SEQAXP* to view the contents of that file? (Hint: DESKTOP has been defined as a logical name.)

Problem group 6. GCG basics

The GCG software by and large behaves as other OpenVMS software does. However, there are a few gotchas, and these are documented in the GCG beginner's FAQ, which you should refer to in order to answer these questions.

6A. Configure your graphics device appropriately (for most terminal emulators that is some form of Tektronix emulation). Issue the command:
    $ SHOWPLOT
    What do you see?
6B. What are the command line options for REFORMAT?
6C.
Use REFORMAT to put the following PROTEIN sequence into GCG format:
    AAAGCTCTTGGGTTTT
    (Hint: put that sequence into a file, and then run REFORMAT on it.) Now look at the resulting file (TYPE it). Does the line with a ".." indicate that this is protein? Figure out the correct operation to make this sequence into a GCG protein sequence file. (Note: get in the habit of naming GCG protein sequence files whatever.pep, and GCG nucleic sequence files whatever.seq - that way you can tell what they are at a glance.)
6D. Use the GCG program SEQED to edit this sequence - put a P on the end and change the first A to an S. What happens if you leave the program with a QUIT, and what happens when you leave with an EXIT? (Look at the edited file to see.)