Fundamentals of Sequence Analysis, 1995-1996

Answer set 1: Computing basics

Problem group 1. Logging in

1A. What version of the Genbank database is available locally?

Genbank 92.0, released 12/95.

1B. What does the GCG program "DIVERGE" do?

  $ GENHELP DIVERGE

Diverge measures the percent divergence of two protein coding sequences using the method of Perler et al.

1C. A disk block is 512 bytes. How much disk space do you have available in bytes, and how many bytes can you put on disk before you run out of space?

At the end of the login you will see two lines like:

  User [GROUP,USERNAME] has 99 blocks used, 19901 available,
  of 20000 authorized and permitted overdraft of 50000 blocks on USRDISK

So you can use up to 512 * 20000 = 10,240,000 bytes (about 10 Mbytes) of disk space. If you go over your diskquota you will be unable to put new files on disk, which will break just about all of the software, and you will see a message about "quota exceeded". (The overdraft is available so that a running program can go over diskquota without being forced to fail. If this happens, be sure to clean up after it or you will see the standard over diskquota problems for all subsequent programs.)

Problem group 2. Commands

2A.
  $ COPY/CONFIRM CLASS:generic_login.com

This copies the file "generic_login.com" from the "CLASS" directory to your default directory.

2B.
  $ DIR/SINCE

This lists all files that have been placed in the default directory since midnight.

2C.
  $ TYPE/PAGE generic_login.com

Displays the contents of the file, one page (screenful) at a time.

2D.
  $ HELP @LOCAL GENER OPEN EDITORS

Enters the HELP utility and looks at information on text editors.

2E.
  $ HELP @LOCAL -
  GENER OPEN -
  EDITORS

Same as above - this demonstrates continuation lines. Not important here, but it would be in a very long command.

2F.
  $ RECALL HE

Recalls the last command beginning with "HE", which is the previous help command.

2G.
Try pushing the up/down/right/left arrows on the keyboard.

Up and down move among recent commands; left and right move the cursor within a command so that it can be edited and reissued.

2H.
  $ mytype :== type/page
  $ mytype generic_login.com

This defines your own symbol for a particular action and then executes it. If you want some shortcuts always defined, put the definitions in your login.com file.

Problem group 3. Directories and files

3A. How many files are in your directory and how much space do they occupy?

The answer varies. To find out, use the command:

  $ DIR/SIZE

3B. Print jobs can be directed to local laser printers. Issue the command:

  $ SHOW QUEUE *

What is the name of the queue that goes to your local laser printer? (If you don't see one, and you have a networked printer, request that one be set up for you.)

The answer varies depending on your lab. If you were in the Zinn lab the answer would be: ZINN_LW. Most labs have print queues with a similar syntax.

3C. Do you have a LOGIN.COM file in your home directory? (The commands in this file run automatically when you log in to configure your process.) If not, rename generic_login.com (from 2A, above) and edit it (see 2D, above) to reflect the appropriate print queue for your lab. Invoke it with the command:

  $ @login

then verify that print jobs come out on your printer with the command:

  $ print login.com

Ok, there's no answer - it was a ruse to get you to create a working login.com file!

3D. What command do you use to clean out old versions of files that are in your directory? Try it now. Did it work?

  $ PURGE

removes older versions of files (those with smaller version numbers). The easiest way to tell if it worked, meaning that files were deleted, is to use

  $ PURGE/LOG

which will list the name of each file as it is deleted.

Problem group 4. File protections

4A. What is the protection on the files in your directory?
(Hint, HELP DIR)

Use the command:

  $ DIR/OWNER/PROT

and you should see a series of lines like this:

  CMD.HTH;1 [GROUP,USERNAME] (RWED,RWED,RE,)

4B. What happens when you try to read a file that you don't have access to? Try:

  $ COPY [-.MATHOG]login.com

It doesn't let you do it, and gives you this error message:

  Error opening SEQAXP$DKA200:[USERS.MATHOG]LOGIN.COM;43 as input
  Insufficient privilege or file protection violation

4C. What do you think will happen when you block access to a file from the SYSTEM account? Daily backup tapes are made of all user files from the SYSTEM account. If the user disk fails, and the files are restored from tape onto a replacement drive, will a file that was protected from SYSTEM read access be restored to your directory?

If you block read access to a file from a SYSTEM account you're living very dangerously. The file will not be backed up on tape, and consequently cannot be restored. Furthermore, you're not protecting the file's contents in any way, since anybody with SYSTEM level privileges can bypass the file protection system.

Problem group 5. Data transfer

5A. Use FTP on your PC or Macintosh. Copy login.com from your account to your PC/Mac, then back to seqaxp. (Remember, this is a text file.) Call the new copy "new_login.com". Did the transfer work correctly? Look for subtle errors with this command:

  $ DIFF login.com new_login.com

There is no answer - it should have worked.

5B. Log in to seqaxp, create a subdirectory called [.KILLME], and copy your login.com file into it. Repeat the transfer as in 5A, but this time against the file in the new subdirectory. Again, check that the transfer did not change the file's content. Now remove any files in the [.KILLME] directory, and then delete the directory itself.

There is no answer - it should have worked.

5C. There are numerous ways to mess up file transfers: sending ASCII as BINARY or vice versa, or sending files with lines that are too long.
If a file that you have loaded on SEQAXP misbehaves you can analyze it to see what is wrong. Issue the command:

  $ ANALYZE/RMS CLASS:TOOLONG.TXT

and look at the RMS FILE ATTRIBUTE section. Why might this file cause problems for some programs?

This is the line that matters:

  Longest Record: 326

Some programs may choke on lines this long.

Optional question, only for Pathworks users.

5D. DECNET allows most OpenVMS commands to function over the net. This can be very convenient for moving text files to/from a Macintosh. (A version for PCs also exists, but I don't believe that anybody here uses it.) Let's assume that your hard disk is called "BIGDISK" on the machine "MACNAME", and that you have run the NCP program on your Macintosh and configured it to allow proxy connections from your SEQAXP account. Try this from your SEQAXP account:

  $ DEFINE DESKTOP MACNAME::BIGDISK:[DESKTOP_FOLDER]
  $ copy login.com desktop:

If DECNET is working, you should see a file called "login.com" on your desktop. What command would you use *on SEQAXP* to view the contents of that file? (Hint, DESKTOP has been defined as a logical name.)

  $ type desktop:login.com

Problem group 6. GCG basics

6A. Configure your graphics device appropriately (for most terminal emulators that is some form of Tektronix emulation). Issue the command:

  $ SHOWPLOT

What do you see?

You should see a bunch of squares, circles, and text, with the phrase "Genetics Computer Group" in fancy text at the bottom.

6B. What are the command line options for REFORMAT?

  $ reformat/check

Reformat rewrites sequence file(s), scoring matrix file(s), or enzyme data file(s) so that they can be read by GCG programs.
Minimal Syntax: $ REformat [/INfile=]Reformat.Txt /Default

Prompted Parameters: None

Local Data Files:

/DATa=Translate.Txt         three-letter to one-letter codes

Optional Parameters:

/LINesize=50                sets number of characters per line
/BLOcksize=10               sets number of characters per block
/BLAnklines=1               puts blank lines between the sequence lines
/NONUMbering                suppresses numbering
/NOCOMments                 suppresses comments
/DNA                        changes U into T
/RNA                        changes T into U
/UPPer                      makes all sequence characters uppercase
/LOWer                      makes all sequence characters lowercase
/LIStfile[=Reformat.List]   writes a list file of output sequence names
/MSF                        reformats sequences into an MSF output file
/DEGap                      removes gap characters (.) from the sequence
/THReeintoone               translates three-letter peptides into one-letter
/ONEIntothree               translates one-letter peptides into three-letter
/COMparison                 reformats a table instead of a sequence
/ENZymedata                 reformats an enzyme data file instead of a sequence
                            (used with /PROtein, reformats a protein enzyme data file)
/PROtein                    insists that the sequences are reformatted as protein sequences
/NUCleotide                 insists that the sequences are reformatted as nucleic acid sequences
/PROFile                    reformats an old profile into the new profile format
/EXTension=.Seq             defines a file name extension
/TRANSlate=FileName.Txt     lets you name the output translation table
[/OUTfile=]NewSeqName       lets you name the output file
/NOMONitor                  suppresses the screen trace showing each output file

/BEGin                      beginning of range, defaults to 1
/END                        end of range, defaults to Maximum sequence length

Use these to extract a subsequence from a sequence or MSF file.

/DELete                     delete the subsequence in the range, leave the rest

6C. Use REFORMAT to put into GCG format the following PROTEIN sequence:

  AAAGCTCTTGGGTTTT

(Hint, put that sequence into a file, and then run REFORMAT on it). Now look at the resulting file (TYPE) - does the line with a ".." indicate that this is protein? Figure out the correct operation to make this sequence into a GCG protein sequence file. (Note, get in the habit of naming GCG protein sequence files whatever.pep, and GCG nucleic sequence files whatever.seq - that way you can more easily keep track of them.)

GCG has to guess that this is a nucleic acid and does so:

  Killme.Seq Length: 16 January 16, 1996 16:47 Type: N Check: 625 ..
                                                     ^

If it were in fact a peptide, you could have forced it to the correct type with:

  $ reformat/infile=killme.pep/protein

6D. Use the GCG program SEQED to edit this sequence - put a P on the end and change the first A to an S.
What happens if you leave the program with a QUIT, and what happens when you leave with an EXIT? (Look at the edited file to see.)

QUIT doesn't save your changes; EXIT does. This is the same behavior as the OpenVMS EDT editor.
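
Worked check for 1C. The block-to-byte arithmetic in 1C can be sketched in a few lines of Python. This is only an illustration: the function and variable names are made up for the example, and the block counts are the ones from the sample login message, not from any real account.

```python
# Worked check for 1C: convert OpenVMS disk blocks to bytes.
BLOCK_BYTES = 512  # size of one disk block, as stated in 1C


def blocks_to_bytes(blocks: int) -> int:
    """Return the size in bytes of the given number of disk blocks."""
    return blocks * BLOCK_BYTES


# Block counts copied from the sample login message in 1C (illustrative only).
quota = {"used": 99, "available": 19901, "authorized": 20000, "overdraft": 50000}

# Same figures expressed in bytes.
quota_bytes = {name: blocks_to_bytes(count) for name, count in quota.items()}

if __name__ == "__main__":
    for name, size in quota_bytes.items():
        print(f"{name:>10}: {size:>12,} bytes")
```

With the sample figures, the space still available comes to 19901 * 512 = 10,189,312 bytes, slightly under the full authorized quota because 99 blocks are already in use.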