Fundamentals of Sequence Analysis, 1995-1996
Problem set 1:  Computing basics

If you get stuck, refer to the OpenVMS and GCG resources on the 
class home page.


Problem group 1.  Logging in

(If you do not already have an account, request one by
filling out the form; within a day (excluding weekends)
your account will be ready for use.) 

Use a communications program on your PC or Mac, set to
emulate a VT100 or higher, and log onto the SAF machine
SEQAXP (for TELNET or RLOGIN use the name
SEQAXP.BIO.CALTECH.EDU, for DECNET CTERM or LAT connections
just use SEQAXP). 

The information that rolls by on the computer screen is
important - you should always read it.  This is even true
for the text you see when you log on.  Test your
understanding of what the login text said by answering the
following questions: 

1A.  What version of the GenBank database is available locally?
1B.  What does the GCG program "DIVERGE" do?
1C.  A disk block is 512 bytes.  How much disk space do you
       have available in bytes, and how many bytes can you
       put on disk before you run out of space? 
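
For 1C, converting blocks to bytes is a single multiplication.  A
minimal sketch in Python, using a made-up block count (10000 is an
assumption for illustration only; substitute the figures from your
own login text):

```python
BLOCK_SIZE = 512  # bytes per disk block, as stated in 1C


def blocks_to_bytes(blocks: int) -> int:
    """Convert a count of 512-byte disk blocks to bytes."""
    return blocks * BLOCK_SIZE


# Hypothetical quota of 10000 blocks, not a real SEQAXP value.
print(blocks_to_bytes(10000))  # prints 5120000
```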

Problem group 2.  Commands

SEQAXP uses the OpenVMS operating system, which you will
interact with through command lines.  That is, you will
tell the computer what to do by typing in commands,
as opposed to selecting options from menus as on a Macintosh
or Windows machine.  The general form of an OpenVMS command
is: 

  $ verb/qualifier parameter/qualifier

  $            prompt (the computer puts this on your screen 
               to indicate that it is ready for another command;
               you type everything else.  Commands are generally
               not case sensitive.)
  verb         the action to perform, such as COPY
  /qualifier   modifies the action of a verb or parameter, most
               qualifiers go on the verb, some on the parameter
  parameter(s) the object(s) of the verb, parameters are separated
               by spaces from the verb and from each other.
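
To make the pieces of a command line concrete, here is a rough sketch
in Python that splits a command into its verb, parameters, and the
qualifiers attached to each.  It is only an illustration, not how
OpenVMS itself parses commands: the function name parse_command is
made up, and the sketch assumes a non-empty line with no quoted
strings and with every qualifier written directly after the word it
modifies.  It upper-cases everything to reflect that commands are
generally not case sensitive.

```python
def parse_command(line: str) -> dict:
    """Split a command line into a verb and parameters, each with qualifiers.

    Simplified: words are separated by spaces, and each "/" inside a word
    starts a qualifier that belongs to that word.
    """
    tokens = line.lstrip("$ ").split()  # drop the prompt, split on spaces
    parsed = []
    for token in tokens:
        word, *qualifiers = token.split("/")
        parsed.append({
            "word": word.upper(),
            "qualifiers": [q.upper() for q in qualifiers],
        })
    verb, *parameters = parsed  # the first word is the verb
    return {"verb": verb, "parameters": parameters}


result = parse_command("$ COPY/CONFIRM CLASS:generic_login.com []")
print(result["verb"])
print(result["parameters"])
```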

The most important verb is probably HELP - it puts you into
the online help system.  At the bottom of the help page you
will see the list of help libraries that have been added to
the local system.  Access these by preceding the library name
with an "@". 

What do these commands do (go ahead and type them in, 
nothing bad will happen)?

2A.   $ COPY/CONFIRM CLASS:generic_login.com []
2B.   $ DIR/SINCE
2C.   $ TYPE/PAGE    generic_login.com
2D.   $ HELP @LOCAL  GENER OPEN EDITORS
2E.   $ HELP @LOCAL -
        GENER OPEN -
        EDITORS
2F.   $ RECALL HE
2G.   Try pushing the up/down/right/left arrows on the keyboard
2H.   $ mytype :== type/page
      $ mytype generic_login.com

Problem group 3.  Directories and files

Like most operating systems, OpenVMS stores data on disks in
files which are arranged hierarchically in directories. 

Here are the more common directory related commands:
(go ahead and issue these commands, but leave off the 
description to the right of each one!)

  $ show default                   Give the current directory name
  $ create/dir  [.subdir]          Create a subdirectory
  $ set default [.subdir]          Move into it
  $ set default [-]                Move up from it
  $ set default SYS$LOGIN          Move to your home directory

All files have this form:

   diskname:[directory.subdirectory]name.extension;version_number

Any field that you don't specify takes a default value.  For instance

   mygcgsequence.seq

means the highest numbered version of this file that is in
your default directory.  If you modify a file a higher
numbered version will be created - this means that if you
mess up, the original is still around.  If you delete a file
you will need to specify which version or just use a
trailing ";" to mean "the most recent one". 
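
The file specification layout above can be pulled apart mechanically.
The following Python sketch does that with a regular expression; any
field that comes back as None was not specified and would be filled
in by a default.  The pattern and the name parse_file_spec are
illustrative assumptions, the disk name DISK1 is hypothetical, and
the sketch does not handle network node names or every legal
character.

```python
import re

# diskname:[directory.subdirectory]name.extension;version_number
FILE_SPEC = re.compile(
    r"^(?:(?P<disk>[^:\[\];]+):)?"      # optional "diskname:"
    r"(?:\[(?P<directory>[^\]]*)\])?"   # optional "[directory.subdirectory]"
    r"(?P<name>[^.;]*)"                 # file name
    r"(?:\.(?P<extension>[^;]*))?"      # optional ".extension"
    r"(?:;(?P<version>\d*))?$"          # optional ";version" (may be empty)
)


def parse_file_spec(spec: str) -> dict:
    """Return the fields of a file specification; None means 'use the default'."""
    match = FILE_SPEC.match(spec)
    if match is None:
        raise ValueError(f"not a recognizable file specification: {spec!r}")
    return match.groupdict()


print(parse_file_spec("mygcgsequence.seq"))
print(parse_file_spec("DISK1:[user.subdir]mygcgsequence.seq;3"))
```

A trailing ";" with no number shows up as an empty version string,
which is distinct from leaving the version off altogether.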

When doing sequence analysis 99.9% of the files that you
will use will consist of plain text, and so the most common
operations that you will perform (and the names of the
command to do them) are: 

  CREATE, COPY, DELETE, DIRECTORY, EDIT, PRINT, PURGE, RENAME, TYPE

Most of these are self-explanatory.  PURGE is a form of delete 
that removes only lower numbered versions of files.  
DIRECTORY tells you what files are in a directory.
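
As a rough model of what PURGE does, the Python sketch below takes a
list of (name, version) pairs and separates the highest numbered
version of each file from the lower numbered ones that would be
removed.  This is only an illustration of the idea, not the real
command: the function name purge and the sample file list are made
up, and file names are assumed to be already in a consistent case.

```python
def purge(files):
    """Split (name, version) pairs into (kept, removed).

    Only the highest numbered version of each name is kept.
    """
    highest = {}
    for name, version in files:
        highest[name] = max(version, highest.get(name, version))
    kept = [(n, v) for n, v in files if v == highest[n]]
    removed = [(n, v) for n, v in files if v != highest[n]]
    return kept, removed


# Hypothetical directory contents.
listing = [("a.seq", 1), ("a.seq", 2), ("a.seq", 3), ("login.com", 1)]
kept, removed = purge(listing)
print("kept:   ", kept)
print("removed:", removed)
```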

You can use wildcards to match parts of filenames: "*" 
matches anything, and "%" matches any single character.
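
The two wildcard rules can be sketched in Python by translating a
pattern into a regular expression.  This is a simplified model under
stated assumptions: it treats the file name as a plain string,
ignores version numbers, and the names wildcard_to_regex and matches
are invented for this example.

```python
import re


def wildcard_to_regex(pattern: str):
    """Translate "*" (anything) and "%" (any single character) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")           # zero or more characters
        elif ch == "%":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    # File names are not case sensitive, so ignore case.
    return re.compile("".join(parts), re.IGNORECASE)


def matches(pattern: str, filename: str) -> bool:
    """True if the whole filename matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(filename) is not None


print(matches("*.seq", "mygcgsequence.SEQ"))
print(matches("gene%.seq", "gene1.seq"))
print(matches("gene%.seq", "gene12.seq"))
```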

3A.  How many files are in your directory and how much
     space do they occupy?

3B.  Print jobs can be directed to local laserprinters.  
     Issue the command:  $ SHOW QUEUE *
     What is the name of the queue that goes to your local
     laser printer?
     (If you don't see one, and you have a networked printer,
     request that one be set up for you.)

3C.  Do you have a LOGIN.COM file in your home directory?
     (The commands in this file run automatically when
     you log in, to configure your process.)  If not,
     rename generic_login.com (from 2A, above) and edit
     it (see 2D, above) to reflect the appropriate print
     queue for your lab.  Invoke it with the command:
       $ @login
     then verify that print jobs come out on your printer
     with the command:
       $ print login.com     

3D.  What command do you use to clean out old versions of
     files that are in your directory?  Try it now.  Did it
     work?

Problem group 4.  File protections

Files have access protections that can be set differently
for each of four levels of users:  System, Owner, Group,
and World.  System is the operating system or the system
operator, owner is the owner of the files, group is other
users in the same group (as in, the same lab), and world
is everybody else.  Access protections are allowed for Read,
Write, Execute, and Delete.  In general, the default 
protection is correct; it is equivalent to:
 
  $ set file/protection=(s:rwed,o:rwed,g:re,w) filename
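
To read a protection string like the one above, it can help to break
it into its four user levels.  The Python sketch below does that and
answers "may this level do this?" for a single access letter.  It is
a simplified model: the names parse_protection and allowed are made
up, the string is assumed to be well formed, and only the one level
asked about is consulted.

```python
def parse_protection(spec: str) -> dict:
    """Turn "(s:rwed,o:rwed,g:re,w)" into a level -> access letters mapping."""
    levels = {}
    for item in spec.strip("()").split(","):
        level, _, access = item.strip().partition(":")
        # A level listed with no letters (like "w") gets no access at all.
        levels[level.upper()[0]] = access.upper()
    return levels


def allowed(spec: str, level: str, access: str) -> bool:
    """True if the level (S, O, G or W) has the single access letter R, W, E or D."""
    return access.upper() in parse_protection(spec).get(level.upper()[0], "")


default = "(s:rwed,o:rwed,g:re,w)"
print(parse_protection(default))
print(allowed(default, "group", "w"))
print(allowed(default, "owner", "d"))
```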

Some files are protected better, for instance MAIL files are
meant to be manipulated only from within MAIL utilities, so
their protection is set so that you can't modify them by
accident (and don't override this!)  Directories default to
a protection that prevents them from being deleted, so if you
ever do need to delete a subdirectory, you must issue a
command similar to that above (after removing all files
within it.) 

4A.  What is the protection on the files in your directory?
     (Hint, HELP DIR)
4B.  What happens when you try to read a file that you don't
     have access to?  Try:  $ COPY [-.MATHOG]login.com []
4C.  What do you think will happen when you block access
     to a file from the SYSTEM account?  Daily backup
     tapes are made of all user files from the SYSTEM account.
     If the user disk fails, and the files are restored from
     tape onto a replacement drive,  will a file that was
     protected from SYSTEM read access be restored to your directory?

Problem group 5.  Data transfer

At some point you will all need to move files back and forth
between SEQAXP and your own computers.  The simplest method
is to use FTP.  One FTP program for Macintoshes is called
FETCH (you may pick it up via AppleShare from NET5,
seqmacIICX; use the last name and first initial of your PI as
the username, e.g., ZinnK, and no password).  There is an
assortment of FTP utilities for Windows machines (copy one
from somebody else's PC if you don't have one yet).

There are a couple of key points to remember when moving
data to/from SEQAXP.

  *  Text files should go in ASCII mode, and if they
     originate on a PC or Macintosh they should consist
     of a series of small (<200 character) lines.

  *  Binary files should go in BINARY mode.

  *  Respect the naming convention on SEQAXP (see 3 above)
     or the file will come over, but the name will be 
     mangled, perhaps to the point where you don't
     recognize it.  (In FETCH, turn off the option that
     automatically appends .txt or .bin!)
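
The line-length rule above is easy to check before you send a text
file.  Here is a small Python sketch that reports any line of 200
characters or more; the function name long_lines is made up, and the
sample text is invented for the example.  It accepts PC, Macintosh,
and Unix style line endings.

```python
def long_lines(text: str, limit: int = 200):
    """Return (line_number, length) for every line of `limit` or more characters."""
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) >= limit
    ]


# Invented sample: the second line is far too long for a safe transfer.
sample = "short line\n" + "A" * 250 + "\r\nanother short line\r"
print(long_lines(sample))  # prints [(2, 250)]
```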

5A.  Use FTP on your PC or Macintosh. Copy login.com
     from your account to your PC/Mac, then back to seqaxp.
     (Remember, this is a text file.)  Call the new copy
     "new_login.com".  Did the transfer work correctly? 
     Look for subtle errors with this command: 

       $ DIFF login.com new_login.com

5B.  Log in to SEQAXP, create a subdirectory called [.KILLME],
     and copy your login.com file into it.  Repeat the transfer
     as in 5A, but this time against the file in the new
     subdirectory.  Again, check that the transfer did
     not change the file's content.  Now remove any files
     in the [.KILLME] directory, and then delete the 
     directory itself.

5C.  There are numerous ways to mess up file transfers: sending ASCII
     as BINARY or vice versa, or sending files with lines that are too
     long.  If a file that you have loaded on SEQAXP misbehaves you can
     analyze it to see what is wrong.  Issue the command:

       $ ANALYZE/RMS CLASS:TOOLONG.TXT

     and look at the RMS FILE ATTRIBUTE section.  Why might this file
     cause problems for some programs?

Some computers in the division have Pathworks installed. 
This commercial software provides a DECNET transport on
Macintoshes.  If you have it there will be a program around
called NCP.  Only if you see this program on your computer
do this next part. 

5D.  DECNET allows most OpenVMS commands to function over
     the net. This can be very convenient for moving text
     files to/from a Macintosh.  (A version for PCs also
     exists, but I don't believe that anybody here uses it.)
     Let's assume that your hard disk is called "BIGDISK"
     on the machine "MACNAME", and that you have run the NCP
     program on your Macintosh and configured it to allow
     proxy connections from your SEQAXP account.  Try this
     from your SEQAXP account: 

     $ DEFINE DESKTOP MACNAME::BIGDISK:[DESKTOP_FOLDER]
     $ copy login.com desktop:

     If DECNET is working, you should see a file called
     "login.com" on your desktop.  What command would you
     use *on SEQAXP* to view the contents of that file?
     (Hint, DESKTOP has been defined as a logical name.)


Problem group 6.  GCG basics

The GCG software by and large behaves as does other OpenVMS software.
However, there are a few gotchas, and these are documented in the
GCG beginner's FAQ, which you should refer to in order to
answer these questions.

6A.  Configure your graphics device appropriately (for most terminal
     emulators that is some form of Tektronix emulation).  Issue the 
     command:  $ SHOWPLOT
     What do you see?

6B.  What are the command line options for REFORMAT?

6C.  Use REFORMAT to put into GCG format the following PROTEIN
     sequence:  AAAGCTCTTGGGTTTT
     (Hint, put that sequence into a file, and then run REFORMAT on it).
     Now look at the resulting file (TYPE) - does the line with a ".."
     indicate that this is protein?  Figure out the correct operation
     to make this sequence into a GCG protein sequence file.
     (Note, get in the habit of naming GCG protein sequence files
     whatever.pep, and GCG nucleic sequence files whatever.seq -
     that way you can more easily keep track of them.)

6D.  Use the GCG program SEQED to edit this sequence - put a P on the end
     and change the first A to an S.  What happens if you leave the program
     with a QUIT, and what happens when you leave with an EXIT?  (Look
     at the edited file to see).