Fundamentals of Sequence Analysis, 1998-1999
Problem set 1:  Computing basics

Problem group 1.  Logging in

1A.  Which version of the GenBank database is available locally?

  GenBank          109.0    10/98

1B.  What does the GCG program "DIVERGE" do?



  you learned about GENHELP, and then did

      Diverge measures the percent divergence of two protein coding
      sequences using the method of Perler et al.

1C.  A disk block is 512 bytes.  How much disk space do you
       have available in bytes, and how many bytes can you
       put on disk before you run out of space? 

At the end of the login you will see two lines like:

  User [GROUP,USERNAME] has 99 blocks used, 19901 available,
  of 20000 authorized and permitted overdraft of 50000 blocks on USRDISK

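So you can use up to 512 * 20000 = 10,240,000 bytes (about 10 Mbytes) of
disk space.  In this example 19901 blocks are still available, which is
512 * 19901 = 10,189,312 bytes.  If you go over your disk quota you will be
unable to put new files on disk, which will break just about all of the
software, and you will see a message about "quota exceeded".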
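
The block arithmetic can be checked with a few lines of Python.  This is
only a sketch, not something you would run at the DCL prompt; the function
name blocks_to_bytes is made up for illustration, and the block counts are
the ones from the sample login message above.

```python
BLOCK_BYTES = 512  # size of one disk block, as stated in 1C


def blocks_to_bytes(blocks):
    """Convert a count of disk blocks to bytes."""
    return blocks * BLOCK_BYTES


# Numbers taken from the sample login message.
authorized_blocks = 20000
available_blocks = 19901

print(blocks_to_bytes(authorized_blocks))  # 10240000
print(blocks_to_bytes(available_blocks))   # 10189312
```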

(The overdraft is available so that a running program can go over disk quota
without being forced to fail.  If this happens, be sure to clean up after
it or you will see the standard over-quota problems from then on.)

Problem group 2. Commands.


It copies the file "" from the "CLASS" directory to
your default directory.


This lists all files that have been placed in the default directory since 


Display the contents of the file, one page (screenful) at a time.


Enter the HELP utility and look at information on text editors.

2E.   $ HELP @LOCAL -
        GENER OPEN -

Same as above - this demonstrates continuation lines.  Not important
here, but it would be in a very long command.


Recall the last command beginning with "HE", which is the previous
help command.

2G.   Try pushing the up/down/right/left arrows on the keyboard

Up and down step through recent commands; left and right move the cursor
within the recalled command so that it can be edited and reissued.

2H.   $ mytype :== type/page
      $ mytype

Define your own symbol for a particular action and then execute it.  If
you want some shortcuts always defined, put the definitions in your file.

Problem group 3.  Directories and files

3A.  How many files are in your directory and how much
     space do they occupy?

The answer varies.  To find out, use the command:


3B.  Print jobs can be directed to local laser printers.
     Issue the command:  $ SHOW QUEUE *
     What is the name of the queue that goes to your local
     laser printer?
     (If you don't see one, and you have a networked printer,
     request that one be set up for you.)

The answer varies depending on your lab.  If you were in the Zinn
lab the answer would be:  ZINN_LW.  Most labs have print queues
named in a similar way.

3C.  Do you have a LOGIN.COM file in your home directory?
     (The commands in this file run automatically when
     you log in to configure your process.)  If not,
     rename (from 2A, above) and edit
     it (see 2D, above) to reflect the appropriate print
     queue for your lab.  Invoke it with the command:
       $ @login
     then verify that print jobs come out on your printer
     with the command:
       $ print     

Ok, there's no answer - it was a ruse to get you to create
a working file!

3D.  What command do you use to clean out old versions of
     files that are in your directory?  Try it now.  Did it work?

  $ PURGE 

  removes older versions of files (those with smaller version numbers).
  The easiest way to tell whether it worked, that is, whether any files
  were deleted, is to use


  which will list the name of each file as it is deleted.
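
  What PURGE does to version numbers can be pictured with the Python sketch
  below.  It assumes the default behavior of keeping only the highest
  version of each file, works on a plain list of names, and never touches
  a real directory; the function name purge_preview and the sample file
  names are hypothetical.

```python
def purge_preview(filespecs):
    """Return (kept, deleted) for names of the form NAME.EXT;VERSION.

    Only the highest version of each file is kept, which is what a
    plain PURGE does by default.
    """
    latest = {}
    for spec in filespecs:
        name, _, version = spec.rpartition(";")
        version = int(version)
        if name not in latest or version > latest[name]:
            latest[name] = version
    kept = sorted(f"{name};{version}" for name, version in latest.items())
    deleted = sorted(set(filespecs) - set(kept))
    return kept, deleted


kept, deleted = purge_preview(["A.TXT;1", "A.TXT;2", "B.COM;7"])
print(kept)     # ['A.TXT;2', 'B.COM;7']
print(deleted)  # ['A.TXT;1']
```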

Problem group 4.  File protections

4A.  What is the protection on the files in your directory?
     (Hint, HELP DIR)

  Use the command:


  and you should see a series of lines like this:

  CMD.HTH;1            [GROUP,USERNAME]        (RWED,RWED,RE,)
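
  The four comma-separated fields inside the parentheses are, in order,
  the access granted to SYSTEM, OWNER, GROUP, and WORLD; the letters stand
  for Read, Write, Execute, and Delete.  The Python sketch below just
  splits such a mask apart so the meaning is easier to see.  The function
  names are made up for illustration.

```python
CLASSES = ("SYSTEM", "OWNER", "GROUP", "WORLD")


def parse_protection(mask):
    """Split a mask like '(RWED,RWED,RE,)' into a dict keyed by user class."""
    fields = mask.strip().strip("()").split(",")
    if len(fields) != len(CLASSES):
        raise ValueError("expected four comma-separated fields: " + mask)
    return dict(zip(CLASSES, (field.strip() for field in fields)))


def can_read(mask, user_class):
    """True if the given class has R (read) access in the mask."""
    return "R" in parse_protection(mask)[user_class]


protection = parse_protection("(RWED,RWED,RE,)")
print(protection["GROUP"])                   # RE
print(can_read("(RWED,RWED,RE,)", "WORLD"))  # False
```

  In the example line, WORLD has an empty field, so users outside your
  group have no access at all.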

4B.  What happens when you try to read a file that you don't
     have access to?  Try:  $ COPY [-.MATHOG] []

  It doesn't let you do it, and gives you this error message:

  Error opening SEQAXP$DKA200:[USERS.MATHOG]LOGIN.COM;55 as input
  Insufficient privilege or file protection violation

4C.  What do you think will happen when you block access
     to a file from the SYSTEM account?  Daily backup
     tapes are made of all user files from the SYSTEM account.
     If the user disk fails, and the files are restored from
     tape onto a replacement drive,  will a file that was
     protected from SYSTEM read access be restored to your directory?

  If you block read access to a file from a SYSTEM account you're living
  very dangerously.  The file may not be backed up on tape, and
  consequently cannot be restored.  Furthermore, you're not protecting the
  file's contents in any way since anybody with SYSTEM level privileges can
  bypass the file protection system.  (Because users sometimes do this,
  the backup process runs with BYPASS privileges.)

Problem group 5.  Data transfer

5A.  Use FTP on your PC or Macintosh. Copy
     from your account to your PC/Mac, then back to seqaxp.
     (Remember, this is a text file.)  Call the new copy
     "".  Did the transfer work correctly? 
     Look for subtle errors with this command: 

       $ DIFF

There is no answer - it should have worked.

5B.  Log in to seqaxp, create a subdirectory called [.KILLME],
     and copy your file into it.  Repeat the transfer
     as in 5A, but this time against the file in the new
     subdirectory.  Again, check that the transfer did
     not change the file's content.  Now remove any files
     in the [.KILLME] directory, and then delete the 
     directory itself.

There is no answer - it should have worked.

5C.  There are numerous ways to mess up file transfers: sending ASCII
     as BINARY or vice versa, or sending files with lines that are too
     long.  If a file that you have loaded on SEQAXP misbehaves, you can
     analyze it to see what is wrong.  Issue the command:


     and look at the RMS FILE ATTRIBUTE section.  Why might this file
     cause problems for some programs?

This is the line that matters:

        Longest Record: 326

Some programs may choke on lines this long.
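
If you want to check a text file on your PC or Mac before transferring it,
the same number can be estimated there.  The Python sketch below is an
assumption about how you might do that, not a replacement for the analysis
on SEQAXP: it simply reports the length of the longest line, ignoring line
terminators.  The function name longest_line is hypothetical.

```python
def longest_line(lines):
    """Return the length of the longest line, not counting CR/LF."""
    return max((len(line.rstrip("\r\n")) for line in lines), default=0)


# Works on any iterable of lines; for a real file you could use
#     with open("somefile.txt") as handle:
#         print(longest_line(handle))
sample = ["short line\n", "A" * 326 + "\r\n", "\n"]
print(longest_line(sample))  # 326
```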

Optional question, only for Pathworks users.

5D.  DECNET allows most OpenVMS commands to function over
     the net. This can be very convenient for moving text
     files to/from a Macintosh.  (A version for PCs also
     exists, but I don't believe that anybody here uses it.)
     Let's assume that your hard disk is called "BIGDISK"
     on the machine "MACNAME", and that you have run the NCP
     program on your Macintosh and configured it to allow
     proxy connections from your SEQAXP account.  Try this
     from your SEQAXP account: 

     $ copy desktop:

     If DECNET is working, you should see a file called
     "" on your desktop.  What command would you
     use *on SEQAXP* to view the contents of that file?
     (Hint, DESKTOP has been defined as a logical name.)

 $ type

Problem group 6.  GCG basics

6A.  Configure your graphics device appropriately (for most terminal
     emulators that is some form of Tektronix emulation).  Issue the 
     command:  $ SHOWPLOT
     What do you see?

You should see a bunch of squares, circles, and text, with the phrase
"Genetics Computer Group" in fancy text at the bottom.

6B.  What are the command line options for REFORMAT?

$ reformat/check
Reformat rewrites sequence file(s), scoring matrix file(s), or enzyme
data file(s) so that they can be read by GCG programs.

Minimal Syntax: $ REformat [/INfile=]Reformat.Txt /Default

Prompted Parameters:  None

Local Data Files:

/DATa=Translate.Txt       three-letter to one-letter codes

Optional Parameters:

/LINesize=50              sets number of characters per line
/BLOcksize=10             sets number of characters per block
/BLAnklines=1             puts blank lines between the sequence lines
/NONUMbering              suppresses numbering
/NOCOMments               suppresses comments
/DNA                      changes U into T
/RNA                      changes T into U
/UPPer                    makes all sequence characters uppercase
/LOWer                    makes all sequence characters lowercase
/LIStfile[=Reformat.List] writes a list file of output sequence names
/MSF                      reformats sequences into an MSF output file
/DEGap                    removes gap characters (.) from the sequence
/THReeintoone             translates three-letter peptides into one-letter
/ONEIntothree             translates one-letter peptides into three-letter
/COMparison               reformats a table instead of a sequence
/ENZymedata               reformats an enzyme data file instead of a
                            sequence (used with /PROtein, reformats a
                            protein enzyme data file)
/PROtein                  insists that the sequences are reformatted as
                            protein sequences
/NUCleotide               insists that the sequences are reformatted as
                            nucleic acid sequences
/PROFile                  reformats an old profile into the new profile
/EXTension=.Seq           defines a file name extension
/TRANSlate=FileName.Txt   lets you name the output translation table
[/OUTfile=]NewSeqName     lets you name the output file
/NOMONitor                suppresses the screen trace showing each output
/BEGin           beginning of range, defaults to 1
/END             end of range, defaults to Maximum sequence length
   Use these to extract a subsequence from a sequence or MSF file.
/DELete          delete the subsequence in the range, leave the rest
/LOOKup="U.,T,"  Convert characters in first string to matching character
                 in second string.
/NODots          Assume input sequence has no ".."

Note that the SAF uses a locally modified version of REFORMAT,
and that options from "/BEGin" on are only available here.  
Commands like the following are easier than going into SEQED:

 $ reformat/infile=initial.seq/outfile=final.seq/begin=100/end=200
 $ reformat/infile=initial.seq/outfile=final.seq/begin=100/end=200/delete
 $ reformat/infile=initial.msf{*}/outfile=final.msf/msf -

The first creates an output file containing only bases 100-200
(inclusive).  The second creates an output file with those same bases
deleted.  The third deletes a column of sequence.
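
The range options count from 1 and include both end points.  The Python
sketch below mimics only that counting rule on a plain string, so you can
see which bases are kept or removed; it is not how REFORMAT is implemented,
and the function names extract_range and delete_range are made up.

```python
def extract_range(seq, begin=1, end=None):
    """Keep positions begin..end (1-based, inclusive), like /BEGin and /END."""
    if end is None:
        end = len(seq)
    return seq[begin - 1:end]


def delete_range(seq, begin=1, end=None):
    """Remove positions begin..end (1-based, inclusive), like adding /DELete."""
    if end is None:
        end = len(seq)
    return seq[:begin - 1] + seq[end:]


seq = "AAAGCTCTTGGGTTTT"
print(extract_range(seq, 4, 6))  # GCT
print(delete_range(seq, 4, 6))   # AAACTTGGGTTTT
```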

6C.  Use REFORMAT to put into GCG format the following PROTEIN
     sequence:  AAAGCTCTTGGGTTTT
     (Hint, put that sequence into a file, and then run REFORMAT on it).
     Now look at the resulting file (TYPE) - does the line with a ".."
     indicate that this is protein?  Figure out the correct operation
     to make this sequence into a GCG protein sequence file.
     (Note, get in the habit of naming GCG protein sequence files
     whatever.pep, and GCG nucleic sequence files whatever.seq -
     that way you can more easily keep track of them.)

REFORMAT has to guess if this is a peptide or a nucleic acid and does so
based on composition.  In this case, it (understandably) guesses wrong;
the "Type: N" on the ".." line marks the file as nucleic acid:

Killme.Seq  Length: 16  January 16, 1999 16:47  Type: N  Check: 625  ..

If it were in fact a peptide, you could have forced it to the correct
type with:

 $ reformat/infile=killme.pep/protein
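
To see why a guess based on composition goes wrong here, consider the
Python sketch below.  It is NOT the rule REFORMAT actually uses, which is
not described in this handout; the letter set and the 0.85 threshold are
assumptions chosen only to illustrate the idea.

```python
NUCLEOTIDE_LETTERS = set("ACGTUN")  # assumed letter set, for illustration


def guess_type(seq, threshold=0.85):
    """Guess 'N' (nucleic acid) or 'P' (protein) from letter composition.

    The threshold is a hypothetical value, not taken from GCG.
    """
    letters = [c for c in seq.upper() if c.isalpha()]
    if not letters:
        raise ValueError("sequence contains no letters")
    fraction = sum(c in NUCLEOTIDE_LETTERS for c in letters) / len(letters)
    return "N" if fraction >= threshold else "P"


print(guess_type("AAAGCTCTTGGGTTTT"))  # N
```

Every letter in the test sequence is also a valid base, so any rule of this
kind will call it nucleic acid unless you override it.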

6D.  Use the GCG program SEQED to edit this sequence - put a P on the end
     and change the first A to an S.  What happens if you leave the program
     with a QUIT, and what happens when you leave with an EXIT?  (Look
     at the edited file to see).

QUIT doesn't save your changes; EXIT does.  This is the same behavior as
in the OpenVMS EDT editor.