Fundamentals of Sequence Analysis, 1998-1999
Problem set 5:  Assembling sequences.

If you get stuck, refer to the OpenVMS and GCG resources on
the class home page. 

References:
 
 See the GCG and EGCG manuals.


Problem group 1.  Dealing with ABI sequences

Create a subdirectory and set your default directory to it. 
Issue the command: 

  $ copy class:*example*.* []

You will now see two files with ugly names.

1A.  How do you fix these names?
1B.  What might have caused these awful names?


Configure your terminal for GCG graphics.  Issue these commands:

  $ abiprintout/infile=ABI_EXAMPLE_M13F.ABI;1/begin=180/end=200
  $ abiprintout/infile=ABI_EXAMPLE_M13F.ABI;1/begin=180/end=200/pnt=500
  $ abiprintout/infile=ABI_EXAMPLE_M13F.ABI;1/begin=180/end=200/pnt=FIT

1C.  How do the plots differ?

1D.  What two commands could be used to get the sequence
         into GCG format?
     What two commands could you use to extract a subsequence 
         (i.e., trim off the cruddy sequence on the ends)?

Problem group 2.  Sequence assembly

(When you are done, remember to delete the files created
during this exercise!) 

Create a sequencing project (assuming that Bluescript was
the only vector used) and use gelenter to enter these files
into it: 

   class:test*.seq
   class:bad*.seq
   class:rest*.seq

Assemble it.

2A.  What was wrong with bad0002.seq?
     What was wrong with bad0001.seq?


2B.  Is there a difference in the quality of the TEST* and
     REST* sequences? 

2C.  Ignoring the BAD0001 contig, there are two contigs
     covering 1164 and 1569 bases (the pieces were
     shotgunned from a 3000-base fragment). Should you
     continue with shotgun sequencing for this insert? 
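
Before answering 2C, it may help to work out how much of the
insert the two contigs account for. The short Python sketch
below does only that arithmetic. It assumes (this is not stated
in the problem) that the two contigs do not overlap each other
and contain no vector sequence; the function name
coverage_summary is made up for illustration.

```python
# Sketch: how much of the insert do the two contigs account for?
# Assumption (not stated in the problem): the contigs do not overlap
# each other and contain no vector sequence.

def coverage_summary(contig_lengths, insert_length):
    """Return (bases covered, bases missing, fraction covered)."""
    covered = sum(contig_lengths)
    missing = insert_length - covered
    return covered, missing, covered / insert_length


covered, missing, fraction = coverage_summary([1164, 1569], 3000)
print(f"covered {covered} of 3000 bases ({fraction:.1%}); "
      f"{missing} bases unaccounted for")
```

If the assumptions fail (for example, if the contigs overlap
but were not merged), the true gap is larger than this figure
suggests, so treat the number as a rough guide only.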

2D.  Anything else you might want to do?