Fundamentals of Sequence Analysis, 1995-1996
Problem set 7:  Phylogenetic analysis.

If you get stuck, refer to the OpenVMS and GCG resources on the
class home page.

References:
 
  See the GCG and PHYLIP documentation.

  Reviews:

      Felsenstein, J. (1988).  Phylogenies from molecular sequences:
      inference and reliability.  Annu. Rev. Genet. 22:521-565.

Problem group 1.  Artificial data

Create two sets of data (STAR and FORK) as for Problem set 2.  For the
STAR set, start with PIR1:A1HU and produce sequences that differ from it by
50, 100, 150, 200, 250, 300, 350, and 400 substitutions (no indels); that
is, A1HU->A50, A1HU->A100, A1HU->A150, etc.  For the FORK set, again start
with PIR1:A1HU, but this time make each sequence in the series differ from
the preceding one by 50 substitutions; that is, A1HU->A50, A50->A100,
A100->A150, and so forth.
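If you want to generate the STAR and FORK sets programmatically rather than
by hand, the following is a minimal Python sketch.  It assumes that "differ by
n substitutions" means n distinct positions, each replaced by a different one
of the 20 standard amino acids; the function names are hypothetical, and the
real A1HU sequence must be retrieved from PIR (it is not reproduced here).

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def substitute(seq: str, n: int, rng: random.Random) -> str:
    """Return a copy of seq with n substitutions at n distinct positions.

    Each chosen residue is replaced by a *different* amino acid; no indels.
    """
    if n > len(seq):
        raise ValueError("more substitutions requested than residues")
    residues = list(seq)
    for pos in rng.sample(range(len(seq)), n):
        choices = [aa for aa in AMINO_ACIDS if aa != residues[pos]]
        residues[pos] = rng.choice(choices)
    return "".join(residues)


def make_star(root: str, steps, rng: random.Random) -> dict:
    """STAR: every sequence is derived independently from the root
    (A1HU->A50, A1HU->A100, ...)."""
    return {f"A{n}": substitute(root, n, rng) for n in steps}


def make_fork(root: str, step: int, count: int, rng: random.Random) -> dict:
    """FORK: each sequence is derived from the previous one
    (A1HU->A50, A50->A100, ...)."""
    seqs, parent = {}, root
    for i in range(1, count + 1):
        parent = substitute(parent, step, rng)
        seqs[f"A{i * step}"] = parent
    return seqs


if __name__ == "__main__":
    rng = random.Random(1995)
    # Hypothetical stand-in for A1HU; replace with the real PIR1:A1HU sequence.
    root = "".join(rng.choice(AMINO_ACIDS) for _ in range(400))
    star = make_star(root, range(50, 401, 50), rng)
    fork = make_fork(root, 50, 8, rng)
    for name in fork:
        diff = sum(a != b for a, b in zip(root, fork[name]))
        print(name, "differs from root at", diff, "sites")
```

In the STAR set each sequence differs from the root at exactly n sites.  In
the FORK set later rounds can hit positions already changed (or even revert
them), so the observed difference from A1HU can fall short of the cumulative
number of substitutions; keep this in mind when comparing the trees.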


Align each set separately using PILEUP.  Set the gap penalty so that
no gaps are introduced.  Save the tree that results from each run.


1A.  Use the PARSIMONY method to derive a phylogenetic tree.  How do the
     trees produced by PILEUP compare with the parsimony trees?
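To see what a parsimony program is scoring, here is a minimal Python sketch of
Fitch's (1971) unweighted count of the minimum number of substitutions on one
fixed, rooted binary tree.  It is only the scoring step; a parsimony program
also searches over many tree topologies, and dedicated programs (for example
PHYLIP's PROTPARS) use their own, more elaborate cost scheme for amino acids.
The nested-tuple tree format and the function names are assumptions made for
illustration.

```python
def fitch_site(tree, column):
    """Fitch pass for one alignment column.

    tree:   a leaf name (str) or a 2-tuple (left, right) of subtrees.
    column: dict mapping leaf name -> residue at this site.
    Returns (candidate state set, number of changes below this node).
    """
    if isinstance(tree, str):
        return {column[tree]}, 0
    left, right = tree
    ls, lc = fitch_site(left, column)
    rs, rc = fitch_site(right, column)
    common = ls & rs
    if common:                       # children can agree: no extra change
        return common, lc + rc
    return ls | rs, lc + rc + 1      # disjoint sets: one substitution needed


def parsimony_length(tree, seqs):
    """Sum of Fitch changes over all sites of a gap-free alignment."""
    length = len(next(iter(seqs.values())))
    return sum(
        fitch_site(tree, {name: s[i] for name, s in seqs.items()})[1]
        for i in range(length)
    )


seqs = {"a": "AAG", "b": "AAA", "c": "GGA", "d": "GGG"}
print(parsimony_length((("a", "b"), ("c", "d")), seqs))  # 4
print(parsimony_length((("a", "c"), ("b", "d")), seqs))  # 6
```

The tree needing fewer changes (here the one grouping a with b) is the more
parsimonious one.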


1B.  Use the FITCH, NEIGHBOR joining, and UPGMA methods to derive
     phylogenetic trees.
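FITCH, NEIGHBOR, and UPGMA all start from a matrix of pairwise distances.  The
minimal Python sketch below computes uncorrected p-distances (the fraction of
differing sites in a gap-free alignment) and clusters them with UPGMA, which
assumes all lineages evolve at the same rate (a molecular clock).  It makes no
correction for multiple substitutions at the same site, as a program such as
PHYLIP's PROTDIST would; the function names and tree format are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned sites that differ (gap-free alignment assumed)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs: dict):
    """UPGMA clustering; returns (nested-tuple tree, height of the root)."""
    # cluster id -> (subtree, number of leaves, node height)
    clusters = {name: (name, 1, 0.0) for name in seqs}
    dist = {frozenset(pair): p_distance(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(seqs, 2)}
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)          # closest pair of clusters
        i, j = tuple(pair)
        d = dist.pop(pair)
        ti, ni, _ = clusters.pop(i)
        tj, nj, _ = clusters.pop(j)
        new = f"({i},{j})"
        for k in clusters:
            # size-weighted average of the distances to the two merged clusters
            dik = dist.pop(frozenset((i, k)))
            djk = dist.pop(frozenset((j, k)))
            dist[frozenset((new, k))] = (ni * dik + nj * djk) / (ni + nj)
        clusters[new] = ((ti, tj), ni + nj, d / 2)  # node sits at half the distance
    (tree, _, height), = clusters.values()
    return tree, height


seqs = {"a": "AAAAAAAAAA", "b": "AAAAAAAAAC",
        "c": "GGGAAAAAAA", "d": "GGGGGAAAAA"}
tree, height = upgma(seqs)
print(tree, round(height, 3))
```

For these toy sequences the tree groups (a, b) and (c, d) with a root height
of 0.225; the left/right order of children may vary between runs.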


Problem group 2.  Real data

2A. The phylogenetic relationships between humans and the great apes have
    been a topic of great debate.  Resolve them to your own satisfaction using
    the protein sequences in SWISS-PROT.