Phylogenetic Analysis by Maximum Likelihood (PAML)

Version 3.1, July 2001

Ziheng Yang (z.yang@ucl.ac.uk)


Introduction

PAML is a program package for phylogenetic analyses of DNA or protein sequences using maximum likelihood. It is maintained by Ziheng Yang and distributed free of charge for academic use. ANSI C source code as well as PowerMac and Windows 95/NT executables are provided.

This document is about downloading and compiling PAML and getting started. See the manual (pamlDOC.pdf) for more information about running programs in the package.

Possible uses of the programs are

A summary of the types of analyses performed by different programs in the package is given below.

What does PAML do?

PAML is not good for tree making. There are a few options for heuristic tree search, but they do not work well except for small data sets of only a few species. If you hope to use PAML to compare trees from relatively large data sets, one possibility is to get a collection of candidate trees and then compare them using more sophisticated models implemented in PAML. You can get candidate trees by using other programs/methods implemented in PAUP*, PHYLIP, MOLPHY etc.

PAML may be useful if you are interested in the process of sequence evolution. The two main programs, baseml and codeml, implement a number of sophisticated models, which you can use to construct likelihood ratio tests of evolutionary hypotheses. Right now, the following options/models do not seem to be available in other packages.

Downloading and Compiling PAML

PAML download files are at ftp://abacus.gene.ucl.ac.uk/pub/paml/.

UNIX or other systems for workstations and mainframes. Get the compressed archive paml*.*.tar.Z, where *.* represents the version number. This archive contains the source code, example data files, control files, and documentation. Log on to your UNIX account and create a directory to hold the paml files. Then use the ftp command in UNIX to get the archive:

    ftp abacus.gene.ucl.ac.uk
    (use anonymous for username and your email address for password)
    cd pub/paml
    ls
    bin
    get paml*.*.tar.Z
    quit

You then use the following commands to unpack and decode the archive and compile the programs:

    uncompress paml*.*.tar.Z
    tar xf paml*.*.tar

    make

The make command compiles the programs. Programs in PAML are written in ANSI C and should compile with any ANSI C-compatible compiler. You may need to change a few flags at the beginning of Makefile. For example, if the compiler complains about -fast, you can change it to -O2 or -O3. If you have gcc rather than cc, make that change too. You can read the manual pages for your compiler (say cc) by typing

    man cc

The page will be very long, but look for the optimization options. The -lm switch forces a link to the math library. With some compilers or installations this switch is unnecessary, but using it does no harm. With others, the switch causes compile errors and has to be removed. See the readme.txt file in the paml/src directory.

PowerMacs (PPC or G3). Download the compressed self-extracting archive for the PowerMac. When you get the archive by fetch, the file name will change to paml*.*.G3.sea. Move it to a subdirectory on your hard disk and double-click on it so that it expands into a directory containing all files in the package (source code, example data files, control files, documentation, and executables). When you run the PowerMac programs, a command-line window will pop up. You can then type in the name of the control file, or hit Enter to use the default control file name (baseml.ctl for baseml and basemlg, codeml.ctl for codeml). The sequence data files and tree structure files do not have fixed names and can be specified in the control files. Thanks to Andrew Rambaut for preparing the archive.

Windows 95/98/NT/2000. Download the compressed archive for Windows. This is a self-extracting archive: run it and it will expand into a directory with all the files for the package (source code, example data files, control files, documentation, and executables). Programs in the package (the .EXE files) are simple Win32 console applications and do not support the mouse or menus. To run a program, open a "command prompt" box and type the name of the program rather than double-clicking the program name in Windows Explorer.

Running Programs in PAML

    The programs in the distribution are essentially the copies I work on every day, as I make only minor changes before release to the public. So the programs are not always well tested. Models that I have never used myself, even if they look sensible or possible from options in the control file, should be treated with great caution. I have included example data sets that were used in our papers for the purpose of error checking. You are encouraged to duplicate our analysis first to check that the program works and also to get familiar with the format of the data file and the interpretation of results.

    Programs baseml and codeml estimate parameters and calculate the log-likelihood values, but do not calculate the likelihood ratio statistics; you need to do the subtraction yourself. The theory is as follows. If a more general model involves p parameters and has log likelihood l1, and a simpler model (which is a special case of the general model) has q parameters and log likelihood l0, then 2(l1 - l0) can be compared with a chi-square distribution with d.f. = p - q. Suppose we want to test whether the transition/transversion rate ratio kappa = 1. We run the JC69 model to get l0 and the K80 model to get l1, and then compare 2(l1 - l0) with the chi-square distribution with one degree of freedom.
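
The subtraction and the chi-square comparison described above can be sketched in a few lines of Python. This is an illustration only and is not part of PAML. The function names are made up for this example, and the two log-likelihood values are hypothetical placeholders that you would replace with the values reported by baseml or codeml. The tail probability uses the standard closed forms for 1 and 2 degrees of freedom and a recurrence for larger integer degrees of freedom.

```python
import math


def lrt_statistic(lnl_simple, lnl_general):
    """Return 2 * (l1 - l0), where l0 is the log likelihood of the
    simpler (nested) model and l1 that of the more general model."""
    return 2.0 * (lnl_general - lnl_simple)


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square distribution
    with a positive integer number of degrees of freedom."""
    if df < 1 or df != int(df):
        raise ValueError("df must be a positive integer")
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    # Closed forms for the two base cases.
    if df % 2 == 1:
        k = 1
        q = math.erfc(math.sqrt(half))   # df = 1
    else:
        k = 2
        q = math.exp(-half)              # df = 2
    # Recurrence: Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        a = k / 2.0
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        k += 2
    return q


if __name__ == "__main__":
    # Hypothetical log-likelihood values, for illustration only.
    l0 = -1234.56   # simpler model, e.g. JC69
    l1 = -1230.12   # more general model, e.g. K80
    stat = lrt_statistic(l0, l1)
    # JC69 versus K80 differ by one parameter (kappa), so d.f. = 1.
    print("2(l1 - l0) =", stat)
    print("p-value    =", chi2_sf(stat, 1))
```

A small p-value suggests rejecting the simpler model in favour of the more general one. For other pairs of nested models, pass d.f. = p - q as the second argument of chi2_sf.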

    Running PAML. Most programs in the PAML package have control files that specify the names of the sequence data file and the tree structure file, as well as the models and options for the analysis. The default control files are baseml.ctl for baseml and basemlg, codeml.ctl for codeml, pamp.ctl for pamp, and mcmctree.ctl for mcmctree. The program evolver does not have a control file and uses a simple user interface: type evolver and then choose the options. For the other programs, you should prepare a sequence data file and a tree structure file, and modify the appropriate control file before running the program. The formats of those files are detailed in the documentation in the package.

    You need to prepare a sequence data file (e.g., brown.nuc) and modify the options in the appropriate control file. If you have chosen runmode = 0 or 1 in the control file, which means that the tree topologies are specified, you also need to prepare a tree structure file (e.g., trees.4s). On UNIX or Windows systems, you run the programs from a command prompt by typing

    ProgramName [for example, baseml]

    or

    ProgramName ControlFileName

    On the Mac, you simply click on the program name or icon. You can do this on a Windows machine too, but it is better if you open a command box and run the program from there.
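
If you want to launch the programs from a script rather than by typing at the prompt, the two command forms above can be wrapped as in the following minimal Python sketch. It is not part of PAML. The helper names build_command and run_program are made up for this example, and the sketch assumes the executable can be found by the system (on UNIX you may need to give a path such as ./baseml) and that the data and control files are in the working directory.

```python
import subprocess


def build_command(program, control_file=None):
    """Build the command line: 'ProgramName' or 'ProgramName ControlFileName'."""
    cmd = [program]
    if control_file is not None:
        cmd.append(control_file)
    return cmd


def run_program(program, control_file=None, workdir="."):
    """Run a program in workdir; raises CalledProcessError on a non-zero exit.
    Assumes the executable is on the search path or given with its path."""
    return subprocess.run(build_command(program, control_file),
                          cwd=workdir, check=True)


if __name__ == "__main__":
    # Only print the commands here; call run_program(...) to execute them.
    print(build_command("baseml"))                # uses the default baseml.ctl
    print(build_command("codeml", "codeml.ctl"))  # control file named explicitly
```

Calling run_program("baseml") corresponds to the first form and uses the default control file; passing a second argument corresponds to the second form.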

History and Bug Fixes

    Update history and bug fixes collected here.

    Your suggestions and criticisms are appreciated. Send a message to z.yang@ucl.ac.uk.

    When reporting problems, please mention the version number of the package you use (for example, 3.0c for UNIX) and include a copy of the control file (baseml.ctl or codeml.ctl). Please let me know exactly what happened and when, and include the screen output generated by the program, especially the last few lines on the screen. I would also like to know the number of sequences and the sequence length in case the problem has to do with the size of the data set.

    Try to provide enough information for me to understand and reproduce the problem. The most frustrating email I get says "PAML does not work on my data. Can you help?", without any explanation about what the problem is.


    Last modified by Ziheng Yang
    15 July, 2001