ct1.dat      mapc Test data 1
ct1.res      mapc result from test data 1
ct2.dat      mapc Test data 2
ct2.res      mapc result from test data 2
lt1.dat      mapl Test data 1
lt1.res      mapl result from test data 1
lt2.dat      mapl Test data 2
lt2.res      mapl result from test data 2
readme.1st   Documentation

The following instructions are extracted from the Fundamentals of Sequence Analysis Course.

There are three programs available for this task: DIGED, MAPC, and MAPL. These are quite old. They were written by W. Pearson and published in Nucleic Acids Research (1982) 10:217-227.
DIGED is used for data entry. For circular data it may ask for Xoff, which is the offset of the first known cut for each enzyme from the 0.0 point. If you don't know this value, enter -1.0. There must be at least one fragment in each double digest that isn't present in either of the singles.
A sample DIGED session looks like this (the text to the right of each entry is an annotation, not input):
$ diged
1                     new data
1                     linear
one                   short name for first enzyme
two                   short name for second enzyme
                      blank line to terminate the enzyme list
7.0 3.0 /             single digest pieces for "one", in Kb.
3.0 3.0 4.0 /         single digest pieces for "two"
1 2 /                 IBEG and IEND, see below
1 3 /
3.0 3.0 1.0 3.0 /     double digest pieces
6                     write it out
mydata.txt            the file to put it in
one line comment      a one line comment
7                     quit diged
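
Before typing fragment sizes into DIGED it can help to sanity-check them. The following is a minimal Python sketch, not part of Pearson's programs. It assumes that the totals of the single and double digests should agree, and it applies the rule above that a double digest needs at least one fragment not seen in either single. The function name check_digest and the tolerance tol are hypothetical. The 10-fragment limit is the one this document states for MAPC and MAPL.

```python
MAX_FRAGMENTS = 10  # per-enzyme limit this document states for MAPC and MAPL


def check_digest(single_a, single_b, double, tol=0.05):
    """Return a list of problems found in two single digests and their double digest.

    tol is a hypothetical relative tolerance used when comparing sizes;
    it is not a DIGED parameter.
    """
    problems = []
    total = sum(single_a)

    # All digests of the same molecule should add up to about the same length.
    for name, frags in (("single B", single_b), ("double", double)):
        if abs(sum(frags) - total) > tol * total:
            problems.append(f"{name} total {sum(frags):.2f} differs from {total:.2f}")

    # Check each single digest against the stated fragment limit.
    for name, frags in (("single A", single_a), ("single B", single_b)):
        if len(frags) > MAX_FRAGMENTS:
            problems.append(f"{name} has more than {MAX_FRAGMENTS} fragments")

    # At least one double-digest fragment must be absent from both singles.
    singles = list(single_a) + list(single_b)
    novel = [d for d in double if all(abs(d - s) > tol * s for s in singles)]
    if not novel:
        problems.append("no double-digest fragment is absent from both singles")

    return problems


# Values from the DIGED session above: enzymes "one" and "two".
print(check_digest([7.0, 3.0], [3.0, 3.0, 4.0], [3.0, 3.0, 1.0, 3.0]))
```

For the session shown above the list comes back empty: all three digests total 10.0 Kb, and the 1.0 Kb double digest piece appears in neither single digest.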
Consider a case where an enzyme cuts 4 times, and makes fragments of size 1.1, 2.0, 2.5, 3.1, and 4.1. If you don't know where any of those fragments are on the original linear fragment, you would specify IBEG=1 and IEND=5. However, if you knew that 2.0 was on the left side, and 2.5 on the right, you would instead enter the data like this:
Here's an example using real data, which is supplied with Pearson's programs. This is a 5.55 Kb fragment which was singly and doubly digested with BamI, BglII, and PstI (labeled Bam1, Bgl2, and Pst1 in the output). Copy the data file to your local directory first: the program opens it with read/write access, which would be denied on the shared file.
$!
$ copy digest:ct2.dat []ct2.dat
$ mapc
ct2.dat
0.2 2.0
test.out
$!
$ type test.out
pGT55 glutathione-s-transferase 16-Jan-81
ERROR = 0.2000 EFACT = 2.000
1764 Digestions calculated in 0.02 sec
Bam1
A 3.7600 B 1.3800 C 0.42000
Bgl2
A 5.5500
Pst1
A 4.3200 B 0.43000 C 0.39000 D 0.34000
T ERROR=0.425E-02 D ERROR=0.437E-02
Bam1 ---1----------------A----------------1-C1------B--|--1-------
Bam1 0.375 <-A-> 4.135 <-C-> 4.555 <-B->
Bgl2 -------------A------------------------2-----------|----------
Bgl2 4.267 <-A->
Pst1 -------------A------------------3-B-3-C-3D-3------|----------
Pst1 3.612 <-B-> 4.042 <-C-> 4.432 <-D-> 4.772 <-A->
Bam1 A C B
Bgl2 A
Pst1 B C D A
For each solution (there are usually many), each enzyme is represented by three lines. On the first line, the cuts are shown by numbers and the fragments by letters. On the second line, the numbers give the positions of the cuts, and the letters flanked by arrows name the fragments that follow them. The third line shows the order of the fragments on the map, using the letters assigned when the data were entered. For instance, in the data shown above, the fragments for Bam1 were: A=3.76, B=1.38, and C=0.42. The map shows cuts at positions 0.375, 4.135, and 4.555. Fragment A is between the first and second cuts, C between the second and third, and B wraps around and lies between the third cut and the fourth "1" on the map, which is the first cut drawn again because the molecule is circular.
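
The relationship between cut positions and fragment sizes on a circular map can be checked by hand or with a few lines of code. The following is a minimal Python sketch, not taken from MAPC. The function name circular_fragments is hypothetical, and the total length of 5.55 is the size quoted for this dataset.

```python
def circular_fragments(cuts, total):
    """Fragment sizes between consecutive cuts on a circle of length total."""
    cuts = sorted(cuts)
    # Fragments between neighbouring cuts.
    sizes = [b - a for a, b in zip(cuts, cuts[1:])]
    # The last fragment wraps through the 0.0 point back to the first cut.
    sizes.append(total - cuts[-1] + cuts[0])
    return [round(s, 3) for s in sizes]


# Bam1 cut positions from the solution above.
print(circular_fragments([0.375, 4.135, 4.555], 5.55))  # [3.76, 0.42, 1.37]
```

The three values correspond to fragments A, C, and B in map order. The wrapped fragment comes out as 1.37 rather than the entered 1.38, a difference well inside the measurement error of the fragment sizes.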
The T error is an error measurement for the first two enzymes; the D error is an error measurement covering all enzymes (with only two enzymes, the two values are the same). Some form of normalization is applied to these, though, so either T or D might be the larger in any given solution. When the programs ask for Error and Efact, what they want are the upper limits for the T and D errors. Here is another solution, which has a slightly higher T error (0.550E-02 instead of 0.425E-02) and the same D error. The only difference is that the Bgl2 site is moved by 141 bases, from 4.267 to 4.408.
T ERROR=0.550E-02 D ERROR=0.437E-02
Bam1 ---1----------------A----------------1-C1------B--|--1-------
Bam1 0.375 <-A-> 4.135 <-C-> 4.555 <-B->
Bgl2 --------------A------------------------2----------|----------
Bgl2 4.408 <-A->
Pst1 -------------A------------------3-B-3-C-3D-3------|----------
Pst1 3.612 <-B-> 4.042 <-C-> 4.432 <-D-> 4.772 <-A->
Bam1 A C B
Bgl2 A
Pst1 B C D A
It can take a relatively long time to work out a complicated map (or the program may fail altogether if the accuracy of the fragment sizes is poor). For instance, the lambda digest dataset supplied with the software uses 5 enzymes, which make 23 unique single digest fragments and around 80 double digest fragments, and it takes 72 seconds to run.
Normally you only know the molecular weights of your fragments to an accuracy of perhaps 2-5%. Unless the analysis returns a single clear result, with no reasonable alternative solutions, it is a worthwhile exercise to repeat the analysis using slightly different values for the molecular weights. That is, instead of 3.1, 2.1, and 5.1, you might use 3.0, 2.2, and 5.0. By comparing the results you should get a feel for the "stability" of the solutions. It will also help you determine which other data you may need to obtain in order to resolve the map. For instance, you might find that you need to put one or more cuts inside a particular fragment in order to nail down its true position.
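
One way to produce the slightly different values is to perturb each size at random within the expected measurement error. The following is a minimal Python sketch of that idea, not a feature of DIGED, MAPC, or MAPL. The function name perturbed_sets and its parameters are hypothetical, and the 3% default is an assumed value inside the 2-5% range mentioned above. Each variant would still have to be entered through DIGED and run separately.

```python
import random


def perturbed_sets(sizes, rel_error=0.03, n=5, seed=0):
    """Return n copies of sizes, each value shifted by up to +/- rel_error."""
    rng = random.Random(seed)  # fixed seed so the variants can be regenerated
    return [
        [round(s * (1 + rng.uniform(-rel_error, rel_error)), 2) for s in sizes]
        for _ in range(n)
    ]


# Variants of the example sizes 3.1, 2.1, and 5.1.
for variant in perturbed_sets([3.1, 2.1, 5.1]):
    print(" ".join(f"{s:.2f}" for s in variant))
```

If the same fragment order comes back for every variant, the solution is probably stable. If the order changes between variants, that points to the fragments whose positions need additional data.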
The MAPC and MAPL programs are coded such that they cannot handle a single-enzyme digest that creates more than 10 fragments.