If you get stuck, refer to the OpenVMS and GCG resources in the class home page.
Sequence data used in this problem set, for offsite readers:
You have isolated the cDNA for a protein that interests you. The predicted translation is present in CLASS:PROTEIN1.PEP.
1A. You want to determine if the protein is likely present in a subcellular fraction, which you have run on a 2D gel (SDS-PAGE/isoelectric focusing). There are spots at:

Spot   MW (kd)   Isoelectric pH
A        8.1        6.5
B       10.7       10.3
C       22.3        9.6
D       47.2        5.4
(plus an assortment of pH and MW markers)
If it is there, which one is it?
$ peptidesort/infile=class:protein1.pep/enzyme=nocut
$ type protein1.pepsort

Find:
Molecular weight = 10699.76
Isoelectric point = 10.28
These values are quite close to the "B" protein spot (10.7 kd, pH 10.3).
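If you would rather not compare the numbers by eye, the match can be scored with a few lines of Python. This is only a sketch: the "sum of relative errors" score is an assumption made for illustration, not something PEPTIDESORT computes.

```python
# Spots read off the 2D gel: name -> (MW in kd, isoelectric pH).
spots = {
    "A": (8.1, 6.5),
    "B": (10.7, 10.3),
    "C": (22.3, 9.6),
    "D": (47.2, 5.4),
}

# Values reported by PEPTIDESORT for the uncut protein.
predicted_mw_kd = 10699.76 / 1000.0
predicted_pi = 10.28


def mismatch(spot):
    """Sum of relative errors in MW and pI (an illustrative score only)."""
    mw, pi = spot
    mw_error = abs(mw - predicted_mw_kd) / predicted_mw_kd
    pi_error = abs(pi - predicted_pi) / predicted_pi
    return mw_error + pi_error


# The spot with the smallest mismatch is the best candidate.
best = min(spots, key=lambda name: mismatch(spots[name]))
print(best, round(mismatch(spots[best]), 4))
```

Remember that a real gel has its own measurement error, so a close match makes the spot a candidate, not a confirmed identification.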
1B. You want to further confirm that the spot you see is actually the protein that you have isolated. You run a preparative isoelectric focusing gel and isolate the band at pH 10.3, then digest it with CNBr and run the fragments out on an SDS-PAGE gel. THEN you calculate what the expected fragment sizes should have been. What happens?
Probably you kick yourself.
$ peptidesort/infile=class:protein1.pep/enzyme=CNBr
$ type protein1.pepsort
CNBr doesn't cut this protein! If the protein in the gel was cut, then you have excluded the possibility that it is the protein of known sequence. However, if it didn't cut, you have not learned much. It would have been more productive to have done:
$ peptidesort/infile=class:protein1.pep/enzyme=*
$ type protein1.pepsort
*before* doing the digest, to see which of the enzymes would have been more informative. Trypsin, for instance, would cut the known protein into many pieces, most of which would have been too small to resolve on the gel. In this instance, an NH2OH digestion might have been best, since it would result in only two fairly large fragments. Or you might have used trypsin, but then put the mix through an HPLC, where you could have resolved many of the small fragments.
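The same "check before you digest" idea can be sketched outside of GCG. The rules below are simplified textbook specificities (CNBr after Met, NH2OH between Asn and Gly, trypsin after Lys/Arg unless followed by Pro), and the sequence is a made-up toy peptide, not PROTEIN1.PEP.

```python
import re

# Zero-width patterns that match at each cleavage position.
# These are simplified rules, not the full PEPTIDESORT enzyme table.
RULES = {
    "CNBr": r"(?<=M)",              # cut after Met
    "NH2OH": r"(?<=N)(?=G)",        # cut between Asn and Gly
    "Trypsin": r"(?<=[KR])(?!P)",   # cut after Lys/Arg, but not before Pro
}


def digest(protein, enzyme):
    """Return the list of fragments produced by a complete digest."""
    sites = [m.start() for m in re.finditer(RULES[enzyme], protein)]
    # Drop "cuts" at the very ends, which would give empty fragments.
    cuts = [0] + [s for s in sites if 0 < s < len(protein)] + [len(protein)]
    return [protein[a:b] for a, b in zip(cuts, cuts[1:])]


# Hypothetical toy sequence: no Met, one Asn-Gly bond, several Lys/Arg.
toy = "ASKTPRPLLNGAEKVFRGDK"

for enzyme in RULES:
    fragments = digest(toy, enzyme)
    print(enzyme, len(fragments), fragments)
```

With the toy sequence, CNBr leaves the chain intact, NH2OH gives two fragments, and trypsin gives four, which mirrors the situation described above.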
After a bit more work you have isolated a second protein from this complex. One pass through the protein sequencer yielded this sequence: YAACSTPQ
2A. Design a primer that will be radioactively labeled and used to screen a rat cDNA library for this sequence.
$ create 2a.pep
..
YAACSTPQ
^Z
$ reformat/protein/nodots 2a.pep
$ backtranslate/infile1=2a.pep/infile2=gcgdatafiledirs:rat.cod -
  /menu=a/default
$ type 2a.seq

Which contains this:

TAC 0.60  GCC 0.39  GCC 0.39  TGC 0.56  AGC 0.24  ACC 0.38  CCC 0.32  CAG 0.75
TAT 0.40  GCT 0.31  GCT 0.31  TGT 0.44  TCC 0.23  ACA 0.27  CCA 0.29  CAA 0.25
          GCA 0.21  GCA 0.21            TCT 0.19  ACT 0.24  CCT 0.29
          GCG 0.09  GCG 0.09            TCA 0.14  ACG 0.10  CCG 0.10
                                        AGT 0.13
                                        TCG 0.06

If you want to try hybridizing with a family of degenerate primers:

TACGCAGCATGCACAACACCACAA
  T  C  C  TTGC  C  C  G
     T  T     T  T  T
     G  G     G  G  G
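To put a number on how degenerate that primer family is, multiply the number of bases allowed at each position. The sketch below encodes the pool as read from the stacked columns above (that reading is an assumption) and uses the standard genetic code for the codon lists.

```python
from math import prod

PEPTIDE = "YAACSTPQ"

# Bases allowed at each of the three positions of each codon in the pool.
POOL = [
    ("T", "A", "CT"),      # Tyr
    ("G", "C", "ACGT"),    # Ala
    ("G", "C", "ACGT"),    # Ala
    ("T", "G", "CT"),      # Cys
    ("AT", "CG", "ACGT"),  # Ser (covers both AGY and TCN)
    ("A", "C", "ACGT"),    # Thr
    ("C", "C", "ACGT"),    # Pro
    ("C", "A", "AG"),      # Gln
]

# Codons for each residue in the standard genetic code.
CODONS = {
    "Y": ["TAC", "TAT"],
    "A": ["GCA", "GCC", "GCG", "GCT"],
    "C": ["TGC", "TGT"],
    "S": ["TCA", "TCC", "TCG", "TCT", "AGC", "AGT"],
    "T": ["ACA", "ACC", "ACG", "ACT"],
    "P": ["CCA", "CCC", "CCG", "CCT"],
    "Q": ["CAA", "CAG"],
}

# Number of distinct oligos in the pool.
pool_size = prod(len(bases) for codon in POOL for bases in codon)

# Number of distinct DNA sequences that actually encode the peptide.
coding_sequences = prod(len(CODONS[aa]) for aa in PEPTIDE)

# Check that every coding sequence is present in the pool.
covered = all(
    all(base in allowed for base, allowed in zip(codon, POOL[i]))
    for i, aa in enumerate(PEPTIDE)
    for codon in CODONS[aa]
)

print(pool_size, coding_sequences, covered)
```

The pool is larger than the set of coding sequences because mixing A/T and C/G at the first two serine positions also produces triplets that do not encode serine.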
Unfortunately, that primer family is quite degenerate, and there isn't enough sequence to try shifting one way or the other to find a less degenerate region. Otherwise, the most likely sequence, found at the bottom of that file, is:
This is unlikely to be the correct sequence, but might still stick to the right place if the stringency was low enough.
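A rough calculation shows why a single best guess is such a long shot. The sketch below takes the highest-frequency codon at each position from the BACKTRANSLATE table and multiplies the frequencies, assuming (as a simplification) that codon choices at different positions are independent.

```python
from math import prod

PEPTIDE = "YAACSTPQ"

# Rat codon frequencies per residue, copied from the BACKTRANSLATE output.
FREQS = {
    "Y": {"TAC": 0.60, "TAT": 0.40},
    "A": {"GCC": 0.39, "GCT": 0.31, "GCA": 0.21, "GCG": 0.09},
    "C": {"TGC": 0.56, "TGT": 0.44},
    "S": {"AGC": 0.24, "TCC": 0.23, "TCT": 0.19,
          "TCA": 0.14, "AGT": 0.13, "TCG": 0.06},
    "T": {"ACC": 0.38, "ACA": 0.27, "ACT": 0.24, "ACG": 0.10},
    "P": {"CCC": 0.32, "CCA": 0.29, "CCT": 0.29, "CCG": 0.10},
    "Q": {"CAG": 0.75, "CAA": 0.25},
}

# Most frequent codon for each residue of the peptide.
best_codons = [max(FREQS[aa], key=FREQS[aa].get) for aa in PEPTIDE]
guess = "".join(best_codons)

# Probability that every one of those codon guesses is right,
# under the independence assumption stated above.
p_exact = prod(FREQS[aa][codon] for aa, codon in zip(PEPTIDE, best_codons))

print(guess)
print(f"{p_exact:.5f}")
```

Under these assumptions the chance of an exact 24-base match is on the order of one in a thousand, which is why the probe only has a chance of working at low stringency.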
You have sequenced the cDNA clone and found an open reading frame that translates to CLASS:PROTEIN2.PEP. Assume that a database search failed to turn up any similar sequences (don't try it - this example sequence is made from bits and pieces of several proteins and will actually return hits.)
2B. What can you say about this protein?
Does it have a signal sequence?

$ sigcleave/infile=class:PROTEIN2.PEP/default
$ type protein2.sig

Apparently not.

Does it have a PEST sequence?

$ pestfind class:protein2.pep
$ type pestfind.out

Yes, several, many with very high scores. It is likely an unstable protein.

Are there any antigenic regions? (You are considering making antibodies against peptides.)

$ antigenic/infile=class:protein2.pep/default
$ type protein2.anti

Here are the top 4 scoring regions:

(1) Score 1.199 length 34 at residue 167-200
    Sequence: GHSVCSTSSLYLQDLSAAASECIDPSVVFPYPLN
(2) Score 1.191 length 25 at residue 289-313
    Sequence: KPPHSPLVLKRCHVSTHQHNYAAPP
(3) Score 1.185 length 16 at residue 67-82
    Sequence: SGLCSPSYVAVTPFSL
(4) Score 1.184 length 12 at residue 15-26
    Sequence: DYDSVQPYFYCD

Are there any coiled coil regions?

$! set up your GCG graphics first, i.e. REGIS VT241 TT
$ pepcoil/infile=class:protein2.pep/default
Yes. There are two coiled-coil locations which score at probabilities of 1.0: 403-444, 672-720

Are there any helix-turn-helix regions?

$ helixturnhelix/infile=class:protein2.pep/default
$ type protein2.hth

Apparently not - no high scoring regions were found.

What other known signatures does it have?

$ motifs/infile=class:protein2.pep/default
$ type protein2.motifs

This turns up two motifs:

bZIP transcription factors basic domain signature at 670-684
Myc-type, 'helix-loop-helix' dimerization domain signature at 391-406

Any statistical anomalies in the sequence?

$ readseq -f4 class:protein2.pep -oprotein2.saps
$ saps -s rat -o saps.results protein2.saps
$ type saps.results

Summary (positive findings only)

There are a couple of high scoring negative charge segments.
There are beta compatible repeats (period 2) for E 5X at 253-262 and for S 4X at 277-284.
There are alpha compatible repeats (period 7 = 3.6 x 2) for L...... 4X at 413-440 and for L...V.. 4X at 692-719.
There are alpha compatible charge repeats (period 7) for (KRED).(neutral).... 6X at 357-398 (with 2 discrepancies in the neutral position).

What is the protein's subcellular location?

$ tofasta/infile=class:protein2.pep/out=protein2.fsa
$ psort2 protein2.fsa

---------------------------------------------------------------------------
Protein2
psg: 0.44  gvh: 0.16  alm: 0.43  top: 0.53  tms: 0.00  mit: 0.22  mip: 0.03
nuc: 0.14  erl: 0.00  erm: 0.00  pox: 0.00  px2: 0.00  vac: 0.00  rnp: 0.00
act: 0.00  caa: 0.00  yqr: 0.00  tyr: 0.00  leu: 0.00  gpi: 0.00  myr: 0.00
dna: 0.25  rib: 0.00  bac: 0.00  m1a: 0.00  m1b: 0.00  m2 : 0.00  mNt: 0.00
m3a: 0.00  m3b: 0.00  m_ : 1.00  ncn: 1.00  lps: 0.06  len: 0.17  clr: 1.00

91.3 %: nuclear
 4.3 %: vacuolar
 4.3 %: plasma membrane

>> prediction for Protein2 is nuc (k=23)

Taken together, this set of characteristics suggests that the protein is likely a DNA binding/regulating protein.
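The PSORT II percentages look odd until you notice the "k=23". They are consistent with a simple vote among the 23 nearest neighbors in the training set. The neighbor counts below are not printed by the program; they are inferred here from the percentages and should be treated as an assumption.

```python
k = 23

# Hypothetical neighbor counts, chosen because they reproduce the
# percentages that PSORT II reported for Protein2.
votes = {"nuclear": 21, "vacuolar": 1, "plasma membrane": 1}
assert sum(votes.values()) == k

# Convert each count to a percentage of the k neighbors.
percent = {label: 100.0 * n / k for label, n in votes.items()}
for label, value in percent.items():
    print(f"{value:.1f} %: {label}")

# The predicted location is the label with the most votes.
prediction = max(votes, key=votes.get)
print(prediction)
```

Read this way, "91.3 % nuclear" means that nearly all of the most similar training proteins are nuclear, which is not the same thing as a 91.3% probability that this protein is nuclear.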
A third protein from this complex has been isolated and fully sequenced. The sequence is stored in the file CLASS:PROTEIN3.PEP. A series of mutations in this protein has been isolated, and all fall between amino acids 100 and 200.

3A. What is the secondary structure in that region?
This is actually NRL_3D:1GPAA, glycogen phosphorylase (and the mutations' positions are completely made up). Here is its real secondary structure as determined from the X-ray structure:

alpha helix : 95-106, 109-115, 125-141
beta sheet  : 153-154, 158-162, 165-169, 182-184, 189-200
turn        : 162-165, 172-181, 184-187

Let's try a couple of the secondary structure servers and see how they do. Feed them the entire sequence, but only look at the region 100-200 in the results.

$ em_phd class:protein3.pep
$ nnpredict <cr> class:protein3.pep

Wait around for the results to come back in the mail. (For nnpredict one could also use a Web browser to connect to "http://www.cmpharm.ucsf.edu/~nomi/nnpredict.html", then paste the PLAIN text sequence in.)

HEL: H=helix, E=extended (sheet), T=turn, -=other

     100       110       120       130       140       150       160       170       180       190       200
     |         |         |         |         |         |         |         |         |         |         |
seq  DEATYQLGLDMEELEEIEEDAGLGNGGLGRLAACFLDSMATLGLAAYGYGIRYEFGIFNQKICGGWQMEEADDWLRYGNPWEKARPEFTLPVHFYGRVEHT
PHD  HHHHHHHH--HHHHHHHH-------HHHHHHHHHHHHHHHHH--------EEEE---EEEE----------HHHHH------E------EEEEEEEEEEE-
NN   HHHHHH----HHHHHHHHHH---------HHHHHHHHHHHHHHHHH----EEEEE-------------HHH-HHHH---------------------EE--
stru HHHHHH---HHHHHHH---------HHHHHHHHHHHHHHHHH-----------EE---EEEEETTEEEEE--TTTTTTTTTTEEETTT-EEEEEEEEEEEE

So, in this instance, the two predictions are relatively close, giving 4 alpha helices and 2 common sheet regions, plus some disagreement on two other sheets. Comparison with the known structure shows that 3 of the 4 predicted helices are quite close to those found in the X-ray structure, with the fourth helix corresponding to a region of turns. The sheet regions predicted by PHD picked up 4 out of 5 real sheets; nnpredict did less well, with only 2 out of 5.
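Counting helices and sheets by eye is subjective, so it can help to score the predictions residue by residue. The sketch below computes the fraction of positions in the 100-200 window where a prediction agrees with the X-ray assignment. Folding turns (T) into "other" is an assumption made here because neither server reports turns in this window.

```python
# Secondary structure strings for residues 100-200, copied from above.
phd = ("HHHHHHHH--HHHHHHHH-------HHHHHHHHHHHHHHHHH--------EEEE---EEEE"
       "----------HHHHH------E------EEEEEEEEEEE-")
nn = ("HHHHHH----HHHHHHHHHH---------HHHHHHHHHHHHHHHHH----EEEEE------"
      "-------HHH-HHHH---------------------EE--")
stru = ("HHHHHH---HHHHHHH---------HHHHHHHHHHHHHHHHH-----------EE---EEEEE"
        "TTEEEEE--TTTTTTTTTTEEETTT-EEEEEEEEEEEE")


def three_state(track):
    """Reduce to helix / sheet / other by treating turns as 'other'."""
    return track.replace("T", "-")


def agreement(prediction, reference):
    """Fraction of aligned positions with the same three-state call."""
    pairs = list(zip(three_state(prediction), three_state(reference)))
    return sum(p == r for p, r in pairs) / len(pairs)


for name, track in (("PHD", phd), ("NN", nn)):
    print(name, round(agreement(track, stru), 2))
```

A per-residue score like this rewards long correct stretches and says nothing about whether the ends of each element are placed correctly, so it complements the element-by-element comparison rather than replacing it.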
3B. Suppose that replacing any of the amino acids between 171 and 175 with a proline radically decreases the function of this enzyme. Propose a model based on the secondary structure predictions to explain this.
Both secondary structure prediction programs suggest that this region
is an alpha helix. Naturally, if you believed these results, you
would propose that this helix is required for enzyme activity, and
that the proline disrupts it. Unfortunately, in this case both
predictions are wrong, and so the model is incorrect. (It is not
uncommon for multiple programs to give the same wrong prediction for a
region - they are "trained" or tested against similar sets of data,
and so tend to predict the same secondary structure for the same
sequence. However, the protein may have evolved to force that region
into another secondary structure.)
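To see the disagreement directly, pull residues 171-175 out of each track. The strings below are excerpts covering residues 165-180, taken from the 100-200 comparison in 3A; the helper name is just for illustration.

```python
# Excerpts for residues 165-180 of the prediction/structure tracks.
FIRST_RESIDUE = 165
tracks = {
    "PHD": "------HHHHH-----",
    "NN": "---HHH-HHHH-----",
    "stru": "EEEEE--TTTTTTTTT",
}


def states(track, first, last):
    """Return the state string for residues first..last (inclusive)."""
    return track[first - FIRST_RESIDUE:last - FIRST_RESIDUE + 1]


region = {name: states(track, 171, 175) for name, track in tracks.items()}
for name, call in region.items():
    print(f"{name:5s}{call}")
```

Both servers call mostly helix across 171-175, while the X-ray assignment has no helix there at all, only a turn following a short sheet.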
When it comes to secondary structure predictions, caveat emptor!