Fundamentals of Sequence Analysis, 1998-1999
Problem set 9:  RNA folding.

If you get stuck, refer to the OpenVMS and GCG resources in the 
class home page.

References:
 
  See the GCG documentation.

Problem group 1.  Stem-loop structures

1A.  Are there any stem/loop structures in the tet gene of pBR322 that
     have at least 16 hydrogen bonds in the stem?


pBR322 is GB_SY:Synpbr322, use FETCH then examine the reference
information to find that tet is from 86 to 1276.

 $ stemloop/infile=gb_sy:Synpbr322/begin=86/end=1276/bonds=16 -
   /menu1=3/menu2=2
 $ type Synpbr322.stem
  STEMLOOP of: Synpbr322  check: 5483  from: 86  to: 1276

LOCUS       SYNPBR322    4361 bp    DNA   circular  SYN       29-JUN-1994
DEFINITION  Cloning vector pBR322, complete genome.
ACCESSION   J01749 K00005 L08654 M10282 M10283 M10286 M10356 M10784 M10785
            M10786 M33694 V01119
NID         g208958
KEYWORDS    ampicillin resistance; beta-lactamase; cloning vector; . . .

 Minimum Stem: 6  Minimum bonds/stem: 16.00  Maximum loop size: 20
 Stems found: 3  Stems shown: 3
 Average Match: 1.80  Average Mismatch: 0.00  Nibbling Threshold:  0.50

                           March  5, 1996 11:30  ..

    413 GGCGCCA  CAGGTG    7, 20.0
        |||||||        C
    439 CCGCGGT  CGTTGG    13

    906 TCGGCGA  GAAGCAG    7, 18.0
        |||||||
    933 GGCCGCT  ATTACCG    14

    385 CGCCGG  ACGCATCGTG    6, 18.0
        ||||||
    416 GCGGCC  ACTACGGCCG    20

At first glance there appear to be three.  However, two of them have
overlapping stems (4 bases from 413 to 416) and so could not form two
simple, independent, stem loop structures at the same time.  (Note
that 411 - 414 is also an echinomycin binding site.) 


Problem group 2.  RNA folding

2A.  Does the tet RNA from pBR322 have a defined secondary structure?


Since the RNA is quite long, we should run this as a batch job.

 $ mfold/infile=gb_sy:Synpbr322/begin=86/end=1276/batch/default

This completed in 25 minutes on a lightly loaded system.  First step,
have a look at the P-num plot - see if it looks like any region is
conserved.  Here is that plot: 



Since no base has fewer than 160 different pairing partners, it is a
pretty safe bet that this RNA doesn't fold up into a single
conformation. 


2B.  Does the 3' end of the Drosophila bicoid mRNA have a defined
     structure?  (gb_in:X14458, 1550-2456)


Again, since the RNA is long, run it in batch mode

  $ mfold/infile=gb_in:X14458/begin=1550/end=2456/default/batch

This only took 13 minutes to complete.  Here is the p-num plot



This is quite a different story from the preceding case - there are
several regions with very low p-num values, for instance, in the
regions centered around 1820,2110,2350.  Probably there is some
structure in this region. 


Problem group 3.  RNA folding with pseudoknots
It is a good idea to rerun MFold and PlotFold for
sequences which may include pseudoknots.  However, doing so is
actually a little tricky.  The problem is that if you try to force
MFold to have pseudoknots, it will ignore one of the two
helices specified, unfortunately, without giving any warning that it
is doing so.  If you try to exclude these regions with /Closedexcise,
MFold will ignore that directive in some instances.  This is a
bug of some sort.  This leaves only the /prevent qualifier,
which may be used to force the pseudoknot regions into a large,
noninteracting loops. 

3A.  Use RNABOB to analyze gendocdata:alucons.seq, then run
     MFold and PlotFold to see how removing the
     pseudoknot regions affects the rest of the fold. 

First find the regions which might have pseudoknot conformations:

$ rnabob "-sqF" rnabob:pseudoknot.des gendocdata:ALUCONS.SEQ
    64     93  Alucons.Seq
|CTTGA|GC|CCAGG|AGT|TCGAG|ACCAG|CCTGG|
   167    191  Alucons.Seq
|CAGCT|A|CTCG|GGA|GGCTG|AGG|CGGG|
   249    278  Alucons.Seq
|CTCCG|GCC|TGGG|CGA|CGGAG|CGAGAC|CCCG|

Next force those three regions not to participate in the folding: $ mfold/infile=gendocdata:alucons.seq/out=pknot.mfold - /prevent1=64,0,30 - /prevent2=167,0,25 - /prevent3=249,0,30 $ plotfold/infile=pknot.mfold/menu=f/list=1/seqheight=1.5
3B. Use PlotFold to output a connect file, manually edit it so that the pseudoknot links are present, then use Squiggles to display it. What happens? $ plotfold/infile=pknot.mfold/menu=h/list=1 Then edit pknot.mfold so that these positions are modified as shown: 64 C 63 65 83 64 65 T 64 66 82 65 66 T 65 67 81 66 67 G 66 68 80 67 68 A 67 69 79 68 71 C 70 72 93 71 72 C 71 73 92 72 73 A 72 74 91 73 74 G 73 75 90 74 75 G 74 76 89 75 79 T 78 80 68 79 80 C 79 81 67 80 81 G 80 82 66 81 82 A 81 83 65 82 83 G 82 84 64 83 89 C 88 90 75 89 90 C 89 91 74 90 91 T 90 92 73 91 92 G 91 93 72 92 93 G 92 94 71 93 The file is drawn, but the scaling is bizarre. Basically there is no way to draw a pseudoknot properly using this software. The best you can do is to modify the connect file to show only one of the two helices, then display that. If you're really desperate send the output to a Figure file and edit in the extra connections.