Fundamentals of Sequence Analysis, 1995-1996
Problem set 9:  RNA folding.

If you get stuck, refer to the OpenVMS and GCG resources on the
class home page.

References:
 
  See the GCG documentation.

Problem group 1.  Stem-loop structures

1A.  Are there any stem/loop structures in the tet gene of pBR322 that
     have at least 16 hydrogen bonds in the stem?


pBR322 is GB_SY:Synpbr322.  Use FETCH to retrieve the entry, then examine
its reference information to find that the tet gene runs from base 86 to
base 1276.

 $ stemloop/infile=gb_sy:Synpbr322/begin=86/end=1276/bonds=16 -
   /menu1=3/menu2=2
 $ type Synpbr322.stem
  STEMLOOP of: Synpbr322  check: 5483  from: 86  to: 1276

LOCUS       SYNPBR322    4361 bp    DNA   circular  SYN       29-JUN-1994
DEFINITION  Cloning vector pBR322, complete genome.
ACCESSION   J01749 K00005 L08654 M10282 M10283 M10286 M10356 M10784 M10785
            M10786 M33694 V01119
NID         g208958
KEYWORDS    ampicillin resistance; beta-lactamase; cloning vector; . . .

 Minimum Stem: 6  Minimum bonds/stem: 16.00  Maximum loop size: 20
 Stems found: 3  Stems shown: 3
 Average Match: 1.80  Average Mismatch: 0.00  Nibbling Threshold:  0.50

                           March  5, 1996 11:30  ..

    413 GGCGCCA  CAGGTG    7, 20.0
        |||||||        C
    439 CCGCGGT  CGTTGG    13

    906 TCGGCGA  GAAGCAG    7, 18.0
        |||||||
    933 GGCCGCT  ATTACCG    14

    385 CGCCGG  ACGCATCGTG    6, 18.0
        ||||||
    416 GCGGCC  ACTACGGCCG    20
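In each entry of the listing, the two numbers after the first line are the stem length and the number of hydrogen bonds in the stem, and the number below them is the loop size (13 = 6 + 1 + 6 loop bases in the first entry).  The bond counts can be reproduced by scoring each base pair of the stem.  The Python sketch below assumes 3 bonds for G-C, 2 for A-T and 1 for a G-T pair.  The G-T weight is inferred from the 906 stem, which contains one T-G pair and scores 18.0; it is not taken from the GCG documentation.  The function name stem_bonds is hypothetical.

```python
# Assumed hydrogen-bond weights per base pair. G-T = 1 is inferred from the
# 906 stem in the STEMLOOP listing, not from GCG documentation.
BONDS = {
    frozenset("GC"): 3,
    frozenset("AT"): 2,
    frozenset("GT"): 1,
}


def stem_bonds(top: str, bottom: str) -> int:
    """Count hydrogen bonds in a stem (hypothetical helper).

    top    -- the 5' arm, read 5'->3'
    bottom -- the 3' arm as STEMLOOP prints it, read 3'->5',
              so top[k] pairs with bottom[k]
    Any pair not in BONDS (a mismatch) contributes 0 bonds.
    """
    if len(top) != len(bottom):
        raise ValueError("stem arms must be the same length")
    total = 0
    for a, b in zip(top.upper(), bottom.upper()):
        total += BONDS.get(frozenset(a + b), 0)
    return total


if __name__ == "__main__":
    print(stem_bonds("GGCGCCA", "CCGCGGT"))  # 413 stem: 20
    print(stem_bonds("TCGGCGA", "GGCCGCT"))  # 906 stem: 18
    print(stem_bonds("CGCCGG", "GCGGCC"))    # 385 stem: 18
```

All three totals agree with the bond values in the listing, which is what suggests the G-T weight of 1.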

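At first glance there appear to be three.  However, the structures at 413
and 385 have overlapping stems (the 4 bases from 413 to 416 are in both),
so they could not form as two simple, independent stem-loop structures at
the same time.  At most two can form together: the 906 structure plus
either the 413 or the 385 structure.  (Note that 411 - 414 is also an
echinomycin binding site.)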
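The overlap check can be done mechanically from the listing.  The Python sketch below assumes the layout seen above: the 5' arm starts at the printed position, the loop follows immediately, and the 3' arm ends at start + 2*stem + loop - 1 (this gives 439, 933 and 416, matching the lower numbers in the listing).  Only stem bases are compared, as in the argument above; the helper names are hypothetical.

```python
from itertools import combinations


def stem_spans(start: int, stem: int, loop: int):
    """Return the 5' and 3' arms as inclusive (first, last) base ranges."""
    arm5 = (start, start + stem - 1)
    arm3 = (start + stem + loop, start + 2 * stem + loop - 1)
    return arm5, arm3


def shared_bases(a, b):
    """Sorted list of bases used by the stems of both structures a and b.

    a and b are (start, stem_length, loop_size) tuples.
    """
    bases_a = {i for lo, hi in stem_spans(*a) for i in range(lo, hi + 1)}
    bases_b = {i for lo, hi in stem_spans(*b) for i in range(lo, hi + 1)}
    return sorted(bases_a & bases_b)


def overlapping_pairs(stems):
    """List (name_a, name_b, shared) for every pair of stems sharing bases."""
    hits = []
    for (na, a), (nb, b) in combinations(stems.items(), 2):
        common = shared_bases(a, b)
        if common:
            hits.append((na, nb, common))
    return hits


if __name__ == "__main__":
    # (start, stem length, loop size) read from the STEMLOOP listing
    STEMS = {413: (413, 7, 13), 906: (906, 7, 14), 385: (385, 6, 20)}
    print(overlapping_pairs(STEMS))  # [(413, 385, [413, 414, 415, 416])]
```

Only the 413/385 pair conflicts, which is why at most two of the three stem-loops can exist simultaneously.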


Problem group 2.  RNA folding

2A.  Does the tet RNA from pBR322 have a defined secondary structure?


Since the RNA is quite long (1191 bases), we should run this as a batch job.

 $ mfold/infile=gb_sy:Synpbr322/begin=86/end=1276/batch/default

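This completed in 25 minutes on a lightly loaded system.  The first step
is to look at the P-num plot to see whether any region appears to be
well-determined, that is, conserved across the alternative foldings.
Here is that plot:

[P-num plot for the tet RNA, bases 86-1276, not reproduced]
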
Since no base has fewer than 160 different pairing partners across the
optimal and suboptimal foldings, it is a pretty safe bet that this RNA
does not fold up into a single conformation.
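P-num for a base is the number of different partners it pairs with across the foldings considered, so a low value means that base pairs the same way in most alternatives.  The Python sketch below computes it from an explicit list of structures, each given as a set of (i, j) base pairs.  This is an illustration only: mfold derives its P-num values from the foldings within its energy increment, and its implementation may differ in detail.  The function name p_num and the toy 10-base foldings are hypothetical.

```python
from collections import defaultdict


def p_num(foldings, length):
    """P-num for bases 1..length (hypothetical helper).

    foldings -- iterable of structures, each a set of 1-based (i, j) pairs
    Returns a list where entry k-1 is the number of distinct partners
    of base k over all structures; unpaired bases get 0.
    """
    partners = defaultdict(set)
    for structure in foldings:
        for i, j in structure:
            partners[i].add(j)
            partners[j].add(i)
    return [len(partners[k]) for k in range(1, length + 1)]


if __name__ == "__main__":
    # Toy foldings: the outer helix (1-10, 2-9) is the same in all three,
    # while the inner pairs shift around.
    toy = [
        {(1, 10), (2, 9), (3, 8)},
        {(1, 10), (2, 9), (3, 7)},
        {(1, 10), (2, 9), (4, 8)},
    ]
    print(p_num(toy, 10))  # [1, 1, 2, 1, 0, 0, 1, 2, 1, 1]
```

Bases 1, 2, 9 and 10 score 1 because they always pair with the same partner; a base with 160 or more partners, as in the tet RNA, has no preferred pairing at all.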


2B.  Does the 3' end of the Drosophila bicoid mRNA have a defined
     structure?  (gb_in:X14458, 1550-2456)


Again, since the RNA is long (907 bases), run it in batch mode:

  $ mfold/infile=gb_in:X14458/begin=1550/end=2456/default/batch

This took only 13 minutes to complete.  Here is the P-num plot:

[P-num plot for the bicoid 3' region, bases 1550-2456, not reproduced]

This is quite a different story from the preceding case: there are several
regions with very low P-num values, for instance those centered around
1820, 2110 and 2350.  These regions probably have a defined structure.
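Picking out low-P-num regions by eye can also be done from the numbers behind the plot.  The Python sketch below reports runs of consecutive bases whose P-num stays at or below a threshold.  The threshold, the minimum run length and the sample values are hypothetical, chosen only to show the mechanics; they are not taken from the bicoid run.

```python
def low_pnum_regions(pnum, first_base, threshold, min_len=1):
    """Return inclusive (start, end) base ranges where P-num <= threshold.

    pnum       -- list of P-num values; pnum[k] belongs to base first_base + k
    threshold  -- hypothetical cutoff for a "well-determined" base
    min_len    -- runs shorter than this many bases are ignored
    """
    regions = []
    run_start = None
    for k, value in enumerate(pnum):
        if value <= threshold:
            if run_start is None:
                run_start = k  # a low-P-num run begins here
        else:
            if run_start is not None and k - run_start >= min_len:
                regions.append((first_base + run_start, first_base + k - 1))
            run_start = None
    # close a run that extends to the last base
    if run_start is not None and len(pnum) - run_start >= min_len:
        regions.append((first_base + run_start, first_base + len(pnum) - 1))
    return regions


if __name__ == "__main__":
    sample = [40, 35, 3, 2, 2, 30, 1, 50, 4, 5]  # made-up values
    print(low_pnum_regions(sample, 1550, threshold=5, min_len=2))
    # [(1552, 1554), (1558, 1559)]
```

The single low base at 1556 is dropped by min_len=2, a simple way to ignore isolated dips that are unlikely to reflect a real structured region.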