Fundamentals of Sequence Analysis, 1995-1996

Problem set 9: RNA folding.

If you get stuck, refer to the OpenVMS and GCG resources on the class home page.

References: See the GCG documentation.

Problem group 1. Stem-loop structures

1A. Are there any stem-loop structures in the tet gene of pBR322 that have at least 16 hydrogen bonds in the stem?

pBR322 is GB_SY:Synpbr322. Use FETCH, then examine the reference information to find that tet runs from 86 to 1276.

    $ stemloop/infile=gb_sy:Synpbr322/begin=86/end=1276/bonds=16 -
      /menu1=3/menu2=2
    $ type Synpbr322.stem

    STEMLOOP of: Synpbr322  check: 5483  from: 86  to: 1276

    LOCUS       SYNPBR322    4361 bp    DNA   circular  SYN   29-JUN-1994
    DEFINITION  Cloning vector pBR322, complete genome.
    ACCESSION   J01749 K00005 L08654 M10282 M10283 M10286 M10356 M10784
                M10785 M10786 M33694 V01119
    NID         g208958
    KEYWORDS    ampicillin resistance; beta-lactamase; cloning vector; . . .

    Minimum Stem: 6  Minimum bonds/stem: 16.00  Maximum loop size: 20
    Stems found: 3  Stems shown: 3
    Average Match: 1.80  Average Mismatch: 0.00  Nibbling Threshold: 0.50

    March 5, 1996 11:30  ..

    413 GGCGCCA CAGGTG         7, 20.0
        |||||||       C
    439 CCGCGGT CGTTGG         13

    906 TCGGCGA GAAGCAG        7, 18.0
        |||||||
    933 GGCCGCT ATTACCG        14

    385 CGCCGG ACGCATCGTG      6, 18.0
        ||||||
    416 GCGGCC ACTACGGCCG      20

At first glance there appear to be three. However, the first and third share 4 stem bases (413 to 416), so they could not form two simple, independent stem-loop structures at the same time. (Note that 411 - 414 is also an echinomycin binding site.)

Problem group 2. RNA folding

2A. Does the tet RNA from pBR322 have a defined secondary structure?

Since the RNA is quite long, we should run this as a batch job.

    $ mfold/infile=gb_sy:Synpbr322/begin=86/end=1276/batch/default

This completed in 25 minutes on a lightly loaded system. As a first step, look at the P-num plot and see whether any region looks conserved, that is, whether its bases pair with only a few alternative partners.
Here is that plot:

[P-num plot for the pBR322 tet RNA, 86-1276]

Since no base has fewer than 160 different pairing partners, it is a pretty safe bet that this RNA doesn't fold up into a single conformation.

2B. Does the 3' end of the Drosophila bicoid mRNA have a defined structure? (gb_in:X14458, 1550-2456)

Again, since the RNA is long, run it in batch mode:

    $ mfold/infile=gb_in:X14458/begin=1550/end=2456/default/batch

This took only 13 minutes to complete. Here is the P-num plot:

[P-num plot for the bicoid mRNA 3' end, 1550-2456]

This is quite a different story from the preceding case. Several regions have very low P-num values, for instance those centered around 1820, 2110, and 2350. This region probably has some defined structure.
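
Reading a P-num plot by eye amounts to looking for runs of bases whose P-num stays low. If the per-base values were exported to a list, that scan could be sketched roughly as below. This is only an illustration in Python, not part of GCG or mfold: the function name, the threshold, the minimum run length, and the toy values are all assumptions made up for the example.

```python
from typing import List, Tuple


def low_pnum_regions(
    pnum: List[int],
    first_base: int,
    threshold: int,
    min_length: int = 1,
) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) coordinates of runs where P-num <= threshold.

    pnum[i] is taken to be the P-num of base first_base + i. The threshold
    and min_length are arbitrary illustration parameters, not mfold settings.
    """
    regions = []
    run_start = None  # offset where the current low-P-num run began
    for offset, value in enumerate(pnum):
        if value <= threshold:
            if run_start is None:
                run_start = offset
        elif run_start is not None:
            # The run ended at the previous base; keep it if long enough.
            if offset - run_start >= min_length:
                regions.append((first_base + run_start, first_base + offset - 1))
            run_start = None
    # Close a run that extends to the last base.
    if run_start is not None and len(pnum) - run_start >= min_length:
        regions.append((first_base + run_start, first_base + len(pnum) - 1))
    return regions


# Toy P-num values, invented for illustration (not taken from either mfold run).
toy = [200, 180, 5, 3, 4, 190, 210, 2, 1, 2, 3]
print(low_pnum_regions(toy, first_base=100, threshold=10, min_length=3))
```

With a profile like the tet RNA's, where every base has a high P-num, such a scan would return an empty list. A profile like the bicoid 3' end would yield a few intervals whose midpoints correspond to the region centers read off the plot.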