GCG beginner's FAQ


This is the GCG (Genetics Computer Group) package beginner's FAQ.

Questions related to this FAQ.
  Who wrote this?
  What is GCG, what is EGCG?
  What version of GCG does it apply to?
  What version of the FAQ is this?
  Where can I get a newer copy?
  Can I give this to other users or post it on my server?
  What conventions are used in this FAQ?
  What other FAQs are out there?
  What courses are available?

 
Questions related to using the computer (OpenVMS specific!!!).
  How do I ...
    do the standard sorts of computer things?
    configure my account to set things up for me automatically?

 
Questions related to the GCG command line.
  How do I ...
    find out which program to use?
    find out which options to use with each program?
    tell a program to do a bunch of sequences at once?
    rerun or modify the last command?
    run the program in batch mode?
 
 
Confusing parts of the GCG system.
  Why doesn't it work when I ...
    use  /infile=whatever.msf  ?
    use  /pattern=(GGG,AAA)    ?
    try to get graphics out    ?
    try to reformat this sequence ?
    try to use Seqed or Lineup?
    try to modify GCG graphics on my Mac or PC?
    type a long command line?
    allow more gaps in PILEUP?
  
  What databases are available locally?
  How do I retrieve a sequence?
  How do I design an oligonucleotide that contains a 
    translationally silent restriction site?

 
SEQED usage
  How do I start SEQED?
  What modes does SEQED have?
  What SEQED mode am I in?
  How do I learn to use SEQED?

LINEUP/GELASSEMBLE usage
  How do I start LINEUP/GELASSEMBLE?
  What modes does LINEUP/GELASSEMBLE have?
  What LINEUP/GELASSEMBLE mode am I in?
  How do I learn to use LINEUP/GELASSEMBLE?
  How do I edit more than 30 aligned sequences?

 
WPI (the graphical user interface for GCG).
  What do I need to use WPI?
  Should I use WPI?
  I clicked on RUN and nothing happened!
  WPI won't start!
    

Who wrote this?

  David Mathog
  Manager, sequence analysis facility, Biology Division, Caltech
  mathog@seqaxp.bio.caltech.edu

What is GCG, what is EGCG?

GCG stands for the "Genetics Computer Group". GCG is a commercial package of computer programs for many different analyses of nucleic acid and peptide sequences. This is their home page.

EGCG is a set of programs that extend the GCG package. It contains submissions from many different people and sites. EGCG programs look and feel like GCG programs. They may be slightly modified versions of the originals, perhaps implementing some new command line switches, or completely new programs which use the GCG function libraries. EGCG is maintained by Peter Rice.

What version of GCG/EGCG does it apply to?

GCG 8.1 for OpenVMS/AXP and EGCG 8.1 for OpenVMS/AXP. Parts of what is in here will apply to GCG/EGCG on Unix systems, provided that these Unix specific changes are made:

  1. The $ prompt becomes # or % (depending on the user's shell).
  2. "/qualifier" becomes " -qualifier" (the space before the "-" is mandatory on Unix; it is allowed, but not required, on OpenVMS).
  3. "-" to continue a line becomes "\", and the operating system won't generate a continuation prompt.
  4. Commands and filenames will be case sensitive.

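As a rough illustration of rule 2 above, here is a minimal Python sketch that rewrites an OpenVMS-style GCG command line in the Unix style. The function name vms_to_unix is made up for this example and is not part of GCG. The sketch only handles the "/qualifier" to " -qualifier" change, leaves double-quoted text alone, and does nothing about case sensitivity or Unix shell quoting.

```python
import re


def vms_to_unix(command: str) -> str:
    """Rewrite "/qualifier" as " -qualifier" (hypothetical helper, not a GCG tool)."""
    # Drop a leading "$ " prompt if the line was pasted together with it.
    command = re.sub(r"^\$\s*", "", command.strip())
    # Split into double-quoted chunks (kept verbatim) and everything else.
    parts = re.split(r'("[^"]*")', command)
    converted = []
    for part in parts:
        if part.startswith('"'):
            converted.append(part)
        else:
            # Any spaces before the slash are collapsed into the single
            # space that Unix requires before the "-".
            converted.append(re.sub(r"\s*/", " -", part))
    return "".join(converted)


if __name__ == "__main__":
    print(vms_to_unix("$ pileup/infile=@myfil.fil/check"))
```

The result still has to be checked by hand: file and directory specifications differ between the two systems, and characters such as * or " may need protecting from the Unix shell.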
What version of the FAQ is this?

1.06 05-DEC-1996

Thanks for comments to:

  Peter Woollard
  Fyodor Urnov
  Francois Jeanmougin
  Johanne Duhamie
  Caren Smith/Countway

Where can I get a newer copy?

http://seqaxp.bio.caltech.edu/www/gcg_beginners_faq.html

Can I give this to other users or post it on my server?

Yes. Please do put copies on local servers rather than just putting in links to our copy - it will distribute the load better.

What conventions are used in this FAQ?

Mostly this is just text. However, when describing command lines, or how to use programs:

  []            Indicates an optional parameter, qualifier, or value, except where it indicates part of a directory specification or a UIC value.
  {}            Indicates a key name when used in text; otherwise, it is part of the command line (verbatim).
  $             Indicates a command that is often found on OpenVMS systems that use GCG, but is not part of either.
  bold text     Indicates what the computer wrote to the screen.
  Regular text  Indicates what the user is to type/has typed.
  italic text   Indicates a comment - it should NOT appear on the command line!

It is understood that each command line is terminated by a single {RETURN} keystroke when entered interactively.

What other FAQs are out there?

Some of these are FAQs, some are site manuals:

  UK Human Genome Mapping Project, GCG FAQ (Unix syntax).
  Unofficial Guide to GCG Software, prepared by AGRENET (OpenVMS syntax).
  Biocomputing Survival Guide, R. Doelz (Unix syntax).
  The Biocompanion, Biozentrum, Basel (Unix or OpenVMS syntax).
    (Since the actual Biocompanion document is retrieved via an FTP link it is often unreachable, as their server usually has no free anonymous FTP slots.)

What courses are available?

  Fundamentals of Sequence Analysis, Caltech 1995-1996
    The course focus is on identifying, and learning to use, the appropriate computational tools for common problems in sequence analysis.
  BioComputing Hypertext Coursebook
    Thorough coverage of the theory behind sequence analysis.
  GNA-VSNS Biocomputing class
    Discussion class which was run in a BioMOO.
  Algorithms for Molecular Biology
    References and assignments only.

See also "What other FAQs are out there?"

How do I do the standard sorts of computer things?

General questions about how to use the computer are answered in the OpenVMS beginner's FAQ and the OpenVMS Overview. Look in these for information on how to create / delete / edit / print / type files, or to list directories, and so forth.

How do I configure my account to set things up for me automatically?

Create a file called LOGIN.COM and place it in SYS$LOGIN, the directory you start out in when you log in. If you already have such a file you will need to modify it. The few command lines that follow are a typical configuration for a person who uses Versaterm on a Macintosh and has a networked laser printer available (here configured to run from the print queue "LOCAL_LW"):

    $! Keep the next two lines only if your site has PRINTLOCAL.COM
    $ DEFINE LOCPRINT LOCAL_LW
    $ PRINT_HERE:== @shared_programs:printlocal
    $!
    $ DEFINE SYS$PRINT LOCAL_LW
    $ TK:== 'tektronix' VersaTerm-Tek4105 term
    $ PS:== 'postscript' laserwriter "|printg "
    $ PRINT :== PRINT/queue=LOCAL_LW
    $ PRINTG :== PRINT/queue=LOCAL_LW/form=PS_PLAIN
    $ TK

When this person logs in, GCG will send all graphics in the appropriate Tektronix format to Versaterm. The user can toggle the graphics destination and graphics format during the session by typing either:

    $ TK        Tell GCG to send graphics to the terminal.
    $ PS        Tell GCG to send graphics (or other output) to the local printer.

Warning: some GCG programs may glitch and send output to the terminal _anyway_. To get around this, use the /FIGURE=name.figure switch, and then use the PS command followed by the FIGURE program to send the plot to your LaserWriter.

Other commands that will have been defined are:

    $ PRINT_HERE FILENAME.EXTENSION [DELETE]
                Prints the specified file locally. The printlocal program figures out whether it is PostScript or ASCII and treats it appropriately. If the word "delete" follows the file, then that file will be deleted after it is printed.
    $ PRINTG    Print a PostScript file to the local printer.
    $ PRINT     Print a text file to the local printer.

NOTE: The system manager must configure a print queue for your local networked printer - it is not something that you can do from your OpenVMS account. To do so, the manager will need to know the printer's name and address. (If you rename your printer and don't tell the manager, your print queue will cease to function!!!)

How do I find out which program to use?

There are several methods. GCG and EGCG supply these commands:

    $ GENHELP       Help on GCG programs by name
    $ GENMANUAL     Help on GCG programs by topic
    $ EGENHELP      Help on EGCG programs by name
    $ EGENMANUAL    Help on EGCG programs by topic

In addition, the SAF Software Documentation page has indexed all of the above, so that they may be searched using keywords and Boolean operators.

How do I find out which options to use with each program?

The command line options are all described in the on line help. In addition, all GCG commands can be used like this:

    $ gcg_command/check

"/check" instructs the program to put up a menu of the available command line options and then to accept further input. For instance, after giving the command above, one would type (or cut and paste) the desired options in after the prompt that it provides.

There is also a standard set of command line options which are NOT described by /check, but which apply to pretty much all programs and, for programs that are graphical in nature, to graphics devices. (Unfortunately, this information is only described in the paper User's manual and is not to be found in either genhelp or genmanual :-(. ) Many of these command line switches also have a complementary command that sets/unsets that option for all commands given later in the session.

Note: Many of the following refer to platen units. The abstract GCG graphics device is 150 platen units wide by 100 high.

    /AUTOfeed
        For plotters, automatically feed paper.
    /BOX=horizontal_start,horizontal_end,vertical_start,vertical_end-
        ,grid_color,distance_between_frames,line_width
        Draws a box on the plot.
    /CHECK
        List command line options. Command equivalents:
            $ comcheck      Like /CHECK.
            $ nocomcheck    Inverse of /CHECK.
    /CLIpping
        Do not draw lines that pass through the clipping limits.
    /NOCLIpping
        Do draw such lines.
    /COLor=#
        Change the color of the plot, # =
            1 Black
            2 Green
            3 Blue
            4 Red
    /COPies=#
        Print this number of copies per page.
    /Default
        Take command line options and use default values for everything else.
    /DOClines=#
        Restrict the number of comment (documentation) lines copied to an output file. Default is 6. Command equivalent:
            $ doclines=#    Like /DOClines.
    /FAITHful
        Copy all comment documentation to the output.
    /FASt
        Do not plot any text.
    /FIGure=file.fig
        Override the plot configuration and send graphics to a FIGURE file.
    /FONT=#
        Use a different font; default is 0 (zero), a monospaced device font. Hint: using GCG fonts is rarely worth the trouble.
    /INITialize=filename.init
        Read command line options for this command out of this file.
    /GRId=grid_interval,grid_color
        Draw a grid *behind* the plot. Interval in platen units, color as above.
    /LINEWidth=#
        In percentage of a platen unit.
    /NODOCumentation
        No program banner at run time. Command equivalents:
            $ NODOC         Like /NODOC.
            $ TERSE         Like /NODOC.
            $ DOC           Inverse of /NODOC.
            $ VERBOSE       Inverse of /NODOC.
    /NOTEXT
        Do not plot any text.
    /NOUNload
        Leave the paper in so that the next graphic will write over it. Only works on a few devices.
    /PASSthru
        Send graphics out the printer port on a terminal.
    /PLOT=string
        Redirect the same graphics type to another port, queue, or device.
    $ PLOTCHECK
        Enable display of plot configuration when a graphics program runs.
    $ NOPLOTCHECK
        Inverse of PLOTCHECK.
    $ PLOTTERM
        Disable broadcast messages at terminal - as these can destroy plots.
    $ NOPLOTTERM
        Enable broadcast message reception.
    /PORtrait
        Turn the plot sideways on the page.
    /PSINClude
        Odds are you'll never use this.
    /QUIET
        No bells on error or otherwise. Command equivalents:
            $ NOBEEP        Like /QUIET.
            $ QUIET         Like /QUIET.
            $ BEEP          Inverse of /QUIET.
            $ NOISY         Inverse of /QUIET.
    /SCAle=#
        Change the scale (from 1.0).
    /SPEed=#
        Only for HPGL plotters, pen speed between 1.0 and 10.0.
    /STADEN
        Input will be Staden format, not GCG. Command equivalents:
            $ SEQFORMAT STADEN    Like /STADEN.
            $ SEQFORMAT GCG       Inverse of /STADEN.
    $ V132
        Set a VT terminal to a wider mode.
    $ V80
        Set a VT terminal to a narrower mode.
    /XPAN=#
    /YPAN=#
        Shift the plot this many platen units.
    /XSCAle=#
        Change the scale on X only (from 1.0).
    /YSCAle=#
        Change the scale on Y only (from 1.0).

How do I tell a program to do a bunch of sequences at once?

Either use wild cards such as * (=match anything) or % (=match any one character), or use a list file (containing a list of sequences). Here is an example using wildcards:

    $ pileup/infile=my*final%.pep

This would act on all matching files in the present directory, such as myabcfinal1.pep, myfrodofinalz.pep and so forth. (It would not act on myabcfinal.pep, because % must match exactly one character.)

Alternatively, with a text editor create a list file called, for instance, myfil.fil, similar to this:

    ..
    file1.pep
    file2.pep
    file3.pep
    sw:1434_Maize
    pir:A1HU

(Comments can be placed above the ".." line. This would act on just the 5 sequences shown, three from disk files and two straight from the databases.) Then invoke the program like this (note the @):

    $ pileup/infile=@myfil.fil

List files are particularly powerful because they can be used iteratively with many of the search programs to incrementally refine a list of sequences. For instance:

    $ stringsearch/menu=A/strings="melanogaster"/out=pass1.fil
    $ findpatterns/infile=@pass1.fil/pattern="GGG"/names/out=pass2.fil

The final list, pass2.fil, will contain those Genbank entries that contain both the pattern and the keyword.

NOTE: The name of the option switch that triggers creation of a list file is NOT standardized. Most commonly it is /LISt, but for BLAST, FindPatterns, and Motifs it is /NAMes, for FASTA and related programs it is /NOALIGN, for Pretty and related programs it is /UGLy, and no switch at all is required for STRINGSEARCH.

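To make the wildcard and list file rules concrete, here is a small Python sketch. It is not part of GCG: the helper names are invented for this example, and it assumes only what is stated above, namely that * matches anything, % matches exactly one character, and entries in a list file follow the ".." line. It ignores OpenVMS details such as version numbers and the special role of the "." between name and type.

```python
import re


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate * and % wildcards into a regular expression (hypothetical helper)."""
    pieces = []
    for ch in pattern:
        if ch == "*":
            pieces.append(".*")   # match anything, including nothing
        elif ch == "%":
            pieces.append(".")    # match exactly one character
        else:
            pieces.append(re.escape(ch))
    # OpenVMS file names are not case sensitive.
    return re.compile("".join(pieces), re.IGNORECASE)


def matches(pattern: str, filename: str) -> bool:
    """True if the whole file name matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(filename) is not None


def read_list_file(text: str) -> list:
    """Return the entries that follow the '..' line; text above it is comment."""
    entries = []
    seen_divider = False
    for line in text.splitlines():
        stripped = line.strip()
        if not seen_divider:
            if stripped == "..":
                seen_divider = True
            continue
        if stripped:
            entries.append(stripped)
    return entries


if __name__ == "__main__":
    for name in ("myabcfinal1.pep", "myfrodofinalz.pep", "other.pep"):
        print(name, matches("my*final%.pep", name))
    print(read_list_file("My comment\n..\nfile1.pep\nsw:1434_Maize\n"))
```

Under this reading of %, a name needs exactly one character between "final" and ".pep" to match my*final%.pep.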
How do I rerun or modify the last command?

If your terminal is configured properly, then you can recall previous command lines using the up and down arrow keys. Move along the recalled line with the left and right arrow keys.

Note: This answer is correct for OpenVMS systems (all) and for Unix systems if tcsh is in use.

How do I run the program in batch mode?

Some of the GCG programs are very time intensive and should not be run from the terminal or, if in WPI, as a subprocess. Instead, submit these to a batch queue. For most such programs use the /batch command line option to do just this, for instance:

    $ framesearch/infile1=myseq.seq/infile2=SW:*/batch

Alternatively, you can set up command files yourself and manually submit those to a batch queue. This is an operating system question rather than a GCG question, so look here for a description of the method.

Why doesn't /infile=whatever.msf work?

MSF (Multiple Sequence Format) files must be used with a bit of care. The way GCG handles them, they are more analogous to directories than to regular files. So, just as "[.subdirectory]" does not specify a particular file, "whatever.msf" does not specify a particular entry or entries. The syntax for using an MSF file is this:

    /infile=whatever.msf{*}             All entries.
    /infile=whatever.msf{ENTRY_NAME}    Just this one entry.
    /infile=whatever.msf{*o*}           All entries with an "o".

Here's another form that looks like it should work, but it doesn't:

    /infile=whatever.msf{ENTRY1,ENTRY2,ENTRY3}    3 of N entries by name.

Why doesn't /pattern=(GGG,AAA) work?

Certain parts of the text on a command line must be enclosed in double quotes so that OpenVMS and GCG can figure out exactly what is meant. Here is an example:

    $ findpatterns/infile=gb:*/pattern=(ggg,aaa)      Generates an error message.
    $ findpatterns/infile=gb:*/pattern="(ggg,aaa)"    Find "ggg" or "aaa".

Hint: whenever a piece of text is placed between double quotes, and it doesn't contain any single quotes, it is passed verbatim as a single text block into the program. If the same piece of text is not enclosed in double quotes, the command line interpreter may try to "make sense" of any commas, periods, semicolons, brackets, spaces, and so forth.

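If you build command lines from a script, the quoting hint can be applied mechanically. The following Python sketch is only an illustration: the helper names and the exact set of characters treated as special are assumptions made for this example, not something GCG or OpenVMS defines. Periods are deliberately left out of the set so that ordinary file names such as myseq.seq are not quoted.

```python
# Characters that trigger quoting in this sketch (an assumed set, see above).
NEEDS_QUOTES = set(",;()[]{}<> ")


def quote_value(value: str) -> str:
    """Wrap a qualifier value in double quotes when it contains special characters."""
    if "'" in value or '"' in value.strip('"'):
        # Embedded quote characters need extra care that this sketch does not attempt.
        raise ValueError("value contains quote characters: " + value)
    if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
        return value  # already quoted
    if any(ch in NEEDS_QUOTES for ch in value):
        return '"' + value + '"'
    return value


def qualifier(name: str, value: str) -> str:
    """Format one /name=value qualifier (hypothetical helper)."""
    return "/" + name + "=" + quote_value(value)


if __name__ == "__main__":
    print("findpatterns" + qualifier("infile", "gb:*") + qualifier("pattern", "(ggg,aaa)"))
```

When in doubt, quote by hand and try the command interactively first.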
Why don't graphics work?

You must set up GCG graphics before you run any program. The default graphics setting is, by the jelly side down rule, always the wrong one for the terminal you are using. It is best to have these set for you at login. Relevant commands are:

    $ showplot    Show the current graphics setting.
    $ setplot     Interactive program to select from a preset menu of graphics settings.
    $ plottest    Send a test page to the graphics device.

GCG supports the following types of graphics devices - each type corresponds to a command of the same name that will configure GCG to use that type of device:

    CGM           For moving to Macs and PCs, see here for more information.
    TEKTRONIX     For many terminal emulators (like Xterm).
    REGIS         For many VT240 and higher terminals.
    SIXEL         For some LA type printing devices.
    HPGL          For assorted Hewlett-Packard printers and plotters.
    GKS           For anything with a GKS driver - GKS must be installed on your system.
    XWINDOW       For X11 servers (like workstation consoles).
    POSTSCRIPT    For PostScript printers.

After giving the "major" command shown above, you will be prompted for details.

Note: Unfortunately, none of this is included in the on line GCG documentation!

Why doesn't it work when I try to reformat this sequence?

Reformat is very powerful but it has its limits. The most common problems are:

  1. Reformat puts the comments in as part of the sequence.
     Fix: Go into the file with an editor and put ".." after the comments, before the sequence, and try reformat again.

  2. Reformat says that the sequence is Protein when it is Nucleotide, or vice versa. (NOTE: always check that it came out right!!!)
     Fix: Because GCG supports degenerate sequence codes it cannot easily tell a protein sequence from a nucleic acid one. Use the command line switches /PROtein or /NUCleotide to force the issue.

  3. Reformat says that the line is too long.
     Fix: This happens when you transfer a text file with no line breaks from another machine. Some operating systems are happy with text lines that are infinitely long. OpenVMS isn't one of them. You can either reformat the file on the remote machine (for instance, save as "text with line breaks") or use the program CHOPUP on the OpenVMS machine. Then run reformat again.

  4. Reformat makes a mess of Genbank files.
     Fix: Don't use REFORMAT on known formats; use one of the "FROM" commands, which exist for EMBL, FASTA, GENBANK, IG, PIR, and STADEN formats. D. Gilbert's program READSEQ will also do these conversions if it is installed on your system.

Why doesn't it work when I try to use Seqed or Lineup?

This is usually a problem with the user's terminal emulator. Seqed and Lineup need to know about arrow keys and the application keypad, and if the terminal emulator messes up, so do these programs. Typically the culprit is either MacIP or NCSA telnet. If either of those is the problem, issue the appropriate one of these commands before starting Seqed or Lineup:

    $ MacIP
    $ NCSA

For more difficult communications problems, consult your system manager.

Note: For users and system managers not at our site, here are the fixes (OpenVMS specific, sorry Unix folks):

    $ sho sym macip
      MACIP == "@GENSITECOM:RESETCURSOR"
    $ type GENSITECOM:RESETCURSOR.com
    $! Mathog 9-Apr-1992
    $! From GCG: This fixes the problem with MacIP cursor's not working
    $! it is [?1l (lower case L) NOT [?11
    $!
    $ ESC[0,8] = %x1B
    $ OPEN/WRITE TempTerm TT
    $ Write TempTerm ESC + "[?1l"
    $ Close TempTerm
    $

    $ sho sym ncsa
      NCSA == "@GCGEXCOM:FIXNCSA.COM"
    $ type GCGEXCOM:FIXNCSA.COM
    $! FIXNCSA.COM
    $! 9-APR-1992 By David Mathog
    $!
    $! Makes NCSA Telnet (Mac version) work correctly with Seqed/Gelassemble
    $!**************************************************
    $ set term/inquire
    $ set term/application

Why doesn't it work when I try to modify GCG graphics on my Mac or PC?

GCG graphics, when output to most devices, draw all text characters as a series of small line segments. This renders the resulting figure essentially unusable on many platforms, no matter how it is captured. For instance, Versaterm Pro on the Macintosh can save a Tektronix document as a PICT document. Doing so is rarely worth the trouble, since the thousands of line segments comprising the text cannot be manipulated in any simple manner, and only the fastest machines can even redraw the image at a reasonable rate. So, if you want to modify GCG graphics your primary options are:

  1. Obtain and install the CGM driver for GCG graphics:

         http://seqaxp.bio.caltech.edu/pub/SOFTWARE/gcgcgm.tar
     or
         http://seqaxp.bio.caltech.edu/pub/SOFTWARE/gcgcgm.zip

     Then you can create CGM output files, which many Mac and PC graphics programs can successfully import once you have moved them in BINARY mode to that machine. Text will be text and all types of lines will be lines but, due to limitations in the GCG graphics model, curves will still be piles of line segments, and all text will be Courier.

  2. Save the graphic as a .FIGURE file. Edit and modify that. The syntax inside .FIGURE files is quite simple. Often, the quickest way to the desired graphics is a few cycles of .FIGURE editing and then (re)rendering with:

         $ FIGURE/infile=modified.figure

     This approach is viable if you just need the figure on paper or a slide. It isn't very useful if you need to import it into a document, except when it can go in as encapsulated PostScript. In that one instance you can use:

         $ POSTSCRIPT EPSF final.epsf
         $ FIGURE/infile=modified.figure

     This will render the figure document into an EPSF file. Afterwards move that to your Mac or PC and load it into your word processor. Most likely you will not be able to modify it there, but it will show up properly when you print.

  3. If you need to modify text, another approach is required. If you happen to have a program that can edit encapsulated PostScript, then you can set your graphics device with:

         $ POSTSCRIPT EPSF final.epsf

     then generate the graphic, then move it to a Mac or PC that has the EPSF editor. In our experience very few machines have this sort of software available.

     If you use a Macintosh a better option is available - use D. Gilbert's hypercard stack hp2pict or the program "graphic converter" (for the Macintosh, available on many fine FTP sites) to convert HPGL to PICT. As long as you stick with the base font, it will come through as text instead of as graphics. Once converted through either of these to PICT, it will be as modifiable as any other Macintosh graphic.

     A. Configure GCG graphics to go to a file in HPGL format:

            $ HPGL HP7580 output.hpgl A4

        Then run whatever command generates the graphic.
     B. Move the file to a Mac.
     C. Convert it with hp2pict or graphic converter.
     D. Open your favorite draw program and read in the PICT file.
     E. Warning!!!! If you accidentally used anything other than the base font you will have some huge vector graphics. For instance, the PLOTTEST figure is full of these. If you need what those vector fonts say, then the best move is to change the order slightly:

        Run whatever command generates the graphic with the /FIGURE=temp.figure switch added.
        Edit the temp.figure file, and change all lines like:

            .fo xx      (ie, xx = 13)
        to
            .fo 0

        Then render the figure file to an HPGL file with:

            $ HPGL HP7580 output.hpgl A4
            $ FIGURE/infile=temp.figure

        and proceed as above with hp2pict.

Why doesn't it work when I type a long command line?

OpenVMS does not autowrap command lines. If a command line is much wider than a window you should break it up onto several lines by placing "-" on the end of each line to be continued, like this:

    $ FIGURE/infile=temp.figure -
    _$ /scale=1.5 -
    _$ /portrait

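For command lines generated by a script, the same continuation rule can be applied automatically. This Python sketch (the function name wrap_command and the default width are assumptions for the example) breaks a command only at qualifier boundaries and ends every line except the last with " -". It does not look inside double-quoted values, so a "/" inside quotes would confuse it.

```python
def wrap_command(command: str, width: int = 72) -> list:
    """Break a command at "/" qualifier boundaries, adding " -" continuations."""
    head, *qualifiers = command.split("/")
    pieces = [head] + ["/" + q for q in qualifiers]
    lines = []
    current = ""
    for piece in pieces:
        # Reserve two characters for the trailing " -".
        if current and len(current) + len(piece) + 2 > width:
            lines.append(current + " -")
            current = piece
        else:
            current += piece
    lines.append(current)
    return lines


if __name__ == "__main__":
    for line in wrap_command("FIGURE/infile=temp.figure/scale=1.5/portrait", width=30):
        print(line)
```

The "_$" continuation prompt is written by OpenVMS itself, so it is not part of the generated text.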
Why doesn't it work when I allow more gaps in PILEUP?

The default setting for Pileup only allows up to 2000 gaps in the total sequence. If your sequence has more than that, you will get one of these:

    *** ERROR! More than 2000 gap insertions. ***

There is a /MAXGAP=N switch, which increases the number of allowed gaps. HOWEVER, there is an undocumented gotcha, which is that you must also employ the /MAXSEG=M switch at the same time, such that N+M<=7000. The default for MAXSEG is 5000, so any /MAXGAP=N with N>2000 will fail unless /MAXSEG is lowered to match.

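The arithmetic behind the gotcha is simple enough to sketch in Python. The numbers (a combined limit of 7000 and a default MAXSEG of 5000) come from the answer above; the function name is invented for the example, and the sketch says nothing about what a smaller MAXSEG does to the alignment itself.

```python
COMBINED_LIMIT = 7000   # N + M must not exceed this
DEFAULT_MAXSEG = 5000   # value used when /MAXSEG is not given


def pileup_gap_qualifiers(maxgap: int) -> str:
    """Return a /MAXGAP and /MAXSEG pair that respects N + M <= 7000."""
    if not 0 < maxgap < COMBINED_LIMIT:
        raise ValueError("maxgap must be between 1 and %d" % (COMBINED_LIMIT - 1))
    # Keep the default MAXSEG when it fits, otherwise shrink it just enough.
    maxseg = min(DEFAULT_MAXSEG, COMBINED_LIMIT - maxgap)
    return "/MAXGAP=%d/MAXSEG=%d" % (maxgap, maxseg)


if __name__ == "__main__":
    print("pileup/infile=@myfil.fil" + pileup_gap_qualifiers(3000))
```

For example, asking for 3000 gaps leaves room for at most 4000 segments.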
How do I start SEQED?

This is the simplest way:

    $ SEQED mysequence.seq

Sometimes you want it to do a bit more, though. For instance, you can tell it to highlight some restriction sites and warn you if you are entering vector sequence:

    $ SEQED/VECtor=EMBL:PBR322/SITes=GAATTC,CATTAG mysequence.seq

If your terminal gets disconnected during a session, or some other calamity occurs before you can exit the program, all of your edits will be found in a file called "SeqEd.Log" in the directory where you were working. When you have reestablished a session you can recover your work by issuing the same SEQED command in the same directory. Conversely, if you do NOT want to recover your work, then you should delete this file before trying to use SeqEd in that directory again!

What modes does SEQED have?

SEQED is always in one of five modes. These are:

    Mode        Operations in this mode
    Command     Help, Exit (save changes), Quit (don't save), entering other modes, setting program state (ie, OVERSTRIKE vs. INSERT).
    Screen      Editing the sequence, reading/writing pieces.
    Heading     Editing the header information (the stuff above the ".." in GCG sequence files). Use the arrow keys to move up and down and to scroll through lines.
    Digitizer   Entering sequence from an X-ray film of a sequencing gel. Requires special hardware.
    Comment     Entering or modifying comments in the sequence. These are the ones that show up like: AGCT<binding site>AGCT in a GCG sequence file. The locations of comments are indicated by ":" rather than a "." under the sequence.

Here's how you get into and out of the various modes:

    Mode        Enter (from Command)    Exit command                           Exits to
    Command     (startup mode)          EXIT or QUIT                           (leaves the program)
    Screen      Change or Screen        {^Z}                                   Command mode
    Heading     Heading                 {^Z}                                   Screen mode
    Digitizer   Digitizer               Click "Keyboard" on digitizer menu     Screen mode
    Comment     Comment                 {Return} or {^Z}                       Screen mode

What SEQED mode am I in?

    Command     There is a ":" at the lower left corner of the screen, and the cursor is also on that line.
    Screen      The cursor will be on the sequence.
    Heading     There are ":" in the 2nd through 5th lines from the top of the screen and the cursor is on one of them.
    Digitizer   The Digitizer is working, the keyboard isn't.
    Comment     The cursor is on a line below the Heading ":" but above the sequence. At the left of the line will be a number corresponding to the last sequence position of the cursor.

How do I learn to use SEQED?

Enter command mode and issue the command HELP{return}. Read everything it says and then jump in and try editing a sequence. SEQED is not difficult to use, but to use it effectively you will have to learn a few common commands or you will spend half your time scanning the HELP for information. In general, though, you just enter the mode you want and type in what should go there. Use {DELETE} to remove characters in comments, heading, or sequence.

How do I start LINEUP/GELASSEMBLE?

Here are two ways:

    $ LINEUP mysequences        Loads the sequences named in mysequences.fil
    $ LINEUP/MSF mysequences    Loads the sequences in mysequences.msf

The GELSTART command tells GELASSEMBLE where to look for the sequences it will need, so just invoke it with:

    $ GELASSEMBLE

What modes does LINEUP/GELASSEMBLE have?

LINEUP is always in one of four modes. These are:

    Mode        Operations in this mode
    Command     Help, Exit (save changes), Quit (don't save), entering other modes, setting program state (ie, OVERSTRIKE vs. INSERT).
    Screen      Editing the sequence, reading/writing pieces.
    Heading     Editing the header information (the stuff above the ".." in GCG sequence files). Use the arrow keys to move up and down and to scroll through lines.
    Comment     Entering comments into the sequence. These are the ones that show up like: AGCT<binding site>AGCT in a GCG sequence file.

GelAssemble has these, plus an extra mode:

    Contig      Effectively this is one level above Command. It selects a contig, and those sequences are fed into the program, which is then placed in Command mode.

Here's how you get into and out of the various modes:

    Mode        Enter (from Command)    Exit command          Exits to
    Command     (startup mode)          EXIT or QUIT          (leaves the program)
    Screen      Change or Screen        {^Z}                  Command mode
    Heading     Heading                 {^Z}                  Screen mode
    Comment     Comment                 {Return} or {^Z}      Screen mode

In GelAssemble, when you are done editing a contig, do:

    CONSENSUS   to recalculate the consensus
    WRITE       to save this and your other edits to the sequences in this contig
    CONTIG      to return to the Contig mode to select another contig to work on.

If you don't want to save your changes, or you didn't make any, exit the program with QUIT. Otherwise, exit with EXIT.

What LINEUP/GELASSEMBLE mode am I in?

    Command     There is a ":" at the lower left corner of the screen, and the cursor is also on that line.
    Screen      The cursor will be on the sequence.
    Heading     There are ":" in the 2nd through 5th lines from the top of the screen and the cursor is on one of them.
    Comment     The cursor is on a line below the Heading ":" but above the sequence. At the left of the line will be a number corresponding to the last sequence position of the cursor.

GelAssemble only:

    Contig      There is a bar graph on the screen and it says "contig" in about 5 places. Use the arrow keys to move up and down between contigs. Use {^K} to load a contig and drop into the Command mode so that you can edit it.

How do I learn to use LINEUP/GELASSEMBLE?

These are a bit more complicated to learn than SEQED. To get a quick start, generate an .MSF file (for instance, with PILEUP) or borrow one, and then run LINEUP. Enter command mode and issue the command HELP{return}. Read everything it says and then jump in and try editing the sequences.

In addition to the commands that worked in Seqed, learn especially how to use the LOCK, UNLOCK, ANCHOR, and UNANCHOR commands, all in Command mode. The first pair controls the write locking of certain sequences. The second pair controls grouping of sequences: when two or more sequences are anchored together, edits and displacements apply to all sequences in the group.

LINEUP is not difficult to use, but to use it effectively you will have to learn a few common commands or you will spend half your time scanning the HELP for information.

How do I edit more than 30 aligned sequences?

The multiple sequence editors that come with GCG cannot handle more than 30 sequences at once. However, experience has shown that most of the time when somebody needs to edit vast numbers of sequences they really only want to do simple column operations, such as extract bases 10 through 20, or invert those, or delete them. REFORMAT has been modified at our site (system managers - code available upon request) to perform the following extra operations:

    /BEGIN
    /END
    /DELETE
    /REVERSE

So for instance:

    $ reformat/infile=big.msf{*}/outfile=frag.msf -
    _$ /msf/begin=10/end=20/reverse

will create a file "frag.msf" which contains bases 10 through 20, reverse complemented. Here's how to delete the same range:

    $ reformat/infile=big.msf{*}/outfile=frag.msf -
    _$ /msf/begin=10/end=20/delete

These operations also apply for /infile=@list.fil and for single files.

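To show what these column operations mean, here is a small Python sketch applied to plain strings. It is not the modified REFORMAT and does not read or write MSF files; the function names are invented for the example, positions are 1-based and inclusive as in /begin and /end, and degenerate base codes are passed through unchanged rather than complemented.

```python
# Complement table for the unambiguous bases; anything else (gap
# characters, degenerate codes) is left as it is.
COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def extract(seq: str, begin: int, end: int) -> str:
    """Keep only columns begin..end (1-based, inclusive)."""
    return seq[begin - 1:end]


def delete(seq: str, begin: int, end: int) -> str:
    """Remove columns begin..end (1-based, inclusive)."""
    return seq[:begin - 1] + seq[end:]


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the order."""
    return seq.translate(COMPLEMENT)[::-1]


def apply_to_alignment(alignment: dict, operation, *args) -> dict:
    """Apply the same column operation to every sequence in an alignment."""
    return {name: operation(seq, *args) for name, seq in alignment.items()}


if __name__ == "__main__":
    alignment = {"seq1": "AAAACCCCGGGGTTTT", "seq2": "AAAACC..GGGGTTTT"}
    fragment = apply_to_alignment(alignment, extract, 5, 8)
    print(apply_to_alignment(fragment, reverse_complement))
```

Because every sequence gets the same column range, the fragments stay aligned with each other.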
What databases are available locally?

Use the command:

    $ versions

Any GCG program can use the sequences in these databases directly - it is not generally necessary to copy them into each user's local directory. Access can be either by accession number or entry name. Examples:

    $ mapplot/infile=gb:dmwhite/default
    $ mapplot/infile=gb_in:X02974/default

How do I retrieve a sequence?

It depends a bit on what you want to do with the sequence and what you know about it. For instance, if a BLAST search has an entry GB:Z12345 then you know the database (GENBANK) and the accession number. (If the description is fuzzier, see below.) Given this information you can retrieve the sequence from the local database with:

    $ fetch/infile=GB:Z12345

This will leave a copy of the sequence in your current directory in GCG format.

NOTE 1: Since all GCG programs will accept the format shown, there is usually no reason to keep a local copy of such sequences. For instance:

    $ mapplot/infile=GB:Z12345

NOTE 2: The abbreviations for the assorted databases are:

    GB        Genbank
    EMBL      EMBL
    PIR       PIR
    SW        SWISS-PROT
    NRL_3D    NRL_3D (Sequences for PDB files)
    EPD       Eukaryotic Promoter Database

If the fetch fails, and it might if the entry is newer than the local copy of the database, then try next:

    $ gopher                                    NOTE: this is NOT a GCG program!!!
      Select "database searches"                Site dependent!!!
      Select the database you are interested in
      Enter the accession number, terminated by a carriage-return
      A list of matches will come up
      Move the cursor to the one you want using the arrow keys.
      Press the {s} key (it must be lowercase).
      You will be prompted for the name of the file to save the sequence in.

To finish, use one of the GCG "FROMxxxxx" commands to convert the file to GCG format. For instance, if the file was saved in "tempfile.gb":

    $ FROMGENBANK/infile=tempfile.gb/outfile=z12345.seq

Many, many, many other methods are available for retrieving sequences over the Internet. Most leave the file in its original format on disk or in a mail message. None of these are GCG programs; some are site specific interfaces to remote servers!

    $ Mosaic           Like gopher, requires X11 server.
    $ Entrez           Requires X11 server.
    $ clever           Command line version of Entrez, works with any terminal.
    $ ncbi_retrieve    Retrieve from the NCBI.
    $ embl_retrieve    Retrieve from EMBL.
    $ fl_retrieve      Retrieve from FLAT (Japan).

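A database reference such as GB:Z12345 is just a prefix and an entry joined by a colon. The Python sketch below splits such a reference and looks the prefix up in the abbreviation table given above. The function name is invented for the example, and prefixes that are not in the table (a division such as gb_in, or a site-specific name) are simply reported with no full name rather than rejected.

```python
# Abbreviations as listed in this FAQ.
DATABASES = {
    "GB": "Genbank",
    "EMBL": "EMBL",
    "PIR": "PIR",
    "SW": "SWISS-PROT",
    "NRL_3D": "NRL_3D (Sequences for PDB files)",
    "EPD": "Eukaryotic Promoter Database",
}


def parse_reference(reference: str):
    """Split "DB:ENTRY" into (prefix, entry, full database name or None)."""
    prefix, separator, entry = reference.partition(":")
    if not separator or not prefix or not entry:
        raise ValueError("expected DATABASE:ENTRY, got " + repr(reference))
    prefix = prefix.upper()  # OpenVMS command lines are not case sensitive
    return prefix, entry, DATABASES.get(prefix)


if __name__ == "__main__":
    for ref in ("GB:Z12345", "sw:1434_Maize", "gb_in:X02974"):
        print(parse_reference(ref))
```

A full name of None only means the prefix is not one of the six abbreviations in the table; it may still be valid at your site.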
Fuzzy sequence specifications

If, on the other hand, you want "the E. coli sequence from So-and-So's laboratory published last year" you have a bit of work to do. The tools that GCG supplies to address this problem are:

  $ stringsearch    Search documentation records.
  $ lookup          Search indices of documentation.

Lookup will only work for databases that have had indices built for them. Still, try lookup first (since it will likely be fastest), and the output list, if any, can be fed directly into FETCH via:

  $ FETCH/INFILE=@lookup.list

If LOOKUP fails, you are likely better off using gopher or one of the other tools described above - many of them accept multiple keywords and allow Boolean logic (AND/OR/NOT).

HINT. Many times the keywords in hand correspond quite well to those found in the single line .SEQCAT files that GCG maintains for each database. It is often fastest to use the operating system's search utility on these relatively small files to come up with the accession number for a sequence of interest. For instance:

  $ SEARCH/match=and pirdir:pir*.seqcat -
  _$ photosystem,liverwort,chlorophyll

  ******************************
  GENPIRDISK:[GCGPIR]PIR1.SEQCAT;1

  F2lv44 photosystem II chlorophyll a-binding protein psbC - liverwort (Marchantia polymorpha) chloroplast 473bp
  Qjlv6a photosystem II chlorophyll a-binding protein psbB - liverwort (Marchantia polymorpha) chloroplast 508bp

  ******************************
  GENPIRDISK:[GCGPIR]PIR2.SEQCAT;1

  S01548 photosystem II chlorophyll a-binding protein psbB - liverwort (Marchantia polymorpha) chloroplast 508bp
  S01594 photosystem II chlorophyll a-binding protein psbC - liverwort (Marchantia polymorpha) chloroplast 473bp
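If you have copied a .SEQCAT file to a machine without the OpenVMS SEARCH utility, the same "every keyword must appear on the line" test is easy to reproduce. The sketch below is in Python and is only an approximation of SEARCH/match=and: the function name match_all is made up, case is ignored (as SEARCH does unless told otherwise), and the third sample line is invented to show a non-match.

```python
def match_all(lines, keywords):
    """Return the lines that contain every keyword, ignoring case."""
    wanted = [k.lower() for k in keywords]
    return [line for line in lines if all(k in line.lower() for k in wanted)]


if __name__ == "__main__":
    sample = [
        "F2lv44 photosystem II chlorophyll a-binding protein psbC - "
        "liverwort (Marchantia polymorpha) chloroplast 473bp",
        "Qjlv6a photosystem II chlorophyll a-binding protein psbB - "
        "liverwort (Marchantia polymorpha) chloroplast 508bp",
        # Hypothetical entry, invented for this example only.
        "Xxxx01 photosystem II protein - spinach chloroplast",
    ]
    for hit in match_all(sample, ["photosystem", "liverwort", "chlorophyll"]):
        print(hit.split()[0])  # first word on each line is the entry name
```

For a real file, pass the open file object (or its lines) as the first argument.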
How do I design an oligonucleotide that contains a translationally silent restriction site?

Starting with a known coding region, use the MAP command with the /SILent switch. This will produce a restriction map with the extra silent sites shown (along with the sites that don't require sequence modification).

The /SILent command line option tells MAP, in addition to checking for enzymes the way it usually does, to modify the sequence at each position to accommodate each enzyme being tested, and to accept those sites where the resulting sequence changes do not alter the coded product. Often you also want to restrict the enzymes checked to six cutters only. Here is an example:

  $ MAP/infile=my_sequence.seq/SIX/SILENT -
  _$ /begin=2/end=100/outf=my_sequence.map/default

Note 1: MAP figures out which frame to translate in, with respect to /SILENT, from the first base specified. In the example, frame 2 was coding. If /begin had NOT been specified it would have defaulted to /begin=1, which means frame=1, and the resulting tests for silent restriction sites would have been incorrect!

Note 2: There need not be a translationally silent restriction site where you want one!
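To make the idea behind /SILENT concrete, here is a small sketch in Python. It is NOT how MAP is implemented, just the same test done the slow way for one enzyme: overwrite the sequence with the recognition site at each position, translate, and keep the positions where the protein is unchanged. Assumptions: the standard genetic code, an unambiguous recognition sequence, the top strand only, and a "begin" argument that plays the role of /begin (1-based first base of the first codon). The names silent_sites and translate are made up for this example.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate complete codons from the first base; drop a trailing partial codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def silent_sites(seq, site, begin=1):
    """Return 1-based positions where 'site' can be introduced without
    changing the protein translated from base 'begin' onward."""
    seq, site = seq.upper(), site.upper()
    coding = seq[begin - 1:]
    protein = translate(coding)
    hits = []
    for i in range(len(coding) - len(site) + 1):
        if coding[i:i + len(site)] == site:
            continue  # site already present, no modification needed
        mutated = coding[:i] + site + coding[i + len(site):]
        if translate(mutated) == protein:
            hits.append(begin + i)
    return hits


if __name__ == "__main__":
    # ATG GAG TTT TAA (M E F stop); GAG TTT can become GAA TTC (EcoRI).
    print(silent_sites("ATGGAGTTTTAA", "GAATTC"))            # correct frame
    print(silent_sites("CATGGAGTTTTAA", "GAATTC", begin=2))  # coding frame is 2
    print(silent_sites("CATGGAGTTTTAA", "GAATTC", begin=1))  # wrong frame
```

The last call is the mistake described in Note 1: with the wrong starting base the sketch translates in the wrong frame and misses the site entirely.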
What do I need to use WPI?

In order to use WPI you need an X11 server that can connect to your GCG computer. For instance, most Unix and OpenVMS workstation consoles are X11 servers. You can also turn a Macintosh or a PC into an X11 server by loading some software, which is analogous to a terminal emulator. Examples of this software (that are used at the SAF) are MacX for the Macintosh and Micro-Xwin for the PC.

Realistically, you also want a fairly large screen at a fairly high resolution. If you can't manage at least 800 x 600 on a 15" monitor it might run, but you won't like using it. You also want a reasonably fast computer, where "fast" here refers primarily to graphics speed.

Note: WPI is the X11 client. The X11 server is the machine whose screen you are looking at.
Should I use WPI?

Obviously, the following is just an opinion!

Probably not. At GCG 8.1 WPI is basically a glorified menu system, and offers no significant functionality over the command line version. Often it is less functional than the command line version. For instance:

  WPI lacks a GUI sequence editor analogous to SEQED or LINEUP, and resorts to using the terminal versions of those. (Not that there is anything wrong with those per se, it's just that they seem glaringly out of place in a GUI program!)

  WPI lacks a simple way to retain options used in preceding commands.

  Perhaps worst of all, WPI gives the mistaken impression that "that's all there is." Most sequence analysis facilities offer dozens of other programs beyond GCG, most notably the entire EGCG package, all of which are unreachable from within WPI. (Yes, it is possible to install other programs into WPI, usually after considerable effort, and the most recent EGCG releases contain instructions for doing so for EGCG. However, out of the box, WPI will not let a user access non-GCG programs.) When users cannot find what they want in WPI they often assume it exists nowhere on the system.

The organization of the options within WPI is essentially that in GENMANUAL. So use GENMANUAL when you can't remember a program's name, then use /CHECK to see the command line options if you can't remember those.

GCG 9.0, due out in late 1996, will merge the GDE and WPI interfaces, which should result in a much more usable GUI.
I clicked on RUN and nothing happened!

As discussed in the topic above, WPI is essentially a menu system. When you clicked on RUN something did happen, but you have to go look to find out what.

  1. Select the MAIN WPI window.

  2. Pull down the WINDOWS menu and select Job Manager.

     This window will tell you what has happened to your assorted jobs, which is another word for the process(es) that are created when you click on RUN. If the process blew up for some reason, you should see it here. If it is still running it should tell you that too.

  3. Pull down the WINDOWS menu and select Output Manager.

     This window will show you the result of any of your runs, be it text or graphics. Note that by default it will only show the results from the present session. You can load older files in to view them, but they won't be there automatically. You must also manually delete files that are produced, especially the insidious WPI_JOB_##.LOG files, which will pile up in your WPI directory.

If you can, it is probably best to leave both of these windows open whenever you use WPI.
WPI won't start!

First, make sure that you gave it the right command. Try:

  $ WPI

or, for smaller displays:

  $ WPI/small

Still didn't start? Then one of the following is likely the problem:

  1. WPI is not installed on your system.
  2. Either your process or the system as a whole doesn't have enough virtual memory to run WPI.
  3. Your X11 server is not configured properly.
  4. The host machine has not been instructed where to send your X11 sessions, i.e., it does not know about your server.
  5. You are out of disk space.

The first two of these can only be remedied by the system manager, but before you bother him or her, first rule out the final three possibilities. These commands are OpenVMS specific!

Check that you have free disk space:

  $ SHOW QUOTA

Note that the rest of this is pretty standard for debugging X11 servers and clients - nothing special about WPI here.

Check that the host machine knows where to send the display:

  $ SHOW DISPLAY

    Device:    WSA102:
    Node:      MYPC.WHEREVER.EDU
    Transport: TCPIP
    Server:    0
    Screen:    0

If it doesn't say something like this, configure the display:

  $ SET DISPLAY/CREATE/TRANSPORT=TCPIP/NODE=MYPC.WHEREVER.EDU

Test that at least one X11 client can access your X11 server:

  $ RUN SYS$SYSTEM:DECW$CLOCK

If your X11 server is configured correctly, a clock should appear on it. If not, you will get a message something like:

  Xlib: connection to "_WSA104:" refused by server
  Xlib: Client is not authorized to connect to Server
  X Toolkit Error: Can't Open display
  Message number 03AB8204

Verify that your server machine is reachable from your host with (site specific command!):

  $ MULTINET PING mypc.wherever.edu

If the server machine can be reached but won't allow connections, check its security setting. Poke around, and you will likely find a menu option something like "allow any connections" or "restrict connections". If the server is a Unix workstation try the command "xhost +"; if it is an OpenVMS workstation, check under "Security" on the "Options" menu in the "Session Manager" window. To start with, set security wide open, to allow connections from anywhere (at your own risk on the workstations!). Try the clock again. If it still doesn't work, then contact your system manager. If it does work, then set the security on your X11 server to be a bit more restrictive, but still allow connections from the WPI client.

If the server machine cannot be PING'd, contact your system manager.
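PING only tells you that the server machine is alive, not that anything is listening for X11 connections on it. With the TCPIP transport, X11 server number N conventionally listens on TCP port 6000 + N, so "Server: 0" in the SHOW DISPLAY output means port 6000. The sketch below, in Python, tries to open that port. It is not a GCG or OpenVMS command, the function names are made up, and it has to be run from some machine that has Python and can reach your server. A successful connection shows only that the port is open; the server can still refuse the client on security grounds, which is the "not authorized" message shown above.

```python
import socket

X11_BASE_PORT = 6000  # X11 server number N listens on TCP port 6000 + N


def x11_port(server_number):
    """TCP port for the given X11 server number (the 'Server:' field)."""
    return X11_BASE_PORT + server_number


def x11_port_open(host, server_number=0, timeout=3.0):
    """Return True if a TCP connection to the X11 port can be opened."""
    try:
        with socket.create_connection((host, x11_port(server_number)), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # mypc.wherever.edu is the placeholder node name used in this FAQ.
    host = "mypc.wherever.edu"
    state = "open" if x11_port_open(host) else "not reachable"
    print("%s port %d: %s" % (host, x11_port(0), state))
```

If the port is not reachable even though PING works, the X11 server software is probably not running, or it is not accepting TCP/IP connections.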