This is the GCG (Genetics Computer Group) package beginner's FAQ.
Questions related to this FAQ.
Who wrote this?
What is GCG, what is EGCG?
What version of GCG does it apply to?
What version of the FAQ is this?
Where can I get a newer copy?
Can I give this to other users or post it on my server?
What conventions are used in this FAQ?
What other FAQs are out there?
What courses are available?
Questions related to using the computer (OpenVMS specific!!!).
How do I ...
do the standard sorts of computer things?
configure my account to set things up for me automatically?
Questions related to the GCG command line.
How do I ...
find out which program to use?
find out which options to use with each program?
tell a program to do a bunch of sequences at once?
rerun or modify the last command?
run the program in batch mode?
Confusing parts of the GCG system.
Why doesn't it work when I ...
use /infile=whatever.msf ?
use /pattern=(GGG,AAA) ?
try to get graphics out ?
try to reformat this sequence ?
try to use Seqed or Lineup?
try to modify GCG graphics on my Mac or PC?
type a long command line?
allow more gaps in PILEUP?
What databases are available locally?
How do I retrieve a sequence?
How do I design an oligonucleotide that contains a
translationally silent restriction site?
SEQED usage
How do I start SEQED?
What modes does SEQED have?
What SEQED mode am I in?
How do I learn to use SEQED?
LINEUP/GELASSEMBLE usage
How do I start LINEUP/GELASSEMBLE?
What modes does LINEUP/GELASSEMBLE have?
What LINEUP/GELASSEMBLE mode am I in?
How do I learn to use LINEUP/GELASSEMBLE?
How do I edit more than 30 aligned sequences?
WPI (the graphical user interface for GCG).
What do I need to use WPI?
Should I use WPI?
I clicked on RUN and nothing happened!
WPI won't start!
Who wrote this?
David Mathog
Manager, sequence analysis facility, Biology Division, Caltech
mathog@seqaxp.bio.caltech.edu
What is GCG, what is EGCG?
GCG stands for the "Genetics Computer Group". GCG is a
commercial package of computer programs for doing many
different analyses of nucleic acid and peptide sequences.
This is their home page.
EGCG is a set of programs that extend the GCG package. It
contains submissions from many different people and sites.
EGCG programs look and feel like GCG programs, and may be
either slightly modified versions of the original, perhaps
implementing some new command line switches, or they may be
completely new and novel programs which use the GCG function
libraries. EGCG is maintained by Peter Rice.
What version of GCG/EGCG does it apply to?
GCG 8.1 for OpenVMS/AXP, EGCG 8.1 for OpenVMS/AXP. Parts of
what is in here will apply to GCG/EGCG on Unix systems,
given the caveat that these Unix specific changes must be
made:
1. $ prompt becomes # or % (depending on the user's shell)
2. "/qualifier" becomes " -qualifier" (the space before the "-"
is mandatory on Unix, allowed, but not required, on OpenVMS).
3. - to continue a line, becomes \, and the operating system
won't generate a continuation prompt.
4. commands and filenames will be case sensitive
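The first three rules above can be sketched as a small Python filter. This is only an illustration and is not part of GCG or EGCG: the function name vms_to_unix is made up, it assumes every qualifier starts with a letter, it leaves text inside double quotes alone, and it does nothing about rule 4 (case) or about Unix shell quoting of characters such as *.

```python
import re

def vms_to_unix(lines):
    """Rewrite OpenVMS-style GCG command lines in the Unix style (sketch only)."""
    converted = []
    for line in lines:
        text = line.rstrip()
        # Rule 1: "$" prompt becomes "%"; Unix shows no continuation prompt,
        # so a "_$" continuation prompt is replaced by plain indentation.
        if text.startswith("_$"):
            prefix, body = "    ", text[2:]
        elif text.startswith("$"):
            prefix, body = "% ", text[1:]
        else:
            prefix, body = "", text
        # Rule 2: "/qualifier" becomes " -qualifier". Splitting on double
        # quotes leaves the even-numbered pieces outside of any quoted text.
        parts = body.split('"')
        for i in range(0, len(parts), 2):
            parts[i] = re.sub(r"\s*/(?=[A-Za-z])", " -", parts[i])
        body = '"'.join(parts).strip()
        # Rule 3: a trailing "-" continuation mark becomes "\".
        if body.endswith(" -"):
            body = body[:-1] + "\\"
        converted.append(prefix + body)
    return converted

if __name__ == "__main__":
    for out in vms_to_unix(["$ FIGURE/infile=temp.figure -",
                            "_$ /scale=1.5 -",
                            "_$ /portrait"]):
        print(out)
```

Always check the converted line by eye before running it. The sketch knows nothing about individual GCG programs.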
What version of the FAQ is this?
1.06 05-DEC-1996
Thanks for comments to:
Peter Woollard
Fyodor Urnov
Francois Jeanmougin
Johanne Duhamie
Caren Smith/Countway
Where can I get a newer copy?
http://seqaxp.bio.caltech.edu/www/gcg_beginners_faq.html
Can I give this to other users or post it on my server?
Yes. Please do put copies on local servers rather than just
putting in links to our copy - it will distribute the load
better.
What conventions are used in this FAQ?
Mostly this is just text. However, when describing command
lines or how to use programs, these conventions apply:
[] Indicates an optional parameter, qualifier, or value
except where it indicates part of a directory
specification or a UIC value.
{} Indicates a key name when used in text, otherwise, it
is part of the command line (verbatim).
$ Indicates a command that is often found on
OpenVMS systems that use GCG, but is not part of either.
bold text Indicates what the computer wrote to the screen.
Regular text Indicates what the user is to type/has typed.
italic text Indicates a comment - it should NOT appear on the
command line!
It is understood that each command line is terminated by a
single {RETURN} keystroke when entered interactively.
What other FAQs are out there?
Some of these are FAQs, some are site manuals
UK Human Genome Mapping Project, GCG Faq (Unix syntax).
Unofficial Guide to GCG Software prepared by AGRENET (OpenVMS syntax).
Biocomputing Survival Guide R. Doelz (Unix syntax)
The Biocompanion Biozentrum, Basel (Unix or OpenVMS syntax)
(Since the actual Biocompanion document is retrieved via an FTP link
it is often unreachable as their server usually has no free
anonymous FTP slots.)
What courses are available?
Fundamentals of Sequence Analysis, Caltech 1995-1996
The course focus is on identifying, and learning to use, the
appropriate computational tools for common problems in sequence
analysis.
BioComputing Hypertext Coursebook
Thorough coverage of the theory behind sequence analysis.
GNA-VSNS Biocomputing class
Discussion class which was run in a BioMOO.
Algorithms for Molecular Biology
References and assignments only.
See also What other FAQs are out there?
How do I do the standard sorts of computer things?
General questions about how to use the computer are answered in the
OpenVMS beginner's FAQ and the OpenVMS Overview.
Look in these for information on how to create / delete /
edit / print / type files, or to list directories, and so
forth.
How do I configure my account to set things up for me automatically?
Create a file called LOGIN.COM and place it in SYS$LOGIN,
the directory you start out in when you log in. If you
already have such a file you will need to modify it. The
few command lines that follow are a typical configuration
for a person who uses Versaterm on a Macintosh, and has a
networked laser printer available (here configured to run
from the print queue "LOCAL_LW"):
$! Keep the next two lines only if your site has PRINTLOCAL.COM
$ DEFINE LOCPRINT LOCAL_LW
$ PRINT_HERE:== @shared_programs:printlocal
$!
$ DEFINE SYS$PRINT LOCAL_LW
$ TK:== 'tektronix' VersaTerm-Tek4105 term
$ PS:== 'postscript' laserwriter "|printg "
$ PRINT :== PRINT/queue=LOCAL_LW
$ PRINTG :== PRINT/queue=LOCAL_LW/form=PS_PLAIN
$ TK
When this person logs in GCG will send all graphics in the
appropriate Tektronix format to Versaterm. The user can
toggle the graphics destination and graphics format during
the session by typing either:
$ TK Tell GCG to send graphics to the terminal
$ PS Tell GCG to send graphics (or other output)
to the local printer. Warning: some GCG
programs may glitch and send output to the
terminal _anyway_. To get around this, use the
/FIGURE=name.figure switch, and then use the
PS command followed by the FIGURE program to send
the plot to your LaserWriter.
Other commands that will have been defined are:
$ PRINT_HERE FILENAME.EXTENSION [DELETE]
Prints the specified file locally. The printlocal
program figures out if it is PostScript or Ascii
and treats it appropriately. If the word "delete"
follows the file, then that file will be deleted
after it is printed.
$ PRINTG Print a postscript file to the local printer
$ PRINT Print a text file to the local printer
NOTE: The system manager must configure a print queue for
your local networked printer - it is not something that you
can do from your OpenVMS account. To do so, the manager
will need to know the printer's name and address. (If you
rename your printer and don't tell the manager, your print
queue will cease to function!!!)
How do I find out which program to use?
There are several methods. GCG and EGCG supply these commands:
$ GENHELP Help on GCG programs by name
$ GENMANUAL Help on GCG programs by topic
$ EGENHELP Help on EGCG programs by name
$ EGENMANUAL Help on EGCG programs by topic
In addition, the SAF Software Documentation page has indexed all of the above,
so that they may be searched using keywords and boolean operators.
How do I find out which options to use with each program?
The command line options are all described in the on line
help. In addition, all GCG commands can be used like this:
$ gcg_command/check
"/check" instructs the program to put up a menu of the
available command line options and then to accept further
input. For instance, after giving the command above, one
would type (or cut and paste) the desired options in after
the prompt that it provides.
There is also a standard set of command line options
which are NOT described by /check, but which apply to pretty
much all programs and, for programs that are graphical in nature,
to graphics devices. (Unfortunately, all of this information
is described only in the paper User's manual and is not to
be found in either genhelp or genmanual :-(. ) Many of these
command line switches also have a complementary command that
sets/unsets that option for all commands given later in the
session.
Note: Many of the following refer to platen units. The
abstract GCG graphics device is 150 platen units wide by 100
high.
/AUTOfeed For plotters, automatically feed paper
/BOX=horizontal_start,horizontal_end,vertical_start,vertical_end-
,grid_color,distance_between_frames,line_width
Draws a box on the plot.
/CHECK List command line options.
Command equivalents:
$ comcheck Like /CHECK.
$ nocomcheck Inverse of /CHECK.
/CLIpping Do not draw lines that pass through
the clipping limits.
/NOCLIpping Do draw such lines.
/COLor=# Change the color of the plot, # =
1 Black
2 Green
3 Blue
4 Red
/COPies=# Print this number of copies per page.
/Default Take command line options and use
default values for everything else.
/DOClines=# Restrict the number of comment
(documentation) lines copied to an
output file. Default is 6.
Command equivalent:
$ doclines=# Like /DOClines.
/FAITHful Copy all comment documentation to
the output.
/FASt Do not plot any text
/FIGure=file.fig Override the plot configuration and
send graphics to a FIGURE file.
/FONT=# Use a different font, default is 0
(zero), a monospaced device font. Hint, using
GCG fonts is rarely worth the trouble.
/INITialize=filename.init
Read command line options for this command
out of this file.
/GRId=grid_interval,grid_color
Draw a grid *behind* the plot.
Interval in platen units, color as above.
/LINEWidth=# In percentage of a platen unit.
/NODOCumentation No program banner at run time.
Command equivalents:
$ NODOC Like /NODOC.
$ TERSE Like /NODOC.
$ DOC Inverse of /NODOC.
$ VERBOSE Inverse of /NODOC.
/NOTEXT Do not plot any text
/NOUNload Leave the paper in so that the next
graphic will write over it. Only works
on a few devices.
/PASSthru Send graphics out the printer port
on a terminal.
/PLOT=string Redirect the same graphics type to
another port, queue, or device.
$ PLOTCHECK Enable display of plot configuration
when a graphics program runs.
$ NOPLOTCHECK Inverse of PLOTCHECK.
$ PLOTTERM Disable broadcast messages at terminal -
as these can destroy plots.
$ NOPLOTTERM Enable broadcast message reception.
/PORtrait Turn the plot sideways on the page.
/PSINClude Odds are you'll never use this.
/QUIET No bells on error or otherwise.
Command equivalents:
$ NOBEEP Like /QUIET.
$ QUIET Like /QUIET.
$ BEEP Inverse of /QUIET.
$ NOISY Inverse of /QUIET.
/SCAle=# Change the scale (from 1.0).
/SPEed=# Only for HPGL plotters, penspeed
between 1.0 and 10.0.
/STADEN Input will be Staden format, not GCG.
Command equivalents:
$ SEQFORMAT STADEN Like /STADEN.
$ SEQFORMAT GCG Inverse of /STADEN.
$ V132 Set a VT terminal to a wider mode.
$ V80 Set a VT terminal to a narrower mode.
/XPAN=#
/YPAN=# Shift the plot this many platen units
/XSCAle=# Change the scale on X only (from 1.0).
/YSCAle=# Change the scale on Y only (from 1.0).
How do I tell a program to do a bunch of sequences at once?
Either use wild cards such as * (=match anything) or %
(=match any one character) or use a list file (containing a
list of sequences).
Here is an example using wildcards:
$ pileup/infile=my*final%.pep
would act on all matching files in the present directory,
such as myabcfinal1.pep, myabcfinal2.pep, myfrodofinal1.pep,
myfrodofinalz.pep and so forth (the % must match exactly one character).
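To preview which names a wildcard pattern would pick up, the two wildcards can be imitated with a regular expression. This is a rough Python sketch, not a GCG tool: the function names are invented for the illustration, * is treated as "any run of characters", % as "exactly one character", and matching ignores case as OpenVMS file names do.

```python
import re

def vms_wildcard_to_regex(pattern):
    """Translate an OpenVMS-style wildcard pattern into a compiled regex."""
    pieces = []
    for ch in pattern:
        if ch == "*":
            pieces.append(".*")       # match anything, including nothing
        elif ch == "%":
            pieces.append(".")        # match exactly one character
        else:
            pieces.append(re.escape(ch))
    return re.compile("".join(pieces), re.IGNORECASE)

def matching_files(pattern, names):
    """Return the names (in order) that the whole pattern matches."""
    regex = vms_wildcard_to_regex(pattern)
    return [name for name in names if regex.fullmatch(name)]

if __name__ == "__main__":
    candidates = ["myabcfinal1.pep", "myfrodofinalz.pep",
                  "other.pep", "myabcfinal1.seq"]
    print(matching_files("my*final%.pep", candidates))
```

The operating system does the real matching, so treat this only as a way to reason about a pattern before using it.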
Alternatively, with a text editor create a list file called, for instance,
myfil.fil, similar to this:
..
file1.pep
file2.pep
file3.pep
sw:1434_Maize
pir:A1HU
(Comments can be placed above the ".." line. This would
act on just the 5 sequences shown, three from disk files
and two straight from the databases.)
Then invoke the program like this (note the @):
$ pileup/infile=@myfil.fil
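If the list of sequences comes out of another program, the list file can be generated rather than typed. The following Python sketch only illustrates the layout described above (optional comments, the ".." line, then one sequence per line). The function name make_list_file is hypothetical and the script is not part of GCG.

```python
def make_list_file(entries, comments=()):
    """Return the text of a GCG-style list file.

    Comments go above the ".." line, one sequence specification
    (file name or database:entry) goes on each line below it.
    """
    lines = list(comments)
    lines.append("..")
    lines.extend(entries)
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    text = make_list_file(
        ["file1.pep", "file2.pep", "file3.pep", "sw:1434_Maize", "pir:A1HU"],
        comments=["Sequences for the first PILEUP run"],
    )
    # Writing to myfil.fil mirrors the file name used in the example above.
    with open("myfil.fil", "w") as handle:
        handle.write(text)
```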
List files are particularly powerful because they can be
used iteratively with many of the search programs to
incrementally refine a list of sequences. For instance:
$ stringsearch/menu=A/strings="melanogaster"/out=pass1.fil
$ findpatterns/infile=@pass1.fil/pattern="GGG"/names/out=pass2.fil
The final list, pass2.fil will contain a list of those
Genbank entries that contain both the pattern and the
keyword.
NOTE: The name of the option switch that triggers creation
of a list file is NOT standardized. Most commonly, it is
/LISt, but for BLAST, FindPatterns, and Motifs it is /NAMes,
for FASTA and related it is /NOALIGN, for Pretty and related
it is /UGLy, and no switch at all is required for
STRINGSEARCH.
How do I rerun or modify the last command?
If your terminal is configured properly, then you can recall
previous command lines using the up and down arrow keys.
Move on the recalled line with the left and right arrow
keys.
Note: This answer is correct for OpenVMS systems (all) and
for Unix systems if tcsh is in use.
How do I run the program in batch mode?
Some of the GCG programs are very time intensive and should
not be run from the terminal, or if in WPI, as a subprocess.
Instead submit these to a batch queue. For most such
programs use the /batch command line option to do just this,
for instance:
$ framesearch/infile1=myseq.seq/infile2=SW:*/batch
Alternatively, you can set up command files yourself and
manually submit those to a batch queue. This is an
operating system question rather than a GCG question, so
look here for a description of the method.
Why doesn't /infile=whatever.msf work?
MSF (Multiple Sequence Format) files must be used with a bit of
care. The way GCG handles them, they are more analogous to
directories than to regular files. So, just as
"[.subdirectory]" does not specify a particular file,
"whatever.msf" does not specify a particular entry or
entries. The syntax for using an MSF file is this:
/infile=whatever.msf{*} All entries.
/infile=whatever.msf{ENTRY_NAME} Just this one entry.
/infile=whatever.msf{*o*} All entries with an "o".
Here's another form that looks like it should work, but it
doesn't:
/infile=whatever.msf{ENTRY1,ENTRY2,ENTRY3}
3 of N entries by name.
Why doesn't /pattern=(GGG,AAA) work?
Certain parts of the text on a command line must be enclosed
in double quotes so that OpenVMS and GCG can figure out
exactly what is meant. Here is an example:
$ findpatterns/infile=gb:*/pattern=(ggg,aaa)
Generates an error message.
$ findpatterns/infile=gb:*/pattern="(ggg,aaa)"
Find "ggg" or "aaa".
Hint: whenever a piece of text is placed between double
quotes, and it doesn't contain any single quotes, it is
passed verbatim as a single text block into the program. If
the same piece of text is not enclosed in double quotes the
command line interpreter may try to "make sense" of any
commas, periods, semicolons, brackets, spaces, and so forth.
Why don't graphics work?
You must set up GCG graphics before you run any
program. The default graphics setting is, by the jelly side
down rule, always the wrong one for the terminal you are
using. It is best to have these set for you at login. Relevant commands are:
$ showplot Show the current graphics setting.
$ setplot Interactive program to select from a
preset menu of graphics settings.
$ plottest Send a test page to the graphics device.
GCG supports the following types of graphics devices - each
type corresponds to a command of the same name that will
configure GCG to use that type of device:
CGM For moving to Macs and PCs, see here for more information
TEKTRONIX For many terminal emulators (like Xterm).
REGIS For many VT240 and higher terminals.
SIXEL For some LA type printing devices.
HPGL For assorted Hewlett-Packard printers and plotters.
GKS For anything with a GKS driver - GKS must be installed
on your system.
XWINDOW For X11 servers (like workstation consoles).
POSTSCRIPT For postscript printers.
After giving the "major" command shown above, you will be
prompted for details.
Note: Unfortunately, none of this is included in the on
line GCG documentation!
Why doesn't it work when I try to reformat this sequence?
Reformat is very powerful but it has its limits. The most
common problems are:
1. Reformat puts the comments in as part of the sequence.
Fix: Go into the file with an editor and put ".." after
the comments, before the sequence, and try reformat again.
2. Reformat says that the sequence is Protein when it is
Nucleotide, or vice versa. (NOTE: always check that it
came out right!!!)
Fix: Because GCG supports degenerate sequence codes it
cannot easily tell a protein sequence from a nucleic
acid one. Use the command line switches /PROtein or
/NUCleotide to force the issue.
3. Reformat says that the line is too long.
Fix: This happens when you transfer a text file from
another machine with no line breaks. Some operating
systems are happy with text lines that are infinitely
long. OpenVMS isn't one of them. You can either
reformat the file on the remote machine (for instance,
save as "text with line breaks") or use the program
CHOPUP on the OpenVMS machine. Then run reformat again.
4. Reformat makes a mess of Genbank files.
Fix: Don't use REFORMAT on known formats, use one of
the "FROM" commands, which exist for EMBL, FASTA,
GENBANK, IG, PIR, and STADEN formats. W. Gilbert's
program READSEQ will also do these conversions if it is
installed on your system.
Why doesn't it work when I try to use Seqed or Lineup?
This is usually a problem with the user's terminal emulator.
Seqed and Lineup need to know about arrow keys and the
application keypad, and if the terminal emulator messes up,
so do these programs. Typically the culprit is either MacIP
or NCSA telnet. If either of those is the problem, issue
the appropriate one of these commands
$ MacIP
$ NCSA
before starting Seqed or Lineup. For more difficult
communications problems, consult your system manager.
Note. For users and system managers not at our site, here
are the fixes (OpenVMS specific, sorry Unix folks):
$ sho sym macip
MACIP == "@GENSITECOM:RESETCURSOR"
$ type GENSITECOM:RESETCURSOR.com
$! Mathog 9-Apr-1992
$! From GCG: This fixes the problem with MacIP cursor's not working
$! it is [?1l (lower case L) NOT [?11
$!
$ ESC[0,8] = %x1B
$ OPEN/WRITE TempTerm TT
$ Write TempTerm ESC + "[?1l"
$ Close TempTerm
$
$ sho sym ncsa
NCSA == "@GCGEXCOM:FIXNCSA.COM"
$ type GCGEXCOM:FIXNCSA.COM
$! FIXNCSA.COM
$! 9-APR-1992 By David Mathog
$!
$! Makes NCSA Telnet (Mac version) work correctly with Seqed/Gelassemble
$!**************************************************
$ set term/inquire
$ set term/application
Why doesn't it work when I try to modify GCG graphics on my Mac or PC?
GCG graphics when output to most devices draw all text
characters as a series of small line segments. This renders
the resulting figure essentially unusable on many platforms,
no matter how it is captured. For instance, Versaterm Pro
on the Macintosh can save a Tektronix document as a PICT
document. Doing so is rarely worth the trouble since the
thousands of line segments comprising the text cannot be
manipulated in any simple manner, and only the fastest
machines can even redraw the image at a reasonable rate.
So, if you want to modify GCG graphics your primary options
are:
1. Obtain and install the CGM driver for GCG graphics.
http://seqaxp.bio.caltech.edu/pub/SOFTWARE/gcgcgm.tar
or
http://seqaxp.bio.caltech.edu/pub/SOFTWARE/gcgcgm.zip
Then you can create CGM output files, which many Mac and PC
graphics programs can successfully import once you have
moved them in BINARY mode to that machine. Text will be
text, all types of lines will be lines, but due to
limitations in the GCG graphics model, curves will still be
piles of line segments, and all text will be Courier.
2. Save the graphic as a .FIGURE file. Edit and modify
that. The syntax inside .FIGURE files is quite simple. Often,
the quickest way to the desired graphics is a few cycles of
.FIGURE editing and then (re)rendering with:
$ FIGURE/infile=modified.figure
This approach is viable if you just need the figure on paper
or a slide. It isn't very useful if you need to import it
into a document, except when it can go in as encapsulated
postscript. In that one instance you can use:
$ POSTSCRIPT EPSF final.epsf
$ FIGURE/infile=modified.figure
This will render the figure document into an EPSF file.
Afterwards move that to your Mac or PC and load it into your
word processor. Most likely you will not be able to modify
it there, but it will show up properly when you print.
3. If you need to modify text another approach is required.
If you happen to have a program that can edit encapsulated
postscript, then you can set your graphics device with:
$ POSTSCRIPT EPSF final.epsf
then generate the graphic, then move it to a Mac or PC that
has the EPSF editor. In our experience very few machines
have this sort of software available.
If you use a Macintosh a better option is available - use D.
Gilbert's hypercard stack hp2pict or the program
"graphic converter" (for the Macintosh, available on many fine
FTP sites) to convert hpgl to PICT. As long as you
stick with the base font, it will come through as text
instead of as graphics. Once converted through either of
these to PICT, it will be as modifiable as any other Macintosh
graphic.
A. Configure GCG graphics to go to a file in hpgl format.
$ HPGL HP7580 output.hpgl A4
Then run whatever command generates the graphic.
B. Move the file to a Mac.
C. Convert it with hp2pict or graphic converter.
D. Open your favorite draw program and read in the PICT file.
E. Warning!!!! If you accidentally used anything other than
the base font you will have some huge vector graphics.
For instance, the PLOTTEST figure is full of these. If
you need what those vector fonts say, then the best move
is to change the order slightly:
Run whatever command generates the graphic with the
/FIGURE=temp.fig switch added.
Edit the temp.fig file, change all lines like:
.fo xx (ie, xx = 13)
to
.fo 0
Then render the figure file to an HPGL file with:
$ HPGL HP7580 output.hpgl A4
$ FIGURE/infile=temp.fig
and proceed as above with hp2pict.
Why doesn't it work when I type a long command line?
OpenVMS does not autowrap command lines. If a command line
is much wider than a window you should break it up onto
several lines by placing "-" on the end of each line to be
continued, like this:
$ FIGURE/infile=temp.figure -
_$ /scale=1.5 -
_$ /portrait
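When long command lines are produced by a script, the splitting can be automated. The sketch below is a hypothetical Python helper (split_command is not a GCG or OpenVMS command). It assumes the line may be broken in front of any "/" that is not inside double quotes, and it prints the "_$" continuation prompt only to mimic the layout of the example above. That prompt is shown by the system and is not something you type.

```python
def split_command(command, width=60):
    """Split a DCL command line into "-" continued lines of about `width`."""
    # First cut the command into pieces that each start at an unquoted "/".
    pieces, current, in_quotes = [], "", False
    for ch in command:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == "/" and not in_quotes and current:
            pieces.append(current)
            current = ""
        current += ch
    pieces.append(current)

    # Then pack the pieces into lines no longer than `width` where possible.
    lines, line = [], ""
    for piece in pieces:
        if line and len(line) + len(piece) > width:
            lines.append(line)
            line = piece
        else:
            line += piece
    lines.append(line)

    # Add prompts, and a trailing "-" on every line that is continued.
    result = []
    for index, text in enumerate(lines):
        prompt = "$ " if index == 0 else "_$ "
        mark = " -" if index < len(lines) - 1 else ""
        result.append(prompt + text + mark)
    return result

if __name__ == "__main__":
    for out in split_command("FIGURE/infile=temp.figure/scale=1.5/portrait", width=25):
        print(out)
```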
Why doesn't it work when I allow more gaps in PILEUP?
The default setting for Pileup only allows up to 2000 gaps in the total
sequence. If your sequence has more than that, you will get one of these:
*** ERROR! More than 2000 gap insertions. ***
There is a /MAXGAP=N switch, which increases the number of allowed gaps.
HOWEVER, there is an undocumented gotcha, which is that you must also
employ the /MAXSEG=M switch at the same time, such that N+M<=7000. The
default for MAXSEG is 5000, so any /MAXGAP=N, with N>2000 will fail.
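The arithmetic behind this gotcha is easy to get wrong at the terminal, so here it is as a short Python sketch. The limits (2000, 5000, 7000) are the ones quoted above. The function name is made up, and the sketch simply picks the largest MAXSEG that still satisfies N+M<=7000. Whether a smaller MAXSEG suits your alignment is not something it can tell you.

```python
DEFAULT_MAXGAP = 2000   # default gap limit quoted above
DEFAULT_MAXSEG = 5000   # default MAXSEG quoted above
COMBINED_LIMIT = 7000   # MAXGAP + MAXSEG must not exceed this

def pileup_gap_switches(maxgap):
    """Return the switch text needed to request `maxgap` gaps in PILEUP."""
    if maxgap <= 0:
        raise ValueError("maxgap must be positive")
    if maxgap + DEFAULT_MAXSEG <= COMBINED_LIMIT:
        # The default MAXSEG still fits, so /MAXGAP alone is enough.
        return f"/MAXGAP={maxgap}"
    maxseg = COMBINED_LIMIT - maxgap
    if maxseg <= 0:
        raise ValueError("maxgap leaves no room for MAXSEG under the limit")
    return f"/MAXGAP={maxgap}/MAXSEG={maxseg}"

if __name__ == "__main__":
    print("$ pileup/infile=@myfil.fil" + pileup_gap_switches(3000))
```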
How do I start SEQED?
This is the simplest way:
$ SEQED mysequence.seq
Sometimes you want it to do a bit more though, for instance,
tell it to highlight some restriction sites and warn you
if you are entering vector sequence:
$ SEQED/VECtor=EMBL:PBR322/SITes=GAATTC,CATTAG mysequence.seq
If your terminal gets disconnected during a session, or some
other calamity occurs before you can exit the program, all
of your edits will be found in a file called "SeqEd.Log" in
the directory where you were working. When you have
reestablished a session you can recover your work by issuing
the same SEQED command in the same directory. Conversely,
if you do NOT want to recover your work, then you should
delete this file before trying to use SeqEd in that
directory again!
What modes does SEQED have?
SEQED is always in one of five modes. These are:
Mode Operations in this mode
Command Help, Exit (save changes), Quit (don't save),
entering other modes, setting program
state (ie, OVERSTRIKE vs. INSERT).
Screen Editing the sequence, Reading/
writing pieces.
Heading Editing the header information (the stuff
above the ".." in GCG sequence files.)
Use the arrow keys to move up and down
and to scroll through lines.
Digitizer Entering sequence from an X-ray film of a
sequencing gel. Requires special hardware.
Comment Entering or modifying comments in the sequence.
These are the ones that show up like:
AGCT<binding site>AGCT
in a GCG sequence file. The locations of
comments are indicated by ":" rather than a "."
under the sequence.
Here's how you get into and out of the various modes:
Mode Enter (from Command) Exit command, to ()
Command (startup mode) EXIT or QUIT, (program)
Screen Change or Screen {^Z}, (Command mode)
Heading Heading {^Z}, (Screen mode)
Digitizer Digitizer Click "Keyboard" on
digitizer menu, (Screen mode)
Comment Comment {Return} or {^Z},(Screen mode)
What SEQED mode am I in?
Command There is a ":" at the lower left corner of the
screen, and the cursor is also on that line.
Screen The cursor will be on the sequence.
Heading There are ":" in the 2nd through 5th lines
from the top of the screen and the cursor is on
one of them.
Digitizer The Digitizer is working, the keyboard isn't.
Comment The cursor is on a line below the Heading ":"
but above the sequence. At the left of the
line will be a number corresponding to the last
sequence position of the cursor.
How do I learn to use SEQED?
Enter command mode and issue the command HELP{return}. Read
everything it says and then jump in and try editing a
sequence. SEQED is not difficult to use, but to use it
effectively you will have to learn a few common commands or
you will spend half your time scanning the HELP for
information.
In general though, you just enter the mode you want and type
in what should go there. Use {DELETE} to remove characters
in either comments, heading, or sequence.
How do I start LINEUP/GELASSEMBLE?
Here are two ways:
$ LINEUP mysequences Loads the sequences named
in mysequences.fil
$ LINEUP/MSF mysequences Loads the sequences
in mysequences.msf
The GELSTART command tells GELASSEMBLE where to look for the
sequences it will need, so just invoke it with:
$ GELASSEMBLE
What modes does LINEUP have?
LINEUP is always in one of four modes. These are:
Mode Operations in this mode
Command Help, Exit (save changes), Quit (don't save),
entering other modes, setting program
state (ie, OVERSTRIKE vs. INSERT).
Screen Editing the sequence, reading/
writing pieces.
Heading Editing the header information (the stuff
above the ".." in GCG sequence files.)
Use the arrow keys to move up and down
and to scroll through lines.
Comment Entering comments into the sequence.
These are the ones that show up like:
AGCT<binding site>AGCT
in a GCG sequence file.
GelAssemble has these, plus an extra mode:
Contig Effectively this is one level above
command. It selects a contig, and those
sequences are fed into the program, which
is then placed in Command mode.
Here's how you get into and out of the various modes:
Mode Enter (from Command) Exit command, to ()
Command (startup mode) EXIT or QUIT, (program)
Screen Change or Screen {^Z}, (Command mode)
Heading Heading {^Z}, (Screen mode)
Comment Comment {Return} or {^Z},(Screen mode)
In GelAssemble, when you are done editing a contig, do:
CONSENSUS to recalculate the consensus
WRITE to save this and your other edits to the
sequences in this contig
CONTIG to return to the Contig mode to select another
contig to work on.
If you don't want to save your changes, or you didn't make
any, exit the program with QUIT. Otherwise, exit with EXIT.
What LINEUP/GELASSEMBLE mode am I in?
Command There is a ":" at the lower left corner of the
screen, and the cursor is also on that line.
Screen The cursor will be on the sequence.
Heading There are ":" in the 2nd through 5th lines
from the top of the screen and the cursor is on
one of them.
Comment The cursor is on a line below the Heading ":"
but above the sequence. At the left of the
line will be a number corresponding to the last
sequence position of the cursor.
GelAssemble only:
Contig There is a bar graph on the screen and it says
"contig" in about 5 places. Use the arrow keys
to move up and down between contigs. Use {^K}
to load a contig and drop into the Command mode
so that you can edit it.
How do I learn to use LINEUP/GELASSEMBLE?
These are a bit more complicated to learn than is SEQED.
To get a quick start, generate (for instance, with PILEUP)
an .MSF file (or borrow one) and then run LINEUP. Enter
command mode and issue the command HELP{return}. Read
everything it says and then jump in and try editing the
sequences. In addition to the commands that worked in
Seqed, learn especially how to use the LOCK, UNLOCK, ANCHOR,
and UNANCHOR commands (all in Command mode, the first pair
controls the write locking of certain sequences, the second
pair controls grouping of sequences - when two or more
sequences are anchored together, edits and displacements
apply to all sequences in the group.)
LINEUP is not difficult to use, but to use it effectively
you will have to learn a few common commands or you will
spend half your time scanning the HELP for information.
How do I edit more than 30 aligned sequences?
The multiple sequence editors that come with GCG cannot
handle more than 30 sequences at once. However, experience
has shown that most of the time when somebody needs to edit
vast numbers of sequences they really only want to do simple
column operations, such as extract bases 10 through 20, or
invert those, or delete them. REFORMAT has been modified at
our site (system managers - code available upon request) to
perform the following extra operations:
/BEGIN
/END
/DELETE
/REVERSE
So for instance:
$ reformat/infile=big.msf{*}/outfile=frag.msf -
_$ /msf/begin=10/end=20/reverse
Will create a file "frag.msf" which contains bases 10 through
20, reverse complemented.
Here's how to delete the same range:
$ reformat/infile=big.msf{*}/outfile=frag.msf -
_$ /msf/begin=10/end=20/delete
These operations also apply for /infile=@list.fil and for
single files.
What databases are available locally?
Use the command:
$ versions
Any GCG program can use the sequences in these databases
directly - it is not generally necessary to copy them into
each user's local directory. Access can be either by
accession number or entry name. Examples:
$ mapplot/infile=gb:dmwhite/default
$ mapplot/infile=gb_in:X02974/default
How do I retrieve a sequence?
It depends a bit on what you want to do with the sequence
and what you know about it.
For instance, if a BLAST search has an entry GB:Z12345 then
you know the database (GENBANK) and the accession number.
(If the description is fuzzier, see below.)
Given this information you can retrieve the sequence from
the local database with:
$ fetch/infile=GB:Z12345
NOTE 1: since all GCG programs will accept the format
shown there is usually no reason to keep a local copy of
such sequences. For instance:
$ mapplot/infile=GB:Z12345
NOTE 2: The abbreviations for the assorted databases are:
GB       Genbank
EMBL     EMBL
PIR      PIR
SW       SWISS-PROT
NRL_3D   NRL_3D (Sequences for PDB files)
EPD      Eukaryotic Promoter Database
The FETCH command leaves a copy of the sequence in your current
directory in GCG format. If it fails, which can happen when the
entry is newer than the local copy of the database, then try
gopher next:
$ gopher          (NOTE: this is NOT a GCG program!!!)
1. Select "database searches". (Site dependent!!!)
2. Select the database you are interested in.
3. Enter the accession number, terminated by a
   carriage-return.
4. A list of matches will come up. Move the cursor to the
   one you want using the arrow keys.
5. Press the {s} key (it must be lowercase).
6. You will be prompted for the name of the file to
   save the sequence in.
7. To finish, use one of the GCG "FROMxxxxx"
   commands to convert the file to GCG format. For
   instance, if the file was saved in "tempfile.gb":
   $ FROMGENBANK/infile=tempfile.gb/outfile=z12345.seq
Many, many, many other methods are available for retrieving
sequences over the Internet. Most leave the file in its
original format on disk or in a mail message.
None of these are GCG programs; some are site-specific interfaces
to remote servers!
$ Mosaic          Like gopher, requires X11 server.
$ Entrez          Requires X11 server.
$ clever          Command line version of Entrez,
                  works with any terminal.
$ ncbi_retrieve   Retrieve from the NCBI.
$ embl_retrieve   Retrieve from EMBL.
$ fl_retrieve     Retrieve from FLAT (Japan).
Fuzzy sequence specifications
If, on the other hand, you want "the E. coli sequence from
So-and-So's laboratory published last year" you have a bit
of work to do. The tools that GCG supplies to address this
problem are:
$ stringsearch    Search documentation records.
$ lookup          Search indices of documentation.
Lookup will only work for databases that have had indices
built for them. Still, try lookup first (since it will
likely be fastest), and the output list, if any, can be
fed directly into FETCH via:
$ FETCH/INFILE=@lookup.list
If LOOKUP fails, you are likely better off using gopher or
one of the other tools described above - many of them accept
multiple keywords and allow Boolean logic (AND/OR/NOT).
HINT. Many times the keywords in hand correspond quite well
to those found in the single line .SEQCAT files that GCG
maintains for each database. It is often fastest to use the
operating system's search utility on these relatively small
files to come up with the accession number for a sequence of
interest. For instance:
$ SEARCH/match=and pirdir:pir*.seqcat -
_$ photosystem,liverwort,chlorophyll
******************************
GENPIRDISK:[GCGPIR]PIR1.SEQCAT;1
F2lv44 photosystem II chlorophyll a-binding protein psbC - liverwort (Marchantia polymorpha) chloroplast 473bp
Qjlv6a photosystem II chlorophyll a-binding protein psbB - liverwort (Marchantia polymorpha) chloroplast 508bp
******************************
GENPIRDISK:[GCGPIR]PIR2.SEQCAT;1
S01548 photosystem II chlorophyll a-binding protein psbB - liverwort (Marchantia polymorpha) chloroplast 508bp
S01594 photosystem II chlorophyll a-binding protein psbC - liverwort (Marchantia polymorpha) chloroplast 473bp
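The same "every keyword must appear on the line" filter is easy to reproduce once a .SEQCAT file has been copied somewhere else. The Python sketch below is only an illustration: match_all is a made-up helper, the matching is assumed to be case-insensitive, the first two catalog lines are abridged from the output above, and the third line is invented purely to show a non-match.

```python
def match_all(lines, keywords):
    """Return the lines that contain every keyword, ignoring case."""
    wanted = [k.lower() for k in keywords]
    return [line for line in lines if all(k in line.lower() for k in wanted)]


catalog = [
    "F2lv44 photosystem II chlorophyll a-binding protein psbC - "
    "liverwort (Marchantia polymorpha) chloroplast",
    "Qjlv6a photosystem II chlorophyll a-binding protein psbB - "
    "liverwort (Marchantia polymorpha) chloroplast",
    # Invented line, for illustration only: has one keyword, not all three.
    "ZZ0000 made-up entry: photosystem I protein - spinach chloroplast",
]

hits = match_all(catalog, ["photosystem", "liverwort", "chlorophyll"])

# The identifier to hand to FETCH is the first column of each matching line.
identifiers = [line.split()[0] for line in hits]
print(identifiers)
```

Taking the first column of each hit gives the names to use in a database:entry specification.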
How do I design an oligonucleotide that contains a
translationally silent restriction site?
Starting with a known coding region, use the MAP command
with the /SILent switch. This will produce a restriction
map with the extra silent sites shown (along with the
sites that don't require sequence modification). In addition
to checking for enzymes the way it usually does, the /SILent
command line option tells MAP to modify the sequence at each
position to accommodate each enzyme being tested, and to
accept those changes that do not alter the coded product.
Often you also want to restrict the enzymes checked to six
cutters only. Here is an example:
$ MAP/infile=my_sequence.seq/SIX/SILENT -
_$ /begin=2/end=100/outf=my_sequence.map/default
Note 1: MAP figures out which frame to translate in, with
respect to /SILENT, from the first base specified. In the
example frame 2 was coding. If /begin had NOT been
specified it would have defaulted to /begin=1, which means
frame=1, and the resulting tests for silent restriction
sites would have been incorrect!
Note 2: There need not be a translationally silent
restriction site where you want one!
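To make the idea behind a "silent" site concrete, here is a naive Python sketch. It is not MAP's algorithm, only an illustration of the principle: write the recognition sequence over each position of the coding region and keep the positions where the translation is unchanged. The function names and the 15-base test sequence are made up, the standard genetic code is assumed, only exact (non-degenerate) recognition sequences are handled, and the coding string is assumed to start on the first base of a codon.

```python
# Standard genetic code, built from the conventional TCAG ordering.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def frame_from_begin(begin):
    """Reading frame (1, 2 or 3) implied by a 1-based start position."""
    return (begin - 1) % 3 + 1


def translate(dna):
    """Translate complete codons only; trailing bases are ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, usable, 3))


def silent_sites(coding, site):
    """List (position, bases_changed) where `site` fits without changing
    the protein. Position is 1-based; 0 changes means the site exists."""
    protein = translate(coding)
    found = []
    for i in range(len(coding) - len(site) + 1):
        mutated = coding[:i] + site + coding[i + len(site):]
        if translate(mutated) == protein:
            changed = sum(a != b for a, b in zip(coding, mutated))
            found.append((i + 1, changed))
    return found


# Made-up coding region: Met-Glu-Phe-Ala-stop.
coding = "ATGGAGTTTGCATAA"
print(frame_from_begin(2))             # the /begin=2 case from the example
print(silent_sites(coding, "GAATTC"))  # EcoRI recognition sequence
```

In this toy case the only hit is at position 4, where GAGTTT becomes GAATTC (two base changes) and the Glu-Phe pair is preserved. Shifting the start of the coding string by one base changes every codon, which is why the frame implied by /begin matters.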
What do I need to use WPI?
In order to use WPI you need an X11 server that can connect to
your GCG computer. For instance, most Unix and OpenVMS
workstation consoles are X11 servers. You can also turn a
Macintosh or a PC into an X11 server by loading some
software, which is analogous to a terminal emulator.
Examples of this software (that are used at the SAF) are
MacX for the Macintosh, and Micro-Xwin for the PC.
Realistically, you also want a fairly large screen, at a
fairly high resolution. If you can't manage at least 800 x
600 on a 15" monitor it might run, but you won't like using
it.
You also want a reasonably fast computer, where "fast" here
refers primarily to graphics speed.
Note: WPI is the X11 client.
Should I use WPI?
Obviously, the following is just an opinion!
Probably not.
At GCG 8.1 WPI is basically a glorified menu system, and
offers no significant functionality beyond the command line
version. Often, it is less functional than the command line
version, for instance:
WPI lacks a GUI sequence editor analogous to SEQED or
LINEUP, and resorts to using the terminal versions of
those. (Not that there is anything wrong with those per
se, it's just that they seem glaringly out of place in a
GUI program!)
WPI lacks a simple way to retain options used in preceding
commands.
Perhaps worst of all, WPI gives the mistaken impression that
"that's all there is." Most sequence analysis facilities
offer dozens of other programs beyond GCG, most notably, the
entire EGCG package, all of which are unreachable from within WPI.
(Yes, it is possible to install other programs into WPI,
usually after considerable effort, and the most recent
EGCG releases contain instructions for doing so for EGCG.
However, out of the box, WPI will not let a user access
non-GCG programs.) When users cannot find what they want in
WPI they often assume it exists nowhere on the system.
The organization of the options within WPI is essentially
that in GENMANUAL. So use GENMANUAL when you can't remember
a program's name, then use /CHECK to see the command line
options if you can't remember those.
GCG 9.0, due out in late 1996, will merge the GDE and WPI
interfaces, which should result in a much more usable GUI.
I clicked on RUN and nothing happened!
As discussed in the topic above, WPI is essentially a menu system.
When you clicked on RUN something did happen, but you have to go
look to find out what.
Select the MAIN WPI window.
Pull down the WINDOWS menu and select Job Manager
This window will tell you what has happened to your
assorted jobs, which is another word for the process(es)
that are created when you click on run. If the process
blew up for some reason, you should see it here. If it
is still running it should tell you that too.
Pull down the WINDOWS menu and select Output Manager
This window will show you the result of any of your
runs, be it text or graphics. Note that by default it
will only show the results from the present session.
You can load older files in to view them, but they
won't be there automatically. You must also manually
delete files that are produced, especially the insidious
WPI_JOB_##.LOG files, which will pile up in your WPI
directory.
If you can, it is probably best to leave both of these
windows open whenever you use WPI.
WPI won't start!
First, make sure that you gave it the right command. Try:
$ WPI
or
$ WPI/small       (For smaller displays.)
Still didn't start? Then, one of the following is likely
the problem:
1. WPI is not installed on your system.
2. Either your process or the system as a whole
doesn't have enough virtual memory to run WPI.
3. Your X11 server is not configured properly.
4. The host machine has not been instructed where to
send your X11 sessions, i.e., does not know about
your server.
5. You are out of disk space.
The first two of these can only be remedied by the system
manager, but before you bother him or her, first rule out
the final three possibilities.
These commands are OpenVMS specific!
$ SHOW QUOTA       Check that you have free disk space.
Note that the rest of this is pretty standard for
debugging X11 servers and clients - nothing special about
WPI here.
$ SHOW DISPLAY     Check that the host machine
                   knows where to send the display.
Device: WSA102:
Node: MYPC.WHEREVER.EDU
Transport: TCPIP
Server: 0
Screen: 0
If it doesn't say something like this, configure the display:
$ SET DISPLAY/CREATE/TRANSPORT=TCPIP/NODE=MYPC.WHEREVER.EDU
Test that at least one X11 client can access your X11 server:
$ RUN SYS$SYSTEM:DECW$CLOCK
If your X11 server is configured correctly, a clock should
appear on it. If not, you will get a message something like:
Xlib: connection to "_WSA104:" refused by server
Xlib: Client is not authorized to connect to Server
X Toolkit Error: Can't Open display
Message number 03AB8204
Verify that your server machine is reachable from your host with:
$ MULTINET PING mypc.wherever.edu     (Site specific command!)
If the server machine can be reached but won't allow
connections, check its security setting. Poke around, and
you will likely find a menu option something like "allow any
connections" or "restrict connections". If the server is a
Unix workstation, try the command "xhost +"; if it is an OpenVMS
workstation, check under "Security" on the "Options" menu in
the "Session Manager" window. To start with, set security wide
open, to allow connections from anywhere (at your own risk
on the workstations!). Try the clock again. If it still
doesn't work, then contact your system manager. If it does
work, then set the security on your X11 server to be a bit
more restrictive, but to still allow connections from the
WPI client.
If the server machine cannot be PING'd, contact your system
manager.
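If PING works but you are still unsure whether the X11 server is listening at all, a quick TCP test from any machine that has Python can help. The sketch below is an illustration only and is not part of GCG or WPI. It assumes the usual X11 convention that server number N listens on TCP port 6000 + N; the function names are made up, and the sample text is the SHOW DISPLAY output shown above.

```python
import socket

SAMPLE = """\
Device: WSA102:
Node: MYPC.WHEREVER.EDU
Transport: TCPIP
Server: 0
Screen: 0
"""


def parse_show_display(text):
    """Turn 'Key: value' lines into a dict with lowercase keys."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip().lower()] = value.strip()
    return fields


def x11_tcp_port(server_number):
    """TCP port for an X11 server number (assumed convention: 6000 + N)."""
    return 6000 + int(server_number)


def server_accepts_tcp(node, server_number, timeout=3.0):
    """True if a TCP connection to the X11 port can be opened at all.
    This says nothing about whether the server will authorize a client."""
    try:
        with socket.create_connection(
                (node, x11_tcp_port(server_number)), timeout=timeout):
            return True
    except OSError:
        return False


info = parse_show_display(SAMPLE)
port = x11_tcp_port(info["server"])
print(info["node"], port)
```

Calling server_accepts_tcp(info["node"], info["server"]) with your own node name separates "nothing is listening" (returns False) from "listening but refusing clients", which is the security-setting case described above.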