The Gibbs program typically performs well using the default parameter
settings. However, the user can optionally control some parameters, as
shown by the 'Usage' report (just type 'gibbs'):

----------------------------------------------------------------------
Usage: gibbs file [options]
  options:
    -n            - use nucleic acid alphabet
    -o            - use element order in probabilities
    -r            - randomly shuffle input sequences
    -s            - give seed for random number generator
    -t<#times>    - maximum number of independent starts (default=10)
    -c<#cycles>   - number of cycles between shifts (default=1)
    -m<#cycles>   - maximum number of cycles in each run (default=500)
----------------------------------------------------------------------

It is worth experimenting with the -s, -t, -c and -m options. The same
settings should be used for random simulations (performed using the -r
option) to evaluate the significance of whatever is found.

Because the gibbs program uses a stochastic sampling strategy, there is
never any "guarantee" that it will have found the best solution. However,
increasing the values for -t<#times> and -m<#cycles> and decreasing
-c<#cycles> will increase the likelihood that the program finds the best
result, but will also increase the run time. (Note: in practice,
increasing the value of -m will have little or no effect; it is probably
best to just use the default setting.)

These command line options are explained more fully below.

-c<#cycles> - number of cycles between shifts
    This sets how often the Gibbs sampler tries to escape from suboptimal
    local energy wells. (default=1)

-t<#times> - maximum number of independent starts
    This reruns the program using independent starting configurations and
    returns the best alignment out of all of the runs. However, if the
    sampler finds the same alignment in two independent runs, it assumes
    that this alignment is the best and stops immediately.
    (default=10)

-s - give seed for random number generator
    This allows you to specify the seed used by the random number
    generator, so that you can reproduce a run.

-o - use element order in probabilities
    This causes the Gibbs sampler to weight the selection of motif sites
    toward the most likely order. Very subtle patterns can be identified
    more easily when there is information (obtained from the less subtle
    patterns) about the order of the motifs in the sequences. Note: this
    option only works when there is more than one type of motif, and
    using it will NOT force an ordering on the motifs.

-m<#cycles> - maximum number of cycles in each run
    (default=500)

This program will not find the "optimum" element length automatically,
but you can run the program at different widths and determine an
"optimum" width using information-per-parameter (ipp): the higher the
value, the better the width. Although the ipp measure is very useful for
comparing independent runs with each other, it cannot be used to
determine whether an optimum or a statistically significant solution has
been found.

The program uses a FASTA input file format:

>A28627 spoIIIC protein - Bacillus subtilis
MPPLFVMNNE ILMHLRALKK TKKDVSLHDP IGQDKEGNEI SLIDVLKSEN EDVIDTIQLN
MELEKVKQYI DILDDREKEV IVGRFGLDLK KEKTQREIAK ELGISRSYVS RIEKRALMKM
FHEFYRAEKE KRKKAKGK

The output includes the "optimum" alignment detected by the algorithm,
the corresponding motif model (given as amino acid percentages), several
statistics on the model, and a second "edited" alignment with weakly
matching segments removed and any additional segments with significant
similarity to the model included. The log-likelihood ratio statistic is a
measure of the RELATIVE significance of the alignment/model. The ipp
statistic is calculated by dividing the log-likelihood ratio statistic by
the number of degrees of freedom. (The number of degrees of freedom is a
function of motif width.)
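
The width comparison described above can be sketched in a few lines of
Python. This is only an illustration, not part of the gibbs program: the
function names ipp and best_width are hypothetical, and the numbers in
"runs" are made-up placeholders, not output from a real run. In practice
you would copy the log-likelihood ratio and the degrees of freedom from
the statistics reported for each width.

```python
def ipp(log_likelihood_ratio, degrees_of_freedom):
    """Information per parameter: log-likelihood ratio / degrees of freedom."""
    if degrees_of_freedom <= 0:
        raise ValueError("degrees_of_freedom must be positive")
    return log_likelihood_ratio / degrees_of_freedom


def best_width(runs):
    """Return the motif width whose run has the highest ipp.

    `runs` maps motif width -> (log-likelihood ratio, degrees of freedom).
    """
    return max(runs, key=lambda width: ipp(*runs[width]))


# Placeholder values for illustration only (hypothetical, not real output).
runs = {
    10: (120.0, 190),
    15: (210.0, 285),
    20: (240.0, 380),
}

for width in sorted(runs):
    print(width, round(ipp(*runs[width]), 3))
print("preferred width:", best_width(runs))
```

As noted above, a higher ipp only ranks runs against each other; it says
nothing about whether the preferred run is statistically significant.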

The iteration output of the program (the stderr output) can be suppressed
by using the unix shell command:

    (cat cmd | gibbs infile > outfile) >& junk

where "outfile" will then contain the important output and "junk" the
irritating stderr output. The file cmd contains the command sequence
(e.g., cmd = 1 15 y 1).
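
For scripted runs, the same redirection can be sketched in Python. This
is a minimal sketch under several assumptions: gibbs is on your PATH, the
helper names build_gibbs_args and run_gibbs are hypothetical, and numeric
options are written with the number attached to the flag (e.g. -t20),
which is inferred from the -t<#times> notation in the usage report. The
-s option is left out because the usage report does not show how the seed
is passed.

```python
import subprocess


def build_gibbs_args(infile, nucleic=False, shuffle=False,
                     times=None, cycles=None, max_cycles=None):
    """Assemble a gibbs command line from options in the usage report."""
    args = ["gibbs", infile]
    if nucleic:
        args.append("-n")                 # nucleic acid alphabet
    if shuffle:
        args.append("-r")                 # randomly shuffle input sequences
    if times is not None:
        args.append(f"-t{times}")         # assumed attached form: -t20
    if cycles is not None:
        args.append(f"-c{cycles}")
    if max_cycles is not None:
        args.append(f"-m{max_cycles}")
    return args


def run_gibbs(args, cmd_path, out_path):
    """Run the command, reading the command sequence from cmd_path.

    stdout is written to out_path; stderr (the iteration output) is
    discarded. Returns the exit status of the process.
    """
    with open(cmd_path, "rb") as answers, open(out_path, "wb") as out:
        completed = subprocess.run(args, stdin=answers, stdout=out,
                                   stderr=subprocess.DEVNULL)
    return completed.returncode


# Only prints the command line; call run_gibbs(...) to actually execute it.
print(" ".join(build_gibbs_args("infile", times=20, cycles=1)))
```

To evaluate significance as described earlier, the same call with
shuffle=True builds the matching command line for a shuffled run.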