To: GCG users

From: Steve Hardies

4/2/02

As Bo Demeler has just announced, the GCG Wisconsin package of sequence analysis programs is about to be switched from the DEC Alpha implementation on gcg.v19.uthscsa.edu to the bioinformatics VALinux server, bioinf.uthscsa.edu. This will give greater speed and consolidate systems for better management and security. It will, however, require some adjustments on the part of users. This is a summary of some issues for users to aid in the switchover.

Summary:

What local databases are available:

For those of you who search databases on GCG, READ THE FOLLOWING CAREFULLY: all protein databases that were present in the gcg.v19.uthscsa.edu implementation have been ported to bioinf.uthscsa.edu. However, they are no more up-to-date than they were on gcg.v19.uthscsa.edu. The release dates for the various databases are appended to this message. Be reminded that it is a bad idea to search local databases for reasons other than retrieving long established reference sequences, because local databases lag behind the international databases, in this case by about a year. Instead, use netblast from within command line GCG or from within SeqWeb to search at NCBI directly, or go straight to search sites at NCBI, PIR, ExPASy, or Pfam. You can find links to these sites at http://biochem.uthscsa.edu/~hs_lab/bkmk.html.

The Pfam HMM model database that was advertised but missing in the gcg.v19.uthscsa.edu implementation is still missing.

Nucleic acid databases that have been ported include EST databases and EMBL without HTG (high throughput genome) databases. The GenBank nucleic acid and HTG databases that were present in the gcg.v19.uthscsa.edu system have not been ported because of space limitations. Because of the missing HTG sequences and because of confusion in how gcg has distributed sequences among EMBL, GenBank, and HTG, IT IS DOUBLY IMPERATIVE NOT TO TRUST THE LOCAL GCG DATABASES FOR RETRIEVING GENOMIC DNA SEQUENCES.

Note: The command line version, but not SeqWeb, allows you to create personal databases, which could include the missing updates, space permitting. Bo is looking into some more disk space and a daily autoupdate procedure, so perhaps the local databases will be brought up to spec. at some future time. At this time, the only true reason I know to search a local protein database is if you are building a protein family HMM model. I know of no reason to want to search a local nucleotide database, unless you are testing your own search algorithm.

SeqWeb

New login procedure:

The switch is accompanied by an upgrade from GCG version 10.1 to version 10.3. No new major functions appear to have been added to SeqWeb, however:

SeqWeb lets you get by with nothing but a web browser, but at the cost of leaving out many of the functions. Below is a partial list of things you can do with GCG but not the simplified SeqWeb interface:

Changes to access of command line GCG and SeqLab.

Access to command line GCG, and to its graphical user interface, SeqLab, will no longer be permitted by telnet and ftp. In their place, login access will require a secure shell (ssh) client on your machine, and transferring files from the outside in will require secure copy (scp). ftp and telnet clients are still available from within the bioinformatics command line, but telnet or ftp connections to bioinformatics will no longer work. Secure shell and secure copy encrypt passwords during transmission and protect against some common kinds of hacker attacks. These clients are generally native on unix installations. From the usual unix client, if your bioinf username is not the same as the one already known by the client, then generally ssh -l <username> bioinf.uthscsa.edu or ssh username@bioinf.uthscsa.edu is required to make a connection. (That's an "el" in the first form shown.) Answer "yes" when asked whether to accept a connection from bioinf.uthscsa.edu. For most clients, the entire word "yes" will need to be typed in; a simple "y" will not work.
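
The login and file transfer commands described above can be collected into a small dry-run sketch. It only prints the commands rather than running them, so nothing is contacted until you paste a line into your own shell. The username "jdoe" and the file names are placeholders, and the helper function names are made up for illustration; only ssh -l, the user@host form, and the scp user@host:path syntax are standard.

```shell
#!/bin/sh
# Dry-run sketch: prints commands, does not execute them.
# "jdoe", "myseq.seq" and "results.txt" are placeholders (assumptions).
BIOINF_HOST="bioinf.uthscsa.edu"

# Print the login command (first form from the text; -l is a lowercase "el").
login_cmd() {      # usage: login_cmd <username>
    echo "ssh -l $1 ${BIOINF_HOST}"
}

# Print the command that copies a local file into your bioinf home directory.
upload_cmd() {     # usage: upload_cmd <username> <local-file>
    echo "scp $2 $1@${BIOINF_HOST}:"
}

# Print the command that copies a remote file into the current local directory.
download_cmd() {   # usage: download_cmd <username> <remote-file>
    echo "scp $1@${BIOINF_HOST}:$2 ."
}

login_cmd jdoe
upload_cmd jdoe myseq.seq
download_cmd jdoe results.txt
```

The trailing colon in the upload form matters: without it, scp treats the destination as a local file name and makes a local copy instead of a transfer.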

From a Windows or Mac platform, you will need to install a client, but there is freeware available for both. Each of the below now comes with scp support included. For details and freeware downloads of Mac or Windows clients, please visit:

http://www.bioinformatics.uthscsa.edu

Click on "Documentation" and select "ssh/scp telnet/ftp replacements" from the menu.

To start GCG, type "gogcg" at the command line. Your home directory, username, and password for command line GCG will be the same as for any other bioinf service you use.

If you use both SeqWeb and command line GCG or other bioinf services, your username and password for SeqWeb are the same as for command line GCG and other bioinf services. Although SeqWeb maintains a separate directory structure for its saved files, Bo can set a symbolic link in your bioinf home directory to allow direct access to the SeqWeb files. We haven't explored this much, but it would allow extracting result files from the SeqWeb directory for further processing without going through the web browser, and pushing HTML files into the SeqWeb result queue so that they can be accessed remotely by web browser (subject to your password authentication). The sequence files in SeqWeb appear to be in some special format, so exchanging them this way may take some tinkering.
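
For anyone unfamiliar with symbolic links, the idea can be tried out safely in a scratch directory. This is only an illustration: the real location of the SeqWeb files is not given here, so the directory names "seqweb_store" and "home" below are stand-ins, and on bioinf the link itself would be set up by Bo.

```shell
#!/bin/sh
# Illustration in a temporary directory; all paths are stand-ins (assumptions),
# not the real SeqWeb layout on bioinf.
demo=$(mktemp -d)
mkdir -p "$demo/seqweb_store" "$demo/home"
echo "example result" > "$demo/seqweb_store/result1.txt"

# Create the link: home/seqweb points at seqweb_store.
ln -s "$demo/seqweb_store" "$demo/home/seqweb"

# The file is now reachable through the home directory.
cat "$demo/home/seqweb/result1.txt"

# Remove the scratch directory when finished:  rm -rf "$demo"
```

Anything written through the link lands in the target directory, which is what would make pushing files into the SeqWeb area possible.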

SeqLab:

Setting up the X windows interface to access SeqLab behind secure shell has some extra steps. One problem is that XDMCP queries will not work due to the firewall installed on the bioinformatics server. That means that the X windows server, say on your PC, can't automatically initiate a session by the usual XDMCP query. Another problem is that SeqLab exports graphics that some X servers can't handle. The following procedure solves both problems:

Be aware that in this method, you are transmitting your X data unencrypted. In principle a hacker might try to gain control of your machine by sending counterfeit bioinf IPs. More likely, if you use the kde window to log in to other computers, your usernames and passwords might get intercepted since they are unencrypted.

Alternatively, if you have a unix computer, you can tunnel X through the secure shell protocol. This is the preferred method and is secure at any point in the transfer. When starting your local ssh client, simply add "-X" to the command. This allows secure shell to tunnel all X traffic through the secure shell connection. Example:

ssh -X username@bioinf.uthscsa.edu
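
One way to confirm that the tunnel is active after logging in with -X is to look at the DISPLAY variable on the bioinf side, since ssh sets it when X forwarding is in effect. A minimal sketch follows; check_display is a hypothetical helper name, and the exact DISPLAY value varies from session to session (something like localhost:10.0 is typical).

```shell
#!/bin/sh
# Run this on bioinf after logging in. check_display is a made-up helper name.
check_display() {
    if [ -n "${DISPLAY:-}" ]; then
        echo "X forwarding looks active: DISPLAY=$DISPLAY"
    else
        echo "DISPLAY is not set; reconnect with ssh -X"
    fi
}

check_display
```

If DISPLAY is empty, SeqLab will have no display to open its windows on, so it is worth checking before starting it.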

As of 4/2/02, these are the versions of the installed gcg local databases:

Nucleic acid:
GenBank Release 117.0 (04/2000)
    EST and non-HTG stuff
EMBL (Abridged) Release 62.0 (03/2000)
    Abridged means no HTG stuff and maybe filtering out of redundancies with GenBank
Protein:
GenPept Release 117.0 (04/2000)
PIR-Protein Release 64.0 (03/2000)
NRL_3D Release 27.0 (03/2000)
SWISS-PROT Release 36.0 (07/1998)
SP-TREMBL Release 12.0 (11/1999)
Other:
PROSITE Release 15.0 (07/1998)
Restriction Enzymes (REBASE) (05/2000)

Not present, although the command line version claims they are:
GenBank HTG Release 117.0 (04/2000)
Pfam Release 5.5 (09/2000)

For technical support, see: http://www.accelrys.com/support/

Online help: % genhelp or http://www.accelrys.com/support/bio/genhelp/