To: GCG users
From: Steve Hardies
4/2/02
As Bo Demeler has just announced, the GCG Wisconsin Package of sequence analysis programs is about to move from the DEC Alpha implementation on gcg.v19.uthscsa.edu to the bioinformatics VA Linux server, bioinf.uthscsa.edu. The move will give faster performance and consolidate systems for better management and security. It will, however, require some adjustments on the part of users. This memo summarizes some of the issues, to help you through the switchover.
Summary:
SeqWeb access has changed; SeqWeb is still a weak cousin of command-line GCG plus SeqLab.
See Bo's memo for how to obtain a new username and password on bioinf, and for the schedule for moving files.
For those of you who search databases on GCG, READ THE FOLLOWING CAREFULLY: all protein databases that were present in the gcg.v19.uthscsa.edu implementation have been ported to bioinf.uthscsa.edu. However, they are no more up to date than they were on gcg.v19.uthscsa.edu. The release dates of the various databases are appended to this message. Remember that it is a bad idea to search local databases for anything other than retrieving long-established reference sequences, because local databases lag behind the international databases, in this case by roughly two years or more, as the appended release dates show. Instead, use netblast from within command-line GCG or from within SeqWeb to search at NCBI directly, or go straight to the search sites at NCBI, PIR, ExPASy, or Pfam. You can find links to these sites at http://biochem.uthscsa.edu/~hs_lab/bkmk.html.
The Pfam HMM database that was advertised but missing in the gcg.v19.uthscsa.edu implementation is still missing.
Nucleic acid databases that have been ported include the EST databases and EMBL without its HTG (high-throughput genomic) sequences. The GenBank nucleic acid and HTG databases that were present in the gcg.v19.uthscsa.edu system have not been ported because of space limitations. Because of the missing HTG sequences, and because of confusion in how GCG has distributed sequences among EMBL, GenBank, and HTG, IT IS DOUBLY IMPERATIVE NOT TO TRUST THE LOCAL GCG DATABASES FOR RETRIEVING GENOMIC DNA SEQUENCES.
Note: The command-line version, but not SeqWeb, allows you to create personal databases, which could include the missing updates, space permitting. Bo is looking into more disk space and a daily automatic update procedure, so the local databases may be brought up to date at some point. For now, the only good reason I know of to search a local protein database is to build an HMM of a protein family. I know of no reason to search a local nucleotide database, unless you are testing your own search algorithm.
New login procedure:
Point your browser to https://gcg.uthscsa.edu (will not be active until 4/5 after 5:00 pm)
Select SeqWeb from the menu
Then click on the word "SeqWeb" that is displayed in the middle of the blue swirly pattern.
Note the https://, not http://: the "s" encrypts the transfer and protects your password from interception. You will also have to enter your password several times during a session, so if your browser can fill in passwords automatically, try it.
http://gcg.v19.uthscsa.edu and http://gcg.uthscsa.edu will stop working on Friday, 4/5, at 5:00 pm.
The switch is accompanied by an upgrade from GCG version 10.1 to version 10.3. No major new functions appear to have been added to SeqWeb; however:
SeqWeb now works with Netscape as well as Internet Explorer, although Netscape is not fully supported by Accelrys. The only Netscape-specific bug I've noticed so far is that the job clock on each job submission page keeps running indefinitely. Ignore the clock and check the job manager to see when the job completes.
SeqWeb lets you get by with nothing but a web browser, but at the cost of many functions. Below is a partial list of things you can do with command-line GCG but not with the simplified SeqWeb interface:
Find out the release dates of the local databases.
PSI-BLAST
Correct multiple alignment errors prior to further analysis.
Make valid trees with PAUP.
Construct HMMs of protein families.
Construct updated or customized sequence databases.
Statistical testing through Monte Carlo methods.
Convert among a variety of sequence formats and prepare sequences for use by other software.
Access to command-line GCG and its graphical user interface, SeqLab, will no longer be permitted via telnet and ftp. Instead, logging in will require a secure shell (ssh) client on your machine, and transferring files in from outside will require secure copy (scp). ftp and telnet clients are still available from within the bioinf command line, but telnet or ftp connections to bioinf will no longer work. Secure shell and secure copy encrypt passwords during transmission and protect against some common kinds of hacker attacks.
These clients are generally included with Unix installations. From the usual Unix client, if your bioinf username differs from your username on the local machine, connect with either ssh -l <username> bioinf.uthscsa.edu or ssh <username>@bioinf.uthscsa.edu (that's an "el" in the first form). Answer "yes" when asked whether to continue connecting to bioinf.uthscsa.edu. With most clients you must type the entire word "yes"; a simple "y" will not work.
From a Windows or Mac platform, you will need to install a client, but freeware is available for both, and the clients we list now include scp support. For details and freeware downloads of Mac or Windows clients, please visit:
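The two login forms are equivalent, and scp reuses the same user@host prefix followed by a colon and a remote path. As a minimal sketch, assuming an OpenSSH-style ssh and scp on your Unix machine, the commands can be assembled in Python like this; the helper names, the username "jdoe", and the file name "myseq.fasta" are hypothetical, and nothing here connects to bioinf unless you pass a list to subprocess.run yourself.

```python
import shlex

BIOINF = "bioinf.uthscsa.edu"  # server name from this memo

def ssh_login_forms(username: str, host: str = BIOINF) -> list[list[str]]:
    """The two equivalent ways to log in as `username` (hypothetical helper)."""
    return [
        ["ssh", "-l", username, host],  # "-l" is a lowercase "el"
        ["ssh", f"{username}@{host}"],
    ]

def scp_upload(local_path: str, username: str, remote_dir: str = ".",
               host: str = BIOINF) -> list[str]:
    """Copy a local file into remote_dir on the server (hypothetical helper).

    A relative remote_dir such as "." is taken relative to your home directory there.
    """
    return ["scp", local_path, f"{username}@{host}:{remote_dir}"]

if __name__ == "__main__":
    # Print the commands rather than run them; "jdoe" and "myseq.fasta" are placeholders.
    for argv in ssh_login_forms("jdoe"):
        print(shlex.join(argv))
    print(shlex.join(scp_upload("myseq.fasta", "jdoe")))
```

Printed with shlex.join, these come out as ssh -l jdoe bioinf.uthscsa.edu, ssh jdoe@bioinf.uthscsa.edu, and scp myseq.fasta jdoe@bioinf.uthscsa.edu:. where the trailing "." means your bioinf home directory.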
http://www.bioinformatics.uthscsa.edu
Click on "Documentation" and select "ssh/scp telnet/ftp replacements" from the menu.
To start GCG, type "gogcg" at the bioinf command line. Your home directory, username, and password for command-line GCG are the same as for any other bioinf service you use.
Your SeqWeb username and password are also the same. Although SeqWeb maintains a separate directory structure for its saved files, Bo can set a symbolic link in your bioinf home directory to give direct access to the SeqWeb files. We haven't explored this much, but it would allow extracting result files from the SeqWeb directory for further processing without going through the web browser, and pushing HTML files into the SeqWeb result queue so that you can view them remotely in a web browser (subject to password authentication). The sequence files in SeqWeb appear to be in a special format, so exchanging them this way may take some tinkering.
SeqLab:
Setting up X Window access to SeqLab behind secure shell takes some extra steps. One problem is that XDMCP queries will not work because of the firewall on the bioinformatics server, so the X server on, say, your PC can't initiate a session with the usual XDMCP query. Another problem is that SeqLab exports graphics that some X servers can't handle. The following procedure solves both problems:
Start your X server, but do not start any sessions.
Make sure bioinf.uthscsa.edu is in the list of authorized X-hosts.
In a separate window, run your ssh client, log in to bioinf, and start command-line GCG.
At bioinf: export DISPLAY=<your IP>:0 (tcsh users: setenv DISPLAY <your IP>:0)
Use the Linux KDE window manager to manage your local X resources. Start KDE by typing "kde" at the bioinf command prompt.
KDE will render SeqLab's output and export the result to your PC, which simplifies the task for your X server. The other KDE desktop functions will also be available; for example, you could open Netscape from the KDE desktop to interact with NCBI, so that downloads go straight into your bioinf GCG directory.
Look in your X server window for the KDE desktop.
On the KDE desktop, open a terminal window and type "seqlab"; SeqLab should pop up directly on your desktop. You can also open another terminal window and issue GCG text commands.
Be aware that with this method your X data travel unencrypted. In principle, a hacker might try to gain control of your machine by spoofing bioinf's IP address. More likely, if you use the KDE window to log in to other computers, your usernames and passwords could be intercepted, since they are sent unencrypted.
Alternatively, if you have a Unix computer, you can tunnel X through the secure shell connection. This is the preferred method, because the X traffic is encrypted along the entire path. When starting your local ssh client, simply add the "-X" option; ssh will then tunnel all X traffic through the secure shell connection and set DISPLAY on bioinf for you, so skip the export DISPLAY step above. Example:
ssh -X <username>@bioinf.uthscsa.edu
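Whether your X traffic is tunneled can be read off the DISPLAY variable on bioinf, which has the form host:display[.screen]. The Python sketch below assumes an OpenSSH server with its default settings (X11DisplayOffset 10, X11UseLocalhost yes), under which a forwarded display looks like localhost:10.0; other ssh servers may choose different values, and the function names are hypothetical.

```python
import re
from typing import NamedTuple

class Display(NamedTuple):
    host: str    # empty string means a local connection on the same machine
    number: int  # display number: the 0 in "<your IP>:0"
    screen: int  # screen number, 0 when omitted

# host:number[.screen]; assumes a host name or IPv4 address (no IPv6 colons).
_DISPLAY_RE = re.compile(r"^(?P<host>[^:]*):(?P<number>\d+)(?:\.(?P<screen>\d+))?$")

def parse_display(value: str) -> Display:
    """Split an X DISPLAY string into host, display number, and screen."""
    m = _DISPLAY_RE.match(value)
    if m is None:
        raise ValueError(f"not an X DISPLAY value: {value!r}")
    return Display(m["host"], int(m["number"]), int(m["screen"] or 0))

def looks_ssh_forwarded(value: str, offset: int = 10) -> bool:
    """Heuristic, assuming OpenSSH defaults: forwarded displays are localhost:10 and up."""
    d = parse_display(value)
    return d.host == "localhost" and d.number >= offset
```

For example, 192.0.2.15:0 (a placeholder address in the <your IP>:0 form used in the KDE procedure) parses to host 192.0.2.15, display 0, screen 0 and is not flagged as forwarded, while localhost:10.0 is. If DISPLAY looks like the latter after ssh -X, overriding it with <your IP>:0 would send the X traffic outside the tunnel.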
As of 4/2/02, these are the versions of the local GCG databases installed:
Nucleic acid:
  GenBank Release 117.0 (04/2000): EST and non-HTG divisions
  EMBL (Abridged) Release 62.0 (03/2000): "Abridged" means no HTG divisions and possibly some filtering out of redundancies with GenBank
Protein:
  GenPept Release 117.0 (04/2000)
  PIR-Protein Release 64.0 (03/2000)
  NRL_3D Release 27.0 (03/2000)
  SWISS-PROT Release 36.0 (07/1998)
  SP-TREMBL Release 12.0 (11/1999)
Other:
  PROSITE Release 15.0 (07/1998)
  Restriction Enzymes (REBASE) (05/2000)
Not present, although the command-line version claims they are:
  GenBank HTG Release 117.0 (04/2000)
  Pfam Release 5.5 (09/2000)
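How far the installed databases lag can be read directly off the release dates. The Python sketch below counts whole calendar months from each release month to the date of this memo (4/2/02), ignoring the day of the month; the dictionary and function names are illustrative only.

```python
from datetime import date

# Release (month, year) for each installed database, copied from the list above.
RELEASES = {
    "GenBank 117.0": (4, 2000),
    "EMBL (Abridged) 62.0": (3, 2000),
    "GenPept 117.0": (4, 2000),
    "PIR-Protein 64.0": (3, 2000),
    "NRL_3D 27.0": (3, 2000),
    "SWISS-PROT 36.0": (7, 1998),
    "SP-TREMBL 12.0": (11, 1999),
    "PROSITE 15.0": (7, 1998),
    "REBASE": (5, 2000),
}

MEMO_DATE = date(2002, 4, 2)  # date of this memo

def months_old(month: int, year: int, today: date = MEMO_DATE) -> int:
    """Whole calendar months between a release month and `today`."""
    return (today.year - year) * 12 + (today.month - month)

if __name__ == "__main__":
    # Oldest first.
    by_age = sorted(RELEASES.items(), key=lambda kv: months_old(*kv[1]), reverse=True)
    for name, (m, y) in by_age:
        print(f"{name:22s} {months_old(m, y):3d} months old")
```

By this count the newest entry (REBASE, 05/2000) is 23 months old, GenBank and GenPept are 24, and SWISS-PROT and PROSITE are 45.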
For technical support, see: http://www.accelrys.com/support/
Online help: % genhelp or http://www.accelrys.com/support/bio/genhelp/