1 Introduction
The Sequence Alignment and Modeling system (SAM) is a collection of
software tools for creating, refining, and using a type of statistical
model called a linear hidden Markov model for biological sequence
analysis. Linear hidden Markov models only model primary structure
(sequence) information; long-range interactions, such as base pairing
in RNA, require more complex models such as stochastic context-free
grammars, as described by Sakakibara et al. (NAR
22(23):5112-5120), also available from the UCSC computational biology
WWW site.
The algorithms and methods have been described in several papers, some
of which are available on our WWW site.
A tutorial on the use of SAM and the iterative SAM-T98 method
(the direct predecessor of SAM-T99)
is also available at our WWW site.
SAM-T99 is used at the official SCOP Superfamily server at
http://stash.mrc-lmb.cam.ac.uk/SUPERFAMILY
The primary papers from UCSC (copies of these papers and several
others are available from the SAM WWW site) include:
- R. Hughey and A. Krogh.
Hidden Markov models for sequence analysis: Extension and analysis of
the basic method,
CABIOS, 12(2):95-107, 1996.
- K. Karplus, C. Barrett, and R. Hughey,
Hidden Markov Models
for Detecting Remote Protein Homologies,
Bioinformatics, 14(10):846-856, 1998.
- A. Krogh, M. Brown, I. S. Mian, K. Sjölander, and D. Haussler.
Hidden Markov models in computational
biology: Applications to
protein modeling.
Journal of Molecular Biology, 235:1501-1531, February 1994.
- K. Karplus, R. Karchin, C. Barrett, S. Tu, M. Cline,
M. Diekhans, L. Grate, J. Casper, and R. Hughey,
``What is the value added by human intervention in protein structure
prediction?,''
Proteins: Structure, Function, and Genetics, 2001.
- R. Wheeler and R. Hughey,
Optimizing Reduced Space Sequence
Analysis,
Bioinformatics, 16(12):1082-1090, 2000.
- K. Karplus, C. Barrett, M. Cline, M. Diekhans, L. Grate, and R. Hughey,
``Predicting Protein Structure using Only Sequence Information,''
Proteins: Structure, Function, and Genetics,
Supplement 3:121-125, 1999.
- J. Park, K. Karplus, C. Barrett, R. Hughey, D. Haussler, T. Hubbard,
and C. Chothia,
Sequence Comparisons Using Multiple Sequences Detect
Twice as many Remote Homologues as Pairwise Methods,
Journal of
Molecular Biology, 284(4):1201-1210, 1998.
- R. Karchin and R. Hughey.
Weighting hidden Markov models for maximum discrimination.
Bioinformatics, 14(9):772-782, 1998.
- C. Tarnas and R. Hughey.
Reduced space hidden Markov model training.
Bioinformatics, 14(5):401-406, 1998.
- K. Karplus, K. Sjölander, C. Barrett, M. Cline,
D. Haussler, R. Hughey, L. Holm, and C. Sander,
``Predicting protein structure using hidden Markov
models,'' Proteins: Structure, Function, and Genetics,
Supplement 1, 1997.
- K. Sjölander,
K. Karplus, M. Brown, R. Hughey, A. Krogh, I.S. Mian, and D. Haussler.
Dirichlet Mixtures: A Method for Improving
Detection of Weak but Significant Protein Sequence Homology.
CABIOS
12(4):327-345, 1996.
- C. Barrett, R. Hughey, and K. Karplus.
Scoring Hidden Markov Models.
CABIOS 13(2):191-199, 1997.
- J. A. Grice, R. Hughey, and D. Speck.
Reduced space sequence alignment.
CABIOS 13(1):45-53, 1997.
- D. Haussler, A. Krogh, I. S. Mian, and K. Sjölander.
Protein modeling using hidden Markov models: Analysis of globins.
In Proceedings of the Hawaii International Conference on System
Sciences, volume 1, pages 792-802, Los Alamitos, CA, 1993. IEEE Computer
Society Press.
- A. Krogh, I. S. Mian, and D. Haussler.
A hidden Markov model that finds genes in E. coli DNA.
Nucleic Acids Research, 1994.
- R. Hughey and A. Krogh.
SAM: Sequence alignment and modeling software system.
Technical Report UCSC-CRL-96-22, University of California,
Santa Cruz, CA, September 1996.
- K. Karplus. Regularizers for Estimating Distributions of
Amino Acids from Small Samples.
Technical Report UCSC-CRL-95-11, University of California,
Santa Cruz, CA, 30 March 1995.
We would appreciate references to the first article in all
work that cites or uses the SAM system, the second for all work that
cites or uses the SAM-T99 method, and the third article in work
that cites or uses HMM methods similar to SAM.
Because the software is an active research tool, there is a vast
selection of options, many of which have, through experimental study,
been set to reasonable defaults.
The SAM software and documentation copyright is held by the Regents of
the University of California. A signed license is required to obtain
a copy of SAM, downloadable from the SAM WWW site,
with no fee for educational research use. If you have
suggestions for enhancements, new ways of using SAM, or other
comments, please contact us.
SAM incorporates the readseq package by D. G. Gilbert, who allows it
to be freely copied and used. The hmmedit and sae
programs use ACEdb by Richard Durbin and Jean Thierry-Mieg. The
source code for hmmedit and sae is available from
ftp://ftp.cse.ucsc.edu/pub/protein/hmmeditsaesrc.tar.Z.
SAM includes the BLAST matrix library for use with SAM's Smith and
Waterman implementation. This work of the U. S. Government is
available at
http://www.ncbi.nlm.nih.gov/BLAST
To be informed of future releases, please send your e-mail
address to sam-info@cse.ucsc.edu for addition to our mailing
list. Please also use this address for any questions or comments you
may have.
You will also find Sean Eddy's system, HMMER, to be of interest.
Martin Madera and Julian Gough have written a Perl converter between
SAM and HMMER 2.0 formats. (The SAM programs only work with HMMER
1.7.)
We thank I. Saira Mian and Finn Drablos for their important
early evaluations of the system. Finally, we thank the entire UCSC
Computational Biology Group (now forming the core of a Center for
Biomolecular Engineering), led by David Haussler, who got this whole
thing started. This
work was supported in part by NSF grants CDA-9115268, IRI-9123692, BIR
94-08579, MIP-9423985, DBI-9808007, and EIA-9905322; DOE grants
94-12-048216 and DE-FG03-99ER62849; ONR grant N00014-91-J-1162; NIH
grant GM17129; a grant from the Danish Natural Science Research
Council; and a gift from Digital Equipment Corporation.
SAM
sam-info@cse.ucsc.edu
UCSC Computational Biology Group