Help for profile and structure alignments

By PROFILE ALIGNMENT, we mean alignment using existing alignments. Profile alignments allow you to store alignments of your favourite sequences and add new sequences to them a few at a time. A profile is simply an alignment of one or more sequences (e.g. an alignment output file from CLUSTAL W), so either profile may consist of a single sequence. One or both sets of input sequences may include secondary structure assignments or gap penalty masks to guide the alignment.

The profiles can be in any of the allowed input formats with "-" characters used to specify gaps (except for MSF-RSF where "." is used).

You have to specify the 2 profiles by choosing menu items 1 and 2 and giving 2 file names. Then Menu item 3 will align the 2 profiles to each other. Secondary structure masks in either profile can be used to guide the alignment.

Menu item 4 will take the sequences in the second profile and align them to the first profile, 1 at a time. This is useful to add some new sequences to an existing alignment, or to align a set of sequences to a known structure. In this case, the second profile would not be pre-aligned.

The alignment parameters can be set using menu items 5, 6 and 7. These are EXACTLY the same parameters as used by the general, automatic multiple alignment procedure. The general multiple alignment procedure is simply a series of profile alignments. Carrying out a series of profile alignments on larger and larger groups of sequences allows you to build up a complete alignment manually, editing intermediate alignments if necessary.

SECONDARY STRUCTURE OPTIONS. Menu Option 0 allows you to set secondary (2D) structure parameters. If a solved structure is available, it can be used to guide the alignment by raising gap penalties within secondary structure elements, so that gaps will preferentially be inserted into unstructured surface loops. Alternatively, a user-specified gap penalty mask can be supplied directly.

A gap penalty mask is a series of numbers between 1 and 9, one per position in the alignment. Each number is the factor by which the basic gap opening penalty is multiplied at that position. A mask figure of 1 at a position means no change in gap opening penalty; a figure of 4 means that the gap opening penalty is four times greater at that position, making gaps 4 times harder to open.
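The multiplication described above can be illustrated with a minimal Python sketch. The function name scaled_gap_open_penalties and the base penalty of 10.0 are hypothetical, chosen only for illustration; this is not CLUSTAL W source code.

```python
def scaled_gap_open_penalties(base_penalty, mask):
    """Return the gap opening penalty at each position of a digit mask.

    Illustrative only: each mask digit (1-9) multiplies the basic
    gap opening penalty at the corresponding alignment position.
    """
    if not mask or any(digit not in "123456789" for digit in mask):
        raise ValueError("mask must contain only the digits 1-9")
    return [base_penalty * int(digit) for digit in mask]


# Hypothetical base penalty of 10.0 and a four-position mask.
print(scaled_gap_open_penalties(10.0, "1124"))
```

A digit of 1 leaves the penalty unchanged, while a digit of 4 makes opening a gap at that position four times as costly.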

The format for gap penalty masks and secondary structure masks is explained in the help under option 0 (secondary structure options).

>>HELP B <<  Help for secondary structure - gap penalty masks

The use of secondary structure-based penalties has been shown to improve the accuracy of multiple alignment. Therefore CLUSTAL W now allows gap penalty masks to be supplied with the input sequences. The masks work by raising gap penalties in specified regions (typically secondary structure elements) so that gaps are preferentially opened in the less well conserved regions (typically surface loops).

Options 1 and 2 control whether the input secondary structure information or gap penalty masks will be used.

Option 3 controls whether the secondary structure and gap penalty masks should be included in the output alignment.

Options 4 and 5 provide the value for raising the gap penalty at core Alpha Helical (A) and Beta Strand (B) residues. In CLUSTAL format, upper-case A and B in the secondary structure notation denote core helix and strand residues. The basic gap penalties are multiplied by the amount specified.

Option 6 provides the value for the gap penalty in Loops. By default this penalty is not raised. In CLUSTAL format, loops are specified by "." in the secondary structure notation.

Option 7 provides the value for setting the gap penalty at the ends of secondary structures. Ends of secondary structures are observed to grow and/or shrink in related structures. Therefore, by default, these are given intermediate values, lower than the core penalties. All secondary structure read in as lower case in CLUSTAL format gets the reduced terminal penalty.
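As a rough illustration of how the four values set by options 4 to 7 turn a structure string into a penalty mask, here is a minimal Python sketch. The function name structure_to_mask is hypothetical, and the default values (4 for helix and strand cores, 2 for termini, 1 for loops) are assumptions chosen to agree with the HBA_HUMA example in this help; check the program menu for the actual settings.

```python
def structure_to_mask(structure, helix=4, strand=4, loop=1, terminal=2):
    """Convert a CLUSTAL-style structure string into a gap penalty mask.

    'A' / 'B' are core helix / strand residues, 'a' / 'b' are the
    ends of those elements, and '.' is a loop. The numeric defaults
    are assumptions for illustration, not verified program defaults.
    """
    values = {"A": helix, "B": strand, "a": terminal, "b": terminal, ".": loop}
    try:
        return "".join(str(values[symbol]) for symbol in structure)
    except KeyError as err:
        raise ValueError(f"unexpected structure symbol: {err.args[0]!r}") from None


print(structure_to_mask("..aaaAAAAaaa."))
```

Any character other than A, B, a, b or "." is rejected rather than silently treated as a loop.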

Options 8 and 9 specify the range of structure termini for the intermediate penalties. In the alignment output, these are indicated as lower case. For Alpha Helices, by default, the range spans the end helical turn. For Beta Strands, the default range spans the end residue and the adjacent loop residue, since sequence conservation often extends beyond the actual H-bonded Beta Strand.
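The effect of the terminal ranges can be sketched in Python as follows. This is only an illustration: the function name mark_termini is hypothetical, and the counts used in the example (3 positions within and 0 outside for helices, 1 within and 1 outside for strands) are assumptions inferred from the description above rather than values read from the program.

```python
import re


def mark_termini(core, letter, within, outside):
    """Lower-case the ends of each run of `letter` in a structure string.

    `core` uses upper-case letters for structure and '.' for loops.
    `within` positions at each end of a run, and up to `outside`
    adjacent loop positions, are rewritten in lower case.
    """
    chars = list(core)
    low = letter.lower()
    for match in re.finditer(re.escape(letter) + "+", core):
        start, end = match.start(), match.end()  # end is exclusive
        # Positions inside the element, counted from each end.
        for i in range(start, min(start + within, end)):
            chars[i] = low
        for i in range(max(end - within, start), end):
            chars[i] = low
        # Loop positions immediately outside the element.
        for i in range(max(start - outside, 0), start):
            if chars[i] == ".":
                chars[i] = low
        for i in range(end, min(end + outside, len(chars))):
            if chars[i] == ".":
                chars[i] = low
    return "".join(chars)


print(mark_termini("..AAAAAAAAAA..", "A", within=3, outside=0))
print(mark_termini("..BBBB..", "B", within=1, outside=1))
```

Elements shorter than twice the `within` count end up entirely in lower case, which matches the idea that very short elements have no well-defined core.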

CLUSTAL W can read the masks from SWISS-PROT, CLUSTAL or GDE format input files. For many 3-D protein structures, secondary structure information is recorded in the feature tables of SWISS-PROT database entries. You should always check that the assignments are correct - some are quite inaccurate. CLUSTAL W looks for SWISS-PROT HELIX and STRAND assignments e.g.

FT   HELIX       100    115
FT   STRAND      118    119
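To make the feature-table convention concrete, the following Python sketch extracts HELIX and STRAND ranges from lines laid out like the two above and paints them onto a structure string. The function names are hypothetical, and the parser handles only the whitespace-separated layout shown here; it is not a general SWISS-PROT reader and does not reproduce how CLUSTAL W itself parses entries.

```python
def parse_ft_structure(lines):
    """Collect (kind, start, end) tuples from FT HELIX / FT STRAND lines."""
    elements = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 4 and fields[0] == "FT" and fields[1] in ("HELIX", "STRAND"):
            elements.append((fields[1], int(fields[2]), int(fields[3])))
    return elements


def structure_string(elements, length):
    """Build a structure string: 'A' for helix, 'B' for strand, '.' elsewhere."""
    chars = ["."] * length
    for kind, start, end in elements:
        symbol = "A" if kind == "HELIX" else "B"
        # Feature positions are 1-based and inclusive.
        for pos in range(max(start, 1), min(end, length) + 1):
            chars[pos - 1] = symbol
    return "".join(chars)


entry = [
    "FT   HELIX       100    115",
    "FT   STRAND      118    119",
]
elements = parse_ft_structure(entry)
print(elements)
print(structure_string(elements, 120)[95:])
```

Printing the parsed ranges alongside the sequence is one simple way to carry out the check recommended above before trusting the assignments.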

The structure and penalty masks can also be read from CLUSTAL alignment format as comment lines beginning "!SS_" or "!GM_" e.g.

!SS_HBA_HUMA    ..aaaAAAAAAAAAAaaa.aaaAAAAAAAAAAaaaaaaAaaa.........aaaAAAAAA
!GM_HBA_HUMA    112224444444444222122244444444442222224222111111111222444444
HBA_HUMA        VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGK

Note that the mask itself is a set of numbers between 1 and 9 each of which is assigned to the residue(s) in the same column below.
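As an illustration of the comment-line convention, this Python sketch collects the "!SS_" and "!GM_" lines from the body of a CLUSTAL-format alignment. The function name read_masks and the sequence name SEQ1 are hypothetical; the sketch assumes that mask lines, like sequence lines, may be split across several alignment blocks and should be concatenated in order.

```python
def read_masks(lines):
    """Return (structure, penalty) dicts keyed by sequence name.

    Lines whose name starts with '!SS_' hold secondary structure
    masks; lines starting with '!GM_' hold gap penalty masks.
    Pieces from successive alignment blocks are concatenated.
    """
    structure, penalty = {}, {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line or a line without mask data
        name, data = fields[0], fields[1]
        if name.startswith("!SS_"):
            key = name[len("!SS_"):]
            structure[key] = structure.get(key, "") + data
        elif name.startswith("!GM_"):
            key = name[len("!GM_"):]
            penalty[key] = penalty.get(key, "") + data
    return structure, penalty


alignment = [
    "!SS_SEQ1    ..aaAA",
    "!GM_SEQ1    112244",
    "SEQ1        VLSPAD",
    "",
    "!SS_SEQ1    AAaa..",
    "!GM_SEQ1    442211",
    "SEQ1        KTNVKA",
]
ss, gm = read_masks(alignment)
print(ss)
print(gm)
```

Each mask ends up the same length as the aligned sequence it annotates, so position i of the mask applies to column i of that sequence.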

In GDE flat file format, the masks are specified as text and the names must begin with "SS_" or "GM_".

Either a structure or penalty mask or both may be used. If both are included in an alignment, the user will be asked which is to be used.