MARDIGRAS-Matrix Analysis of Relaxation for DIscerning the Geometry of an Aqueous Structure
MARDIGRAS-Matrix Analysis of Relaxation for DIscerning the Geometry of an Aqueous Structure

MARDIGRAS-Matrix Analysis of Relaxation for DIscerning the Geometry of an Aqueous Structure

Table of Contents

mardigras (Matrix Analysis of Relaxation for DIscerning the Geometry of an Aqueous Structure)

GO TO MARDIGRAS VERSION 3.2

SYNOPSIS VERSION 3.0

mardigras [input-file]

DESCRIPTION

mardigras (Matrix Analysis of Relaxation for DIscerning the Geometry of an Aqueous Structure)
is a FORTRAN program for calculating proton-proton distances from cross-peak intensities measured from a 2D NOE experiment.

The general approach
A model structure is used to generate a theoretical 2D NOE spectrum. Wherever possible, experimental intensities are substituted into the theoretical spectrum to yield a full spectrum (hybrid intensity matrix) suitable for direct solution of the relaxation rate matrix R and, subsequently, the distance matrix. Iterative steps require that (a) crossrelaxation rates between protons with known interproton distance are not allowed to change; (b) diagonal and offdiagonal elements are required to be consistent; and (c) cross-relaxation rates corresponding to resolved and observed cross-peaks only are allowed to change. This process incorporates all the effects of network relaxation and multiple spin effects.

Programs for structure elucidation by NMR
The mardigras and corma programs are two separate programs for structure elucidation by nmr. newhyd prepares a pdb file for input to corma.in. If the user has the coordinates in the PDB format for heavy atoms but not protons, it supplies the missing protons and their coordinates; it is a stand alone program. corma.in prepares the output pdb file from newhyd for input to mardigras or corma. mardigras uses the output pdb file from corma.in for the iterative relaxation matrix calculation.

New features
The programs mardigras, and corma have been modified to incorporate (i) an improved computation of distances when averaging due to motion or overlap of methyl groups, methylene proton pairs or symmetry-equivalent aromatic ring protons exist, (ii) internal motions as modeled by a `model-free' approach, and (iii) chemical exchange described by a kinetic matrix of exchange rates. corma.in has also been modified to include internal motion parameters.

Calculation of distances for methyl, methylene or symmetryequivalent aromatic protons.

One of the most important additions to the mardigras or corma algorithm lies in the calculation of distances involving methyl groups. Methyl rotation can be modeled in one of four ways by mardigras (see METHYL JUMP below under INPUT). For all models except model 0, distances are calculated iteratively to a pseudoatom at the geometric center of the three methyl protons. Geometric parameters in models 1, 2 and 3 are taken from the input model coordinates and are assumed to be correct. One should therefore have some degree of confidence in the relative orientations of methyl groups in the input model before using one of these more accurate methods of calculating distances to methyl pseudoatoms. The iterative fitting procedure, then, is simply a one-parameter (viz. distance) fit. (One could conceivably perform a multi-parameter fit over methyl orientations as well, but this would not yield a unique solution.) The advantage of this technique is that, when using mardigras for structure refinement, methyl distances can be calculated very accurately, and no additional uncertainty need be added to calculated distance bounds. For model 0, inter-residue distances to methyls (and other pseudoatoms) are calculated for the two extreme geometric cases. These extremes define the upper and lower bounds on possible distances. This will prevent unwarranted restraint on distances when the modelstructure geometry is unreliable.

A similar procedure is followed for distances to pseudoatoms which represent either (1) a methylene proton pair or (2) symmetry-equivalent aromatic ring protons. These protons are chemically equivalent, or sufficiently close such that their cross-peaks are not resolvable. In both cases we assume that motion is occurring which is slow relative to the overall tumbling time (and thus does not contribute to relaxation) but fast relative to the relaxation times. Effective pseudoatom distances are obtained by r-6 averaging over the two actual distances. This approach is rigorous for aromatic ring flip but not for methylene pairs. The equivalence (often approximate) of methylene chemical shifts is not necessarily a result of their mobility, although the assumption of motion may be better for protein side-chains and other freely-rotatable methylenes. In some cases, namely for constrained methylenes in DNA and Proline, it may be a better approximation to treat them as static atoms. Note that this r-6 averaging reduces to the same interaction as the case for static methylene protons in the limit of isolated spins. All of the error is introduced in the term for transfer of magnetization between the two methylene protons. In this model, magnetization is transferred perfectly, since the two atoms are collapsed into a single pseudoatom. The result of this treatment will be an error in the calculation involving local spin-diffusion effects. The error is only in a single spin-pair, and the results of preliminary tests indicate that the the overall error introduced is minor. Consider a methylene pair near two other protons A and B. The error introduced is at its maximum when the four protons are in line and at the closest possible distance. The error is zero when the intra-methylene axis is perpendicular to the A-B axis.

Nevertheless, the pseudoatom approach is the simplest reasonable model, since we have only a single experimental cross-peak and are justified in deriving only one distance. If the model geometry is correct, then we can extract the proper distances from the effective distance (using an iterative fitting procedure similar to that for methyls). If the model geometry is poor, we can at least calculate error bounds on that distance assuming extreme-case geometries. This is what is done. If methyl jump model 0 is chosen, the reported distance is the effective average distance accounting for all surrounding protons, with upper and lower bounds adjusted accordingly. Models 1, 2 and 3 assume that the model geometry should be taken to be correct.

Model-free approach to internal motion
When the details of internal motion are not known, the model-free approach is useful. This model is now one of the options for the calculation of the relaxation matrix. The spectral density Jmodfree for the model-free model is defined in terms of the frequency C, anisotropy parameter A, order parameter S2, overall correlation times I1 and I2, and the internal correlation time Ie as

Jmodfree(C,A,S2,I1,I2,Ie)
=AS2J(I1,C)+(1-A)S2J(I2,C)+A(1-S2)J(Ip,C)+(1-A)(1-S2)J(Ipp,C)

where
J(In,C)=In/(1+(InC)2)

Ip=IeI1/(Ie+I1)

and
Ipp=IeI2/(Ie+I2)

In the program the ratio I2/I1 is used. For I2/I1=1, or A=1, isotropic overall motion is obtained. If the order parameter S2 is set to 1, the internal motion is ignored. If both A and S2 are set to one, the result is isotropic overall motion with no internal motion. The `model-free' approach is same as the well-known `wobble-in-a-cone' model. (Ref.: Lipari, G. and Szabo, A. J. Am. Chem. Soc. , 104, 4546, 4559 (1982))

To use model free approach, the input PDB file has to be prepared using corma.in with model free option. The modified PDB file contains the key word and parameters that are required by model free approach. The order parameters are not included in the PDB file because they are defined for proton pairs, not for individual protons. A seperate input file with the same format as the intensity file is required for the order parameters. It is not necessary to have a complete list of order parameters in the file. The default value is 1.0 for all proton pairs. Only those order parameters which are different from 1.0 need to be included in the order parameter file.

The order parameter may be assessed via molecular dynamics calculations (Ref.: Koning, T.M.G. et al., Biochemistry, 30, 3787-3797 (1991)) or via experimental determination.(Ref.: Lane, A.N., J. Magn. Reson., 78, 425-439 (1988)) Of course, if the order parameter is less than one but identical for all proton resonances, then the relative cross-peak intensities will not be altered.

Quality factors
We added quality factors Q, Q2, Qx and Q2x to mardigras/corma output. The quality Q factor is described in J. M. Withka, J. Srinivasan and P. H. Bolton, J. Magn. Reson. 98, 611-617 (1992). The sixth root quality factors, Qx and Q2x, are obvious extrapolations from sixth root R factors.

Chemical exchange described by kinetic matrix Exchange with bulk solvent protons and exchange between two amino protons are take into account in the iterative procedure of generating distances from a hybrid NOE intensity matrix. NOE intensities of exchangeable protons are reduced by the exchange with solvent protons. Neglect of such effects produces the upper bounds for the distances. Using the upper limit of the exchange rates gives the lower bounds for the distances. Of course, the actual values of the exchange rates can be utilized if the data are available. When the two amino protons are unsolvable, a mean distance is calculated iteratively from the relaxation rate based on r-6 averaging.

The relaxation matrix is modified by adding exchange matrices. The exchange with solvent contributes to direct relaxation, and is described by a diagonal matrix K. The element Kii equals exchange rates for exchangeable protons, and zero for nonexchangeable protons. A second exchange matrix K' is formed for proton pairs which exchange the positions. If two protons, i and j, are exchanging positions with each other with a rate k, then K'ij = K'ji = -k, and K'ii = K'jj = k. Zero is assigned to K' elements if proton pair exchange is not involved. (See, J. Jeener, B. H. Meier, P. Bachmann and R. R. Ernst, J. Chem. Phys. 71, 4546 (1979).) The modified relaxation matrix is then R' = R+K+K'.

Note that if the option ARBITRARY is specified then the modified relaxation matrix is R' = R+K and the exchange matrix K is read as is from the exchange file and is not modified in any manner..

Peak Integration Errors
The percentage errors in the intensities due to peak integrations may be included as an additional column in the intensity file. The format is free, but a key word ERROR% is required at the top of the column, see example1.INT.1. Note, peak integration errors in the intensity file are not used in corma. They are used in mardigras, together with the signal to noise level given in the PARM file, to determine the upper and lower distance bounds (in output file prefix.BNDS).

INPUT

Under UNIX mardigras maybe invoked in two ways:

system-prompt:mardigras input-file

or if a file with name INP.PARM is present in your working directory.

system-prompt:mardigras

If the command line arguments are not allowed on your system, then the second form with the default input file name INP.PARM must be used.

Parameters for mardigras
The INP.PARM (default name for the input-file) or an equivalent file is a required file. It contains the required and optional parameters for mardigras. The following is an example of input-file INP.PARM.example2 for mardigras:

is the INP.PARM.example2 input file for mardigras. Comments may be put anywhere but must not begin with keywords. All keywords are upper case. In the following the lines beginning with the keywords PDB, INT and OUT are required.
1. The input pdb file prepared by corma.in: PDB FILE example2.pdb
2. The input intensity file:
INT FILE example2.INT.1
3. The prefix for the output files:
OUT FILE test52
4. The file containing the kinetic exchange rate matrix and is optional:

EXCHANGE RATE FILE example2.exch
5. The exchange rates are arbitrary.(optional specification): ARBITRARY
6. The spectrometer frequency in MHz.(optional specification, the default is 500.0 MHz):
FREQUENCY 600.0
7. DELTA6D is the minimum value of the shift in the sixth root R factor.
(optional specification, the default is 0.0005): DELTA6D 0.0
8. R6THD is the minimum value of the sixth root R factor. (optional specification, the default is 0.0005): R6THD 0.0
9. MINITN is the minimum number of iterations. (optional specification, the default is 2): MINITN 3
10. MAXITN is the maximum number of iterations. (optional specification, the default is 5): MAXITN 8
11. The NOISE level.
(optional specification, the default is 0): NOISE ABSOLUTE NORMALIZED 0.0025
12. The cross-peak intensity normalization. (optional specification):
NORMALIZE ALL
13. The methyl jump model.
(optional specification):
METHYL JUMP 3

As seen above the INP.PARM (default name for the input-file) is a self-documenting file. However, we provide a more detailed description of some of the contents of this file. As specified in this file, mardigras requires the following files as input:

Input coordinate file in the PDB format A template file example2.pdb is provided. The format of this file is explained more fully in the corma 4.0 program man pages. It must be preprocessed by corma.in.

Input intensity file, again in the same format as for corma. The template file is example2.INT.1. Refer to the corma man pages for more details.

Input exchange rate file, again in the same format as for corma.

Input and output file names cannot exceed 20 characters. Only one space must separate the words and numbers in the input-file. The coordinate file must be specified with the character string:

PDB FILE pdbfilename

The experimental intensity file must be specified by the string:

INT FILE intfilename.INT.1

The characters ".INT.1" must appear at the end of the file name.

The exchange rate file must be specified by the string:

EXCHANGE RATE FILE exchfilename

The exchange rates are in [1/sec].

The prefix for output intensity files must be specified by the string

OUT FILE prefix

All output file names will begin with this string.

The minimum number of iterations for mardigras to cycle through. This number is entered as the string:

MINITN n

where n is an integer.

The maximum number of iterations for mardigras to cycle through. This number is entered as the string:

MAXITN n

where n is an integer.

An estimate of experimental noise level. This is taken to be an absolute error in experimental intensities. When mardigras calculates errors in distances, it takes into account this absolute error as well as the difference in the fit between each experimental NOE intensity and the corresponding intensity calculated from the converged relaxation matrix.

There are three options for entering the noise level. In the input file, the appropriate keywords must precede the noise level estimate. Here are examples of the input for each option:

NOISE ABSOLUTE NORMALIZED 0.0025

The keyword ABSOLUTE indicates that the noise is an absolute, and not a relative quantity. The keyword NORMALIZED means that this value has been normalized so that the 2D NOE intensity of a diagonal peak extrapolated to mixing time 0 is equal to 1.0. So in this case, the noise was estimated to be 0.25% of the diagonal peak intensity at mixing time 0.

NOISE ABSOLUTE UNNORMALIZED 11000.

The keyword UNNORMALIZED means that the value is in the same units as the input experimental intensities. This value, then, will depend on the integration routine used to calculate peak volumes. This is probably the most useful option, since one can get at least a reasonable estimate of the spectral noise level by looking at the size of the smallest observable cross-peak. The noise level is some fraction of this smallest peak intensity.

NOISE RELATIVE PERCENT 1.0

The keyword RELATIVE means that the noise should be considered as a constant percentage of the cross-peak intensity. That is, the error in a given cross-peak is proportional to its magnitude. This is rarely applicable.

How mardigras should normalize the input experimental intensities relative to the calculated absolute values. There are three choices:

Omit the keyword line completely. This option should be chosen if the normalization has already been carried out by some other means. Intensities should be normalized to be a fraction of the diagonal peak intensity (of a single proton) at mixing time 0.

NORMALIZE ALL

The keyword NORMALIZE means that the experimental intensities need to be normalized. The keyword ALL means that the normalization factor should be computed from the sums over all cross-peaks which have observed experimental intensities as follows:

fnorm = ||||R IexptijR Icalcij_______||| This is the recommended option when the starting model is reasonably good.

NORMALIZE FIXED

The keyword FIXED tells mardigras to perform the normalization summation only over those experimental intensities which correspond to fixed distances, e.g., methylene proton distances and aromatic ring proton distances. This option is recommended when the starting model is very poor, e.g., an extended chain structure of a protein.

How mardigras should treat methyl protons. The treatment of methyl protons is specified by

METHYL JUMP n

where n is an integer. The keyword METHYL refers to treatment of methyl rotors. The keyword JUMP indicates that the methyl rotors should be treated as rotating fast compared to the overall molecular tumbling rate. The keyword JUMP must be followed by an integer from 0 to 3. Each value corresponds to a different method of calculating distances to a methyl pseudoatom. In all cases the distance is calculated to a pseudoatom which is the geometric mean position of the three methyl protons. This distance is found by an iterative fitting procedure, and therefore requires no additional adjustment to distance bounds.

JUMP 0: This model calculates intra-residue distances using a 3-site jump model. It assumes that the geometry of the residue itself as given in the model PDB file is essentially correct regardless of the final conformation of the molecule. Therefore, the geometric dependence of methyl relaxation can be calculated for intra-residue NOE cross-peaks. It assumes, however, that absolutely nothing else is known about the molecular conformation. In this case, the methyl orientation relative to any other group is not known, and the distance is calculated according to a simple isotropicmotion only model. This distance has been shown to be very close to the value obtained by averaging over all possible methyl-group orientations. Thus we recommend using this model when the starting model-structure is very poor.

JUMP 1: This model calculates intra-residue distances using the 3-site model as above. Inter-residue distances are calculated according to an 18-site jump, which is a good approximation to free rotational diffusion. While this model is clearly not appropriate for methyl rotation, where the staggered conformation is highly favored, it is probably the best model to use when the model is of intermediate quality. In these cases, approximate relative positions are known. The model angle between the methyl plane of rotation and the other protons is therefore a better approximation than averaging over all possible angles. Nevertheless, the model may not be accurate enough to consider the actual distribution of protons in the plane to be accurate. In this case, it makes sense to average over all possible positions of the methyl protons in the circle described by their rotation-- i.e we should use free rotational diffusion.

JUMP 2: This model calculates both intra- and interresidue methyl distances according to an 18-site jump model. This model is appropriate when the starting model does not have methyl protons placed consistently in energetically favorable (i.e. generally eclipsed) positions. A three-site model assumes that the protons spend all of their time in one of three allowed positions. If this assumption is not appropriate, we should average over all possible positions (wellapproximated by an 18-site model).

JUMP 3: This model calculates both intra- and interresidue methyl distances according to a 3-site jump model. This is the most realistic and precise model, and is recommended when the starting model-structure is good.

The input parameter lines described above can be entered in any order into the file INP.PARM. The maximum number of lines allowed in this file is to 99.

OUTPUT

The program creates one output file for each iteration cycle, plus five other groups:

prefix.DSTn prints out all distances calculated for iteration cycle n. It prints out experimental and calculated intensities for comparison, as well as a comparison of model and mardigras distances. Because it is fairly timeconsuming, iterative distance fits for methyl distances are only calculated for the final cycle, and reported distances for earlier cycles are simple isotropic-motion-only estimates.

prefix.OUT prints out various informational messages.Particularly important are methylene and aromatic ring pseudoatoms found that have been recognized in the input intensity file. Very importantly, this file reports when an atom name in the input intensity file is not recognized. This is often indicative of a nomenclature difference between the PDB file and the intensity file or of a failure to follow pseudoatom name-formation rules (see corma program man pages). This file also contains the proton pairs which have been classified as fixed-distance due to connectivity constraints (for a list of possible fixed distances, see the file constrain.dat).

prefix.BNDS (used to be called prefix.DG) prints out distance constraints suitable for distance geometry or flatwell restrained molecular dynamics input. The particular format is for VEMBED, a version of distance geometry developed at UCSF. The three columns of distances are upper bound, lower bound and actual distance calculated by mardigras. The upper and lower deviations are not generally symmetric since measured errors are in intensities (not distances) and the transformation is sixth power (not linear) with intensity. The distance constraints are calculated by the worst case analysis based on motional averaging values, noise and relative fitting of experimental and converged intensity values.

prefix.CNSTRNT prints out parameters for constraints in

molecular dynamics simulations.These consist of distances of minimum energy and the force constant for a parabolic potential pseudoenergy curve for deviations from this distance. The smaller the estimated error in distance, the larger the value of this constant. The constant is currently scaled to a yield an energy of 1/2kT when the displacement is equal to the estimated distance error. Preliminary results suggest that this is probably too high. Alternatively, a flat well might be allowed over the region of uncertainty. The format for this file is compatible with AMBER.

Note: The prefix.BNDS and prefix.CNSTRNT files may have to be reformatted for input to VEMBED or AMBER since the input format requirements of these programs depend on the particular version available to the user.

prefix.ERRORS prints out any errors which may have occurred during the run. Especially importantly, mardigras prints out a list of any proton pairs which have observed intensities, but to which mardigras was unable to assign a new relaxation rate (and hence a new distance). This seems to occur sometimes when the model is poor over a region which is underdetermined by experimental data-- the resulting inconsistencies may cause physically meaningless rates to appear upon back-transformation from intensities. After the initial efforts at structure refinement, an improved starting model may be available which can eliminate many from this errors list.

prefix.DSTREJ and prefix.BNDSREJ (used to be called prefix.DGREJ) which print out rejected distances. Rejected distances are those for which the relaxation rate is very small -- corresponding to about 5 A -- and thus the distances have high uncertainty.

RECOMMENDED PROCEDURE FOR mardigras

mardigras is an NOE-derived distance refinement program, and must therefore be used in conjunction with other computational techniques from generating and refining geometrically-consistent molecular structures from distance information. The two computations particularly suited to this purpose are distance geometry and restrained molecular dynamics. We have published several studies of nucleic acid structure using mardigras derived distance. Recently, we performed a detailed study on the applicability of mardigras to real problems of protein structure determination. mardigras has been shown to generate accurate distances from a realistically limited and noisy calculated NOE spectrum. Starting from randomized and extended-chain starting modelstructures and using only distance geometry, a set of structures was produced which showed an impressive fit to the data and the actual molecular structure for which the spectrum was calculated. Yet it was also noted, not surprisingly, that mardigras performed better given a better starting structure. Therefore we propose the following general procedure.

Lacking an X-ray or homologous model structure, begin with an extended-chain structure. Run mardigras using methyl-jump model 0 (see above) to calculate distances. These distances are still much more accurate than those derived using the isolated spin-pair approximation. Put these distances, allowing for fairly large deviations estimated by mardigras, into a distance geometry algorithm to generate a family of possible structures. Then use several of these structures as starting models for a second cycle through mardigras, this time choosing either methyl-jump model 1 or 3. The spread of mardigras distances obtained for independent runs will reveal the dependence of the solution on starting model, and will generally be quite small, though less so if the data are inaccurate or misassigned or grossly incomplete. At this point one can generate more distance geometry structures or refine the earlier structures using molecular dynamics.

mardigras permits a logical means to calculate bounds, which are needed for distance geometry calculations or setting flat well size in restrained molecular dynamics calculations. There are various considerations which can be used to establish those bounds. For one thing, the distance "error bar" Wrij can be estimated from

(1) the error Waij calculated between an observed 2D NOE cross-peak intensity aij and the corresponding MARDIGRASconverged matrix element intensity, or (2) the minimum error dictated by the experimental noise level since, using the two-spin assumption,

rij ~~ (kIm)619/aij619 Wrij ~~ -(kIm)619. Waij/6aij67

where the constant k incorporates physical constants and the correlation time.

(3) distances calculated from spectra at different mixing times.

(4) distances calculated using different starting structures.

(5) in the case of motional or overlap averaging, worst case geometries enable calculation of maximum and minimum distances.

(6) distances involving exchangeable protons (imino, amino, amide) an upper limit on the exchange rate with bulk water modifies the lower bound.

This procedure for using mardigras is, of course, just a general suggestion based on our experience with the program. It is a relatively simple computational method for generating molecular structures (for roughly globular and globally rather rigid molecules) from NOE data. The primary assumptions behind mardigras are (1) that the overall tumbling is isotropic (prolate ellipsoids with eccentricity of less than 2.5 should be fine), and (2) that, aside from small local motions such as methyl rotation and aromatic ring flip, the molecule is rigid and interproton distances can be described by a single number. However, by using the model-free approach the effects of anisotropy and flexibility of the molecule can also be taken into account.

BUGS

There probably are some. This program was developed at UCSF on Sun SPARC stations with SunOS Release 4.1.3 UNIX operating system. It has not been fully tested on any other system and may therefore experience machine- or implementationdependent problems.

AUTHORS:

Brandan A. Borgias
Paul D. Thomas
He Liu
Anil Kumar

REFERENCES

The general algorithm is described in B. A. Borgias and T. L. James, J. Magn. Reson. 87 (1990) 475-487 and B. A.

Borgias and T. L. James, Meth. Enzymol. 176 (1989) with features described in B.A. Borgias, M. Gochin, D.J. Kerwood, and T.L. James, In Progress in Nuclear Magnetic Resonance Spectroscopy, J.W. Emsley, J. Feeney, and L.H. Sutcliffe (Eds), Oxford: Pergamon Press, 22, 83-100 (1990), T. L. James, Curr. Opin. Struct. Biol., 1, 1042-1053, (1991). The R factor and the sixth root R factor are considered in P.D. Thomas, V.J. Basus and T.L. James. Proc. Nat. Acad. Sci. USA, 88, 1237-1241 (1991). The description for averaging of pseudoproton relaxation rates and distances can be found in H. Liu, P. D. Thomas and T. L. James, J. Magn. Reson. 98, 163-175 (1992). Effect of internal motion on the interproton distances is considered in A. Kumar, T. L. James, and G. C. Levy, Israel J. Chem. 32, 257-261 (1992). Application to the chemical exchange problems is described in H. Liu, A. Kumar, K. Weisz, U. Schmitz, K. D. Bishop, and T. L. James, J. Am. Chem. Soc., 115, 1590 (1993).

CONTACT

Address questions and problems to:
Thomas James
Department of Pharmaceutical Chemistry
University of California
San Francisco, CA 94143-0446
fax: 415.476.8780

e-mail james@picasso.ucsf.edu

SEE ALSO

corma, corma.in, newhyd, comparison of hydrogen naming convensions for MARDIGRAS and other programs
Table of Contents

back to the software page