mardigras (Matrix Analysis of Relaxation for DIscerning the Geometry of an Aqueous Structure)

Version 3.2.0

Description

mardigras is a FORTRAN program for calculating proton-proton distances and error bounds from cross-peak intensities measured from a 2D NOESY (ROESY) spectrum. The MARDIGRAS algorithm converts the intensity matrix (non-observable intensities are supplied from a model) to a relaxation rate matrix, which is improved by an iterative procedure. Distances are then calculated from the final cross-relaxation rates. The error bounds are estimated by adding noise and relative errors to the intensities converted from the final rates, assuming a two-spin approximation.
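As a sketch of the relations behind this description, the conversion between intensities and rates can be written in the form commonly used in relaxation-matrix analysis. The symbols are assumptions introduced here for illustration: A is the normalized intensity matrix at mixing time tau_m, R is the relaxation rate matrix, sigma_ij is a cross-relaxation rate, and the subscript "ref" denotes a proton pair of known distance. The last relation holds only when both pairs share the same motional model (for example, a rigid, isotropically tumbling molecule).

```latex
\mathbf{A}(\tau_m) = \exp\left(-\mathbf{R}\,\tau_m\right)
\quad\Longrightarrow\quad
\mathbf{R} = -\frac{\ln \mathbf{A}(\tau_m)}{\tau_m},
\qquad
r_{ij} = r_{\mathrm{ref}}
\left( \frac{\sigma_{\mathrm{ref}}}{\sigma_{ij}} \right)^{1/6}
```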

mardigras provides a more logical way to determine the distance error bounds used as the input to restrained molecular dynamics and distance geometry programs: the standard deviations of the distances obtained from N MARDIGRAS calculations using intensities that are modified by adding simulated random noise and relative errors to the original input intensities. This procedure has also been termed the RANDMARDI approach.

The molecular motions that affect the relaxation rates are described as either isotropic, local (scaled using the temperature factors), or model-free. Local motions such as methyl rotation and ring flipping are modeled as N-site jumps. Chemical exchange of the exchangeable protons is taken into account by adding an exchange rate matrix to the relaxation rate matrix.

"Preprocessing" is required to convert homo or heteronuclear 3D NOESY data into 2D intensities before they are input to mardigras. The program 3Dcorma provides a procedure that deconvolutes the input homonuclear 3D data into two sets of asymmetric 2D data for the two mixing periods. Using the program symm, the asymmetric data can be converted to intensities that are equivalent to 2D NOESY intensities for the two mixing times. Partially relaxed NOESY intensities may also be corrected using symm to obtain fully relaxed intensities.


Input parameters

List of keywords and parameters read by mardigras :

mardigras reads input parameters from a parameter file file.PARM . If the default name INP.PARM is used and the file is present in your working directory, mardigras can be invoked by:

otherwise the file name needs to be specified:

If the command line arguments are not allowed on your system, then the default input file name INP.PARM must be used.
file.PARM contains the parameters necessary for mardigras calculations. The parameter follows the keyword which is always in upper case. Some of the parameters are required and some are optional. Comments can be included in file.PARM as long as they do not begin with the keywords (it is recommended that comments be in lower case). The parameter lines and comment lines can appear in any order. mardigras reads only the first 100 lines in file.PARM.


The following keywords (bold, upper case) and parameters (italic) are required for all mardigras calculations:

PDB FILE pdbfile
INT FILE file.INT.1
OUT FILE prefix
ORD FILE file.order

The following lines are required for calculating distances from ROESY intensities:

ROESY
PPM FILE file.ppm
JCP FILE file.jc
CARRIER FREQUENCY 5.12
SPIN-LOCK FIELD 45.4545

The following lines are optional and, if missing from file.PARM, default parameters (which may not be appropriate in some situations) will be assumed:

FREQUENCY 600.0
MINITN 3
MAXITN 8
DELTA6D 0.0001
R6THD 0.0001
METHYL JUMP 3
NOISE
NORMALIZE

The following lines are optional and, if missing from file.PARM, will NOT be used :

RANDMARDI 50
PRINT DISTANCES
PRINT RATES
PRINT ERRORS
EXCHANGE RATE FILE file.exch
ARBITRARY
DST FILE file.dst
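The rules above lend themselves to a quick pre-flight check of a parameter file before running mardigras. The following is a minimal sketch and not part of the mardigras distribution. It assumes that each keyword starts in the first column and is followed by its parameter, and it applies the 20-character limit to the name exactly as written in the file. The function name check_parm and the file names in the example are hypothetical.

```python
REQUIRED = ("PDB FILE", "INT FILE", "OUT FILE", "ORD FILE")
MAX_LINES = 100      # mardigras reads only the first 100 lines
MAX_NAME_LEN = 20    # limit on input and output file names


def check_parm(text):
    """Return a list of problems found in the text of a parameter file."""
    problems = []
    lines = text.splitlines()
    if len(lines) > MAX_LINES:
        extra = len(lines) - MAX_LINES
        problems.append(f"{extra} line(s) beyond line {MAX_LINES} are ignored")
    seen = {}
    for line in lines[:MAX_LINES]:
        for key in REQUIRED:
            # Assumption: the keyword begins in column 1.
            if line.startswith(key):
                seen[key] = line[len(key):].strip()
    for key in REQUIRED:
        if key not in seen:
            problems.append(f"missing required keyword: {key}")
        elif len(seen[key]) > MAX_NAME_LEN:
            problems.append(f"{key} name longer than {MAX_NAME_LEN} characters")
    return problems


# Hypothetical parameter file with one required line left out.
example = """\
this is a comment in lower case
PDB FILE model.pdb
INT FILE noesy.INT.1
OUT FILE run1
FREQUENCY 600.0
"""
print(check_parm(example))
```

An empty list means that the four required keywords were found within the first 100 lines; it does not validate the optional keywords or the files they name.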

FILES

mardigras uses some of the following files :

Input files

Output files

NOTE: The input and output file names should not exceed 20 characters.

Parameter file

The file.PARM file (default name INP.PARM) contains a list of input parameters for running mardigras. All input and output filenames are specified within this file. See Input parameters for a detailed list.


PDB file

pdbfile is a modified PDB (Protein Data Bank) coordinate file and is prepared by using corma.in .

A HEADER card should go in line 1 to describe the source of the coordinates. This file must also contain the keyword ISOTROPIC, LOCAL or MODELFREE on line two in the format:

REMARK CORRELATION TIME: ISOTROPIC

The atomic coordinates are entered in PDB format beginning on line 3. There is no need to eliminate non-hydrogen atoms from the PDB file; the program can handle stripped or full coordinate sets. All atoms starting with "H" or a digit followed by "H" are assumed to be protons.
All methyl protons are labeled in the order:

and must be placed in the same order in the PDB file.
If more than one methyl group appears in a residue, their names must be distinct. Protons forming other pseudo-protons such as unresolved methylene geminal protons and symmetric ring protons follow the same rules. mardigras recognizes methyl protons in the amino and nucleic acids as long as they follow the name of the atom to which they are bonded. Other types of pseudoprotons defined in the input intensity file are recognized by following the naming convention for pseudo-protons described later.

The occupancy factor, the first field after the coordinates, is used to turn protons "on" or "off" for the case of deuteration or alternate conformations (enter a value of 1.00 or 0.00). To specify fractional occupancy, use a number between 0.0 and 1.0. This may be appropriate, for example, in the case of partial amide exchange for a deuteron.

The atomic diffusion times (TDIFF, in nanoseconds) replace the last field, which usually contains the crystallographic B factors in a PDB file :
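The proton-recognition rule and the two numeric fields described above can be illustrated with a short sketch. It is not taken from mardigras; it assumes the standard PDB column layout (atom name in columns 13-16, occupancy in 55-60, and the last field in 61-66), which the modified file written by corma.in may or may not follow exactly. The record used in the example is hypothetical.

```python
def is_proton(atom_name):
    """True for names starting with 'H' or with a digit followed by 'H'."""
    name = atom_name.strip()
    if name.startswith("H"):
        return True
    return len(name) >= 2 and name[0].isdigit() and name[1] == "H"


def parse_atom(line):
    """Extract atom name, occupancy and TDIFF (ns) from one ATOM record.

    Assumes standard PDB columns: name 13-16, occupancy 55-60 and the
    last field (normally the B factor) 61-66.
    """
    return {
        "name": line[12:16].strip(),
        "occupancy": float(line[54:60]),
        "tdiff_ns": float(line[60:66]),
    }


# Hypothetical record, built with explicit field widths so the columns line up.
record = (
    "ATOM  {:>5d} {:<4s} {:>3s} {:1s}{:>4d}    "
    "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}"
).format(12, "1HB", "CYS", "A", 3, 1.234, 5.678, 9.012, 1.00, 2.50)

atom = parse_atom(record)
print(atom, is_proton(atom["name"]))
```

An occupancy of 0.00 in such a record would switch the proton off, and a value between 0.0 and 1.0 would describe fractional occupancy.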


Intensity file (file.INT.1)

This file contains the experimental intensities in the format:

line #
1 HEADER Whatever you like can go here.
2 REMARK Whatever you like can go here.
3 MIXING TIME: 0.275 (sec.)
4 ATOM1   ATOM2    INTENSITY
5 HB2  11 HG1  32  0.020
6 (etc.)
or
line #
1 HEADER Whatever you like can go here.
2 REMARK Whatever you like can go here.
3 MIXING TIME: 0.275 (sec.)
4 ATOM1   ATOM2    INTENSITY ERROR% NORM
5 HB2  11 HG1  32  0.020     10.0   1
6 (etc.)

More than one line is allowed for REMARK, HEADER or MIXING, but the total (not including the ATOM line) must not exceed 6. Only one line should begin with ATOM, and this line must immediately precede the data.
The format for atom names is strictly fixed: (a4,i3,x,a4,i3)
The protons are identified by atom names and residue numbers. Atom names are up to 4 characters and residue numbers should be smaller than 999. The two atoms of each pair are separated by a space. The intensity column and the optional relative intensity error and normalization flag columns are in free format.
The error column is used in the RANDMARDI procedure and in the calculation of distance error bounds.
The normalization flag column takes the integer 0 or 1. Zero indicates that the corresponding intensity is not used for intensity normalization.
Note: the ERROR and NORMALIZATION FLAG columns are optional. If the keyword ERROR or NORM is in the line beginning with ATOM, the program expects data in the corresponding column.
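Because the atom fields are fixed-width while the data columns are free format, a reader for this file has to slice the first 15 characters by position and split the rest on whitespace. The sketch below shows one way to do that; it is an illustration rather than the parser used by mardigras, and the function names are hypothetical.

```python
def columns_from_header(atom_line):
    """Decide which optional columns to expect from the ATOM header line."""
    return "ERROR" in atom_line, "NORM" in atom_line


def parse_intensity_line(line, has_error=False, has_norm=False):
    """Parse one data line of the intensity file.

    Atom fields follow (a4,i3,x,a4,i3); the remaining columns are free format.
    """
    rec = {
        "atom1": line[0:4].strip(),
        "res1": int(line[4:7]),
        "atom2": line[8:12].strip(),   # column 8 is the separating space
        "res2": int(line[12:15]),
    }
    rest = line[15:].split()
    rec["intensity"] = float(rest[0])
    idx = 1
    if has_error:
        rec["error_pct"] = float(rest[idx])
        idx += 1
    if has_norm:
        rec["norm"] = int(rest[idx])
    return rec


header = "ATOM1   ATOM2    INTENSITY ERROR% NORM"
has_error, has_norm = columns_from_header(header)
rec = parse_intensity_line("HB2  11 HG1  32  0.020     10.0   1",
                           has_error, has_norm)
print(rec)
```

A line whose atom names are shifted by even one column would be misread by a fixed-format reader, which is why the atom fields must be aligned exactly.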

Unresolved peaks may be entered in the experimental intensity file, but they must be assigned to a pair or a group of cross-peak intensities. This can be done in two ways:

1) For methyl, methylene and symmetry-equivalent aromatic ring protons, the unresolved peaks may be entered simply by referring to the group by a general name. The general names are as follows :

pseudoatom          character   example
methyl              M           HD11, HD12, HD13 -> MD1
methylene           Q           HB1, HB2 -> QB
amino proton pair   Q           HN21, HN22 -> QN2
ring proton         R           HD1, HD2 -> RD

    Note: for methyls in residues other than the standard amino acids and THY, the names of the methyl protons in the PDB file must start with "HM" instead of "H". For two unresolved geminal protons, for example HB1 and HB2, the order of 1 and 2 in the PDB file can NOT be reversed.
    Other problems can arise if a methyl is expected by the built-in naming rules and only one or two protons are actually found in the PDB file, e.g. when the structure is a high-pH structure and side-chain amines are not protonated. The results in this case are unpredictable.

    2) 'UNRESOLVED' intensities (see man pages for corma) are read but not used by mardigras.
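The general names in the examples above all follow one pattern: drop the trailing index digit from the member proton names and replace the leading "H" with the pseudoatom character. The sketch below encodes that pattern. It is inferred from the four examples only, does not cover the "HM" convention for non-standard methyls, and is not the recognition code used by mardigras; the function name is hypothetical.

```python
def pseudoatom_name(protons, kind):
    """Derive a pseudoatom name from the names of its member protons.

    kind is 'methyl', 'methylene', 'amino' or 'ring'.
    """
    prefix = {"methyl": "M", "methylene": "Q", "amino": "Q", "ring": "R"}[kind]
    stems = {name[:-1] for name in protons}   # drop the trailing index digit
    if len(stems) != 1:
        raise ValueError(f"protons do not share a common stem: {protons}")
    stem = stems.pop()
    if not stem.startswith("H"):
        raise ValueError(f"expected a name starting with 'H': {stem}")
    return prefix + stem[1:]


print(pseudoatom_name(["HD11", "HD12", "HD13"], "methyl"))
print(pseudoatom_name(["HB1", "HB2"], "methylene"))
print(pseudoatom_name(["HN21", "HN22"], "amino"))
print(pseudoatom_name(["HD1", "HD2"], "ring"))
```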


    constrain.dat

    constrain.dat - the name is strictly fixed - defines fixed distances in the spin system. If a pair of protons from the pdbfile or input intensity file matches the atom and residue names in the constrain.dat file, the pair is considered as having a fixed distance. A list is then made of all the fixed distances in the system. The distance values appearing in constrain.dat are not used in the calculation; instead, the corresponding distances calculated from the pdbfile are used. The list of fixed distances plays two roles in the calculation: defining fixed distances for fixed-distance normalization, and defining constraints for the mardigras iteration procedure.

    constrain.dat contains fixed distances for standard nucleotides and amino acids with conventional atom names. The user must modify the file for non-standard residues or non-conventional nomenclature, following the correct format. For example, there is one fixed distance in CYS (the residue name is up to three characters). If we wish to include three possible nomenclatures, the constrain.dat file should have the following lines:

    CYS 3
    1HB 2HB 1.772
    HB1 HB2 1.772
    QB QB 1.772
    
    The first line indicates that there are three possible fixed distances (or one fixed distance with three possible names) in residue CYS. The following three lines define the atom names for the fixed distances.
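The block structure just described (a residue line with a count, followed by that many atom-pair lines) can be read with a few lines of code. The sketch below assumes whitespace-separated fields, which matches the CYS example but is not confirmed as the format mardigras itself expects; the function name is hypothetical.

```python
def parse_constrain(text):
    """Parse constrain.dat-style text into {residue: [(atom1, atom2, distance)]}."""
    table = {}
    lines = [ln for ln in text.splitlines() if ln.strip()]
    i = 0
    while i < len(lines):
        residue, count = lines[i].split()
        count = int(count)
        entries = []
        for ln in lines[i + 1:i + 1 + count]:
            atom1, atom2, dist = ln.split()
            entries.append((atom1, atom2, float(dist)))
        if len(entries) != count:
            raise ValueError(
                f"{residue}: expected {count} entries, found {len(entries)}")
        table[residue] = entries
        i += 1 + count
    return table


cys_block = """\
CYS 3
1HB 2HB 1.772
HB1 HB2 1.772
QB QB 1.772
"""
print(parse_constrain(cys_block))
```

Since mardigras takes the actual distance from the pdbfile, the third column mainly documents the expected value; the atom names are what determine whether a pair is treated as fixed.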


    pseudo.inp

    pseudo.inp - the name is strictly fixed - is used to define pseudoatoms in non-standard residues. Each pseudoatom is defined by two lines, for example :

    PSEUDO QG    3
    HG1   3 HG2   3
    PSEUDO MG    4
    HG1   4 HG2   4 HG3   4
    
    The first line must start with the keyword PSEUDO, followed by the pseudoatom name (starting with letter M, Q or R) and the residue number to which it belongs.
    The second line contains the list of atoms that are part of the pseudoatom. The atoms are written as atom name and residue number and are read with format (a4,i3,1x,a4,i3,1x,etc.).
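Since the atom list is read with a fixed format, generating the two lines programmatically avoids column errors. The sketch below reproduces the spacing of the example above. The layout of the first line (keyword, one space, name in four columns, residue number in three) is inferred from that example and is an assumption; the function name is hypothetical.

```python
def pseudo_entry(pseudo_name, residue, members):
    """Format one two-line pseudo.inp entry.

    members is a list of (atom_name, residue_number) pairs, written
    with (a4,i3) fields separated by single spaces.
    """
    if pseudo_name[0] not in "MQR":
        raise ValueError("pseudoatom names must start with M, Q or R")
    first = f"PSEUDO {pseudo_name:<4}{residue:>3d}"
    second = " ".join(f"{name:<4}{res:>3d}" for name, res in members)
    return first + "\n" + second


print(pseudo_entry("QG", 3, [("HG1", 3), ("HG2", 3)]))
print(pseudo_entry("MG", 4, [("HG1", 4), ("HG2", 4), ("HG3", 4)]))
```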


    Chemical shift file

    The chemical shift file has the same format as the PDB file for the first 5 columns (record type, atom numbers, atom names, residue names and residue numbers). The 6th column is the chemical shift in ppm. If the chemical shift assignment is incomplete, the unassigned proton resonances take the value of the carrier frequency by default.


    J coupling file

    J-couplings are in Hz. The file has the same format as the input intensity file, i.e., (a4,i3,x,a4,i3) for the atom names and free format for the data column. It is found that only the strong scalar couplings are important in HOHAHA corrections. As an option, a program named jcoup is provided for simulating J coupling constants from a pdbfile or an ensemble of pdbfiles. For more details, see man pages for jcoup.


    Exchange rate file

    The program handles two types of chemical exchange :

    The total exchange matrix is K"=K+K'. The relaxation matrix R is modified by R'=R+K".
    The exchange file contains atom names in (a4,i3,x,a4,i3) format and a column of data that can be either the exchange rates k (ATOM1 and ATOM2 are the same) and k' (ATOM1 and ATOM2 are different), or the elements of the exchange matrix K". If ARBITRARY is not present in file.PARM, the input data are exchange rates and will be converted to the matrix elements in the program. If ARBITRARY is present, the exchange matrix is read as is from the exchange file and is not modified in any manner.


    Order parameter file

    Order parameters are also defined for proton pairs. The order parameter file has the same format as the intensity or exchange rate file. A default order parameter is used for proton pairs which do not have an input order parameter. The default value is 1.0, but it can be modified at the beginning of the order parameter file by including the following line before the data (for example):

    The file may then contain only order parameters that are different from the default.


    Distance file

    Distances that have been determined previously can be used as input constraints for mardigras calculations, i.e., these distances are fixed in the iterations. The corresponding intensity (if there is one) should be removed from the intensity file; otherwise the input distance is changed to fit the intensity in the iterations. The distance file serves a different purpose from constrain.dat. It provides additional fixed distances (independent of the model) for the mardigras iterations, but does not alter the fixed-distance list used for normalization with fixed-distance intensities. The distance file has the same format for the atom names as file.INT.1, with free format for the distance column.


    Output files :


    If RANDMARDI is not applied , the program creates the following files:


    prefix.DSTXX

    Prints out all distances calculated for iteration cycle XX. It prints out experimental and calculated intensities for comparison, as well as a comparison of model and mardigras distances. Because they are fairly time-consuming, iterative distance fits for methyl and other pseudoatom distances, which are defined to the geometric center of the protons, are only calculated for the final cycle, and reported distances for earlier cycles are simple isotropic-motion-only estimates.


    prefix.OUT

    Prints out various informational messages. Particularly important are methylene and aromatic ring pseudoatoms that have been recognized in the input intensity file. Very importantly, this file reports when an atom name in the input intensity file is not recognized. This often indicates a nomenclature difference between the PDB file and the intensity file or a failure to follow the pseudoatom name-formation rules. This file also contains the proton pairs which have been classified as fixed-distance pairs due to connectivity constraints (for a list of possible fixed distances, see the file constrain.dat).


    prefix.BNDS (used to be called prefix.DG)

    Prints out distance constraints suitable for distance geometry or flatwell-restrained molecular dynamics input. This file contains lower bounds, upper bounds, widths, distances calculated by mardigras, and the input model distances. The upper and lower deviations are not generally symmetric since measured errors are in intensities (not distances) and the transformation is not linear with intensity. prefix.BNDS for different input parameters can be used to determine the final bounds. The program avgbnds is available to average (or find the minima and maxima) of a number of prefix.BNDS files.


    prefix.CNSTRNT

    Prints out parameters for constraints in molecular dynamics simulations, including the force constant for a parabolic potential pseudoenergy curve for deviations from this distance. The smaller the estimated error in distance, the larger the value of this constant. The constant is currently scaled to yield an energy of 1/2kT when the displacement is equal to the estimated distance error. Preliminary results suggest that this is probably too high. This file is compatible with very old AMBER format, which is not used anymore at UCSF.


    prefix.ERRORS

    Prints out any errors which may have occurred during the run. Quite importantly, mardigras prints out a list of any proton pairs which have observed intensities, but to which mardigras was unable to assign a new relaxation rate (and hence a new distance). This error sometimes occurs when strong spin diffusion is present and the weak intensity is underdetermined by experimental data, or the model is poor over such a region -- the resulting inconsistencies may cause physically meaningless rates to appear upon backtransformation from intensities. Random variation of the intensities within the limits of experimental errors (RANDMARDI) may compensate for the weak intensities and allow the distances to be calculated. If the error is due to a poor model, an improved starting model may become available after the initial efforts at structure refinement, which can eliminate many intensities from this error list.


    prefix.DSTREJ and prefix.BNDSREJ

    Print out rejected distances. Rejected distances are those for which the relaxation rate is very small - corresponding to distances greater than 5 A - and thus the distances have high uncertainty.


    The following files are generated if RANDMARDI is applied :


    prefix.bnds

    Is similar to prefix.BNDS, but the distances are the average of N calculations using randomly modified intensity data, and the distance bounds are the average distance plus and minus the standard deviation (STD) of the distances. The width is two times the STD. The minimum and maximum values of each distance and the number of times the distance is calculated are also included in prefix.bnds. The program avgbnds takes either the STD bounds or the minimal and maximal distances calculated for different input parameters (or intensities from spectra of different mixing times) to determine the final distance constraints suitable for distance geometry or restrained molecular dynamics input.


    prefix.dstxx, prefix.ratexx and prefix.errxx

    Contain calculated distances, calculated relaxation rates and errors added to experimental intensities for each of the N RANDMARDI calculations, respectively. The values for the first 10 calculations are in the files with xx = 01, the next 10 are in files with xx = 02. For example, the first 10 distances calculated are in prefix.dst01, the next 10 in prefix.dst02, and so on. xx increments by 1 for every 10 calculations. The number N specified in file.PARM for RANDMARDI does not need to be a multiple of 10; the remaining calculations go into the last file. These files are printed by mardigras only if specifically requested by the user (keywords PRINT DISTANCES, PRINT RATES and PRINT ERRORS in file.PARM).
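The numbering scheme can be expressed as a small calculation, which is handy when collecting results from these files in a script. This is a sketch of the rule as stated above; the function names and the prefix "run1" are hypothetical.

```python
def file_suffix(calc_number):
    """Two-digit suffix xx for a 1-based RANDMARDI calculation number."""
    return f"{(calc_number - 1) // 10 + 1:02d}"


def output_files(prefix, n_calcs, kind="dst"):
    """Names of the files holding n_calcs calculations, 10 per file.

    kind is 'dst', 'rate' or 'err'.
    """
    n_files = (n_calcs + 9) // 10   # the last file holds the remainder
    return [f"{prefix}.{kind}{i:02d}" for i in range(1, n_files + 1)]


print(file_suffix(10), file_suffix(11))
print(output_files("run1", 23))
```

With N = 23, the third file holds only the last three calculations.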


    prefix.OUT

    Is the same as described above for regular mardigras, but contains information for N calculations, where N is the number of random variations.


    prefix.ERRORS

    Is the same as described above, but for N calculations.


    If the input intensities are ROESY, the following file is generated:

    prefix.ROEFCTR

    Is output for ROESY simulation. It contains information about the chemical shift dependence and HOHAHA corrections of the ROESY intensities. A brief explanation can be found in the header of this output file. The purpose of this file is to show the user how much the intensities have been corrected for the chemical shift dependence and the HOHAHA effect, and whether the correction for the direct HOHAHA effect is in the valid range.


    Bugs

    This program was developed at UCSF on Sun SPARC stations with SunOS Release 4.1.3 UNIX operating system. It has not been fully tested on any other system and may therefore experience machine- or implementation-dependent problems.


    Authors

  • Brandan A. Borgias
  • Paul D. Thomas
  • He Liu
  • Anil Kumar
  • Marco Tonelli

    References

    The general algorithm is described in :
    The R factor and the sixth root R factor are considered in :
    The description for averaging of pseudoproton relaxation rates and distances can be found in :
    Effect of internal motion on the interproton distances is considered in :
    Application to the chemical exchange problems is described in :
    An application of the ensemble averaging is given in :
    ROESY calculations are published in :
    The method and application of RANDMARDI is found in :
    Simulation and correction of partially relaxed intensities is found in :

    Contact

    Address technical questions and problems to:
    Marco Tonelli
    Department of Pharmaceutical Chemistry
    University of California
    San Francisco, CA 94143-0446
    Tel.: (415) 476-4378
    E-mail: tonelli@picasso.nmr.ucsf.edu

    Address other questions to:
    Prof. Thomas L. James
    Department of Pharmaceutical Chemistry
    University of California
    San Francisco, CA 94143-0446
    Tel.: (415) 476-1569
    E-mail: james@picasso.nmr.ucsf.edu


    Last revised 10-19-2000 by Marco Tonelli.