mardigras is a FORTRAN program for calculating proton-proton distances and error bounds from cross-peak intensities measured from a 2D NOESY (ROESY) spectrum. The MARDIGRAS algorithm converts the intensity matrix (non-observable intensities are supplied from a model) to a relaxation rate matrix, which is improved by an iterative procedure. Distances are then calculated from the final cross-relaxation rates. The error bounds are estimated by adding noise and relative errors to the intensities converted from the final rates, assuming a two-spin approximation.
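The relation between intensities and rates can be illustrated for the simplest case. The sketch below is not the mardigras implementation; it assumes a symmetric two-spin system with rate matrix R = [[rho, sigma], [sigma, rho]] and intensity matrix A = exp(-R * tau_m), and it assumes that a cross-relaxation rate scales as r**-6 relative to a reference pair with the same motional model. All function names are hypothetical.

```python
import math

def two_spin_intensities(rho, sigma, tau_m):
    """Return (diagonal, cross) intensities of A = exp(-R * tau_m)
    for the symmetric two-spin rate matrix R = [[rho, sigma], [sigma, rho]]."""
    decay = math.exp(-rho * tau_m)
    return decay * math.cosh(sigma * tau_m), -decay * math.sinh(sigma * tau_m)

def two_spin_rates(diagonal, cross, tau_m):
    """Invert the relation above: recover (rho, sigma) from the intensities."""
    sigma = -math.atanh(cross / diagonal) / tau_m
    rho = -math.log(diagonal / math.cosh(sigma * tau_m)) / tau_m
    return rho, sigma

def distance_from_rate(sigma, sigma_ref, r_ref):
    """Scale a reference distance, assuming sigma is proportional to r**-6."""
    return r_ref * (sigma_ref / sigma) ** (1.0 / 6.0)

# Round trip with illustrative values (rates in 1/s, mixing time in s).
diag, cross = two_spin_intensities(rho=1.0, sigma=-0.5, tau_m=0.275)
rho, sigma = two_spin_rates(diag, cross, tau_m=0.275)
```

For a real spin system the full matrices are needed, because spin diffusion couples all proton pairs; the two-spin form is only used here to show the direction of the conversion.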
mardigras provides a more logical way to determine the distance error bounds used as the input to restrained molecular dynamics and distance geometry programs: the standard deviations of the distances obtained from N MARDIGRAS calculations using intensities that are modified by adding simulated random noise and relative errors to the original input intensities. This procedure has also been termed the RANDMARDI approach.
The molecular motions which affect the relaxation rates are described as either isotropic, local (scaled using the temperature factors), or model-free. Local motions such as methyl rotation and ring flipping are modeled as N-site jumps. Chemical exchange of the exchangeable protons is taken into account by adding an exchange rate matrix to the relaxation rate matrix.
"Preprocessing" is required to convert homo or heteronuclear 3D NOESY data into 2D intensities before they are input to mardigras. The program 3Dcorma provides a procedure that deconvolutes the input homonuclear 3D data into two sets of asymmetric 2D data for the two mixing periods. Using the program symm, the asymmetric data can be converted to intensities that are equivalent to 2D NOESY intensities for the two mixing times. Partially relaxed NOESY intensities may also be corrected using symm to obtain fully relaxed intensities.
List of keywords and parameters read by mardigras:
mardigras reads input parameters from a parameter file file.PARM. If the default name INP.PARM is used and the file is present in your working directory, mardigras can be invoked by:
otherwise the file name needs to be specified:
If the command line arguments are not allowed on your system,
then the default input file name INP.PARM must be used.
file.PARM contains the parameters necessary for mardigras calculations. The parameter follows the keyword which is always in upper case. Some of the parameters are required and some are optional. Comments can be included in file.PARM as long as they do not begin with the keywords (it is recommended that comments be in lower case). The parameter lines and comment lines can appear in any order. mardigras reads only the first 100 lines in file.PARM.
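The reading rules above can be sketched as a small parser. This is only an illustration under stated assumptions: the keyword list is partial (it holds only keywords whose form appears in this manual), keywords are assumed to start in the first column, and the function name and the file name `model.pdb` are hypothetical.

```python
# Partial keyword list, for illustration only.
KNOWN_KEYWORDS = ("PDB FILE", "ROESY", "FREQUENCY", "NORMALIZE", "RANDMARDI")
MAX_LINES = 100  # only the first 100 lines of file.PARM are read

def parse_parm(text):
    """Map each recognized keyword to the list of parameters that follow it.
    Lines that do not begin with a known keyword are treated as comments."""
    params = {}
    for line in text.splitlines()[:MAX_LINES]:
        for keyword in KNOWN_KEYWORDS:
            if line.startswith(keyword):
                params[keyword] = line[len(keyword):].split()
                break
    return params

example = (
    "this is a comment in lower case\n"
    "PDB FILE model.pdb\n"
    "FREQUENCY 600.0\n"
    "RANDMARDI 50\n"
)
print(parse_parm(example))
```

Because a line counts as a parameter line whenever it begins with a keyword, writing comments in lower case, as recommended above, avoids accidental matches.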
The following keywords (bold, upper case) and parameters (italic) are required for all mardigras calculations:
PDB FILE pdbfile
The following lines are required for calculating distances from ROESY intensities:
ROESY
The following lines are optional and, if missing from file.PARM, default parameters (which may not be appropriate in some situations) will be assumed:
FREQUENCY 600.0
There are three options for entering the noise level in
file.PARM. The noise (and optionally the relative errors of
the intensities) is used to determine the distance error
bounds by regular or RANDMARDI mardigras calculations. If
NOISE is omitted, the noise is zero. The appropriate keywords
must precede the noise level estimate.
Here are examples of the input for each option:
There are three options for how mardigras should normalize the input experimental intensities relative to the calculated absolute values:
NORMALIZE ALL
If the normalization has already been carried out by some other means, the keyword line can be completely omitted. Intensities should be normalized to be a fraction of the diagonal peak intensity (of a single proton) at mixing time 0.
The following lines are optional and, if missing from file.PARM, will NOT be used:
RANDMARDI 50
The RANDMARDI procedure is invoked by the keyword RANDMARDI. If a number N (for example 50) follows, the input intensities will be randomly modified N-1 times (the original input intensities without any modification are used as the first set of data) within the limits of the input noise level and relative errors, and N sets of distances will be calculated. If the number is missing, N=30 will be assumed. If N=1, the distances will be calculated one time without modifying the input intensities. The keyword RANDMARDI also invokes a different output format from the regular mardigras. If RANDMARDI is missing, the regular mardigras calculation is assumed, which is equivalent to RANDMARDI with N=1, but with the original output format (see Output files).
An estimate of the experimental noise level specified by the keyword NOISE is taken as an absolute error in the experimental intensities. The relative intensity errors are included in the file.INT.1 file described later. mardigras calculates error bounds for distances using both absolute and relative input intensity errors.
Bugs: RANDMARDI will not do more than 999 cycles.
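The generation of the N intensity sets can be sketched as follows. The manual does not state how the random modifications are distributed, so this sketch assumes a uniform draw within plus or minus (noise + relative error x intensity); that choice, the function name, and the seed argument are assumptions made for illustration.

```python
import random

def randmardi_sets(intensities, rel_errors_percent, noise, n=30, seed=0):
    """Return n intensity sets: the first is the unmodified input, the
    remaining n-1 are randomly perturbed within the error limits."""
    rng = random.Random(seed)
    sets = [list(intensities)]  # first set: original intensities
    for _ in range(n - 1):
        modified = []
        for value, rel in zip(intensities, rel_errors_percent):
            # Assumed limit: absolute noise plus relative error of this peak.
            limit = noise + abs(value) * rel / 100.0
            modified.append(value + rng.uniform(-limit, limit))
        sets.append(modified)
    return sets

sets = randmardi_sets([0.020, 0.045], [10.0, 5.0], noise=0.002, n=50)
```

Each set would then be passed through a complete distance calculation, giving N distances per proton pair.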
The file.PARM file (default name INP.PARM) contains a list of input parameters for running mardigras. All input and output filenames are specified within this file. See Input parameters for a detailed list.
pdbfile is a modified PDB (Protein Data Bank) coordinate file and is prepared by using corma.in.
A HEADER card should go in line 1 to describe the source of the coordinates. This file must also contain the keyword ISOTROPIC, LOCAL or MODELFREE on line two in the format:
REMARK CORRELATION TIME: ISOTROPIC
ISOTROPIC assumes a single effective isotropic correlation time TAUc for all interactions. TAUc for the H-H vectors is defined as:
where TDIFF(i) and TDIFF(j) are the "atomic diffusion times" for protons i and j, respectively. For ISOTROPIC, all protons should have the same TDIFF.
LOCAL assumes that each interaction can be assigned an effective isotropic correlation time that is a function of the local motion or diffusion time (TDIFF) determined by the average temperature factor for each residue. The program corma.in will read a PDB file that contains temperature factors and calculate an appropriate TDIFF value for each residue.
MODELFREE assumes the model-free approach for molecular motions (Lipari, G. and Szabo, A. J. Am. Chem. Soc. 104, 4546, 4559 (1982)). The appropriate parameters will be generated by corma.in.
The atomic coordinates are entered in PDB format beginning
on line 3. There is no need to eliminate non-hydrogen atoms
from the PDB file; the program can handle stripped or full
coordinate sets. All atoms starting with "H" or a digit
followed by "H" are assumed to be protons.
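The proton-recognition rule above is simple enough to express directly. The following is a minimal sketch of that rule only; the function name is hypothetical, and leading blanks in the atom-name field are assumed to be insignificant.

```python
def is_proton(atom_name):
    """Apply the naming rule: a proton name starts with 'H',
    or with a digit immediately followed by 'H'."""
    name = atom_name.strip()
    if name.startswith("H"):
        return True
    return len(name) >= 2 and name[0].isdigit() and name[1] == "H"

for name in ("HB2", "1HB", "CA", "N"):
    print(name, is_proton(name))
```

Note that the rule is purely name-based, so any non-hydrogen atom whose name happens to begin with "H" would also be treated as a proton.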
All methyl protons are labeled in the order:
The occupancy factor, the first field after the coordinates, is used to turn protons "on" or "off" for the case of deuteration or alternate conformations (enter a value of 1.00 or 0.00). To specify fractional occupancy, use a number between 0.0 and 1.0. This may be appropriate, for example, in the case of partial amide exchange for a deuteron.
The atomic diffusion times (TDIFF; in nanoseconds) replace the last field, which usually contains the crystallographic B factors in a PDB file:
For ISOTROPIC, TDIFF should be the same for all protons, but the program allows the user to specify different overall correlation times for different parts of the molecule when it is felt to be appropriate. It has been suggested, for example, that base-moiety protons in DNA may have a different effective correlation time than sugar protons.
For LOCAL, the TDIFF value of the first H atom in each residue is used in calculating the correlation times involving all the protons in that residue.
For MODELFREE, the last four columns are tau1, tau2/tau1, A, and taue for each H atom, where tau1 and tau2 are the overall correlation times (they would be equal for a spherically symmetric molecule, but may differ if the molecule is elongated), A is the anisotropy parameter, and taue is the internal motion correlation time.
This file contains the experimental intensities in the format:
line #
  1     HEADER Whatever you like can go here.
  2     REMARK Whatever you like can go here.
  3     MIXING TIME: 0.275 (sec.)
  4     ATOM1  ATOM2  INTENSITY
  5     HB2  11 HG1  32   0.020
  6     (etc.)

or

line #
  1     HEADER Whatever you like can go here.
  2     REMARK Whatever you like can go here.
  3     MIXING TIME: 0.275 (sec.)
  4     ATOM1  ATOM2  INTENSITY  ERROR%  NORM
  5     HB2  11 HG1  32   0.020  10.0  1
  6     (etc.)

More than one line is allowed for REMARK, HEADER or MIXING,
but the total (not including the ATOM line) must not exceed 6.
Only one line should begin with ATOM, and this line must
immediately precede the data.
The format for atom names is strictly fixed: (a4,i3,x,a4,i3)
The protons are identified by atom names and residue numbers. Atom names are up to 4 characters and residue numbers should be smaller than 999. The two atoms of a pair are separated by a space. The intensity column and the optional relative intensity error and normalization flag columns are in free format.
The error column is used in the RANDMARDI procedure and in the calculation of distance error bounds.
The normalization flag column takes the integer 0 or 1. Zero indicates that the corresponding intensity is not used for intensity normalization.
Note: the ERROR and NORMALIZATION FLAG columns are optional. If the keyword ERROR or NORM is in the line beginning with ATOM, the program expects data in the corresponding column.
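A reader for one data line can be sketched from the fixed format (a4,i3,x,a4,i3) followed by free-format columns. This is an illustrative sketch, not the program's own reader: the function names are hypothetical, and the column order ERROR then NORM is assumed from the example shown earlier.

```python
def columns_from_atom_line(atom_line):
    """Return (has_error, has_norm) from the line that begins with ATOM."""
    return "ERROR" in atom_line, "NORM" in atom_line

def parse_intensity_line(line, has_error=False, has_norm=False):
    """Split a data line laid out as (a4,i3,x,a4,i3) plus free-format values."""
    record = {
        "atom1": line[0:4].strip(),   # a4
        "res1": int(line[4:7]),       # i3
        "atom2": line[8:12].strip(),  # a4, after one blank column (x)
        "res2": int(line[12:15]),     # i3
    }
    fields = line[15:].split()        # free-format part
    record["intensity"] = float(fields[0])
    index = 1
    if has_error:
        record["error_percent"] = float(fields[index])
        index += 1
    if has_norm:
        record["norm"] = int(fields[index])
    return record

has_error, has_norm = columns_from_atom_line("ATOM1 ATOM2 INTENSITY ERROR% NORM")
line = "HB2 " + " 11" + " " + "HG1 " + " 32" + "   0.020  10.0  1"
print(parse_intensity_line(line, has_error, has_norm))
```

Because the atom fields are read by column position, a name shifted by even one character is read incorrectly, which is why the format is described as strictly fixed.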
Unresolved peaks may be entered in the experimental intensity file, but they must be assigned to a pair or a group of cross-peak intensities. This can be done in two ways:
1) For methyl, methylene and symmetry-equivalent aromatic ring protons, the unresolved peaks may be entered simply by referring to the group by a general name. The general names are as follows:
||HD11, HD12, HD13
Note: for methyls in residues other than the standard amino acids
and THY, the names of the methyl protons in the PDB
file must start with "HM" instead of "H". For two
unresolved geminal protons, for example HB1 and HB2, the
order of 1 and 2 in the PDB file can NOT be reversed.
Other problems can arise if a methyl is expected by the built-in naming rules and only one or two protons are actually found in the PDB file, e.g. the structure is a high-pH structure and side-chain amines are not protonated. This leads to unpredictable results.
2) 'UNRESOLVED' intensities (see man pages for corma) are read but not used by mardigras.
constrain.dat - the name is strictly fixed - defines fixed distances in the spin system. If a pair of protons from the pdbfile or input intensity file matches the atom and residue names in the constrain.dat file, the pair is considered as having a fixed distance. A list is then made of all the fixed distances in the system. The distance values appearing in constrain.dat are not used in the calculation; instead, the corresponding distances calculated from the pdbfile are used. The list of fixed distances plays two roles in the calculation: defining fixed distances for fixed-distance normalization, and defining constraints for the mardigras iteration procedure.
constrain.dat contains fixed distances for standard nucleotides and amino acids with conventional atom names. The user must modify the file for non-standard residues or non-conventional nomenclature, following the correct format. For example, there is one fixed distance in CYS (the residue name is up to three characters). If we wish to include three possible nomenclatures, the constrain.dat file should have the following lines:
CYS 3
1HB 2HB 1.772
HB1 HB2 1.772
QB QB 1.772
The first line indicates that there are three possible fixed distances (or one fixed distance with three possible names) in residue CYS. The following three lines define the atom names for the fixed distances.
pseudo.inp - the name is strictly fixed - is used to define pseudoatoms in non-standard residues. Each pseudoatom is defined by two lines, for example:
PSEUDO QG 3
HG1 3 HG2 3
PSEUDO MG 4
HG1 4 HG2 4 HG3 4
The first line must start with the keyword PSEUDO, followed by the pseudoatom name (starting with the letter M, Q or R) and the residue number to which it belongs.
The chemical shift file has the same format as a PDB file for the first 5 columns (record type, atom numbers, atom names, residue names and residue numbers). The 6th column is the chemical shift in ppm. If the chemical shift assignment is incomplete, the unassigned proton resonances by default take the value of the carrier frequency.
J-couplings are in Hz. The file has the same format as the input intensity file, i.e., (a4,i3,x,a4,i3) for the atom names and free format for the data column. It is found that only the strong scalar couplings are important in HOHAHA corrections. As an option, a program named jcoup is provided for simulating J coupling constants from a pdbfile or an ensemble of pdbfiles. For more details, see man pages for jcoup.
The program handles two types of chemical exchange:
Order parameters are also defined for proton pairs. The order parameter file has the same format as the intensity or exchange rate file. A default order parameter is used for proton pairs which do not have an input order parameter. The default value is 1.0, but it can be modified at the beginning of the order parameter file by including the following line before the data (for example):
Distances which have been determined previously can be used as input constraints for mardigras calculations, i.e., these distances are fixed in the iterations. The corresponding intensity (if there is one) should be removed from the intensity file; otherwise the input distance is changed to fit the intensity in the iterations. The distance file serves a different purpose from constrain.dat. It provides additional fixed distances (independent of the model) for mardigras iterations, but does not alter the fixed-distance list for normalization using fixed-distance intensities. distance.file has the same format for the atom names as file.INT.1, with free format for the distance column.
If RANDMARDI is not applied, the program creates the following files:
Prints out all distances calculated for iteration cycle XX. It prints out experimental and calculated intensities for comparison, as well as a comparison of model and mardigras distances. Because they are fairly time-consuming, iterative distance fits for methyl and other pseudoatom distances, which are defined to the geometric center of the protons, are calculated only for the final cycle; distances reported for earlier cycles are simple isotropic-motion-only estimates.
Prints out various informational messages. Particularly important are methylene and aromatic ring pseudoatoms that have been recognized in the input intensity file. Very importantly, this file reports when an atom name in the input intensity file is not recognized. This often indicates a nomenclature difference between the PDB file and the intensity file or a failure to follow pseudoatom name-formation rules. This file also contains the proton pairs which have been classified as fixed-distance pairs due to connectivity constraints (for a list of possible fixed distances, see the file constrain.dat).
Prints out distance constraints suitable for distance geometry or flatwell-restrained molecular dynamics input. This file contains lower bounds, upper bounds, widths, distances calculated by mardigras, and the input model distances. The upper and lower deviations are not generally symmetric since measured errors are in intensities (not distances) and the transformation is not linear with intensity. prefix.BNDS files for different input parameters can be used to determine the final bounds. The program avgbnds is available to average (or find the minima and maxima of) a number of prefix.BNDS files.
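The asymmetry of the bounds can be seen in a simplified isolated-pair picture. The sketch below is not the mardigras calculation; it assumes that the intensity of a pair scales as r**-6, and that the intensity error is the absolute noise plus the relative error. The function name and the numbers are illustrative.

```python
def distance_bounds(distance, intensity, noise, rel_error_percent):
    """Return (lower, upper) distance bounds from an intensity error,
    assuming intensity is proportional to distance**-6."""
    delta = noise + intensity * rel_error_percent / 100.0
    # A stronger peak corresponds to a shorter distance.
    lower = distance * (intensity / (intensity + delta)) ** (1.0 / 6.0)
    weak = intensity - delta
    # If the error exceeds the intensity, no finite upper bound exists.
    upper = distance * (intensity / weak) ** (1.0 / 6.0) if weak > 0 else float("inf")
    return lower, upper

lower, upper = distance_bounds(3.0, 0.020, noise=0.005, rel_error_percent=10.0)
print(3.0 - lower, upper - 3.0)  # the upper deviation is the larger one
```

The upper deviation grows quickly as the intensity approaches the noise level, which is consistent with weak peaks giving the least certain distances.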
Prints out parameters for constraints in molecular dynamics simulations, including the force constant for a parabolic potential pseudoenergy curve for deviations from this distance. The smaller the estimated error in distance, the larger the value of this constant. The constant is currently scaled to yield an energy of 1/2kT when the displacement is equal to the estimated distance error. Preliminary results suggest that this is probably too high. This file is compatible with very old AMBER format, which is not used anymore at UCSF.
Prints out any errors which may have occurred during the run. Quite importantly, mardigras prints out a list of any proton pairs which have observed intensities, but to which mardigras was unable to assign a new relaxation rate (and hence a new distance). This error sometimes occurs when strong spin diffusion is present and the weak intensity is underdetermined by the experimental data, or when the model is poor over such a region; the resulting inconsistencies may cause physically meaningless rates to appear upon back-transformation from intensities. Random variation of the intensities within the limits of experimental errors (RANDMARDI) may compensate for the weak intensities and allow the distances to be calculated. If the error is due to a poor model, an improved starting model may be available after the initial efforts at structure refinement, which can eliminate many intensities from this error list.
Prints out rejected distances. Rejected distances are those for which the relaxation rate is very small - corresponding to distances greater than 5 A - and thus the distances have high uncertainty.
The following files are generated if RANDMARDI is applied:
Is similar to prefix.BNDS, but the distances are the average of N calculations using randomly modified intensity data, and the distance bounds are the average distance plus and minus the standard deviation (STD) of the distances. The width is two times the STD. The minimum and maximum values of each distance and the number of times the distance is calculated are also included in prefix.bnds. The program avgbnds takes either the STD bounds or the minimal and maximal distances calculated for different input parameters (or intensities from spectra of different mixing times) to determine the final distance constraints suitable for distance geometry or restrained molecular dynamics input.
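The quantities reported for each proton pair can be sketched as below. The manual does not say whether the standard deviation is the population or the sample form; the population form is assumed here, and the function name is hypothetical. A distance may be calculated fewer than N times if some of the calculations fail for that pair, so the count is taken from the list itself.

```python
import statistics

def randmardi_bounds(distances):
    """Summarize the distances obtained for one proton pair."""
    mean = statistics.mean(distances)
    std = statistics.pstdev(distances)  # assumption: population STD
    return {
        "distance": mean,
        "lower": mean - std,
        "upper": mean + std,
        "width": 2.0 * std,
        "min": min(distances),
        "max": max(distances),
        "count": len(distances),
    }

print(randmardi_bounds([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
```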
Contain calculated distances, calculated relaxation rates and errors added to experimental intensities for each of the N RANDMARDI calculations, respectively. The values for the first 10 calculations are in the files with xx = 01, the next 10 are in files with xx = 02. For example the first 10 distances calculated are in prefix.dst01, the next 10 in prefix.dst02, and so on. xx increments 1 for every 10 calculations. The number N specified in file.PARM for RANDMARDI does not need to be a multiple of 10; the residual will make the last file. These files are printed by mardigras only if specifically requested by user (keywords PRINT DISTANCES, PRINT RATES and PRINT ERRORS in file.PARM).
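The numbering rule for these files amounts to integer division by 10. A minimal sketch follows, with hypothetical function names and with `prefix.dst` taken from the example in the text:

```python
def output_file_index(calculation_number):
    """Return xx for a 1-based calculation number (10 calculations per file)."""
    if calculation_number < 1:
        raise ValueError("calculation numbers start at 1")
    return (calculation_number - 1) // 10 + 1

def distance_file_name(prefix, calculation_number):
    """Name of the distance file that holds the given calculation."""
    return f"{prefix}.dst{output_file_index(calculation_number):02d}"

def number_of_files(n):
    """Number of files needed for n calculations; the last may be partial."""
    return (n + 9) // 10

print(distance_file_name("prefix", 11), number_of_files(25))
```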
Is the same as described above for regular mardigras, but contains information for N calculations, where N is the number of random variations.
Is the same as described above, but for N calculations.
If the input intensities are ROESY, the following file is generated:
prefix.ROEFCTR
Is output for ROESY simulation. It contains information about the chemical shift dependence and HOHAHA corrections of the ROESY intensities. A brief explanation can be found in the header of this output file. The purpose of this file is to give the user a sense of how much correction has been made to the intensities for the chemical shift dependence and HOHAHA effect, and whether the correction for the direct HOHAHA effect is in the valid range.
This program was developed at UCSF on Sun SPARC stations with SunOS Release 4.1.3 UNIX operating system. It has not been fully tested on any other system and may therefore experience machine- or implementation-dependent problems.
Address technical questions and problems to:
Department of Pharmaceutical Chemistry
University of California
San Francisco, CA 94143-0446
Tel.: (415) 476-4378
Address other questions to:
Prof. Thomas L. James
Department of Pharmaceutical Chemistry
University of California
San Francisco, CA 94143-0446
Tel.: (415) 476-1569