corma (COmplete Relaxation Matrix Analysis)
DESCRIPTION
corma (COmplete Relaxation Matrix Analysis)
is a FORTRAN program for calculating the dipole-dipole relaxation matrix
for a system of protons and converting that to intensities expected for
a 2DNOE experiment.
Programs for structure elucidation by NMR
The mardigras and corma programs are two separate programs
for structure elucidation by nmr. newhyd prepares a pdb file for
input to corma.in. If the user has the coordinates in the PDB format
for heavy atoms but not protons, it supplies the missing protons and their
coordinates; it is a stand alone program. corma.in prepares the
output pdb file from newhyd for input to mardigras or corma.
mardigras uses the output pdb file from corma.in for the
iterative relaxation matrix calculation.
New features
The programs mardigras, and corma have been modified to
incorporate (i) an improved computation of distances when averaging due
to motion or overlap of methyl groups, methylene proton pairs or symmetry-equivalent
aromatic ring protons exist, (ii) internal motions as modelled by a `model-free'
approach, and (iii) chemical exchange described by a kinetic matrix of
exchange rates. The effect of the exchange with solvent protons and the
exchange between the two amino protons are included in the complete relaxation
matrix calculation of NOE intensities. If the two amino protons are unsolvable,
the intensity is calculated based on r-6 averaging of the two distances.
Pseudoatom names to be used in the input and output intensity files have been changed.
The corma has also been modified to do an ensemble R factor calculation. When multiple number of PDB files (maximum 5000) of the same molecule are provided, corma calculates the NOE intensities based on the ensemble averaged relaxation rates over the number of structures.
The corma can also be used for chemical fast exchange for bound and free drug-DNA complex.
corma.in has also been modified to include internal motion parameters. and a specification for fractional occupancy of a position by a proton.
Calculation of distances for methyl,
methylene or symmetryequivalent aromatic
protons.
One of the most important additions to the corma/mardigras
algorithm lies in the calculation of distances involving methyl groups.
Methyl rotation can be modeled in one of four ways by corma/mardigras
(see METHYL JUMP below under INPUT). For all models except model
0, distances are calculated iteratively to a pseudoatom at the geometric
center of the three methyl protons. Geometric parameters in models 1,
2 and 3 are taken from the input model coordinates and are assumed to
be correct. One should therefore have some degree of confidence in the
relative orientations of methyl groups in the input model before using
one of these more accurate methods of calculating distances to methyl
pseudoatoms. The iterative fitting procedure, then, is simply a one-parameter
(viz. distance) fit. (One could conceivably perform a multi-parameter
fit over methyl orientations as well, but this would not yield a unique
solution.) The advantage of this technique is that, when using corma/mardigras
for structure evaluation/refinement, methyl distances can be calculated
very accurately, and no additional uncertainty need be added to calculated
distance bounds. For model 0, inter-residue distances to methyls (and
other pseudoatoms) are calculated for the two extreme geometric cases.
These extremes define the upper and lower bounds on possible distances.
This will prevent unwarranted restraint on distances when the model-structure
geometry is unreliable.
A similar procedure is followed for distances to pseudoatoms which represent either (1) a methylene proton pair or (2) symmetry-equivalent aromatic ring protons. These protons are chemically equivalent, or sufficiently close such that their cross-peaks are not resolvable. In both cases we assume that motion is occurring which is slow relative to the overall tumbling time (and thus does not contribute to relaxation) but fast relative to the relaxation times. Effective pseudoatom distances are obtained by r-6 averaging over the two actual distances. This approach is rigorous for aromatic ring flip but not for methylene pairs. The equivalence (often approximate) of methylene chemical shifts is not necessarily a result of their mobility, although the assumption of motion may be better for protein side-chains and other freely-rotatable methylenes. In some cases, namely for constrained methylenes in DNA and Proline, it may be a better approximation to treat them as static atoms. Note that this r-6 averaging reduces to the same interaction as the case for static methylene protons in the limit of isolated spins. All of the error is introduced in the term for transfer of magnetization between the two methylene protons. In this model, magnetization is transferred perfectly, since the two atoms are collapsed into a single pseudoatom. The result of this treatment will be an error in the calculation involving local spin-diffusion effects. The error is only in a single spin-pair, and the results of preliminary tests indicate that the the overall error introduced is minor. Consider a methylene pair near two other protons A and B. The error introduced is at its maximum when the four protons are in line and at the closest possible distance. The error is zero when the intra-methylene axis is perpendicular to the A-B axis.
Nevertheless, the pseudoatom approach is the simplest reasonable model, since we have only a single experimental cross-peak and are justified in deriving only one distance. If the model geometry is correct, then we can extract the proper distances from the effective distance (using an iterative fitting procedure similar to that for methyls). If the model geometry is poor, we can at least calculate error bounds on that distance assuming extreme-case geometries. This is what is done. If methyl jump model 0 is chosen, the reported distance is the effective average distance, with upper and lower bounds adjusted accordingly. Models 1, 2 and 3 assume that the model geometry should be taken to be correct.
Model-free approach to internal motion
When the details of internal motion are not known, the model-free approach
is useful. This model is now one of the options for the calculation of
the relaxation matrix. The spectral density Jmodfree for the model-free
model is defined in terms of the frequency C, anisotropy parameter A,
order parameter S2, overall correlation times I1 and I2, and
the internal correlation time Ie as
Jmodfree(C,A,S2,I1,I2,Ie)
=AS2J(I1,C)+(1-A)S2J(I2,C)+A(1-S2)J(Ip,C)+(1-A)(1-S2)J(Ipp,C)
where
J(In,C)=In/(1+(InC)2)
Ip=IeI1/(Ie+I1)
and
Ipp=IeI2/(Ie+I2)
In the program the ratio I2/I1 is used. For I2/1=1, or A=1, isotropic overall motion is obtained. If the order parameter S2 is set to 1, the internal motion is ignored. If both A and S2 are set to one, the result is isotropic overall motion with no internal motion. The `model-free' approach is same as the well-known `wobble-in-a-cone' model. (Ref.: Lipari, G. and Szabo, A. J. Am. Chem. Soc. , 104, 4546, 4559 (1982))
To use model free approach, the input PDB file has to be prepared using corma.in with model free option. The modified PDB file contains the key word and parameters that are required by model free approach. The order parameters are not included in the PDB file because they are defined for proton pairs, not for individual protons. A seperate input file with the same format as the intensity file is required for the order parameters. It is not necessary to have a complete list of order parameters in the file. The default value is 1.0 for all proton pairs. Only those order parameters which are different from 1.0 need to be included in the order parameter file.
The order parameter may be assessed via molecular dynamics calculations (Ref.: Koning, T.M.G. et al., Biochemistry, 30, 3787-3797 (1991)) or via experimental determination.(Ref.: Lane, A.N., J. Magn. Reson., 78, 425-439 (1988)) Of course, if the order parameter is less than one but identical for all proton resonances, then the relative cross-peak intensities will not be altered.
Quality factors
We added quality factors Q, Q2, Qx and Q2x
to mardigras/corma output. The quality Q factor is
described in J. M. Withka, J. Srinivasan and P. H. Bolton, J. Magn.
Reson. 98, 611-617 (1992). The sixth root quality factors,
Qx and Q2x, are obvious extrapolations from sixth
root R factors.
Chemical exchange described by kinetic matrix The exchange with solvent protons and the exchange between the two amino protons are taken into account in the calculation of NOE intensity matrix. NOE intensities of exchangeable protons are weakened by the exchange with solvent protons. Neglecting of such effect produces the upper bounds of the distances. While consideration of the upper limit of the exchange rates gives the lower bounds of the distances. When the two amino protons are unsolvable, a mean distance is calculated iteratively from the relaxation rate based on r-6 averaging.
The relaxation matrix is modified by adding exchange matrices. The exchange with solvent contributes to direct relaxation, and is described by a diagonal matrix K. The element Kii equals exchange rates for exchangeable protons, and zero for nonexchangeable protons. A second exchange matrix K' is formed for proton pairs which exchange the positions. If two protons, i and j, are exchanging positions with each other with a rate k, then K'ij = K'ji = -k, and K'ii = K'jj = k. Zero is assigned to K' elements if proton pair exchange is not involved. (See, J. Jeener, B. H. Meier, P. Bachmann and R. R. Ernst, J. Chem. Phys. 71, 4546 (1979).) The modified relaxation matrix is then R' = R+K+K'.
Note that if the option ARBITRARY RATES? is specified as y then the modified relaxation matrix is R' = R+K and the exchange matrix K is read as is from the exchange file and is not modified in any manner.
Peak Integration Errors
The percentage errors in the intensities due to peak integrations may
be included as an additional column in the intensity file. The format
is free, but a key word ERROR% is required at the top of the column, see
example1.INT.1. Note, peak integration errors in the intensity file are
not used in corma. They are used in mardigras, together
with the signal to noise level given in the PARM file, to determine the
upper and lower distance bounds (in output file prefix.BNDS).
INPUT
corma is an interactive program and is invoked as:
unix:>corma
And the computer responds as:
CORMA 4.0 UNIX VERSION CO(mplete) R(elaxation) M(atrix) A(nalysis)
SPECTROMETER FREQUENCY IN MHz: 500.0
INCLUDE KINETIC EXCHANGE? [def=n]: y
ENTER the name of kinetic rate file: example2.exch
ARBITRARY RATES? [def=n]: n
INCLUDE FAST EXCHANGE? [def=n]: y
ENSEMBLE (MULTIPLE FAST EXCHANGE)? [def=n]: y ENTER the name of pdb filenames list file: example2.list
COMPARE WITH EXPERIMENTAL INTENSITIES? [def=n]: y
ENTER NAME OF EXPERIMENTAL INTENSITY FILE: example2.INT.1 NORMALIZE USING ONLY FIXED-DISTANCE INTENSITIES [def=f], OR USE ALL Iobs [a]? f
NAME FOR INTENSITY FILE TO BE CREATED: test1 WRITING TO FILES: test1.CRM.1 test1.INT.1
CUTOFF LEVEL FOR INTENSITIES? (ENTER 1, 2, 3, 4, or 5; 1=0.1, 2=0.01, 3=0.001, ETC.) [def=3]: 3
DISPLAY INTENSITIES IN EXTENDED PRECISION? [def=y]: y
ADD RANDOM NOISE? [def=n]: n
SELECT METHYL JUMP MODEL:
CALCULATING INTENSITIES
WRITING INTENSITIES
SUM Iobs= 3.149E+00 SUM Icalc= 3.149E+00 Scale factor= 1.000E+00
CORMA 4.0 UNIX VERSION CO(mplete) R(elaxation) M(atrix) A(nalysis)
SPECTROMETER FREQUENCY IN MHz: 500.0
INCLUDE KINETIC EXCHANGE? [def=n]: n
INCLUDE FAST EXCHANGE? [def=n]: y
ENSEMBLE (MULTIPLE FAST EXCHANGE)? [def=n]: n
BOUND POPULATION FRACTION [0.0 to 1.0]: .3 ENTER PDB FILE-NAME (bound):
example2.pdb COORDINATE FILE example2.PDB NOT FOUND.
TRYING example2.pdb AS COORDINATE FILE.
ENTER PDB FILE-NAME (free): example2.pdb COORDINATE FILE example2.PDB
NOT FOUND.
TRYING example2.pdb AS COORDINATE FILE.
COMPARE WITH EXPERIMENTAL INTENSITIES? [def=n]: y
ENTER NAME OF EXPERIMENTAL INTENSITY FILE: example2.INT.1 NORMALIZE USING ONLY FIXED-DISTANCE INTENSITIES [def=f], OR USE ALL Iobs [a]? f
NAME FOR INTENSITY FILE TO BE CREATED: test2 WRITING TO FILES: test2.CRM.1 test2.INT.1
CUTOFF LEVEL FOR INTENSITIES?
(ENTER 1, 2, 3, 4, or 5; 1=0.1, 2=0.01, 3=0.001, ETC.) [def=3]: 3
DISPLAY INTENSITIES IN EXTENDED PRECISION? [def=y]: y
ADD RANDOM NOISE? [def=n]: y
AMPLITUDE OF NOISE (IN MULTIPLES OF PRECISION: 1,2,3,4,....,10): 2
SELECT METHYL JUMP MODEL:
INTENSITIES ARE PLOTTED AS HALF-TONE SQUARES. ALSO PLOT INTENSITIES WITH NUMERICAL RANK? [def=n]: y
PLOTS WILL REQUIRE 4 PAGES. USE CONDENSED MODE (100X100 PER PAGE)? [def=n]: n
Scanning intensity file example2.INT.1 for dummy atoms
Scanning intensity file example2.INT.1 for dummy atoms
CALCULATING INTENSITIES
WRITING INTENSITIES
SUM Iobs= 3.149E+00 SUM Icalc= 3.160E+00 Scale factor= 1.003E+00
WRITING RANK FILE test2.RNK.1
As shown above the program prompts for the required information. Follwing is additional information:
The exchange rates are in [1/sec].
Description of the file may be given at the beginning of the file starting with HEADER or REMARK.
prefix.PLT and prefix.RNK are files containing POSTSCRIPT instructions that can be sent to the laserwriter.
If the files already exist the user is asked if they can be overwritten. If not the prefixes are incremented until a unique name is found.
(1) prefix.CRM which displays the calculated intensities by residue submatrices for all submatrices which contain any elements larger than the level of precision defined above. In prefix.CRM the display is limited to three significant figures. (Berkeley 4.3BSD UNIX f77 compiler: calculated intensities below the PRECISION level defined during input will be displayed as " 0. " in these submatrices for clarity.)
(2) prefix.INT.n contains the intensities (and distances
and relaxation rate) in columnar format.
In the case where unresolved intensities were added to the end of the
input intensity file, corma will print out the sum of the peaks
contributing to the unresolved peak, followed by the individual cross-peak
elements:
(4) prefix.RNK contains the POSTSCRIPT file for plotting ranks. (Optional output)
INPUT FILE FORMATS
The following formatted data files are necessary for running corma.
file.pdb is a modified PDB (Protein Data Bank) coordinate
file and is prepared by using corma.in. (See corma.in man
pages.) A HEADER card can go in line 1 to describe the source of the coordinates.
This file must also contain the keyword ISOTROPIC, LOCAL or MODELFREE
on line two (2) in the format:
REMARK CORRELATION TIME: ISOTROPIC
ISOTROPIC assumes a single effective isotropic correlation time for all interactions. (NOTE if the ratio for methyl TAUc is not 1, then the correlation times involving methyl protons will still be reduced.)
LOCAL assumes that each interaction can be assigned an effective isotropic correlation time that is a function of the local motion or diffusion time (TDIFF) of each of the protons (see below for TDIFF).
MODELFREE assumes the modelfree approach. (See the section on the modelfree approach and mardigras and corma.in man pages.)
The atomic coordinates are entered in PDB format beginning on line 3. There is no need to eliminate non-hydrogen atoms from the PDB file, the program can handle stripped or full coordinate sets.
NOTE: ALL PROTONS MUST BEGIN WITH THE LETTER "H". So, a set of coordinates
obtained from the Protein Data Bank which contains proton coordinates
MUST BE EDITED to ensure that the PROTONS ARE APPROPRIATELY LABELED.
See notes on the program newhyd.
All METHYL PROTONS are labeled in sequence Hx1, Hx2, Hx3, or Hxy1, Hxy2, Hxy3. IF MORE THAN ONE METHYL GROUP APPEARS IN A RESIDUE, THEIR NAMES MUST BE DISTINCT.
corma maintains a database of the amino and nucleic acids and recognizes methyl protons from these as long as they follow the name of the atom to which they are bonded.
The OCCUPANCY factor, the first field after the coordinates, is used to turn protons "on" or "off" for the case of deuteration or alternate conformations (enter a value of 1.00 or 0.00). To specify fractional occupancy, use a number between 0.0 and 1.0. This may be appropriate, for example, in the case of partial amide exchange for a deuteron.
The file is also modified by the substitution of "atomic diffusion times" (TDIFF; in nanoseconds) in the last field which usually contains the crystallographic B factors in a
PDB file. These values are used in the calculation of the correlation times TAUc for the H-H vectors:
1/TAUc=1/TDIFF(i)+1/TDIFF(j)
For the ISOTROPIC calculation only one TDIFF value need be entered (for the first H atom enter a value that is twice the correlation time). This value must be repeated for each H atom in the file. This allows one to specify different overall correlation times for different parts of the molecule it is felt to be appropriate. It has been suggested, for example, that base-moiety protons in DNA have a different effective correlation time than sugar protons. (See the model-free approach also)
For LOCAL, the first H atom in each residue should have a TDIFF value that will be used in calculating the correlation times involving all the protons in that residue.
The program corma.in will read a PDB file that contains temperature factors and calculate an appropriate TDIFF value for each of the protons. See the man page on corma.in.
NOTE: TDIFF values are in nanoseconds.
For MODELFREE, for each H atom the last six items are B, tau1, tau2/tau1, A, Ssq, taue. See the section on the modelfree approach for definitions. (See mardigras and corma.in man pages also.)
The following are examples for the input pdb file as prepared by corma.in:
line #
1 HEADER Whatever you like can go here.
2 REMARK Whatever you like can
go here. 3 MIXING TIME: 0.275
(Tm in seconds, f5.3) 4 ATOM1
ATOM2
5 HB2 11 HG1 32 0.020
(atomiii atomjjj intensity; iii, and
jjj are residue numbers in format:
(a4,i3,x,a4,i3,x,f10.0)
4 (etc.)
The format of the output .INT file is different than the format of the input file. This is recognized, and in fact corma can read an input file with the format described above, or a .INT file that was generated by corma (the program reads the second line and looks for the string `DISTAN' to decide how the file will be read).
NOTE: The format of the .INT file has changed with version 2.0
onwards to allow more than 99 residues. The program senses this by noting
the position of the second occurrence of the string `H'. Old .INT files created for earlier versions of corma
should be read correctly.
Unresolved peaks may be entered in the experimental intensity
file, but they must be assigned to a pair or a group of crosspeak intensities.
This can be done in two ways: 1) For methyl, methylene and symmetry-equivalent
aromatic ring protons, the unresolved peaks may be entered simply by referring
to the group by a general name. The general names are as follows (Note
that the naming conventions have been
changed from earlier versions of corma):
For methyls, use the character "M": HD11, HD12, HD13 of LEU ----> MD1 of LEU For methylenes, use the character "Q": HB1, HB2 of LYS ----> QB of LYS
For amino proton pairs, also use the character "Q": HN21, HN22 of GUA ----> QN2 of GUA
For ring protons, use the character "R": HD1, HD2 of PHE ----> RD of PHE
Note, for methyls in residues other than `standard amino-acid' and THY, the nomenclature of methyl protons in .pdb file has to start with "HM" instead of "H". For the two unresolved geminal protons, for example, HB1 and HB2, the order of 1,2 can not be reversed.
For other unresolved peaks, the line `UNRESOLVED' must follow all of the resolvable intensities (including unresolved methyl and methylene groups). After this line, unresolved peaks are entered by listing the cross-peaks that contribute to the unresolved peak, and the intensity of that peak.
H5' 10 M 10 0.006 H6 10 H6 10 0.271 H6 10 M 10 0.060 M 10 M 10 2.299 UNRESOLVED H1' 2 H2" 2 H1' 2 H2' 2 0.124 H2" 2 M 2 H2" 2 H1' 4 H2" 2 H2" 4 H2" 2 H2' 4 H2" 2 H3' 4 H2" 2 H4' 4 H2" 2 H5" 4 H2" 2 H5' 4 0.115 --a7--x--a7---3x--a7--x--a7---3x--a7--x--a7---3x---f10---
Up to 99 cross-peaks may be assigned to a given unresolved peak (this number may be changed with the parameter mxpu in PARMS.INC). If there are more than three cross-peaks, additional ones may be entered on subsequent lines by leaving the intensity field blank (i.e. the blank is interpreted as a continuation sign).
Other problems can arise if a methyl is expected by the built-in naming rules and only one or two protons are actually found in the pdb file, e.g. the structure is a high pH structure and side-chain amines are not protonated. This will result in unpredictable results: floating point errors, or just erroneous results that are hard to detect. If an input intensity (.INT) file is provided, you may find that perfectly normal proton labels are being flagged as unknown.
ACKNOWLEDGEMENTS
corma is distantly related to a program (NOEFT100) written by Greg
Young (now of Wright State University) and still contains relics of the
original code from that program.
BUGS
There probably are some. This program was developed at UCSF on Sun SPARC
stations with SunOS Release 4.1.3 UNIX operating system. It has not been
fully tested on any other system and may therefore experience machine-
or implementationdependent problems.
AUTHORS:
Brandan A. Borgias
Paul D. Thomas
He Liu
Anil Kumar
REFERENCES
The general calculations are described in: J. W. Keepers and T. L. James,
J. Magn. Reson. 57 404-426 (1984); B.A. Borgias
and T.L. James, J. Magn. Reson. 79 493-512
(1988); The mardigras algorithm is described in B. A. Borgias
and T. L. James, J. Magn. Reson. 87 (1990)
475-487 and B. A. Borgias and T. L. James, Meth. Enzymol.
176 (1989) with features described in B.A. Borgias, M. Gochin,
D.J. Kerwood, and T.L. James, In Progress in Nuclear
Magnetic Resonance Spectroscopy, J.W. Emsley,
J. Feeney, and L.H. Sutcliffe (Eds), Oxford: Pergamon Press, 22,
83-100 (1990), T. L. James, Curr. Opin. Struct. Biol.,
1, 1042-1053, (1991). The R factor and the sixth root R factor
are considered in P.D. Thomas, V.J. Basus and T.L. James. Proc.
Nat. Acad. Sci. USA, 88, 1237-1241
(1991). The description for averaging of pseudoproton relaxation rates
and distances can be found in H. Liu, P. D. Thomas and T. L. James, J.
Magn. Reson. 98, 163-175 (1992). Effect of internal
motion on the interproton distances is considered in A. Kumar, T. L. James,
and G. C. Levy, Israel J. Chem. 32, 257-261
(1992). Application to the chemical exchange problems is described in
H. Liu, A. Kumar, K. Weisz, U. Schmitz, K. D. Bishop, and T. L. James,
J. Am. Chem. Soc., 115, 1590 (1993).
An application of the ensemble averaging is given in U. Scmitz, A. Kumar,
and T. L. James, J. Am. Chem. Soc., 114,
10654 (1992).
email: james@picasso.uscf.edu