Revised November 21, 2006
miniCarlo performs conformational calculations (energy minimization and
Metropolis Monte Carlo simulations) on nucleic acids. Similar to conventional
Molecular Dynamics (MD) simulation packages (such as AMBER), it uses Cartesian
coordinates of individual atoms to calculate the conformational energy of a molecule.
miniCarlo uses the Zhurkin-Poltev-Florentiev force field
[1], which is currently hard-coded in the program.
To generate the Cartesian coordinates, miniCarlo uses a specialized set of independent
internal coordinates, which
include helical parameters.
This approach drastically reduces the number of degrees of freedom in a molecule by
treating aromatic bases as rigid bodies and using idealized values for bond lengths
and most bond angles. Consequently, miniCarlo calculations start from a set of
helical parameters rather than a set of xyz coordinates. A set of helical parameters
is also output as the result of calculations (although a pdb file can also be output
for use with other programs). Sample input files of helical parameters for some
standard structures will be provided with the example files. Helical parameters can
also be calculated from a pdb file using the auxiliary program fitparam.
However, because miniCarlo requires idealized geometries for aromatic bases,
bond lengths and bond angles, a nucleic acid structure from a pdb file may require
"regularization" before it can be used with miniCarlo. In practice, this is not always trivial.
Because of the specialized nature of the internal coordinates used, miniCarlo can presently work
with nucleic acids only. An arbitrary number of strands and any combination of deoxy
and ribo residues are supported; the allowed nucleic acid bases are adenine, cytosine,
guanine, thymine, and uracil.
The flow of calculations by miniCarlo is controlled by a user-specified
protocol. The protocol ("way-file") consists of a sequence of
commands, which invoke
simulation steps (i/o, energy minimization, Monte Carlo), control the flow of the
protocol, or modify various parameters (both parameters of simulations, and
independent parameters determining the nucleic acid structure).
The command language allows loops (which
can be nested) and calls to files containing sub-protocols. When miniCarlo starts,
the sequence of commands is compiled to minimize disk access.
The core of miniCarlo is the backbone closure algorithm by Zhurkin and co-authors,
which was used in the early version of the program working with regular DNA duplexes
[2]. The current version of miniCarlo still
uses portions of the code of the original program, including the backbone closure
routine, matrix mathematics, etc. The program adopted its present shape in
1988 [3]. Since then the program has undergone
numerous modifications, and it exists in several divergent versions [Refs].
The current version of miniCarlo is being developed in
Tom James' NMR Lab at UCSF and it includes
NMR-related options: distance restraints [4],
and multiple-copy refinement with floating probabilities [5]
against NOE-derived proton-proton dipolar relaxation rates. During the multiple-copy
refinement, floating probabilities are calculated using the pdqpro algorithm
[6], and theoretical relaxation rates are calculated
using routines of the RELAX program [7].
A proper description of miniCarlo has never been published (we plan to
publish one soon), but the program has been used extensively in several labs [Refs].
The program is written mostly in FORTRAN (with the exception of pdqpro and
RELAX routines) without dynamic memory allocation. This alpha-version of
miniCarlo is compiled allowing for a maximum of 100 residues,
50 base pairs, 2000 distance restraints, and 10 copies of a molecule. Please
contact Nick Ulyanov
if you need to recompile the program with different dimensions.
The following are the independent variables used by miniCarlo to determine the Cartesian coordinates of all atoms in a nucleic acid molecule:
Each residue of a nucleic acid must be assigned to a particular base pair, which is
specified by the user in the sequence file.
All base pairs and steps between consecutive base pairs have certain
independent variables associated with them. Each variable (associated with either a
pair or a step) has a unique number identifying it.
The command language allows selection of any
individual variable (or a set of variables). Such a selection is necessary for
performing various operations on a nucleic acid structure (changing a particular
helical parameter, energy minimization using a subset of degrees of freedom, etc.).
A variable is selected by specifying if it is associated with a pair or
step, specifying the number of a particular pair or step, and
specifying the unique number identifying this variable. The definitions of
these unique numbers follow below.

ID numbers of PAIR parameters

| type | rotation | rotation | rotation | translation | translation | translation |
| name | Propeller | Buckle | Opening | Shear | Stretch | Stagger |
| symbol | | | | Sx | Sy | Sz |
| axis | y | x | z | x | y | z |
| ID number | 1 | 2 | 3 | 4 | 5 | 6 |
STEP Helical parameters.
Also, there are three rotations and three translations defining the
relative position of two consecutive pairs; they are associated with
steps. The table
below shows the ID numbers of step parameters; the ID numbers are used for the
selection of these parameters with the
STEP command (described in the
"miniCarlo command language" section).
ID numbers of STEP parameters

| type | rotation | rotation | rotation | translation | translation | translation |
| name | Twist | Tilt | Roll | Shift | Slide | Rise |
| symbol | | | | Dx | Dy | Dz |
| axis | z | x | y | x | y | z |
| ID number | 1 | 2 | 3 | 4 | 5 | 6 |
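The two ID tables can be kept as a small lookup, so that a parameter is referred to by name rather than by a bare number. The sketch below is written in Python, which miniCarlo itself does not use; the names PAIR_IDS, STEP_IDS and lookup are hypothetical helpers, and only the ID numbers are taken from the tables above.

```python
# ID numbers copied from the PAIR and STEP tables of this manual.
PAIR_IDS = {"Propeller": 1, "Buckle": 2, "Opening": 3,
            "Shear": 4, "Stretch": 5, "Stagger": 6}
STEP_IDS = {"Twist": 1, "Tilt": 2, "Roll": 3,
            "Shift": 4, "Slide": 5, "Rise": 6}


def lookup(name):
    """Return (keyword, ID number) for a helical parameter name."""
    if name in PAIR_IDS:
        return "PAIR", PAIR_IDS[name]
    if name in STEP_IDS:
        return "STEP", STEP_IDS[name]
    raise KeyError(f"unknown helical parameter: {name}")


print(lookup("Roll"))     # ('STEP', 3)
print(lookup("Stagger"))  # ('PAIR', 6)
```

Because the same ID number means different things for pairs and steps (for example, 1 is Propeller for a pair and Twist for a step), the lookup always returns the keyword together with the number.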
Note that the definitions of helical parameters internally used in
miniCarlo do not conform with the guidelines of the Cambridge
convention [8] (e.g., because
of a different choice of frames of reference, see Figure 1). Consequently,
it is not advised to publish these parameters -- they are for the internal
use in miniCarlo only. The easiest way to obtain a set of
helical parameters conforming to the guidelines of the Cambridge convention
is to output the structure in pdb format, and use one of the available
nucleic acid analysis programs [9].
It is our intention to change the internal definitions of helical parameters
in the next release of miniCarlo, so that the same set of parameters
will be used for the definition of structure and for its description.
Residue (NUCL) parameters.
The remaining parameters define the sugar conformation, the orientation of the
hydroxyl group for riboses, the orientation of the methyl group for T and m5C,
and the orientation (three rotations and the translations Sx, Sy, Sz) of the
3rd or 4th base in a triple or a quadruple.
Since winter 2004, they are all referenced with the NUCL keyword.
ID numbers of the Residue (NUCL) parameters

| Name | ID number | Note |
| Three rotations and Sx, Sy, Sz | 1-6 | For the 3rd or 4th base in a triple or a quadruple only |
| Glycosidic angle | 8 | |
| One-parameter sugar | | |
| Sugar pseudorotation P | 7 | SUG4 = 1 |
| Four-parameter sugar, endocyclic parameters | | |
| 3 | 10 | SUG4 = 4 or 10 |
| 4 | 11 | SUG4 = 4 or 10 |
| 4 | 12 | SUG4 = 4 or 10 |
| 5 | 13 | SUG4 = 4 or 10 |
| Ten-parameter sugar, exocyclic parameters | | |
| C1' | 14 | SUG4 = 10 |
| C1' | 15 | SUG4 = 10 |
| C2' | 16 | not used currently |
| C2' | 17 | not used currently |
| C3' | 18 | SUG4 = 10 |
| C3' | 19 | SUG4 = 10 |
| C4' | 20 | SUG4 = 10 |
| C4' | 21 | SUG4 = 10 |
| 2' hydroxyl group | 22 | riboses only |
| methyl group | 23 | T and m5C only |
The glycosidic angles are internally defined as C2'-C1'-N9-C4 for purines
and C2'-C1'-N1-C2 for pyrimidines; this will also change in the next release
of miniCarlo.
For the internal geometry of sugars, one of two models must be selected:
one-parameter (SUG4 = 1, default)
or four-parameter (SUG4=4). In addition, for the four-parameter sugar model
it is allowed to change exocyclic bond angles (with SUG4=10).
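The relation between the SUG4 setting and the sugar-related NUCL ID numbers can be summarized in a few lines. This is a minimal Python sketch, not part of miniCarlo; the function name sugar_variable_ids is hypothetical, and the ID lists are read off the Note column of the NUCL table (IDs 16 and 17 are left out because the table marks them as not used currently).

```python
def sugar_variable_ids(sug4=1):
    """Return the NUCL ID numbers of the sugar variables for a SUG4 value."""
    if sug4 == 1:
        return [7]                      # pseudorotation P only
    endocyclic = [10, 11, 12, 13]       # listed as "SUG4 = 4 or 10"
    if sug4 == 4:
        return endocyclic
    if sug4 == 10:
        # exocyclic parameters listed as "SUG4 = 10"; 16 and 17 are unused
        return endocyclic + [14, 15, 18, 19, 20, 21]
    raise ValueError("SUG4 must be 1, 4 or 10")


print(sugar_variable_ids(1))   # [7]
print(sugar_variable_ids(4))   # [10, 11, 12, 13]
print(len(sugar_variable_ids(10)))  # 10
```

With SUG4 = 10 the list has ten entries, which matches the "ten-parameter sugar" label in the table.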
Below is an example of a protocol
selecting all independent variables of a DNA molecule consisting of two base
pairs which does not have any thymines, assuming that the one-parameter
sugar model is used:
    PAIR 1, 6, 1,2,3,4,5,6
    PAIR 2, 6, 1,2,3,4,5,6
    NUCL 1, 2, 7,8
    NUCL 2, 2, 7,8
    NUCL 3, 2, 7,8
    NUCL 4, 2, 7,8
    STEP 1, 6, 1,2,3,4,5,6
Note that in the case of PAIR and STEP parameters, the base pair number is referenced, but in the case of the NUCL parameters, the nucleotide number is referenced.
If the molecule had thymines or RNA residues, the torsion angles determining
the orientation of methyl groups in T's and hydroxyl groups in riboses
would also have to be selected.
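For longer molecules, selection commands like those in the example can be generated rather than typed. The Python sketch below is an illustration only. It assumes that each command has the layout "KEYWORD number, count, comma-separated IDs", which is inferred from the example protocol and not from a formal description of the command language. It also assumes a single stack of pairs, so that there is one step fewer than there are pairs. The function names select_line and select_all are hypothetical.

```python
def select_line(keyword, index, ids):
    """Format one selection command (layout inferred from the example)."""
    id_list = ",".join(str(i) for i in ids)
    return f"{keyword} {index}, {len(ids)}, {id_list}"


def select_all(n_pairs, n_residues, sugar_ids=(7,), thymines=(), riboses=()):
    """Build commands selecting every independent variable.

    sugar_ids: sugar variable IDs (7 for the one-parameter sugar model).
    thymines, riboses: residue numbers that need the methyl (23) or
    2' hydroxyl (22) torsion selected as well.
    """
    all_six = [1, 2, 3, 4, 5, 6]
    lines = [select_line("PAIR", p, all_six) for p in range(1, n_pairs + 1)]
    for r in range(1, n_residues + 1):
        ids = list(sugar_ids) + [8]          # 8 is the glycosidic angle
        if r in riboses:
            ids.append(22)
        if r in thymines:
            ids.append(23)
        lines.append(select_line("NUCL", r, sorted(ids)))
    # Assumption: one step between each pair of consecutive base pairs.
    lines += [select_line("STEP", s, all_six) for s in range(1, n_pairs)]
    return lines


for line in select_all(n_pairs=2, n_residues=4):
    print(line)
```

Called with two pairs and four residues, the sketch prints the same seven commands as the example protocol. Passing thymines=(3,) would add ID 23 to the NUCL command for residue 3.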
The sequence of nucleotides and their assignments to base pairs must be
specified in a "sequence file", which is required for
starting miniCarlo.
The first line of the sequence file is an integer specifying the number of nucleotides
in the molecule.
The second line is a string specifying ribo and deoxy
residues. Allowed characters are r (ribo), d (deoxy),
and space (deoxy). An empty string is interpreted as all deoxy residues.
The third line is a string specifying the sequence in one-character format.
Allowed characters are A, C, G, T, U; spaces are not allowed.
The fourth line is a string specifying which residues are 5'- or 3'-ends of
a strand. Allowed characters are 5, 3 and space (neither 5'- nor 3'-end).
If a strand consists of a single residue, this residue must be specified
as a 5'- rather than a 3'-end. The molecule must have at least one 5'- and one
3'-end (circular molecules are not currently supported).
The fifth and subsequent lines specify consecutive base pairs, starting with
base pair 1. Each line has two integers (two residue numbers). The bases in a
pair need not be complementary, and a pair may consist of one base only. If a base
pair consists of one base only, the other must still be listed as zero.
Example 1. A DNA:RNA hybrid d(AAA):r(UUU).
Sequence file "aaa.seq"
    6
       rrr
    AAAUUU
    5 35 3
    1,6
    2,5
    3,4
This molecule has six residues that are assigned to three base pairs in the sequence file. Each pair has six degrees of freedom (Propeller, Buckle, Opening, Shear, Stretch, and Stagger); these parameters must be specified for each pair in the input file with helical parameters. Also, each of two steps in this molecule has six degrees of freedom (Twist, Tilt, Roll, Shift, Slide, and Rise); they too must be specified in the same input file.
Example 2. An RNA hairpin loop r(GGUUUCC).
Sequence file
    7
    rrrrrrr
    GGUUUCC
    5     3
    1,7
    2,6
    3,0
    4,0
    5,0
Example 3. DNA triplex d(AA):d(TT):d(TT).
Sequence file
    6

    AATTTT
    535353
    1,4, 5
    2,3, 6
The second line of this file is empty, so all residues are deoxy.
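The rules for the sequence file can be checked before miniCarlo is started. The following Python sketch parses the text of a sequence file and applies the constraints stated above. It is not part of miniCarlo: the function name parse_sequence_file and the returned dictionary are hypothetical, the default limits of 100 residues and 50 base pairs are those quoted for this alpha-version, and the sketch assumes that the second and fourth lines are position-based, with short lines padded by spaces. Pair lines are split on commas and spaces so that lines with more than two residue numbers, as in Example 3, are accepted.

```python
import re


def parse_sequence_file(text, max_residues=100, max_pairs=50):
    """Parse and check the text of a sequence file."""
    lines = text.splitlines()
    while lines and not lines[-1].strip():      # ignore trailing blank lines
        lines.pop()
    if len(lines) < 5:
        raise ValueError("a sequence file needs at least five lines")

    n = int(lines[0])
    if not 1 <= n <= max_residues:
        raise ValueError(f"number of residues must be 1..{max_residues}")

    # Lines 2 and 4 are position-based; pad short (or empty) lines with spaces.
    ribo = lines[1].rstrip().ljust(n)
    sequence = lines[2].strip()
    ends = lines[3].rstrip().ljust(n)

    if len(ribo) != n or set(ribo) - set("rd "):
        raise ValueError("line 2: use r, d or space, one character per residue")
    if len(sequence) != n or set(sequence) - set("ACGTU"):
        raise ValueError("line 3: use A, C, G, T, U, one character per residue")
    if len(ends) != n or set(ends) - set("53 "):
        raise ValueError("line 4: use 5, 3 or space, one character per residue")
    if "5" not in ends or "3" not in ends:
        raise ValueError("line 4: at least one 5'-end and one 3'-end required")

    pairs = []
    for line in lines[4:]:
        members = tuple(int(tok) for tok in re.split(r"[,\s]+", line.strip()) if tok)
        if not members or any(not 0 <= m <= n for m in members):
            raise ValueError(f"bad base pair line: {line!r}")
        pairs.append(members)
    if len(pairs) > max_pairs:
        raise ValueError(f"more than {max_pairs} base pairs")

    return {"sequence": sequence,
            "is_ribo": [c == "r" for c in ribo],
            "ends": ends,
            "pairs": pairs}


# Example 1, d(AAA):r(UUU); the second line starts with three spaces.
example = "6\n   rrr\nAAAUUU\n5 35 3\n1,6\n2,5\n3,4\n"
parsed = parse_sequence_file(example)
print(parsed["pairs"])    # [(1, 6), (2, 5), (3, 4)]
print(parsed["is_ribo"])  # [False, False, False, True, True, True]
```

The sketch does not check the rule that a single-residue strand must be marked as a 5'-end; that would require splitting the molecule into strands first.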
To run miniCarlo, you need to place the executable miniCarlo in a
directory included in the path. Also, you need to have the file
"dupinp.txt" in the working directory. ("dupinp.txt" contains information
about standard geometries, force field, partial charges, etc. Do not
edit this file!)
Running miniCarlo requires two parameters; several options
may also be specified:
    miniCarlo [-cdfhimoOp] -s <sequence file> -w <protocol file>
Parameters in square brackets are optional; parameters "-s" and "-w" are required.
Replace <sequence file> and <protocol file> with the names of the actual
sequence file and protocol file.
Options:
| -c <file> | compile all sub-protocols into a single |
| -d | checks the syntax of the protocol file (debug) |
| -f <pdb file> | reads a pdb file and bypasses the calculation of coordinates |
| -h | prints short help |
| -i <input file> <record> | reads internal coordinates from <input file> using record # <record> |
| -m <number> | sets the number of multiple copies to <number> (default 1). |
| -O | allows overwriting output files with internal coordinates |
| -o <output file> | sets file name for output of internal coordinates |
| -p | skips protocol file and generates pdb files from input internal coordinates |
| -v | verbose; prints more messages to stdout |
Internal coordinates can also be input from the protocol file, and the output
file for internal coordinates can also be set in the protocol file.
(Specifying both input and output files on the command line makes it possible
to use the same protocol file with different input files.)
The remaining options
(including the number of multiple copies) can be set only on the
command line. miniCarlo also writes a fair amount of output to stdout.
Redirect it to a file if you want to run the program
in the background.
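A run with redirected output can also be scripted. The Python sketch below builds a command line from the options listed above and runs it with stdout sent to a log file. The helper names build_command and run_logged and the file names aaa.way, aaa.inp, aaa.out and aaa.log are hypothetical. The sketch also assumes that the file name and record number of "-i" are passed as two separate arguments, and that options may precede "-s" and "-w" as in the usage line.

```python
import subprocess


def build_command(sequence_file, protocol_file, input_file=None, record=1,
                  output_file=None, overwrite=False, copies=None):
    """Assemble a miniCarlo command line from documented options."""
    options = []
    if input_file is not None:
        options += ["-i", input_file, str(record)]   # file and record number
    if output_file is not None:
        options += ["-o", output_file]
    if overwrite:
        options.append("-O")                          # allow overwriting output
    if copies is not None:
        options += ["-m", str(copies)]                # number of multiple copies
    return ["miniCarlo"] + options + ["-s", sequence_file, "-w", protocol_file]


def run_logged(command, log_file):
    """Run the command with stdout and stderr redirected to log_file."""
    with open(log_file, "w") as log:
        return subprocess.run(command, stdout=log,
                              stderr=subprocess.STDOUT).returncode


cmd = build_command("aaa.seq", "aaa.way", input_file="aaa.inp",
                    output_file="aaa.out")
print(" ".join(cmd))
# miniCarlo -i aaa.inp 1 -o aaa.out -s aaa.seq -w aaa.way
# run_logged(cmd, "aaa.log")   # requires miniCarlo in the path and dupinp.txt
```

The call to run_logged is left commented out because it only works where the miniCarlo executable is in the path and "dupinp.txt" is in the working directory.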