Revised February 23, 2004

miniCarlo draft manual

by Nick Ulyanov

(under construction)

Contents

  • Independent variables
  • Sequence, base pairs and steps
  • Starting miniCarlo
  • Input files
  • Command language
  • Examples of protocols
  • References

    Overview

    miniCarlo performs conformational calculations (energy minimization and Metropolis Monte Carlo simulations) on nucleic acids. Similar to conventional Molecular Dynamics (MD) simulation packages (such as AMBER), it uses Cartesian coordinates of individual atoms to calculate the conformational energy of a molecule. miniCarlo uses the Zhurkin-Poltev-Florentiev force field [1], which is currently hard-coded in the program.

    To generate the Cartesian coordinates, miniCarlo uses a specialized set of independent internal coordinates, which include helical parameters. This approach drastically reduces the number of degrees of freedom in a molecule by treating aromatic bases as rigid bodies and by using idealized values for bond lengths and most bond angles. Consequently, miniCarlo calculations start from a set of helical parameters rather than a set of xyz coordinates. A set of helical parameters is also output as the result of calculations (a pdb file can be output as well, for use with other programs). Sample input files of helical parameters for some standard structures will be provided with the example files. Helical parameters can also be calculated from a pdb file using the auxiliary program fitparam. However, because miniCarlo requires idealized geometries for aromatic bases, bond lengths and bond angles, a nucleic acid structure taken from a pdb file may require "regularization" before it can be used with miniCarlo. In practice, this is not always trivial.

    Because of the specialized nature of the internal coordinates used, miniCarlo can presently work with nucleic acids only. An arbitrary number of strands and any combination of deoxy and ribo residues are supported; the allowed nucleic acid bases are adenine, cytosine, guanine, thymine, and uracil. However, the present version of the program is not convenient for simulation of triplexes and quadruplexes.

    The flow of calculations in miniCarlo is controlled by a user-specified protocol. The protocol ("way-file") consists of a sequence of commands, which invoke simulation steps (i/o, energy minimization, Monte Carlo), control the flow of the protocol, or modify various parameters (both simulation parameters and the independent parameters that determine the nucleic acid structure). The command language allows loops (which can be nested) and calls to files with sub-protocols. When miniCarlo starts, the sequence of commands is compiled to minimize disk access.

    The core of miniCarlo is the backbone closure algorithm by Zhurkin and co-authors, which was used in the early version of the program working with regular DNA duplexes [2]. The current version of miniCarlo still uses portions of the code of the original program, including the backbone closure routine, matrix mathematics, etc. The program adopted its present shape in 1988 [3]. Since then the program has undergone numerous modifications, and it exists in several divergent versions [Refs]. The current version of miniCarlo is being developed mostly in Tom James' NMR Lab at UCSF, and it includes NMR-related options: distance restraints [4], and multiple-copy refinement with floating probabilities [5] against NOE-derived proton-proton dipolar relaxation rates. During the multiple-copy refinement, floating probabilities are calculated using the pdqpro algorithm [6], and theoretical relaxation rates are calculated using routines of the RELAX program [7]. A proper description of miniCarlo has not yet been published (we plan to do so soon), but the program has been used extensively in several labs [Refs].

    The program is written mostly in FORTRAN (with the exception of pdqpro and RELAX routines) without dynamic memory allocation. This alpha-version of miniCarlo is compiled allowing for a maximum of 100 residues, 50 base pairs, 2000 distance restraints, and 10 copies of a molecule. Please contact Nick Ulyanov if you need to recompile the program with different dimensions.


    Independent variables

    The following are the independent variables used by miniCarlo to determine the Cartesian coordinates of all atoms in a nucleic acid molecule:

    Base pairs. Each residue of a nucleic acid must be assigned to a particular base pair, which is specified by the user in the sequence file. All base pairs and all steps between consecutive base pairs have certain independent variables associated with them. Each variable (associated with either a pair or a step) has a unique ID number. The command language allows selection of any individual variable (or a set of variables). Such a selection is necessary for performing various operations on a nucleic acid structure (changing a particular helical parameter, energy minimization using a subset of degrees of freedom, etc.). A variable is selected by specifying whether it is associated with a pair or a step, the number of that pair or step, and the ID number of the variable. The definitions of these ID numbers follow below.

    PAIR Helical parameters.
    There are three rotations (in degrees) and three translations (in angstroms) which define the relative position of two bases in a pair. The table below shows the ID numbers of pair parameters; the ID numbers are used for the selection of these parameters with the PAIR command (described in the "miniCarlo command language" section).

    ID numbers of PAIR parameters

                 rotations                       translations
    name         Propeller   Buckle   Opening    Shear   Stretch   Stagger
    symbol       omega       kappa    sigma      Sx      Sy        Sz
    axis         y           x        z          x       y         z
    ID number    1           2        3          4       5         6


    STEP Helical parameters.
    Also, there are three rotations and three translations defining the relative position of two consecutive pairs; they are associated with steps. The table below shows the ID numbers of step parameters; the ID numbers are used for the selection of these parameters with the STEP command (described in the "miniCarlo command language" section).

    ID numbers of STEP parameters

                 rotations                       translations
    name         Twist       Tilt     Roll       Shift   Slide     Rise
    symbol       Omega       tau      rho        Dx      Dy        Dz
    axis         z           x        y          x       y         z
    ID number    1           2        3          4       5         6

    Note that the definitions of helical parameters used internally in miniCarlo do not conform to the guidelines of the Cambridge convention [8] (e.g., because of a different choice of frames of reference, see Figure 1). Consequently, publishing these parameters is not advised; they are for internal use in miniCarlo only. The easiest way to obtain a set of helical parameters conforming to the guidelines of the Cambridge convention is to output the structure in pdb format and use one of the available nucleic acid analysis programs [9].

    It is our intention to change the internal definitions of helical parameters in the next release of miniCarlo, so that the same set of parameters will be used for the definition of structure and for its description.

    Residue parameters.
    The above step and pair parameters are the helical parameters per se. The remaining parameters mostly define the sugar conformation. It would be most logical to reference these parameters by residue number. However, miniCarlo does not presently have a mechanism to reference a residue number directly (this will probably change in the future). Instead, a residue must currently be referenced via the pair to which it is assigned. Therefore, such parameters are selected using the PAIR command (described in the "miniCarlo command language" section). As shown in the table below, there are separate sets of ID numbers for parameters related to the "left" residue of the pair (located in the "sequence strand") and the "right" residue of the pair (located in the "non-sequence strand", Figure 1). Residues must be assigned as "left" or "right" even for pairs consisting of a single residue; such assignments are specified in the sequence file.

    ID numbers of the rest of PAIR parameters

    parameter                  "left" residue   "right" residue   Note
    glycosidic angle chi       9                10
    one-parameter sugar
      sugar pseudorotation P   7                8                 SUG4 = 1
    four-parameter sugar, endocyclic parameters
      omega3                   23               27                SUG4 = 4 or 10
      omega4                   24               28                SUG4 = 4 or 10
      tau4                     25               29                SUG4 = 4 or 10
      tau5                     26               30                SUG4 = 4 or 10
    ten-parameter sugar, exocyclic parameters
      C1' beta                 33               41                SUG4 = 10
      C1' gamma                34               42                SUG4 = 10
      C2' beta                 not used currently
      C2' gamma                not used currently
      C3' beta                 37               45                SUG4 = 10
      C3' gamma                38               46                SUG4 = 10
      C4' beta                 39               47                SUG4 = 10
      C4' gamma                40               48                SUG4 = 10
    2' hydroxyl group          49               50                riboses only
    methyl group               51               52                thymines only

    The glycosidic angles are internally defined as C2'-C1'-N9-C4 for purines and C2'-C1'-N1-C2 for pyrimidines; this will also change in the next release of miniCarlo.

    For the internal geometry of sugars, one of two models must be selected: one-parameter (SUG4 = 1, default) or four-parameter (SUG4 = 4). In addition, the four-parameter sugar model allows the exocyclic bond angles to be varied (with SUG4 = 10).
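    As a rough illustration of how the ID numbers in the table above depend on the sugar model, the Python sketch below collects the residue-level PAIR IDs for one residue. The function name and its arguments are hypothetical and not part of miniCarlo. The sketch also assumes, from the Note column only, that the pseudorotation ID is selected only with SUG4 = 1 and that with SUG4 = 10 the exocyclic IDs are added to the endocyclic ones.

```python
# Hypothetical helper, not part of miniCarlo. The ID numbers are copied from
# the table of residue-level PAIR parameters in this manual.
CHI = {"left": 9, "right": 10}
PSEUDOROTATION = {"left": 7, "right": 8}                   # SUG4 = 1
ENDOCYCLIC = {"left": [23, 24, 25, 26],
              "right": [27, 28, 29, 30]}                   # SUG4 = 4 or 10
EXOCYCLIC = {"left": [33, 34, 37, 38, 39, 40],
             "right": [41, 42, 45, 46, 47, 48]}            # SUG4 = 10
HYDROXYL = {"left": 49, "right": 50}                       # riboses only
METHYL = {"left": 51, "right": 52}                         # thymines only


def residue_ids(side, sug4=1, ribose=False, thymine=False):
    """Return the sorted PAIR ID numbers for one residue of a pair.

    Assumption: which IDs go with which SUG4 value is inferred from the
    Note column of the table, not from the miniCarlo source code.
    """
    if side not in CHI:
        raise ValueError("side must be 'left' or 'right'")
    if sug4 not in (1, 4, 10):
        raise ValueError("SUG4 must be 1, 4 or 10")
    ids = [CHI[side]]
    if sug4 == 1:
        ids.append(PSEUDOROTATION[side])
    else:
        ids.extend(ENDOCYCLIC[side])
        if sug4 == 10:
            ids.extend(EXOCYCLIC[side])
    if ribose:
        ids.append(HYDROXYL[side])
    if thymine:
        ids.append(METHYL[side])
    return sorted(ids)
```

    For example, residue_ids("left") gives the glycosidic angle and pseudorotation IDs of a "left" deoxy residue in the default one-parameter model.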

    Below is an example of a protocol selecting all independent variables of a DNA molecule consisting of two base pairs which does not have any thymines, assuming that the one-parameter sugar model is used:

    
    PAIR
    1, 10, 1,2,3,4,5,6,7,8,9,10 
    STEP
    1, 6, 1,2,3,4,5,6 
    PAIR
    2, 10, 1,2,3,4,5,6,7,8,9,10
    

    If the molecule had thymines or RNA residues, the torsion angles determining the orientation of methyl groups in T's and hydroxyl groups in riboses would also have to be selected.
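    The selection protocol above follows a regular pattern, so it can be generated for longer molecules. The Python sketch below does this for the one-parameter sugar model and adds the methyl and hydroxyl torsions where needed. It assumes, from the example alone, that each data line reads "pair or step number, number of selected variables, list of ID numbers". The function names and the residue labels such as "dA" or "rU" are hypothetical, both residues are assumed to be present in every pair, and nothing here is part of miniCarlo.

```python
# Hypothetical generator, not part of miniCarlo. The line layout
# "number, count, ID list" is inferred from the two-pair example protocol.

def pair_ids(left, right):
    """ID numbers for one pair, one-parameter sugar model (SUG4 = 1).

    Residues are labelled like "dA" or "rU": sugar type, then base.
    """
    ids = list(range(1, 11))  # 1-6 helical, 7-8 pseudorotation, 9-10 chi
    for residue, hydroxyl, methyl in ((left, 49, 51), (right, 50, 52)):
        if residue[0] == "r":      # 2' hydroxyl torsion, riboses only
            ids.append(hydroxyl)
        if residue[1] == "T":      # methyl torsion, thymines only
            ids.append(methyl)
    return sorted(ids)


def selection_protocol(pairs):
    """Return PAIR/STEP selection text for a list of (left, right) residues."""
    lines = []
    for n, (left, right) in enumerate(pairs, start=1):
        if n > 1:
            # Step n-1 connects pair n-1 to pair n: six helical parameters.
            lines += ["STEP", f"{n - 1}, 6, 1,2,3,4,5,6"]
        ids = pair_ids(left, right)
        lines += ["PAIR", f"{n}, {len(ids)}, {','.join(map(str, ids))}"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(selection_protocol([("dG", "dC"), ("dA", "dT")]))
```

    For a two-pair DNA molecule without thymines this reproduces the protocol shown above; a thymine in the "right" position adds ID 52 to its pair.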


    Sequence of nucleotides, base pairs and steps

    The sequence of nucleotides and their assignments to base pairs must be specified in a "sequence file", which is required for starting miniCarlo.

    The first line of the sequence file is an integer specifying the number of nucleotides in the molecule.

    The second line is a string specifying ribo and deoxy residues. Allowed characters are r (ribo), d (deoxy), and space (deoxy). An empty string is interpreted as all deoxy residues.

    The third line is a string specifying the sequence in one-character format. Allowed characters are A, C, G, T, U; spaces are not allowed.

    The fourth line is a string specifying which residues are 5'- or 3'-ends of a strand. Allowed characters are 5, 3 and space (neither 5'- nor 3'-end). If a strand consists of a single residue, this residue must be specified as a 5'- rather than a 3'-end. The molecule must have at least one 5'- and one 3'-end (circular molecules are not currently supported).

    The fifth and subsequent lines specify consecutive base pairs, starting with base pair 1. Each line has two integers (two residue numbers). The bases in a pair need not be complementary, and a pair may consist of one base only. If a base pair consists of one base only, the other must still be listed as zero.
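    The line layout described above can be checked before starting a run. The Python sketch below parses the text of a sequence file and applies only the rules stated in this section. The function name and the returned dictionary are hypothetical, and the parser is not part of miniCarlo; in particular, how miniCarlo itself treats trailing spaces or blank lines is not known here, so the sketch simply tolerates them.

```python
# Hypothetical checker, not part of miniCarlo. It applies only the rules
# stated in this manual for the sequence file.

def parse_sequence_file(text):
    lines = text.splitlines()
    if len(lines) < 5:
        raise ValueError("expected at least five lines")
    n = int(lines[0])                       # line 1: number of nucleotides
    sugars = lines[1].rstrip().ljust(n)     # line 2: r, d or space (deoxy)
    bases = lines[2].strip()                # line 3: one-character sequence
    ends = lines[3].rstrip().ljust(n)       # line 4: 5, 3 or space
    if len(bases) != n or set(bases) - set("ACGTU"):
        raise ValueError("sequence must have n characters from A, C, G, T, U")
    if len(sugars) != n or set(sugars) - set("rd "):
        raise ValueError("sugar line may contain only r, d and spaces")
    if len(ends) != n or set(ends) - set("53 "):
        raise ValueError("end line may contain only 5, 3 and spaces")
    if "5" not in ends or "3" not in ends:
        raise ValueError("at least one 5'-end and one 3'-end are required")
    pairs = []
    for line in lines[4:]:                  # line 5 onward: one pair per line
        if not line.strip():
            continue
        first, second = (int(x) for x in line.split(","))
        pairs.append((first, second))
    # Zero marks a missing partner; every residue must appear exactly once.
    assigned = sorted(r for pair in pairs for r in pair if r != 0)
    if assigned != list(range(1, n + 1)):
        raise ValueError("each residue must be assigned to exactly one pair")
    residues = [("r" if s == "r" else "d") + b for s, b in zip(sugars, bases)]
    return {"residues": residues, "ends": ends, "pairs": pairs}


if __name__ == "__main__":
    # The d(AAA):r(UUU) hybrid given as Example 1 in this section.
    print(parse_sequence_file("6\n   rrr\nAAAUUU\n5 35 3\n1,6\n2,5\n3,4\n"))
```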


    Example 1. A DNA:RNA hybrid d(AAA):r(UUU).

    Sequence file "aaa.seq"

    6
       rrr
    AAAUUU
    5 35 3
    1,6
    2,5
    3,4
    

    This molecule has six residues that are assigned to three base pairs in the sequence file. Each pair has six degrees of freedom (Propeller, Buckle, Opening, Shear, Stretch, and Stagger); these parameters must be specified for each pair in the input file with helical parameters. Each of the two steps in this molecule also has six degrees of freedom (Twist, Tilt, Roll, Shift, Slide, and Rise); these too must be specified in the same input file.



    Example 2. An RNA hairpin loop r(GGUUUCC).

    Sequence file

    7
    rrrrrrr
    GGUUUCC
    5     3
    1,7
    2,6
    3,0
    4,0
    5,0
    


    Starting miniCarlo

    To run miniCarlo, you need to place the executable miniCarlo in a directory included in the path. Also, you need to have the file "dupinp.txt" in the working directory. ("dupinp.txt" contains information about standard geometries, force field, partial charges, etc. Do not edit this file!)

    Running miniCarlo requires two parameters; several options may also be specified:

    miniCarlo [-cdhimoOp] -s <sequence file> -w <protocol file>

    Options in square brackets are optional; the parameters "-s" and "-w" are required. Replace <sequence file> and <protocol file> with the names of the actual sequence file and protocol file.

    Options:

    -c <file>                  compiles all sub-protocols into a single file
    -d                         checks the syntax of the protocol file (debug)
    -h                         prints short help
    -i <input file> <record>   reads internal coordinates from <input file> using record # <record>
    -m <number>                sets the number of multiple copies to <number> (default 1)
    -O                         allows overwriting output files with internal coordinates
    -o <output file>           sets the file name for output of internal coordinates
    -p                         skips the protocol file and generates pdb files from input internal coordinates



    Internal coordinates can also be input from the protocol file, and the output file for internal coordinates can also be set there. (Specifying both input and output files on the command line has the advantage that the same protocol file can be used with different input files.) The remaining options (including the number of multiple copies) can be set only on the command line. miniCarlo also writes a fair amount of text to standard output; redirect it to a file if you want to run the program in the background.
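    When the same protocol is run with many input files, the command line can be assembled by a small script. The Python sketch below builds the argument list from the options listed above and sends standard output to a log file. The function names are hypothetical, the file names other than "aaa.seq" are made up for illustration, and the sketch assumes that options may be given separately (for example "-O" and "-o") and that "-i" always takes both a file name and a record number, as the option list suggests.

```python
# Hypothetical wrapper, not part of miniCarlo. It uses only the options
# listed in this manual; the order of optional flags is an assumption.
import subprocess


def minicarlo_command(sequence_file, protocol_file, input_file=None,
                      record=None, output_file=None, overwrite=False,
                      copies=None, check_only=False):
    """Build the argument list for one miniCarlo run."""
    cmd = ["miniCarlo"]
    if check_only:
        cmd.append("-d")                    # syntax check of the protocol
    if copies is not None:
        cmd += ["-m", str(copies)]          # number of multiple copies
    if input_file is not None:
        if record is None:
            raise ValueError("-i needs both an input file and a record number")
        cmd += ["-i", input_file, str(record)]
    if output_file is not None:
        cmd += ["-o", output_file]
    if overwrite:
        cmd.append("-O")                    # allow overwriting output files
    cmd += ["-s", sequence_file, "-w", protocol_file]
    return cmd


def run_minicarlo(cmd, log_file):
    """Run miniCarlo with its standard output redirected to log_file."""
    with open(log_file, "w") as log:
        return subprocess.run(cmd, stdout=log, check=True)


if __name__ == "__main__":
    # "min.way" and "start.inp" are made-up names for illustration.
    print(minicarlo_command("aaa.seq", "min.way",
                            input_file="start.inp", record=1))
```

    Calling run_minicarlo requires the miniCarlo executable in the path and "dupinp.txt" in the working directory, as described at the start of this section.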