Revised November 21, 2006

miniCarlo draft manual

by Nick Ulyanov

(under construction)

This is a draft reflecting the changes made in miniCarlo in 2004-2005.

Contents

  • Independent variables
  • Sequence, base pairs and steps
  • Starting miniCarlo
  • Input files
  • Command language
  • Examples of protocols
  • References

    Overview

    miniCarlo performs conformational calculations (energy minimization and Metropolis Monte Carlo simulations) on nucleic acids. Similar to conventional Molecular Dynamics (MD) simulation packages (such as AMBER), it uses Cartesian coordinates of individual atoms to calculate the conformational energy of a molecule. miniCarlo uses the Zhurkin-Poltev-Florentiev force field [1], which is currently hard-coded in the program.

    To generate the Cartesian coordinates, miniCarlo uses a specialized set of independent internal coordinates, which include helical parameters. This approach drastically reduces the number of degrees of freedom in a molecule by treating aromatic bases as rigid bodies and by using idealized values for bond lengths and most bond angles. Consequently, a set of helical parameters, not a set of xyz coordinates, is required as input to start a miniCarlo calculation. A set of helical parameters is also output as the result of the calculations (although a pdb file can also be output for use with other programs). Sample input files of helical parameters for some standard structures will be provided with the example files. Helical parameters can also be calculated from a pdb file using the auxiliary program fitparam. However, because miniCarlo requires idealized geometries for aromatic bases, bond lengths and bond angles, a nucleic acid structure from a pdb file may require "regularization" before it can be used with miniCarlo. Practice shows that this is not always trivial.

    Because of the specialized nature of the internal coordinates used, miniCarlo can presently work with nucleic acids only. An arbitrary number of strands and any combination of deoxy and ribo residues are supported; the allowed nucleic acid bases are adenine, cytosine, guanine, thymine, and uracil.

    The flow of calculations by miniCarlo is controlled by a user-specified protocol. The protocol ("way-file") consists of a sequence of commands, which invoke simulation steps (i/o, energy minimization, Monte Carlo), control the flow of the protocol, or modify various parameters (both parameters of simulations, and independent parameters determining the nucleic acid structure). The command language allows loops (which can be nested) and calls of files with sub-protocols. At the start of miniCarlo the sequence of commands is compiled to minimize communications with disk.

    The core of miniCarlo is the backbone closure algorithm by Zhurkin and co-authors, which was used in the early version of the program working with regular DNA duplexes [2]. The current version of miniCarlo still uses portions of the code of the original program, including the backbone closure routine, matrix mathematics, etc. The program adopted its present shape in 1988 [3]. Since then it has undergone numerous modifications, and it exists in several divergent versions [Refs]. The current version of miniCarlo is being developed in Tom James' NMR Lab at UCSF and includes NMR-related options: distance restraints [4], and multiple-copy refinement with floating probabilities [5] against NOE-derived proton-proton dipolar relaxation rates. During the multiple-copy refinement, floating probabilities are calculated using the pdqpro algorithm [6], and theoretical relaxation rates are calculated using routines of the RELAX program [7]. A proper description of miniCarlo has not yet been published (we plan to publish one soon), but the program has been used extensively in several labs [Refs].

    The program is written mostly in FORTRAN (with the exception of pdqpro and RELAX routines) without dynamic memory allocation. This alpha-version of miniCarlo is compiled allowing for a maximum of 100 residues, 50 base pairs, 2000 distance restraints, and 10 copies of a molecule. Please contact Nick Ulyanov if you need to recompile the program with different dimensions.
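
    The compiled-in limits quoted above can be checked before a run is attempted. The following is a minimal sketch and not part of miniCarlo; the names LIMITS and check_limits are hypothetical, and the numbers are simply those stated for this alpha version.

```python
# Compile-time limits quoted in this manual for the alpha version.
# The dictionary keys are illustrative names, not miniCarlo identifiers.
LIMITS = {
    "residues": 100,
    "base pairs": 50,
    "distance restraints": 2000,
    "copies": 10,
}


def check_limits(**counts):
    """Return one message for every count that exceeds its limit."""
    problems = []
    for name, value in counts.items():
        key = name.replace("_", " ")
        if key not in LIMITS:
            raise KeyError(f"unknown quantity: {name}")
        if value > LIMITS[key]:
            problems.append(f"{key}: {value} exceeds the limit of {LIMITS[key]}")
    return problems


if __name__ == "__main__":
    # A molecule of this size would need a recompiled executable.
    for message in check_limits(residues=120, base_pairs=60, copies=4):
        print(message)
```

    An empty list means that the molecule fits within the compiled dimensions.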


    Independent variables

    The following are the independent variables used by miniCarlo to determine the Cartesian coordinates of all atoms in a nucleic acid molecule:

    Base Pair.
    Each residue of a nucleic acid must be assigned to a particular base pair, which is specified by the user in the sequence file. All base pairs, and all steps between consecutive base pairs, have certain independent variables associated with them. Each variable (associated with either a pair or a step) has a unique identifying number. The command language allows the selection of any individual variable (or a set of variables). Such a selection is necessary for performing various operations on a nucleic acid structure (changing a particular helical parameter, energy minimization using a subset of the degrees of freedom, etc.). A variable is selected by specifying whether it is associated with a pair or a step, the number of that pair or step, and the unique number identifying the variable. The definitions of these unique numbers follow below.

    PAIR Helical parameters.
    There are three rotations (in degrees) and three translations (in angstroms) which define the relative position of two bases in a pair. The table below shows the ID numbers of pair parameters; the ID numbers are used for the selection of these parameters with the PAIR command (described in the "miniCarlo command language" section).

    ID numbers of PAIR parameters
                 rotations                   translations
    name         Propeller  Buckle  Opening  Shear  Stretch  Stagger
    symbol       omega      kappa   sigma    Sx     Sy       Sz
    axis         y          x       z        x      y        z
    ID number    1          2       3        4      5        6


    STEP Helical parameters.
    Also, there are three rotations and three translations defining the relative position of two consecutive pairs; they are associated with steps. The table below shows the ID numbers of step parameters; the ID numbers are used for the selection of these parameters with the STEP command (described in the "miniCarlo command language" section).

    ID numbers of STEP parameters
                 rotations                   translations
    name         Twist      Tilt    Roll     Shift  Slide    Rise
    symbol       Omega      tau     rho      Dx     Dy       Dz
    axis         z          x       y        x      y        z
    ID number    1          2       3        4      5        6
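
    When protocols are generated by a script, it can be convenient to refer to the helical parameters by name rather than by ID number. The sketch below encodes the two tables above as lookup tables; the names PAIR_IDS, STEP_IDS and parameter_id are hypothetical and do not belong to miniCarlo.

```python
# ID numbers of the PAIR and STEP helical parameters, as tabulated above.
# The dictionary and function names are illustrative only.
PAIR_IDS = {
    "Propeller": 1, "Buckle": 2, "Opening": 3,   # rotations
    "Shear": 4, "Stretch": 5, "Stagger": 6,      # translations
}
STEP_IDS = {
    "Twist": 1, "Tilt": 2, "Roll": 3,            # rotations
    "Shift": 4, "Slide": 5, "Rise": 6,           # translations
}


def parameter_id(keyword, name):
    """Look up the ID number of a named parameter for PAIR or STEP."""
    tables = {"PAIR": PAIR_IDS, "STEP": STEP_IDS}
    return tables[keyword.upper()][name]


if __name__ == "__main__":
    print(parameter_id("STEP", "Roll"))
    print(parameter_id("PAIR", "Stagger"))
```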

    Note that the definitions of helical parameters used internally in miniCarlo do not conform to the guidelines of the Cambridge convention [8] (e.g., because of a different choice of frames of reference, see Figure 1). Consequently, it is not advised to publish these parameters -- they are for internal use in miniCarlo only. The easiest way to obtain a set of helical parameters conforming to the guidelines of the Cambridge convention is to output the structure in pdb format and to use one of the available nucleic acid analysis programs [9].

    It is our intention to change the internal definitions of helical parameters in the next release of miniCarlo, so that the same set of parameters will be used for the definition of structure and for its description.

    Residue (NUCL) parameters.
    The remaining parameters define the sugar conformation, the orientation of the hydroxyl group for riboses, the orientation of the methyl group for T and m5C, and the orientation (omega, kappa, sigma, Sx, Sy, Sz) of the 3rd or 4th base in a triple or a quadruple. Since winter 2004 they are all referenced with the NUCL keyword.

    ID numbers of the Residue (NUCL) parameters
    Name                               ID number   Note
    omega, kappa, sigma, Sx, Sy, Sz    1-6         For the 3rd or 4th base in a triple or a quadruple only
    Glycosidic angle chi               8
    One-parameter sugar
      Sugar pseudorotation P           7           SUG4 = 1
    Four-parameter sugar, endocyclic parameters
      omega3                           10          SUG4 = 4 or 10
      omega4                           11          SUG4 = 4 or 10
      tau4                             12          SUG4 = 4 or 10
      tau5                             13          SUG4 = 4 or 10
    Ten-parameter sugar, exocyclic parameters
      C1' beta                         14          SUG4 = 10
      C1' gamma                        15          SUG4 = 10
      C2' beta                         16          not used currently
      C2' gamma                        17          not used currently
      C3' beta                         18          SUG4 = 10
      C3' gamma                        19          SUG4 = 10
      C4' beta                         20          SUG4 = 10
      C4' gamma                        21          SUG4 = 10
    2' hydroxyl group                  22          riboses only
    methyl group                       23          T and m5C only

    The glycosidic angles are internally defined as C2'-C1'-N9-C4 for purines and C2'-C1'-N1-C2 for pyrimidines; this will also change in the next release of miniCarlo.

    For the internal geometry of sugars, one of two models must be selected: one-parameter (SUG4 = 1, default) or four-parameter (SUG4 = 4). In addition, the four-parameter sugar model allows the exocyclic bond angles to be changed (with SUG4 = 10).
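
    Which NUCL ID numbers apply to a residue therefore depends on the sugar model and on the residue type. The sketch below is one reading of the NUCL table, not miniCarlo code: the function nucl_ids is hypothetical, it assumes that the glycosidic angle (ID 8) is available under every sugar model, and it leaves out IDs 1-6, which concern only the 3rd or 4th base of a triple or a quadruple.

```python
def nucl_ids(sug4=1, ribose=False, methylated=False):
    """NUCL ID numbers applicable to one residue of an ordinary base pair.

    sug4       -- sugar model: 1, 4 or 10 (the SUG4 values of the table)
    ribose     -- True for ribo residues (adds the 2' hydroxyl torsion, 22)
    methylated -- True for T and m5C (adds the methyl torsion, 23)
    """
    if sug4 not in (1, 4, 10):
        raise ValueError("SUG4 must be 1, 4 or 10")
    ids = [8]                              # glycosidic angle chi (assumed always present)
    if sug4 == 1:
        ids.append(7)                      # sugar pseudorotation P
    else:
        ids += [10, 11, 12, 13]            # omega3, omega4, tau4, tau5
    if sug4 == 10:
        ids += [14, 15, 18, 19, 20, 21]    # exocyclic angles; 16 and 17 are not used
    if ribose:
        ids.append(22)
    if methylated:
        ids.append(23)
    return sorted(ids)


if __name__ == "__main__":
    print(nucl_ids())                       # deoxy residue, one-parameter sugar
    print(nucl_ids(sug4=4, ribose=True))    # ribo residue, four-parameter sugar
```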

    Below is an example of a protocol selecting all independent variables of a DNA molecule consisting of two base pairs which does not have any thymines, assuming that the one-parameter sugar model is used:

    
    PAIR
    1, 6, 1,2,3,4,5,6 
    PAIR
    2, 6, 1,2,3,4,5,6
    NUCL
    1, 2, 7,8
    NUCL
    2, 2, 7,8
    NUCL
    3, 2, 7,8
    NUCL
    4, 2, 7,8
    STEP
    1, 6, 1,2,3,4,5,6 
    

    Note that in the case of PAIR and STEP parameters, the base pair number is referenced, but in the case of the NUCL parameters, the nucleotide number is referenced.


    If the molecule had thymines or RNA residues, the torsion angles determining the orientation of the methyl groups in T's and of the hydroxyl groups in riboses would also have to be selected.
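
    For larger molecules the selection block is tedious to type by hand. The sketch below generates it, assuming that each selection line has the layout seen in the example above (number of the pair, step or nucleotide, then the count of selected variables, then their ID numbers). This layout is inferred from the example only; the function selection_block is hypothetical and applies the same NUCL IDs to every residue.

```python
def selection_block(n_pairs, n_nucleotides, residue_ids=(7, 8)):
    """Build PAIR/NUCL/STEP selection lines in the layout of the example above."""
    all_six = "1,2,3,4,5,6"
    nucl = ",".join(str(i) for i in residue_ids)
    lines = []
    for pair in range(1, n_pairs + 1):              # PAIR lines use base pair numbers
        lines += ["PAIR", f"{pair}, 6, {all_six}"]
    for residue in range(1, n_nucleotides + 1):     # NUCL lines use nucleotide numbers
        lines += ["NUCL", f"{residue}, {len(residue_ids)}, {nucl}"]
    for step in range(1, n_pairs):                  # steps lie between consecutive pairs
        lines += ["STEP", f"{step}, 6, {all_six}"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Two base pairs, four nucleotides, one-parameter sugar, no thymines.
    print(selection_block(2, 4))
```

    For a molecule with thymines or riboses, a per-residue list of IDs would be needed instead of the single residue_ids tuple.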


    Sequence of nucleotides, base pairs and steps

    The sequence of nucleotides and their assignments to base pairs must be specified in a "sequence file", which is required for starting miniCarlo.

    The first line of the sequence file is an integer specifying the number of nucleotides in the molecule.

    The second line is a string specifying ribo and deoxy residues. Allowed characters are r (ribo), d (deoxy), and space (deoxy). An empty string is interpreted as all deoxy residues.

    The third line is a string specifying the sequence in one-character format. Allowed characters are A, C, G, T, U; spaces are not allowed.

    The fourth line is a string specifying which residues are 5'- or 3'-ends of a strand. Allowed characters are 5, 3 and space (neither 5'- nor 3'-end). If a strand consists of a single residue, this residue must be specified as a 5'-end rather than a 3'-end. The molecule must have at least one 5'-end and one 3'-end (circular molecules are not currently supported).

    The fifth and subsequent lines specify consecutive base pairs, starting with base pair 1. Each line has two integers (two residue numbers) separated by a comma. The bases in a pair need not be complementary, and a pair may consist of one base only. If a base pair consists of one base only, the other must still be listed as zero. For a triple, a third residue number is added to the line, as in Example 3 below.
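
    The layout described above is simple enough to check with a short script before miniCarlo is started. The following is a minimal sketch of such a reader, not the parser used by miniCarlo; the names SequenceFile and parse_sequence_file are hypothetical, and padding a short second or fourth line with spaces is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SequenceFile:
    n_residues: int
    sugars: str      # one character per residue: "r" or "d"
    sequence: str    # one character per residue: A, C, G, T or U
    ends: str        # one character per residue: "5", "3" or " "
    pairs: list      # one tuple of residue numbers per base pair


def parse_sequence_file(text):
    """Parse the text of a sequence file laid out as described above."""
    lines = text.split("\n")
    n = int(lines[0])
    # Short or empty lines are padded with spaces; a space means deoxy.
    sugars = lines[1].rstrip().ljust(n).replace(" ", "d")
    sequence = lines[2].strip()
    ends = lines[3].rstrip().ljust(n)
    if len(sequence) != n or set(sequence) - set("ACGTU"):
        raise ValueError(f"sequence must be {n} characters from A, C, G, T, U")
    if len(sugars) != n or set(sugars) - set("rd"):
        raise ValueError("second line may contain only r, d and spaces")
    if len(ends) != n or set(ends) - set("53 "):
        raise ValueError("fourth line may contain only 5, 3 and spaces")
    # Remaining non-blank lines: comma-separated residue numbers, 0 = no partner.
    pairs = [tuple(int(field) for field in line.split(","))
             for line in lines[4:] if line.strip()]
    return SequenceFile(n, sugars, sequence, ends, pairs)


if __name__ == "__main__":
    # The d(AAA):r(UUU) hybrid of Example 1.
    hybrid = "6\n   rrr\nAAAUUU\n5 35 3\n1,6\n2,5\n3,4\n"
    print(parse_sequence_file(hybrid))
```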

    Example 1. A DNA:RNA hybrid d(AAA):r(UUU).

    Sequence file "aaa.seq"

    6
       rrr
    AAAUUU
    5 35 3
    1,6
    2,5
    3,4
    

    This molecule has six residues that are assigned to three base pairs in the sequence file. Each pair has six degrees of freedom (Propeller, Buckle, Opening, Shear, Stretch, and Stagger); these parameters must be specified for each pair in the input file with helical parameters. Also, each of the two steps in this molecule has six degrees of freedom (Twist, Tilt, Roll, Shift, Slide, and Rise); these too must be specified in the same input file.
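
    The count of helical parameters in this example generalizes directly, on the assumption that a molecule with N base pairs always has N - 1 steps (one between each two consecutive pairs). The function below is a hypothetical helper that does this arithmetic; it does not count the residue (NUCL) parameters.

```python
def helical_parameter_count(n_pairs):
    """Number of PAIR and STEP parameters for a molecule with n_pairs base pairs."""
    if n_pairs < 1:
        raise ValueError("a molecule has at least one base pair")
    n_steps = n_pairs - 1              # one step between consecutive base pairs
    return {
        "pair": 6 * n_pairs,           # Propeller ... Stagger for every pair
        "step": 6 * n_steps,           # Twist ... Rise for every step
        "total": 6 * (n_pairs + n_steps),
    }


if __name__ == "__main__":
    # Example 1: three base pairs and two steps.
    print(helical_parameter_count(3))
```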


    Example 2. An RNA hairpin loop r(GGUUUCC).

    Sequence file

    7
    rrrrrrr
    GGUUUCC
    5     3
    1,7
    2,6
    3,0
    4,0
    5,0
    


    Example 3. DNA triplex d(AA):d(TT):d(TT).

    Sequence file

    6
    
    AATTTT
    535353
    1,4, 5
    2,3, 6
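
    The fourth line of the three sequence files above determines how the residues are divided into strands. The sketch below recovers the strands from that line. It is an interpretation of the rules stated in this section and not miniCarlo code: the function split_strands is hypothetical, and it assumes that a "5" which is immediately followed by another "5", or which is the last character, marks a single-residue strand.

```python
def split_strands(ends):
    """Return (first, last) residue numbers of each strand, counting from 1."""
    strands = []
    start = None                                   # 5'-end of the strand being read
    for number, mark in enumerate(ends, start=1):
        if mark == "5":
            if start is not None:
                if start != number - 1:
                    raise ValueError(f"strand starting at {start} has no 3'-end")
                strands.append((start, start))     # single-residue strand
            start = number
        elif mark == "3":
            if start is None:
                raise ValueError(f"3'-end at {number} without a 5'-end")
            strands.append((start, number))
            start = None
        elif mark != " ":
            raise ValueError(f"unexpected character {mark!r}")
    if start is not None:
        if start != len(ends):
            raise ValueError(f"strand starting at {start} has no 3'-end")
        strands.append((start, start))             # trailing single-residue strand
    return strands


if __name__ == "__main__":
    for line in ("5 35 3", "5     3", "535353"):   # Examples 1, 2 and 3
        print(repr(line), split_strands(line))
```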
    


    Starting miniCarlo

    To run miniCarlo, you need to place the executable miniCarlo in a directory included in the path. Also, you need to have the file "dupinp.txt" in the working directory. ("dupinp.txt" contains information about standard geometries, force field, partial charges, etc. Do not edit this file!)

    Running miniCarlo requires two parameters; several options may also be specified:

    miniCarlo [-cdfhimoOpv] -s <sequence file> -w <protocol file>

    Parameters in square brackets are optional; the parameters "-s" and "-w" are required. Replace <sequence file> and <protocol file> with the names of the actual sequence file and protocol file.

    Options:

    -c <file>                  compiles all sub-protocols into a single <file>
    -d                         checks the syntax of the protocol file (debug)
    -f <pdb file>              inputs a pdb file and bypasses the calculation of coordinates
    -h                         prints short help
    -i <input file> <record>   reads internal coordinates from <input file> using record # <record>
    -m <number>                sets the number of multiple copies to <number> (default 1)
    -O                         allows overwriting output files with internal coordinates
    -o <output file>           sets the file name for output of internal coordinates
    -p                         skips the protocol file and generates pdb files from input internal coordinates
    -v                         verbose; prints more messages to standard output



    Internal coordinates can also be input from the protocol file, and the output file for internal coordinates can also be set there. (Specifying both input and output files on the command line has the advantage that the same protocol file can be used with different input files.) The remaining options (including the number of multiple copies) can be set only on the command line. miniCarlo also writes a fair amount of output to standard output. Redirect it into a file if you want to run the program in the background.
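
    When the same protocol is run on many input files, the command line can be assembled by a script. The sketch below uses only the options listed above. The function minicarlo_command and the file names refine.way, aaa.inp, aaa.out and aaa.log are hypothetical, and placing the options before "-s" and "-w" simply follows the usage line; whether miniCarlo accepts other orderings is not stated here.

```python
import shlex


def minicarlo_command(sequence_file, protocol_file, input_file=None, record=None,
                      output_file=None, copies=None, overwrite=False):
    """Assemble the argument list for a miniCarlo run from the options above."""
    args = ["miniCarlo"]
    if input_file is not None:
        if record is None:
            raise ValueError("-i needs both an input file and a record number")
        args += ["-i", input_file, str(record)]
    if output_file is not None:
        args += ["-o", output_file]
    if overwrite:
        args.append("-O")
    if copies is not None:
        args += ["-m", str(copies)]
    args += ["-s", sequence_file, "-w", protocol_file]
    return args


if __name__ == "__main__":
    command = minicarlo_command("aaa.seq", "refine.way", input_file="aaa.inp",
                                record=1, output_file="aaa.out")
    # Shell line that runs in the background with standard output kept in a log file.
    print(shlex.join(command) + " > aaa.log &")
```

    The function only builds the argument list; it does not start miniCarlo.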