Software:Protein-DNA Modeling Interface Tutorial
From Honiglab_public
To run the program, you need a topology file (*.top) and a command file (COMFILE); both are described below.
Topology File: The program requires a topology file that describes the topology and force field of the biomolecules (e.g. protein and DNA) being modeled. These files are analogous to the *.top and *.crg files used by CHARMM. Two topology files are provided here for the AMBER98 force field: AMBER98.top is the standard AMBER98 force field (with the exception of improper dihedral terms), and AMBER98_0.5Phosph has the charge of the DNA phosphate groups scaled as described in Siggers & Honig (200X). The topology file is located through an environment variable (TROLLTOP). It is easiest to set this in your .tcshrc (or equivalent) file:
setenv TROLLTOP /foo/AMBER98.top
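Because the topology file is found only through TROLLTOP, it can be useful to check the variable before starting a run. The Python sketch below is not part of the distribution. The helper name `topology_path` is hypothetical, and the only assumption is the one stated above: TROLLTOP holds the path to a topology file.

```python
import os

def topology_path(env=os.environ):
    """Return the topology file path stored in TROLLTOP, or raise if it is unset."""
    path = env.get("TROLLTOP", "")
    if not path:
        raise RuntimeError(
            "TROLLTOP is not set; point it at a topology file such as AMBER98.top"
        )
    return path

# Example using an explicit mapping instead of the real environment.
example = topology_path({"TROLLTOP": "/foo/AMBER98.top"})
print(example)
```

Calling `topology_path()` with no argument reads the real environment of the current shell.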
Command File (COMFILE): The program is run using a command file (COMFILE) that specifies all input parameters and input files. The COMFILE is a plain text file listing the arguments, one per line. The required arguments are listed below with short explanations; running the program with the -help option (>intf_model.exe -help) prints the same list of options with a short description of each. Comment lines are marked with a leading ‘//’. Extra details on file formats are given further below for several of the arguments.
COMFILE arguments/syntax:
// Comment Line
-i PDB.file              - Input template protein-DNA complex (see longer explanation below)
-o OUTPUT.pdb            - Output filename for modeled structure
-res RESFILE             - File describing which residues to model and their identities
-lib SC_LIB              - Protein sidechain rotamer library
-prot_lib_type TOR       - Type of rotamer library being used for protein sidechains.
                           Options: ‘TOR’ or ‘XYZ’ for torsional or Cartesian.
-pol POL_HYD             - Description of polar hydrogens
-rohs_eps 2.0            - Near-field dielectric permittivity value
                           (sigmoidal function, see description in paper)
-hbond -2.0              - Maximum value of an optimal hydrogen bond
-scp 0.9                 - VDW softening parameter (see description in paper)
-cons                    - Take sidechain bond lengths and bond angles from the input
                           PDB file when residue identity is not changed. When this is
                           not used, standard values from CHARMM22 are used.
-init 20                 - Number of initial configurations to try
-cycles 10               - Number of cycles to run per initial configuration
                           (see description in paper)
-DNA_XYZ_Lib_RotNum 50   - Number of nucleotide rotamers to construct for each
                           nucleotide being modeled (see paper for description)
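Since the COMFILE is plain text with one argument per line, it can be generated from a script when many runs are needed. The following Python sketch assembles such a file from the argument names and example values listed above. The function name `build_comfile` is hypothetical, the file names are the placeholders used in the listing, and it is an assumption that the order of arguments in the file does not matter.

```python
def build_comfile(options, flags=(), comment=None):
    """Return COMFILE text with one '-name value' argument per line."""
    lines = []
    if comment:
        lines.append(f"// {comment}")      # comment lines start with '//'
    for name, value in options.items():
        lines.append(f"-{name} {value}")
    for flag in flags:
        lines.append(f"-{flag}")           # flags such as -cons take no value
    return "\n".join(lines) + "\n"

comfile_text = build_comfile(
    {
        "i": "PDB.file", "o": "OUTPUT.pdb", "res": "RESFILE", "lib": "SC_LIB",
        "prot_lib_type": "TOR", "pol": "POL_HYD", "rohs_eps": 2.0,
        "hbond": -2.0, "scp": 0.9, "init": 20, "cycles": 10,
        "DNA_XYZ_Lib_RotNum": 50,
    },
    flags=["cons"],
    comment="example run",
)
print(comfile_text)
```

The returned text can be written to any file name and passed to the program as its COMFILE.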
Additional parameter/file information:
-i PDB.file: This PDB file needs to be formatted to agree with the syntax in the topology
file. A Perl script is included here to format a standard PDB file to agree
with the two AMBER98 topology files provided. The script can be run as shown
below. Parameters for metal ions are currently not included; PDB atom lines
for metal ions, such as the Zn atoms in zinc-finger proteins, therefore need
to be removed manually before running the script.
> perl pdb_to_Amber.pl -i FOO.pdb > FOO_converted.pdb
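The manual removal of metal-ion lines can also be scripted. The Python sketch below drops ATOM and HETATM records whose residue name matches a given list, relying on the standard PDB fixed-column layout (residue name in columns 18-20). The function name `strip_residues` is hypothetical, the two example records use made-up coordinates, and it is an assumption that the metal ions in your file carry a residue name such as ZN.

```python
def strip_residues(pdb_lines, residue_names=("ZN",)):
    """Drop ATOM/HETATM records whose residue name (columns 18-20) is listed."""
    unwanted = {name.upper() for name in residue_names}
    kept = []
    for line in pdb_lines:
        is_atom_record = line.startswith(("ATOM", "HETATM"))
        if is_atom_record and line[17:20].strip().upper() in unwanted:
            continue                      # skip the metal-ion record
        kept.append(line)
    return kept

# Two illustrative records with made-up coordinates.
example_lines = [
    "ATOM      1  N   ASP A  10      11.104   6.134  -6.504  1.00  0.00           N",
    "HETATM  500 ZN    ZN A 101       5.000   5.000   5.000  1.00  0.00          ZN",
]
cleaned = strip_residues(example_lines)
print(len(cleaned))
```

The cleaned lines can then be written to a new file and passed to pdb_to_Amber.pl as shown above.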
-res RESLIST: The residue list (RESLIST) file, shown as RESFILE in the argument list above,
describes which sidechains and nucleotides will be re-modeled. The syntax of
the file is as follows:
LINE 1: subset description of residues to model
LINES 2-N: identity of the residues indicated in LINE 1
LINES N+1-M: constraint lines that restrict rotamer sampling
Example RESLIST file:
(chain A and range 10-12) or (chain B and range 1-3) or (chain C and range 5-7)
ASP A 10
LEU A 11
TRP A 12
CYS A 13
GUA B 1 CYT A 7
THY B 2 ADE A 6
THY B 3 ADE A 5
CON 1.0 :chain B or chain C
Line 1 indicates that residues 10-12 from chain A, residues 1-3 from chain B and residues 5-7 from chain C should all be modeled. Residues do not need to be contiguous; for example, to select residues 1 and 3 from chain A one would write: chain A and (range 1 or range 3). The following lines (2-8) give the residue identities: protein sidechains are written one per line, while nucleotides are paired up with their base-pairing partner nucleotides as shown. Only residues selected in line 1 will be modeled; therefore, CYS A 13 (line 5) will not be modeled. The constraint line (line 9) indicates that for chain B and chain C only rotamers with an RMSD <= 1.0 angstroms from the crystal structure PDB will be allowed. This line only makes sense for sidechains whose identity does not change and for nucleotide rotamers, where the RMSD is calculated using the sugar heavy atoms and the N1 (pyrimidine bases) or N9 (purine bases) atom.
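The rule that only residues selected in line 1 are modeled can be checked with a short script before a run. The Python sketch below is an illustration, not the program's own parser. The function names are hypothetical, and it handles only the simple clause form "chain X and range a-b" used in the example, not nested forms such as "chain A and (range 1 or range 3)".

```python
import re

def parse_selection(line):
    """Expand clauses like 'chain A and range 10-12' into a set of (chain, resnum)."""
    selected = set()
    pattern = r"chain\s+(\w+)\s+and\s+range\s+(\d+)-(\d+)"
    for chain, start, end in re.findall(pattern, line):
        for resnum in range(int(start), int(end) + 1):
            selected.add((chain, resnum))
    return selected

def parse_identities(lines):
    """Read identity lines; each holds one residue or a base pair (two residues)."""
    residues = []
    for line in lines:
        fields = line.split()
        # Fields come in triples: residue name, chain, residue number.
        for i in range(0, len(fields) - 2, 3):
            residues.append((fields[i], fields[i + 1], int(fields[i + 2])))
    return residues

selection = parse_selection(
    "(chain A and range 10-12) or (chain B and range 1-3) or (chain C and range 5-7)"
)
identities = parse_identities(["ASP A 10", "LEU A 11", "TRP A 12", "CYS A 13"])
# Identity lines that fall outside the line 1 selection are not modeled.
skipped = [res for res in identities if (res[1], res[2]) not in selection]
print(skipped)
```

For the protein lines of the example file, the only entry reported as skipped is CYS A 13, in agreement with the explanation above.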
-lib SC_LIB: A copy of the large torsional rotamer library derived from the Cartesian library of
Xiang & Honig (2001) JMB 311:421 is provided. A smaller Cartesian (XYZ)
version of the Xiang & Honig library is also provided. The syntax of any rotamer
library must follow that of these files, and the rotamer type (TOR or XYZ) must be
indicated with the -prot_lib_type argument.
-pol pol: This file contains one line that indicates which hydrogen atoms should be
treated as rotatable. Rotatable hydrogens (e.g. CYS HG1) will be rotationally
sampled during the modeling (i.e. when selecting the lowest energy rotamer, for
each CYS rotamer, the CA-CB-SG-HG1 dihedral angle will be sampled at 15 degree
increments). The residues with rotatable hydrogens are normally CYS, THR, SER
and TYR. Two files are included here: the file pol indicates that the CYS, THR,
SER and TYR hydrogen atoms (with the correct atom names) should be treated as
rotatable, and nopol is a dummy file indicating that no hydrogens should be
treated as rotatable.
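The rotational sampling described above amounts to a grid search over one dihedral angle. The Python sketch below shows the idea with a toy energy function. The function names are hypothetical, the energy function is a stand-in and not the program's force field, and the angular range of 0 to 345 degrees is an assumption (the text specifies only the 15 degree increment).

```python
import math

def sample_dihedral(energy_fn, step_deg=15):
    """Return the grid angle (degrees) with the lowest energy."""
    angles = range(0, 360, step_deg)       # 24 angles at 15 degree spacing
    return min(angles, key=energy_fn)

def toy_energy(angle_deg):
    """Toy one-well energy whose minimum lies at 240 degrees (illustration only)."""
    return math.cos(math.radians(angle_deg - 60))

best_angle = sample_dihedral(toy_energy)
print(best_angle)
```

In the program this search would be repeated for every rotamer of a residue with a rotatable hydrogen, which is why using nopol reduces the amount of sampling.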
Running the program:
>intf_model.exe -i comfile
