Course in Applied Structural Bioinformatics: Amber Tutorial [PDF]

Sep 8, 2010 - In this tutorial we run a series of simulations of HNE. We will first figure out how to ...... script belo

0 downloads 13 Views 4MB Size

Report

Download PDF

PNG Network

Recommend Stories

Course Applied

If you are irritated by every rub, how will your mirror be polished? Rumi

Certificate Course in Applied Hypnotism

This being human is a guest house. Every morning is a new arrival. A joy, a depression, a meanness,

PDF Dragonfly in Amber (Outlander)

Where there is ruin, there is hope for a treasure. Rumi

Review PDF Machine Learning in Bioinformatics (Wiley Series in Bioinformatics)

We must be willing to let go of the life we have planned, so as to have the life that is waiting for

APPLIED STATISTICS ON BIOINFORMATICS AND R TOOL Resource Persons APPLIED

When you talk, you are only repeating what you already know. But if you listen, you may learn something

[PDF] Download Dragonfly in Amber (Outlander)

At the end of your life, you will never regret not having passed one more test, not winning one more

Implementation Of Design In Applied Thermodynamics Course

Seek knowledge from cradle to the grave. Prophet Muhammad (Peace be upon him)

Structural Bioinformatics and Protein Docking Analysis

You can never cross the ocean unless you have the courage to lose sight of the shore. Andrè Gide

Applied Structural Geology ACADEMIC YEAR

Life is not meant to be easy, my child; but take courage: it can be delightful. George Bernard Shaw

PDF Dragonfly in Amber: A Novel

When you talk, you are only repeating what you already know. But if you listen, you may learn something

Idea Transcript

Course in Applied Structural Bioinformatics: Amber Tutorial Lars Skjaerven, Yvan Strahm, Kjell Petersen and Ross Walker September 8, 2010

http://www.bccs.uni.no/ http://www.bioinfo.no/

Contents 1 Introduction 1.1 General preparation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

1 1

2 Building our HNE system 2.1 Obtain and parse the PDB file . . . . . . . . . 2.2 Protonate the protein . . . . . . . . . . . . . . 2.3 Add crystal waters, disulfide bonds, and solvate 2.4 Visualize the solvated system . . . . . . . . . .

. . . .

2 3 3 3 5

. . . . . . . . . . . . . . . .

6 7 9 10 11 14 14 17 18 19 20 21 21 22 22 23 24

. . . . . . . . . . . . . . the system . . . . . . .

. . . .

3 MD Simulation 3.1 Relaxing the system prior to MD . . . . . . . . . . . . . . . . 3.1.1 Running sander for the first time . . . . . . . . . . . . 3.1.2 Creating PDB files from the AMBER coordinate files 3.2 Heating and equilibration . . . . . . . . . . . . . . . . . . . . 3.3 Analyzing the results to test the equilibration . . . . . . . . . 3.3.1 Analyzing the output files . . . . . . . . . . . . . . . . 3.3.2 Analyzing the trajectory . . . . . . . . . . . . . . . . . 3.3.3 Viewing the trajectory . . . . . . . . . . . . . . . . . . 3.3.4 Re-imaging the trajectory back into the primary box . 3.4 Production Phase . . . . . . . . . . . . . . . . . . . . . . . . . 3.5 Analyzing the trajectory . . . . . . . . . . . . . . . . . . . . . 3.5.1 Re-image the trajectory back into the primary box . . 3.5.2 RMSd . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.6 Fluctuations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.7 Distances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.8 Structural waters . . . . . . . . . . . . . . . . . . . . . . . . .

. . . .

. . . . . . . . . . . . . . . .

. . . .

. . . . . . . . . . . . . . . .

. . . .

. . . . . . . . . . . . . . . .

. . . .

. . . . . . . . . . . . . . . .

. . . .

. . . . . . . . . . . . . . . .

. . . .

. . . . . . . . . . . . . . . .

. . . .

. . . . . . . . . . . . . . . .

1

Introduction

The purpose of this tutorial is to provide an initial introduction to setting up and running simulations using the AMBER software (It is based on AMBER 10 and AmberTools 1.2 but should work reasonably well with AMBER 9 and with newer versions of the AMBER software as they are released). In this tutorial we run a series of simulations of HNE. We will first figure out how to generate a starting structure and then use this structure to construct the necessary input files for running sander, the main molecular dynamics engine supplied with AMBER 10. In order to run a classical molecular dynamics simulation with Sander a number of files are required. These are (using their default filenames): prmtop a file containing a description of the molecular topology and the necessary force field parameters. inpcrd (or a restrt from a previous run) a file containing a description of the atom coordinates and optionally velocities and current periodic box dimensions. mdin the sander input file consisting of a series of namelists and control variables that determine the options and type of simulation to be run. The approximate order of this tutorial will be as follows: • Create the prmtop and inpcrd files: This is a description of how to generate the initial structure and set up the molecular topology/parameter and coordinate files necessary for performing minimisation or dynamics with sander. • Minimisation and molecular dynamics in explicit solvent: Setting up and running equilibration and production simulations for our HNE model using TIP3P explicit water.

1.1

General preparation

Before we run any of the programs provided with AMBER, we need to make sure the Unix shell environment variable that specifies where AMBER is installed is set properly. If you are running bash, csh or tcsh check the AMBERHOME environment variable is set by typing: echo $AMBERHOME If you see: AMBERHOME: Undefined variable.

(csh or tcsh) 1

or simply a blank line (bash) then the AMBERHOME environment variable was not set properly and you need to initialize it to point to the AMBER installation directory. If AMBER is installed in /usr/local/amber10 then you would type: export AMBERHOME=/usr/local/amber10

(bash)

While you are at it, you may want to update your .cshrc (csh), or .bashrc (bash) file to define this variable and add the AMBER binary directory to your path (or list of directories searched for executables) at login. In your .bash login file (or the global /etc/bashrc file) (if using the bash shell), add the following (substituting the correct path to AMBER): export AMBERHOME=/usr/local/amber10 export PATH=\$PATH:\$AMBERHOME/exe

2

Building our HNE system

The first stage of this tutorial is to create the necessary input files required by sander for performing minimization and molecular dynamics. There are a number of methods for creating these input files. In this example we will use a program written specifically for this purpose: LEaP. This is a program that reads in force field, topology and coordinate information and produces the files necessary for production calculations (i.e. minimization, molecular dynamics, analysis, ...). There are two versions of this program provided with AMBER 10 & AmberTools 1.2. A graphical version called xleap and a terminal interface called tleap. (There are also completely new versions being produced called gleap and sleap respectively that will ultimately replace xleap and tleap but discussion of this is beyond the scope of this tutorial.) The first step in any modeling project is developing the initial model structure. Although in principle, one could use xleap to build a model structure by hand this is only practical for the smallest of systems. The difficulty in both manipulating and predicting the structure of large biomolecular systems means that building a structure by hand is not usually a sensible undertaking. Instead, experimentally determined structures are used. These can be found by searching through databases of crystal or NMR structures such as the Protein Data Bank or the Cambridge Structural Database. With nucleic acids, users can also search the Nucleic Acid Database. In order to perform an MD simulation of HNE, we first need to fetch the PDB file, check the contents of it, parse it, and subsequently build the necesarry Amber files. Understanding the contents of the PDB file is crucial in order to perform a sucessful MD simulation.

2

2.1

Obtain and parse the PDB file

The HNE structure we are interested in has PDB ID 2z7f. It is crystalized with an inhibitor protein and two covalently bound ligands (NAG and FUC). Moreover, several residues has alternative configuration, denoted A and B along with the residue number in the PDB file. The PDB file also contains several water molecules which might be of structural importance. Finally, we need to investigate the protonation of the residues and identify disulfide bonds in the structure. We wish to perform the simulation without the inhibitor protein, and without the two ligands. We thus have to extract this from the original PDB file. You can do this with several programs, like PyMOL or VMD. Alternatively, the linux command ”grep” will also do the job: grep "ATOM " 2Z7F.pdb | egrep -v " I " > 2z7f_chainE.pdb grep "HOH " 2Z7F.pdb | egrep -v " I " > 2z7f_chainE_XWAT.pdb

2.2

Protonate the protein

These commands produces two new PDB files called 2z7f chainE.pdb (protein only) and ’2z7f chainE XWAT.pdb (crystal waters). Secondly, we’ll have to add hydrogen atoms to our system. The PDB2PQR webserver can do this for us. Adding hydrogens to amino acids are in most cases trivial for pH 7.0. However, histidines have a pKa right around this range and can have either their delta nitrogen or epsilon nitrogen protonated, or they can both be protonated, yielding an amino acid with a +1 charge. This webserver will optimize the hydrogen bonding network within the protein to determine where the hydrogens should go in the optimal case. Go to the pdb2pqr site, and submit the structure. Use the AMBER naming scheme, and ensure that you have checked on these settings: (1) ”Ensure that new atoms are not rebuilt too close to existing atoms”, and (2) ”Optimize the hydrogen bonding network”. Start the job, and await your results. Retrieve the newly generated PQR file and save it in your ”HNE MD” folder. Open this file in a text editor, or by the command less 2z7f_chainE.pqr The histidines are now called HID, HIE, or HIP, in order specify wether they have a hydrogen at the delta or epsilon nitrogen, or both (charged). Results: 2z7f.pqr

2.3

Add crystal waters, disulfide bonds, and solvate the system

In the next step we will use Amber in order to finalize our system in order to run the MD simulation. The first step is the building of residues. Many proteins contain coenzymes

3

as well as standard amino acids. These coenzymes are not normally pre-defined in the AMBER database and so are considered to be non-standard residues. It is necessary to provide structural information and force field parameters for all of the non-standard residues that will be present in your simulation before you can create the sander input files. Fortunately, if you are using standard nucleic acid or amino acid residues, as we will be in this tutorial, this step is not necessary since all of the residues are pre-built in the AMBER database. Later tutorials will cover what to do if you have non-standard residues. For this, we will use the Amber program called ”Leap” (there exists two versions: tleap and sleap). Leap takes as input a set of commands which dictates how you want to build your system. We can gather these commands in a file which we will call ”build”. We will use sleap, which is newer and create disulfide bridges automatically. Open a text editor and type in the following:

# Load the forcefield > source leaprc.ff99SB # Set settings for this Leap session > set default disulfide auto # load the protein prot = loadpdb 2z7f_E.pqr # load the crystal waters xwat = loadpdb 2z7f_chainE_XWAT.pdb # combine the protein and waters complex = combine {prot xwat} # save amber topology and coordinate files saveamberparm prot HNE_noWAT.prmtop HNE_noWAT.inpcrd # solvate the protein solvateOct complex TIP3PBOX 12.0 0.78 # save solvated complex saveamberparm complex HNE_sol.prmtop HNE_sol.incprd # save a pdb of the built system savepdb complex complex.pdb 4

Download the leap input file here. The first lines loads the forcefield we will use. The next command specifies that we want to build the disulfide bonds automatically. Then we load the protein structure, and the crystal waters into leap, and combine these two. ”saveamberparm” will generate the parameter and topology (.prmtop) and the coordinate file (.inpcrd). These files are not needed to run the MD simulation, but they will come in handy in the subsequent analysis of the MD simulations. ”solvateOct” is the command to solvate our protein in an truncated octahedron. We’ll use the TIP3 water model, and we want at least 12 ˚ A of water between our protein surface and the edge of the water box. The last term specifies that no water molecules should come closer to the solute than 0.78 ˚ A. Finally, we build the parameter, topology, and coordinate files for the solvated system in which we will use for the MD simulation. Run Leap with the command: sleap -f build_HNE then exit This creates the following files: • HNE sol.prmtop • HNE sol.inpcrd To remind you about these files: prmtop The parameter/topology file. This defines the connectivity and parameters for our current model. This information is static, or in other words, it doesn’t change during the simulation. The prmtop we created above is called HNE sol.prmtop inpcrd The coordinates (and optionally box coordinates and velocities). This is data is not static and changes during the simulations (although the file is unaltered). Above we created an initial set of coordinates called HNE sol.inpcrd.

2.4

Visualize the solvated system

To open the structure in VMD type the following command: vmd HNE_sol.prmtop When the three windows of VMD opens, left click on the HNE sol.prmtop name in the VMD Main window. When the name turns yellow, right click it, → ’Load data into molecule’. Browse to find ’HNE sol.inpcrd’. Select ’Amber7 Restart’ in the ’Determine file type’ box. Investigate the structure and observe e.g. the disulfide bonds in the residues named “CYX”. 5

3

MD Simulation

This section will introduce sander and show how it can be used for minimization and molecular dynamics of our previously created HNE model. This section of the tutorial will consist of 3 stages: • Relaxing the system prior to MD • Heating and equilibration • MD run for 5 ns

6

• Analyzing the results

3.1

Relaxing the system prior to MD

Prior to the MD simulation we need to relax the system by performing a energy minimization as our current geometry may not correspond to the actual minima in the force field we are using and may also result in conflicts and overlaps with atoms in other residues. It is always a good idea to minimize the structure before commencing molecular dynamics. Failure to successfully minimize the structure may lead to instabilities when we run MD. So, given the in vacuo prmtop (HNE sol.prmtop) and inpcrd (HNE sol.inpcrd) files we created, we will now use sander to conduct a short minimization run. This will take us towards the closest local minima. Minimization with sander will only ever take you to the nearest minima, it cannot cross transition states to reach lower minima. This is fine for our purposes, however, since all we want to do is remove the largest strains in the system. The basic usage for sander is as follows:

7

sander [-O] -i mdin -o mdout -p prmtop -c inpcrd -r restrt [-ref refc] [-x mdcrd] [-v mdvel] [-e mden] [-inf mdinfo] Arguments in []’s are optional. If an argument is not specified, the default name will be used. -O overwrite all output files (the default behavior is to quit if any output files already exist) -i the name of the input file (which describes the simulation options), mdin by default. -o the name of the output file, mdout by default. -p the parameter/topology file, prmtop by default. -c the set of initial coordinates for this run, inpcrd by default. -r the final set of coordinates from this MD or minimization run, restrt by default. -ref reference coordinates for positional restraints, if this option is specified in the input file, refc by default. -x the molecular dynamics trajectory file (if running MD), mdcrd by default. -v the molecular dynamics velocities file (if running MD), mdvel by default. -e a summary file of the energies (if running MD), mden by default. -inf a summary file written every time energy information is printed in the output file for the current step of the minimization of MD, useful for checking on the progress of a simulation, mdinfo by default. We will do our minimization in two steps. The first step involves the relaxation of the water molecules only. In this part, we will hold the protein fixed by adding a restraint to the protein residues (1-218). Such restraints work by specifying a reference structure, in this case our starting structure, and then restraining the selected atoms to conform to this structure via the use of a harmonic potential. This can be visualized as attaching a spring to each of the solute atoms that connects it to its initial position. Moving the atom from this position results in a force which acts to restore it to the initial position. By varying the magnitude of the force constant so the effect can be increased or decreased. This can be especially useful with structures based on homology models which may be a long way from equilibrium and so require a number of stages of minimization / MD with the force constant being reduced each time. The second step involves the minimization of the whole system. In order to perform the energy minimization we need to build a so-called MD input file (mdin). This file specifies the myriad of possible options for this run. An example is given below, which is the one we will use for our system. 8

minimization of waters &cntrl imin = 1, # perform energy minimization maxcyc=5000, # 5000 steps in total ncyc=3000,# 3000 of these are steepest descent ntb=1, # Constant volume cut=10, # Non-bonded cut-off ntpr=5,# Write energy information every 5th step ntr=1, restraintmask=’:1-218’, restraint_wt=2.0, / minimization of the whole system &cntrl imin = 1, maxcyc=10000, ncyc=7000, ntb=1, ntr=1, cut=10, ntpr=5, / Download the mdin files: • min sol.in • min all.in 3.1.1

Running sander for the first time

You can run the minimization with the commands below. This will start ”sander” and the minimization jobs specified in the input files (assuming a multi core CPU): mpirun -np 2 sander.MPI -O -i min_sol.in -o min_sol.out -p HNE_sol.prmtop -c HNE_sol.incprd -r min_sol.rst -ref HNE_sol.incprd mpirun -np 2 sander.MPI -O -i min_all.in -o min_all.out -p HNE_sol.prmtop -c min_sol.rst -r min_all.rst Download the results here: • min sol.out • min sol.rst • min all.out • min all.rst 9

Take a look at the output file produced during the minimization (min all.out). You will see that the energy dropped considerably between the first and last steps: NSTEP 1 NSTEP 6000

ENERGY 6.0677E+05 ENERGY -1.1256E+05

RMS 1.0951E+04 RMS 8.9736E-02

GMAX 1.1732E+06 GMAX 6.3972E+00

NAME O NAME OD1

NUMBER 19403 NUMBER 670

In spite of this, however, the structure did not change very much. This is because, as already mentioned, minimization will only find the nearest local minima. If you create PDB files for the start (HNE sol.inpcrd) and final structures (min all.rst) and compare the root-mean-squared deviations (RMSd), you will see that the two structures differ by only about 0.4 ˚ A (all atom).

3.1.2

Creating PDB files from the AMBER coordinate files

You may want to generate a new pdb file so you can look at the structure using the minimized coordinates from min all.rst. A pdb file can be created from the parm topology and coordinates (inpcrd or restrt) using the program ambpdb. ambpdb -p HNE_sol.prmtop < min_all.rst > min_all.pdb This will take the prmtop and inpcrd file specified (in this case the final structure from the minimization, the .rst file) and create (on stdout) a pdb file. In this case we redirected stdout to the file min all.pdb. In general, users should carefully inspect any starting structures and minimized structures; specifically check to make sure the hydrogens were placed where you thought they 10

should be, histidines are in the correct protonation state, the terminal residues are properly terminated, stereochemistry is reasonable, etc. There is nothing worse than finding out after you’ve run the simulation that something was wrong.

3.2

Heating and equilibration

Now that we have successfully minimized our system the next stage in our equilibration protocol is allow Our final aim is to run production dynamics at constant temperature and pressure since this more closely resembles laboratory conditions. However, at low temperatures, as our system will be for the first few ps the calculation of pressure is very inaccurate and so using constant pressure periodic boundaries in this situation can lead to problems. Using constant pressure with restraints can also cause problems so initially we will run 20 ps of MD at constant volume. Once our system has equilibrated over approximately 20 ps we will then switch off the restraints and change to constant pressure before running a further 100 ps of equilibration at 300 K. To this system we then added a truncated octahedral box of pre-equilibrated TIP3P water. This water has not felt the influence of the solute or charges and moreover there may be gaps between the solvent and solute and solvent and box edges. If we are not careful such holes can lead to ”vacuum” bubbles forming and subsequently an instability in our molecular dynamics simulation. Thus we need to run careful minimization before slowly heating our system to 300 K. It is also a good idea to allow the water box to relax during a MD equilibration stage prior to running production MD. In this phase it is a good idea, since we will be using periodic boundaries, to keep the pressure constant and so allow the volume of the box to change. This approach will allow the water to equilibrate around the solute and come to an equilibrium density. It is essential that we monitor this equilibrium phase in order to be certain our solvated system has reached equilibrium before we start obtaining results (production data) from our MD simulation. In this section will be presented one possible path towards equilibrating the solvated HNE structure such that it is ready for use in production MD. For this we will use periodic boundary simulations based on the particle mesh Ewald (PME) method. After equilibration has been demonstrated and deemed to have been successful, we will set up a ”production” MD run. Please note, there are many differing opinions on how one should equilibrate a simulation, herein is presented only one, based on my experience with performing molecular dynamics simulations on biological systems. Note: some of the calculations, particularly the equilibration phase, may take a very long time (several days) to run unless you have access to a computer cluster for parallel simulation. For the purposes of this simulation I will provide copies of all of the input files along with example output files run on four 1.3GHz CPUs of a SGI Altix. I will make it clear where the simulations are likely to take a long time such that you can just use the provided output files. 11

The second step in the MD simulation is the heating of the system from 0K to 300K. During this step we need constant volume since pressure calculation is inaccurate at low temperatures. We also want to keep our protein restrained to avoid wild fluctuations during this process. Heating of protein from 0K to 300K with weak restraints &cntrl imin = 0, # no energy minimization irest = 0, # no restart ntx = 1,# no initial velocities will be read ig = -1, # random initial velocities nstlim = 25000, # run for 0.05 ns dt = 0.002, # step length (fs) ntc = 2,# bonds involving H are constrained ntf = 2, cut = 8.0, ntb = 1, ntpr = 500, ntwx = 500,# write the structure every 500th step ntt = 3, # Langevin temperature control gamma_ln = 2.0, tempi = 0.0, # initial temperature temp0 = 300.0,# reference temperature ntr = 1, # Cartesian restraints on residue 1-218 restraintmask = ’:1-218’, restraint_wt = 2.0, # restraint weight nmropt = 1,# flag for restraints set below ioutfm = 1,# netcdf trajectory / &wt TYPE=’TEMP0’, istep1=0, istep2=25000, value1=0.1, value2=300.0, / &wt TYPE=’END’ / Equlibrate the density &cntrl imin=0, irest=1, # restart the simulation from last run ntx=5, nstlim=25000, dt=0.002, ntc=2,ntf=2, cut=8.0, ntb=2, 12

ntp=1, taup=1.0,# constant pressure ntpr=500, ntwx=500, ntt=3, gamma_ln=2.0, temp0=300.0, ntr=1, restraintmask=’:1-218’, restraint_wt=2.0, ioutfm = 1, / Equilibrate &cntrl imin = 0, irest = 1,ntx = 5, nstlim = 500000, dt = 0.002, ntc = 2, ntf = 2, cut = 10.0, ntb = 2, ntp = 1, taup=2.0, ntpr = 1000, ntwx = 1000, ntt = 3, gamma_ln = 2.0, temp0 = 300.0, ntr = 1, restraintmask = ’:1-218’, restraint_wt = 0.5, ioutfm = 1, / Download the mdin files: • heat.in • density.in • equil.in Start the simulations with: mpirun -np 2 sander.MPI -O -i heat.in -o heat.out -p HNE_sol.prmtop -c min_all.rst -r heat.rst -x heat.nc -ref min_all.rst mpirun -np 2 sander.MPI -O -i density.in -o density.out -p HNE_sol.prmtop -c heat.rst -r density.rst -x density.nc -ref min_all.rst mpirun -np 2 sander.MPI -O -i equil.in -o equil.out -p HNE_sol.prmtop -c density.rst -r equil.rst -x equil.nc -ref min_all.rst Download the results: • equilibration.tar.gz Unpack them with the command tar xzvf equilibration.tar.gz 13

3.3

Analyzing the results to test the equilibration

Now that we have equilibrated our system it is essential that we check that equilibrium has successfully been achieved before moving on to running any production MD simulations in which we would hope to learn something new about our molecule. There are a number of system properties that we should monitor to check the quality of our equilibrium. These include: • Potential, Kinetic and Total energy • Temperature • Pressure • Volume • Density • RMSd The following steps will illustrate how to extract this data and plot it and also discuss what we expect to see for a successful equilibration. 3.3.1

Analyzing the output files

In this section we will look at the system properties that can be extracted from the data written to the mdout files during our 120ps of MD simulation (md2.out). This includes the potential, kinetic and total energies, the temperature, pressure, volume and density. Lets start by extracting the various properties from our two output files. For this we will use the process mdout Perl script that was introduced earlier. This script can be given any number of files to process and it will append the results to single summary files: We’ll use a perl script for this analysis. Download it here. mkdir analysis cd analysis perl ../process_mdout.perl ../heat.out ../density.out ../equil.out This should give us a series of summary files in our analysis directory (summary output files here). Lets plot some of these to see if we have successfully reached an equilibrium. First of all the energies, potential, kinetic and total (= potential + kinetic): xmgrace summary.EPTOT summary.EKTOT summary.ETOT

14

Here the red line, which is positive, is our kinetic energy, the black line is the potential energy (negative) and the green line is our total energy. The important points to note with this plot is that all of the energies increased during the first few ps, corresponding to our heating from 0 K to 300 K. The kinetic energy then remained constant for the remainder of the simulation implying that our temperature thermostat, which acts on the kinetic energy, was working correctly. The potential energy, and consequently the total energy since total energy = potential energy + kinetic energy, initially increased and then plateaued during the constant volume stage (0 to 20 ps) before decreasing as our system relaxed when we switched off the DNA restraints and moved to constant pressure (20 to 40 ps). The potential energy then leveled off and remained constant for the remainder of our simulation (40 to 120 ps) indicating that the relaxation was completed and that we reached an equilibrium. Next we plot the temperature: xmgrace summary.TEMP Here we see the expected behavior, our system temperature started at 0 K and then increased to 300 K over a period of about 5 ps. The temperature then remained more or less constant for the remainder of the simulation indicating the use of Langevin dynamics for temperature regulation was successful. Next the pressure: xmgrace summary.PRES The pressure plot is slightly different than the previous plots. For the first 20 ps the pressure is zero. This is to be expected since we were running a constant volume simulation 15

in which the pressure wasn’t evaluated. At 20 ps we switched to constant pressure, allowing the volume of our box to change, at which point the pressure drops sharply becoming negative. The negative pressures correspond to a ”force” acting to decrease the box size and the positive pressures to a ”force” acting to increase the box size. The important point here is that while the pressure graph seems to show that the pressure fluctuated wildly during the simulation the mean pressure stabilized around 1 atm after about 50 ps of simulation. This is sufficient to indicate successful equilibration. Next we will consider the volume. Unfortunately we cannot simply plot the summary.VOLUME file since the first 20 ps do not contain any volume information since

16

during constant volume simulations the volume of the box is not written to the output file. So, we need to remove the first 101 lines of the file. You can use your favorite text editor to do this - I like vi: xmgrace summary.VOLUME

Notice how the volume of our system (in angstroms3) initially decreases as our water box relaxes and reaches and equilibrated density, and thus volume. The smooth transitions in this plot followed by the oscillations about a mean value suggest our equilibration has been successful. Finally, lets take a look at the density, we expect this to mirror the volume. We have the same problem with this summary file as we did with the volume since the density is not written to the output during constant volume simulations. xmgrace summary.VOLUME This is as expected. Our system has equilibrated at a density of approximately 1.04 g cm-3. This seems reasonable. The density of pure liquid water at 300 K is approximately 1.00 g cm-3 so adding a 10-mer of DNA and the associated charges has increased the density by around 4 %. 3.3.2

Analyzing the trajectory

Now we have analyzed our output files and seen that the main system properties suggest that our equilibration has been successful the next step is to consider the structures. Have 17

they remained reasonable? One useful measure to consider is the root mean square deviation (RMSd) from the starting structure. We can use ptraj to calculate this for us as a function of time. In this situation we are interested in the backbone of our HNE protein so we shall consider just the main backbone atoms, N, CA, C. We will conduct a standard (not mass weighted) RMS fit to the first structure of our MD (the final structure from the minimization). Note, the time is set to 0.2 this time around since although we wrote to the trajectory file every 100 steps the time step was 2 fs - hence 100 steps = 0.2 ps. Here is the input file for ptraj: trajin heat.mdcrd trajin density.mdcrd trajin equil.nc rms first out HNE_equilibration_rms.dat @C,CA,N time 0.2 Download this ptraj input file: initial rmsd.ptraj Run ptraj to calculate the RMSd: ptraj ../HNE_sol.prmtop initial_rmsd.ptraj xmgrace HNE_equilibration_rms.dat

3.3.3

Viewing the trajectory

Next we can take a look at the trajectory using VMD. First of all we will do this without re-imaging back to the primary box. We will then use ptraj to re-image our trajectory and then look at it again. So, first of all lets load both trajectory files into vmd: 18

It appears like the water box is breaking down and the waters are boiling off into space. This would indeed be true if we were not using periodic boundaries. However, as mentioned above the atom coordinates written to the trajectory files are those of the original atom, even if it migrates outside of the original box and is replaced by one of its images. We can ”fix” this by re-imaging our trajectory as we will now do.

3.3.4

Re-imaging the trajectory back into the primary box

Lets use ptraj to re-image our trajectory files so that only the atoms in the primary box are shown. Here is the ptraj input file: trajin trajin trajin center

heat.mdcrd density.mdcrd equil.nc :1-218 19

image familiar rms first @C,CA,N trajout equilibration_reimaged.nc netcdf Download this ptraj input file: reimage.ptraj Run ptraj to perform the re-imaging: ptraj ../HNE_sol.prmtop reimage.ptraj ... and visulaize the trajectory again.

3.4

Production Phase

After equilibrating the system we are ready to remove the artificial constraints, and let the system move freely in the water box. This is commonly the phase in which you perform the analysis of the MD simulation. Besides from removing the restraints we will also remove the temperature control (ntt=0). In order to do so we should lower the SHAKE tolerance (0.000001) and increase the accuracy of the PME by reducing DSUM TOL by an order of magnitude (0.000001). A time step of 1 fs is also needed in order to avoid a large drift in temperature. Moreover, we will run 5 ns at a time (5.000.000 steps), and save coordinates to the trajectory file every 1000 step. Our mdin file thus looks like this: MD simulation NVE &cntrl imin= 0, irest= 1, ntx = 5, ntb = 1, cut = 10, ntc = 2, ntf = 2, tol = 0.000001, ntt = 0, nstlim = 5000000, dt = 0.001, ntpr = 1000, ntwx = 1000, ntwr = 10000, ioutfm = 1, &ewald dsum_tol = 0.000001, / Download the mdin file: • prod nve.in In order to obtain 10 ns of simulation we need to do this twice. The first time we continue our simulation of the equil.rst file, and the second time we continue from the prod1.rst file. Start the simulations with the following commands: 20

mpirun -c mpirun -c

-np 2 pmemd.MPI -O -i prod_nve.in -o prod1.out -p HNE_sol.prmtop equil.rst -r prod1.rst -x prod1.nc -np 2 pmemd.MPI -O -i prod_nve.in -o prod1.out -p HNE_sol.prmtop prod1.rst -r prod2.rst -x prod2.nc

This takes about 50 hrs on 32 CPUs, so it’s gonna take ”some” more time on your local desktop computer. Thus, feel free to download the results, and unpack it: • production.tar.gz tar xzvf production.tar.gz As you may notice the files called prod1.nc and prod2.nc are quite big. They contain 10.000 frames, or coordinates, of the 27.800 atoms in our system. These frames constitute the trajectory of the MD simulation in which we will base our subsequent analysis on.

3.5

Analyzing the trajectory

Similar to the equilibration phase we should now plot the energies and particular temperature as a function of time. This will involve the same process as described above, and thus we will not repeat it here. We will rather spend some time on performing simple analysis on the MD trajectory. Calculating the RMSD and RMSF is perhaps the two most common analysis applied to MD simulations. We’ll demonstrate the use of these in the next sections. 3.5.1

Re-image the trajectory back into the primary box

However, we first have to perform a simple re-imaging in order to put all our atoms into the primary box (recall from the equilibration phase that the waters, and the solute, will diffuse from the primary box). By doing so, we will be able to study the waters interacting with the solute. Moreover, we told sander (or pmemd.MPI) to save 10.000 frames. To speed up our analysis, we will only process 1/10th of these. Let’s use ptraj to read in the trajectory files (prod1.nc and prod2.nc), re-image, superpose all frames onto the first frame, and save a new trajectory. By performing the RMS fit we will avoid seeing rotational and translational motion of the protein. Our ptraj input file will look like this: trajin ../prod1.nc 1 5000 10 trajin ../prod2.nc 1 5000 10 center :1-218 image familiar rms first @N,CA,C trajout prod_10ns.nc netcdf 21

You can download it from here: • short prod.ptraj And start it with the following command: ptraj HNE_sol.prmtop short_prod.ptraj This will create our new trajectory file which is shorter (1000 frames) and where the the atoms are placed in the primary box. Open the trajectory in VMD and take a look a the results: vmd ../HNE_sol.prmtop prod_10ns.nc 3.5.2

RMSd

Calculating the RMSd for each frame gives us means to interpret the stability of our simulation. Use ptraj to calculate it: trajin prod_10ns.nc rms first mass out prod_rmsd.out @N,CA,C Run it: ptraj HNE_sol.prmtop prod_rmsd.ptraj And plot the resulting values in xmgrace: xmgrace prod_rmsd.out As we can see by the plot the RMSd values seems stable around a little less than 1 ˚ A.

3.6

Fluctuations

In the previous section we calculated the RMSd values along the simulation time and thus got an impression of the stability of our 10 ns simulation. By calculating the RMSf (or beta factors) will give us information about individual residue fluctuations. For kicks we’ll compare our computed beta factors with the experimental beta factors. Ptraj will compute the bfactors for us: trajin prod_10ns.nc rms first mass @N,CA,C atomicfluct out fluctuations.out :1-218@N,CA,C bfactor byres Run it: 22

ptraj HNE_sol.prmtop prod_fluct.ptraj This will calculate the beta factors from our MD simulation. In order to compare this to the experimental values, we have to extract the values from the PDB file. You can download this from here: • betafact.dat Now we are ready to plot the values with xmgrace: xmgrace fluctuations.out betafact.dat

3.7

Distances

In MD simulations you can follow individual particle motions and measure the interaction with other particles. The following script provides an example on how to measure distances between atoms during the simulation: trajin prod_10ns.nc distance d1 :41@HD1 :88@OD2 out dist1.out distance d2 :41@H :88@OD1 out dist2.out distance d3 :41@HE2 :173@OG out dist3.out Run it: ptraj HNE_sol.prmtop dists.ptraj 23

xmgrace dist1.out xmgrace dist2.out xmgrace dist3.out

3.8

Structural waters

Finally we will investigate the interaction between individual water molecules and the solute. More specifically, we will search for structural water molecules which potentially provides and h-bond bridge between two residues. What will be shown below is that HNE has a few of these structural water molecules. We’ll start by calculating the bfactors of all the waters in our system. Run the ptraj script below, and plot the resulting bfactors in xmgrace. trajin prod_10ns.nc rms first mass @N,CA,C atomicfluct out wat_flucts.out ’* & !:1-218’ bfactor byres xmgrace wat_flucts.out As expected most water molecules have a high bfactor (little lass than 30.000), but not all. The plot reveals a few water molecules with low bfactor. As expected most of these are water molecules from the crystal structure (they have low residue numbers). However, we also see a few peaks with high residue numbers. The following table are the 20 water molecules with lowest bfactor:

24

Res. index

Bfactor

219 5060 7911 1408 234 4530 1842 261 239 231 228 226 280 222 225 224 229 220 235 221

10388.1 10105.3 9623.3 7761.2 7409.4 4683.9 2585.7 214.7 131.1 16.5 16.1 12.7 10.9 10.8 10.7 10.4 9.8 6.9 6.8 6.6

25

We will now focus on a few of these water molecules. Open VMD and load in the trajectory. Choose the “new cartoon” representation, and color it “Structure”. Create another representation which is called “residue 220” and represent it with “CPK”. Play the trajectory and zoom into the water molecule. If you want to take a look on the residues surrounding this trapped water molecule create another representation and write “all within 5 of residue 220”. Delete or hide the two latter representations. Create a new representation of “residue 1841” and represent it with CPK. Play the trajectory from the beginning and follow this water molecule.

26

Course in Applied Structural Bioinformatics: Amber Tutorial [PDF]

Recommend Stories

Idea Transcript

Helpful Links

Smile Life

Get in touch