How can I check the accuracy of a simulation?
There are a number of checks that can be performed to help determine whether there are any problems with a simulation:
  • The SDA output contains messages for 'Warning' or 'Error', you should check that there are no errors, and that the warnings can be safely ignored.
  • When SDA is unable to make a Browian dynamics move that does not violate an exclusion grid, 'boosts' are performed to move the solutes away from each other. Ideally, this is not necessary and the input files for simulations in which many 'boosts' are made should be checked carefully. The number of 'boosts' performed is listed in the sda output under 'total boosts these runs', which should be 0 (N.B. you will need to check this value for each thread).
  • If using electrostatic potential grids, the energy of solute 1 in grid 2 (column Elec-1), should usually be almost identical to the energy of solute 2 in grid 1 (column Elec-2). The values are reported in e.g. the complexes output file if complexe_sum_energy is false. Large differences in these values suggest problems with the ECM model, and the grid preparation and ECM calculations should be checked. These errors can also suggest a problem with the exclusion grid generation. As the ECM model may not be valid if charged sites on one solute are able to penetrate the low dielectric interior of another.
Visual checks can be performed:
  • if save_exclusion is set in the SDA input file, the exclusion grid (*.excl.ascii.grd) that is created can be visualized, together the corresponding solute pdb file using a visualization tool (eg vmd, pymol etc). The grid should match the surface of the protein. If any atoms penetrate the surface this indicates an error. See also Q:Why does my protein manage to penetrate the surface of the fixed protein?
  • visualization of hydrophobic desolvation and electrostatic desolvation grids can be performed in a similar manner (like exclusion grids).
  • the file reaction_criteria.log can be also be viewed as a pdb file. The file contains the positions on each solute that define the reaction criteria. It can be viewed together with the solute pdb files used for in the simulation to compare whether the reaction criteria are placed correctly. Each frame in reaction_criteria.log corresponds to a solute (Frame 1 = Solute 1, ...)
NB: in VMD you may have to explicitly specify that files should be read in pdb format, if the suffix is not *.pdb.
Why does my protein manage to penetrate the surface of the fixed protein?
SDA can save the exclusion volume grid that is calculated during one simulation, to save time in later simulations. If SDA reads in a precalculated exclusion volume grid, but you have edited your solute pdb file following the generation of this grid, SDA will not regenerate a new grid, leading to inconsistencies. To resolve this issue you will need to delete the exclusion grid from earlier simulations and re-run SDA.
Penetration of the surface of a solute may also occur if the soft-core repulsion term is used (without an exclusion volume grid) and there is very strong electrostatic attraction between charges.
The energy in column Elec-1 is not comparable to the energy in column Elec-2
This is often caused by a clash between the proteins, normally accompanied by a number of boosts being observed (see 'How can I check the accuracy of a simulation?'). One common cause of this problem is the use of pdb2pqr to generate pqr files for the Poisson-Boltzmann electrostatic potential calculation, then to use the initial pdb file in the final SDA calculation. Often, pdb2pqr will attempt to 'debump' the structure in the initial pdb file to remove clashes between added hydrogens. This means that the coordinates in the initial pdb may vary slightly from those in the pqr file. You can use a molecular modelling package to remove hydrogens from the generated pqr file, and save the structure as a pdb, or use a text editor or script to remove hydrogens from the pqr file for use as the input pdb file in SDA. SDA does not use the occupancy or b-factor data in pdb files, so the latter option will not cause problems.
Where do I find a description of the energy terms implemented in SDA?
A description of the different terms can be found in the Appendix in Martinez et al. Different types of calculation require the choice of different terms.
How can I track the progress of a simulation?
  • In bimolecular (sda_2_proteins) simulations, the file 'steps.log' is created that contains the number of trajectories completed so far (first line), and the length of each trajectory (remaining lines). It is updated at a certain fixed frequency.
  • In SDAMM simulations, the file steps.log shows the progress of the simulation in ps. Also, if writing trajectory files in ascii format, the simulation time at the last set of coordinates was recorded (in ps) is shown in the Time column of the last trajectory line.
How can I visualize the simulation results?
  • You can use trajectory2dcd to generate pdb and dcd files for visualization. read_record may first be used to select which parts of a complexes or trajectory file you wish to visualize. For example, only those frames of a trajectory where two solutes are within a certain distance.
  • If a complexes file has been generated, its structures can be clustered and pdb files of the cluster representatives created for visualization.
  • If the option resid_3d was set in the ResidenceTime group of the SDA input file, you can visualise the distribution of positions of a mobile solute that have been sampled in a simulation, relative to a fixed solute, by loading the resulting grid along with the pdb file of the fixed solute in a visualization program, such as VMD. This grid contains the residence time of the center-of-geometry of protein 2. Adjust the isosurface value in the visualization program. High values give an idea of the likely interaction sites with the fixed solute.
I do not have NACCESS, can I still run SDA ?
SDA 7's internal subroutine for solvent accessibility is much more computationally efficient than in previous versions of SDA and consequently, the option to use NACCESS (a Fortran77 program) has been removed from the standard compilation (but SDA can be compiled for use with NACCESS as described here).
Also, a standalone tool calc_solvaccess is present in the SDA 7 tools directory, which can be used to calculate the solvent accessibility of each atom of a pdb file.
SDA 7 introduces the parameter save_access. When set to 1 in the SDA input file, the program will save the accessibility grid generated during a simulation to a file, so that, in later simulations, instead of being recalculated, it can be read from this file. This saves considerable time during the initialization step of subsequent simulations of large solutes. If you modify the input pdb files between simulations, be sure to delete this grid file, so that a new correct solvent accessibility grid can be generated (changes in probes, threshold or atom numbers will force the regeneration of the file).
I do not have HARLEM, can I still run SDA to calculate electron transfer rates ?
HARLEM is no longer used with SDA and the PATHWAYS plugin for VMD should be used instead (see the Cytochrome f - Plastocyanin example). This has the advantage that it is available for more computer platforms. Alternatively, you might use a simpler description of electron transfer: in some cases, good estimates can be obtained by simply giving the electron donor and acceptor as reaction atom pairs for SDA association rate calculations.
Is there a general procedure for the preparation of input files for an SDA simulation?
While this can depend on the system being simulated, and the properties you wish to calculate, the general procedure is as follows (Note that webSDA guides the user through the basic stages of the set up of an SDA calculation and can be used to generate input files for running SDA outside the web server).:
  1. Starting with initial structure files for the solutes you wish to simulate (which may or may not contain hydrogen atoms), you must generate pqr files for each solute that contain all hydrogen atoms and in which all titratable groups have their correct protonation states for the experimental conditions you wish to model. It is also important to consider which force field you use to generate the partial or test charges and atomic radii for this pqr file. For biomacromolecules, tools such as pdb2pqr or H++ are useful for this step.
    NB: Some software tools may try to optimize the structure (like pdb2pqr by default), and may even flip side-chains. In this case, the intial pdb should not be used in the later stages for consistency (see Q:The energy in column Elec-1 is not comparable to the energy in column Elec-2).
  2. Compute Poisson-Boltzmann electrostatic potential grids for your solutes using the pqr files you have generated. Common PB solvers used for this step are APBS, UHBD or Delphi. Make sure these calculations are consistent with the conditions you wish to simulate, eg the temperature and ionic strength (see also Q:Should I use linear or non-linear Poisson-Boltzmann equation?). SDA expects electrostatic grids to have units of kcal/(mol·e). APBS, for instance, creates grids in units of kBT/e. These can be converted to kcal/(mol·e) by using the convert_grid module of SDA with a conversion factor of 0.6.
  3. Generate pdb files that have identical coordinates as your pqr files, but with hydrogen atoms removed. These should be used during later preparation steps and during simulations, rather than the initial pdb files used in step 1, to ensure that they are consistent with the electrostatic grids you have generated.
  4. Compute the effective charges with the ECM tools provided in SDA. (See Q:Can I check the effective charges calculations?). Pay attention to the ionic strength, it should be consistent with that used in the Poisson-Boltzmann calculations in step 2. Note also that the intial step of an ECM calculation, the placement of test charges on certain atoms, is only automated for standard amino acid and nucleotide residues. If your system contains other modified residues or other small molecules, for example cofactors or drug molecules, care must be taken to choose suitable test charge sites and values, ensuring the correct total charge is maintained.
  5. Compute the other grids if needed : electrostatic desolvation, hydrophobici (non-polar) desolvation and soft-core repulsive grids with make_edhdlj_grid. You can adjust the size : hydrophobic desolvation interactions are short-range, for instance, so that these grids can generally be smaller. Again, when generating the electrostatic desolvation grids, you must ensure that the ionic strength used is consistent with that used in other steps or is set to zero, depending on the parameterization of the electrostatic desolvation term that you are using, (for details, see "Multiplicative factors applied to the grids" in the "Description of input files").
  6. Check the following:
    • Following all steps, check the output files for Warning or Error messages. For instance, some atom names in your solutes may not be recognized by the program. In this case, or should you want to override any default values, you need to generate an additional add_atoms file.
    • Check the total charge of your system at all stages as well. The sum of the charges in your pqr files, the values reported during your Poisson-Boltzmann calculations, and the sum of the test charges used in your ECM calculations should be identical, and should always be an integer value. Note that this is not true for the effective charges that are created by ECM. These can have non-integer sums, and, in general, sum to greater than the original total charge.
      If using pdb2pqr to generate your pqr files, pay attention to the terminal residues of all protein chains. By default, it generates charged termini, although you can create pqr files that represent neutral termini. You must check that your test charges agree with your choice, but be aware that the total charge is not a suitable test of this.
    • Visualize all grids in VMD or pymol. In particular check that the electrostatic grids are large enough that the electrostatic potential at the grid boundary is sufficiently small. Often, checking that the 0.05 kcal/(mol·e) isosurface surrounding a solute is fully sampled in the grid is used as a check for this. If this is not the case, you may need to use a larger grid for your simulations. If your grid does not contain this isosurface, but the potential appears spherically symmetric at the grid boundary, you may consider using the Debye-Hückel approximation to long range electrostatic interactions, that is supported in SDA. This option is particularly suitable if you have a high charge density on your solute, or if you are modelling a low ionic strength solvent.
  7. You can run short bimolecular ("sda_2proteins") simulations and check the following:
    • The electrostatic energy terms in your complex or trajectory files should be reasonably symmetric (they will not be identical, due to the use of the ECM approximation, but should not be different by a factor a ten)
    • The total binding energy between your solutes appears reasonable (it is in units of kBT)
    • There should be no 'boosts', or at least very few, applied.
  8. If these tests fail, begin again from step 1, taking care to ensure all steps are consistent with each other.
  9. If the tests are passed, you can start production simulations. You may want to consider the following to optimize your efficiency:
    • If one of the solutes is large, activate the save_access option in your input file. This will save the solvent accessibility file that is generated at the start of a simulation, and this step can then be skipped in subsequent simulations, reducing the time required.
    • If you are running SDAMM simulations in parallel with OpenMP, first test the scaling of your simulation with different numbers of CPU cores. Some systems may not scale well to large numbers of cores.
    • If a solid surface is present in your system, include this as the final solute in your input file. This can greatly improve the scaling.
    • If running in parallel with OpenMP and you are recording a complexes file, two parameters (merge_step and size_thread) control how often threads are synchronized and how many complexes are saved for each thread. Changing these values can affect the efficiency of your simulations, but can also alter the final recorded complexes.
This procedure assumes that you want to run your simulations without hydrogens in your pdb files. In certain cases, for example when using the ProMetCS force field (as in examples/aubs/) the method has been validated assuming all hydrogen atoms are present and using a Lennard-Jones term in the force field. In this case, the radii of the probes used for solvent accessible surface generation steps are different. For simulations of protein-protein association, we recommend that you delete the hydrogen atoms from your structures (this should reduce the cost of your simulations). The default values of the probe radii may be set in the SDA input file in this case:
  • probep = 1.7
  • probew = 1.4
Their sum (3.1 Å), represents an approximation to the length of a hydrogen bond. probep values of 1.7-1.8 Å roughly correspond to the average radius of the surface atoms of a protein and a value of probep = 1.77 is used in the examples.
Can I check the effective charges calculations?
  • Try to follow the procedure described above
  • Check in the intermediate test charge file (created by ecm_mksites) that all charged residues or molecules are present
  • As a general rule, the absolute value of the total effective charges should be larger than that in the original pqr file (it depends on the ionic strength also).
  • Check that you used the same ionic strength in all steps.
  • Visualize the electrostatic potential grid and the effective charges using vmd or pymol to check that they are consistent.
  • Finally check than the electrostatic energy is symmetric between the solutes (see Q:The energy in column Elec-1 is not comparable to the energy in column Elec-2)
Should I use linear or non-linear Poisson-Boltzmann equation?
For highly charged solutes (eg DNA or charged surfaces), the non-linear Poisson-Boltzmann equation is usually more suitable. For proteins, the linear Poisson-Boltzmann equation is usually adequate
In cases where the non-linear solution gave more satisfactory results, we have noted that :
  • It helps to improve the electrostatic symmetry between the solutes
  • In the case of flexible solutes, the energies obtained in simulations may be more consistent between the different conformations (see the Importin-Ran tutorial)
As a drawback, the Effective Charge Model is based on the linear approximation.
We advise testing of both linear and non-linear forms, and using the non-linear form only if it improves the results.
How do I change solvent ionic strength conditions in SDA calculations ?
You have to set the ionic strength value in calculations of:
  • Electrostatic potentials: in the input file for APBS or UHBD.
  • Effective charges: input file or as command line arguments of ecm_all.
  • Electrostatic desolvation grids: in the input file for make_edhdlj_grid.
You must use the same values in all steps for consistency. Note, however, that the ionic strength may need to be set to zero for the calculation of the electrostatic desolvation grid, depending on the grid multiplication factor (edfct) chosen, (for details, see "Multiplicative factors applied to the grids" in the "Description of input files").
Should I always use ECM to run SDA ?
Other charges may be used in place of effective charges. Bear in mind that, on the one hand, electrostatic interactions calculated using test charges may be less accurate than interactions calculated using effective charges, but on the other hand, the accuracy of calculations with effective charges declines as solutes approach each other.
To check the accuracy of interaction energies calculated with SDA, you can do an sda_energy calculation for a set of representative complexes and compare these energies with those calculated using a Poisson-Boltzmann solver (for example, APBS).