=============================== Protein-Ligand Association in Crowded Media ===============================

Author: Riccardo Beccaria; Validation and editing: Abraham Muniz Chicharro, Stefan Richter, Rebecca Wade  (HITS, Heidelberg).

Date: 27 October 2025

This directory contains the input files and configuration needed to run a simulation of trypsin-benzamidine association in a crowded medium using two-resolution models. Specifically, this example simulates the interaction between trypsin and benzamidine in the presence of charged lysozyme (HEWL) at a concentration of 10 g/L. This example folder includes two subdirectories:

    1. grid_prep:   Contains the scripts and input files required to generate interaction grids for trypsin, benzamidine, and HEWL.

    2. assoc:       Contains the SDA input files needed to run the association simulations.

After running the simulations, the resulting trajectories can be analysed to explore binding pathways using Markov State Modeling (MSM).

Note: The following code has been tested on a HITS workstation, assuming that APBS is available and accessible from the system environment. All simulation input files are configured to run on computing clusters accessible from HITS, using SLURM for job submission via sbatch.


=============================== USAGE ===============================

----------------------- Grid Preparation -------------------

First, go into the "grid_prep" folder and generate all input files and interaction grids to do the simulations. The code 'grid_prep.sh' will first copy the trypsin and benzamidine files from the data_grid folder of the "try-ben_sdamd" examples (we use the same parametrisation of the sdamd protocol), secondly it will generate the missing lj repulsion grids for p1 and p2 (sdamd uses excluded volume grids, here lj repulsion grids are used due to the large number of molecules), as well as all the files and grid for lysozymes. To do so, run the command:

bash grid_prep.sh

Once the files are created, the script automatically moves them into a newly created directory named "data_grids", which contains all the input files needed to run the simulations.

---------------------- Running Simulations -----------------

Log in to an HITS cluster and navigate to the "assoc" folder. You can split the simulation into multiple parallel jobs to run the trajectories efficiently. The number of jobs is specified in the submit_all.sh script — by default, it is set to 10. To generate 10 SDA input files with different random seeds, run:

bash submit_all.sh

This will create 10 SDA input files, each with a unique seed, enabling parallel execution of independent trajectories.
To submit the simulation jobs using a SLURM job array, run the following command:

sbatch sdamm_crowd-single-cascade.sh

At the end of the simulations you will obtain 10 SDA output files, each containing the kons values associated to that job. To merge them all together run:

python ../../../auxi/association_crowded_media/encounters.py sdamm_{1..10}.out >kons.out 

This will print the kon table into the file named "kons.out". Now to convert the kons values into the fully crowded association rates, run:

python ../../../auxi/association_crowded_media/convert.py --input_file kons.out --output_file kons_full.out --Dphi_p1 0.009978 --Dphi_p2 0.08899 --Dzero_p1 0.01044 --Dzero_p2 0.090920

where Dphi_p1 and Dphi_p2 are the effective diffusion coefficients in the crowded media of respectively p1 and p2; Dzero_p1 and Dzero_p2 are the diluted diffusion coefficients as computed by hydropro. The script will print the fully crowded association rates in the "kons_full.out" output file.

NOTE: The effective diffusion coefficients can be estimated analysing sdamm trajectories with trans_diffusion SDA tool.

---------------------- Markov State Modelling ----------------- 

Once the simulations have ended it is possible to start Markov State Modelling (MSM) analysis for studying the binding pathways. The Markov state modelling scripts are based on the Deeptime Python library (https://deeptime-ml.github.io). In the auxi/MSM_scripts folder a .yml file is provided to create a conda environment with deeptime library and dependencies already installed. To do so navigate to the auxi/MSM_scripts folder and run:

conda env create -f MSM_env.yml 

Once the environment is created, navigate in the "assoc" folder for the MSM analysis. The first step is to extract encounter trajectories from the full set of trajectories. An encounter trajectory is defined as one in which the ligand approaches the protein (the minimum distance between the two falls below a specified cutoff) and ultimately ends in binding (as specified in "ReactionCriteria" GROUP of the SDA input file). The trajectory is truncated at the point where the ligand binds to the protein. To extract the encounter trajectories, run the following script:

../../../auxi/MSM_scripts/get_encounter_traj.sh ../data_grids/p1_noh.pdb ../data_grids/p2_noh.pdb 6.5

The first and second arguments are the pdb files of the protein and ligand respectively, the third is the cutoff to define when the ligand approaches the protein. This script analyses the SDA complexe files to identify which trajectories are encounter trajectories. For each detected encounter, it saves only the segment starting from when the ligand approaches the protein (i.e., when the minimum distance falls below the specified cutoff) until the moment the ligand binds to the protein. The script will save the encounter trajectories in the "folder_MSM". Each encounter trajectory is saved in 3 different ways:
    1. The fragment of the encounter trajectory as extracted from the SDA trajectory file
    2. The fragment of the encounter trajectory in the SDA format but in the protein reference system
    3. The fragment of the encounter trajectory in the protein reference system in xyz format      
Once the encounter trajectories have been selected, it is possible to build the MSM. To do so, run:

../../../auxi/MSM_scripts/build_MSM.sh 6 1

The first argument is the number of clusters and the second one is the lagtime used to build the MSM. The script will generate the "folder_MSM/folder_MSM" folder containing the built MSM. Specifically, in that folder, you can find:
    1. PDB files with the positions of the MSM clusters in the protein reference system
    2. The file "msm_rates" containing all the transition rates between the MSM clusters
    3. The file "msm_mfpt" containing the mean first passage times for the transitions in the MSM
    4. The picture "msm_network.png" containing a visual representation of the MSM 


To validate the MSM, you can run

../../../auxi/MSM_scripts/validate_MSM.sh 6

where the first argument is the number of clusters used for building the MSM. The script will create the "validate_folder" directory with three pictures:
    1. wcss_analysis.png with the results from the WCSS analysis for determining the number of clusters to use in building MSM
    2. implied_timescales.png with the implied timescales of the transition matrix eigenvalues as a function of lag time
    3. ck_test.png figure with the picture containing the Chapman-Kolmogorov (CK) test for the built MSM   

NOTE: Since the simulations used for this analysis end upon binding, the MSMs capture only the encounter phase and do not represent the full binding pathway. 


++++++++++++++++ SMALL EXAMPLE FOR TESTING +++++++++++++++

The following lines contain the commands to run a very short sdamm nambox simulation using the two resolutions adaptive model. The example is inteded only for debugging/testing pushed repository. The example can be run locally and only runs few trajectories of trypsin benzamidine associating in crowded environment of lysozymes at density 10g/L.
To generate sda grids and input files go to "grid_prep" folder and run the grid_prep script:

cd grid_prep
bash grid_prep.sh

Then go into the assoc folder and run short simulation:

cd ../assoc
../../../bin/sda_flex sdamm_crowd_short.in

This will generate a sdamm nambox simulation with two-resolutions adaptive model in crowded lysozymes environment at density of 10g/L. The expected simulation time is couple of minutes.