Skip to content

gbsa_pipeline

GBSA pipeline package.

Optimize GBSA calculations for speed

Modules:

  • change_defaults

    Creating custom protocol with one setter per GROMACS parameter.

  • change_defaults_enum

    Classes for availible options in gms parameters.

  • change_params

    General parameter change helpers for GROMACS .mdp files.

  • cli

    Command-line interface for the GBSA pipeline.

  • config

    Top-level RunConfig model for driving the pipeline from a TOML file.

  • docking

    Lightweight docking adapter layer for Vina and helpers.

  • equilibration

    Preparing and running a heating procedure.

  • frcmod_parametrization

    Utilities for loading AMBER parameters into the GBSA pipeline.

  • gmx_edit_defaults

    Adapting the general change funtion for GROMACS formatting.

  • gromacs_index

    Module to generate gromacs index files for a system.

  • ligand_preparation

    Reading ligand from SDF file, standardizing with MolVS and adding hydrogens with RDKIT.

  • minimization

    Preparing minimization protocol and running.

  • parametrization

    Parametrize protein-ligand complexes.

  • parametrization_enum

    Force field and charge method enumerations for parametrization.

  • pipeline

    Functional pipeline runner — orchestrates all MD simulation stages.

  • solvation_box

    Solvation helpers with configurable solvent, box shape, padding, and ions.

  • solvation_openmm

    OpenMM/ParmEd solvation that bypasses BioSimSpace IO.

Classes:

  • AmberFFInput

    Inputs for converting AMBER frcmod + mol2 files to an OpenMM XML.

  • AmberInput

    Validated inputs for loading a pre-parametrized AMBER system.

  • ParametrisedComplex

    Parametrised protein-ligand complex ready for solvation and MD.

  • ParametrizationConfig

    Force field and charge method choices for a parametrization run.

  • ParametrizationInput

    Validated inputs for a parametrization run.

  • RunConfig

    Top-level configuration for a complete GBSA pipeline run.

  • SolvatedComplex

    Solvated protein-ligand complex produced by :func:solvate_openmm.

Functions:

  • build_amber_ff_xml

    Build a unified OpenMM ForceField XML from AMBER frcmod and mol2 files.

  • load_amber_complex

    Load an AMBER prmtop + inpcrd and export to GROMACS .gro / .top.

  • parametrize

    Parametrize a protein-ligand complex without running tleap.

  • run_pipeline

    Run the full GBSA pipeline from a validated :class:~gbsa_pipeline.config.RunConfig.

  • solvate_openmm

    Solvate a parametrised complex with OpenMM + ParmEd (no BSS IO).

AmberFFInput

Bases: BaseModel

Inputs for converting AMBER frcmod + mol2 files to an OpenMM XML.

The generated XML contains all atom types, bonded parameters, LJ parameters, and residue templates needed by OpenMM's ForceField. Pass the output path to :class:~gbsa_pipeline.parametrization.ParametrizationConfig via extra_ff_files alongside the standard protein/water XML files.

Parameters

frcmod_files: One or more AMBER frcmod files (e.g. from MCPB.py) that supply custom bond, angle, dihedral, and LJ parameters. residue_mol2s: Mol2 files for non-standard residues (e.g. metal-coordinating CYS/HIS modified by MCPB.py, bare metal ions). Each file must contain exactly one @<TRIPOS>MOLECULE block. protein_ff: Protein force field. Determines which base AMBER parameter files are loaded (parm19.dat + frcmod.ff14SB for FF14SB, etc.). ligand_ff: Ligand force field. Determines gaff.dat vs gaff2.dat. output_xml: Path where the unified XML is written. A file inside a temporary directory is used when None.

AmberInput

Bases: BaseModel

Validated inputs for loading a pre-parametrized AMBER system.

Parameters

prmtop: AMBER parameter/topology file (.prmtop / .parm7), produced by e.g. tleap, MCPB.py, or antechamber. inpcrd: AMBER coordinate file (.inpcrd / .rst7). output_dir: Directory for GROMACS output files. A temporary directory is created when None.

ParametrisedComplex dataclass

ParametrisedComplex(
    gro_file: Path,
    top_file: Path,
    config: ParametrizationConfig,
    forcefield: Any = None,
    parmed_structure: Any = None,
)

Parametrised protein-ligand complex ready for solvation and MD.

Attributes:

gro_file: GROMACS coordinate file (.gro) produced by ParmEd. top_file: GROMACS topology file (.top) produced by ParmEd. config: The force field configuration used to produce this complex. Stored so that downstream steps can record or reproduce the run. forcefield: The OpenMM ForceField (with GAFF registered and ligand charges already assigned) used during parametrization. Passed directly to :func:~gbsa_pipeline.solvation_openmm.solvate_openmm so that water XML can be loaded into it without rebuilding from scratch or re-running AM1-BCC. None when the complex was loaded from disk (e.g. :func:~gbsa_pipeline.frcmod_parametrization.load_amber_complex). parmed_structure: The ParmEd Structure holding every protein+ligand force field parameter produced during parametrization. Passed directly to :func:~gbsa_pipeline.solvation_openmm.solvate_openmm to avoid reloading from the GROMACS files. None when loaded from disk.

ParametrizationConfig

Bases: BaseModel

Force field and charge method choices for a parametrization run.

Defaults to AMBER ff14SB + GAFF2 + AM1-BCC. Use the class-method presets for the most common combinations, or construct directly to override individual axes.

Examples:

ParametrizationConfig() # all defaults ParametrizationConfig(protein_ff=ProteinFF.FF19SB) # swap protein FF ParametrizationConfig.amber14_gaff2_nagl() # preset with NAGL charges

Methods:

amber14_gaff2 classmethod

amber14_gaff2() -> ParametrizationConfig

AMBER ff14SB + GAFF2 + AM1-BCC charges (default).

Source code in src/gbsa_pipeline/parametrization.py
70
71
72
73
@classmethod
def amber14_gaff2(cls) -> ParametrizationConfig:
    """AMBER ff14SB + GAFF2 + AM1-BCC charges (default)."""
    return cls(protein_ff=ProteinFF.FF14SB, charge_method=ChargeMethod.AM1BCC)

amber14_gaff2_nagl classmethod

amber14_gaff2_nagl() -> ParametrizationConfig

AMBER ff14SB + GAFF2 + NAGL graph-neural-network charges.

Source code in src/gbsa_pipeline/parametrization.py
80
81
82
83
@classmethod
def amber14_gaff2_nagl(cls) -> ParametrizationConfig:
    """AMBER ff14SB + GAFF2 + NAGL graph-neural-network charges."""
    return cls(charge_method=ChargeMethod.NAGL)

amber19_gaff2 classmethod

amber19_gaff2() -> ParametrizationConfig

AMBER ff19SB + GAFF2 + AM1-BCC charges.

Source code in src/gbsa_pipeline/parametrization.py
75
76
77
78
@classmethod
def amber19_gaff2(cls) -> ParametrizationConfig:
    """AMBER ff19SB + GAFF2 + AM1-BCC charges."""
    return cls(protein_ff=ProteinFF.FF19SB, charge_method=ChargeMethod.AM1BCC)

ParametrizationInput

Bases: BaseModel

Validated inputs for a parametrization run.

Parameters

protein_pdb: Path to the protein PDB file. Must exist. ligand_sdf: Path to the ligand SDF file with embedded 3-D coordinates. Must exist. config: Force field and charge method selection. Defaults to ParametrizationConfig() (ff14SB + GAFF2 + AM1-BCC). net_charge: Formal charge of the ligand in elementary charge units. None lets the charge assignment toolkit determine it automatically. work_dir: Directory where intermediate and output files are written. When None a temporary directory is created automatically.

RunConfig

Bases: BaseModel

Top-level configuration for a complete GBSA pipeline run.

Load from a TOML file with :meth:from_toml. Each section maps to a nested model. The [md] section accepts any field of :class:~gbsa_pipeline.change_defaults.GromacsParams.

Example:

[system]
protein = "protein.pdb"
ligand  = "ligand.sdf"

[solvation]
water_model = "tip3p"
padding = 10.0

[md]
nsteps = 500000
dt = 0.002
tcoupl = "v-rescale"
ref_t = 300.0

Methods:

  • from_toml

    Load and validate a :class:RunConfig from a TOML file.

  • to_parametrization_input

    Build a :class:~gbsa_pipeline.parametrization.ParametrizationInput from this config.

from_toml classmethod

from_toml(path: Path) -> RunConfig

Load and validate a :class:RunConfig from a TOML file.

Parameters

path: Path to the .toml configuration file.

Returns:

RunConfig Validated configuration object.

Source code in src/gbsa_pipeline/config.py
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
@classmethod
def from_toml(cls, path: Path) -> RunConfig:
    """Load and validate a :class:`RunConfig` from a TOML file.

    Parameters
    ----------
    path:
        Path to the ``.toml`` configuration file.

    Returns:
    -------
    RunConfig
        Validated configuration object.
    """
    with open(path, "rb") as f:
        data: dict[str, Any] = tomllib.load(f)
    return cls.model_validate(data)

to_parametrization_input

to_parametrization_input(
    work_dir: Path,
) -> ParametrizationInput

Build a :class:~gbsa_pipeline.parametrization.ParametrizationInput from this config.

Parameters

work_dir: Directory where parametrization output files will be written.

Returns:

ParametrizationInput Ready to pass to :func:~gbsa_pipeline.parametrization.parametrize.

Raises:

ValueError If system.ligand is None (ligand is required for parametrization).

Source code in src/gbsa_pipeline/config.py
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
def to_parametrization_input(self, work_dir: Path) -> ParametrizationInput:
    """Build a :class:`~gbsa_pipeline.parametrization.ParametrizationInput` from this config.

    Parameters
    ----------
    work_dir:
        Directory where parametrization output files will be written.

    Returns:
    -------
    ParametrizationInput
        Ready to pass to :func:`~gbsa_pipeline.parametrization.parametrize`.

    Raises:
    ------
    ValueError
        If ``system.ligand`` is ``None`` (ligand is required for parametrization).
    """
    if self.system.ligand is None:
        raise ValueError(
            "system.ligand must be set to run the parametrization stage. "
            "Provide a ligand SDF path in the [system] section of your config."
        )
    return ParametrizationInput(
        protein_pdb=self.system.protein,
        ligand_sdf=self.system.ligand,
        config=ParametrizationConfig(
            protein_ff=self.forcefield.protein_ff,
            ligand_ff=self.forcefield.ligand_ff,
            charge_method=self.forcefield.charge_method,
            extra_ff_files=self.system.extra_ff_files,
        ),
        net_charge=self.system.net_charge,
        work_dir=work_dir,
    )

SolvatedComplex dataclass

SolvatedComplex(
    gro_file: Path,
    top_file: Path,
    parmed_structure: Any = None,
)

Solvated protein-ligand complex produced by :func:solvate_openmm.

Carries both the on-disk GROMACS files (written for inspection and checkpoint purposes) and the in-memory ParmEd structure so that downstream steps can use either without repeating disk I/O.

Attributes:

gro_file: GROMACS coordinate file (.gro) of the solvated system. top_file: GROMACS topology file (.top) of the solvated system. parmed_structure: In-memory ParmEd Structure with all force-field parameters. None when the object is constructed from existing files rather than from a live parametrization run.

Methods:

  • load_bss

    Load this complex as a BioSimSpace System for MD stages.

load_bss

load_bss() -> Any

Load this complex as a BioSimSpace System for MD stages.

Reads from the GROMACS files already written to disk. The returned system is ready for minimization, equilibration, and production MD.

Returns:

BioSimSpace._SireWrappers.System The solvated system loaded into BioSimSpace.

Source code in src/gbsa_pipeline/solvation_openmm.py
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
def load_bss(self) -> Any:
    """Load this complex as a BioSimSpace System for MD stages.

    Reads from the GROMACS files already written to disk.  The returned
    system is ready for minimization, equilibration, and production MD.

    Returns:
    -------
    BioSimSpace._SireWrappers.System
        The solvated system loaded into BioSimSpace.
    """
    if not self.gro_file.exists() or not self.top_file.exists():
        raise FileNotFoundError(f"SolvatedComplex files not found: {self.gro_file}, {self.top_file}.")
    import BioSimSpace as BSS  # noqa: PLC0415

    return BSS.IO.readMolecules([str(self.gro_file), str(self.top_file)])

build_amber_ff_xml

build_amber_ff_xml(inp: AmberFFInput) -> Path

Build a unified OpenMM ForceField XML from AMBER frcmod and mol2 files.

Loads the base AMBER parameter files for the requested force fields, adds the custom frcmod parameters, merges mol2 residue templates directly into the :class:parmed.openmm.OpenMMParameterSet, and writes a single XML file that OpenMM's ForceField can read.

Parameters

inp: Validated inputs.

Returns:

  • Path

    Path to the written XML file.

Raises:

  • RuntimeError

    If the AMBER parameter directory cannot be found.

  • ValueError

    If any atom type referenced in a residue template is not defined in the combined parameter set.

Source code in src/gbsa_pipeline/frcmod_parametrization.py
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
def build_amber_ff_xml(inp: AmberFFInput) -> Path:
    """Build a unified OpenMM ForceField XML from AMBER frcmod and mol2 files.

    Loads the base AMBER parameter files for the requested force fields, adds
    the custom frcmod parameters, merges mol2 residue templates directly into
    the :class:`parmed.openmm.OpenMMParameterSet`, and writes a single XML
    file that OpenMM's ``ForceField`` can read.

    Parameters
    ----------
    inp:
        Validated inputs.

    Returns:
        Path to the written XML file.

    Raises:
        RuntimeError: If the AMBER parameter directory cannot be found.
        ValueError: If any atom type referenced in a residue template is not
            defined in the combined parameter set.
    """
    parm_dir = _find_amber_parm_dir()

    # Base AMBER parameter files for the chosen force fields.
    base_files = [str(parm_dir / f) for f in _PROTEIN_BASE_PARAMS[inp.protein_ff]]
    base_files.append(str(parm_dir / _LIGAND_BASE_PARAMS[inp.ligand_ff]))

    # Load base params + user frcmod files into a single AmberParameterSet.
    amber_parm = pmd.amber.AmberParameterSet(
        *base_files,
        *(str(f) for f in inp.frcmod_files),
    )
    ff = pmd.openmm.OpenMMParameterSet.from_parameterset(amber_parm)

    # Merge mol2 residue templates directly into the parameter set.
    for mol2_path in inp.residue_mol2s:
        mol2 = pmd.load_file(str(mol2_path))
        if isinstance(mol2, pmd.modeller.ResidueTemplateContainer):
            ff.residues.update(mol2.to_library())
        else:
            ff.residues[mol2.name] = mol2

    # Validate: all types used in templates must be defined.
    defined = set(ff.atom_types.keys())
    used: set[str] = set()
    for templ in ff.residues.values():
        for atom in templ.atoms:
            if at := getattr(atom, "type", None):
                used.add(str(at))
    missing = used - defined
    if missing:
        raise ValueError(
            f"Residue templates reference atom types not defined in the "
            f"parameter set: {sorted(missing)}. "
            f"Add a frcmod file that defines these types."
        )

    # Write unified XML.
    output_xml = inp.output_xml or Path(tempfile.mkdtemp(prefix="gbsa_ff_")) / "combined.xml"
    ff.write(str(output_xml), write_unused=True)
    return output_xml

load_amber_complex

load_amber_complex(inp: AmberInput) -> ParametrisedComplex

Load an AMBER prmtop + inpcrd and export to GROMACS .gro / .top.

Uses ParmEd to read the AMBER files directly and write the GROMACS files.

Parameters

inp: Validated inputs.

Returns:

Source code in src/gbsa_pipeline/frcmod_parametrization.py
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
def load_amber_complex(inp: AmberInput) -> ParametrisedComplex:
    """Load an AMBER prmtop + inpcrd and export to GROMACS ``.gro`` / ``.top``.

    Uses ParmEd to read the AMBER files directly and write the GROMACS files.

    Parameters
    ----------
    inp:
        Validated inputs.

    Returns:
        :class:`~gbsa_pipeline.parametrization.ParametrisedComplex` with
        paths to the GROMACS coordinate and topology files.
    """
    work_dir = inp.output_dir or Path(tempfile.mkdtemp(prefix="gbsa_amber_"))
    work_dir.mkdir(parents=True, exist_ok=True)

    structure = pmd.load_file(str(inp.prmtop), xyz=str(inp.inpcrd))

    gro_file = work_dir / "complex.gro"
    top_file = work_dir / "complex.top"
    structure.save(str(top_file), format="gromacs")
    structure.save(str(gro_file))

    return ParametrisedComplex(
        gro_file=gro_file,
        top_file=top_file,
        config=ParametrizationConfig(),
    )

parametrize

parametrize(
    inp: ParametrizationInput,
) -> ParametrisedComplex

Parametrize a protein-ligand complex without running tleap.

Uses OpenMM force field XML files for the protein and GAFFTemplateGenerator for the ligand. Partial charges are assigned via the method specified in inp.config.charge_method (default: AM1-BCC via sqm). The system is exported to GROMACS .gro and .top files via ParmEd.

Parameters

inp: Validated parametrization inputs including file paths, force field configuration, and optional work directory.

Returns:

ParametrisedComplex Frozen dataclass holding the paths to the output GROMACS files and the configuration used.

Source code in src/gbsa_pipeline/parametrization.py
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
def parametrize(inp: ParametrizationInput) -> ParametrisedComplex:
    """Parametrize a protein-ligand complex without running tleap.

    Uses OpenMM force field XML files for the protein and
    ``GAFFTemplateGenerator`` for the ligand. Partial charges are assigned
    via the method specified in ``inp.config.charge_method`` (default:
    AM1-BCC via sqm). The system is exported to GROMACS ``.gro`` and
    ``.top`` files via ParmEd.

    Parameters
    ----------
    inp:
        Validated parametrization inputs including file paths, force field
        configuration, and optional work directory.

    Returns:
    -------
    ParametrisedComplex
        Frozen dataclass holding the paths to the output GROMACS files and
        the configuration used.
    """
    return _parametrize_openmm(inp)

run_pipeline

run_pipeline(config: RunConfig, output_dir: Path) -> None

Run the full GBSA pipeline from a validated :class:~gbsa_pipeline.config.RunConfig.

Stages (each writes output to a numbered subdirectory):

  1. Parametrize — assign force field parameters to protein + ligand.
  2. Solvate — add water box and counter-ions.
  3. Minimize — energy minimization.
  4. Equilibrate — NVT heating from 0 K to 300 K.
  5. Production MD — NpT simulation driven by [md] section params.

Parameters

config: Validated run configuration (usually loaded via :meth:~gbsa_pipeline.config.RunConfig.from_toml). output_dir: Root directory for all output. Created if it does not exist.

Source code in src/gbsa_pipeline/pipeline.py
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
def run_pipeline(config: RunConfig, output_dir: Path) -> None:
    """Run the full GBSA pipeline from a validated :class:`~gbsa_pipeline.config.RunConfig`.

    Stages (each writes output to a numbered subdirectory):

    1. **Parametrize** — assign force field parameters to protein + ligand.
    2. **Solvate** — add water box and counter-ions.
    3. **Minimize** — energy minimization.
    4. **Equilibrate** — NVT heating from 0 K to 300 K.
    5. **Production MD** — NpT simulation driven by ``[md]`` section params.

    Parameters
    ----------
    config:
        Validated run configuration (usually loaded via
        :meth:`~gbsa_pipeline.config.RunConfig.from_toml`).
    output_dir:
        Root directory for all output. Created if it does not exist.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    _log_config(config, output_dir)

    # Stage 1: Parametrize
    logger.info("─── Stage 1/5: Parametrization ───")
    logger.info(
        "  protein_ff=%s  ligand_ff=%s  charge_method=%s",
        config.forcefield.protein_ff,
        config.forcefield.ligand_ff,
        config.forcefield.charge_method,
    )
    param_dir = output_dir / "01_parametrize"
    parametrized = _run_stage("parametrize", lambda: parametrize(config.to_parametrization_input(param_dir)))
    logger.info("  Done → %s, %s", parametrized.gro_file.name, parametrized.top_file.name)

    # Stage 2: Solvate with OpenMM + ParmEd (bypasses BSS IO)
    logger.info("─── Stage 2/5: Solvation ───")
    sol = config.solvation
    box_desc = f"padding={sol.padding} nm" if sol.padding is not None else f"box_size={sol.box_size} nm"
    logger.info(
        "  water_model=%s  box_shape=%s  %s  ion_conc=%s mol/L",
        sol.water_model,
        sol.box_shape,
        box_desc,
        sol.ion_concentration,
    )
    solvation_params = _to_solvation_params(sol)
    sol_dir = output_dir / "02_solvated"
    solvated: SolvatedComplex = _run_stage(
        "solvation",
        lambda: solvate_openmm(
            parametrized=parametrized,
            params=solvation_params,
            output_gro=sol_dir / "solvated.gro",
            output_top=sol_dir / "solvated.top",
        ),
    )
    logger.info("  Saved → %s / %s", solvated.gro_file.name, solvated.top_file.name)

    logger.info("  Loading solvated system into BSS …")
    system = solvated.load_bss()
    logger.info("  Loaded %d molecules (%d atoms)", system.nMolecules(), system.nAtoms())

    # Stage 3: Minimize
    logger.info("─── Stage 3/5: Minimization ───")
    logger.info(
        "  nsteps=%d  emtol=%.1f kJ/mol/nm",
        config.minimization.nsteps,
        config.minimization.emtol,
    )
    min_dir = output_dir / "03_minimized"
    min_dir.mkdir(parents=True, exist_ok=True)
    system = _run_stage(
        "minimization",
        lambda: run_minimization(nsteps=config.minimization.nsteps, system=system, work_dir=min_dir),
    )
    logger.info("  Done. Saving …")
    export_gromacs_top_gro(system, str(min_dir / "minimized"))
    logger.info("  Saved → 03_minimized.gro / .top")

    # Stage 4: Equilibrate
    logger.info("─── Stage 4/5: Equilibration ───")
    logger.info("  NVT heating 0→300 K over %.1f ps", config.equilibration.simulation_time_ps)
    equil_time = config.equilibration.simulation_time_ps * BSS.Units.Time.picosecond
    equil_dir = output_dir / "04_equilibrated"
    equil_dir.mkdir(parents=True, exist_ok=True)
    system = _run_stage("equilibration", lambda: run_heating(equil_time, system, work_dir=equil_dir))
    logger.info("  Done. Saving …")
    export_gromacs_top_gro(system, str(equil_dir / "equilibrated"))
    logger.info("  Saved → 04_equilibrated.gro / .top")

    # Stage 5: Production MD
    logger.info("─── Stage 5/5: Production MD ───")
    logger.info(
        "  integrator=%s  nsteps=%d  dt=%s ps  tcoupl=%s  pcoupl=%s",
        config.md.integrator,
        config.md.nsteps,
        config.md.dt,
        config.md.tcoupl,
        config.md.pcoupl,
    )
    prod_dir = output_dir / "05_production"
    prod_dir.mkdir(parents=True, exist_ok=True)
    system, _ = _run_stage(
        "production_md", lambda: run_gro_custom(parameters=config.md, system=system, work_dir=prod_dir)
    )
    logger.info("  Done. Saving …")
    export_gromacs_top_gro(system, str(prod_dir / "production"))
    logger.info("  Saved → 05_production.gro / .top")

    logger.info("Pipeline complete. Output written to %s", output_dir)

solvate_openmm

solvate_openmm(
    parametrized: ParametrisedComplex,
    params: SolvationParams,
    output_gro: Path,
    output_top: Path,
) -> SolvatedComplex

Solvate a parametrised complex with OpenMM + ParmEd (no BSS IO).

Reuses the ForceField and ParmEd Structure carried by parametrized — built in the parametrization stage — so that AM1-BCC charge assignment is not repeated. Writes GROMACS .gro and .top files and returns a :class:SolvatedComplex that carries both the file paths and the in-memory ParmEd structure.

Parameters

parametrized: Output of :func:~gbsa_pipeline.parametrization.parametrize. Must carry forcefield and parmed_structure (populated by the OpenMM parametrization path). params: Solvation settings (water model, box shape/size, ion concentration). output_gro: Destination path for the solvated GROMACS coordinate file. output_top: Destination path for the solvated GROMACS topology file.

Returns:

SolvatedComplex Dataclass holding the written file paths and the in-memory ParmEd structure of the solvated system.

Raises:

ValueError If parametrized was not produced by the OpenMM parametrization path (i.e. forcefield or parmed_structure is None).

Source code in src/gbsa_pipeline/solvation_openmm.py
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
def solvate_openmm(
    parametrized: ParametrisedComplex,
    params: SolvationParams,
    output_gro: Path,
    output_top: Path,
) -> SolvatedComplex:
    """Solvate a parametrised complex with OpenMM + ParmEd (no BSS IO).

    Reuses the ``ForceField`` and ParmEd ``Structure`` carried by
    *parametrized* — built in the parametrization stage — so that AM1-BCC
    charge assignment is not repeated.  Writes GROMACS ``.gro`` and ``.top``
    files and returns a :class:`SolvatedComplex` that carries both the file
    paths and the in-memory ParmEd structure.

    Parameters
    ----------
    parametrized:
        Output of :func:`~gbsa_pipeline.parametrization.parametrize`.
        Must carry ``forcefield`` and ``parmed_structure`` (populated by the
        OpenMM parametrization path).
    params:
        Solvation settings (water model, box shape/size, ion concentration).
    output_gro:
        Destination path for the solvated GROMACS coordinate file.
    output_top:
        Destination path for the solvated GROMACS topology file.

    Returns:
    -------
    SolvatedComplex
        Dataclass holding the written file paths and the in-memory ParmEd
        structure of the solvated system.

    Raises:
    ------
    ValueError
        If *parametrized* was not produced by the OpenMM parametrization path
        (i.e. ``forcefield`` or ``parmed_structure`` is ``None``).
    """
    if parametrized.forcefield is None or parametrized.parmed_structure is None:
        raise ValueError(
            "solvate_openmm requires parametrized.forcefield and "
            "parametrized.parmed_structure to be set. "
            "Use parametrize() (not load_amber_complex) before calling this function."
        )

    output_gro.parent.mkdir(parents=True, exist_ok=True)

    water_model = (
        WaterModel(str(params.water_model).lower())
        if not isinstance(params.water_model, WaterModel)
        else params.water_model
    )
    box_shape = BoxShape(str(params.shape).lower()) if not isinstance(params.shape, BoxShape) else params.shape

    # ------------------------------------------------------------------
    # 1. Reuse the in-memory ParmEd structure from parametrization
    #    (all protein+ligand force field parameters already assigned)
    # ------------------------------------------------------------------
    existing: Any = parametrized.parmed_structure
    n_orig = len(existing.atoms)
    logger.debug("Using in-memory ParmEd structure (%d atoms).", n_orig)

    # ------------------------------------------------------------------
    # 2. Extend the existing ForceField with water templates.
    #    GAFF is already registered with pre-assigned ligand charges,
    #    so AM1-BCC will not re-run.
    # ------------------------------------------------------------------
    water_xml = _WATER_XML[water_model]
    ff: ForceField = parametrized.forcefield
    logger.debug("Loading water FF (%s) into existing ForceField …", water_xml)
    ff.loadFile(water_xml)

    # ------------------------------------------------------------------
    # 3. Add solvent (water + ions) via OpenMM Modeller.
    #    Full ForceField ensures correct clash detection for all residues.
    # ------------------------------------------------------------------
    logger.debug("Creating Modeller from existing topology …")
    modeller = Modeller(existing.topology, existing.positions)

    kwargs: dict[str, Any] = {
        "model": _WATER_NAME[water_model],
        "neutralize": params.neutralize,
    }
    if params.ion_concentration is not None:
        kwargs["ionicStrength"] = params.ion_concentration * mm_unit.molar
    if params.padding is not None:
        kwargs["padding"] = params.padding * mm_unit.nanometer
    else:
        kwargs["boxSize"] = Vec3(params.box_size, params.box_size, params.box_size) * mm_unit.nanometer
    if box_shape is BoxShape.TRUNCATED_OCTAHEDRON:
        kwargs["boxShape"] = "octahedron"

    logger.debug(
        "Adding solvent (model=%s, %s, box_shape=%s) …",
        kwargs["model"],
        f"padding={params.padding} nm" if params.padding is not None else f"box_size={params.box_size} nm",
        box_shape,
    )
    modeller.addSolvent(ff, **kwargs)
    logger.debug("Solvated topology: %d atoms.", modeller.topology.getNumAtoms())

    # ------------------------------------------------------------------
    # 4. Build full OpenMM system → ParmEd structure.
    #    rigidWater=False keeps O-H bonds in HarmonicBondForce so ParmEd
    #    can resolve SETTLE geometry when writing GROMACS topology.
    # ------------------------------------------------------------------
    logger.debug("Creating solvated OpenMM system …")
    system = ff.createSystem(modeller.topology, nonbondedMethod=NoCutoff, constraints=None, rigidWater=False)
    logger.debug("Converting to ParmEd structure …")
    structure = pmd.openmm.load_topology(modeller.topology, system, modeller.positions)

    # ------------------------------------------------------------------
    # 5. Write GROMACS gro + top
    # ------------------------------------------------------------------
    logger.debug("Writing GROMACS topology → %s …", output_top)
    structure.save(str(output_top), format="gromacs", overwrite=True)
    logger.debug("Writing GROMACS coordinates → %s …", output_gro)
    structure.save(str(output_gro), overwrite=True)
    logger.debug("Solvated files written.")

    return SolvatedComplex(
        gro_file=output_gro,
        top_file=output_top,
        parmed_structure=structure,
    )