dig.ggraph.utils

Utilities under dig.ggraph.utils.

calculate_min_plogp

Calculate the reward that consists of log P penalized by SA and the number of long cycles, as described in (Kusner et al. 2017).

check_chemical_validity

Check the chemical validity of the mol object.

check_valency

Check that no atoms in the mol have exceeded their possible valency.

convert_radical_electrons_to_hydrogens

Convert radical electrons in a molecule into bonds to hydrogens.

gen_mol_from_one_shot_tensor

Construct molecules from the node tensors and adjacency tensors generated by one-shot molecular graph generation methods.

reward_target_molecule_similarity

Calculate the Tanimoto similarity between the ECFP fingerprints of the molecule and the target molecule.

steric_strain_filter

Flag molecules based on a steric energy cutoff after max_num_iters iterations of MMFF94 forcefield minimization.

zinc_molecule_filter

Flag molecules that contain problematic functional groups, as defined by the set of ZINC rules from http://blaster.docking.org/filtering/rules_default.txt.

calculate_min_plogp(mol)[source]

Calculate the reward that consists of log P penalized by SA and the number of long cycles, as described in (Kusner et al. 2017). Scores are normalized based on the statistics of the 250k_rndm_zinc_drugs_clean.smi dataset.

Parameters

mol – Rdkit mol object

Return type

float
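The score combines three terms, each standardized with a mean and a standard deviation. The following is a minimal pure-Python sketch of that combination, assuming log P, the SA score and the ring sizes have already been computed (for example with RDKit) and that the long-cycle penalty is max(largest ring size - 6, 0), as in common penalized-log P implementations. The names `TermStats`, `cycle_penalty` and `penalized_logp` are hypothetical, and the statistics are passed in as parameters rather than hard-coding the values derived from 250k_rndm_zinc_drugs_clean.smi.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TermStats:
    """Mean and standard deviation used to standardize one score term."""
    mean: float
    std: float

    def standardize(self, value: float) -> float:
        return (value - self.mean) / self.std


def cycle_penalty(ring_sizes: Sequence[int]) -> int:
    """Long-cycle penalty (assumed): how far the largest ring exceeds six atoms."""
    largest = max(ring_sizes, default=0)
    return max(largest - 6, 0)


def penalized_logp(
    logp: float,
    sa_score: float,
    ring_sizes: Sequence[int],
    logp_stats: TermStats,
    sa_stats: TermStats,
    cycle_stats: TermStats,
) -> float:
    """Hypothetical sketch: standardized log P minus SA minus long-cycle penalty.

    SA and the cycle penalty enter with a negative sign, so a molecule that is
    harder to synthesize or contains large rings receives a lower score.
    """
    return (
        logp_stats.standardize(logp)
        + sa_stats.standardize(-sa_score)
        + cycle_stats.standardize(-cycle_penalty(ring_sizes))
    )
```

With unit statistics (mean 0, std 1) the score reduces to log P - SA - penalty, which makes the sign convention easy to check by hand.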

check_chemical_validity(mol)[source]

Check the chemical validity of the mol object. The existing mol object is not modified. Radicals pass this test.

Parameters

mol – Rdkit mol object

Return type

bool, True if chemically valid, False otherwise

check_valency(mol)[source]

Check that no atoms in the mol have exceeded their possible valency.

Parameters

mol – Rdkit mol object

Return type

bool, True if no valency issues, False otherwise
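One way to picture this check without RDKit is to sum the bond orders around each atom and compare the total against a maximum-valence table. The sketch below is a simplification, assuming neutral atoms, integer bond orders (no aromatic bonds) and a small hypothetical `MAX_VALENCE` table; RDKit's own valence check also accounts for formal charges, implicit hydrogens and aromaticity. `valency_ok` is a hypothetical helper, not part of dig.ggraph.utils.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical maximum valences for a few neutral elements (illustration only).
MAX_VALENCE: Dict[str, int] = {"H": 1, "C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1}


def valency_ok(atoms: List[str], bonds: List[Tuple[int, int, int]]) -> bool:
    """Return True if no atom's total bond order exceeds its maximum valence.

    atoms: element symbol per atom index.
    bonds: (i, j, order) triples with integer bond orders 1, 2 or 3.
    """
    used = defaultdict(int)  # atom index -> sum of bond orders
    for i, j, order in bonds:
        used[i] += order
        used[j] += order
    return all(used[idx] <= MAX_VALENCE[sym] for idx, sym in enumerate(atoms))
```

For example, the heavy-atom graph of O=C=O passes (carbon uses 4, each oxygen uses 2), while a carbon with five single bonds fails.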

convert_radical_electrons_to_hydrogens(mol)[source]

Convert radical electrons in a molecule into bonds to hydrogens. Only use this if the molecule is valid. Returns a new mol object.

Parameters

mol – Rdkit mol object

Return type

Rdkit mol object
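The conversion can be pictured as capping every unpaired electron with one hydrogen while leaving the input untouched. The sketch below uses a hypothetical `ToyAtom` record in place of an RDKit atom and assumes each radical electron becomes one additional explicit hydrogen; `radicals_to_hydrogens` is likewise a hypothetical helper.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class ToyAtom:
    """Hypothetical minimal atom record, standing in for an RDKit atom."""
    symbol: str
    num_explicit_hs: int = 0
    num_radical_electrons: int = 0


def radicals_to_hydrogens(atoms: List[ToyAtom]) -> List[ToyAtom]:
    """Return a new atom list where each radical electron is capped by one H.

    The input list and its atoms are left unchanged: frozen dataclasses are
    copied with dataclasses.replace, mirroring the "return a new mol" contract.
    """
    return [
        replace(
            atom,
            num_explicit_hs=atom.num_explicit_hs + atom.num_radical_electrons,
            num_radical_electrons=0,
        )
        for atom in atoms
    ]
```

A methyl radical (carbon with three hydrogens and one unpaired electron) becomes a carbon with four explicit hydrogens and no radicals.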

gen_mol_from_one_shot_tensor(adj, x, atomic_num_list, correct_validity=True, largest_connected_comp=True)[source]

Construct molecules from the node tensors and adjacency tensors generated by one-shot molecular graph generation methods.

Parameters
  • adj (Tensor) – The adjacency tensor with shape [number of samples, number of possible bond types, maximum number of atoms, maximum number of atoms].

  • x (Tensor) – The node tensor with shape [number of samples, number of possible atom types, maximum number of atoms].

  • atomic_num_list (list) – A list specifying which atom each channel of the 2nd dimension of x corresponds to.

  • correct_validity (bool, optional) – Whether to use the validity correction introduced by the paper MoFlow: an invertible flow model for generating molecular graphs. (default: True)

  • largest_connected_comp (bool, optional) – Whether to use the largest connected component as the final molecule in the validity correction. (default: True)

Return type

A list of Rdkit mol objects, one per sample (the length of the list equals the number of samples).
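One way to picture this construction is an argmax decoding step: each node takes the atom type with the highest score along the channel dimension of x, and each atom pair takes the bond type with the highest score along the channel dimension of adj. The sketch below illustrates that step for a single sample in pure Python, assuming (as the shapes and atomic_num_list in the example suggest) that the last bond channel means "no bond", that the other three are single, double and triple bonds, and that atomic number 0 marks a padding atom. `decode_one_sample` is a hypothetical helper; it performs no MoFlow validity correction and does not build RDKit molecules.

```python
from typing import List, Sequence, Tuple

# Assumed meaning of the bond channels; any channel past these means "no bond".
BOND_ORDERS = (1, 2, 3)


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda k: values[k])


def decode_one_sample(
    adj: Sequence[Sequence[Sequence[float]]],  # [bond types][max atoms][max atoms]
    x: Sequence[Sequence[float]],              # [atom types][max atoms]
    atomic_num_list: Sequence[int],
) -> Tuple[List[int], List[Tuple[int, int, int]]]:
    """Hypothetical sketch: turn one sample's scores into atoms and bonds.

    Returns (atomic numbers of kept atoms, (i, j, bond order) triples), with
    indices renumbered over the kept atoms. No validity correction is applied.
    """
    num_atom_types, max_atoms = len(x), len(x[0])
    kept: List[int] = []  # original indices of real (non-padding) atoms
    atomic_nums: List[int] = []
    for v in range(max_atoms):
        channel = argmax([x[t][v] for t in range(num_atom_types)])
        if atomic_num_list[channel] != 0:  # 0 marks a padding atom
            kept.append(v)
            atomic_nums.append(atomic_num_list[channel])

    new_index = {old: new for new, old in enumerate(kept)}
    bonds: List[Tuple[int, int, int]] = []
    for pos, i in enumerate(kept):
        for j in kept[pos + 1:]:  # upper triangle only, so each pair once
            channel = argmax([adj[b][i][j] for b in range(len(adj))])
            if channel < len(BOND_ORDERS):  # otherwise the "no bond" channel won
                bonds.append((new_index[i], new_index[j], BOND_ORDERS[channel]))
    return atomic_nums, bonds
```

A real implementation would then add these atoms and bonds to an editable RDKit molecule; the sketch stops at the decoded graph to keep the argmax logic visible.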

Examples

>>> import torch
>>> from dig.ggraph.utils import gen_mol_from_one_shot_tensor
>>> adj = torch.rand(2, 4, 38, 38)
>>> x = torch.rand(2, 10, 38)
>>> atomic_num_list = [6, 7, 8, 9, 15, 16, 17, 35, 53, 0]
>>> gen_mols = gen_mol_from_one_shot_tensor(adj, x, atomic_num_list)
>>> gen_mols
[<rdkit.Chem.rdchem.Mol>, <rdkit.Chem.rdchem.Mol>]

reward_target_molecule_similarity(mol, target, radius=2, nBits=2048, useChirality=True)[source]

Calculate the Tanimoto similarity between the ECFP fingerprints of the molecule and the target molecule.

Parameters
  • mol – Rdkit mol object

  • target – Rdkit mol object

Return type

float in [0.0, 1.0]
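Tanimoto similarity of two binary fingerprints is the number of bits set in both divided by the number of bits set in either, which is why the result lies in [0.0, 1.0]. A minimal pure-Python sketch, assuming the ECFP (Morgan) fingerprints have already been reduced to sets of on-bit indices; `tanimoto` is a hypothetical helper, not part of dig.ggraph.utils, and returning 0.0 for two empty fingerprints is a convention chosen for this sketch.

```python
from typing import AbstractSet


def tanimoto(bits_a: AbstractSet[int], bits_b: AbstractSet[int]) -> float:
    """Tanimoto similarity |A & B| / |A | B| of two sets of on-bit indices."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0  # sketch convention for two empty fingerprints
    return len(bits_a & bits_b) / union
```

For instance, fingerprints with on-bits {1, 2, 3} and {2, 3, 4} share 2 of 4 distinct bits, giving a similarity of 0.5.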

steric_strain_filter(mol, cutoff=0.82, max_attempts_embed=20, max_num_iters=200)[source]

Flag molecules based on a steric energy cutoff after max_num_iters iterations of MMFF94 forcefield minimization. The cutoff is based on the average angle-bend strain energy of the molecule.

Parameters
  • mol – Rdkit mol object

  • cutoff (float, optional) – Energy threshold in kcal/mol per angle. If the minimized energy is above this threshold, the molecule fails the steric strain filter. (default: 0.82)

  • max_attempts_embed (int, optional) – Number of attempts to generate initial 3D coordinates. (default: 20)

  • max_num_iters (int, optional) – Number of iterations of forcefield minimization. (default: 200)

Return type

bool, True if the molecule could be successfully minimized and the resulting energy is below the cutoff, otherwise False.
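Because the cutoff is expressed per angle, the angle-bend energy has to be divided by the number of bond angles in the molecule before the comparison. A minimal sketch of that normalization, assuming the minimized angle-bend energy (kcal/mol) is already known and counting one angle for every pair of neighbours around a central atom; `count_angles` and `passes_steric_strain` are hypothetical helpers, the embedding and MMFF94 minimization steps are not shown, and treating a molecule with no angles as passing is a convention of this sketch.

```python
from math import comb
from typing import Dict, List


def count_angles(neighbors: Dict[int, List[int]]) -> int:
    """Number of bond angles i-j-k: C(degree, 2) summed over central atoms j."""
    return sum(comb(len(nbrs), 2) for nbrs in neighbors.values())


def passes_steric_strain(
    angle_bend_energy: float,
    neighbors: Dict[int, List[int]],
    cutoff: float = 0.82,
) -> bool:
    """True if the average angle-bend energy per angle does not exceed cutoff."""
    n_angles = count_angles(neighbors)
    if n_angles == 0:
        return True  # no angles, hence no angle-bend strain (sketch convention)
    return angle_bend_energy / n_angles <= cutoff
```

Methane has six H-C-H angles, so a total angle-bend energy of 3.0 kcal/mol averages to 0.5 kcal/mol per angle and passes the default cutoff of 0.82.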

zinc_molecule_filter(mol)[source]

Flag molecules that contain problematic functional groups, as defined by the set of ZINC rules from http://blaster.docking.org/filtering/rules_default.txt.

Parameters

mol – Rdkit mol object

Return type

bool, True if the molecule is okay (i.e., it does not match any of the rules), False otherwise.