dig.ggraph.utils
Utilities under dig.ggraph.utils.
calculate_min_plogp | Calculate the reward that consists of log p penalized by SA and the number of long cycles, as described in (Kusner et al. 2017).
check_chemical_validity | Check the chemical validity of the mol object.
check_valency | Check that no atoms in the mol have exceeded their possible valency.
convert_radical_electrons_to_hydrogens | Convert radical electrons in a molecule into bonds to hydrogens.
gen_mol_from_one_shot_tensor | Construct molecules from the node tensors and adjacency tensors generated by one-shot molecular graph generation methods.
reward_target_molecule_similarity | Calculate the similarity between the input molecule and the target molecule, measured as the Tanimoto similarity between their ECFP fingerprints.
steric_strain_filter | Flag molecules based on a steric energy cutoff after max_num_iters iterations of MMFF94 forcefield minimization.
zinc_molecule_filter | Flag molecules containing problematic functional groups, using the set of ZINC rules provided at http://blaster.docking.org/filtering/rules_default.txt.
- calculate_min_plogp(mol)
Calculate the reward that consists of log p penalized by the synthetic accessibility (SA) score and the number of long cycles, as described in (Kusner et al. 2017). Scores are normalized based on the statistics of the 250k_rndm_zinc_drugs_clean.smi dataset.
- Parameters
mol – RDKit mol object
- Return type
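The way the three terms combine can be sketched in plain Python, assuming log p, the SA score and the ring sizes have already been computed elsewhere (for example with RDKit). The helper names, the six-atom threshold for a "long" cycle, and the `stats` dictionary of (mean, std) pairs are illustrative assumptions, not the library's actual code or its dataset statistics.

```python
from typing import Dict, Sequence, Tuple


def cycle_penalty(ring_sizes: Sequence[int], max_free_size: int = 6) -> int:
    """Return how far the largest ring exceeds max_free_size (0 if it does not)."""
    largest = max(ring_sizes, default=0)
    return max(largest - max_free_size, 0)


def penalized_logp(
    log_p: float,
    sa_score: float,
    ring_sizes: Sequence[int],
    stats: Dict[str, Tuple[float, float]],
) -> float:
    """Sum of standardized log p, negated SA score and negated cycle penalty.

    stats maps "logp", "sa" and "cycle" to hypothetical (mean, std) pairs
    that would normally come from a reference dataset.
    """

    def standardize(value: float, key: str) -> float:
        mean, std = stats[key]
        return (value - mean) / std

    return (
        standardize(log_p, "logp")
        + standardize(-sa_score, "sa")
        + standardize(-cycle_penalty(ring_sizes), "cycle")
    )
```

With identity statistics (mean 0, std 1) the score reduces to log p minus the SA score minus the cycle penalty, which makes the penalization easy to verify by hand.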
- check_chemical_validity(mol)
Check the chemical validity of the mol object. The existing mol object is not modified. Radicals pass this test.
- Parameters
mol – RDKit mol object
- Return type
bool, True if chemically valid, False otherwise
- check_valency(mol)
Check that no atoms in the mol have exceeded their possible valency.
- Parameters
mol – RDKit mol object
- Return type
bool, True if no valency issues, False otherwise
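The idea behind a valency check can be illustrated without RDKit. The sketch below assumes a molecule given as a list of element symbols plus (i, j, order) bond triples, and uses a small table of typical maximum valences for neutral atoms. Both the representation and the `MAX_VALENCE` table are assumptions made for illustration; the real function operates on an RDKit mol object, and RDKit's own rules also account for formal charges and other cases that this sketch ignores.

```python
from typing import List, Sequence, Tuple

# Typical maximum valences for a few neutral atoms (illustrative subset only).
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "F": 1}


def has_valid_valency(
    atoms: Sequence[str], bonds: Sequence[Tuple[int, int, int]]
) -> bool:
    """Return True if no atom's summed bond order exceeds its maximum valence."""
    used: List[int] = [0] * len(atoms)
    for i, j, order in bonds:
        # Each bond consumes `order` valence units on both of its atoms.
        used[i] += order
        used[j] += order
    return all(u <= MAX_VALENCE[symbol] for u, symbol in zip(used, atoms))
```

For instance, a carbonyl fragment (`["C", "O"]` with one double bond) passes, while an oxygen bonded to three carbons does not.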
- convert_radical_electrons_to_hydrogens(mol)
Convert radical electrons in a molecule into bonds to hydrogens. Use this only if the molecule is valid. Returns a new mol object.
- Parameters
mol – RDKit mol object
- Return type
RDKit mol object
- gen_mol_from_one_shot_tensor(adj, x, atomic_num_list, correct_validity=True, largest_connected_comp=True)
Construct molecules from the node tensors and adjacency tensors generated by one-shot molecular graph generation methods.
- Parameters
adj (Tensor) – The adjacency tensor with shape [number of samples, number of possible bond types, maximum number of atoms, maximum number of atoms].
x (Tensor) – The node tensor with shape [number of samples, number of possible atom types, maximum number of atoms].
atomic_num_list (list) – A list specifying which atom each channel of the 2nd dimension of x corresponds to.
correct_validity (bool, optional) – Whether to use the validity correction introduced by the paper MoFlow: an invertible flow model for generating molecular graphs. (default: True)
largest_connected_comp (bool, optional) – Whether to use the largest connected component as the final molecule in the validity correction. (default: True)
- Return type
A list of RDKit mol objects. The length of the list is the number of samples.
Examples
>>> import torch
>>> from dig.ggraph.utils import gen_mol_from_one_shot_tensor
>>> adj = torch.rand(2, 4, 38, 38)
>>> x = torch.rand(2, 10, 38)
>>> atomic_num_list = [6, 7, 8, 9, 15, 16, 17, 35, 53, 0]
>>> gen_mols = gen_mol_from_one_shot_tensor(adj, x, atomic_num_list)
>>> gen_mols
[<rdkit.Chem.rdchem.Mol>, <rdkit.Chem.rdchem.Mol>]
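To make the tensor layout concrete, here is a minimal plain-Python sketch of how one sample could be decoded by taking an argmax over the channel dimension. It uses nested lists instead of tensors and returns atoms and bonds rather than an RDKit molecule. The function names are hypothetical, and two conventions are assumed rather than taken from the library: atomic number 0 marks a padding position, and the last bond channel means "no bond" while earlier channels are single, double and triple bonds. Validity correction is not shown.

```python
from typing import List, Sequence, Tuple


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda k: values[k])


def decode_one_shot(
    adj: Sequence[Sequence[Sequence[float]]],
    x: Sequence[Sequence[float]],
    atomic_num_list: Sequence[int],
) -> Tuple[List[int], List[Tuple[int, int, int]]]:
    """Decode one sample.

    adj has shape [bond types][N][N]; x has shape [atom types][N].
    Returns the atomic number per position and a list of (i, j, order) bonds.
    """
    num_positions = len(x[0])
    no_bond_channel = len(adj) - 1  # assumption: last channel encodes "no bond"

    # Pick the most likely atom type for every position.
    atom_nums = [
        atomic_num_list[argmax([channel[i] for channel in x])]
        for i in range(num_positions)
    ]

    bonds = []
    for i in range(num_positions):
        for j in range(i + 1, num_positions):
            if atom_nums[i] == 0 or atom_nums[j] == 0:
                continue  # assumption: atomic number 0 is padding, not an atom
            bond_channel = argmax([channel[i][j] for channel in adj])
            if bond_channel != no_bond_channel:
                bonds.append((i, j, bond_channel + 1))  # 1=single, 2=double, 3=triple
    return atom_nums, bonds
```

Only the upper triangle of the adjacency scores is read, so each bond is reported once.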
- reward_target_molecule_similarity(mol, target, radius=2, nBits=2048, useChirality=True)
Calculate the similarity between the input molecule and the target molecule, measured as the Tanimoto similarity between their ECFP fingerprints.
- Parameters
mol – RDKit mol object
target – RDKit mol object
- Return type
float, in the range [0.0, 1.0]
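Tanimoto similarity itself is simple to state: the number of bits set in both fingerprints divided by the number of bits set in either. The sketch below assumes each fingerprint is represented as a collection of its "on" bit indices; the function name and this representation are illustrative, and fingerprint generation is left out. Returning 0.0 for two empty fingerprints is a convention chosen for this sketch only.

```python
from typing import Iterable


def tanimoto(bits_a: Iterable[int], bits_b: Iterable[int]) -> float:
    """Tanimoto similarity between two fingerprints given as on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention for this sketch: two empty fingerprints score 0
    return len(a & b) / union
```

Identical fingerprints score 1.0 and fingerprints with no shared bits score 0.0, which matches the documented [0.0, 1.0] range.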
- steric_strain_filter(mol, cutoff=0.82, max_attempts_embed=20, max_num_iters=200)
Flag molecules based on a steric energy cutoff after max_num_iters iterations of MMFF94 forcefield minimization. The cutoff is based on the average angle bend strain energy of the molecule.
- Parameters
mol – RDKit mol object
cutoff (float, optional) – Cutoff in kcal/mol per angle. If the minimized energy is above this threshold, the molecule fails the steric strain filter. (default: 0.82)
max_attempts_embed (int, optional) – Number of attempts to generate initial 3D coordinates. (default: 20)
max_num_iters (int, optional) – Number of iterations of forcefield minimization. (default: 200)
- Return type
bool, True if the molecule could be successfully minimized and the resulting energy is below the cutoff, otherwise False.
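The final decision step can be sketched as a comparison of the mean angle bend energy against the cutoff. This sketch assumes the per-angle bend energies (in kcal/mol) have already been obtained from a minimized structure; the function name is hypothetical, and embedding and MMFF94 minimization are not shown. Treating a molecule with no angle terms as passing is an assumption made here, on the reasoning that there is no angle strain to penalize.

```python
from typing import Sequence


def passes_steric_strain(
    angle_bend_energies: Sequence[float], cutoff: float = 0.82
) -> bool:
    """Return True if the mean angle bend energy per angle is below cutoff."""
    if not angle_bend_energies:
        return True  # assumption: no angles means no angle bend strain
    mean_energy = sum(angle_bend_energies) / len(angle_bend_energies)
    return mean_energy < cutoff
```

Averaging per angle keeps the threshold comparable across molecules of different sizes, which is why the cutoff is expressed in kcal/mol per angle.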
- zinc_molecule_filter(mol)
Flag molecules containing problematic functional groups, using the set of ZINC rules provided at http://blaster.docking.org/filtering/rules_default.txt.
- Parameters
mol – RDKit mol object
- Return type
bool, True if the molecule is okay (i.e. does not match any of the rules), False otherwise.