dig.ggraph.method
Method classes under dig.ggraph.method.

Generator – The method base class for graph generation.

GraphAF – The method class for the GraphAF algorithm proposed in the paper GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation.

GraphDF – The method class for the GraphDF algorithm proposed in the paper GraphDF: A Discrete Flow Model for Molecular Graph Generation.

GraphEBM – The method class for the GraphEBM algorithm proposed in the paper GraphEBM: Molecular Graph Generation with Energy-Based Models.

JTVAE – The method class for the JTVAE algorithm proposed in the paper Junction Tree Variational Autoencoder for Molecular Graph Generation.
 class Generator
The method base class for graph generation. To write a new graph generation method, create a new class that inherits from this class and implement its methods.
 run_const_prop_opt(*args, **kwargs)
Running molecule optimization for constrained optimization task.
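The subclassing pattern described for the base class can be sketched as follows. The Generator defined here is a local stand-in so that the snippet runs without DIG installed; ToyGenerator and its behavior are hypothetical and only illustrate the shape of the pattern. With DIG installed, the real base class would be imported from dig.ggraph.method instead.

```python
class Generator:
    """Local stand-in for the base class (assumption: not DIG's actual code)."""

    def run_const_prop_opt(self, *args, **kwargs):
        # The base class only declares the interface; subclasses supply the logic.
        raise NotImplementedError("Subclasses must implement run_const_prop_opt.")


class ToyGenerator(Generator):
    """Hypothetical method class that overrides the interface."""

    def run_const_prop_opt(self, molecules, **kwargs):
        # Placeholder "optimization": return the inputs unchanged.
        return list(molecules)


gen = ToyGenerator()
result = gen.run_const_prop_opt(["CCO", "c1ccccc1"])
print(result)
```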
 class GraphAF
The method class for the GraphAF algorithm proposed in the paper GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation. This class provides interfaces for running random generation, property optimization, and constrained optimization with GraphAF. Please refer to the example code for usage examples.
 run_cons_optim(dataset, model_conf_dict, checkpoint_path, repeat_time=200, min_optim_time=50, num_max_node=25, temperature=0.75, atomic_num_list=[6, 7, 8, 9])
Running molecule optimization for constrained optimization task.
 Parameters
dataset – The dataset class for loading molecules to be optimized. It is expected to be dig.ggraph.dataset.ZINC800.
model_conf_dict (dict) – The python dict for configuring the model hyperparameters.
checkpoint_path (str) – The path to the saved model checkpoint file.
repeat_time (int, optional) – The maximum number of optimization attempts for each molecule before it is successfully optimized under the threshold 0.6. (default: 200)
min_optim_time (int, optional) – The minimum number of optimization attempts for each molecule. (default: 50)
num_max_node (int, optional) – The maximum number of nodes in the optimized molecular graphs. (default: 25)
temperature (float, optional) – A float number, the temperature parameter of the prior distribution. (default: 0.75)
atomic_num_list (list, optional) – A list of integers, the atomic numbers indicating the node types in the optimized molecular graphs. (default: [6, 7, 8, 9])
 Return type
(mols_0, mols_2, mols_4, mols_6), lists of optimized molecules (represented by rdkit Chem.Mol objects) under the thresholds 0.0, 0.2, 0.4, and 0.6, respectively.
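One way to picture the four returned lists is the bookkeeping sketched below. It assumes that each threshold is a lower bound on the similarity between the original and the optimized molecule, and that the best improvement is kept per threshold; this page does not state how the thresholds are applied, so best_per_threshold and the candidate tuples are hypothetical.

```python
THRESHOLDS = (0.0, 0.2, 0.4, 0.6)


def best_per_threshold(candidates, thresholds=THRESHOLDS):
    """Keep the candidate with the largest improvement for each threshold.

    candidates: list of (molecule, similarity, improvement) tuples.
    Returns {threshold: (molecule, improvement)} or None where nothing qualifies.
    """
    best = {t: None for t in thresholds}
    for mol, sim, imp in candidates:
        if imp <= 0:
            continue  # not an improvement, so never recorded
        for t in thresholds:
            if sim >= t and (best[t] is None or imp > best[t][1]):
                best[t] = (mol, imp)
    return best


# Hypothetical candidates produced for a single input molecule.
candidates = [("A", 0.65, 1.0), ("B", 0.25, 3.0), ("C", 0.10, 5.0), ("D", 0.90, -1.0)]
best = best_per_threshold(candidates)
print(best)
```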
 run_prop_optim(model_conf_dict, checkpoint_path, n_mols=100, num_min_node=7, num_max_node=25, temperature=0.75, atomic_num_list=[6, 7, 8, 9])
Running graph generation for property optimization task.
 Parameters
model_conf_dict (dict) – The python dict for configuring the model hyperparameters.
checkpoint_path (str) – The path to the saved model checkpoint file.
n_mols (int, optional) – The number of molecules to generate. (default: 100)
num_min_node (int, optional) – The minimum number of nodes in the generated molecular graphs. (default: 7)
num_max_node (int, optional) – The maximum number of nodes in the generated molecular graphs. (default: 25)
temperature (float, optional) – A float number, the temperature parameter of the prior distribution. (default: 0.75)
atomic_num_list (list, optional) – A list of integers, the atomic numbers indicating the node types in the generated molecular graphs. (default: [6, 7, 8, 9])
 Return type
all_mols, a list of generated molecules represented by rdkit Chem.Mol objects.
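The effect of the temperature parameter can be illustrated with a small sketch. It assumes that the prior is a zero-mean Gaussian whose standard deviation is set by the temperature, which this page does not state; sample_prior is a hypothetical helper, not part of DIG.

```python
import random
import statistics


def sample_prior(dim, temperature, rng):
    # Assumption: temperature acts as the standard deviation of a zero-mean Gaussian.
    return [rng.gauss(0.0, temperature) for _ in range(dim)]


rng = random.Random(0)
cold = sample_prior(10_000, 0.3, rng)
warm = sample_prior(10_000, 0.75, rng)

# A lower temperature concentrates the latent samples around zero.
cold_std = statistics.pstdev(cold)
warm_std = statistics.pstdev(warm)
print(round(cold_std, 2), round(warm_std, 2))
```

Under this assumption, lower temperatures give less varied samples, while higher temperatures give more diverse ones.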
 run_rand_gen(model_conf_dict, checkpoint_path, n_mols=100, num_min_node=7, num_max_node=25, temperature=0.75, atomic_num_list=[6, 7, 8, 9])
Running graph generation for random generation task.
 Parameters
model_conf_dict (dict) – The python dict for configuring the model hyperparameters.
checkpoint_path (str) – The path to the saved model checkpoint file.
n_mols (int, optional) – The number of molecules to generate. (default: 100)
num_min_node (int, optional) – The minimum number of nodes in the generated molecular graphs. (default: 7)
num_max_node (int, optional) – The maximum number of nodes in the generated molecular graphs. (default: 25)
temperature (float, optional) – A float number, the temperature parameter of the prior distribution. (default: 0.75)
atomic_num_list (list, optional) – A list of integers, the atomic numbers indicating the node types in the generated molecular graphs. (default: [6, 7, 8, 9])
 Return type
(all_mols, pure_valids): all_mols is a list of generated molecules represented by rdkit Chem.Mol objects; pure_valids is a list of integers, each 0 or 1, indicating whether bond resampling happened.
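A short sketch of how the pure_valids flags might be summarized. It assumes that 1 marks a molecule generated without any bond resampling; the page does not say which value means which, so check the convention before relying on it. pure_valid_rate is a hypothetical helper.

```python
def pure_valid_rate(pure_valids):
    """Fraction of flags equal to 1 (assumed: generated without bond resampling)."""
    if not pure_valids:
        return 0.0
    return sum(pure_valids) / len(pure_valids)


# Hypothetical flags for four generated molecules.
pure_valids = [1, 1, 0, 1]
rate = pure_valid_rate(pure_valids)
print(rate)
```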
 train_cons_optim(loader, lr, wd, max_iters, warm_up, model_conf_dict, pretrain_path, save_interval, save_dir)
Running finetuning for constrained optimization task.
 Parameters
loader – The data loader for loading training samples. It is expected to use dig.ggraph.dataset.ZINC800 as the dataset class, with torch_geometric.data.DenseDataLoader applied to it to form the data loader.
lr (float) – The learning rate for training.
wd (float) – The weight decay factor for training.
max_iters (int) – The maximum number of training iterations.
warm_up (int) – The number of linear warm-up iterations.
model_conf_dict (dict) – The python dict for configuring the model hyperparameters.
pretrain_path (str) – The path to the saved pretrained model parameters file.
save_interval (int) – The frequency at which the model parameters are saved to .pth files, e.g., if save_interval=20, the model parameters are saved every 20 training iterations.
save_dir (str) – The directory to save the model parameters.
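The warm_up parameter suggests a linear learning-rate warm-up. A minimal sketch is given below, assuming that the rate rises linearly to lr over the first warm_up iterations and then stays constant; the exact schedule used by DIG is not described on this page, and warmup_lr is a hypothetical helper.

```python
def warmup_lr(base_lr, iteration, warm_up):
    """Learning rate at a 0-based iteration under linear warm-up (assumed schedule)."""
    if warm_up <= 0:
        return base_lr
    scale = min(1.0, (iteration + 1) / warm_up)
    return base_lr * scale


# With warm_up=4, the rate ramps up over four iterations and then plateaus.
schedule = [warmup_lr(0.001, it, warm_up=4) for it in range(6)]
print(schedule)
```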
 train_prop_optim(lr, wd, max_iters, warm_up, model_conf_dict, pretrain_path, save_interval, save_dir)
Running finetuning for property optimization task.
 Parameters
lr (float) – The learning rate for finetuning.
wd (float) – The weight decay factor for training.
max_iters (int) – The maximum number of training iterations.
warm_up (int) – The number of linear warm-up iterations.
model_conf_dict (dict) – The python dict for configuring the model hyperparameters.
pretrain_path (str) – The path to the saved pretrained model file.
save_interval (int) – The frequency at which the model parameters are saved to .pth files, e.g., if save_interval=20, the model parameters are saved every 20 training iterations.
save_dir (str) – The directory to save the model parameters.
 train_rand_gen(loader, lr, wd, max_epochs, model_conf_dict, save_interval, save_dir)
Running training for random generation task.
 Parameters
loader – The data loader for loading training samples. It is expected to use dig.ggraph.dataset.QM9/ZINC250k as the dataset class, with torch_geometric.data.DenseDataLoader applied to it to form the data loader.
lr (float) – The learning rate for training.
wd (float) – The weight decay factor for training.
max_epochs (int) – The maximum number of training epochs.
model_conf_dict (dict) – The python dict for configuring the model hyperparameters.
save_interval (int) – The frequency at which the model parameters are saved to .pth files, e.g., if save_interval=2, the model parameters are saved every 2 training epochs.
save_dir (str) – The directory to save the model parameters.
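The save_interval behavior can be sketched as a simple modulo check. The snippet assumes that epochs are counted from 1 and that a checkpoint is written whenever the epoch number is a multiple of save_interval; the counting convention and file naming in DIG are not specified here, and checkpoint_epochs is a hypothetical helper.

```python
def checkpoint_epochs(max_epochs, save_interval):
    """Epochs (1-based, assumed) at which model parameters would be saved."""
    return [e for e in range(1, max_epochs + 1) if e % save_interval == 0]


saved = checkpoint_epochs(max_epochs=7, save_interval=2)
print(saved)
```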
 class GraphDF
The method class for the GraphDF algorithm proposed in the paper GraphDF: A Discrete Flow Model for Molecular Graph Generation. This class provides interfaces for running random generation, property optimization, and constrained optimization with the GraphDF algorithm. Please refer to the example code for usage examples.
 run_const_prop_opt(dataset, model_conf_dict, checkpoint_path, repeat_time=200, min_optim_time=50, num_max_node=25, temperature=[0.3, 0.3], atomic_num_list=[6, 7, 8, 9])
Running molecule optimization for constrained optimization task.
 Parameters
dataset – The dataset class for loading molecules to be optimized. It is expected to be dig.ggraph.dataset.ZINC800.
model_conf_dict (dict) – The python dict for configuring the model hyperparameters.
checkpoint_path (str) – The path to the saved model checkpoint file.
repeat_time (int, optional) – The maximum number of optimization attempts for each molecule before it is successfully optimized under the threshold 0.6. (default: 200)
min_optim_time (int, optional) – The minimum number of optimization attempts for each molecule. (default: 50)
num_max_node (int, optional) – The maximum number of nodes in the optimized molecular graphs. (default: 25)
temperature (list, optional) – A list of two float numbers, the temperature parameters of the prior distribution. (default: [0.3, 0.3])
atomic_num_list (list, optional) – A list of integers, the atomic numbers indicating the node types in the optimized molecular graphs. (default: [6, 7, 8, 9])
 Return type
(mols_0, mols_2, mols_4, mols_6), lists of optimized molecules (represented by rdkit Chem.Mol objects) under the thresholds 0.0, 0.2, 0.4, and 0.6, respectively.
 run_prop_opt(model_conf_dict, checkpoint_path, n_mols=100, num_min_node=7, num_max_node=25, temperature=[0.3, 0.3], atomic_num_list=[6, 7, 8, 9])
Running graph generation for property optimization task.
 Parameters
model_conf_dict (dict) – The python dict for configuring the model hyperparameters.
checkpoint_path (str) – The path to the saved model checkpoint file.
n_mols (int, optional) – The number of molecules to generate. (default: 100)
num_min_node (int, optional) – The minimum number of nodes in the generated molecular graphs. (default: 7)
num_max_node (int, optional) – The maximum number of nodes in the generated molecular graphs. (default: 25)
temperature (list, optional) – A list of two float numbers, the temperature parameters of the prior distribution. (default: [0.3, 0.3])
atomic_num_list (list, optional) – A list of integers, the atomic numbers indicating the node types in the generated molecular graphs. (default: [6, 7, 8, 9])
 Return type
all_mols, a list of generated molecules represented by rdkit Chem.Mol objects.
 run_rand_gen(model_conf_dict, checkpoint_path, n_mols=100, num_min_node=7, num_max_node=25, temperature=[0.3, 0.3], atomic_num_list=[6, 7, 8, 9])
Running graph generation for random generation task.
 Parameters
model_conf_dict (dict) – The python dict for configuring the model hyperparameters.
checkpoint_path (str) – The path to the saved model checkpoint file.
n_mols (int, optional) – The number of molecules to generate. (default: 100)
num_min_node (int, optional) – The minimum number of nodes in the generated molecular graphs. (default: 7)
num_max_node (int, optional) – The maximum number of nodes in the generated molecular graphs. (default: 25)
temperature (list, optional) – A list of two float numbers, the temperature parameters of the prior distribution. (default: [0.3, 0.3])
atomic_num_list (list, optional) – A list of integers, the atomic numbers indicating the node types in the generated molecular graphs. (default: [6, 7, 8, 9])
 Return type
(all_mols, pure_valids): all_mols is a list of generated molecules represented by rdkit Chem.Mol objects; pure_valids is a list of integers, each 0 or 1, indicating whether bond resampling happened.
 train_const_prop_opt(loader, lr, wd, max_iters, warm_up, model_conf_dict, pretrain_path, save_interval, save_dir)
Running finetuning for constrained optimization task.
 Parameters
loader – The data loader for loading training samples. It is expected to use dig.ggraph.dataset.ZINC800 as the dataset class, with torch_geometric.data.DenseDataLoader applied to it to form the data loader.
lr (float) – The learning rate for training.
wd (float) – The weight decay factor for training.
max_iters (int) – The maximum number of training iterations.
warm_up (int) – The number of linear warm-up iterations.
model_conf_dict (dict) – The python dict for configuring the model hyperparameters.
pretrain_path (str) – The path to the saved pretrained model parameters file.
save_interval (int) – The frequency at which the model parameters are saved to .pth files, e.g., if save_interval=20, the model parameters are saved every 20 training iterations.
save_dir (str) – The directory to save the model parameters.
 train_prop_opt(lr, wd, max_iters, warm_up, model_conf_dict, pretrain_path, save_interval, save_dir)
Running finetuning for property optimization task.
 Parameters
lr (float) – The learning rate for finetuning.
wd (float) – The weight decay factor for training.
max_iters (int) – The maximum number of training iterations.
warm_up (int) – The number of linear warm-up iterations.
model_conf_dict (dict) – The python dict for configuring the model hyperparameters.
pretrain_path (str) – The path to the saved pretrained model file.
save_interval (int) – The frequency at which the model parameters are saved to .pth files, e.g., if save_interval=20, the model parameters are saved every 20 training iterations.
save_dir (str) – The directory to save the model parameters.
 train_rand_gen(loader, lr, wd, max_epochs, model_conf_dict, save_interval, save_dir)
Running training for random generation task.
 Parameters
loader – The data loader for loading training samples. It is expected to use dig.ggraph.dataset.QM9/ZINC250k/MOSES as the dataset class, with torch_geometric.data.DenseDataLoader applied to it to form the data loader.
lr (float) – The learning rate for training.
wd (float) – The weight decay factor for training.
max_epochs (int) – The maximum number of training epochs.
model_conf_dict (dict) – The python dict for configuring the model hyperparameters.
save_interval (int) – The frequency at which the model parameters are saved to .pth files, e.g., if save_interval=2, the model parameters are saved every 2 training epochs.
save_dir (str) – The directory to save the model parameters.
 class GraphEBM(n_atom, n_atom_type, n_edge_type, hidden, device=None)
The method class for the GraphEBM algorithm proposed in the paper GraphEBM: Molecular Graph Generation with Energy-Based Models. This class provides interfaces for running random generation, goal-directed generation (including property optimization and constrained optimization), and compositional generation with the GraphEBM algorithm. Please refer to the example code for usage examples.
 Parameters
n_atom (int) – Maximum number of atoms.
n_atom_type (int) – Number of possible atom types.
n_edge_type (int) – Number of possible bond types.
hidden (int) – Hidden dimensions.
device (torch.device, optional) – The device where the model is deployed.
 run_comp_gen(checkpoint_path_qed, checkpoint_path_plogp, n_samples, c, ld_step, ld_noise, ld_step_size, clamp, atomic_num_list)
Running graph generation for compositional generation task.
 Parameters
checkpoint_path_qed (str) – The path of the model trained on QED property, i.e., the .pt file.
checkpoint_path_plogp (str) – The path of the model trained on plogp property, i.e., the .pt file.
n_samples (int) – The number of molecules to generate.
c (float) – The scaling hyperparameter for dequantization.
ld_step (int) – The number of iteration steps of Langevin dynamics.
ld_noise (float) – The standard deviation of the added noise in Langevin dynamics.
ld_step_size (int) – The step size of Langevin dynamics.
clamp (bool) – Whether to use gradient clamp in Langevin dynamics.
atomic_num_list (list) – The list used to indicate atom types.
 Return type
gen_mols (list): A list of generated molecules represented by rdkit Chem.Mol objects.
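Compositional generation takes two checkpoints, one per property. One common way to compose energy-based models is to add their energies, so that low-energy samples satisfy both objectives. The sketch below illustrates that idea with two toy one-dimensional energies; the equal weighting and the functions energy_a, energy_b, and combined_energy are assumptions for illustration, not DIG's implementation.

```python
def energy_a(x):
    # Hypothetical stand-in for a model trained on one property (e.g. QED).
    return (x - 1.0) ** 2


def energy_b(x):
    # Hypothetical stand-in for a model trained on another property (e.g. plogp).
    return (x + 1.0) ** 2


def combined_energy(x, weight_a=0.5, weight_b=0.5):
    # Assumed composition rule: a weighted sum of the two energies.
    return weight_a * energy_a(x) + weight_b * energy_b(x)


# The combined minimum sits between the two individual minima at +1 and -1.
grid = [i / 10 for i in range(-20, 21)]
best_x = min(grid, key=combined_energy)
print(best_x, combined_energy(best_x))
```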
 run_const_prop_opt(checkpoint_path, initialization_loader, c, ld_step, ld_noise, ld_step_size, clamp, atomic_num_list, train_smiles)
Running graph generation for the goal-directed generation task: constrained property optimization.
 Parameters
checkpoint_path (str) – The path of the trained model, i.e., the .pt file.
initialization_loader – The data loader for loading samples to initialize the Langevin dynamics. It is expected to use dig.ggraph.dataset.QM9/ZINC250k as the dataset class, with torch_geometric.data.DenseDataLoader applied to it to form the data loader.
c (float) – The scaling hyperparameter for dequantization.
ld_step (int) – The number of iteration steps of Langevin dynamics.
ld_noise (float) – The standard deviation of the added noise in Langevin dynamics.
ld_step_size (int) – The step size of Langevin dynamics.
clamp (bool) – Whether to use gradient clamp in Langevin dynamics.
atomic_num_list (list) – The list used to indicate atom types.
train_smiles (list) – A list of SMILES strings corresponding to training samples.
 Return type
mols_0_list (list), mols_2_list (list), mols_4_list (list), mols_6_list (list), imp_0_list (list), imp_2_list (list), imp_4_list (list), imp_6_list (list): Lists of optimized molecules (represented by rdkit Chem.Mol objects) and the corresponding improvements under the thresholds 0.0, 0.2, 0.4, and 0.6, respectively.
 run_prop_opt(checkpoint_path, initialization_loader, c, ld_step, ld_noise, ld_step_size, clamp, atomic_num_list, train_smiles)
Running graph generation for the goal-directed generation task: property optimization.
 Parameters
checkpoint_path (str) – The path of the trained model, i.e., the .pt file.
initialization_loader – The data loader for loading samples to initialize the Langevin dynamics. It is expected to use dig.ggraph.dataset.QM9/ZINC250k as the dataset class, with torch_geometric.data.DenseDataLoader applied to it to form the data loader.
c (float) – The scaling hyperparameter for dequantization.
ld_step (int) – The number of iteration steps of Langevin dynamics.
ld_noise (float) – The standard deviation of the added noise in Langevin dynamics.
ld_step_size (int) – The step size of Langevin dynamics.
clamp (bool) – Whether to use gradient clamp in Langevin dynamics.
atomic_num_list (list) – The list used to indicate atom types.
train_smiles (list) – A list of SMILES strings corresponding to training samples.
 Return type
save_mols_list (list), prop_list (list): save_mols_list is a list of generated molecules with high QED scores represented by rdkit Chem.Mol objects; prop_list is a list of the corresponding QED scores.
 run_rand_gen(checkpoint_path, n_samples, c, ld_step, ld_noise, ld_step_size, clamp, atomic_num_list)
Running graph generation for random generation task.
 Parameters
checkpoint_path (str) – The path of the trained model, i.e., the .pt file.
n_samples (int) – The number of molecules to generate.
c (float) – The scaling hyperparameter for dequantization.
ld_step (int) – The number of iteration steps of Langevin dynamics.
ld_noise (float) – The standard deviation of the added noise in Langevin dynamics.
ld_step_size (int) – The step size of Langevin dynamics.
clamp (bool) – Whether to use gradient clamp in Langevin dynamics.
atomic_num_list (list) – The list used to indicate atom types.
 Return type
gen_mols (list): A list of generated molecules represented by rdkit Chem.Mol objects.
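The roles of ld_step, ld_noise, ld_step_size, and clamp can be illustrated with a toy Langevin dynamics loop on a one-dimensional quadratic energy. This is a sketch under assumptions: the update order (add noise, then take a gradient step), the clamp_value bound, and the function langevin_sample are hypothetical and are not taken from DIG.

```python
import random


def langevin_sample(grad_energy, x0, ld_step, ld_noise, ld_step_size, clamp, rng,
                    clamp_value=0.01):
    """Run ld_step noisy gradient-descent steps on an energy function."""
    x = x0
    for _ in range(ld_step):
        x = x + rng.gauss(0.0, ld_noise)      # inject Gaussian noise
        g = grad_energy(x)                    # gradient of the energy at x
        if clamp:
            # Bound the gradient magnitude (clamp_value is a hypothetical choice).
            g = max(-clamp_value, min(clamp_value, g))
        x = x - ld_step_size * g              # move toward lower energy
    return x


def grad_quadratic(x):
    # Gradient of the toy energy E(x) = 0.5 * x**2.
    return x


rng = random.Random(0)
x_final = langevin_sample(grad_quadratic, x0=5.0, ld_step=200, ld_noise=0.005,
                          ld_step_size=0.1, clamp=False, rng=rng)
x_clamped = langevin_sample(grad_quadratic, x0=5.0, ld_step=10, ld_noise=0.0,
                            ld_step_size=0.1, clamp=True, rng=rng)
print(x_final, x_clamped)
```

Without clamping, the sample settles near the energy minimum at zero; with clamping, each step moves by at most ld_step_size * clamp_value, so progress is slower but bounded.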
 train_goal_directed(loader, lr, wd, max_epochs, c, ld_step, ld_noise, ld_step_size, clamp, alpha, save_interval, save_dir)
Running training for the goal-directed generation task.
 Parameters
loader – The data loader for loading training samples. It is expected to use dig.ggraph.dataset.QM9/ZINC250k as the dataset class, with torch_geometric.data.DenseDataLoader applied to it to form the data loader.
lr (float) – The learning rate for training.
wd (float) – The weight decay factor for training.
max_epochs (int) – The maximum number of training epochs.
c (float) – The scaling hyperparameter for dequantization.
ld_step (int) – The number of iteration steps of Langevin dynamics.
ld_noise (float) – The standard deviation of the added noise in Langevin dynamics.
ld_step_size (int) – The step size of Langevin dynamics.
clamp (bool) – Whether to use gradient clamp in Langevin dynamics.
alpha (float) – The weight coefficient for loss function.
save_interval (int) – The frequency at which the model parameters are saved to .pt files, e.g., if save_interval=2, the model parameters are saved every 2 training epochs.
save_dir (str) – The directory to save the model parameters.
 train_rand_gen(loader, lr, wd, max_epochs, c, ld_step, ld_noise, ld_step_size, clamp, alpha, save_interval, save_dir)
Running training for random generation task.
 Parameters
loader – The data loader for loading training samples. It is expected to use dig.ggraph.dataset.QM9/ZINC250k as the dataset class, with torch_geometric.data.DenseDataLoader applied to it to form the data loader.
lr (float) – The learning rate for training.
wd (float) – The weight decay factor for training.
max_epochs (int) – The maximum number of training epochs.
c (float) – The scaling hyperparameter for dequantization.
ld_step (int) – The number of iteration steps of Langevin dynamics.
ld_noise (float) – The standard deviation of the added noise in Langevin dynamics.
ld_step_size (int) – The step size of Langevin dynamics.
clamp (bool) – Whether to use gradient clamp in Langevin dynamics.
alpha (float) – The weight coefficient for loss function.
save_interval (int) – The frequency at which the model parameters are saved to .pt files, e.g., if save_interval=2, the model parameters are saved every 2 training epochs.
save_dir (str) – The directory to save the model parameters.
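The parameter c is described as the scaling hyperparameter for dequantization. A minimal sketch of what such a step could look like is given below, assuming that uniform noise in [0, 1) scaled by c is added to each discrete entry; the exact formulation used by DIG is not given on this page, and dequantize is a hypothetical helper.

```python
import random


def dequantize(rows, c, rng):
    """Add uniform noise scaled by c to every entry (assumed form: x + c * u)."""
    return [[value + c * rng.random() for value in row] for row in rows]


rng = random.Random(0)
# Hypothetical one-hot atom-type matrix for two atoms and three atom types.
atoms = [[1, 0, 0], [0, 1, 0]]
noisy = dequantize(atoms, 0.1, rng)
print(noisy)
```

With a small c, each entry stays close to its discrete value while the data become continuous.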
 class JTVAE(list_smiles, build_vocab=True, device=None)
The method class for the JTVAE algorithm proposed in the paper Junction Tree Variational Autoencoder for Molecular Graph Generation. This class provides interfaces for running random generation with the JTVAE algorithm. Please refer to the example code for usage examples.
 Parameters
list_smiles (list) – A list of SMILES strings in the training data.
training (boolean) – Whether the model is being trained (as opposed to tested).
build_vocab (boolean) – Whether the vocabulary needs to be built (first time training with this dataset).
device (torch.device, optional) – The device where the model is deployed.
 train_cons_optim(loader, batch_size, num_epochs, hidden_size, latent_size, depth, beta, lr)
Train the Junction Tree Variational Autoencoder for the constrained optimization task.
 Parameters
loader (MolTreeFolder) – The MolTreeFolder loader.
batch_size (int) – The batch size.
num_epochs (int) – The number of epochs.
hidden_size (int) – The hidden size.
latent_size (int) – The latent size.
depth (int) – The depth of the network.
beta (float) – The KL regularization weight.
lr (float) – The learning rate for training.
 train_rand_gen(loader, load_epoch, lr, anneal_rate, clip_norm, num_epochs, beta, max_beta, step_beta, anneal_iter, kl_anneal_iter, print_iter, save_iter)
Train the Junction Tree Variational Autoencoder for the random generation task.
 Parameters
loader (MolTreeFolder) – The MolTreeFolder loader.
load_epoch (int) – The epoch to load from state dictionary.
lr (float) – The learning rate for training.
anneal_rate (float) – The learning rate annealing.
clip_norm (float) – Clips gradient norm of an iterable of parameters.
num_epochs (int) – The number of training epochs.
beta (float) – The KL regularization weight.
max_beta (float) – The maximum KL regularization weight.
step_beta (float) – The KL regularization weight step size.
anneal_iter (int) – How often to step in annealing the learning rate.
kl_anneal_iter (int) – How often to step in annealing the KL regularization weight.
print_iter (int) – How often to print the iteration statistics.
save_iter (int) – How often to save the iteration statistics.
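The annealing parameters listed above can be read as two independent schedules. The sketch below assumes that the learning rate is multiplied by anneal_rate every anneal_iter steps and that beta grows by step_beta every kl_anneal_iter steps until it reaches max_beta; these update rules are assumptions based on the parameter descriptions, and anneal_schedules is a hypothetical helper rather than DIG's training loop.

```python
def anneal_schedules(total_iters, lr, anneal_rate, anneal_iter,
                     beta, max_beta, step_beta, kl_anneal_iter):
    """Return (step, lr, beta) after each training step under the assumed rules."""
    history = []
    for step in range(1, total_iters + 1):
        if step % anneal_iter == 0:
            lr *= anneal_rate                        # decay the learning rate
        if step % kl_anneal_iter == 0:
            beta = min(max_beta, beta + step_beta)   # raise the KL weight, capped
        history.append((step, lr, beta))
    return history


history = anneal_schedules(total_iters=6, lr=1.0, anneal_rate=0.5, anneal_iter=3,
                           beta=0.0, max_beta=0.5, step_beta=0.25, kl_anneal_iter=2)
for entry in history:
    print(entry)
```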