dig.threedgraph.dataset
Dataset interfaces under dig.threedgraph.dataset.
- class MD17(root='dataset/', name='benzene_old', transform=None, pre_transform=None, pre_filter=None)
A PyTorch Geometric data interface for the MD17 dataset, introduced in the paper “Machine learning of accurate energy-conserving molecular force fields”. MD17 is a collection of eight molecular dynamics simulations of small organic molecules.
- Parameters
root (string) – The dataset folder will be located at root/name.
name (string) – The name of the dataset. Available dataset names are: aspirin, benzene_old, ethanol, malonaldehyde, naphthalene, salicylic, toluene, uracil. (default: benzene_old)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
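The three callables described above can be illustrated with a small sketch. It is only an illustration under stated assumptions: make_molecule, max_atoms_filter and center_positions are hypothetical names that do not come from the library, the molecule is made-up data, and a SimpleNamespace stands in for torch_geometric.data.Data so that the sketch runs without PyTorch installed.

```python
from types import SimpleNamespace


def make_molecule(z, pos, y):
    # Hypothetical stand-in for torch_geometric.data.Data (assumption:
    # only the attributes z, pos and y are needed for this illustration).
    return SimpleNamespace(z=z, pos=pos, y=y)


def max_atoms_filter(data, limit=12):
    """pre_filter-style callable: return True to keep the molecule."""
    return len(data.z) <= limit


def center_positions(data):
    """transform-style callable: shift positions so their mean is the origin."""
    n = len(data.pos)
    mean = [sum(p[k] for p in data.pos) / n for k in range(3)]
    data.pos = [[p[k] - mean[k] for k in range(3)] for p in data.pos]
    return data


# Made-up three-atom molecule; the coordinates are illustrative only.
molecule = make_molecule(
    z=[8, 1, 1],
    pos=[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    y=-1.0,
)

kept = max_atoms_filter(molecule)      # decides whether the object is included
centered = center_positions(molecule)  # returns the transformed object
```

With the real class, callables of this shape would be passed as the pre_filter, pre_transform or transform arguments. They would then receive Data objects holding tensors, so the list arithmetic above would need to be rewritten with tensor operations.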
>>> from dig.threedgraph.dataset import MD17
>>> from torch_geometric.loader import DataLoader  # torch_geometric.data.DataLoader in older PyG releases
>>> dataset = MD17(name='aspirin')
>>> split_idx = dataset.get_idx_split(len(dataset.data.y), train_size=1000, valid_size=10000, seed=42)
>>> train_dataset, valid_dataset, test_dataset = dataset[split_idx['train']], dataset[split_idx['valid']], dataset[split_idx['test']]
>>> train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
>>> data = next(iter(train_loader))
>>> data
Batch(batch=[672], force=[672, 3], pos=[672, 3], ptr=[33], y=[32], z=[672])
The attributes of the output data indicate:
z: The atom type.
pos: The 3D position of each atom.
y: The property (energy) of the graph (molecule).
force: The 3D force on each atom.
batch: The assignment vector, which maps each node to its respective graph identifier and can help reconstruct single graphs.
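How the batch vector helps reconstruct single graphs can be sketched in plain Python. This is a minimal illustration, not library code: split_by_graph and offsets_from_batch are hypothetical helpers, the mini-batch is made up, and plain lists are used in place of tensors. The second helper assumes that the ptr entry in the printed Batch stores cumulative node offsets, which is consistent with ptr=[33] for 32 graphs.

```python
from collections import defaultdict

# Made-up mini-batch of three molecules with 3, 2 and 4 atoms.
batch = [0, 0, 0, 1, 1, 2, 2, 2, 2]   # graph identifier of each node
z = [8, 1, 1, 1, 1, 7, 1, 1, 1]       # atom type of each node


def split_by_graph(values, batch):
    """Group per-node values by the graph each node belongs to."""
    groups = defaultdict(list)
    for value, graph_id in zip(values, batch):
        groups[graph_id].append(value)
    return [groups[g] for g in sorted(groups)]


def offsets_from_batch(batch):
    """Cumulative node counts: graph g owns nodes ptr[g]:ptr[g + 1]."""
    counts = [0] * (max(batch) + 1)
    for graph_id in batch:
        counts[graph_id] += 1
    ptr = [0]
    for count in counts:
        ptr.append(ptr[-1] + count)
    return ptr


molecules = split_by_graph(z, batch)
ptr = offsets_from_batch(batch)
```

The same grouping applies to any per-node attribute, such as pos or force, because all of them share the node ordering of batch.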
- property processed_file_names
The name of the files in the self.processed_dir folder that must be present in order to skip processing.
- property raw_file_names
The name of the files in the self.raw_dir folder that must be present in order to skip downloading.
- class QM93D(root='dataset/', transform=None, pre_transform=None, pre_filter=None)
A PyTorch Geometric data interface for the QM9 dataset, introduced in the paper “Quantum chemistry structures and properties of 134 kilo molecules”. It consists of about 130,000 equilibrium molecules with 12 regression targets: mu, alpha, homo, lumo, gap, r2, zpve, U0, U, H, G, Cv. Each molecule includes complete spatial information for the single low-energy conformation of its atoms.
Note
We used the processed data from DimeNet, which includes the spatial information and type of each atom. You can also use the QM9 dataset provided by PyTorch Geometric.
- Parameters
root (string) – The dataset folder will be located at root/qm9.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
>>> from dig.threedgraph.dataset import QM93D
>>> from torch_geometric.loader import DataLoader  # torch_geometric.data.DataLoader in older PyG releases
>>> dataset = QM93D()
>>> target = 'mu'
>>> dataset.data.y = dataset.data[target]
>>> split_idx = dataset.get_idx_split(len(dataset.data.y), train_size=110000, valid_size=10000, seed=42)
>>> train_dataset, valid_dataset, test_dataset = dataset[split_idx['train']], dataset[split_idx['valid']], dataset[split_idx['test']]
>>> train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
>>> data = next(iter(train_loader))
>>> data
Batch(Cv=[32], G=[32], H=[32], U=[32], U0=[32], alpha=[32], batch=[579], gap=[32], homo=[32], lumo=[32], mu=[32], pos=[579, 3], ptr=[33], r2=[32], y=[32], z=[579], zpve=[32])
The attributes of the output data indicate:
z: The atom type.
pos: The 3D position of each atom.
y: The target property of the graph (molecule).
batch: The assignment vector, which maps each node to its respective graph identifier and can help reconstruct single graphs.
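The train/valid/test split used in the example can be pictured with the sketch below. It assumes the split is a seeded shuffle of all indices followed by slicing, with every index not used for training or validation going to the test set. The library's actual get_idx_split may be implemented differently; idx_split_sketch is a hypothetical name, and the sizes are small made-up values rather than the real dataset size.

```python
import random


def idx_split_sketch(data_size, train_size, valid_size, seed):
    """Seeded shuffle, then slice into train, valid and test indices."""
    ids = list(range(data_size))
    random.Random(seed).shuffle(ids)  # same seed gives the same permutation
    return {
        'train': ids[:train_size],
        'valid': ids[train_size:train_size + valid_size],
        'test': ids[train_size + valid_size:],
    }


# Made-up sizes for illustration only.
split = idx_split_sketch(data_size=1000, train_size=800, valid_size=100, seed=42)
```

Under this assumption the three index sets are disjoint and cover the whole dataset, and fixing seed makes the split reproducible across runs.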
- property processed_file_names
The name of the files in the self.processed_dir folder that must be present in order to skip processing.
- property raw_file_names
The name of the files in the self.raw_dir folder that must be present in order to skip downloading.