dig.xgraph.dataset

Dataset interfaces under dig.xgraph.dataset.

BA_LRP

The synthetic graph classification dataset used in Higher-Order Explanations of Graph Neural Networks via Relevant Walks.

MoleculeDataset

The extension of MoleculeNet with MUTAG.

SentiGraphDataset

The SentiGraph datasets from Explainability in Graph Neural Networks: A Taxonomic Survey.

SynGraphDataset

The synthetic datasets used in Parameterized Explainer for Graph Neural Network.

class BA_LRP(root, num_per_class=10000, transform=None, pre_transform=None)

The synthetic graph classification dataset used in Higher-Order Explanations of Graph Neural Networks via Relevant Walks. The first class in BA_LRP consists of Barabási–Albert (BA) graphs, which grow by connecting each new node to an existing node \(\mathcal{V}\) of the current graph \(\mathcal{G}\) with probability

\[p(\mathcal{V}) = \frac{Degree(\mathcal{V})}{\sum_{\mathcal{V}' \in \mathcal{G}} Degree(\mathcal{V}')}\]

The second class in BA_LRP follows a slightly faster growth model, in which the nodes that a new node connects to are selected without replacement according to the inverse preferential attachment model:

\[p(\mathcal{V}) = \frac{Degree(\mathcal{V})^{-1}}{\sum_{\mathcal{V}' \in \mathcal{G}} Degree(\mathcal{V}')^{-1}}\]
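The two attachment rules above can be illustrated with a small pure-Python sketch. It is not DIG's generator: the helper names `pick_node` and `grow_graph`, the single starting edge, and the `edges_per_node` parameter (used here as a stand-in for the second class's slightly faster growth) are assumptions made for illustration.

```python
import random


def pick_node(degrees, inverse, rng):
    """Sample one existing node with p(V) proportional to Degree(V) or Degree(V)^-1."""
    weights = [1.0 / d if inverse else float(d) for d in degrees]
    return rng.choices(range(len(degrees)), weights=weights, k=1)[0]


def grow_graph(num_nodes, inverse=False, edges_per_node=1, seed=0):
    """Grow an undirected graph, starting from the single edge (0, 1).

    Hypothetical helper: each new node connects to `edges_per_node` distinct
    existing nodes, sampled without replacement by rejecting repeats.
    """
    rng = random.Random(seed)
    edges = [(0, 1)]
    degrees = [1, 1]  # degrees[v] = current degree of node v
    for new in range(2, num_nodes):
        k = min(edges_per_node, new)  # cannot pick more targets than existing nodes
        targets = set()
        while len(targets) < k:
            targets.add(pick_node(degrees, inverse, rng))
        degrees.append(0)  # register the new node only after sampling its targets
        for t in sorted(targets):
            edges.append((new, t))
            degrees[t] += 1
            degrees[new] += 1
    return edges


class1 = grow_graph(20)  # preferential attachment
class2 = grow_graph(20, inverse=True, edges_per_node=2)  # inverse preferential attachment
```

The rejection loop always terminates because every existing node has a nonzero degree (so a nonzero weight) and `k` never exceeds the number of existing nodes.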
Parameters
  • root (str) – Root data directory to save datasets

  • num_per_class (int) – The number of graphs to generate for each class. (default: 10000)

  • transform (Callable, None) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (Callable, None) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
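The practical difference between transform and pre_transform is when each hook runs. The toy class below is a hypothetical stand-in (not a torch_geometric dataset class) that mimics this timing with plain Python objects.

```python
class ToyDataset:
    """Hypothetical stand-in that mimics when the two hooks are called."""

    def __init__(self, raw_items, transform=None, pre_transform=None):
        self.transform = transform
        # pre_transform runs once per item, when the dataset is built
        # (in a real dataset, before the processed result is saved to disk)
        self._stored = [pre_transform(x) if pre_transform else x for x in raw_items]

    def __len__(self):
        return len(self._stored)

    def __getitem__(self, idx):
        item = self._stored[idx]
        # transform runs on every access and leaves the stored item unchanged
        return self.transform(item) if self.transform else item


calls = {"pre_transform": 0, "transform": 0}


def scale(x):
    calls["pre_transform"] += 1
    return x * 10


def shift(x):
    calls["transform"] += 1
    return x + 1


ds = ToyDataset([1, 2, 3], transform=shift, pre_transform=scale)
first = ds[0]
again = ds[0]
```

In this sketch, expensive deterministic preprocessing fits pre_transform, while random augmentation fits transform, because transform is re-applied on each access.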

Note

BA_LRP automatically generates the dataset if the dataset file does not exist in the root directory.

Example

>>> from dig.xgraph.dataset import BA_LRP
>>> from torch_geometric.loader import DataLoader
>>> dataset = BA_LRP(root='./datasets')
>>> loader = DataLoader(dataset, batch_size=32)
>>> data = next(iter(loader))
# Batch(batch=[640], edge_index=[2, 1344], x=[640, 1], y=[32, 1])

The attributes of data are:

  • batch: The assignment vector mapping each node to its graph index

  • x: The node features

  • edge_index: The edge indices in COO format, a matrix of shape [2, num_edges]

  • y: The graph label
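The batch vector is what lets a single Batch object hold many graphs. The sketch below, with hypothetical helper names and a toy three-graph batch, recovers per-graph node counts and node positions from it; on a real torch_geometric Batch, torch.bincount(data.batch) returns the same per-graph counts as a tensor.

```python
from collections import Counter


def nodes_per_graph(batch):
    """Count how many nodes belong to each graph index 0..max(batch)."""
    counts = Counter(batch)
    return [counts[g] for g in range(max(batch) + 1)]


def node_ids_of_graph(batch, graph_idx):
    """Return the positions (rows of x) of the nodes assigned to graph_idx."""
    return [i for i, g in enumerate(batch) if g == graph_idx]


# Toy batch: graph 0 has 3 nodes, graph 1 has 2, graph 2 has 1.
batch = [0, 0, 0, 1, 1, 2]
counts = nodes_per_graph(batch)
graph1_nodes = node_ids_of_graph(batch, 1)
```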

download()

Downloads the dataset to the self.raw_dir folder.

process()

Processes the dataset to the self.processed_dir folder.

property processed_file_names

The names of the files that must be present in the self.processed_dir folder in order to skip processing.

property raw_file_names

The names of the files that must be present in the self.raw_dir folder in order to skip downloading.
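Both properties drive a simple existence check: if every listed file is already present, the corresponding step is skipped. Below is a simplified sketch of that check in plain Python; the helper name files_exist and the file name data.pt are illustrative, and this is not torch_geometric's exact code.

```python
import os
import tempfile


def files_exist(folder, file_names):
    """Return True only if every listed file is present in folder."""
    if isinstance(file_names, str):  # a single name may be given as a string
        file_names = [file_names]
    return all(os.path.exists(os.path.join(folder, name)) for name in file_names)


with tempfile.TemporaryDirectory() as processed_dir:
    # Simulate a previous run that already wrote one processed file.
    open(os.path.join(processed_dir, "data.pt"), "wb").close()
    skip_processing = files_exist(processed_dir, ["data.pt"])
    skip_with_missing_file = files_exist(processed_dir, ["data.pt", "extra.pt"])
```

Under this logic, deleting the files in self.processed_dir is a straightforward way to force the dataset to be processed again.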

class MoleculeDataset(root, name, transform=None, pre_transform=None, pre_filter=None)

The extension of MoleculeNet with MUTAG.

MoleculeNet is the benchmark collection from the paper MoleculeNet: A Benchmark for Molecular Machine Learning, containing datasets from physical chemistry, biophysics, and physiology.

The MoleculeNet datasets come with the additional node and edge features introduced by the Open Graph Benchmark, while the node features of the MUTAG dataset are one-hot encodings of the atom types.
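One-hot atom-type features can be sketched as follows. The vocabulary is an assumption: we use the seven atom types commonly listed for MUTAG (C, N, O, F, I, Cl, Br) in that order, which may not match the index order in the actual dataset files; the helper name one_hot_atoms is hypothetical.

```python
# Assumed vocabulary: the seven atom types commonly listed for MUTAG.
# The index order used by the real dataset may differ.
ATOM_TYPES = ["C", "N", "O", "F", "I", "Cl", "Br"]


def one_hot_atoms(symbols, vocab=ATOM_TYPES):
    """Return one row per atom with a single 1.0 at that atom type's index."""
    index = {symbol: i for i, symbol in enumerate(vocab)}
    rows = []
    for symbol in symbols:
        row = [0.0] * len(vocab)
        row[index[symbol]] = 1.0  # raises KeyError for an unknown atom type
        rows.append(row)
    return rows


x = one_hot_atoms(["C", "C", "O", "N"])  # a hypothetical 4-atom fragment
```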

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset ("MUTAG", "ESOL", "FreeSolv", "Lipo", "PCBA", "MUV", "HIV", "BACE", "BBBP", "Tox21", "ToxCast", "SIDER", "ClinTox").

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
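A pre_filter is simply a predicate on a single graph. The sketch below uses a hypothetical Graph dataclass standing in for torch_geometric.data.Data and a hypothetical process_list helper; it assumes the common processing order in which graphs rejected by pre_filter are dropped first and pre_transform is then applied to the remaining graphs.

```python
from dataclasses import dataclass


@dataclass
class Graph:
    """Hypothetical stand-in for torch_geometric.data.Data."""
    num_nodes: int
    y: float


def process_list(data_list, pre_filter=None, pre_transform=None):
    """Drop graphs rejected by pre_filter, then apply pre_transform to the rest."""
    if pre_filter is not None:
        data_list = [d for d in data_list if pre_filter(d)]
    if pre_transform is not None:
        data_list = [pre_transform(d) for d in data_list]
    return data_list


graphs = [Graph(2, 0.0), Graph(5, 1.0), Graph(9, 0.0)]
kept = process_list(
    graphs,
    pre_filter=lambda g: g.num_nodes >= 3,              # keep graphs with 3+ nodes
    pre_transform=lambda g: Graph(g.num_nodes, 1.0 - g.y),  # e.g. flip a binary label
)
```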

download()

Downloads the dataset to the self.raw_dir folder.

process()

Processes the dataset to the self.processed_dir folder.

property processed_file_names

The names of the files that must be present in the self.processed_dir folder in order to skip processing.

property raw_file_names

The names of the files that must be present in the self.raw_dir folder in order to skip downloading.

class SentiGraphDataset(root, name, transform=None, pre_transform=<function undirected_graph>)

The SentiGraph datasets from Explainability in Graph Neural Networks: A Taxonomic Survey. These datasets convert text sentiment datasets into graph classification datasets: each word becomes a node whose features are extracted by a pretrained BERT model, and the dependency tree of the sentence provides the edges.
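The edge construction can be sketched from a dependency parse given as a list of head indices. Everything here is an assumption for illustration: the head-list format (-1 marks the root), the head-to-dependent edge direction, the helper name dependency_edges, and the example parse. The BERT node features are not shown.

```python
def dependency_edges(heads):
    """Convert dependency heads into a directed COO edge list [[src...], [dst...]].

    heads[i] is the index of token i's head, or -1 if token i is the root.
    Each edge points from the head to its dependent.
    """
    src, dst = [], []
    for dependent, head in enumerate(heads):
        if head >= 0:  # the root has no incoming edge
            src.append(head)
            dst.append(dependent)
    return [src, dst]


tokens = ["the", "movie", "was", "great"]
heads = [1, 3, 3, -1]  # hypothetical parse with "great" as the root
edge_index = dependency_edges(heads)
```

A dependency tree over n tokens yields n - 1 edges, so each sentence graph is connected and acyclic before any symmetrization.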

The dataset files are not downloaded automatically: download them (for example, Graph-SST2) to the self.raw_dir folder before running. All three datasets, Graph-SST2, Graph-SST5, and Graph-Twitter, can be downloaded from this link.

Parameters
  • root (str) – Root directory where the datasets are saved.

  • name (str) – The name of the dataset (Graph-SST2, Graph-SST5, or Graph-Twitter).

  • transform (Callable, None) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (Callable, None) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: undirected_graph)

Note

The default value of pre_transform is undirected_graph(), which converts the directed graphs in the original data into undirected graphs before they are saved to disk.
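Making a directed edge list undirected amounts to adding the reverse of every edge and removing duplicates. The sketch below does this on plain Python lists; the helper name to_undirected_lists is hypothetical, and we do not claim undirected_graph() is implemented this way (for tensors, torch_geometric.utils.to_undirected provides a similar operation).

```python
def to_undirected_lists(edge_index):
    """Add the reverse of every edge, drop duplicates, and return sorted COO lists."""
    pairs = set()
    for s, d in zip(edge_index[0], edge_index[1]):
        pairs.add((s, d))
        pairs.add((d, s))  # reverse direction
    ordered = sorted(pairs)
    return [[s for s, _ in ordered], [d for _, d in ordered]]


directed = [[1, 3, 3], [0, 1, 2]]
undirected = to_undirected_lists(directed)
```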

process()

Processes the dataset to the self.processed_dir folder.

property processed_file_names

The names of the files that must be present in the self.processed_dir folder in order to skip processing.

property raw_file_names

The names of the files that must be present in the self.raw_dir folder in order to skip downloading.

class SynGraphDataset(root, name, transform=None, pre_transform=None)

The synthetic datasets used in Parameterized Explainer for Graph Neural Network. Each dataset takes a Barabási–Albert (BA) graph or a balanced tree as the base graph and randomly attaches specific motifs to it.
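The motif-attachment step can be sketched as follows. This is an illustration under assumptions, not DIG's generator: a "house" motif is used as the example, the base graph is a toy 4-node ring rather than a BA graph or balanced tree, each motif is joined to the base graph by a single edge, and nodes get a binary base/motif label (the helper name attach_motifs is hypothetical).

```python
import random

# House motif on local ids 0..4: square 0-1-2-3 plus roof node 4 linked to 0 and 1.
HOUSE_EDGES = [(0, 1), (1, 2), (2, 3), (3, 0), (4, 0), (4, 1)]
HOUSE_SIZE = 5


def attach_motifs(base_edges, num_base_nodes, num_motifs, seed=0):
    """Attach copies of the house motif to randomly chosen base nodes.

    Returns the combined edge list and a node label list
    (0 = base-graph node, 1 = motif node; a simplification).
    """
    rng = random.Random(seed)
    edges = list(base_edges)
    labels = [0] * num_base_nodes
    for m in range(num_motifs):
        offset = num_base_nodes + m * HOUSE_SIZE  # global id of this motif's node 0
        edges.extend((offset + a, offset + b) for a, b in HOUSE_EDGES)
        anchor = rng.randrange(num_base_nodes)  # random attachment point in the base graph
        edges.append((anchor, offset))
        labels.extend([1] * HOUSE_SIZE)
    return edges, labels


ring = [(0, 1), (1, 2), (2, 3), (3, 0)]  # toy 4-node base graph
edges, labels = attach_motifs(ring, num_base_nodes=4, num_motifs=2)
```

Because motif nodes carry their own labels, an explainer's output can be compared against the known motif edges, which is what makes these synthetic datasets useful for evaluating explanations.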

Parameters
  • root (str) – Root data directory to save datasets

  • name (str) – The name of the dataset, such as BA_shapes or BA_grid.

  • transform (Callable, None) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (Callable, None) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

download()

Downloads the dataset to the self.raw_dir folder.

process()

Processes the dataset to the self.processed_dir folder.

property processed_file_names

The names of the files that must be present in the self.processed_dir folder in order to skip processing.

property raw_file_names

The names of the files that must be present in the self.raw_dir folder in order to skip downloading.