dig.sslgraph.dataset

Dataset interfaces under dig.sslgraph.dataset.

class TUDatasetExt(root, name, task, transform=None, pre_transform=None, pre_filter=None, use_node_attr=False, use_edge_attr=False, cleaned=False, processed_filename='data.pt')

An extended TUDataset from PyTorch Geometric, covering a variety of graph kernel benchmark datasets, e.g., “IMDB-BINARY”, “REDDIT-BINARY” or “PROTEINS”.

Note

Some datasets may not come with any node labels. You can then either use the argument use_node_attr to load additional continuous node attributes (if present) or provide synthetic node features using transforms such as torch_geometric.transforms.Constant or torch_geometric.transforms.OneHotDegree.
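As an illustration of what a synthetic degree-based node feature looks like, the following is a minimal pure-Python sketch. It is not the torch_geometric implementation: the helper name one_hot_degree, the edge-list input format, and the clipping of large degrees are assumptions made for this example only.

```python
from collections import Counter


def one_hot_degree(num_nodes, edges, max_degree):
    """Return one one-hot degree vector per node (hypothetical helper).

    `edges` lists each undirected edge once as a (u, v) pair.
    Degrees above `max_degree` are clipped to `max_degree` (a simplification).
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    features = []
    for node in range(num_nodes):
        row = [0] * (max_degree + 1)
        # Isolated nodes are absent from the counter and get degree 0.
        row[min(degree[node], max_degree)] = 1
        features.append(row)
    return features


# A path graph 0 - 1 - 2: the end nodes have degree 1, the middle node degree 2.
features = one_hot_degree(num_nodes=3, edges=[(0, 1), (1, 2)], max_degree=2)
```

Each row can then serve as the feature vector of the corresponding node when the dataset itself provides none.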

Parameters

root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset.
task (string) – The evaluation task. Either ‘semisupervised’ or ‘unsupervised’.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
use_node_attr (bool, optional) – If True, the dataset will contain additional continuous node attributes (if present). (default: False)
use_edge_attr (bool, optional) – If True, the dataset will contain additional continuous edge attributes (if present). (default: False)
cleaned (bool, optional) – If True, the dataset will contain only non-isomorphic graphs. (default: False)
processed_filename (string, optional) – The name of the processed data file. (default: 'data.pt')
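The difference between transform, pre_transform and pre_filter is mainly a matter of when each hook runs. The sketch below mimics that timing with a small in-memory class. TinyDataset and the dict-based "graphs" are hypothetical stand-ins, and the order of applying pre_filter before pre_transform is an assumption of this sketch rather than something stated above.

```python
class TinyDataset:
    """Hypothetical in-memory stand-in for a dataset with the three hooks."""

    def __init__(self, raw_items, transform=None, pre_transform=None, pre_filter=None):
        self.transform = transform
        items = list(raw_items)
        # One-time processing; a real dataset would save the result to disk.
        if pre_filter is not None:
            items = [item for item in items if pre_filter(item)]
        if pre_transform is not None:
            items = [pre_transform(item) for item in items]
        self._items = items

    def __len__(self):
        return len(self._items)

    def __getitem__(self, idx):
        item = self._items[idx]
        # Applied on every access; the stored item is left unchanged.
        return item if self.transform is None else self.transform(item)


# Each "graph" is just a dict holding a node count here.
raw = [{"num_nodes": n} for n in (2, 5, 8)]
dataset = TinyDataset(
    raw,
    pre_filter=lambda g: g["num_nodes"] >= 5,          # drop small graphs
    pre_transform=lambda g: {**g, "processed": True},  # stored once
    transform=lambda g: {**g, "accessed": True},       # recomputed per access
)
```

Because pre_transform results are stored, expensive deterministic preprocessing fits there, while random augmentations that should differ between accesses fit transform.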

download(): Downloads the dataset to the self.raw_dir folder.

get(idx): Gets the data object at index idx.

process(): Processes the dataset to the self.processed_dir folder.

property processed_file_names: The names of the files that must be present in the self.processed_dir folder in order to skip processing.

property raw_file_names: The names of the files that must be present in the self.raw_dir folder in order to skip the download.

get_dataset(name, task, feat_str='deg', root=None)

A pre-implemented function to retrieve graph datasets from TUDataset. Depending on the evaluation task, different node feature augmentations are applied, following GraphCL.

Parameters

name (string) – The name of the dataset.
task (string) – The evaluation task. Either ‘semisupervised’ or ‘unsupervised’.
feat_str (string, optional) – The node feature augmentations to be applied, e.g., degrees and centrality. (default: 'deg')
root (string, optional) – Root directory where the dataset should be saved. (default: None)

Return type

torch_geometric.data.Dataset (unsupervised), or (torch_geometric.data.Dataset, torch_geometric.data.Dataset) (semisupervised).

Examples

>>> dataset, dataset_pretrain = get_dataset("NCI1", "semisupervised")
>>> dataset
NCI1(4110)

>>> dataset = get_dataset("MUTAG", "unsupervised", feat_str="")
>>> dataset # degree not augmented as node attributes
MUTAG(188)

get_node_dataset(name, norm_feat=False, root=None)

A pre-implemented function to retrieve node datasets from Planetoid.

Parameters

name (string) – The name of the dataset ("Cora", "CiteSeer", "PubMed").
norm_feat (bool, optional) – Whether to normalize node features. (default: False)
root (string, optional) – Root directory where the dataset should be saved. (default: None)
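The description of norm_feat does not say which normalization is used. One common convention, followed by torch_geometric.transforms.NormalizeFeatures, is to scale each node's feature row so that it sums to one. The pure-Python sketch below illustrates that convention only. Whether get_node_dataset applies exactly this scheme is an assumption, and row_normalize is a hypothetical helper, not part of the library.

```python
def row_normalize(features):
    """Scale each row so its entries sum to one (hypothetical helper)."""
    normalized = []
    for row in features:
        total = sum(row)
        # Leave all-zero rows unchanged to avoid dividing by zero.
        normalized.append([value / total for value in row] if total else list(row))
    return normalized


# Three nodes with three features each; the second node has no active feature.
x = [[1.0, 1.0, 2.0], [0.0, 0.0, 0.0], [3.0, 0.0, 1.0]]
x_norm = row_normalize(x)
```

For bag-of-words features such as those in the citation datasets named above, this puts nodes with many and few active words on a comparable scale.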

Return type

torch_geometric.data.Dataset

Example

>>> dataset = get_node_dataset("Cora")
>>> dataset
Cora()