dig.sslgraph.dataset

Dataset interfaces under dig.sslgraph.dataset.

class TUDatasetExt(root, name, task, transform=None, pre_transform=None, pre_filter=None, use_node_attr=False, use_edge_attr=False, cleaned=False, processed_filename='data.pt')

An extended TUDataset from PyTorch Geometric, including a variety of graph kernel benchmark datasets, e.g. “IMDB-BINARY”, “REDDIT-BINARY” or “PROTEINS”.

Note

Some datasets do not come with node labels. In that case, either set use_node_attr to load additional continuous node attributes (if present), or provide synthetic node features using transforms such as torch_geometric.transforms.Constant or torch_geometric.transforms.OneHotDegree.
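
To illustrate what a synthetic degree-based node feature looks like, here is a minimal pure-Python sketch of one-hot degree encoding. It does not call torch_geometric: the function name one_hot_degree, the edge-list input format, and the clamping of large degrees are assumptions made for this illustration only. In practice torch_geometric.transforms.OneHotDegree is the transform to use.

```python
from collections import Counter


def one_hot_degree(num_nodes, edges, max_degree):
    """Return one one-hot degree vector per node (illustrative helper).

    `edges` is a list of undirected (u, v) pairs, each pair listed once.
    Degrees above `max_degree` are clamped to `max_degree`; this is a
    choice made by this sketch, not documented library behavior.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    features = []
    for node in range(num_nodes):
        d = min(degree[node], max_degree)
        row = [0] * (max_degree + 1)
        row[d] = 1  # mark the position that corresponds to the node degree
        features.append(row)
    return features


# Path graph 0 - 1 - 2: the end nodes have degree 1, the middle node degree 2.
path_features = one_hot_degree(3, [(0, 1), (1, 2)], max_degree=2)
```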

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset.

  • task (string) – The evaluation task. Either ‘semisupervised’ or ‘unsupervised’.

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

  • use_node_attr (bool, optional) – If True, the dataset will contain additional continuous node attributes (if present). (default: False)

  • use_edge_attr (bool, optional) – If True, the dataset will contain additional continuous edge attributes (if present). (default: False)

  • cleaned (bool, optional) – If True, the dataset will contain only non-isomorphic graphs. (default: False)

  • processed_filename (string, optional) – The name of the processed data file. (default: 'data.pt')
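
As a rough sketch of the callables that transform and pre_filter expect, the following defines one of each in plain Python. The names max_nodes_filter and add_constant_feature are hypothetical, and types.SimpleNamespace stands in for a torch_geometric.data.Data object so that the snippet runs without PyTorch Geometric installed. With a real Data object, x would be a torch.Tensor rather than a nested list.

```python
from types import SimpleNamespace


def max_nodes_filter(limit):
    """Build a pre_filter that keeps graphs with at most `limit` nodes."""
    def pre_filter(data):
        return data.num_nodes <= limit
    return pre_filter


def add_constant_feature(data):
    """A transform that attaches a constant 1-dimensional feature to every node."""
    data.x = [[1.0] for _ in range(data.num_nodes)]
    return data


# Stand-ins for Data objects (assumption: only num_nodes and x are needed here).
small = SimpleNamespace(num_nodes=3, x=None)
large = SimpleNamespace(num_nodes=500, x=None)

keep = max_nodes_filter(100)
kept = [graph for graph in (small, large) if keep(graph)]
transformed = add_constant_feature(small)
```

Given the constructor signature documented above, such callables would be passed as transform=add_constant_feature and pre_filter=max_nodes_filter(100).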

download()

Downloads the dataset to the self.raw_dir folder.

get(idx)

Gets the data object at index idx.

process()

Processes the dataset to the self.processed_dir folder.

property processed_file_names

The name of the files to find in the self.processed_dir folder in order to skip the processing.

property raw_file_names

The name of the files to find in the self.raw_dir folder in order to skip the download.

get_dataset(name, task, feat_str='deg', root=None)

A pre-implemented function to retrieve graph datasets from TUDataset. Depending on the evaluation task, different node feature augmentations are applied, following GraphCL.

Parameters
  • name (string) – The name of the dataset.

  • task (string) – The evaluation task. Either ‘semisupervised’ or ‘unsupervised’.

  • feat_str (string, optional) – The node feature augmentations to be applied, e.g., degrees and centrality. (default: 'deg')

  • root (string, optional) – Root directory where the dataset should be saved. (default: None)

Return type

torch_geometric.data.Dataset (unsupervised), or (torch_geometric.data.Dataset, torch_geometric.data.Dataset) (semisupervised).
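
Because the return value is a single dataset for one task and a pair for the other, calling code may want to normalize it into one shape. The helper below is a hypothetical convenience, not part of dig.sslgraph.dataset, and relies only on the return shapes stated above.

```python
def split_result(result, task):
    """Return (dataset, dataset_pretrain); the second item is None when unsupervised."""
    if task == "semisupervised":
        dataset, dataset_pretrain = result
        return dataset, dataset_pretrain
    if task == "unsupervised":
        return result, None
    raise ValueError(f"unknown task: {task!r}")


# Placeholder strings stand in for the dataset objects returned by get_dataset.
pair = split_result(("NCI1", "NCI1-pretrain"), "semisupervised")
single = split_result("MUTAG", "unsupervised")
```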

Examples

>>> dataset, dataset_pretrain = get_dataset("NCI1", "semisupervised")
>>> dataset
NCI1(4110)
>>> dataset = get_dataset("MUTAG", "unsupervised", feat_str="")
>>> dataset # degree not augmented as node attributes
MUTAG(188)

get_node_dataset(name, norm_feat=False, root=None)

A pre-implemented function to retrieve node datasets from Planetoid.

Parameters
  • name (string) – The name of the dataset ("Cora", "CiteSeer", "PubMed").

  • norm_feat (bool, optional) – Whether to normalize node features. (default: False)

  • root (string, optional) – Root directory where the dataset should be saved. (default: None)

Return type

torch_geometric.data.Dataset

Example

>>> dataset = get_node_dataset("Cora")
>>> dataset
Cora()
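
The description of norm_feat above does not say which normalization is applied. One common choice for Planetoid features, and the one torch_geometric.transforms.NormalizeFeatures is documented to perform, is rescaling each node's feature row so that it sums to one. The following pure-Python sketch shows that operation under this assumption; row_normalize is a hypothetical name, and the handling of all-zero rows is a choice made here.

```python
def row_normalize(features):
    """Scale each row to sum to 1; all-zero rows are left unchanged."""
    normalized = []
    for row in features:
        total = sum(row)
        if total == 0:
            normalized.append(list(row))  # avoid division by zero
        else:
            normalized.append([value / total for value in row])
    return normalized


rows = row_normalize([[1.0, 1.0, 2.0], [0.0, 0.0, 0.0]])
```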