dig.sslgraph.dataset
Dataset interfaces under dig.sslgraph.dataset.
class TUDatasetExt(root, name, task, transform=None, pre_transform=None, pre_filter=None, use_node_attr=False, use_edge_attr=False, cleaned=False, processed_filename='data.pt')
An extended TUDataset from PyTorch Geometric, including a variety of graph kernel benchmark datasets, e.g. “IMDB-BINARY”, “REDDIT-BINARY” or “PROTEINS”.
Note
Some datasets may not come with any node labels. You can then either use the argument use_node_attr to load additional continuous node attributes (if present) or provide synthetic node features using transforms such as torch_geometric.transforms.Constant or torch_geometric.transforms.OneHotDegree.
Parameters
root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset.
task (string) – The evaluation task. Either ‘semisupervised’ or ‘unsupervised’.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
use_node_attr (bool, optional) – If True, the dataset will contain additional continuous node attributes (if present). (default: False)
use_edge_attr (bool, optional) – If True, the dataset will contain additional continuous edge attributes (if present). (default: False)
cleaned (bool, optional) – If True, the dataset will contain only non-isomorphic graphs. (default: False)
processed_filename (string, optional) – The name of the processed data file. (default: data.pt)
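The difference between transform, pre_transform and pre_filter is mostly a matter of when each callable runs. The following is a minimal pure-Python sketch of that timing, not the actual TUDatasetExt implementation: ToyDataset, add_constant_feature and mark_accessed are hypothetical names, plain dictionaries stand in for torch_geometric.data.Data objects, and the order "filter first, then pre-transform" is an assumption of this sketch.

```python
from typing import Any, Callable, Dict, List, Optional

Graph = Dict[str, Any]  # stand-in for torch_geometric.data.Data


class ToyDataset:
    """Hypothetical stand-in that mimics when each callable is applied."""

    def __init__(
        self,
        raw_graphs: List[Graph],
        transform: Optional[Callable[[Graph], Graph]] = None,
        pre_transform: Optional[Callable[[Graph], Graph]] = None,
        pre_filter: Optional[Callable[[Graph], bool]] = None,
    ) -> None:
        self.transform = transform
        graphs = [dict(g) for g in raw_graphs]
        # Assumed order: drop unwanted graphs first, then pre-transform the rest.
        if pre_filter is not None:
            graphs = [g for g in graphs if pre_filter(g)]
        if pre_transform is not None:
            graphs = [pre_transform(g) for g in graphs]
        # This list plays the role of the processed file saved to disk.
        self._stored = graphs

    def __len__(self) -> int:
        return len(self._stored)

    def __getitem__(self, idx: int) -> Graph:
        # Copy so that the stored ("on disk") version is never modified.
        graph = dict(self._stored[idx])
        # transform runs on every access, not once at processing time.
        return self.transform(graph) if self.transform is not None else graph


def add_constant_feature(graph: Graph) -> Graph:
    # Synthetic node feature: one constant value per node.
    return {**graph, "x": [[1.0] for _ in range(graph["num_nodes"])]}


def mark_accessed(graph: Graph) -> Graph:
    # Marks a graph so the effect of the per-access transform is visible.
    return {**graph, "accessed": True}


raw = [{"num_nodes": 2}, {"num_nodes": 5}, {"num_nodes": 3}]
toy = ToyDataset(
    raw,
    transform=mark_accessed,
    pre_transform=add_constant_feature,
    pre_filter=lambda g: g["num_nodes"] >= 3,
)
```

In this sketch the stored graphs carry the constant feature but not the accessed flag, which only appears on the copies returned by indexing. Expensive, deterministic preprocessing therefore fits pre_transform, while random augmentations fit transform.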
property processed_file_names
The names of the files to find in the self.processed_dir folder in order to skip the processing.
property raw_file_names
The names of the files to find in the self.raw_dir folder in order to skip the download.
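Both properties describe a "skip if the files are already there" check. A rough sketch of such a check, using only the standard library, could look as follows. The helper files_exist is a hypothetical name for illustration and is not claimed to match the library's internal code; the temporary directory stands in for self.processed_dir.

```python
import os
import tempfile
from typing import Iterable


def files_exist(directory: str, file_names: Iterable[str]) -> bool:
    """Return True only if every expected file is present in `directory`."""
    names = list(file_names)
    return len(names) > 0 and all(
        os.path.isfile(os.path.join(directory, name)) for name in names
    )


with tempfile.TemporaryDirectory() as processed_dir:
    expected = ["data.pt"]  # e.g. the value of processed_filename

    # Nothing has been written yet, so processing would be required.
    needs_processing_before = not files_exist(processed_dir, expected)

    # Simulate a finished processing step by writing a placeholder file.
    with open(os.path.join(processed_dir, "data.pt"), "wb") as handle:
        handle.write(b"placeholder")

    # The expected file is now present, so processing could be skipped.
    needs_processing_after = not files_exist(processed_dir, expected)
```

The same pattern would apply to raw_file_names and the download step, with self.raw_dir in place of the processed directory.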
get_dataset(name, task, feat_str='deg', root=None)
A pre-implemented function to retrieve graph datasets from TUDataset. Depending on the evaluation task, different node feature augmentations are applied, following GraphCL.
Parameters
name (string) – The name of the dataset.
task (string) – The evaluation task. Either ‘semisupervised’ or ‘unsupervised’.
feat_str (string, optional) – The node feature augmentations to be applied, e.g., degrees and centrality. (default: deg)
root (string, optional) – Root directory where the dataset should be saved. (default: None)
Return type
torch_geometric.data.Dataset (unsupervised), or (torch_geometric.data.Dataset, torch_geometric.data.Dataset) (semisupervised).
Examples
>>> dataset, dataset_pretrain = get_dataset("NCI1", "semisupervised")
>>> dataset
NCI1(4110)
>>> dataset = get_dataset("MUTAG", "unsupervised", feat_str="")
>>> dataset # degree not augmented as node attributes
MUTAG(188)
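The feat_str parameter names node degrees as one possible feature augmentation. To give a rough idea of what a one-hot degree feature looks like, here is a small pure-Python sketch. It is an illustration only and does not reproduce how get_dataset or torch_geometric.transforms.OneHotDegree compute their features; the function one_hot_degree, the edge-list input format, and the clipping at max_degree are all assumptions of this sketch.

```python
from typing import List, Tuple


def one_hot_degree(
    num_nodes: int, edges: List[Tuple[int, int]], max_degree: int
) -> List[List[float]]:
    """One-hot encode each node's degree, clipping at `max_degree`."""
    degree = [0] * num_nodes
    for src, dst in edges:  # undirected edges, each listed once
        degree[src] += 1
        degree[dst] += 1

    features = []
    for d in degree:
        row = [0.0] * (max_degree + 1)
        row[min(d, max_degree)] = 1.0  # clipping is a choice of this sketch
        features.append(row)
    return features


# A path graph 0 - 1 - 2: the end nodes have degree 1, the middle node degree 2.
path_graph = [(0, 1), (1, 2)]
x = one_hot_degree(3, path_graph, max_degree=2)
```

Each row of x is the feature vector of one node, so a dataset without node labels still ends up with a usable feature matrix.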