dig.oodgraph

Graph OOD (GOOD) Dataset interfaces under dig.oodgraph.

Please refer to the GOOD project for more details.

This module includes 8 GOOD datasets.

  • Graph prediction datasets: GOOD-HIV, GOOD-PCBA, GOOD-ZINC, GOOD-CMNIST, GOOD-Motif.

  • Node prediction datasets: GOOD-Cora, GOOD-Arxiv, GOOD-CBAS.

GOODArxiv

The GOOD-Arxiv dataset, adapted from the OGB benchmark.

GOODCBAS

The GOOD-CBAS dataset.

GOODCMNIST

The GOOD-CMNIST dataset, following the IRM paper.

GOODCora

The GOOD-Cora dataset.

GOODHIV

The GOOD-HIV dataset.

GOODMotif

The GOOD-Motif dataset motivated by Spurious-Motif.

GOODPCBA

The GOOD-PCBA dataset.

GOODZINC

The GOOD-ZINC dataset, adapted from the ZINC database.

class GOODArxiv(root: str, domain: str, shift: str = 'no_shift', transform=None, pre_transform=None, generate: bool = False)[source]

The GOOD-Arxiv dataset, adapted from the OGB benchmark.

Parameters
  • root (str) – The root directory in which the dataset is saved.

  • domain (str) – The domain selection. Allowed: ‘degree’ and ‘time’.

  • shift (str) – The distribution shift to apply. Allowed: ‘no_shift’, ‘covariate’, and ‘concept’.

  • generate (bool) – Whether to regenerate the dataset. True: regenerate. False: download.
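The allowed domain and shift values listed on this page can be checked before a dataset is constructed. The following is a minimal sketch of such a check, not part of dig.oodgraph: the names ALLOWED_DOMAINS, ALLOWED_SHIFTS, and check_good_args are hypothetical, and the values are copied from the class parameter lists documented here.

```python
# Allowed domains per dataset class, copied from the parameter lists on this page.
# This table and the helper below are illustrative, not part of dig.oodgraph.
ALLOWED_DOMAINS = {
    "GOODArxiv": ("degree", "time"),
    "GOODCBAS": ("color",),
    "GOODCMNIST": ("color",),
    "GOODCora": ("degree", "word"),
    "GOODHIV": ("scaffold", "size"),
    "GOODMotif": ("basis", "size"),
    "GOODPCBA": ("scaffold", "size"),
    "GOODZINC": ("scaffold", "size"),
}
ALLOWED_SHIFTS = ("no_shift", "covariate", "concept")


def check_good_args(dataset_name: str, domain: str, shift: str = "no_shift") -> None:
    """Raise ValueError if the arguments are not documented as allowed."""
    if dataset_name not in ALLOWED_DOMAINS:
        raise ValueError(f"unknown dataset class: {dataset_name!r}")
    if domain not in ALLOWED_DOMAINS[dataset_name]:
        raise ValueError(
            f"{dataset_name} does not document domain {domain!r}; "
            f"allowed: {ALLOWED_DOMAINS[dataset_name]}"
        )
    if shift not in ALLOWED_SHIFTS:
        raise ValueError(f"unknown shift {shift!r}; allowed: {ALLOWED_SHIFTS}")
```

Calling check_good_args("GOODArxiv", "degree", "covariate") returns silently, while a domain such as ‘color’ is rejected for GOODArxiv.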

download()[source]

Downloads the dataset to the self.raw_dir folder.

static load(dataset_root: str, domain: str, shift: str = 'no_shift', generate: bool = False)[source]

A static method for loading the dataset. It instantiates the dataset class and constructs the train, id_val, id_test, ood_val (val), and ood_test (test) splits. It also collects dataset meta information for later use.

Parameters
  • dataset_root (str) – The root directory in which the dataset is saved.

  • domain (str) – The domain selection. Allowed: ‘degree’ and ‘time’.

  • shift (str) – The distribution shift to apply. Allowed: ‘no_shift’, ‘covariate’, and ‘concept’.

  • generate (bool) – Whether to regenerate the dataset. True: regenerate. False: download.

Returns

The dataset or dataset splits, together with the dataset meta information.
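A call to load might look like the sketch below. It assumes the method returns a two-element value (the dataset or its splits, then the meta information), which is how the Returns entry reads but is not spelled out on this page. The wrapper name load_good and the "datasets" root are hypothetical; the class is passed in as an argument so the snippet does not require DIG to be installed.

```python
from typing import Any, Tuple


def load_good(dataset_cls: Any, dataset_root: str, domain: str,
              shift: str = "no_shift", generate: bool = False) -> Tuple[Any, Any]:
    """Call the documented static `load` method and unpack its result.

    Assumption: `load` returns a pair (dataset or splits, meta info).
    """
    result = dataset_cls.load(dataset_root, domain=domain,
                              shift=shift, generate=generate)
    dataset, meta_info = result  # assumed two-element return value
    return dataset, meta_info


# Intended use with the real class (needs DIG installed; may download data):
#   from dig.oodgraph import GOODArxiv
#   dataset, meta_info = load_good(GOODArxiv, "datasets",
#                                  domain="degree", shift="covariate")
```

Passing the class as a parameter keeps the same wrapper usable for any of the eight dataset classes, since their load signatures are identical.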

property processed_file_names

The names of the files in the self.processed_dir folder that must be present in order to skip processing.
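The contract described here (processing is skipped only when every listed file is present) can be illustrated with a small standalone check. This is a sketch of the idea, not the library's implementation; the function name can_skip_processing and any file names passed to it are hypothetical.

```python
from pathlib import Path
from typing import Iterable


def can_skip_processing(processed_dir: str, processed_file_names: Iterable[str]) -> bool:
    """Return True only if every expected processed file already exists."""
    root = Path(processed_dir)
    # A single missing file means processing has to run again.
    return all((root / name).is_file() for name in processed_file_names)
```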

class GOODCBAS(root: str, domain: str, shift: str = 'no_shift', transform=None, pre_transform=None, generate: bool = False)[source]

The GOOD-CBAS dataset. Modified from BA-Shapes.

Parameters
  • root (str) – The root directory in which the dataset is saved.

  • domain (str) – The domain selection. Allowed: ‘color’.

  • shift (str) – The distribution shift to apply. Allowed: ‘no_shift’, ‘covariate’, and ‘concept’.

  • generate (bool) – Whether to regenerate the dataset. True: regenerate. False: download.

download()[source]

Downloads the dataset to the self.raw_dir folder.

static load(dataset_root: str, domain: str, shift: str = 'no_shift', generate: bool = False)[source]

A static method for loading the dataset. It instantiates the dataset class and constructs the train, id_val, id_test, ood_val (val), and ood_test (test) splits. It also collects dataset meta information for later use.

Parameters
  • dataset_root (str) – The root directory in which the dataset is saved.

  • domain (str) – The domain selection. Allowed: ‘color’.

  • shift (str) – The distribution shift to apply. Allowed: ‘no_shift’, ‘covariate’, and ‘concept’.

  • generate (bool) – Whether to regenerate the dataset. True: regenerate. False: download.

Returns

The dataset or dataset splits, together with the dataset meta information.

property processed_file_names

The names of the files in the self.processed_dir folder that must be present in order to skip processing.

class GOODCMNIST(root: str, domain: str, shift: str = 'no_shift', subset: str = 'train', transform=None, pre_transform=None, generate: bool = False)[source]

The GOOD-CMNIST dataset, following the IRM paper.

Parameters
  • root (str) – The root directory in which the dataset is saved.

  • domain (str) – The domain selection. Allowed: ‘color’.

  • shift (str) – The distribution shift to apply. Allowed: ‘no_shift’, ‘covariate’, and ‘concept’.

  • subset (str) – The split to load. Allowed: ‘train’, ‘id_val’, ‘id_test’, ‘val’, and ‘test’. When shift=‘no_shift’, ‘id_val’ and ‘id_test’ are not applicable.

  • generate (bool) – Whether to regenerate the dataset. True: regenerate. False: download.
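The interaction between shift and subset can be written down as a small lookup. The sketch below only encodes the rule stated in the subset entry; the names ALL_SUBSETS and applicable_subsets are hypothetical and do not exist in dig.oodgraph.

```python
from typing import Tuple

# Subset names as listed in the `subset` parameter description.
ALL_SUBSETS = ("train", "id_val", "id_test", "val", "test")


def applicable_subsets(shift: str) -> Tuple[str, ...]:
    """Return the subsets that apply for the given shift setting."""
    if shift == "no_shift":
        # Without a shift, the in-distribution validation/test splits do not apply.
        return tuple(name for name in ALL_SUBSETS if not name.startswith("id_"))
    return ALL_SUBSETS
```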

download()[source]

Downloads the dataset to the self.raw_dir folder.

static load(dataset_root: str, domain: str, shift: str = 'no_shift', generate: bool = False)[source]

A static method for loading the dataset. It instantiates the dataset class and constructs the train, id_val, id_test, ood_val (val), and ood_test (test) splits. It also collects dataset meta information for later use.

Parameters
  • dataset_root (str) – The root directory in which the dataset is saved.

  • domain (str) – The domain selection. Allowed: ‘color’.

  • shift (str) – The distribution shift to apply. Allowed: ‘no_shift’, ‘covariate’, and ‘concept’.

  • generate (bool) – Whether to regenerate the dataset. True: regenerate. False: download.

Returns

The dataset or dataset splits, together with the dataset meta information.

property processed_file_names

The names of the files in the self.processed_dir folder that must be present in order to skip processing.

class GOODCora(root: str, domain: str, shift: str = 'no_shift', transform=None, pre_transform=None, generate: bool = False)[source]

The GOOD-Cora dataset. Adapted from the full Cora dataset.

Parameters
  • root (str) – The root directory in which the dataset is saved.

  • domain (str) – The domain selection. Allowed: ‘degree’ and ‘word’.

  • shift (str) – The distribution shift to apply. Allowed: ‘no_shift’, ‘covariate’, and ‘concept’.

  • generate (bool) – Whether to regenerate the dataset. True: regenerate. False: download.

download()[source]

Downloads the dataset to the self.raw_dir folder.

static load(dataset_root: str, domain: str, shift: str = 'no_shift', generate: bool = False)[source]

A static method for loading the dataset. It instantiates the dataset class and constructs the train, id_val, id_test, ood_val (val), and ood_test (test) splits. It also collects dataset meta information for later use.

Parameters
  • dataset_root (str) – The root directory in which the dataset is saved.

  • domain (str) – The domain selection. Allowed: ‘degree’ and ‘word’.

  • shift (str) – The distribution shift to apply. Allowed: ‘no_shift’, ‘covariate’, and ‘concept’.

  • generate (bool) – Whether to regenerate the dataset. True: regenerate. False: download.

Returns

The dataset or dataset splits, together with the dataset meta information.

property processed_file_names

The names of the files in the self.processed_dir folder that must be present in order to skip processing.

class GOODHIV(root: str, domain: str, shift: str = 'no_shift', subset: str = 'train', transform=None, pre_transform=None, generate: bool = False)[source]

The GOOD-HIV dataset. Adapted from MoleculeNet.

Parameters
  • root (str) – The root directory in which the dataset is saved.

  • domain (str) – The domain selection. Allowed: ‘scaffold’ and ‘size’.

  • shift (str) – The distribution shift to apply. Allowed: ‘no_shift’, ‘covariate’, and ‘concept’.

  • subset (str) – The split to load. Allowed: ‘train’, ‘id_val’, ‘id_test’, ‘val’, and ‘test’. When shift=‘no_shift’, ‘id_val’ and ‘id_test’ are not applicable.

  • generate (bool) – Whether to regenerate the dataset. True: regenerate. False: download.
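Because the graph-level classes take a subset argument, one instance corresponds to one split. The sketch below builds one instance per applicable subset through the constructor signature documented above. The helper name build_subsets is hypothetical and is not how the library's own load method is known to work; the class is passed in so the snippet runs without DIG installed.

```python
from typing import Any, Dict


def build_subsets(dataset_cls: Any, root: str, domain: str,
                  shift: str = "covariate") -> Dict[str, Any]:
    """Instantiate `dataset_cls` once per subset that applies to `shift`."""
    subsets = ["train", "val", "test"]
    if shift != "no_shift":
        # The in-distribution splits only exist when a shift is selected.
        subsets[1:1] = ["id_val", "id_test"]
    return {
        name: dataset_cls(root=root, domain=domain, shift=shift, subset=name)
        for name in subsets
    }


# Intended use with the real class (needs DIG installed; may download data):
#   from dig.oodgraph import GOODHIV
#   splits = build_subsets(GOODHIV, "datasets", domain="scaffold")
```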

download()[source]

Downloads the dataset to the self.raw_dir folder.

static load(dataset_root: str, domain: str, shift: str = 'no_shift', generate: bool = False)[source]

A static method for loading the dataset. It instantiates the dataset class and constructs the train, id_val, id_test, ood_val (val), and ood_test (test) splits. It also collects dataset meta information for later use.

Parameters
  • dataset_root (str) – The root directory in which the dataset is saved.

  • domain (str) – The domain selection. Allowed: ‘scaffold’ and ‘size’.

  • shift (str) – The distribution shift to apply. Allowed: ‘no_shift’, ‘covariate’, and ‘concept’.

  • generate (bool) – Whether to regenerate the dataset. True: regenerate. False: download.

Returns

The dataset or dataset splits, together with the dataset meta information.

property processed_file_names

The names of the files in the self.processed_dir folder that must be present in order to skip processing.

class GOODMotif(root: str, domain: str, shift: str = 'no_shift', subset: str = 'train', transform=None, pre_transform=None, generate: bool = False)[source]

The GOOD-Motif dataset motivated by Spurious-Motif.

Parameters
  • root (str) – The root directory in which the dataset is saved.

  • domain (str) – The domain selection. Allowed: ‘basis’ and ‘size’.

  • shift (str) – The distribution shift to apply. Allowed: ‘no_shift’, ‘covariate’, and ‘concept’.

  • subset (str) – The split to load. Allowed: ‘train’, ‘id_val’, ‘id_test’, ‘val’, and ‘test’. When shift=‘no_shift’, ‘id_val’ and ‘id_test’ are not applicable.

  • generate (bool) – Whether to regenerate the dataset. True: regenerate. False: download.

download()[source]

Downloads the dataset to the self.raw_dir folder.

static load(dataset_root: str, domain: str, shift: str = 'no_shift', generate: bool = False)[source]

A static method for loading the dataset. It instantiates the dataset class and constructs the train, id_val, id_test, ood_val (val), and ood_test (test) splits. It also collects dataset meta information for later use.

Parameters
  • dataset_root (str) – The root directory in which the dataset is saved.

  • domain (str) – The domain selection. Allowed: ‘basis’ and ‘size’.

  • shift (str) – The distribution shift to apply. Allowed: ‘no_shift’, ‘covariate’, and ‘concept’.

  • generate (bool) – Whether to regenerate the dataset. True: regenerate. False: download.

Returns

The dataset or dataset splits, together with the dataset meta information.

property processed_file_names

The names of the files in the self.processed_dir folder that must be present in order to skip processing.

class GOODPCBA(root: str, domain: str, shift: str = 'no_shift', subset: str = 'train', transform=None, pre_transform=None, generate: bool = False)[source]

The GOOD-PCBA dataset. Adapted from MoleculeNet.

Parameters
  • root (str) – The root directory in which the dataset is saved.

  • domain (str) – The domain selection. Allowed: ‘scaffold’ and ‘size’.

  • shift (str) – The distribution shift to apply. Allowed: ‘no_shift’, ‘covariate’, and ‘concept’.

  • subset (str) – The split to load. Allowed: ‘train’, ‘id_val’, ‘id_test’, ‘val’, and ‘test’. When shift=‘no_shift’, ‘id_val’ and ‘id_test’ are not applicable.

  • generate (bool) – Whether to regenerate the dataset. True: regenerate. False: download.

download()[source]

Downloads the dataset to the self.raw_dir folder.

static load(dataset_root: str, domain: str, shift: str = 'no_shift', generate: bool = False)[source]

A static method for loading the dataset. It instantiates the dataset class and constructs the train, id_val, id_test, ood_val (val), and ood_test (test) splits. It also collects dataset meta information for later use.

Parameters
  • dataset_root (str) – The root directory in which the dataset is saved.

  • domain (str) – The domain selection. Allowed: ‘scaffold’ and ‘size’.

  • shift (str) – The distribution shift to apply. Allowed: ‘no_shift’, ‘covariate’, and ‘concept’.

  • generate (bool) – Whether to regenerate the dataset. True: regenerate. False: download.

Returns

The dataset or dataset splits, together with the dataset meta information.

property processed_file_names

The names of the files in the self.processed_dir folder that must be present in order to skip processing.

class GOODZINC(root: str, domain: str, shift: str = 'no_shift', subset: str = 'train', transform=None, pre_transform=None, generate: bool = False)[source]

The GOOD-ZINC dataset, adapted from the ZINC database.

Parameters
  • root (str) – The root directory in which the dataset is saved.

  • domain (str) – The domain selection. Allowed: ‘scaffold’ and ‘size’.

  • shift (str) – The distribution shift to apply. Allowed: ‘no_shift’, ‘covariate’, and ‘concept’.

  • subset (str) – The split to load. Allowed: ‘train’, ‘id_val’, ‘id_test’, ‘val’, and ‘test’. When shift=‘no_shift’, ‘id_val’ and ‘id_test’ are not applicable.

  • generate (bool) – Whether to regenerate the dataset. True: regenerate. False: download.

download()[source]

Downloads the dataset to the self.raw_dir folder.

static load(dataset_root: str, domain: str, shift: str = 'no_shift', generate: bool = False)[source]

A static method for loading the dataset. It instantiates the dataset class and constructs the train, id_val, id_test, ood_val (val), and ood_test (test) splits. It also collects dataset meta information for later use.

Parameters
  • dataset_root (str) – The root directory in which the dataset is saved.

  • domain (str) – The domain selection. Allowed: ‘scaffold’ and ‘size’.

  • shift (str) – The distribution shift to apply. Allowed: ‘no_shift’, ‘covariate’, and ‘concept’.

  • generate (bool) – Whether to regenerate the dataset. True: regenerate. False: download.

Returns

The dataset or dataset splits, together with the dataset meta information.

property processed_file_names

The names of the files in the self.processed_dir folder that must be present in order to skip processing.