Supervised Learning Subset Selection Data Loaders

In this section, we consider different subset-selection-based data loaders geared towards efficient and robust learning in the standard supervised learning setting.

DSS Dataloader (Base Class)

class cords.utils.data.dataloader.SL.dssdataloader.DSSDataLoader(full_data, dss_args, logger, *args, **kwargs)

Bases: object

Implementation of the DSSDataLoader class, which serves as the base class for the dataloaders of the other subset selection strategies in the supervised learning framework.

Parameters
  • full_data (torch.utils.data.Dataset class) – Full dataset from which the data subset is selected.

  • dss_args (dict) – Data subset selection arguments dictionary

  • logger (class) – Logger class for logging the information
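
All loaders in this section share one idea: training iterates over a selected subset of the data, with one weight per selected sample, rather than over the full dataset. The plain-Python sketch below illustrates only that idea under simplifying assumptions; the class `WeightedSubsetLoader` and its `set_subset` method are hypothetical names and are not part of the CORDS API.

```python
import random
from typing import Iterator, List, Sequence, Tuple


class WeightedSubsetLoader:
    """Hypothetical sketch: yields mini-batches drawn only from a selected subset.

    Each batch is a list of (sample, weight) pairs, so a training loop can
    scale every per-sample loss by that sample's subset weight.
    """

    def __init__(self, full_data: Sequence, batch_size: int, seed: int = 0) -> None:
        self.full_data = full_data
        self.batch_size = batch_size
        self.rng = random.Random(seed)
        # Until a strategy chooses a subset, use every index with weight 1.
        self.subset_indices: List[int] = list(range(len(full_data)))
        self.subset_weights: List[float] = [1.0] * len(full_data)

    def set_subset(self, indices: Sequence[int], weights: Sequence[float]) -> None:
        # A concrete selection strategy would call this with its chosen subset.
        if len(indices) != len(weights):
            raise ValueError("indices and weights must have the same length")
        self.subset_indices = list(indices)
        self.subset_weights = list(weights)

    def __len__(self) -> int:
        # Number of mini-batches in one pass over the subset (ceiling division).
        return (len(self.subset_indices) + self.batch_size - 1) // self.batch_size

    def __iter__(self) -> Iterator[List[Tuple[object, float]]]:
        # Shuffle positions within the subset, then cut them into mini-batches.
        order = list(range(len(self.subset_indices)))
        self.rng.shuffle(order)
        for start in range(0, len(order), self.batch_size):
            chunk = order[start:start + self.batch_size]
            yield [(self.full_data[self.subset_indices[k]], self.subset_weights[k]) for k in chunk]
```

Which indices and weights are passed to `set_subset` is exactly what distinguishes the strategies documented below; the sketch leaves that decision open.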

Non-Adaptive Subset Selection Data Loaders

class cords.utils.data.dataloader.SL.nonadaptive.nonadaptivedataloader.NonAdaptiveDSSDataLoader(train_loader, val_loader, dss_args, logger, *args, **kwargs)

Bases: cords.utils.data.dataloader.SL.dssdataloader.DSSDataLoader

Implementation of the NonAdaptiveDSSDataLoader class, which serves as the base class for the dataloaders of the other nonadaptive subset selection strategies in the supervised learning setting.

Parameters
  • train_loader (torch.utils.data.DataLoader class) – Dataloader of the training dataset

  • val_loader (torch.utils.data.DataLoader class) – Dataloader of the validation dataset

  • dss_args (dict) – Data subset selection arguments dictionary

  • logger (class) – Logger for logging the information
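
In the non-adaptive setting, the subset is chosen once, before training starts, and the same subset is then reused in every epoch. The sketch below illustrates that control flow, assuming a hypothetical `select` callback that returns indices and weights; `nonadaptive_epochs` and `first_k` are illustrative names and do not describe how `NonAdaptiveDSSDataLoader` is implemented internally.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical strategy signature: (candidate indices, budget) -> (indices, weights)
Selector = Callable[[Sequence[int], int], Tuple[List[int], List[float]]]


def nonadaptive_epochs(num_samples: int, budget: int, num_epochs: int,
                       select: Selector) -> List[Tuple[List[int], List[float]]]:
    """Sketch of non-adaptive selection: select once, reuse the subset every epoch.

    Returns the (indices, weights) pair used in each epoch.
    """
    indices, weights = select(list(range(num_samples)), budget)  # one-time selection
    schedule = []
    for _ in range(num_epochs):
        # A real training loop would iterate over mini-batches of `indices`
        # here and scale each sample's loss by the matching entry of `weights`.
        schedule.append((indices, weights))
    return schedule


def first_k(candidates: Sequence[int], budget: int) -> Tuple[List[int], List[float]]:
    # Trivial placeholder strategy: keep the first `budget` candidates, weight 1.
    chosen = list(candidates)[:budget]
    return chosen, [1.0] * len(chosen)
```

Because selection happens only once, its cost is paid up front; the trade-off is that the subset cannot react to how the model changes during training.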

class cords.utils.data.dataloader.SL.nonadaptive.craigdataloader.CRAIGDataLoader(train_loader, val_loader, dss_args, logger, *args, **kwargs)

Bases: cords.utils.data.dataloader.SL.nonadaptive.nonadaptivedataloader.NonAdaptiveDSSDataLoader

Implementation of CRAIGDataLoader, which serves as the dataloader for the nonadaptive CRAIG subset selection strategy from the paper [1].

Parameters
  • train_loader (torch.utils.data.DataLoader class) – Dataloader of the training dataset

  • val_loader (torch.utils.data.DataLoader class) – Dataloader of the validation dataset

  • dss_args (dict) – Data subset selection arguments dictionary required for the CRAIG subset selection strategy

  • logger (class) – Logger for logging the information
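
In CRAIG, the subset is chosen by greedily maximizing a facility location function over gradient similarities, and each selected point is then weighted by the number of training points whose gradients it represents most closely. The simplified sketch below computes such weights for an already selected set, assuming per-sample gradients are given as plain Python lists and compared by Euclidean distance; `craig_weights` is a hypothetical helper, not a CORDS function.

```python
import math
from typing import List, Sequence


def craig_weights(grads: Sequence[Sequence[float]], selected: Sequence[int]) -> List[float]:
    """Sketch of CRAIG-style weights (simplified, not the CORDS code).

    Every training point is assigned to the selected point whose gradient is
    closest in Euclidean distance; a selected point's weight is the number of
    training points assigned to it, itself included.
    """
    weights = [0.0] * len(selected)
    for g in grads:
        dists = [math.dist(g, grads[j]) for j in selected]
        # Ties go to the earliest selected point.
        nearest = min(range(len(selected)), key=dists.__getitem__)
        weights[nearest] += 1.0
    return weights
```

The weights always sum to the number of training points, so the weighted subset stands in for the full dataset when gradients are summed.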

class cords.utils.data.dataloader.SL.nonadaptive.submoddataloader.FacLocDataLoader(train_loader, val_loader, dss_args, logger, *args, **kwargs)

Bases: cords.utils.data.dataloader.SL.nonadaptive.submoddataloader.SubmodDataLoader

Implementation of the FacLocDataLoader class for the nonadaptive, facility-location-based subset selection strategy in the supervised learning setting.

Parameters
  • train_loader (torch.utils.data.DataLoader class) – Dataloader of the training dataset

  • val_loader (torch.utils.data.DataLoader class) – Dataloader of the validation dataset

  • dss_args (dict) – Data subset selection arguments dictionary

  • logger (class) – Logger for logging the information
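
The facility location function scores a subset S by how well every point is represented by its most similar member of S: f(S) = Σ_i max_{j∈S} s(i, j). Below is a sketch of the standard greedy maximization, assuming a precomputed, square, non-negative similarity matrix given as nested lists; `facility_location_greedy` is a hypothetical name, and the sketch illustrates the technique rather than the CORDS implementation.

```python
from typing import List, Sequence


def facility_location_greedy(sim: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Greedy maximization of f(S) = sum_i max_{j in S} sim[i][j].

    Assumes a square, non-negative similarity matrix.
    """
    n = len(sim)
    best = [0.0] * n  # best[i] = highest similarity of point i to the current subset
    selected: List[int] = []
    for _ in range(min(budget, n)):
        candidates = [j for j in range(n) if j not in selected]
        # Marginal gain of adding j: how much it raises each point's best match.
        gains = {j: sum(max(0.0, sim[i][j] - best[i]) for i in range(n)) for j in candidates}
        j_star = max(candidates, key=gains.__getitem__)
        selected.append(j_star)
        best = [max(best[i], sim[i][j_star]) for i in range(n)]
    return selected
```

Tracking `best` makes each marginal gain cheap to compute, and because a second point from an already covered cluster adds little, the greedy naturally spreads its picks across clusters.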

class cords.utils.data.dataloader.SL.nonadaptive.submoddataloader.GraphCutDataLoader(train_loader, val_loader, dss_args, logger, *args, **kwargs)

Bases: cords.utils.data.dataloader.SL.nonadaptive.submoddataloader.SubmodDataLoader

Implementation of the GraphCutDataLoader class for the nonadaptive, graph-cut-based subset selection strategy in the supervised learning setting.

Parameters
  • train_loader (torch.utils.data.DataLoader class) – Dataloader of the training dataset

  • val_loader (torch.utils.data.DataLoader class) – Dataloader of the validation dataset

  • dss_args (dict) – Data subset selection arguments dictionary

  • logger (class) – Logger for logging the information
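
One common form of the graph cut function trades off representativeness against redundancy: f(S) = Σ_{i∈V} Σ_{j∈S} s(i, j) - λ Σ_{i∈S} Σ_{j∈S} s(i, j). The sketch below evaluates that objective; the exact form and the default value of λ (here the hypothetical parameter `lam`) used by CORDS are assumptions, not taken from this page.

```python
from typing import AbstractSet, Sequence


def graph_cut(sim: Sequence[Sequence[float]], subset: AbstractSet[int], lam: float = 0.5) -> float:
    """Graph cut objective (assumed form):
    sum_{i in V, j in S} sim[i][j] - lam * sum_{i, j in S} sim[i][j].
    """
    n = len(sim)
    # Representativeness: how similar the subset is to the whole dataset.
    coverage = sum(sim[i][j] for i in range(n) for j in subset)
    # Redundancy: similarity among the subset's own members, scaled by lam.
    redundancy = sum(sim[i][j] for i in subset for j in subset)
    return coverage - lam * redundancy
```

With `lam=0.0` only coverage counts, so two near-duplicate points can outscore a diverse pair; a positive `lam` reverses that preference.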

class cords.utils.data.dataloader.SL.nonadaptive.submoddataloader.SaturatedCoverageDataLoader(train_loader, val_loader, dss_args, logger, *args, **kwargs)

Bases: cords.utils.data.dataloader.SL.nonadaptive.submoddataloader.SubmodDataLoader

Implementation of the SaturatedCoverageDataLoader class for the nonadaptive, saturated-coverage-based subset selection strategy in the supervised learning setting.

Parameters
  • train_loader (torch.utils.data.DataLoader class) – Dataloader of the training dataset

  • val_loader (torch.utils.data.DataLoader class) – Dataloader of the validation dataset

  • dss_args (dict) – Data subset selection arguments dictionary

  • logger (class) – Logger for logging the information
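
A saturated coverage function caps how much credit each point can receive, so that once a point is covered well enough, further similar picks stop paying off: f(S) = Σ_{i∈V} min(Σ_{j∈S} s(i, j), α Σ_{j∈V} s(i, j)). The sketch below evaluates this standard form; the saturation parameter `alpha` and its default are assumptions, not values documented for CORDS.

```python
from typing import AbstractSet, Sequence


def saturated_coverage(sim: Sequence[Sequence[float]], subset: AbstractSet[int],
                       alpha: float = 0.1) -> float:
    """Saturated coverage (assumed form):
    sum_i min(sum_{j in S} sim[i][j], alpha * sum_{j in V} sim[i][j]).
    """
    total = 0.0
    for row in sim:
        covered = sum(row[j] for j in subset)  # coverage of this point by the subset
        cap = alpha * sum(row)                 # saturation threshold for this point
        total += min(covered, cap)
    return total
```

Adding a near-duplicate of an already chosen point yields almost no gain, because the points it would cover are already at their caps.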

class cords.utils.data.dataloader.SL.nonadaptive.submoddataloader.SubmodDataLoader(train_loader, val_loader, dss_args, logger, *args, **kwargs)

Bases: cords.utils.data.dataloader.SL.nonadaptive.nonadaptivedataloader.NonAdaptiveDSSDataLoader

Implementation of the SubmodDataLoader class, which serves as the base class for the nonadaptive submodular subset selection strategies in the supervised learning setting.

Parameters
  • train_loader (torch.utils.data.DataLoader class) – Dataloader of the training dataset

  • val_loader (torch.utils.data.DataLoader class) – Dataloader of the validation dataset

  • dss_args (dict) – Data subset selection arguments dictionary

  • logger (class) – Logger for logging the information
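
The facility location, graph cut, saturated coverage, and sum redundancy loaders all derive from SubmodDataLoader and differ mainly in the set function being maximized. The sketch below shows the textbook greedy algorithm for an arbitrary set function `f`; `greedy_maximize` is a hypothetical name, and whether CORDS uses this naive greedy or an accelerated variant is not stated on this page.

```python
from typing import Callable, FrozenSet, List, Sequence


def greedy_maximize(f: Callable[[FrozenSet[int]], float],
                    ground_set: Sequence[int], budget: int) -> List[int]:
    """Naive greedy: repeatedly add the element with the largest marginal gain."""
    selected: List[int] = []
    current: FrozenSet[int] = frozenset()
    value = f(current)
    for _ in range(min(budget, len(ground_set))):
        candidates = [e for e in ground_set if e not in current]
        # Marginal gain f(S + e) - f(S) for every remaining candidate.
        gains = {e: f(current | {e}) - value for e in candidates}
        best = max(candidates, key=gains.__getitem__)
        selected.append(best)
        current = current | {best}
        value += gains[best]
    return selected
```

For monotone submodular functions this greedy is guaranteed to reach at least a (1 - 1/e) fraction of the optimal value under a cardinality budget, which is why it is the default tool for these objectives.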

class cords.utils.data.dataloader.SL.nonadaptive.submoddataloader.SumRedundancyDataLoader(train_loader, val_loader, dss_args, logger, *args, **kwargs)

Bases: cords.utils.data.dataloader.SL.nonadaptive.submoddataloader.SubmodDataLoader

Implementation of the SumRedundancyDataLoader class for the nonadaptive, sum-redundancy-based subset selection strategy in the supervised learning setting.

Parameters
  • train_loader (torch.utils.data.DataLoader class) – Dataloader of the training dataset

  • val_loader (torch.utils.data.DataLoader class) – Dataloader of the validation dataset

  • dss_args (dict) – Data subset selection arguments dictionary

  • logger (class) – Logger for logging the information
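
A sum redundancy objective penalizes the total pairwise similarity inside the subset, favoring subsets whose members are dissimilar. The sketch below assumes the simple form f(S) = -Σ_{i<j, i,j∈S} s(i, j); the exact expression CORDS uses (for example, an added constant offset) is an assumption here, and `sum_redundancy` is a hypothetical name.

```python
from itertools import combinations
from typing import AbstractSet, Sequence


def sum_redundancy(sim: Sequence[Sequence[float]], subset: AbstractSet[int]) -> float:
    """Assumed form: negative sum of pairwise similarities within the subset."""
    # Each unordered pair (i, j) with i < j is counted once.
    return -sum(sim[i][j] for i, j in combinations(sorted(subset), 2))
```

The marginal gain of adding e is -Σ_{j∈S} s(e, j), which can only shrink as S grows, so the function is submodular for non-negative similarities, although it is not monotone.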

Adaptive Subset Selection Data Loaders

class cords.utils.data.dataloader.SL.adaptive.adaptivedataloader.AdaptiveDSSDataLoader(train_loader, val_loader, dss_args, logger, *args, **kwargs)

Bases: cords.utils.data.dataloader.SL.dssdataloader.DSSDataLoader

Implementation of the AdaptiveDSSDataLoader class, which serves as the base class for the dataloaders of the other adaptive subset selection strategies in the supervised learning framework.

Parameters
  • train_loader (torch.utils.data.DataLoader class) – Dataloader of the training dataset

  • val_loader (torch.utils.data.DataLoader class) – Dataloader of the validation dataset

  • dss_args (dict) – Data subset selection arguments dictionary

  • logger (class) – Logger for logging the information

resample()

Resamples the subset indices and recalculates the subset weights.
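
Unlike the non-adaptive loaders, an adaptive loader recomputes its subset periodically during training, which is the role of `resample()`. The sketch below illustrates one plausible schedule, re-running selection at epoch 0 and then every `select_every` epochs; the parameter name `select_every`, the helper `adaptive_schedule`, and the exact timing are assumptions made for illustration.

```python
from typing import Callable, List


def adaptive_schedule(num_epochs: int, select_every: int,
                      resample: Callable[[int], List[int]]) -> List[List[int]]:
    """Sketch: recompute the subset at epoch 0 and every `select_every` epochs after.

    Returns the subset indices in use during each epoch.
    """
    if select_every < 1:
        raise ValueError("select_every must be >= 1")
    subsets: List[List[int]] = []
    current: List[int] = []
    for epoch in range(num_epochs):
        if epoch % select_every == 0:
            # E.g. re-run the selection strategy using the current model state.
            current = resample(epoch)
        subsets.append(current)
    return subsets
```

A larger `select_every` amortizes the selection cost over more epochs, at the price of a subset that lags further behind the current model.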

class cords.utils.data.dataloader.SL.adaptive.glisterdataloader.GLISTERDataLoader(train_loader, val_loader, dss_args, logger, *args, **kwargs)

Bases: cords.utils.data.dataloader.SL.adaptive.adaptivedataloader.AdaptiveDSSDataLoader

Implementation of GLISTERDataLoader, which serves as the dataloader for the adaptive GLISTER subset selection strategy from the paper [2].

Parameters
  • train_loader (torch.utils.data.DataLoader class) – Dataloader of the training dataset

  • val_loader (torch.utils.data.DataLoader class) – Dataloader of the validation dataset

  • dss_args (dict) – Data subset selection arguments dictionary required for the GLISTER subset selection strategy

  • logger (class) – Logger for logging the information
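
GLISTER greedily picks training points whose gradient step is estimated to reduce the validation loss, using a first-order Taylor approximation: taking a step θ - η g_e lowers the validation loss by about η ∇L_val(θ) · g_e. The toy sketch below assumes a quadratic validation loss 0.5 ||θ - θ_val||² (so its gradient is θ - θ_val) and fixed per-sample training gradients; `glister_greedy` and these assumptions are illustrative only and do not reproduce the CORDS implementation.

```python
from typing import List, Sequence


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def glister_greedy(train_grads: Sequence[Sequence[float]], theta: Sequence[float],
                   theta_val: Sequence[float], budget: int, eta: float = 0.1) -> List[int]:
    """Toy GLISTER-style greedy with a quadratic validation loss (assumption)."""
    theta = list(theta)
    selected: List[int] = []
    for _ in range(min(budget, len(train_grads))):
        # Gradient of 0.5 * ||theta - theta_val||^2 at the current parameters.
        val_grad = [t - tv for t, tv in zip(theta, theta_val)]
        candidates = [e for e in range(len(train_grads)) if e not in selected]
        # First-order estimate of the validation-loss decrease from a step on e.
        gains = {e: eta * _dot(val_grad, train_grads[e]) for e in candidates}
        e_star = max(candidates, key=gains.__getitem__)
        selected.append(e_star)
        # Take the one-step update so later gains are evaluated at the new point.
        theta = [t - eta * g for t, g in zip(theta, train_grads[e_star])]
    return selected
```

Updating θ after each pick is what makes the selection greedy rather than a one-shot ranking: a point that helped before the update may help less afterwards.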

class cords.utils.data.dataloader.SL.adaptive.craigdataloader.CRAIGDataLoader(train_loader, val_loader, dss_args, logger, *args, **kwargs)

Bases: cords.utils.data.dataloader.SL.adaptive.adaptivedataloader.AdaptiveDSSDataLoader

Implementation of CRAIGDataLoader, which serves as the dataloader for the adaptive CRAIG subset selection strategy from the paper [1].

Parameters
  • train_loader (torch.utils.data.DataLoader class) – Dataloader of the training dataset

  • val_loader (torch.utils.data.DataLoader class) – Dataloader of the validation dataset

  • dss_args (dict) – Data subset selection arguments dictionary required for the CRAIG subset selection strategy

  • logger (class) – Logger for logging the information

class cords.utils.data.dataloader.SL.adaptive.gradmatchdataloader.GradMatchDataLoader(train_loader, val_loader, dss_args, logger, *args, **kwargs)

Bases: cords.utils.data.dataloader.SL.adaptive.adaptivedataloader.AdaptiveDSSDataLoader

Implementation of GradMatchDataLoader, which serves as the dataloader for the adaptive GradMatch subset selection strategy from the paper [3].

Parameters
  • train_loader (torch.utils.data.DataLoader class) – Dataloader of the training dataset

  • val_loader (torch.utils.data.DataLoader class) – Dataloader of the validation dataset

  • dss_args (dict) – Data subset selection arguments dictionary required for the GradMatch subset selection strategy

  • logger (class) – Logger for logging the information
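
GradMatch looks for a small weighted subset whose weighted gradient sum matches the full gradient, and solves this with orthogonal matching pursuit (OMP). The sketch below is a simplified OMP on plain-Python gradient vectors: it greedily adds the gradient most aligned with the current residual, then refits all weights by ridge-regularized least squares. The function names, the `lam` and `tol` parameters, and the choice not to constrain the weights' sign are assumptions of this sketch, not details of the CORDS code.

```python
from typing import List, Sequence, Tuple


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def omp_gradient_match(grads: Sequence[Sequence[float]], budget: int,
                       lam: float = 1e-6, tol: float = 1e-9) -> Tuple[List[int], List[float]]:
    """Simplified OMP: choose indices and weights so sum_j w_j * grads[j] ~ sum_i grads[i]."""
    d = len(grads[0])
    target = [sum(g[k] for g in grads) for k in range(d)]  # full-data gradient sum
    selected: List[int] = []
    weights: List[float] = []
    residual = target[:]
    for _ in range(min(budget, len(grads))):
        if _dot(residual, residual) <= tol:
            break  # the current subset already matches the target
        candidates = [j for j in range(len(grads)) if j not in selected]
        # Add the unselected gradient most aligned with the residual.
        selected.append(max(candidates, key=lambda j: _dot(grads[j], residual)))
        # Refit all weights: (G^T G + lam * I) w = G^T target.
        gram = [[_dot(grads[a], grads[b]) + (lam if a == b else 0.0) for b in selected]
                for a in selected]
        rhs = [_dot(grads[a], target) for a in selected]
        weights = _solve(gram, rhs)
        approx = [sum(w * grads[j][k] for w, j in zip(weights, selected)) for k in range(d)]
        residual = [t - p for t, p in zip(target, approx)]
    return selected, weights
```

Refitting every weight after each addition is what distinguishes OMP from plain matching pursuit; the small ridge term `lam` keeps the linear system well conditioned.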

class cords.utils.data.dataloader.SL.adaptive.randomdataloader.RandomDataLoader(train_loader, dss_args, logger, *args, **kwargs)

Bases: cords.utils.data.dataloader.SL.adaptive.adaptivedataloader.AdaptiveDSSDataLoader

Implementation of RandomDataLoader, which serves as the dataloader for the non-adaptive Random subset selection strategy.

Parameters
  • train_loader (torch.utils.data.DataLoader class) – Dataloader of the training dataset

  • dss_args (dict) – Data subset selection arguments dictionary required for the Random subset selection strategy

  • logger (class) – Logger for logging the information
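
The non-adaptive Random strategy amounts to drawing one uniformly random subset and keeping it. The sketch below assumes the budget is given as a fraction of the dataset size and that a seed makes the draw reproducible; `random_subset` and its `fraction` and `seed` parameters are hypothetical names.

```python
import random
from typing import List


def random_subset(num_samples: int, fraction: float, seed: int = 0) -> List[int]:
    """Sketch: one uniformly random subset of size int(fraction * num_samples)."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    budget = int(fraction * num_samples)
    # Sampling without replacement from a seeded generator makes the draw reproducible.
    return sorted(random.Random(seed).sample(range(num_samples), budget))
```

Random selection is the usual baseline for the other strategies: it costs almost nothing to compute, so any gain from a smarter strategy has to outweigh that strategy's selection overhead.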

class cords.utils.data.dataloader.SL.adaptive.olrandomdataloader.OLRandomDataLoader(train_loader, dss_args, logger, *args, **kwargs)

Bases: cords.utils.data.dataloader.SL.adaptive.adaptivedataloader.AdaptiveDSSDataLoader

Implementation of OLRandomDataLoader, which serves as the dataloader for the adaptive Random subset selection strategy.

Parameters
  • train_loader (torch.utils.data.DataLoader class) – Dataloader of the training dataset

  • dss_args (dict) – Data subset selection arguments dictionary required for the Random subset selection strategy

  • logger (class) – Logger for logging the information
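
The adaptive variant of Random selection draws a fresh random subset over time instead of keeping a single one, so that across many epochs the model sees much more of the dataset. The sketch below assumes a new subset is drawn every epoch from one seeded generator; `online_random_subsets` and its parameters are hypothetical names, and the actual resampling frequency in CORDS may differ.

```python
import random
from typing import List


def online_random_subsets(num_samples: int, budget: int, num_epochs: int,
                          seed: int = 0) -> List[List[int]]:
    """Sketch: draw an independent uniformly random subset of `budget` indices per epoch."""
    rng = random.Random(seed)  # one generator, so successive draws differ
    return [sorted(rng.sample(range(num_samples), budget)) for _ in range(num_epochs)]
```

Each epoch still trains on only `budget` samples, but the union of the subsets grows over epochs, which is the main difference from the fixed subset of the non-adaptive Random strategy.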

REFERENCES

[1]

Baharan Mirzasoleiman, Jeff Bilmes, and Jure Leskovec. Coresets for data-efficient training of machine learning models. In Hal Daumé III and Aarti Singh, editors, Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, 6950–6960. PMLR, 13–18 Jul 2020. URL: https://proceedings.mlr.press/v119/mirzasoleiman20a.html.

[2]

Krishnateja Killamsetty, Durga Sivasubramanian, Ganesh Ramakrishnan, and Rishabh Iyer. Glister: generalization based data subset selection for efficient and robust learning. Proceedings of the AAAI Conference on Artificial Intelligence, 35(9):8110–8118, May 2021. URL: https://ojs.aaai.org/index.php/AAAI/article/view/16988.

[3]

Krishnateja Killamsetty, Durga S, Ganesh Ramakrishnan, Abir De, and Rishabh Iyer. Grad-match: gradient matching based data subset selection for efficient deep model training. In Marina Meila and Tong Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, 5464–5474. PMLR, 18–24 Jul 2021. URL: https://proceedings.mlr.press/v139/killamsetty21a.html.