Semi-supervised Learning Subset Selection Data Loaders

In this section, we consider different subset selection based data loaders geared towards efficient and robust learning in the standard semi-supervised learning setting.

DSS Dataloader (Base Class)

class cords.utils.data.dataloader.SSL.dssdataloader.DSSDataLoader(full_data, dss_args, logger, *args, **kwargs)[source]

Bases: object

Implementation of the DSSDataLoader class, which serves as the base class for the dataloaders of the other selection strategies in the semi-supervised learning framework.

Parameters
  • full_data (torch.utils.data.Dataset Class) – Full dataset from which data subset needs to be selected.

  • dss_args (dict) – Data subset selection arguments dictionary

  • logger (class) – Logger class for logging the information

Non-Adaptive Subset Selection Data Loaders

class cords.utils.data.dataloader.SSL.nonadaptive.nonadaptivedataloader.NonAdaptiveDSSDataLoader(train_loader, val_loader, dss_args, logger, *args, **kwargs)[source]

Bases: cords.utils.data.dataloader.SSL.dssdataloader.DSSDataLoader

Implementation of the NonAdaptiveDSSDataLoader class, which serves as the base class for the dataloaders of the other nonadaptive subset selection strategies in the semi-supervised learning setting.

Parameters
  • train_loader (torch.utils.data.DataLoader class) – Dataloader of the training dataset

  • val_loader (torch.utils.data.DataLoader class) – Dataloader of the validation dataset

  • dss_args (dict) – Data subset selection arguments dictionary

  • logger (class) – Logger for logging the information

class cords.utils.data.dataloader.SSL.nonadaptive.craigdataloader.CRAIGDataLoader(train_loader, val_loader, dss_args, logger, *args, **kwargs)[source]

Bases: cords.utils.data.dataloader.SSL.nonadaptive.nonadaptivedataloader.NonAdaptiveDSSDataLoader

Implementation of CRAIGDataLoader, which serves as the dataloader for the nonadaptive CRAIG subset selection strategy for semi-supervised learning. It is adapted from the paper [1].

Parameters
  • train_loader (torch.utils.data.DataLoader class) – Dataloader of the training dataset

  • val_loader (torch.utils.data.DataLoader class) – Dataloader of the validation dataset

  • dss_args (dict) – Data subset selection arguments dictionary required for CRAIG subset selection strategy

  • logger (class) – Logger for logging the information
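The CRAIG paper [1] weights each selected point by the number of points it represents. The following is a minimal sketch of that weighting step only, assuming a precomputed similarity matrix. The function name craig_style_weights and the toy values are hypothetical, and this is not the CORDS implementation.

```python
from collections import Counter


def craig_style_weights(similarity, selected):
    """Weight each selected index by how many points it represents."""
    counts = Counter()
    for row in similarity:
        # Assign this point to the selected index it is most similar to.
        nearest = max(selected, key=lambda j: row[j])
        counts[nearest] += 1
    return [counts[j] for j in selected]


# Toy symmetric similarity matrix (hypothetical values, not real gradients).
similarity = [
    [1.0, 0.75, 0.25, 0.125],
    [0.75, 1.0, 0.5, 0.125],
    [0.25, 0.5, 1.0, 0.25],
    [0.125, 0.125, 0.25, 1.0],
]
weights = craig_style_weights(similarity, selected=[1, 3])
print(weights)  # [3, 1]
```

Points 0, 1 and 2 are closest to index 1 and point 3 is closest to itself, so the weights sum to the size of the full set.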

class cords.utils.data.dataloader.SSL.nonadaptive.submoddataloader.FacLocDataLoader(train_loader, val_loader, dss_args, logger, *args, **kwargs)[source]

Bases: cords.utils.data.dataloader.SSL.nonadaptive.submoddataloader.SubmodDataLoader

Implementation of the FacLocDataLoader class for the nonadaptive facility location based subset selection strategy in the semi-supervised learning setting.

Parameters
  • train_loader (torch.utils.data.DataLoader class) – Dataloader of the training dataset

  • val_loader (torch.utils.data.DataLoader class) – Dataloader of the validation dataset

  • dss_args (dict) – Data subset selection arguments dictionary

  • logger (class) – Logger for logging the information
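As an illustration of what a facility location based selection optimizes, the sketch below runs a naive greedy maximization of f(S) = sum_i max_{j in S} s_ij over a toy similarity matrix. The helper names and the matrix are hypothetical; CORDS may compute similarities and run the optimization differently.

```python
def facility_location(similarity, subset):
    """f(S) = sum over all points i of max over j in S of similarity[i][j]."""
    if not subset:
        return 0.0
    return sum(max(row[j] for j in subset) for row in similarity)


def greedy_select(similarity, budget):
    """Naive greedy maximization of the facility location value."""
    selected = []
    remaining = list(range(len(similarity)))
    for _ in range(budget):
        # Pick the candidate whose addition yields the largest objective value.
        best = max(
            remaining,
            key=lambda j: facility_location(similarity, selected + [j]),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy symmetric similarity matrix (hypothetical values).
similarity = [
    [1.0, 0.75, 0.25, 0.125],
    [0.75, 1.0, 0.5, 0.125],
    [0.25, 0.5, 1.0, 0.25],
    [0.125, 0.125, 0.25, 1.0],
]
subset = greedy_select(similarity, budget=2)
print(subset, facility_location(similarity, subset))  # [1, 3] 3.25
```

Index 1 is picked first because it is the most central point. Index 3 follows because it covers the point that index 1 represents worst.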

class cords.utils.data.dataloader.SSL.nonadaptive.submoddataloader.GraphCutDataLoader(train_loader, val_loader, dss_args, logger, *args, **kwargs)[source]

Bases: cords.utils.data.dataloader.SSL.nonadaptive.submoddataloader.SubmodDataLoader

Implementation of the GraphCutDataLoader class for the nonadaptive graph cut function based subset selection strategy in the semi-supervised learning setting.

Parameters
  • train_loader (torch.utils.data.DataLoader class) – Dataloader of the training dataset

  • val_loader (torch.utils.data.DataLoader class) – Dataloader of the validation dataset

  • dss_args (dict) – Data subset selection arguments dictionary

  • logger (class) – Logger for logging the information
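One common form of the graph cut function is f(S) = sum_{i in V} sum_{j in S} s_ij - lam * sum_{i, j in S} s_ij, which trades coverage of the full set against redundancy within the subset. The sketch below assumes that form. The parameter name lam, the helper names, and the toy matrix are hypothetical and are not taken from CORDS.

```python
def graph_cut(similarity, subset, lam):
    """Coverage of the full set minus lam times redundancy inside the subset."""
    coverage = sum(row[j] for row in similarity for j in subset)
    redundancy = sum(similarity[i][j] for i in subset for j in subset)
    return coverage - lam * redundancy


def greedy_select(similarity, budget, lam):
    """Naive greedy selection under the graph cut objective."""
    selected = []
    remaining = list(range(len(similarity)))
    for _ in range(budget):
        # Pick the candidate whose addition yields the largest objective value.
        best = max(
            remaining,
            key=lambda j: graph_cut(similarity, selected + [j], lam),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy symmetric similarity matrix (hypothetical values).
similarity = [
    [1.0, 0.75, 0.25, 0.125],
    [0.75, 1.0, 0.5, 0.125],
    [0.25, 0.5, 1.0, 0.25],
    [0.125, 0.125, 0.25, 1.0],
]
balanced = greedy_select(similarity, budget=2, lam=0.5)
diverse = greedy_select(similarity, budget=2, lam=1.0)
print(balanced, diverse)  # [1, 2] [1, 3]
```

Raising lam penalizes similar pairs more strongly, so the second pick moves from index 2 to the more distant index 3.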

class cords.utils.data.dataloader.SSL.nonadaptive.submoddataloader.SaturatedCoverageDataLoader(train_loader, val_loader, dss_args, logger, *args, **kwargs)[source]

Bases: cords.utils.data.dataloader.SSL.nonadaptive.submoddataloader.SubmodDataLoader

Implementation of the SaturatedCoverageDataLoader class for the nonadaptive saturated coverage function based subset selection strategy in the semi-supervised learning setting.

Parameters
  • train_loader (torch.utils.data.DataLoader class) – Dataloader of the training dataset

  • val_loader (torch.utils.data.DataLoader class) – Dataloader of the validation dataset

  • dss_args (dict) – Data subset selection arguments dictionary

  • logger (class) – Logger for logging the information
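A saturated coverage function is commonly written as f(S) = sum_i min(sum_{j in S} s_ij, alpha * sum_{j in V} s_ij): each point contributes its coverage by the subset, capped at a fraction alpha of its total similarity mass. The sketch below assumes that form. The parameter name alpha, the helper names, and the toy matrix are hypothetical.

```python
def saturated_coverage(similarity, subset, alpha):
    """Sum of per-point coverage, each capped at alpha times the row total."""
    total = 0.0
    for row in similarity:
        covered = sum(row[j] for j in subset)
        cap = alpha * sum(row)
        total += min(covered, cap)
    return total


def greedy_select(similarity, budget, alpha):
    """Naive greedy selection under the saturated coverage objective."""
    selected = []
    remaining = list(range(len(similarity)))
    for _ in range(budget):
        # Pick the candidate whose addition yields the largest objective value.
        best = max(
            remaining,
            key=lambda j: saturated_coverage(similarity, selected + [j], alpha),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy symmetric similarity matrix (hypothetical values).
similarity = [
    [1.0, 0.75, 0.25, 0.125],
    [0.75, 1.0, 0.5, 0.125],
    [0.25, 0.5, 1.0, 0.25],
    [0.125, 0.125, 0.25, 1.0],
]
subset = greedy_select(similarity, budget=2, alpha=0.5)
print(subset, saturated_coverage(similarity, subset, alpha=0.5))  # [1, 2] 3.5625
```

Once a point reaches its cap, covering it further adds nothing, which steers later picks towards points that are still under-covered.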

class cords.utils.data.dataloader.SSL.nonadaptive.submoddataloader.SubmodDataLoader(train_loader, val_loader, dss_args, logger, *args, **kwargs)[source]

Bases: cords.utils.data.dataloader.SSL.nonadaptive.nonadaptivedataloader.NonAdaptiveDSSDataLoader

Implementation of the SubmodDataLoader class for the nonadaptive submodular subset selection strategies in the semi-supervised learning setting.

Parameters
  • train_loader (torch.utils.data.DataLoader class) – Dataloader of the training dataset

  • val_loader (torch.utils.data.DataLoader class) – Dataloader of the validation dataset

  • dss_args (dict) – Data subset selection arguments dictionary

  • logger (class) – Logger for logging the information

class cords.utils.data.dataloader.SSL.nonadaptive.submoddataloader.SumRedundancyDataLoader(train_loader, val_loader, dss_args, logger, *args, **kwargs)[source]

Bases: cords.utils.data.dataloader.SSL.nonadaptive.submoddataloader.SubmodDataLoader

Implementation of the SumRedundancyDataLoader class for the nonadaptive sum redundancy function based subset selection strategy in the semi-supervised learning setting.

Parameters
  • train_loader (torch.utils.data.DataLoader class) – Dataloader of the training dataset

  • val_loader (torch.utils.data.DataLoader class) – Dataloader of the validation dataset

  • dss_args (dict) – Data subset selection arguments dictionary

  • logger (class) – Logger for logging the information
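One common formulation of a sum redundancy function is f(S) = sum_{i, j in V} s_ij - sum_{i, j in S} s_ij, which penalizes pairwise similarity inside the selected subset. The sketch below assumes that formulation, which may differ in detail from the one CORDS relies on. All names and values are hypothetical.

```python
def sum_redundancy(similarity, subset):
    """Total similarity mass minus the similarity mass inside the subset."""
    total = sum(sum(row) for row in similarity)
    within = sum(similarity[i][j] for i in subset for j in subset)
    return total - within


def greedy_select(similarity, budget):
    """Naive greedy selection that keeps within-subset similarity low."""
    selected = []
    remaining = list(range(len(similarity)))
    for _ in range(budget):
        # Ties (such as the very first pick) go to the lowest index.
        best = max(
            remaining,
            key=lambda j: sum_redundancy(similarity, selected + [j]),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy symmetric similarity matrix (hypothetical values).
similarity = [
    [1.0, 0.75, 0.25, 0.125],
    [0.75, 1.0, 0.5, 0.125],
    [0.25, 0.5, 1.0, 0.25],
    [0.125, 0.125, 0.25, 1.0],
]
subset = greedy_select(similarity, budget=2)
print(subset, sum_redundancy(similarity, subset))  # [0, 3] 5.75
```

Every single point scores the same here, so the first pick is arbitrary. The second pick is the point least similar to the first.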

Adaptive Subset Selection Data Loaders

class cords.utils.data.dataloader.SSL.adaptive.adaptivedataloader.AdaptiveDSSDataLoader(train_loader, val_loader, dss_args, logger, *args, **kwargs)[source]

Bases: cords.utils.data.dataloader.SSL.dssdataloader.DSSDataLoader

Implementation of the AdaptiveDSSDataLoader class, which serves as the base class for the dataloaders of the other adaptive subset selection strategies in the semi-supervised learning framework.

Parameters
  • train_loader (torch.utils.data.DataLoader class) – Dataloader of the training dataset

  • val_loader (torch.utils.data.DataLoader class) – Dataloader of the validation dataset

  • dss_args (dict) – Data subset selection arguments dictionary

  • logger (class) – Logger for logging the information

resample()[source]

Function that resamples the subset indices and recalculates the subset weights
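To make the role of resample() concrete, here is a self-contained toy sketch of an adaptive loader that refreshes its subset at a fixed interval. The class ToyAdaptiveLoader, its parameters (fraction, select_every, seed), and the uniform random selection are all assumptions made for illustration. It does not reproduce the CORDS class or its arguments.

```python
import random


class ToyAdaptiveLoader:
    """Hypothetical stand-in for an adaptive subset loader (not the CORDS class)."""

    def __init__(self, num_samples, fraction, select_every, seed=0):
        self.num_samples = num_samples
        self.budget = int(num_samples * fraction)
        self.select_every = select_every
        self.rng = random.Random(seed)
        self.epoch = 0
        self.resample_count = 0
        self.subset_indices = []
        self.subset_weights = []
        self.resample()  # initial selection

    def resample(self):
        """Draw new subset indices and recompute their weights (uniform here)."""
        population = range(self.num_samples)
        self.subset_indices = sorted(self.rng.sample(population, self.budget))
        self.subset_weights = [1.0] * self.budget
        self.resample_count += 1

    def start_epoch(self):
        """Refresh the subset every `select_every` epochs, then return it."""
        if self.epoch > 0 and self.epoch % self.select_every == 0:
            self.resample()
        self.epoch += 1
        return self.subset_indices, self.subset_weights


loader = ToyAdaptiveLoader(num_samples=40, fraction=0.25, select_every=5)
for _ in range(12):
    indices, weights = loader.start_epoch()
print(loader.resample_count)  # 3: once at construction, then at epochs 5 and 10
```

A real strategy would replace the random draw inside resample() with its own selection rule and would typically return non-uniform weights.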

class cords.utils.data.dataloader.SSL.adaptive.retrievedataloader.RETRIEVEDataLoader(train_loader, val_loader, dss_args, logger, *args, **kwargs)[source]

Bases: cords.utils.data.dataloader.SSL.adaptive.adaptivedataloader.AdaptiveDSSDataLoader

Implementation of RETRIEVEDataLoader, which serves as the dataloader for the adaptive RETRIEVE subset selection strategy from the paper [2].

Parameters
  • train_loader (torch.utils.data.DataLoader class) – Dataloader of the training dataset

  • val_loader (torch.utils.data.DataLoader class) – Dataloader of the validation dataset

  • dss_args (dict) – Data subset selection arguments dictionary required for RETRIEVE subset selection strategy

  • logger (class) – Logger for logging the information

class cords.utils.data.dataloader.SSL.adaptive.craigdataloader.CRAIGDataLoader(train_loader, val_loader, dss_args, logger, *args, **kwargs)[source]

Bases: cords.utils.data.dataloader.SSL.adaptive.adaptivedataloader.AdaptiveDSSDataLoader

Implementation of CRAIGDataLoader, which serves as the dataloader for the adaptive CRAIG subset selection strategy for semi-supervised learning. It is adapted from the paper [1].

Parameters
  • train_loader (torch.utils.data.DataLoader class) – Dataloader of the training dataset

  • val_loader (torch.utils.data.DataLoader class) – Dataloader of the validation dataset

  • dss_args (dict) – Data subset selection arguments dictionary required for CRAIG subset selection strategy

  • logger (class) – Logger for logging the information

class cords.utils.data.dataloader.SSL.adaptive.gradmatchdataloader.GradMatchDataLoader(train_loader, val_loader, dss_args, logger, *args, **kwargs)[source]

Bases: cords.utils.data.dataloader.SSL.adaptive.adaptivedataloader.AdaptiveDSSDataLoader

Implementation of GradMatchDataLoader, which serves as the dataloader for the adaptive GradMatch subset selection strategy for semi-supervised learning. It is adapted from the version given in the paper [3].

Parameters
  • train_loader (torch.utils.data.DataLoader class) – Dataloader of the training dataset

  • val_loader (torch.utils.data.DataLoader class) – Dataloader of the validation dataset

  • dss_args (dict) – Data subset selection arguments dictionary required for GradMatch subset selection strategy

  • logger (class) – Logger for logging the information
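Gradient matching looks for a small weighted subset whose combined gradient approximates a target gradient. The sketch below uses plain matching pursuit on toy vectors, a simpler relative of the orthogonal matching pursuit described in the GradMatch paper. It is an illustration under that simplification, with hypothetical names and data, and not the CORDS implementation.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(gradients, target, budget):
    """Greedily pick gradients and weights so the weighted sum tracks target."""
    residual = list(target)
    weights = {}
    for _ in range(budget):
        candidates = [i for i in range(len(gradients)) if i not in weights]
        # Choose the gradient with the largest inner product with the residual.
        best = max(candidates, key=lambda i: dot(gradients[i], residual))
        step = dot(gradients[best], residual) / dot(gradients[best], gradients[best])
        if step <= 0:
            break  # no remaining gradient reduces the residual
        weights[best] = step
        residual = [r - step * g for r, g in zip(residual, gradients[best])]
    return weights, residual


# Toy per-sample "gradients" (hypothetical 2-D vectors).
gradients = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-0.5, 0.0]]
# Target: the sum of all per-sample gradients, here [1.5, 2.0].
target = [sum(column) for column in zip(*gradients)]

weights, residual = matching_pursuit(gradients, target, budget=2)
print(weights, residual)  # {2: 1.75, 1: 0.25} [-0.25, 0.0]
```

Two weighted samples reproduce the four-sample gradient sum up to a residual of [-0.25, 0.0].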

class cords.utils.data.dataloader.SSL.adaptive.randomdataloader.RandomDataLoader(train_loader, dss_args, logger, *args, **kwargs)[source]

Bases: cords.utils.data.dataloader.SSL.adaptive.adaptivedataloader.AdaptiveDSSDataLoader

Implementation of RandomDataLoader, which serves as the dataloader for the non-adaptive Random subset selection strategy.

Parameters
  • train_loader (torch.utils.data.DataLoader class) – Dataloader of the training dataset

  • dss_args (dict) – Data subset selection arguments dictionary required for Random subset selection strategy

  • logger (class) – Logger for logging the information

class cords.utils.data.dataloader.SSL.adaptive.olrandomdataloader.OLRandomDataLoader(train_loader, dss_args, logger, *args, **kwargs)[source]

Bases: cords.utils.data.dataloader.SSL.adaptive.adaptivedataloader.AdaptiveDSSDataLoader

Implementation of OLRandomDataLoader, which serves as the dataloader for the adaptive Random subset selection strategy.

Parameters
  • train_loader (torch.utils.data.DataLoader class) – Dataloader of the training dataset

  • dss_args (dict) – Data subset selection arguments dictionary required for Random subset selection strategy

  • logger (class) – Logger for logging the information
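The difference between the non-adaptive and the adaptive Random strategies can be sketched in a few lines: the former draws one random subset and reuses it, while the latter draws a fresh subset at every selection round. The helper random_subset and the sizes below are hypothetical and chosen only for illustration.

```python
import random


def random_subset(num_samples, budget, rng):
    """Draw `budget` distinct indices uniformly at random."""
    return sorted(rng.sample(range(num_samples), budget))


rng = random.Random(0)
rounds = 3

# Non-adaptive random: one draw, reused in every selection round.
fixed = random_subset(1000, 100, rng)
history_fixed = [fixed for _ in range(rounds)]

# Adaptive (online) random: a new draw in every selection round.
history_online = [random_subset(1000, 100, rng) for _ in range(rounds)]

print(history_fixed[0] == history_fixed[-1])    # True
print(history_online[0] == history_online[-1])  # False for this seed
```

Both variants keep the same budget per round. Only the online variant lets the model see different samples as training proceeds.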

REFERENCES

[1] Baharan Mirzasoleiman, Jeff Bilmes, and Jure Leskovec. Coresets for data-efficient training of machine learning models. In Hal Daumé III and Aarti Singh, editors, Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, 6950–6960. PMLR, 13–18 Jul 2020. URL: https://proceedings.mlr.press/v119/mirzasoleiman20a.html.

[2] Krishnateja Killamsetty, Xujiang Zhao, Feng Chen, and Rishabh K Iyer. RETRIEVE: coreset selection for efficient and robust semi-supervised learning. In A. Beygelzimer, Y. Dauphin, P. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems. 2021. URL: https://openreview.net/forum?id=jSz59N8NvUP.

[3] Krishnateja Killamsetty, Durga S, Ganesh Ramakrishnan, Abir De, and Rishabh Iyer. Grad-match: gradient matching based data subset selection for efficient deep model training. In Marina Meila and Tong Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, 5464–5474. PMLR, 18–24 Jul 2021. URL: https://proceedings.mlr.press/v139/killamsetty21a.html.