Supervised Learning Subset Selection Data Loaders
This section covers the subset selection based data loaders for efficient and robust learning in the standard supervised learning setting.
DSS Dataloader (Base Class)
- class cords.utils.data.dataloader.SL.dssdataloader.DSSDataLoader(full_data, dss_args, logger, *args, **kwargs)
Bases: object
Implementation of the DSSDataLoader class, which serves as the base class for the dataloaders of the other subset selection strategies in the supervised learning framework.
- Parameters
full_data (torch.utils.data.Dataset Class) – Full dataset from which data subset needs to be selected.
dss_args (dict) – Data subset selection arguments dictionary
logger (class) – Logger class for logging the information
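To illustrate the idea behind this base class, here is a minimal, self-contained sketch of a loader that iterates only over a selected subset of a full dataset. It is not the CORDS implementation: the class name ToySubsetLoader, the select method, the batch_size argument, and the "fraction" key in dss_args are all assumptions made for illustration.

```python
from typing import Iterator, List, Sequence


class ToySubsetLoader:
    """Hypothetical stand-in for a subset selection data loader (not the CORDS class)."""

    def __init__(self, full_data: Sequence, dss_args: dict, batch_size: int = 2):
        self.full_data = full_data
        # Assumed key: the share of the full dataset to keep.
        self.fraction = dss_args["fraction"]
        self.budget = int(len(full_data) * self.fraction)
        self.batch_size = batch_size
        self.subset_indices: List[int] = self.select()

    def select(self) -> List[int]:
        # Placeholder strategy: keep the first `budget` points.
        # A real strategy would override this with its own selection rule.
        return list(range(self.budget))

    def __iter__(self) -> Iterator[list]:
        # Yield batches drawn from the selected subset only.
        for start in range(0, len(self.subset_indices), self.batch_size):
            batch_ids = self.subset_indices[start:start + self.batch_size]
            yield [self.full_data[i] for i in batch_ids]

    def __len__(self) -> int:
        # Number of batches, rounding up.
        return -(-len(self.subset_indices) // self.batch_size)


loader = ToySubsetLoader(full_data=list("abcdefghij"), dss_args={"fraction": 0.5})
batches = list(loader)
```

With ten data points and a fraction of 0.5, the sketch iterates over five points in batches of two, so the training loop sees half of the data per epoch.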
Non-Adaptive Subset Selection Data Loaders
- class cords.utils.data.dataloader.SL.nonadaptive.nonadaptivedataloader.NonAdaptiveDSSDataLoader(train_loader, val_loader, dss_args, logger, *args, **kwargs)
Bases: cords.utils.data.dataloader.SL.dssdataloader.DSSDataLoader
Implementation of the NonAdaptiveDSSDataLoader class, which serves as the base class for the dataloaders of the non-adaptive subset selection strategies in the supervised learning setting.
- Parameters
train_loader (torch.utils.data.DataLoader class) – Dataloader of the training dataset
val_loader (torch.utils.data.DataLoader class) – Dataloader of the validation dataset
dss_args (dict) – Data subset selection arguments dictionary
logger (class) – Logger for logging the information
- class cords.utils.data.dataloader.SL.nonadaptive.craigdataloader.CRAIGDataLoader(train_loader, val_loader, dss_args, logger, *args, **kwargs)
Bases: cords.utils.data.dataloader.SL.nonadaptive.nonadaptivedataloader.NonAdaptiveDSSDataLoader
Implementation of CRAIGDataLoader, which serves as the dataloader for the non-adaptive CRAIG subset selection strategy from the paper [1].
- Parameters
train_loader (torch.utils.data.DataLoader class) – Dataloader of the training dataset
val_loader (torch.utils.data.DataLoader class) – Dataloader of the validation dataset
dss_args (dict) – Data subset selection arguments dictionary required for CRAIG subset selection strategy
logger (class) – Logger for logging the information
- class cords.utils.data.dataloader.SL.nonadaptive.submoddataloader.FacLocDataLoader(train_loader, val_loader, dss_args, logger, *args, **kwargs)
Bases: cords.utils.data.dataloader.SL.nonadaptive.submoddataloader.SubmodDataLoader
Implementation of the FacLocDataLoader class for the non-adaptive facility location based subset selection strategy in the supervised learning setting.
- Parameters
train_loader (torch.utils.data.DataLoader class) – Dataloader of the training dataset
val_loader (torch.utils.data.DataLoader class) – Dataloader of the validation dataset
dss_args (dict) – Data subset selection arguments dictionary
logger (class) – Logger for logging the information
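As a rough illustration of facility location based selection, the sketch below greedily maximizes f(S) = sum over i of max over j in S of sim[i][j] on a small hand-written similarity matrix. The function names, the similarity values, and the plain greedy loop are assumptions for illustration; the loader itself may compute similarities and run the optimization differently.

```python
from typing import List


def facility_location_value(sim: List[List[float]], subset: List[int]) -> float:
    """f(S): each point contributes its similarity to the closest selected point."""
    if not subset:
        return 0.0
    return sum(max(row[j] for j in subset) for row in sim)


def greedy_facility_location(sim: List[List[float]], budget: int) -> List[int]:
    """Add, one at a time, the point that yields the largest value of f."""
    selected: List[int] = []
    remaining = set(range(len(sim)))
    for _ in range(min(budget, len(sim))):
        # Iterating in sorted order makes ties resolve to the lowest index.
        best = max(
            sorted(remaining),
            key=lambda j: facility_location_value(sim, selected + [j]),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy similarities: points 0-2 form one cluster, points 3-5 another.
sim = [
    [10, 8, 5, 1, 1, 1],
    [8, 10, 8, 1, 1, 1],
    [5, 8, 10, 1, 1, 1],
    [1, 1, 1, 10, 7, 4],
    [1, 1, 1, 7, 10, 7],
    [1, 1, 1, 4, 7, 10],
]
chosen = greedy_facility_location(sim, budget=2)
```

On this toy matrix the greedy loop picks the central point of each cluster (indices 1 and 4), which reflects the representativeness that facility location rewards.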
- class cords.utils.data.dataloader.SL.nonadaptive.submoddataloader.GraphCutDataLoader(train_loader, val_loader, dss_args, logger, *args, **kwargs)
Bases: cords.utils.data.dataloader.SL.nonadaptive.submoddataloader.SubmodDataLoader
Implementation of the GraphCutDataLoader class for the non-adaptive graph cut function based subset selection strategy in the supervised learning setting.
- Parameters
train_loader (torch.utils.data.DataLoader class) – Dataloader of the training dataset
val_loader (torch.utils.data.DataLoader class) – Dataloader of the validation dataset
dss_args (dict) – Data subset selection arguments dictionary
logger (class) – Logger for logging the information
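One common form of the graph cut function is f(S) = (sum of sim[i][j] over all points i and selected points j) minus lam times (sum of sim[i][j] over selected i and j). The sketch below evaluates that form on a toy similarity matrix. Whether the loader uses exactly this form is an assumption, as are the names graph_cut_value and lam and the similarity values.

```python
from typing import List


def graph_cut_value(sim: List[List[float]], subset: List[int], lam: float = 0.5) -> float:
    """Coverage of the whole dataset by the subset, minus a redundancy penalty."""
    coverage = sum(sim[i][j] for i in range(len(sim)) for j in subset)
    redundancy = sum(sim[i][j] for i in subset for j in subset)
    return coverage - lam * redundancy


# Toy similarities: points 0 and 1 are near-duplicates, point 2 is further away.
sim = [
    [4, 3, 0, 0],
    [3, 4, 1, 0],
    [0, 1, 4, 2],
    [0, 0, 2, 4],
]
redundant_pair = graph_cut_value(sim, [0, 1])
diverse_pair = graph_cut_value(sim, [1, 2])
```

Both pairs cover the dataset equally well here, but the near-duplicate pair pays a larger redundancy penalty and therefore scores lower.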
- class cords.utils.data.dataloader.SL.nonadaptive.submoddataloader.SaturatedCoverageDataLoader(train_loader, val_loader, dss_args, logger, *args, **kwargs)
Bases: cords.utils.data.dataloader.SL.nonadaptive.submoddataloader.SubmodDataLoader
Implementation of the SaturatedCoverageDataLoader class for the non-adaptive saturated coverage function based subset selection strategy in the supervised learning setting.
- Parameters
train_loader (torch.utils.data.DataLoader class) – Dataloader of the training dataset
val_loader (torch.utils.data.DataLoader class) – Dataloader of the validation dataset
dss_args (dict) – Data subset selection arguments dictionary
logger (class) – Logger for logging the information
- class cords.utils.data.dataloader.SL.nonadaptive.submoddataloader.SubmodDataLoader(train_loader, val_loader, dss_args, logger, *args, **kwargs)
Bases: cords.utils.data.dataloader.SL.nonadaptive.nonadaptivedataloader.NonAdaptiveDSSDataLoader
Implementation of the SubmodDataLoader class for the non-adaptive submodular subset selection strategies in the supervised learning setting.
- Parameters
train_loader (torch.utils.data.DataLoader class) – Dataloader of the training dataset
val_loader (torch.utils.data.DataLoader class) – Dataloader of the validation dataset
dss_args (dict) – Data subset selection arguments dictionary
logger (class) – Logger for logging the information
- class cords.utils.data.dataloader.SL.nonadaptive.submoddataloader.SumRedundancyDataLoader(train_loader, val_loader, dss_args, logger, *args, **kwargs)
Bases: cords.utils.data.dataloader.SL.nonadaptive.submoddataloader.SubmodDataLoader
Implementation of the SumRedundancyDataLoader class for the non-adaptive sum redundancy function based subset selection strategy in the supervised learning setting.
- Parameters
train_loader (torch.utils.data.DataLoader class) – Dataloader of the training dataset
val_loader (torch.utils.data.DataLoader class) – Dataloader of the validation dataset
dss_args (dict) – Data subset selection arguments dictionary
logger (class) – Logger for logging the information
Adaptive Subset Selection Data Loaders
- class cords.utils.data.dataloader.SL.adaptive.adaptivedataloader.AdaptiveDSSDataLoader(train_loader, val_loader, dss_args, logger, *args, **kwargs)
Bases: cords.utils.data.dataloader.SL.dssdataloader.DSSDataLoader
Implementation of the AdaptiveDSSDataLoader class, which serves as the base class for the dataloaders of the adaptive subset selection strategies in the supervised learning framework.
- Parameters
train_loader (torch.utils.data.DataLoader class) – Dataloader of the training dataset
val_loader (torch.utils.data.DataLoader class) – Dataloader of the validation dataset
dss_args (dict) – Data subset selection arguments dictionary
logger (class) – Logger for logging the information
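What distinguishes an adaptive loader from a non-adaptive one is that the subset is re-selected periodically during training rather than fixed up front. The sketch below shows one plausible schedule, assuming re-selection every select_every epochs. The function names, the select_every parameter, and the toy selection rule are illustrative assumptions, not the CORDS API.

```python
from typing import Callable, List


def run_adaptive_schedule(
    num_epochs: int,
    select_every: int,
    select_fn: Callable[[int], List[int]],
) -> List[List[int]]:
    """Return the subset used in each epoch when re-selecting every `select_every` epochs."""
    subset: List[int] = []
    used_per_epoch: List[List[int]] = []
    for epoch in range(num_epochs):
        if epoch % select_every == 0:
            # An adaptive strategy would consult the current model state here.
            subset = select_fn(epoch)
        used_per_epoch.append(list(subset))
    return used_per_epoch


def toy_select(epoch: int) -> List[int]:
    # Hypothetical stand-in for a model-dependent strategy:
    # the chosen indices simply shift with the epoch.
    return [epoch, epoch + 1]


schedule = run_adaptive_schedule(num_epochs=6, select_every=3, select_fn=toy_select)
```

In this run the subset chosen at epoch 0 is reused for epochs 1 and 2, and a new subset is chosen at epoch 3. A larger select_every lowers selection overhead at the cost of training on a staler subset.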
- class cords.utils.data.dataloader.SL.adaptive.glisterdataloader.GLISTERDataLoader(train_loader, val_loader, dss_args, logger, *args, **kwargs)
Bases: cords.utils.data.dataloader.SL.adaptive.adaptivedataloader.AdaptiveDSSDataLoader
Implementation of GLISTERDataLoader, which serves as the dataloader for the adaptive GLISTER subset selection strategy from the paper [2].
- Parameters
train_loader (torch.utils.data.DataLoader class) – Dataloader of the training dataset
val_loader (torch.utils.data.DataLoader class) – Dataloader of the validation dataset
dss_args (dict) – Data subset selection arguments dictionary required for GLISTER subset selection strategy
logger (class) – Logger for logging the information
- class cords.utils.data.dataloader.SL.adaptive.craigdataloader.CRAIGDataLoader(train_loader, val_loader, dss_args, logger, *args, **kwargs)
Bases: cords.utils.data.dataloader.SL.adaptive.adaptivedataloader.AdaptiveDSSDataLoader
Implementation of CRAIGDataLoader, which serves as the dataloader for the adaptive CRAIG subset selection strategy from the paper [1].
- Parameters
train_loader (torch.utils.data.DataLoader class) – Dataloader of the training dataset
val_loader (torch.utils.data.DataLoader class) – Dataloader of the validation dataset
dss_args (dict) – Data subset selection arguments dictionary required for CRAIG subset selection strategy
logger (class) – Logger for logging the information
- class cords.utils.data.dataloader.SL.adaptive.gradmatchdataloader.GradMatchDataLoader(train_loader, val_loader, dss_args, logger, *args, **kwargs)
Bases: cords.utils.data.dataloader.SL.adaptive.adaptivedataloader.AdaptiveDSSDataLoader
Implementation of GradMatchDataLoader, which serves as the dataloader for the adaptive GradMatch subset selection strategy from the paper [3].
- Parameters
train_loader (torch.utils.data.DataLoader class) – Dataloader of the training dataset
val_loader (torch.utils.data.DataLoader class) – Dataloader of the validation dataset
dss_args (dict) – Data subset selection arguments dictionary required for GradMatch subset selection strategy
logger (class) – Logger for logging the information
- class cords.utils.data.dataloader.SL.adaptive.randomdataloader.RandomDataLoader(train_loader, dss_args, logger, *args, **kwargs)
Bases: cords.utils.data.dataloader.SL.adaptive.adaptivedataloader.AdaptiveDSSDataLoader
Implementation of RandomDataLoader, which serves as the dataloader for the non-adaptive Random subset selection strategy.
- Parameters
train_loader (torch.utils.data.DataLoader class) – Dataloader of the training dataset
dss_args (dict) – Data subset selection arguments dictionary required for Random subset selection strategy
logger (class) – Logger for logging the information
- class cords.utils.data.dataloader.SL.adaptive.olrandomdataloader.OLRandomDataLoader(train_loader, dss_args, logger, *args, **kwargs)
Bases: cords.utils.data.dataloader.SL.adaptive.adaptivedataloader.AdaptiveDSSDataLoader
Implementation of OLRandomDataLoader, which serves as the dataloader for the adaptive Random subset selection strategy.
- Parameters
train_loader (torch.utils.data.DataLoader class) – Dataloader of the training dataset
dss_args (dict) – Data subset selection arguments dictionary required for Random subset selection strategy
logger (class) – Logger for logging the information
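The difference between the non-adaptive and adaptive Random strategies can be sketched as follows, assuming that the non-adaptive variant draws one random subset and reuses it, while the adaptive variant draws a fresh random subset at every selection round. The function names and the seed parameter are hypothetical, and the actual loaders may differ in detail.

```python
import random
from typing import List


def random_subset(num_points: int, budget: int, rng: random.Random) -> List[int]:
    """Draw `budget` distinct indices uniformly at random."""
    return sorted(rng.sample(range(num_points), budget))


def fixed_random_schedule(num_points: int, budget: int, rounds: int, seed: int = 0) -> List[List[int]]:
    """Non-adaptive variant: one draw, reused in every selection round."""
    rng = random.Random(seed)
    subset = random_subset(num_points, budget, rng)
    return [list(subset) for _ in range(rounds)]


def online_random_schedule(num_points: int, budget: int, rounds: int, seed: int = 0) -> List[List[int]]:
    """Adaptive variant: a fresh draw in every selection round."""
    rng = random.Random(seed)
    return [random_subset(num_points, budget, rng) for _ in range(rounds)]


fixed = fixed_random_schedule(num_points=100, budget=10, rounds=3)
online = online_random_schedule(num_points=100, budget=10, rounds=3)
```

Both variants respect the same budget in every round. Only the online variant exposes the model to different parts of the dataset over the course of training.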
REFERENCES
- 1
Baharan Mirzasoleiman, Jeff Bilmes, and Jure Leskovec. Coresets for data-efficient training of machine learning models. In Hal Daumé III and Aarti Singh, editors, Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, 6950–6960. PMLR, 13–18 Jul 2020. URL: https://proceedings.mlr.press/v119/mirzasoleiman20a.html.
- 2
Krishnateja Killamsetty, Durga Sivasubramanian, Ganesh Ramakrishnan, and Rishabh Iyer. GLISTER: generalization based data subset selection for efficient and robust learning. Proceedings of the AAAI Conference on Artificial Intelligence, 35(9):8110–8118, May 2021. URL: https://ojs.aaai.org/index.php/AAAI/article/view/16988.
- 3
Krishnateja Killamsetty, Durga S, Ganesh Ramakrishnan, Abir De, and Rishabh Iyer. GRAD-MATCH: gradient matching based data subset selection for efficient deep model training. In Marina Meila and Tong Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, 5464–5474. PMLR, 18–24 Jul 2021. URL: https://proceedings.mlr.press/v139/killamsetty21a.html.