Subset Selection Dataloaders
Subset selection strategies are integrated directly into subset data loaders, so using a strategy only requires instantiating its corresponding subset selection data loader.
The example below shows how, in the supervised learning setting, the subset selection process reduces to constructing a data loader:
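from dotmap import DotMap
# Import path follows the CORDS package layout; adjust it if your installed version differs
from cords.utils.data.dataloader.SL.adaptive import GLISTERDataLoader

# criterion_nored is a loss with no reduction, so it returns one loss value per sample
dss_args = dict(model=model,
                loss=criterion_nored,
                eta=0.01,
                num_classes=10,
                num_epochs=300,
                device='cuda',
                fraction=0.1,
                select_every=20,
                kappa=0,
                linear_layer=False,
                selection_type='SL',
                greedy='Stochastic')
dss_args = DotMap(dss_args)

dataloader = GLISTERDataLoader(trainloader, valloader, dss_args, logger,
                               batch_size=20,
                               shuffle=True,
                               pin_memory=False)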
# Training loop using a weighted loss. It differs from the standard PyTorch training
# loop in that the subset data loader yields per-sample weights along with the data
# samples and their target labels. These weights are used to compute a weighted loss
# for gradient descent from a default PyTorch loss function with no reduction.
for epoch in range(dss_args.num_epochs):
    for inputs, targets, weights in dataloader:
        # Move inputs, targets, and weights to the required device
        inputs = inputs.to(dss_args.device)
        targets = targets.to(dss_args.device, non_blocking=True)
        weights = weights.to(dss_args.device)
        # Zero the optimizer gradients to prevent gradient accumulation
        optimizer.zero_grad()
        # Model forward pass over the inputs
        outputs = model(inputs)
        # Get individual sample losses with no reduction
        losses = criterion_nored(outputs, targets)
        # Get the weighted loss as a dot product of the losses vector with the normalized sample weights
        loss = torch.dot(losses, weights / weights.sum())
        # Backpropagate the weighted loss
        loss.backward()
        # Update the model parameters from the gradient values
        optimizer.step()
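To make the weighted loss concrete, the arithmetic of the dot product above can be sketched in plain Python, without PyTorch or a GPU. This is only an illustration under stated assumptions: the helper names `cross_entropy_per_sample` and `weighted_loss` are hypothetical and not part of the library, cross-entropy stands in for whatever loss `criterion_nored` wraps, and the logits and weights are made-up numbers.

```python
import math

def cross_entropy_per_sample(logits, targets):
    """Per-sample cross-entropy, mirroring a loss computed with no reduction."""
    losses = []
    for row, target in zip(logits, targets):
        # Log-sum-exp with the row maximum subtracted for numerical stability
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        losses.append(log_z - row[target])
    return losses

def weighted_loss(losses, weights):
    """Dot product of the losses with the weights normalized to sum to 1."""
    total = sum(weights)
    return sum(l * w / total for l, w in zip(losses, weights))

# Illustrative batch of three samples over three classes
logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3], [-0.5, 1.5, 0.0]]
targets = [0, 2, 1]
losses = cross_entropy_per_sample(logits, targets)

# Uniform weights reproduce the ordinary mean loss
uniform = weighted_loss(losses, [1.0, 1.0, 1.0])
# Non-uniform weights emphasize some samples; a zero weight drops a sample entirely
skewed = weighted_loss(losses, [3.0, 1.0, 0.0])
# Normalization makes the result independent of the overall scale of the weights
rescaled = weighted_loss(losses, [6.0, 2.0, 0.0])

print(uniform, sum(losses) / len(losses))
print(skewed, rescaled)
```

Because the weights are divided by their sum, only their relative sizes matter, and a batch with equal weights behaves exactly like a standard mean-reduced loss.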
The current version provides subset selection data loaders for supervised learning and semi-supervised learning settings.