PyData Global Sprint

PyTorch-Ignite

High-level library to help with training and evaluating neural networks in PyTorch flexibly and transparently.

if __name__ == "__main__":
    print("Let's have fun helping PyTorch-Ignite open source project !")

Slides: https://pytorch-ignite.github.io/pydata-global2021-slides/

Maintainers leading the sprint

	Priyansi @Priyansi	A CS Undergrad. Currently working on the docs of PyTorch-Ignite and helping manage the community
	Jeff Yang@ydcjeff	Contributor to PyTorch-Ignite and its related projects
	Ahmed @KickItLikeShika	A CS Undergrad and a Machine Learning Intern at Factmata working on NLP and contributor to PyTorch-Ignite.
	Victor @vfdev-5	Software Engineer at Quansight working on AI-related open source projects

Content

About the PyTorch-Ignite project
What are PyTorch and PyTorch-Ignite ?
Quick-start PyTorch-Ignite example
How you can help ?

About “PyTorch-Ignite” project

Community-driven open source and NumFOCUS Affiliated Project

maintained by volunteers in the PyTorch community:

@vfdev-5, @ydcjeff, @KickItLikeShika, @sdesrozis, @alykhantejani, @anmolsjoshi,
@trsvchn, @fco-dv, @Priyansi, @Moh-Yakoub, @gucifer, @Ishan-Kumar2 ...

With the support of:

Community Engagement

Google Summer of Code 2021
- Mentored two great students (Ahmed and Arpan)
Google Season of Docs 2021
- Working with great tech writer (Priyansi)
Hacktoberfest 2020 and 2021
PyData Global Mentored Sprint 2020 and 2021
Public meetings on Discord, open to everyone

Stay tuned for upcoming events …

PyTorch in a nutshell

import torch
import torch.nn as nn

device = "cuda"

class MyNN(nn.Module):
    def __init__(self):
        super(MyNN, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = MyNN().to(device)

tensor manipulations (device: CPUs, GPUs, TPUs)
NN components, optimizers, loss functions
Distributed computations
Profiling
other cool features …
Domain libraries: vision, text, audio
Rich ecosystem

https://pytorch.org/tutorials/beginner/basics/intro.html

Quick-start ML with PyTorch

Computer Vision example with Fashion MNIST

Problem: 1 - how to classify images ?

model(image) -> predicted label

2 - How measure model performances ?

predicted labels vs correct labels

Quick-start ML with PyTorch

Setup training and testing data

from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda, Compose

# Setup training/test data
training_data = datasets.FashionMNIST(root="data", train=True, download=True, transform=ToTensor())
test_data = datasets.FashionMNIST(root="data", train=False, transform=ToTensor())

batch_size = 64

# Create data loaders
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

# Optionally, for debugging:
for X, y in test_dataloader:
    print("Shape of X [N, C, H, W]: ", X.shape)
    print("Shape of y: ", y.shape, y.dtype)
    break

# Output:
# Shape of X [N, C, H, W]:  torch.Size([64, 1, 28, 28])
# Shape of y:  torch.Size([64]) torch.int64

Quick-start ML with PyTorch

Create a model

import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(device)

Quick-start ML with PyTorch

Model training
- Loss function: cross-entropy
- Optimization with Stochastic Gradient Descent

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

def train(dataloader, model, loss_fn, optimizer):
    model.train()
    for X, y in dataloader:
        X, y = X.to(device), y.to(device)
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

def test(dataloader, model, loss_fn):
    # code to compute and print average loss and accuracy

epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")

Why using PyTorch without Ignite is suboptimal ?

For NN training and evaluation:

PyTorch gives only “low”-level building components
Common bricks to code in any user project:
- metrics
- checkpointing, best model saving, early stopping, …
- logging to experiment tracking systems
- code adaptation for device (e.g. GPU, XLA)

Pure PyTorch code


model = Net()
train_loader, val_loader = get_data_loaders(train_batch_size, val_batch_size)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.8)
criterion = torch.nn.NLLLoss()

max_epochs = 10
validate_every = 100
checkpoint_every = 100


def validate(model, val_loader):
    model = model.eval()
    num_correct = 0
    num_examples = 0
    for batch in val_loader:
        input, target = batch
        output = model(input)
        correct = torch.eq(torch.round(output).type(target.type()), target).view(-1)
        num_correct += torch.sum(correct).item()
        num_examples += correct.shape[0]
    return num_correct / num_examples


def checkpoint(model, optimizer, checkpoint_dir):
    # ...

def save_best_model(model, current_accuracy, best_accuracy):
    # ...

iteration = 0
best_accuracy = 0.0

for epoch in range(max_epochs):
    for batch in train_loader:
        model = model.train()
        optimizer.zero_grad()
        input, target = batch
        output = model(input)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()

        if iteration % validate_every == 0:
            binary_accuracy = validate(model, val_loader)
            print("After {} iterations, binary accuracy = {:.2f}"
                  .format(iteration, binary_accuracy))
            save_best_model(model, binary_accuracy, best_accuracy)

        if iteration % checkpoint_every == 0:
            checkpoint(model, optimizer, checkpoint_dir)
        iteration += 1

PyTorch-Ignite: what and why? 🤔

High-level library to help with training and evaluating neural networks in PyTorch flexibly and transparently.

https://github.com/pytorch/ignite


def train_step(engine, batch):
  #  ... any training logic ...
  return batch_loss

trainer = Engine(train_step)

# Compose your pipeline ...

trainer.run(train_loader, max_epochs=100)


metrics = {
  "precision": Precision(),
  "recall": Recall()
}

evaluator = create_supervised_evaluator(
  model,
  metrics=metrics
)

@trainer.on(Events.EPOCH_COMPLETED)
def run_evaluation():
  evaluator.run(test_loader)

handler = ModelCheckpoint(
  '/tmp/models', 'checkpoint'
)
trainer.add_event_handler(
  Events.EPOCH_COMPLETED,
  handler,
  {'model': model}
)

Key concepts in a nutshell

PyTorch-Ignite is about:

Engine and Event System
Out-of-the-box metrics to easily evaluate models
Built-in handlers to compose training pipeline
Distributed Training support

What makes PyTorch-Ignite unique ?

Composable and interoperable components
Simple and understandable code
Open-source community involvement

How PyTorch-Ignite makes user’s live easier ?

With PyTorch-Ignite:

Less code than pure PyTorch while ensuring maximum control and simplicity
Easily get more refactored and structured code
Extensible API for metrics, experiment managers, and other components
Same code for non-distributed and distributed configs

Quick-start PyTorch-Ignite example 👩‍💻👨‍💻

Let’s train a MNIST classifier with PyTorch-Ignite!

https://pytorch-ignite.ai/tutorials/beginner/01-getting-started/

Any questions before we go on ?

How you can help ?

Participating GitHub repositories:

➡️ PyTorch-Ignite - Library to help with training and evaluating neural networks
PyTorch-Ignite Code-Generator - Web Application to generate the training scripts
PyTorch-Ignite Examples repository - Examples, tutorials, and how-to guides

Prerequisites

Basic Python knowledge
Basic PyTorch knowledge
Basic Machine Learning knowledge

Start contributing

Codebase structure

ignite - Core library files
- engine - Module containing core classes like Engine, Events, State.
- handlers - Module containing out-of-the-box handlers
- metrics - Module containing out-of-the-box metrics
- contrib - Contrib module with other metrics, handlers classes that may require additional dependencies
- distributed - Module with helpers for distributed computations
tests - Python unit tests
docs - Documentation files

https://github.com/pytorch/ignite/blob/master/CONTRIBUTING.md#developing-ignite

Help-wanted issues:

https://github.com/pytorch/ignite/issues?q=is%3Aopen+is%3Aissue+label%3APyDataGlobal

How you can help ?

Participating GitHub repositories:

PyTorch-Ignite - Library to help with training and evaluating neural networks
➡️ PyTorch-Ignite Code-Generator - Web Application to generate the training scripts
PyTorch-Ignite Examples repository - Examples, tutorials, and how-to guides

PyTorch-Ignite Code-Generator

https://code-generator.pytorch-ignite.ai/

What is Code-Generator?: web app to quickly produce quick-start python code for common training tasks in deep learning.
Why to use Code-Generator?: start working on a task without rewriting everything from scratch.

Prerequisites

Basic Python knowledge for working on templates
Basic HTML/JS knowledge for the web application
Basic Machine Learning knowledge

Start contributing

Contributing guidelines:

https://github.com/pytorch-ignite/code-generator/blob/main/CONTRIBUTING.md

Help-wanted issues:

https://github.com/pytorch-ignite/code-generator/issues?q=is%3Aissue+is%3Aopen+label%3APyDataGlobal2021

How you can help ?

Participating GitHub repositories:

PyTorch-Ignite - Library to help with training and evaluating neural networks
PyTorch-Ignite Code-Generator - Web Application to generate the training scripts
➡️ PyTorch-Ignite Examples repository - Examples, tutorials, and how-to guides

Prerequisites

Basic Machine/Deep Learning knowledge
Basic PyTorch and PyTorch-Ignite knowledge

Start contributing (1/2)

Your feedback is valuable!

Check out our new site: https://pytorch-ignite.ai/
Open an issue if see a typo, something is not clear, etc
- https://github.com/pytorch-ignite/examples/issues
- https://github.com/pytorch-ignite/pytorch-ignite.ai/issues

Start contributing (2/2)

Contributing guidelines:

Setup Jupyter Lab/Notebook with PyTorch and PyTorch-Ignite installed
- https://pytorch.org/get-started/locally/
- pip install pytorch-ignite

Help-wanted issues:

https://github.com/pytorch-ignite/examples/issues?q=is%3Aissue+is%3Aopen+label%3APyDataGlobal2021

Thank you for participating

in this sprint session !

and check out our new website:

https://pytorch-ignite.ai

We are looking for contributors to help out with the project.

Everyone is welcome to contribute

The Big Picture

Engine and Event System

Engine
- Loops on user data
- Applies an arbitrary user function on batches
Event system
- Customizable event collections
- Triggers handlers attached to events

In its simpliest form:

fire_event(Events.STARTED)
while epoch < max_epochs:
    fire_event(Events.EPOCH_STARTED)

    for batch in data:
        fire_event(Events.ITERATION_STARTED)
        output = train_step(batch)
        fire_event(Events.ITERATION_COMPLETED)

    fire_event(Events.EPOCH_COMPLETED)
fire_event(Events.COMPLETED)

Simplified training and validation loop

No more coding for/while loops on epochs and iterations. Users instantiate engines and run them.

from ignite.engine import Engine, Events, create_supervised_evaluator
from ignite.metrics import Accuracy


# Setup training engine:
def train_step(engine, batch):
    # Users can do whatever they need on a single iteration
    # Eg. forward/backward pass for any number of models, optimizers, etc.
    # ...

trainer = Engine(train_step)

# Setup single model evaluation engine
evaluator = create_supervised_evaluator(model, metrics={"accuracy": Accuracy()})

def validation():
    state = evaluator.run(validation_data_loader)
    # print computed metrics
    print(trainer.state.epoch, state.metrics)

# Run model's validation at the end of each epoch
trainer.add_event_handler(Events.EPOCH_COMPLETED, validation)

# Start the training
trainer.run(training_data_loader, max_epochs=100)

Power of Events & Handlers 🚀

1. Execute any number of functions whenever you wish

Handlers can be any function: e.g. lambda, simple function, class method, etc.

trainer.add_event_handler(Events.STARTED, lambda _: print("Start training"))

# attach handler with args, kwargs
mydata = [1, 2, 3, 4]
logger = ...

def on_training_ended(data):
    print(f"Training is ended. mydata={data}")
    # User can use variables from another scope
    logger.info("Training is ended")


trainer.add_event_handler(Events.COMPLETED, on_training_ended, mydata)
# call any number of functions on a single event
trainer.add_event_handler(Events.COMPLETED, lambda engine: print(engine.state.times))

@trainer.on(Events.ITERATION_COMPLETED)
def log_something(engine):
    print(engine.state.output)

Power of Events & Handlers

2. Built-in events filtering and stacking

# run the validation every 5 epochs
@trainer.on(Events.EPOCH_COMPLETED(every=5))
def run_validation():
    # run validation

@trainer.on(Events.COMPLETED | Events.EPOCH_COMPLETED(every=10))
def run_another_validation():
    # ...

# change some training variable once on 20th epoch
@trainer.on(Events.EPOCH_STARTED(once=20))
def change_training_variable():
    # ...

# Trigger handler with customly defined frequency
@trainer.on(Events.ITERATION_COMPLETED(event_filter=first_x_iters))
def log_gradients():
    # ...

Power of Events & Handlers

3. Custom events to go beyond standard events

from ignite.engine import EventEnum

# Define custom events
class BackpropEvents(EventEnum):
    BACKWARD_STARTED = 'backward_started'
    BACKWARD_COMPLETED = 'backward_completed'
    OPTIM_STEP_COMPLETED = 'optim_step_completed'

def train_step(engine, batch):
    # ...
    loss = criterion(y_pred, y)
    engine.fire_event(BackpropEvents.BACKWARD_STARTED)
    loss.backward()
    engine.fire_event(BackpropEvents.BACKWARD_COMPLETED)
    optimizer.step()
    engine.fire_event(BackpropEvents.OPTIM_STEP_COMPLETED)
    # ...

trainer = Engine(train_step)
trainer.register_events(*BackpropEvents)

@trainer.on(BackpropEvents.BACKWARD_STARTED)
def function_before_backprop(engine):
    # ...

Out-of-the-box metrics 📈

50+ distributed ready out-of-the-box metrics to easily evaluate models.

Dedicated to many Deep Learning tasks
Easily composable to assemble a custom metric
Easily extendable to create custom metrics

precision = Precision(average=False)
recall = Recall(average=False)
F1_per_class = (precision * recall * 2 / (precision + recall))
F1_mean = F1_per_class.mean()  # torch mean method
F1_mean.attach(engine, "F1")

Built-in Handlers

Logging to experiment tracking systems
Checkpointing,
Early stopping
Profiling
Parameter scheduling
etc.

# model checkpoint handler
checkpoint = ModelCheckpoint('/tmp/ckpts', 'training')
trainer.add_event_handler(Events.EPOCH_COMPLETED(every=2), handler, {'model': model})

# early stopping handler
def score_function(engine):
    val_loss = engine.state.metrics['acc']
    return val_loss
es = EarlyStopping(3, score_function, trainer)
evaluator.add_event_handler(Events.COMPLETED, handler)

# Piecewise linear parameter scheduler
scheduler = PiecewiseLinear(optimizer, 'lr', [(10, 0.5), (20, 0.45), (21, 0.3), (30, 0.1), (40, 0.1)])
trainer.add_event_handler(Events.ITERATION_STARTED, scheduler)

# TensorBoard logger: batch loss, metrics
tb_logger = TensorboardLogger(log_dir="tb-logger")
tb_logger.attach_output_handler(
    trainer, event_name=Events.ITERATION_COMPLETED(every=100), tag="training",
    output_transform=lambda loss: {"batch_loss": loss},
)

tb_logger.attach_output_handler(
    evaluator, event_name=Events.EPOCH_COMPLETED,
    tag="training", metric_names="all",
    global_step_transform=global_step_from_engine(trainer),
)

Distributed Training support

Run the same code across all supported backends seamlessly

Backends from native torch distributed configuration: nccl, gloo, mpi
Horovod framework with gloo or nccl communication backend
XLA on TPUs via pytorch/xla

import ignite.distributed as idist

def training(local_rank, *args, **kwargs):
    dataloder_train = idist.auto_dataloder(dataset, ...)

    model = ...
    model = idist.auto_model(model)

    optimizer = ...
    optimizer = idist.auto_optimizer(optimizer)

backend = 'nccl'  # or 'gloo', 'horovod', 'xla-tpu' or None
with idist.Parallel(backend) as parallel:
    parallel.run(training)

Distributed Training support

Distributed launchers

Handle distributed launchers with the same code

torch.multiprocessing.spawn
torch.distributed.launch
horovodrun
slurm

Distributed Training support

Unified Distributed API

High-level helper methods
- idist.auto_model()
- idist.auto_optim()
- idist.auto_dataloader()
Collective operations
- all_reduce, all_gather, and more

Any questions before we go on ?