Quick Start Guide

This guide is a minimal, step-by-step tutorial for training a diffusion model on 2D images or 1D sequences and sampling from the result.

Training a 2D Image Model

The easiest way to start is by using the Trainer class, which handles the entire training process for you. All you need is a folder of images.

Step 1: Prepare Your Dataset

Create a directory and fill it with the images you want to train on. For this example, let's assume your images are in a folder named path/to/your/images.
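
Before launching a long run, it can help to confirm that the folder actually contains images. The sketch below is a hypothetical helper, not part of the library: count_images and the extension set are assumptions made for illustration, so check the library's dataset class for the formats it really loads.

```python
from pathlib import Path

# Assumed set of image extensions; the library's loader may accept a different set.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".tiff"}

def count_images(folder, exts=IMAGE_EXTS):
    """Count files under `folder` (searched recursively) whose suffix is in `exts`."""
    root = Path(folder)
    if not root.is_dir():
        raise NotADirectoryError(f"{folder!r} is not a directory")
    return sum(
        1 for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in exts
    )

# Example usage (uncomment once the folder exists):
# print(count_images("path/to/your/images"))
```

If the count is zero or much lower than expected, check the path and the file extensions before starting training.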

Step 2: Write the Training Script

Create a Python script (e.g., train.py) and add the following code:

from denoising_diffusion_pytorch import Unet, GaussianDiffusion, Trainer

# 1. Define the U-Net model
model = Unet(
    dim = 64,
    dim_mults = (1, 2, 4, 8),
    flash_attn = True # use flash attention for efficiency
)

# 2. Define the diffusion model, wrapping the U-Net
diffusion = GaussianDiffusion(
    model,
    image_size = 128,
    timesteps = 1000,           # number of diffusion timesteps
    sampling_timesteps = 250    # number of sampling timesteps (fewer than timesteps, so DDIM is used for faster inference)
)

# 3. Instantiate the Trainer
trainer = Trainer(
    diffusion,
    'path/to/your/images',      # folder containing your images
    train_batch_size = 32,
    train_lr = 8e-5,
    train_num_steps = 700000,   # total training steps
    gradient_accumulate_every = 2,    # gradient accumulation steps
    ema_decay = 0.995,          # exponential moving average decay
    amp = True,                 # turn on mixed precision
    calculate_fid = True        # whether to calculate FID during training
)

# 4. Start training
trainer.train()
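
To get a feel for what these settings imply, the arithmetic below works out the effective batch size and the total number of samples processed. It is a rough sketch that assumes one training step corresponds to one optimizer update made after all accumulated batches.

```python
# Values copied from the Trainer configuration above.
train_batch_size = 32
gradient_accumulate_every = 2
train_num_steps = 700_000

# Assumption: gradients from `gradient_accumulate_every` batches are combined
# before each optimizer update, so each step effectively uses this many samples.
effective_batch_size = train_batch_size * gradient_accumulate_every

# Total number of training samples drawn over the whole run.
total_samples_seen = effective_batch_size * train_num_steps

print(effective_batch_size)  # 64
print(total_samples_seen)    # 44800000
```

For a first smoke test, a much smaller train_num_steps is usually enough to confirm that the pipeline runs end to end.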

Step 3: Run the Script

Execute the script from your terminal:

python train.py

Training starts immediately. Samples and model checkpoints are saved periodically (every 1,000 steps by default) to a results/ directory in your current working directory.
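
Checkpoints accumulate as training runs, so it is handy to locate the most recent one programmatically. The helper below is a hypothetical sketch, not part of the library; it assumes checkpoints follow the model-<N>.pt naming used for model-10.pt later in this guide.

```python
import re
from pathlib import Path

def latest_checkpoint(results_dir="results"):
    """Return the path of the model-<N>.pt file with the largest N, or None."""
    pattern = re.compile(r"model-(\d+)\.pt")
    found = []
    for path in Path(results_dir).glob("model-*.pt"):
        match = pattern.fullmatch(path.name)
        if match:
            # Keep the milestone as an int so 10 sorts after 2.
            found.append((int(match.group(1)), path))
    if not found:
        return None
    return max(found, key=lambda item: item[0])[1]

# Example usage:
# print(latest_checkpoint("results"))
```

Comparing the milestone numerically matters here: a plain string sort would rank model-2.pt after model-10.pt.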

Generating Samples After Training

Once training is complete, you can load a checkpoint and generate samples:

import torch
from denoising_diffusion_pytorch import Unet, GaussianDiffusion

# Re-instantiate your model and diffusion wrapper
model = Unet(
    dim = 64,
    dim_mults = (1, 2, 4, 8),
    flash_attn = True
)

diffusion = GaussianDiffusion(
    model,
    image_size = 128,
    timesteps = 1000,
    sampling_timesteps = 250    # matches the training script; omit to sample with all 1000 steps
)

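# Load the trained weights. The Trainer saves the state dict of the whole
# diffusion wrapper under 'model', so load it into `diffusion`, not the bare U-Net.
data = torch.load('./results/model-10.pt', map_location = 'cpu') # 10 is the milestone number in the filename
diffusion.load_state_dict(data['model'])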

# Generate samples
sampled_images = diffusion.sample(batch_size = 4)
# sampled_images.shape is (4, 3, 128, 128)

Training a 1D Sequence Model

This library also supports diffusion for 1D sequences, which is useful for tasks like audio generation or time-series modeling.

Step 1: Prepare Your Data

For this example, we'll use a random tensor as a placeholder for your actual sequence data.

import torch

# Your training data should be a tensor of shape (num_samples, channels, sequence_length)
# For example, 64 sequences, each with 32 features and a length of 128
training_seq = torch.rand(64, 32, 128) # features should be normalized to the range [0, 1]
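
Real sequence data is rarely in the [0, 1] range that torch.rand happens to produce. The sketch below shows one possible way to rescale it, using per-channel min-max normalization. The function name minmax_normalize and the choice of min-max scaling are assumptions for illustration; other normalization schemes may suit your data better.

```python
import torch

def minmax_normalize(seq, eps=1e-8):
    """Scale each channel of a (num_samples, channels, seq_len) tensor to [0, 1]."""
    # Reduce over samples and time so every channel gets its own min and max.
    lo = seq.amin(dim=(0, 2), keepdim=True)
    hi = seq.amax(dim=(0, 2), keepdim=True)
    # eps guards against division by zero for constant channels.
    return (seq - lo) / (hi - lo + eps)

# Placeholder for unnormalized data with an arbitrary offset and scale.
raw_seq = torch.randn(64, 32, 128) * 5 + 3
training_seq = minmax_normalize(raw_seq)
```

Keep the per-channel minimum and maximum if you need to map generated sequences back to the original units after sampling.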

Step 2: Write the 1D Training Script

from denoising_diffusion_pytorch import Unet1D, GaussianDiffusion1D, Trainer1D, Dataset1D

# 1. Define the 1D U-Net model
model = Unet1D(
    dim = 64,
    dim_mults = (1, 2, 4, 8),
    channels = 32 # number of features/channels in your sequence
)

# 2. Define the 1D diffusion model
diffusion = GaussianDiffusion1D(
    model,
    seq_length = 128,
    timesteps = 1000,
    objective = 'pred_v'
)

# 3. Create a Dataset object (or use your own custom PyTorch Dataset)
dataset = Dataset1D(training_seq)

# 4. Instantiate the 1D Trainer
trainer = Trainer1D(
    diffusion,
    dataset = dataset,
    train_batch_size = 32,
    train_lr = 8e-5,
    train_num_steps = 700000,
    gradient_accumulate_every = 2,
    ema_decay = 0.995,
    amp = True
)

# 5. Start training
trainer.train()

After training, you can sample new sequences with diffusion.sample(batch_size = 4), which returns a tensor of shape (4, 32, 128), i.e. (batch_size, channels, seq_length).