Quick Start Guide
This guide provides a minimal, step-by-step tutorial to get you started with training a diffusion model.
Training a 2D Image Model
The easiest way to start is with the Trainer class, which handles the entire training process for you. All you need is a folder of images.
Step 1: Prepare Your Dataset
Create a directory and fill it with the images you want to train on. For this example, let's assume your images are in a folder named path/to/your/images.
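Before training, it can be worth a quick sanity check that the folder actually contains readable images. The sketch below uses Pillow and pathlib; the extension list is an assumption, so adjust it to whatever formats your data uses.

from pathlib import Path
from PIL import Image

# Optional sanity check: count the readable images in the training folder.
# The extension list here is an assumption; adjust it to match your data.
folder = Path('path/to/your/images')
exts = {'.jpg', '.jpeg', '.png', '.tiff'}

paths = [p for p in folder.rglob('*') if p.suffix.lower() in exts]
print(f'found {len(paths)} candidate images')

for p in paths[:5]:                      # spot-check a few files
    with Image.open(p) as img:
        print(p.name, img.size, img.mode)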
Step 2: Write the Training Script
Create a Python script (e.g., train.py) and add the following code:
from denoising_diffusion_pytorch import Unet, GaussianDiffusion, Trainer

# 1. Define the U-Net model
model = Unet(
    dim = 64,
    dim_mults = (1, 2, 4, 8),
    flash_attn = True            # use flash attention for efficiency
)

# 2. Define the diffusion model, wrapping the U-Net
diffusion = GaussianDiffusion(
    model,
    image_size = 128,
    timesteps = 1000,            # number of steps
    sampling_timesteps = 250     # number of sampling timesteps (using DDIM for faster inference)
)

# 3. Instantiate the Trainer
trainer = Trainer(
    diffusion,
    'path/to/your/images',            # folder containing your images
    train_batch_size = 32,
    train_lr = 8e-5,
    train_num_steps = 700000,         # total training steps
    gradient_accumulate_every = 2,    # gradient accumulation steps
    ema_decay = 0.995,                # exponential moving average decay
    amp = True,                       # turn on mixed precision
    calculate_fid = True              # whether to calculate FID during training
)

# 4. Start training
trainer.train()
Step 3: Run the Script
Execute the script from your terminal:
python train.py
The trainer will automatically start the training process. Samples and model checkpoints will be saved periodically to a results/ directory in your current working folder.
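If a long run is interrupted, you do not have to start over. The sketch below assumes the same setup as train.py and that your installed version's Trainer provides a load(milestone) helper for restoring a numbered checkpoint from results/; milestone 10 is only an example.

# Resume from a saved milestone instead of starting from scratch.
# Assumes the same Unet / GaussianDiffusion / Trainer setup as in train.py,
# and that your installed version's Trainer exposes a load(milestone) helper.
trainer.load(10)     # restores ./results/model-10.pt
trainer.train()      # continues training from the restored step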
Generating Samples After Training
Once training is complete, you can load a checkpoint and generate samples:
import torch
from denoising_diffusion_pytorch import Unet, GaussianDiffusion

# Re-instantiate your model and diffusion wrapper with the same settings used for training
model = Unet(
    dim = 64,
    dim_mults = (1, 2, 4, 8),
    flash_attn = True
)

diffusion = GaussianDiffusion(
    model,
    image_size = 128,
    timesteps = 1000
)

# Load the trained model data. The Trainer saves the state dict of the full
# diffusion wrapper under the 'model' key, so load it into `diffusion`.
data = torch.load('./results/model-10.pt')  # assuming 10 is your milestone
diffusion.load_state_dict(data['model'])

# Generate samples
sampled_images = diffusion.sample(batch_size = 4)
# sampled_images.shape is (4, 3, 128, 128)
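If you want to write the generated batch to disk, torchvision's save_image utility works directly on the (4, 3, 128, 128) tensor; it expects pixel values in [0, 1], so rescale first if your samples come out in a different range.

from torchvision.utils import save_image

# save_image expects values in [0, 1]; rescale first if your tensor is in [-1, 1]
save_image(sampled_images, 'samples.png', nrow = 2)   # 2x2 grid in a single PNG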
Training a 1D Sequence Model
This library also supports diffusion for 1D sequences, which is useful for tasks like audio generation or time-series modeling.
Step 1: Prepare Your Data
For this example, we'll use a random tensor as a placeholder for your actual sequence data.
import torch
# Your training data should be a tensor of shape (num_samples, channels, sequence_length)
# For example, 64 sequences, each with 32 features and a length of 128
training_seq = torch.rand(64, 32, 128)
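In practice your sequences will usually come from real data rather than torch.rand. Below is a minimal sketch of getting a NumPy array into the expected (num_samples, channels, sequence_length) layout; the file name and the stored array layout are hypothetical.

import numpy as np
import torch

# Hypothetical example: sequences stored as a NumPy array of shape
# (num_samples, sequence_length, channels), e.g. (64, 128, 32)
raw = np.load('my_sequences.npy')

training_seq = torch.from_numpy(raw).float()     # convert to a float tensor
training_seq = training_seq.permute(0, 2, 1)     # -> (num_samples, channels, sequence_length)

assert training_seq.shape[1:] == (32, 128)       # must match channels / seq_length below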
Step 2: Write the 1D Training Script
from denoising_diffusion_pytorch import Unet1D, GaussianDiffusion1D, Trainer1D, Dataset1D

# 1. Define the 1D U-Net model
model = Unet1D(
    dim = 64,
    dim_mults = (1, 2, 4, 8),
    channels = 32                # number of features/channels in your sequence
)

# 2. Define the 1D diffusion model
diffusion = GaussianDiffusion1D(
    model,
    seq_length = 128,
    timesteps = 1000,
    objective = 'pred_v'         # v-prediction objective (predicts velocity rather than noise)
)

# 3. Create a Dataset object (or use your own custom PyTorch Dataset; see the sketch after this script)
dataset = Dataset1D(training_seq)

# 4. Instantiate the 1D Trainer
trainer = Trainer1D(
    diffusion,
    dataset = dataset,
    train_batch_size = 32,
    train_lr = 8e-5,
    train_num_steps = 700000,
    gradient_accumulate_every = 2,
    ema_decay = 0.995,
    amp = True
)

# 5. Start training
trainer.train()
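As noted in step 3 of the script, Dataset1D can be swapped for any custom PyTorch Dataset. A minimal sketch, assuming each item should be a float tensor of shape (channels, sequence_length) to match the placeholder data above:

import torch
from torch.utils.data import Dataset

class MySequenceDataset(Dataset):
    # Hypothetical custom dataset wrapping a list of (channels, sequence_length) tensors
    def __init__(self, sequences):
        self.sequences = sequences

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, idx):
        # each item is a float tensor of shape (channels, sequence_length)
        return self.sequences[idx].float()

# usage, matching the shapes above (32 channels, length 128):
# dataset = MySequenceDataset([torch.rand(32, 128) for _ in range(64)])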
After training, you can sample new sequences using diffusion.sample(batch_size = 4).
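For example, drawing a small batch and checking its shape:

# Draw 4 new sequences from the trained 1D diffusion model
sampled_seq = diffusion.sample(batch_size = 4)
# sampled_seq.shape is (4, 32, 128)  -> (batch, channels, seq_length)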