ELEC4840-Tutorial-01:
Building Your First Training Pipeline
& Debug Techniques
Yi QIN
19 Feb, 2024
About me
Email: yqinar@connect.ece.ust.hk (will reply within 1 day)
Office hours: Wed. 3pm - 4pm (Room 3117)
Overview: Experimental Science
Programming Language
● Python
● Pytorch
Experiment Platform
● Google Colaboratory
● GPU Servers
Deep Learning Architecture
Trilogy:
● Model
● Target Function
● Data & Processing (Augmentation)
-- By Saining XIE
The coding experiments will be implemented by three core components.
Overview: Deep Learning Coding Pipeline
However complicated or simple the task is, DL coding always follows the same pipeline.
Prepare experiment platform and environments
• Colab usage (Open, run, save, import, export)
• Pip installation and virtual environments (torch/torchvision/numpy/…)
Prepare your data
• Construct Dataset
Prepare dataset from ‘torchhub/torchvision’
Prepare your own Dataset:
Basic elements (__init__(), __getitem__(), __len__())
Data augmentations: 2D, 3D, randomization.
Behaviour difference between train/eval.
• Construct Dataloader
Split train/valid/test
Controlling key variables (shuffle, batch_size, pin_memory, num_workers)
Overview: Deep Learning Coding Pipeline
Prepare your model
• Construct models
Prepare models from torchvision/Hugging Face
Prepare your own models:
Basic elements (__init__(), member variables (self.*), forward())
Basic operators (Conv, Linear, Dropout, Normalisation)
Advanced architectures: ModuleList, Sequential
• Initialize your model:
Initialize model with certain distributions (kaiming/xavier/../zero mean)
Load whole pretrained models from torchvision/huggingface
Partially load pretrained models (load backbone)
Freeze/unfreeze model params
• Save model weights and checkpoints
Overview: Deep Learning Coding Pipeline
Prepare optimization and evaluation protocol
• Prepare optimization tools:
Define loss function (from torch / self-define)
Define optimizers (lr, beta, momentum,...)
Define schedulers (cosine, reduce on plateau, linear, …)
• Prepare evaluation tools:
Define metrics (acc, F1, …) (from scikit / torch / self-define)
Define averaging metrics over an epoch
Define differences between training/eval/testing metrics.
Overview: Deep Learning Coding Pipeline
Prepare your training and validation procedure
• Before training
Put everything on the GPU
Define key training parameters (epoch / eval interval / saving interval…)
• On training step
zero_grad(), forward, loss, backward(), step()..
Training loss logging
• On training epoch end
Calculate training loss/metric over an epoch
• On validation during training
Set model to eval mode and obtain results
Record the best model and save parameters
• On training end:
Load best model and testing
Overview: Deep Learning Coding Pipeline
Visualization and Logging
• Matplotlib
• Tensorboard / wandb
Debug
Prepare Experiment Platform and Environments
Google Colab
● A platform that allows you to run Python
code.
● Free GPUs.
● Jupyter Notebook based.
Python Environments
● Python comes with ‘packages’:
powerful extensions.
● Install and use these extensions via the
package manager ‘pip’.
Prepare Colab: Setup
1. Sign up for a Google account.
https://colab.research.google.com/
2. File --> New notebook
Prepare Colab: Coding in Notebook
Code Blocks
● Type in Python code and execute it.
Text Blocks
● Type in Markdown text to explain your code.
Prepare Colab: Running Codes in Colab
In most cases, the code in the cell will start running after hitting the ‘run’ button.
GPU resources will be assigned to your running code automatically.
Prepare Colab: Running Codes in Colab
You may wish to manage your GPU resources in Colab (stop using GPU/Upgrade GPU) here.
Edit -> Notebook settings -> Hardware accelerator -> GPU
Prepare Colab: Bind Google Drive to Notebook
When the dataset is large or a large pretrained model is needed, you can store your files in Google
Drive and then bind it to the running notebook.
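For example, a minimal sketch of mounting Google Drive inside a Colab notebook (the mount point /content/drive is the Colab convention; the data path below is a hypothetical example):

from google.colab import drive

# Mount your Google Drive; Colab will ask you to authorize access.
drive.mount('/content/drive')

# Files in your Drive are then accessible as normal paths, e.g.:
# data_root = '/content/drive/MyDrive/my_dataset/'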
Prepare Python Environments
In most cases, Colab has installed the common packages you need.
If there are extra packages needed, you may install them via:
In the notebook code cell, type in: ! pip install [package name]
For example:
! pip install SimpleITK
After successful installation, you may import and use them in your code via:
In the notebook code cell, type in: import [package name]
For example:
import SimpleITK
• import the whole package: import [package name]
• import a certain function/class: from [package name] import [function/class name]
• import a package with the alias: import [package name] as [alias] (for your convenience)
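For example (numpy and torch.nn are just illustrative choices here):

import numpy                # import the whole package
from torch import nn        # import a certain submodule/class
import numpy as np          # import a package with the alias ‘np’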
Prepare Data
Step1: Create Dataset
● The class in which you load the data files,
preprocess and augment the data, and define
how the data is given to the network.
Step2: Create Dataloader
● The class in Pytorch that loads your
defined ‘Dataset’ and provides an iterable
object that can be used with a ‘for’ loop.
Defined ‘Dataset’ + key arguments (Shuffle? Batch size? …)
-> ‘Dataloader’ -> Feed data into the network using the ‘for’ loop.
Prepare Data: Create Dataset
Basic Components of ‘Dataset’
__init__()
Initialize the Dataset class and define the components, e.g.:
self.datalist -- defines the data list to be loaded.
self.transform -- defines how an image is loaded, augmented, and
transformed into a network-readable tensor.
__getitem__()
How the Dataset retrieves data to feed the network:
get the data from an index, transform the data, and return the data.
__len__()
How much data is in this dataset.
Other advanced members…
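Putting these together, a minimal custom Dataset sketch (the (image_path, label) list format and grayscale PIL loading are illustrative assumptions, not a prescribed layout):

from torch.utils.data import Dataset
from PIL import Image

class ExampleDataset(Dataset):
    def __init__(self, datalist, transform=None):
        # datalist: assumed here to be a list of (image_path, label) pairs
        self.datalist = datalist
        self.transform = transform

    def __getitem__(self, index):
        path, label = self.datalist[index]      # get the data from an index
        image = Image.open(path).convert('L')   # load the raw image (grayscale)
        if self.transform is not None:
            image = self.transform(image)       # transform into a tensor
        return image, label                     # return the data

    def __len__(self):
        return len(self.datalist)               # how much data is in this dataset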
Prepare Data: Create Dataset
Basics of transforms: You always have to define how the data is transformed.
Raw Images -> ToTensor() -> Resize((32,32)) -> Normalize(mean,std)
-> other data augmentations you want -> torch.tensors,
all chained together via transforms.Compose.
Step 1: Import packages
from torchvision import transforms
Step 2: Define transforms
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=0.5, std=0.5)
])
Step 3: Use transforms
tensor = transform(image)
Prepare Data: Create Dataset
Option 1: Load predefined Dataset
● Most of the popular datasets provide a
predefined ‘Dataset’ in the torchvision
package.
● Directly import and use them.
● Included dataset:
https://pytorch.org/vision/stable/datasets.html
Option 2: Define custom Dataset
● Load your own data (medical 3D
images, self-curated data).
● Define your own data processing
procedures.
● Define how you want your data to be fed
into the network.
● Covered in later tutorials.
Prepare Data: Create Dataset
Option 1: Load predefined Dataset – MNIST example
● Step 1: import desired dataset and preprocessing function:
from torchvision import datasets, transforms
● Step 2: define transform as previous slides
● Step 3: initialize the Dataset you want:
example_dataset = datasets.MNIST(
    root="./data/",        # Where to save the data
    transform=transform,   # How the image is transformed
    train=True,            # The dataset is used for training
    download=True)
Prepare Data: Create Dataloader
If a validation set is not provided,
then we need to split one from the training set.
● Step 1: calculate the training set size and the validation set size
split_train_size = int(0.8 * len(data_train))
split_valid_size = len(data_train) - split_train_size
● Step 2: split the dataset
train_set, valid_set = torch.utils.data.random_split(
    data_train, [split_train_size, split_valid_size])
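Optionally, you can pass a seeded generator so the split is reproducible across runs (the seed value 42 is an arbitrary illustration):

generator = torch.Generator().manual_seed(42)
train_set, valid_set = torch.utils.data.random_split(
    data_train, [split_train_size, split_valid_size], generator=generator)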
Prepare Data: Create Dataloader
Define Dataloader:
example_loader = torch.utils.data.DataLoader(
    dataset=train_set,  # dataset to be loaded
    batch_size=64,
    shuffle=True,       # True for training; False for validation and testing
    num_workers=8,      # number of worker processes used for data loading
)
Prepare Data: Create Dataloader
You may check whether the dataloader works normally using:
for data in example_loader:
    print(data)  # ‘data’ is the batch used for training.
Takeaways: You should understand these before proceeding to the next step:
● How is the data transformed/augmented?
● What parameters did you set in the dataloaders?
● What kind of data does the dataloader return?
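As a concrete check, you can inspect one batch’s shapes (the sizes below assume the MNIST setup from the earlier slides):

images, labels = next(iter(example_loader))
print(images.shape)  # e.g. torch.Size([64, 1, 28, 28]) -> (batch, channel, H, W)
print(labels.shape)  # e.g. torch.Size([64])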
Prepare Model
Basic Components of a Model
__init__()
Initialize the Model class and define the network layers (e.g. self.conv, self.linear).
forward()
How the model handles the input data using the components defined in __init__():
get the input, pass it through the network layers, and return the output.
Other advanced members…

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        ## ----- write your code here
    def forward(self, x):
        output = xxx
        return output
Prepare Model: Model Construction
Pytorch provides definitions of different layers in torch.nn and torch.nn.functional.

torch.nn -- use this way to define models (the more common way):
import torch.nn as nn
# This defines a layer object with learnable parameters...
example_linear = nn.Linear(in_features, out_features)
# ...which can be called later.
output = example_linear(input)

torch.nn.functional -- use this way for some intermediate operations:
import torch.nn.functional as F
# This calls a function that is used immediately; you supply the
# weight tensor yourself (shape: (out_features, in_features)).
output = F.linear(input, weight)
Prepare Model: Model Construction
In __init__(), initialize the different layers:
A ‘Layer’. Coding: nn.Linear(in_features=3, out_features=4)
A ‘Layer’. Coding: nn.Linear(in_features=?, out_features=?)
Prepare Model: Model Construction
In __init__(), initialize the different layers:
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.linear1 = nn.Linear(3, 4)
        self.linear2 = nn.Linear(4, 2)
Prepare Model: Model Construction
More complicated operators, same mechanism.
nn.Conv2d(in_channels=1, out_channels=n1, kernel_size=5, stride=1, padding=2)
in_channels/out_channels correspond to the “C” dimension of a tensor with shape (B, C, H, W); kernel_size=5 gives a “5x5” kernel (kernel_size=3 would give “3x3”).
Prepare Model: Model Construction
More complicated operators, same mechanism.
nn.MaxPool2d(kernel_size=2, stride=2)
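A quick shape illustration of this pooling layer (the channel count 8 and input size 28x28 are arbitrary):

import torch
import torch.nn as nn

x = torch.randn(1, 8, 28, 28)                 # (B, C, H, W)
pool = nn.MaxPool2d(kernel_size=2, stride=2)
print(pool(x).shape)                          # torch.Size([1, 8, 14, 14]); H and W are halved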
Prepare Model: Model Construction
More complicated operators, same mechanism.
x = x.view(-1, 4 * 4 * n2)
‘Flatten’ the tensor ‘x’ from shape (bs, n2, 4, 4) (4-dimensional)
to shape (bs, n2 * 4 * 4) (2-dimensional).
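For instance, with n2 = 16 and a batch size of 8 (both arbitrary values for illustration):

import torch

x = torch.randn(8, 16, 4, 4)    # (bs, n2, 4, 4)
x = x.view(-1, 16 * 4 * 4)
print(x.shape)                  # torch.Size([8, 256])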
Prepare Model: Forward Model
forward() connects all the components and outputs the results.
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.linear1 = nn.Linear(3, 4)
        self.linear2 = nn.Linear(4, 2)

    def forward(self, x):
        x = self.linear1(x)        # the last layer’s output is
        output = self.linear2(x)   # the next layer’s input
        return output
Prepare Model: Forward Model
You can validate your model by printing the model architecture:
model = Model()
print(model)
You can verify whether the model is constructed correctly by running a forward pass:
model = Model()
output = model(input)
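For example, a quick shape check with a dummy input, assuming the 3 -> 4 -> 2 MLP defined above:

import torch

dummy = torch.randn(8, 3)    # a batch of 8 samples with 3 features each
model = Model()
print(model(dummy).shape)    # expected: torch.Size([8, 2])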
Prepare Optimization and Evaluation
We need to prepare optimization tools to optimize our defined model.
Optimization Toolsets
Loss Function: calculates the error of the model’s predictions.
Optimizer: updates the model’s parameters based on the loss function’s output.
Scheduler (optional): manages the optimizer’s learning rate.
Prepare Optimization and Evaluation
Loss Function: Just like layers, also can be found in torch.nn or torch.nn.functional.
import torch.nn as nn
mse = nn.MSELoss()  # Define the loss function
loss = mse(model_output, ground_truth)  # The output is the loss value.
Example: Mean Squared Error
Defining your own loss function will be covered in a later tutorial.
Prepare Optimization and Evaluation
Optimizers: Can be initialized from torch.optim
Name   Shared Parameters                                   Optim-Specific Parameters and Empirical Ranges
SGD    model params to be optimized: model.parameters()    momentum: 0.9 - 0.99
       learning rate: lr
Adam   (same as above)                                     betas = (beta1, beta2)
                                                           [beta1: 0.5 / 0.9 / 0.95 / 0.99;
                                                           beta2: 0.9 / 0.95 / 0.99]
                                                           weight_decay: 0 / 1e-4 / 1e-5 / 1e-6
Prepare Optimization and Evaluation
Optimizers: Can be initialized from torch.optim
import torch
optimizer=torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
Example: SGD
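A counterpart sketch for Adam, plus an optional cosine scheduler (the hyperparameter values are common starting points, not prescriptions):

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3,
                             betas=(0.9, 0.99), weight_decay=1e-5)

# Optional: a cosine learning-rate scheduler; call scheduler.step() once per epoch.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)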
Prepare Optimization and Evaluation
We need to prepare evaluation metrics to evaluate the performance of the model.
For classification task
• Accuracy: the fraction of predictions that match the labels.
accuracy = torch.sum(prediction == labels).item() / len(labels)
sklearn.metrics.accuracy_score(y_true, y_pred)
• F1
sklearn.metrics.f1_score(y_true, y_pred)
• Precision: the ability of the classifier not to label as positive a sample that is negative.
sklearn.metrics.precision_score(y_true, y_pred)
API Reference — scikit-learn 1.4.1 documentation
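A tiny end-to-end example with made-up predictions, showing how these metric calls are used:

from sklearn.metrics import accuracy_score, f1_score, precision_score

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
print(accuracy_score(y_true, y_pred))   # 0.8
print(f1_score(y_true, y_pred))         # 0.8
print(precision_score(y_true, y_pred))  # 1.0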
Prepare Experiment Procedure
Three stages when running an experiment: Training, Evaluation, Testing.
Training
● Model: Enable dropout.
● Data: Perform data augmentation & random shuffle.
● Uses the training set.
Evaluation
● Used to evaluate model performance DURING TRAINING.
● Model: Disable dropout.
● Data: No data augmentation & no shuffle.
● Separated from the training set.
Testing
● Used to evaluate model performance AFTER TRAINING.
● Model & Data: Same as evaluation.
● The data should never be used in training/evaluation.
Prepare Experiment Procedure
Core experiment procedure:
Training -> Reached validation interval? (e.g. every 10 epochs)
No  -> keep training.
Yes -> Validation -> save the best record.
After training finishes -> Testing.
Experiment concepts:
● Epoch: looping over the entire dataset once.
● Step: training over a mini-batch.
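A skeletal outer loop matching this flow (the names train_one_epoch, validate, val_interval, and best_acc are illustrative placeholders):

best_acc = 0.0
for epoch in range(num_epochs):
    train_one_epoch(model, loader_train, optimizer)   # training
    if (epoch + 1) % val_interval == 0:               # reached validation interval?
        acc = validate(model, loader_valid)           # validation
        if acc > best_acc:                            # save the best record
            best_acc = acc
            torch.save(model.state_dict(), 'best_model.pth')
# After training: load the best model and run testing.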
Prepare Experiment Procedure
Core training procedure:
Put the model on the GPU and set it to training mode:
model = model.cuda()
model.train()
Then each training step puts the data on the GPU, clears the gradients, forwards the model, calculates the loss, calculates the gradients, and updates the parameters:
for data in tqdm(loader_train):
    images, labels = data
    images, labels = images.cuda(), labels.cuda()
    # set all gradients to zero
    optimizer.zero_grad()
    # model forward (see the previous slides)
    outputs = model(images)
    # calculate loss (see the previous slides; ‘criterion’ is your loss function)
    loss = criterion(outputs, labels)
    # calculate gradients
    loss.backward()
    # update parameters
    optimizer.step()
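To log the training loss averaged over an epoch (as listed in the pipeline overview), one simple sketch is a running sum; the running_loss bookkeeping is my own illustrative choice:

running_loss, n_batches = 0.0, 0
for data in tqdm(loader_train):
    images, labels = data
    images, labels = images.cuda(), labels.cuda()
    optimizer.zero_grad()
    outputs = model(images)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
    running_loss += loss.item()   # .item() extracts the scalar, detached from the graph
    n_batches += 1
print('epoch training loss: %.4f' % (running_loss / n_batches))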
Prepare Experiment Procedure
Core validation procedure (Classification):
Set the model to eval mode:
model.eval()
Forward the model:
# refer to the previous slide
Calculate the metric:
# return the class index with the largest probability
pred = torch.argmax(outputs, dim=1)
torch.sum(pred == labels)
Compare with the previous best:
# use an if clause here to compare
Postprocessing.
(Optional) Save the model:
torch.save(model.state_dict(), 'file_path.pth')
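Putting the steps together, a minimal validation sketch; loader_valid and best_acc are assumed names, and torch.no_grad() is a standard addition that skips gradient bookkeeping during evaluation:

model.eval()
correct, total = 0, 0
with torch.no_grad():                   # no gradients needed for evaluation
    for images, labels in loader_valid:
        images, labels = images.cuda(), labels.cuda()
        outputs = model(images)
        pred = torch.argmax(outputs, dim=1)
        correct += torch.sum(pred == labels).item()
        total += labels.size(0)
acc = correct / total
if acc > best_acc:                      # compare with the previous best
    best_acc = acc
    torch.save(model.state_dict(), 'best_model.pth')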
Prepare Experiment Procedure
Core testing procedure:
Load the saved model:
model.load_state_dict(torch.load('filename.pth'))
Set the model to eval mode:
model.eval()
Forward the model:
# refer to the previous slide
Calculate the metric:
# return the class index with the largest probability
pred = torch.argmax(outputs, dim=1)
torch.sum(pred == labels)
Summarize and postprocess the results.
Visualization and Logging
Core visualization tool: Matplotlib
Logging will be covered in future tutorials.
import matplotlib.pyplot as plt

y_value = [1, 2, 3, 4, 5]        # Prepare the y values to be plotted
fig = plt.figure()               # Create a new “figure”
plt.plot(y_value, color='blue')  # Plot the figure
plt.xlabel('Epoch')              # Set the texts
plt.ylabel('Training loss')
How to debug?
● Bugs come along with programs.
● How to eliminate bugs?
Read -- all errors can be traced by:
● Reading the errors.
● Reading the problematic code.
● Printing out suspicious variables.
● Visualizing your intermediate results.
Search -- 90% of errors can be found in:
● Stack Overflow
● Github Issues
● Google
● GPTs (poe.com)
Ask -- the remaining 10% of errors can be asked via:
● Opening a Github issue.
● Asking TAs via a formatted email.
● Sending emails to the corresponding author.
Debug - Read
Python provides direct error tracing in the console log.
Find “Traceback (most recent call last)”:
● The first stack frame indicates which line of
your code went wrong. Check the code there.
● The last part of the error explains what
went wrong.
Debug - Read
After finding the problematic code from the previous step, you can:
● Print out suspicious variables to check whether they match your design.
● Save the intermediate images to check whether they match your design.
2D cases:
import matplotlib.pyplot as plt
# Suppose you want to check a tensor:
img_to_check = tensor_to_check.detach().cpu().numpy()
# detach() -> detach it from gradient calculation
# cpu()    -> transfer the tensor from the GPU to the CPU
# numpy()  -> convert the tensor to a numpy array for plotting
plt.imshow(img_to_check, cmap='gray')
plt.savefig('filename.png')
Debug - Read
The same applies to 3D data, saved as a volume instead of a figure.
3D cases:
import SimpleITK as sitk
# Suppose you want to check a tensor:
img_to_check = tensor_to_check.detach().cpu().numpy()
# detach()/cpu()/numpy(): the same steps as in the 2D case
img_to_save = sitk.GetImageFromArray(img_to_check)
sitk.WriteImage(img_to_save, 'filename.nii.gz')
# Then open it in ITK-SNAP.
Debug - Search
Simply search your error (the last part of the error logged in the console) on Google,
or ask a GPT.
Most common mistakes already have answers online.
Debug - Ask
Describe the bug
Describe the error shown in the console.
To Reproduce
Describe how we can reproduce the error you encountered.
Expected behavior
What should happen if the program works correctly.
Screenshots
Screenshots of your problematic code and the error.
Environments
Describe the environment you used to run the code (Linux/Windows? Pytorch version? …).
Additional context
Any additional information.
The standard template for reporting an unsolved bug on GitHub -- from Project MONAI.