Finetuning Deformable DETR

This tutorial explains how to use the classes in the Deformable DETR Finetune module to train a custom object detection model based on the Deformable DETR R50 architecture.

Goals

  1. Train a model based on the Deformable DETR R50 architecture to detect faces, using the Mask Wearing Dataset.

  2. Use the trained model to make inferences.

Attention

This tutorial requires the Mask Wearing Dataset. We assume that the dataset has been downloaded and saved in a folder, referred to below as /paht/mask/wearing.

1. Train Deformable DETR R50 Finetune

Deformable DETR Finetune contains two classes, Deformable DETR R50 Finetune and Deformable DETR R50 Finetune with refinement, that differ only in the use of Iterative Bounding Box Refinement.

They can set the number of classes of the last embedding layer to a desired value, without changing the weights of the previous layers.
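
A common way to obtain this behaviour is to copy every pretrained parameter whose shape is unchanged and to re-initialize only the parameters that depend on the number of classes. The sketch below illustrates that idea on plain shape dictionaries. The parameter names and shapes are hypothetical, and this is not necessarily how the finetune classes are implemented internally.

```python
def split_transferable(pretrained_shapes, new_shapes):
    """Split parameter names into those to copy and those to re-initialize.

    Both arguments map a parameter name to its shape (a tuple of ints).
    """
    transferable, reinitialized = [], []
    for name, shape in new_shapes.items():
        if pretrained_shapes.get(name) == shape:
            transferable.append(name)    # same shape: reuse pretrained weights
        else:
            reinitialized.append(name)   # shape changed or new: start fresh
    return transferable, reinitialized


# Hypothetical shapes: a 91-class pretrained head replaced by a 2-class head
pretrained = {
    "backbone.weight": (256, 2048),
    "class_embed.weight": (91, 256),
    "class_embed.bias": (91,),
}
finetuned = {
    "backbone.weight": (256, 2048),
    "class_embed.weight": (2, 256),
    "class_embed.bias": (2,),
}

keep, reset = split_transferable(pretrained, finetuned)
print("copied from checkpoint:", keep)
print("re-initialized:", reset)
```

Only the class-dependent entries end up in the second list, which is why a few epochs can be enough to adapt the model.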

Note

This tutorial is based on Deformable DETR R50 Finetune with refinement. However, the same steps can be used for Deformable DETR R50 Finetune.

For training, it is usual in Aloception to wrap a model in a PyTorch Lightning module. With a finetune model, the architecture definition changes, but the training process stays the same:

[ ]:
from argparse import Namespace, ArgumentParser
import alonet
from alonet.detr import CocoDetection2Detr
from alonet.deformable_detr import LitDeformableDetr, DeformableDetrR50RefinementFinetune

# Build parser
parser = ArgumentParser()
parser = alonet.common.add_argparse_args(parser) # Common alonet parser
args = parser.parse_args([])
args.no_suffix = True # Fix run_id = expe_name

# Setup database
coco_loader = CocoDetection2Detr(
    name = "people_mask",
    train_folder = "train",
    train_ann = "train/_annotations.coco.json",
    val_folder = "valid",
    val_ann = "valid/_annotations.coco.json",
)
num_classes = len(coco_loader.CATEGORIES)

# Architecture definition
deformable_finetune = DeformableDetrR50RefinementFinetune(
    num_classes = num_classes,
    weights = "deformable-detr-r50-refinement",
    activation_fn = "softmax"
)
lit_deformable = LitDeformableDetr(model = deformable_finetune)

# Start train loop
args.max_epochs = 5 # Due to finetune, we just need 5 epochs to train this model
args.save = True
lit_deformable.run_train(
    data_loader = coco_loader,
    args = args,
    project = "deformable_detr",
    expe_name = "people_mask"
)

Note

Creating CocoDetection2Detr prompts the user for the location where the Mask Wearing Dataset is stored. Here, the user must enter /paht/mask/wearing. See the How to setup your data tutorial for more details.
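
Before answering the prompt, it can help to confirm that the folder has the layout the loader configuration above expects. The following is a minimal sketch using only the standard library. `check_coco_layout` is a hypothetical helper (not part of Aloception), and it assumes the annotation files follow the usual COCO JSON structure with a top-level `categories` list.

```python
import json
from pathlib import Path


def check_coco_layout(root, splits=("train", "valid"), ann_name="_annotations.coco.json"):
    """Return {split: number of categories} for each split found under root.

    Raises FileNotFoundError if an expected annotation file is missing.
    """
    summary = {}
    for split in splits:
        ann_path = Path(root) / split / ann_name
        if not ann_path.is_file():
            raise FileNotFoundError(f"Missing annotation file: {ann_path}")
        with ann_path.open() as f:
            annotations = json.load(f)
        # Count the categories declared in this split's annotation file
        summary[split] = len(annotations.get("categories", []))
    return summary
```

Calling `check_coco_layout("/paht/mask/wearing")` returns the number of categories declared in each split, or raises an error naming the first annotation file that is missing.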

Once the process has completed, the $HOME/.aloception/project_run_id/run_id folder will be created, containing the checkpoint files.
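
To see which checkpoints were written, the run folder can be listed directly. A minimal sketch, assuming the checkpoints use the `.ckpt` extension that PyTorch Lightning writes by default. `list_checkpoints` is a hypothetical helper, and the run folder must be passed in explicitly, since its exact name depends on the project and run id used for training.

```python
from pathlib import Path


def list_checkpoints(run_dir):
    """Return the .ckpt files found under run_dir, oldest first."""
    files = Path(run_dir).expanduser().rglob("*.ckpt")
    # Sort by modification time, using the file name to break ties
    return sorted(files, key=lambda p: (p.stat().st_mtime, p.name))
```

The last element of the returned list is the most recently written checkpoint.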

2. Make inferences

To make inferences on the dataset with the trained model, we need to load its weights. Alonet provides a function for this purpose. We also need the project and run id that were used during training:

[ ]:
import torch
from alonet.common import load_training

device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")

# Define the architecture
detr_finetune = DeformableDetrR50RefinementFinetune(num_classes, activation_fn = "softmax")

# Load weights according project_run_id and run_id
args = Namespace(
    project_run_id = "deformable_detr",
    run_id = "people_mask",
)
lit_deformable = load_training(
    LitDeformableDetr,
    args = args,
    model = detr_finetune)
lit_deformable.model.to(device)

This allows us to run the model on the validation dataset and show some results:

[ ]:
frames = next(iter(coco_loader.val_dataloader()))
frames = frames[0].batch_list(frames).to(device)
pred_boxes = lit_deformable.inference(lit_deformable(frames))[0] # Inference from forward result
gt_boxes = frames[0].boxes2d # Get ground truth boxes

frames.get_view([
    gt_boxes.get_view(frames[0], title="Ground truth boxes"),
    pred_boxes.get_view(frames[0], title="Predicted boxes"),
]).render()

What is next?

Learn how to export DETR/Deformable DETR models to TensorRT.

3. Optional: Make predictions from a camera

If a local camera is available, the following code takes a snapshot with the camera and makes predictions on it:

[ ]:
%matplotlib inline
import cv2
from aloscene import Frame

def frame_process(frame):
    frame = Frame(frame[...,::-1].copy(), names = ("H", "W", "C"))
    frame = frame.transpose(0,2).transpose(1,2).to(device)
    frame = frame.batch_list([frame])
    return transform(frame)

def transform(frame):
    return frame.norm_resnet()

cap = cv2.VideoCapture(0)
if cap.isOpened():
    ret, frame = cap.read()

    if ret:
        # Image preprocessing and make predictions
        frame = frame_process(frame).to(device)
        pred_boxes = lit_deformable.inference(lit_deformable(frame))

        # Show result
        frame.get_view([
            pred_boxes[0].get_view(frame[0], title="Predicted boxes")
        ], size=(500,700)).render()
    else:
        print("[ERROR] Impossible to read a frame from the camera")

    # Close camera
    cap.release()
    cv2.destroyAllWindows()
else:
    print("[ERROR] Impossible to open camera")
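
The preprocessing in `frame_process` can be checked by hand. OpenCV returns images as height x width x channel arrays in BGR order, `[..., ::-1]` reverses the channel axis to RGB, and the two `transpose` calls move the channel axis to the front. The sketch below traces the axis names through the same two swaps in plain Python. `swap_axes` is a hypothetical helper written only for this illustration.

```python
def swap_axes(names, i, j):
    """Return a copy of names with positions i and j exchanged."""
    names = list(names)
    names[i], names[j] = names[j], names[i]
    return tuple(names)


names = ("H", "W", "C")          # layout of an OpenCV image
names = swap_axes(names, 0, 2)   # ("C", "W", "H")
names = swap_axes(names, 1, 2)   # ("C", "H", "W")
print(names)
```

The final layout is channel-first, which is the order the model expects once the frame has been batched.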