Models

Basic usage

To instantiate a Deformable DETR R50 (resnet50 backbone):

from alonet.deformable_detr import DeformableDetrR50
model = DeformableDetrR50(num_classes=NUM_CLASS)

To instantiate a Deformable DETR R50 (resnet50 backbone) with iterative box refinement:

from alonet.deformable_detr import DeformableDetrR50Refinement
model = DeformableDetrR50Refinement(num_classes=NUM_CLASS)

To finetune from a model pretrained on the COCO dataset:

from alonet.deformable_detr import DeformableDetrR50Finetune
# NUM_CLASS is the number of classes in your finetune
model = DeformableDetrR50Finetune(num_classes=NUM_CLASS, weights="deformable-detr-r50")
# with iterative box refinement
from alonet.deformable_detr import DeformableDetrR50RefinementFinetune
# NUM_CLASS is the number of classes in your finetune
model = DeformableDetrR50RefinementFinetune(num_classes=NUM_CLASS, weights="deformable-detr-r50-refinement")

To run inference:

from aloscene import Frame
device = model.device  # assuming `model` is already defined as above

# read the image and preprocess it with the ResNet normalization
frame = Frame(PATH_TO_IMAGE).norm_resnet()
# create a batch from a list of images
frames = Frame.batch_list([frame])
frames = frames.to(device)

# forward pass
m_outputs = model(frames)
# get predicted boxes as aloscene.BoundingBoxes2D from forward outputs
pred_boxes = model.inference(m_outputs)
# Display the predicted boxes
frame.append_boxes2d(pred_boxes[0], "pred_boxes")
frame.get_view([frame.boxes2d]).render()

Deformable DETR Base

class alonet.deformable_detr.deformable_detr.DeformableDETR(backbone, transformer, num_classes, num_queries=300, num_feature_levels=4, aux_loss=True, with_box_refine=False, return_dec_outputs=False, return_enc_outputs=False, return_bb_outputs=False, weights=None, device=device(type='cuda'), activation_fn='sigmoid', return_intermediate_dec=True, strict_load_weights=True)

Bases: torch.nn.modules.module.Module

The Deformable DETR module for object detection. For more details, see the paper: https://arxiv.org/abs/2010.04159

INPUT_MEAN_STD = ((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
build_backbone(backbone_name='resnet50', train_backbone=True, return_interm_layers=True, dilation=False)

Build backbone for Deformable DETR

Parameters
backbone_name: str, optional

Name in torchvision.models, by default “resnet50”

train_backbone: bool, optional

By default True

return_interm_layers: bool, optional

Needed if we use segmentation or multi-scale, by default True

dilation: bool, optional

If True, we replace stride with dilation in the last convolutional block (DC5). By default False.

Returns
alonet.deformable_detr.backbone.Backbone

Resnet backbone

build_decoder(dec_layers=6, return_intermediate_dec=True)
build_decoder_layer(hidden_dim=256, dropout=0.1, nheads=8, dim_feedforward=1024, num_feature_levels=4, dec_n_points=4)
build_positional_encoding(hidden_dim=256)
build_transformer(hidden_dim=256, dropout=0.1, nheads=8, dim_feedforward=1024, enc_layers=6, dec_layers=6, num_feature_levels=4, dec_n_points=4, enc_n_points=4, return_intermediate_dec=True)
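
As an illustrative, hedged sketch (based only on the signatures listed above), these build_* helpers can be called on an existing model instance to create standalone components; whether the resulting modules can be swapped into a live model should be checked against the source:

# hedged sketch: reusing the build_* helpers from an existing instance
from alonet.deformable_detr import DeformableDetrR50
model = DeformableDetrR50(num_classes=2)

# a dilated (DC5-style) ResNet-50 backbone, see build_backbone above
dc5_backbone = model.build_backbone(backbone_name="resnet50", dilation=True)

# a lighter transformer with fewer decoder layers than the default 6
light_transformer = model.build_transformer(dec_layers=4)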
forward(frames, **kwargs)

Deformable DETR

Parameters
frames: aloscene.Frame

batched images, of shape [batch_size x 3 x H x W] with frames.mask: a binary mask of shape [batch_size x 1 x H x W], containing 1 on padded pixels

Returns
dict
  • “pred_logits”: classification logits (including no-object) for all queries. If self.activation_fn = “softmax”, shape = [batch_size x num_queries x (num_classes + 1)]; if self.activation_fn = “sigmoid”, shape = [batch_size x num_queries x num_classes].

  • “pred_boxes”: The normalized box coordinates for all queries, represented as (center_x, center_y, height, width). These values are normalized in [0, 1], relative to the size of each individual image (disregarding possible padding). See PostProcess for information on how to retrieve the unnormalized bounding boxes.

  • “aux_outputs”: Optional, only returned when auxiliary losses are activated. It is a list of dictionaries containing the two above keys for each decoder layer.

  • “activation_fn”: str, “sigmoid” or “softmax” based on model configuration
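
As an illustration, the returned dict can be inspected as follows (a minimal sketch, assuming model and frames are defined as in the basic usage section above):

m_outputs = model(frames)

# classification logits for every query:
# sigmoid: [batch_size x num_queries x num_classes]
# softmax: [batch_size x num_queries x (num_classes + 1)]
print(m_outputs["pred_logits"].shape)

# normalized (center_x, center_y, height, width) boxes in [0, 1]
print(m_outputs["pred_boxes"].shape)

# per-decoder-layer predictions, only present when auxiliary losses are activated
if "aux_outputs" in m_outputs:
    print(len(m_outputs["aux_outputs"]))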

forward_class_heads(transformer_outptus)
forward_heads(transformer_outptus, bb_outputs=None, **kwargs)

Apply Deformable heads

forward_position_heads(transformer_outptus)
get_outs_filter(outs_scores=None, outs_labels=None, m_outputs=None, threshold=None, activation_fn=None)

Given the model outs_scores and outs_labels, return a list of filters, one per output. If outs_scores and outs_labels are not provided, the method will rely on the model forward outputs m_outputs to extract the outs_scores and outs_labels on its own.

Parameters
outs_scores: torch.Tensor, optional

Predicted scores, by default None

outs_labels: torch.Tensor, optional

Predicted labels, by default None

m_outputs: dict, optional

Dict of forward outputs, by default None

threshold: float, optional

Score threshold to use. If None and sigmoid is used, 0.2 will be used as the default value.

softmax_threshold: float, optional

Score threshold if softmax activation is used. None by default.

activation_fn: str, optional

Either “sigmoid” or “softmax”. By default None. If “sigmoid” is used, the filter is based on the score threshold. If “softmax” is used, the filter is based on non-background classes.

Returns
List[torch.Tensor]

List of filters to select the queries predicting an object, len = batch size

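A possible usage, sketched from the signature above (it assumes m_outputs comes from a forward pass as shown earlier, and that the resulting filters are passed to the inference method documented below):

# hedged sketch: one filter per image in the batch
filters = model.get_outs_filter(m_outputs=m_outputs, threshold=0.5)
print(len(filters))  # == batch size

# the filters can then be reused by the inference method (see below)
pred_boxes = model.inference(m_outputs, filters=filters)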

get_outs_labels(m_outputs=None, activation_fn=None)

From the model forward outputs m_outputs, return the predicted labels and their associated scores.

Parameters
m_outputs: dict, optional

Dict of forward outputs, by default None

threshold: float, optional

Score threshold if sigmoid activation is used. By default 0.2

activation_fn: str, optional

Either “sigmoid” or “softmax”. By default None. If “sigmoid” is used, the filter is based on the score threshold. If “softmax” is used, the filter is based on non-background classes.

Returns
Tuple

(torch.Tensor, torch.Tensor) being the predicted labels and scores


inference(forward_out, threshold=0.2, filters=None, **kwargs)

Given the model outputs, as returned by the forward method, return the predicted boxes as aloscene.BoundingBoxes2D (one per image in the batch).
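
For instance, a stricter score threshold than the default 0.2 can be passed directly (a sketch, assuming m_outputs comes from a previous forward pass):

# sketch: inference with a custom score threshold
pred_boxes = model.inference(m_outputs, threshold=0.5)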

training: bool
alonet.deformable_detr.deformable_detr.build_deformable_detr_r50(num_classes=91, box_refinement=True, weights=None, device=device(type='cuda'))

Build a Deformable DETR model with a ResNet-50 backbone.

Parameters
num_classes: int, optional

Number of classes for object detection, by default 91

box_refinement: bool, optional

Use iterative box refinement, by default True

weights: str, optional

Pretrained weights, by default None

device: torch.device, optional

By default torch.device(“cuda”)

Returns
DeformableDETR
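
For example (a sketch following the signature above; the weight name reuses one of the pretrained checkpoints mentioned earlier on this page):

from alonet.deformable_detr.deformable_detr import build_deformable_detr_r50

# sketch: COCO-pretrained model with iterative box refinement
model = build_deformable_detr_r50(num_classes=91, box_refinement=True, weights="deformable-detr-r50-refinement")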

Deformable DETR R50

class alonet.deformable_detr.deformable_detr_r50.DeformableDetrR50(*args, return_intermediate_dec=True, num_classes=91, **kwargs)

Bases: alonet.deformable_detr.deformable_detr.DeformableDETR

Deformable Detr with Resnet50 backbone

training: bool

Deformable DETR R50 with refinement

class alonet.deformable_detr.deformable_detr_r50_refinement.DeformableDetrR50Refinement(*args, return_intermediate_dec=True, num_classes=91, **kwargs)

Bases: alonet.deformable_detr.deformable_detr.DeformableDETR

Deformable Detr with Resnet50 backbone with box refinement

training: bool

Deformable DETR R50 Finetune

class alonet.deformable_detr.deformable_detr_r50_finetune.DeformableDetrR50Finetune(num_classes, activation_fn='sigmoid', base_weights='deformable-detr-r50', weights=None, **kwargs)

Bases: alonet.deformable_detr.deformable_detr_r50.DeformableDetrR50

Premade helpful class to finetune the Deformable DetrR50 model on custom classes.

Parameters
num_classes: int

Number of classes to use

activation_fn: str, optional

Activation function to use in the class_embed layer, by default “sigmoid”

base_weights: str, optional

Deformable DetrR50 weights, by default “deformable-detr-r50”

weights: str, optional

Load weights from a pth or ckpt file, by default None

*args: Namespace

Arguments used in the Deformable DetrR50 module

**kwargs: dict

Additional arguments used in the Deformable DetrR50 module

Raises
Exception

activation_fn must be “softmax” or “sigmoid”. Note that activation_fn = “softmax” implies working with a background class, so num_classes is automatically increased by one.

training: bool
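
For example, to finetune with a softmax head instead of the default sigmoid (a sketch based on the constructor signature above; as noted, the background class is then added automatically):

from alonet.deformable_detr import DeformableDetrR50Finetune

# sketch: num_classes is increased by one internally to account for the background class
model = DeformableDetrR50Finetune(num_classes=2, activation_fn="softmax", base_weights="deformable-detr-r50")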

Deformable DETR R50 Finetune with refinement

class alonet.deformable_detr.deformable_detr_r50_finetune.DeformableDetrR50RefinementFinetune(num_classes, activation_fn='sigmoid', base_weights='deformable-detr-r50-refinement', weights=None, **kwargs)

Bases: alonet.deformable_detr.deformable_detr_r50_refinement.DeformableDetrR50Refinement

Premade helpful class to finetune the Deformable DetrR50 with refinement model on custom classes.

Parameters
num_classes: int

Number of classes to use

activation_fn: str, optional

Activation function to use in the class_embed layer, by default “sigmoid”

base_weights: str, optional

Deformable DetrR50 with refinement weights, by default “deformable-detr-r50-refinement”

weights: str, optional

Load weights from a pth or ckpt file, by default None

*args: Namespace

Arguments used in the Deformable DetrR50 with refinement module

**kwargs: dict

Additional arguments used in the Deformable DetrR50 with refinement module

Raises
Exception

activation_fn must be “softmax” or “sigmoid”. Note that activation_fn = “softmax” implies working with a background class, so num_classes is automatically increased by one.

training: bool