Models

Basic usage

To instantiate a Deformable DETR R50 (resnet50 backbone):

from alonet.deformable_detr import DeformableDetrR50
model = DeformableDetrR50(num_classes=NUM_CLASS)

To instantiate a Deformable DETR R50 (resnet50 backbone) with iterative box refinement:

from alonet.deformable_detr import DeformableDetrR50Refinement
model = DeformableDetrR50Refinement(num_classes=NUM_CLASS)

To finetune from a model pretrained on the COCO dataset:

from alonet.deformable_detr import DeformableDetrR50Finetune
# NUM_CLASS is the number of classes in your finetune
model = DeformableDetrR50Finetune(num_classes=NUM_CLASS, weights="deformable-detr-r50")
# with iterative box refinement
from alonet.deformable_detr import DeformableDetrR50RefinementFinetune
# NUM_CLASS is the number of classes in your finetune
model = DeformableDetrR50RefinementFinetune(num_classes=NUM_CLASS, weights="deformable-detr-r50-refinement")

To run inference:

from aloscene import Frame
device = model.device  # assuming `model` is already defined as above

# read the image and preprocess it with the ResNet normalization
frame = Frame(PATH_TO_IMAGE).norm_resnet()
# create a batch from a list of images
frames = Frame.batch_list([frame])
frames = frames.to(device)

# forward pass
m_outputs = model(frames)
# get predicted boxes as aloscene.BoundingBoxes2D from forward outputs
pred_boxes = model.inference(m_outputs)
# Display the predicted boxes
frame.append_boxes2d(pred_boxes[0], "pred_boxes")
frame.get_view([frame.boxes2d]).render()

Deformable DETR Base

class alonet.deformable_detr.deformable_detr.DeformableDETR(backbone, transformer, num_classes, num_queries=300, num_feature_levels=4, aux_loss=True, with_box_refine=False, return_dec_outputs=False, return_enc_outputs=False, return_bb_outputs=False, weights=None, device=device(type='cuda'), activation_fn='sigmoid', return_intermediate_dec=True, strict_load_weights=True)

Bases: torch.nn.modules.module.Module

The Deformable DETR module for object detection. For more details, see the paper: https://arxiv.org/abs/2010.04159

INPUT_MEAN_STD = ((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
build_backbone(backbone_name='resnet50', train_backbone=True, return_interm_layers=True, dilation=False)

Build backbone for Deformable DETR

Parameters
backbone_name: str, optional

Name in torchvision.models, by default “resnet50”

train_backbone: bool, optional

By default True

return_interm_layers: bool, optional

Needed if we use segmentation or multi-scale, by default True

dilation: bool, optional

If True, we replace stride with dilation in the last convolutional block (DC5). By default False.

Returns
alonet.deformable_detr.backbone.Backbone

Resnet backbone

build_decoder(dec_layers=6, return_intermediate_dec=True)
build_decoder_layer(hidden_dim=256, dropout=0.1, nheads=8, dim_feedforward=1024, num_feature_levels=4, dec_n_points=4)
build_positional_encoding(hidden_dim=256)
build_transformer(hidden_dim=256, dropout=0.1, nheads=8, dim_feedforward=1024, enc_layers=6, dec_layers=6, num_feature_levels=4, dec_n_points=4, enc_n_points=4, return_intermediate_dec=True)
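
As an illustrative, hedged sketch (based only on the signatures listed above), these build_* helpers can be called on an existing model instance to create standalone components; whether the resulting modules can be swapped into a live model should be checked against the source:

# hedged sketch: reusing the build_* helpers from an existing instance
from alonet.deformable_detr import DeformableDetrR50
model = DeformableDetrR50(num_classes=2)

# a dilated (DC5-style) ResNet-50 backbone, see build_backbone above
dc5_backbone = model.build_backbone(backbone_name="resnet50", dilation=True)

# a lighter transformer with fewer decoder layers than the default 6
light_transformer = model.build_transformer(dec_layers=4)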
forward(frames, **kwargs)

Deformable DETR

Parameters
frames: aloscene.Frame

batched images, of shape [batch_size x 3 x H x W] with frames.mask: a binary mask of shape [batch_size x 1 x H x W], containing 1 on padded pixels

Returns
dict
  • “pred_logits”: classification logits (including no-object) for all queries. If self.activation_fn = “softmax”, shape = [batch_size x num_queries x (num_classes + 1)]; if self.activation_fn = “sigmoid”, shape = [batch_size x num_queries x num_classes].

  • “pred_boxes”: The normalized box coordinates for all queries, represented as (center_x, center_y, height, width). These values are normalized in [0, 1], relative to the size of each individual image (disregarding possible padding). See PostProcess for information on how to retrieve the unnormalized bounding boxes.

  • “aux_outputs”: Optional, only returned when auxiliary losses are activated. It is a list of dictionaries containing the two above keys for each decoder layer.

  • “activation_fn”: str, “sigmoid” or “softmax” based on model configuration
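
As an illustration, the returned dict can be inspected as follows (a minimal sketch, assuming model and frames are defined as in the basic usage section above):

m_outputs = model(frames)

# classification logits for every query:
# sigmoid: [batch_size x num_queries x num_classes]
# softmax: [batch_size x num_queries x (num_classes + 1)]
print(m_outputs["pred_logits"].shape)

# normalized (center_x, center_y, height, width) boxes in [0, 1]
print(m_outputs["pred_boxes"].shape)

# per-decoder-layer predictions, only present when auxiliary losses are activated
if "aux_outputs" in m_outputs:
    print(len(m_outputs["aux_outputs"]))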

forward_class_heads(transformer_outptus)
forward_heads(transformer_outptus, bb_outputs=None, **kwargs)

Apply Deformable heads

forward_position_heads(transformer_outptus)
get_outs_filter(outs_scores=None, outs_labels=None, m_outputs=None, threshold=None, activation_fn=None)

Given the model outs_scores and outs_labels, return a list of filters, one per output. If outs_scores and outs_labels are not provided, the method will rely on the model forward outputs m_outputs to extract the outs_scores and outs_labels on its own.

Parameters
outs_scores: torch.Tensor, optional

Predicted scores, by default None

outs_labels: torch.Tensor, optional

Predicted labels, by default None

m_outputs: dict, optional

Dict of forward outputs, by default None

threshold: float, optional

Score threshold to use. If None and sigmoid is used, 0.2 will be used as the default value.

softmax_threshold: float, optional

Score threshold if softmax activation is used. None by default.

activation_fn: str, optional

Either “sigmoid” or “softmax”. By default None. If “sigmoid” is used, the filter is based on the score threshold. If “softmax” is used, the filter is based on non-background classes.

Returns
List[torch.Tensor]

List of filters to select the queries predicting an object, len = batch size

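A possible usage, sketched from the signature above (it assumes m_outputs comes from a forward pass as shown earlier, and that the resulting filters are passed to the inference method documented below):

# hedged sketch: one filter per image in the batch
filters = model.get_outs_filter(m_outputs=m_outputs, threshold=0.5)
print(len(filters))  # == batch size

# the filters can then be reused by the inference method (see below)
pred_boxes = model.inference(m_outputs, filters=filters)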

get_outs_labels(m_outputs=None, activation_fn=None)

From the model forward outputs m_outputs, return the predicted labels and their associated scores.

Parameters
m_outputs: dict, optional

Dict of forward outputs, by default None

threshold: float, optional

Score threshold if sigmoid activation is used. By default 0.2

activation_fn: str, optional

Either “sigmoid” or “softmax”. By default None. If “sigmoid” is used, the filter is based on the score threshold. If “softmax” is used, the filter is based on non-background classes.

Returns
Tuple

(torch.Tensor, torch.Tensor) being the predicted labels and scores


inference(forward_out, threshold=0.2, filters=None, **kwargs)

Given the model outputs, as returned by the forward method, return the predicted boxes as aloscene.BoundingBoxes2D (one per image in the batch).
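
For instance, a stricter score threshold than the default 0.2 can be passed directly (a sketch, assuming m_outputs comes from a previous forward pass):

# sketch: inference with a custom score threshold
pred_boxes = model.inference(m_outputs, threshold=0.5)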

training: bool
alonet.deformable_detr.deformable_detr.build_deformable_detr_r50(num_classes=91, box_refinement=True, weights=None, device=device(type='cuda'))

Build a Deformable DETR model with a ResNet-50 backbone.

Parameters
num_classes: int, optional

Number of classes for object detection, by default 91

box_refinement: bool, optional

Use iterative box refinement, by default True

weights: str, optional

Pretrained weights, by default None

device: torch.device, optional

By default torch.device(“cuda”)

Returns
DeformableDETR
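
For example (a sketch following the signature above; the weight name reuses one of the pretrained checkpoints mentioned earlier on this page):

from alonet.deformable_detr.deformable_detr import build_deformable_detr_r50

# sketch: COCO-pretrained model with iterative box refinement
model = build_deformable_detr_r50(num_classes=91, box_refinement=True, weights="deformable-detr-r50-refinement")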

Deformable DETR R50

class alonet.deformable_detr.deformable_detr_r50.DeformableDetrR50(*args, return_intermediate_dec=True, num_classes=91, **kwargs)

Bases: alonet.deformable_detr.deformable_detr.DeformableDETR

Deformable Detr with Resnet50 backbone

training: bool

Deformable DETR R50 with refinement

class alonet.deformable_detr.deformable_detr_r50_refinement.DeformableDetrR50Refinement(*args, return_intermediate_dec=True, num_classes=91, **kwargs)

Bases: alonet.deformable_detr.deformable_detr.DeformableDETR

Deformable Detr with Resnet50 backbone with box refinement

training: bool

Deformable DETR R50 Finetune

class alonet.deformable_detr.deformable_detr_r50_finetune.DeformableDetrR50Finetune(num_classes, activation_fn='sigmoid', base_weights='deformable-detr-r50', weights=None, **kwargs)

Bases: alonet.deformable_detr.deformable_detr_r50.DeformableDetrR50

Premade helpful class to finetune the Deformable DetrR50 model on custom classes.

Parameters
num_classes: int

Number of classes to use

activation_fn: str, optional

Activation function to use in the class_embed layer, by default “sigmoid”

base_weights: str, optional

Deformable DetrR50 weights, by default “deformable-detr-r50”

weights: str, optional

Load weights from a pth or ckpt file, by default None

*args: Namespace

Arguments used in the Deformable DetrR50 module

**kwargs: dict

Additional arguments used in the Deformable DetrR50 module

Raises
Exception

activation_fn must be “softmax” or “sigmoid”. Note that activation_fn = “softmax” implies working with a background class, so num_classes is automatically increased by one.

training: bool
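
For example, to finetune with a softmax head instead of the default sigmoid (a sketch based on the constructor signature above; as noted, the background class is then added automatically):

from alonet.deformable_detr import DeformableDetrR50Finetune

# sketch: num_classes is increased by one internally to account for the background class
model = DeformableDetrR50Finetune(num_classes=2, activation_fn="softmax", base_weights="deformable-detr-r50")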

Deformable DETR R50 Finetune with refinement

class alonet.deformable_detr.deformable_detr_r50_finetune.DeformableDetrR50RefinementFinetune(num_classes, activation_fn='sigmoid', base_weights='deformable-detr-r50-refinement', weights=None, **kwargs)

Bases: alonet.deformable_detr.deformable_detr_r50_refinement.DeformableDetrR50Refinement

Premade helpful class to finetune the Deformable DetrR50 with refinement model on custom classes.

Parameters
num_classes: int

Number of classes to use

activation_fn: str, optional

Activation function to use in the class_embed layer, by default “sigmoid”

base_weights: str, optional

Deformable DetrR50 with refinement weights, by default “deformable-detr-r50-refinement”

weights: str, optional

Load weights from a pth or ckpt file, by default None

*args: Namespace

Arguments used in the Deformable DetrR50 with refinement module

**kwargs: dict

Additional arguments used in the Deformable DetrR50 with refinement module

Raises
Exception

activation_fn must be “softmax” or “sigmoid”. Note that activation_fn = “softmax” implies working with a background class, so num_classes is automatically increased by one.

training: bool