Models
Basic usage
To instantiate a Deformable DETR R50 (resnet50 backbone):
    from alonet.deformable_detr import DeformableDetrR50

    model = DeformableDetrR50(num_classes=NUM_CLASS)
To instantiate a Deformable DETR R50 (resnet50 backbone) with iterative box refinement:
    from alonet.deformable_detr import DeformableDetrR50Refinement

    model = DeformableDetrR50Refinement(num_classes=NUM_CLASS)
To finetune from the model pretrained on the COCO dataset:
    from alonet.deformable_detr import DeformableDetrR50Finetune

    # NUM_CLASS is the number of classes in your finetuning task
    model = DeformableDetrR50Finetune(num_classes=NUM_CLASS, weights="deformable-detr-r50")

    # With iterative box refinement
    from alonet.deformable_detr import DeformableDetrR50RefinementFinetune

    # NUM_CLASS is the number of classes in your finetuning task
    model = DeformableDetrR50RefinementFinetune(num_classes=NUM_CLASS, weights="deformable-detr-r50-refinement")
To run inference:
    import aloscene

    device = model.device  # assumes `model` is already defined as above

    # Read the image and preprocess it with ResNet normalization
    frame = aloscene.Frame(PATH_TO_IMAGE).norm_resnet()

    # Create a batch from a list of images
    frames = aloscene.Frame.batch_list([frame])
    frames = frames.to(device)

    # Forward pass
    m_outputs = model(frames)

    # Get predicted boxes as aloscene.BoundingBoxes2D from the forward outputs
    pred_boxes = model.inference(m_outputs)

    # Display the predicted boxes
    frame.append_boxes2d(pred_boxes[0], "pred_boxes")
    frame.get_view([frame.boxes2d]).render()
Deformable DETR Base
- class alonet.deformable_detr.deformable_detr.DeformableDETR(backbone, transformer, num_classes, num_queries=300, num_feature_levels=4, aux_loss=True, with_box_refine=False, return_dec_outputs=False, return_enc_outputs=False, return_bb_outputs=False, weights=None, device=device(type='cuda'), activation_fn='sigmoid', return_intermediate_dec=True, strict_load_weights=True)
Bases: torch.nn.modules.module.Module
The Deformable DETR module for object detection. For more details, see the paper: https://arxiv.org/abs/2010.04159
- INPUT_MEAN_STD = ((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
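The values in INPUT_MEAN_STD are the per-channel ImageNet mean and standard deviation commonly used with ResNet backbones. The sketch below illustrates the arithmetic of such a normalization on a single RGB pixel. It assumes pixel values already scaled to [0, 1]; the helper names are hypothetical and are not part of alonet or aloscene.

```python
# Illustrative sketch only: these helpers are hypothetical, not alonet API.
# Assumption: pixel values are floats already scaled to the [0, 1] range.
INPUT_MEAN_STD = ((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))


def normalize_pixel(rgb, mean_std=INPUT_MEAN_STD):
    """Normalize one (r, g, b) pixel channel-wise: (value - mean) / std."""
    mean, std = mean_std
    return tuple((v - m) / s for v, m, s in zip(rgb, mean, std))


def denormalize_pixel(rgb, mean_std=INPUT_MEAN_STD):
    """Invert normalize_pixel: value * std + mean."""
    mean, std = mean_std
    return tuple(v * s + m for v, m, s in zip(rgb, mean, std))


# A pixel equal to the channel means maps to zero in every channel.
print(normalize_pixel((0.485, 0.456, 0.406)))
```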
- build_backbone(backbone_name='resnet50', train_backbone=True, return_interm_layers=True, dilation=False)
Build backbone for Deformable DETR
- Parameters
- backbone_name : str, optional
Name in torchvision.models, by default “resnet50”
- train_backbone : bool, optional
By default True
- return_interm_layers : bool, optional
Needed if we use segmentation or multi-scale, by default True
- dilation : bool, optional
If True, we replace stride with dilation in the last convolutional block (DC5). By default False.
- Returns
- alonet.deformable_detr.backbone.Backbone
Resnet backbone
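To make the effect of the dilation option concrete: a standard ResNet-50 downsamples its input by a factor of 32 at the last convolutional block, and replacing that block's stride with dilation (DC5) keeps the factor at 16. The following is a minimal sketch of the resulting feature-map size; the function name is hypothetical and the computation is plain arithmetic, not a call into alonet.

```python
import math


def last_block_feature_size(height, width, dilation=False):
    """Spatial size of the last ResNet-50 block's output (hypothetical helper).

    With dilation (DC5) the last block keeps stride 1, so the overall
    downsampling factor is 16 instead of 32.
    """
    stride = 16 if dilation else 32
    return math.ceil(height / stride), math.ceil(width / stride)


print(last_block_feature_size(800, 1216))
print(last_block_feature_size(800, 1216, dilation=True))
```

The higher-resolution feature map costs more memory and compute, which is the usual trade-off of the DC5 variant.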
- build_decoder(dec_layers=6, return_intermediate_dec=True)
- build_decoder_layer(hidden_dim=256, dropout=0.1, nheads=8, dim_feedforward=1024, num_feature_levels=4, dec_n_points=4)
- build_positional_encoding(hidden_dim=256)
- build_transformer(hidden_dim=256, dropout=0.1, nheads=8, dim_feedforward=1024, enc_layers=6, dec_layers=6, num_feature_levels=4, dec_n_points=4, enc_n_points=4, return_intermediate_dec=True)
- forward(frames, **kwargs)
Deformable DETR
- Parameters
- frames: aloscene.Frame
batched images, of shape [batch_size x 3 x H x W] with frames.mask: a binary mask of shape [batch_size x 1 x H x W], containing 1 on padded pixels
- Returns
- dict
- “pred_logits”: the classification logits (including no-object) for all queries.
If self.activation_fn = “softmax”, shape = [batch_size x num_queries x (num_classes + 1)].
If self.activation_fn = “sigmoid”, shape = [batch_size x num_queries x num_classes].
- “pred_boxes”: the normalized box coordinates for all queries, represented as (center_x, center_y, height, width). These values are normalized in [0, 1], relative to the size of each individual image (disregarding possible padding). See PostProcess for information on how to retrieve the unnormalized bounding box.
- “aux_outputs”: optional, only returned when auxiliary losses are activated. It is a list of dictionaries containing the two above keys for each decoder layer.
- “activation_fn”: str, “sigmoid” or “softmax”, based on the model configuration.
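Because “pred_boxes” are normalized relative to the image size, they have to be rescaled before being drawn or compared in pixel space. The sketch below shows one way to turn a normalized center-format box into absolute corner coordinates. The function is hypothetical and takes each value as a named argument, so it makes no assumption about the order in which the model stores them.

```python
def center_box_to_abs_corners(center_x, center_y, box_w, box_h, img_w, img_h):
    """Convert a normalized center-format box to absolute (x1, y1, x2, y2).

    Hypothetical helper: all box values are in [0, 1], relative to an
    unpadded image of size img_w x img_h (in pixels).
    """
    x1 = (center_x - box_w / 2) * img_w
    y1 = (center_y - box_h / 2) * img_h
    x2 = (center_x + box_w / 2) * img_w
    y2 = (center_y + box_h / 2) * img_h
    return x1, y1, x2, y2


# A box centered in a 640 x 480 image, half as wide and a quarter as tall.
print(center_box_to_abs_corners(0.5, 0.5, 0.5, 0.25, img_w=640, img_h=480))
```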
- forward_class_heads(transformer_outptus)
- forward_heads(transformer_outptus, bb_outputs=None, **kwargs)
Apply Deformable heads
- forward_position_heads(transformer_outptus)
- get_outs_filter(outs_scores=None, outs_labels=None, m_outputs=None, threshold=None, activation_fn=None)
Given the model outs_scores and the model outs_labels, return a list of filters, one per output. If outs_scores and outs_labels are not provided, the method relies on the model forward outputs m_outputs to extract outs_scores and outs_labels on its own.
- Parameters
- outs_scores : torch.Tensor, optional
Predicted scores, by default None
- outs_labels : torch.Tensor, optional
Predicted labels, by default None
- m_outputs : dict, optional
Dict of forward outputs, by default None
- threshold : float, optional
Score threshold to use. If None and sigmoid is used, 0.2 will be used as the default value.
- softmax_threshold : float, optional
Score threshold if softmax activation is used. None by default.
- activation_fn : str, optional
Either “sigmoid” or “softmax”. By default None. If “sigmoid” is used, the filter is based on the score threshold. If “softmax” is used, the filter is based on non-background classes.
- Returns
- List[torch.Tensor]
List of filter to select the query predicting an object, len = batch size
- rtype
List[Tensor]
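The two filtering modes described above can be sketched in plain Python. This is an illustration of the idea rather than the library's implementation: it works on nested lists instead of tensors, and it assumes that the background (no-object) class is the last index in the softmax case and that a score strictly above the threshold is kept.

```python
import math


def query_filter(logits, activation_fn="sigmoid", threshold=0.2):
    """Return one boolean per query telling whether it predicts an object.

    Hypothetical sketch. `logits` is a list over queries of per-class logits.
    """
    if activation_fn == "sigmoid":
        # Keep a query when its best class score exceeds the threshold.
        return [
            max(1.0 / (1.0 + math.exp(-v)) for v in query) > threshold
            for query in logits
        ]
    if activation_fn == "softmax":
        # Assumption: the background class is the last index.
        # The argmax of the logits equals the argmax of their softmax.
        keep = []
        for query in logits:
            best = max(range(len(query)), key=query.__getitem__)
            keep.append(best != len(query) - 1)
        return keep
    raise ValueError('activation_fn must be "sigmoid" or "softmax"')


print(query_filter([[2.0, -3.0], [-4.0, -5.0]], "sigmoid"))
print(query_filter([[3.0, 0.1, 0.2], [0.0, 0.1, 5.0]], "softmax"))
```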
- get_outs_labels(m_outputs=None, activation_fn=None)
Given the model forward outputs m_outputs, return the predicted labels and the associated scores.
- Parameters
- m_outputs : dict, optional
Dict of forward outputs, by default None
- threshold : float, optional
Score threshold if sigmoid activation is used. By default 0.2
- activation_fn : str, optional
Either “sigmoid” or “softmax”. By default None. If “sigmoid” is used, the filter is based on the score threshold. If “softmax” is used, the filter is based on non-background classes.
- Returns
- Tuple
(torch.Tensor, torch.Tensor) being the predicted labels and scores
- rtype
List[Tensor]
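As a rough illustration of how labels and scores can be derived from “pred_logits”, the sketch below applies the chosen activation to each query's logits and keeps the best class with its probability. It uses nested lists rather than tensors, and the function name is hypothetical; the library's own method may differ in detail.

```python
import math


def labels_and_scores(logits, activation_fn="sigmoid"):
    """Return (labels, scores): the best class index and its probability
    for each query. Hypothetical sketch working on nested lists."""
    labels, scores = [], []
    for query in logits:
        if activation_fn == "sigmoid":
            probs = [1.0 / (1.0 + math.exp(-v)) for v in query]
        elif activation_fn == "softmax":
            shift = max(query)  # subtract the max for numerical stability
            exps = [math.exp(v - shift) for v in query]
            total = sum(exps)
            probs = [e / total for e in exps]
        else:
            raise ValueError('activation_fn must be "sigmoid" or "softmax"')
        best = max(range(len(probs)), key=probs.__getitem__)
        labels.append(best)
        scores.append(probs[best])
    return labels, scores


print(labels_and_scores([[0.0, 2.0], [1.0, -1.0]], "sigmoid"))
```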
- inference(forward_out, threshold=0.2, filters=None, **kwargs)
Convert the model outputs, as returned by the forward method, into predicted boxes.
- training: bool
- alonet.deformable_detr.deformable_detr.build_deformable_detr_r50(num_classes=91, box_refinement=True, weights=None, device=device(type='cuda'))
Build a Deformable DETR model with a ResNet-50 backbone.
- Parameters
- num_classes : int, optional
Number of classes for object detection, by default 91
- box_refinement : bool, optional
Use iterative box refinement, by default True
- weights : str, optional
Pretrained weights, by default None
- device : torch.device, optional
By default torch.device(“cuda”)
- Returns
- DeformableDETR
Deformable DETR R50
- class alonet.deformable_detr.deformable_detr_r50.DeformableDetrR50(*args, return_intermediate_dec=True, num_classes=91, **kwargs)
Bases: alonet.deformable_detr.deformable_detr.DeformableDETR
Deformable DETR with a ResNet-50 backbone
- training: bool
Deformable DETR R50 with refinement
- class alonet.deformable_detr.deformable_detr_r50_refinement.DeformableDetrR50Refinement(*args, return_intermediate_dec=True, num_classes=91, **kwargs)
Bases: alonet.deformable_detr.deformable_detr.DeformableDETR
Deformable DETR with a ResNet-50 backbone and box refinement
- training: bool
Deformable DETR R50 Finetune
- class alonet.deformable_detr.deformable_detr_r50_finetune.DeformableDetrR50Finetune(num_classes, activation_fn='sigmoid', base_weights='deformable-detr-r50', weights=None, **kwargs)
Bases: alonet.deformable_detr.deformable_detr_r50.DeformableDetrR50
Pre-made helper class to finetune the Deformable DetrR50 model on custom classes.
- Parameters
- num_classes : int
Number of classes to use
- activation_fn : str, optional
Activation function to use in the class_embed layer, by default “sigmoid”
- base_weights : str, optional
DetrR50 weights, by default “deformable-detr-r50”
- weights : str, optional
Load weights from a pth or ckpt file, by default None
- *args : Namespace
Arguments used in the Deformable DetrR50 module
- **kwargs : dict
Additional arguments used in the Deformable DetrR50 module
- Raises
- Exception
activation_fn must be “softmax” or “sigmoid”. Note that activation_fn = “softmax” implies working with a background class, so num_classes is automatically increased by one.
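The rule stated under Raises can be summarized with a small sketch. The helper below is hypothetical and is not part of alonet; it only mirrors the documented behavior that “softmax” adds a background class while “sigmoid” does not, and that any other value is rejected.

```python
def effective_num_classes(num_classes, activation_fn="sigmoid"):
    """Number of output classes for the classification head (hypothetical helper).

    "softmax" reserves one extra class for the background; "sigmoid" does not.
    """
    if activation_fn not in ("sigmoid", "softmax"):
        raise Exception('activation_fn must be "softmax" or "sigmoid"')
    return num_classes + 1 if activation_fn == "softmax" else num_classes


print(effective_num_classes(3, "sigmoid"))
print(effective_num_classes(3, "softmax"))
```

This matches the “pred_logits” shapes documented for forward: num_classes outputs with sigmoid and num_classes + 1 with softmax.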
- training: bool
Deformable DETR R50 Finetune with refinement
- class alonet.deformable_detr.deformable_detr_r50_finetune.DeformableDetrR50RefinementFinetune(num_classes, activation_fn='sigmoid', base_weights='deformable-detr-r50-refinement', weights=None, **kwargs)
Bases: alonet.deformable_detr.deformable_detr_r50_refinement.DeformableDetrR50Refinement
Pre-made helper class to finetune the Deformable DetrR50 with refinement model on custom classes.
- Parameters
- num_classes : int
Number of classes to use
- activation_fn : str, optional
Activation function to use in the class_embed layer, by default “sigmoid”
- base_weights : str, optional
DetrR50 weights, by default “deformable-detr-r50-refinement”
- weights : str, optional
Load weights from a pth or ckpt file, by default None
- *args : Namespace
Arguments used in the Deformable DetrR50 with refinement module
- **kwargs : dict
Additional arguments used in the Deformable DetrR50 with refinement module
- Raises
- Exception
activation_fn must be “softmax” or “sigmoid”. Note that activation_fn = “softmax” implies working with a background class, so num_classes is automatically increased by one.
- training: bool