Models¶
Basic usage¶
To instantiate a DETR R50 (ResNet-50 backbone):
from alonet.detr import DetrR50

model = DetrR50()
If you want to finetune from the model pretrained on the COCO dataset:
from alonet.detr import DetrR50Finetune

# NUM_CLASS is the number of classes in your finetuned model
model = DetrR50Finetune(num_classes=NUM_CLASS, weights="detr-r50")
To run inference:
import aloscene

device = model.device  # assuming `model` is already defined as above

# read the image and preprocess it with ResNet normalization
frame = aloscene.Frame(PATH_TO_IMAGE).norm_resnet()
# create a batch from a list of images
frames = aloscene.Frame.batch_list([frame])
frames = frames.to(device)

# forward pass
m_outputs = model(frames)
# get predicted boxes as aloscene.BoundingBoxes2D from the forward outputs
pred_boxes = model.inference(m_outputs)

# display the predicted boxes
frame.append_boxes2d(pred_boxes[0], "pred_boxes")
frame.get_view([frame.boxes2d]).render()
Detr Base¶
End-to-End Object Detection with Transformers (DETR) model.
- class alonet.detr.detr.Detr(backbone, transformer, num_classes, num_queries, background_class=None, aux_loss=True, weights=None, return_dec_outputs=False, return_enc_outputs=False, return_bb_outputs=False, device=device(type='cpu'), strict_load_weights=True)¶
Bases:
torch.nn.modules.module.Module
This is the DETR module that performs object detection
- Parameters
- backbone : torch.module
Torch module of the backbone to be used. See backbone.py
- transformer : torch.module
Torch module of the transformer architecture. See transformer.py
- num_classes : int
Number of object classes
- num_queries : int
Number of object queries, i.e. detection slots. This is the maximal number of objects DETR can detect in a single image. For COCO, we recommend 100 queries.
- background_class : int, optional
If None, the background_class will automatically be set to num_classes. In other words, by default, the background class is the last class of the model.
- weights : str, optional
Load weights from a path or a supported model_name, by default None
- device : torch.device, optional
Device on which the architecture is built, by default torch.device("cpu")
- aux_loss : bool, optional
True if auxiliary decoding losses (loss at each decoder layer) are to be used, by default True
- return_dec_outputs : bool, optional
If True, the output dict will contain a key "dec_outputs" with the decoder outputs of shape (stage, batch, num_queries, dim), by default False
- return_enc_outputs : bool, optional
If True, the output dict will contain a key "enc_outputs" with the encoder outputs of shape (num_enc, stage, HB, WB), by default False
- return_bb_outputs : bool, optional
If True, the output dict will contain a key "bb_outputs" with the list of the different backbone outputs, by default False
- strict_load_weights : bool, optional
Load the weights (if any are given) with strict=True, by default True
- INPUT_MEAN_STD = ((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))¶
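These are the standard ImageNet mean and standard deviation applied to input frames (the norm_resnet() call in the usage example performs this normalization for you). A minimal sketch of applying the statistics manually to a raw tensor, assuming you want to reproduce the normalization outside of aloscene:

import torch

from alonet.detr.detr import Detr

mean, std = Detr.INPUT_MEAN_STD  # ImageNet per-channel statistics
image = torch.rand(3, 480, 640)  # dummy RGB image with values in [0, 1]
# broadcast the per-channel statistics over the spatial dimensions
normalized = (image - torch.tensor(mean)[:, None, None]) / torch.tensor(std)[:, None, None]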
- build_backbone(backbone_name, train_backbone, return_interm_layers, dilation, aug_tensor_compatible=True)¶
Build the backbone architecture
- Parameters
- backbone_name : str
Backbone name
- train_backbone : bool
If True, train the backbone parameters
- return_interm_layers : bool
If True, return intermediate layers
- dilation : bool
Use dilation
- aug_tensor_compatible : bool, optional
Compatibility with augmented tensors, by default True
- Returns
Backbone
Architecture used to encode input images
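An illustrative sketch of calling this builder on an instantiated model; the argument values are assumptions mirroring the DETR-R50 configuration, not prescribed settings:

from alonet.detr import DetrR50

model = DetrR50()
# illustrative values matching a DETR-R50-style setup
backbone = model.build_backbone(
    backbone_name="resnet50",
    train_backbone=True,
    return_interm_layers=False,
    dilation=False,
)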
- build_bbox_embed()¶
MLP used to predict box coordinates
- Returns
- torch.nn
Multi-layer perceptron with 4 neurons in the last layer
- build_class_embed()¶
Layer used for class embedding
- Returns
- torch.nn
Class embed layer
- build_decoder(hidden_dim=256, num_decoder_layers=6)¶
Build the transformer decoder
- Parameters
- hidden_dim : int, optional
Hidden dimension size, by default 256
- num_decoder_layers : int, optional
Number of decoder layers, by default 6
- Returns
TransformerDecoder
Transformer decoder
- build_decoder_layer(hidden_dim=256, dropout=0.1, nheads=8, dim_feedforward=2048, normalize_before=False)¶
Build decoder layer
- Parameters
- hidden_dim : int, optional
Hidden dimension size, by default 256
- dropout : float, optional
Dropout value, by default 0.1
- nheads : int, optional
Number of heads, by default 8
- dim_feedforward : int, optional
Feedforward dimension size, by default 2048
- normalize_before : bool, optional
Apply normalization before each layer, by default False
- Returns
TransformerDecoderLayer
Transformer decoder layer
- build_positional_encoding(hidden_dim=256, position_embedding='sin', center=False)¶
Build the positional encoding layer to combine input values with their positions
- Parameters
- hidden_dim : int, optional
Hidden dimension size, by default 256
- position_embedding : str, optional
Position encoding type, by default "sin"
- center : bool, optional
Use center in position encoding, by default False
- Returns
- torch.nn
Default architecture to encode input values and their positions
- Raises
- NotImplementedError
v3 and learned encoding types are not supported yet
- ValueError
Only v2 and sine encoding types are supported
- build_transformer(hidden_dim=256, dropout=0.1, nheads=8, dim_feedforward=2048, num_encoder_layers=6, num_decoder_layers=6, normalize_before=False)¶
Build transformer
- Parameters
- hidden_dim : int, optional
Hidden dimension size, by default 256
- dropout : float, optional
Dropout value, by default 0.1
- nheads : int, optional
Number of heads, by default 8
- dim_feedforward : int, optional
Feedforward dimension size, by default 2048
- num_encoder_layers : int, optional
Number of encoder layers, by default 6
- num_decoder_layers : int, optional
Number of decoder layers, by default 6
- normalize_before : bool, optional
Apply normalization before each layer, by default False
- Returns
Transformer
Transformer module
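For illustration, a sketch calling the builder explicitly with the documented defaults on an instantiated model:

from alonet.detr import DetrR50

model = DetrR50()
# every argument below is the documented default
transformer = model.build_transformer(
    hidden_dim=256,
    dropout=0.1,
    nheads=8,
    dim_feedforward=2048,
    num_encoder_layers=6,
    num_decoder_layers=6,
    normalize_before=False,
)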
- forward(frames, **kwargs)¶
Detr model forward
- Parameters
- frames : aloscene.Frame
Batched input images (see the usage example above)
- Returns
- dict
It outputs a dict with the following elements:
- pred_logits : the classification logits (including no-object) for all queries. Shape = [batch_size x num_queries x (num_classes + 1)]
- pred_boxes : the normalized box coordinates for all queries, represented as (center_x, center_y, height, width). These values are normalized in [0, 1], relative to the size of each individual image (disregarding possible padding). See PostProcess for information on how to retrieve the unnormalized bounding box.
- aux_outputs : optional, only returned when auxiliary losses are activated. It is a list of dictionaries containing the two above keys for each decoder layer.
- bb_outputs : optional, only returned when backbone outputs are activated.
- enc_outputs : optional, only returned when transformer encoder outputs are activated.
- dec_outputs : optional, only returned when transformer decoder outputs are activated.
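A minimal sketch inspecting these outputs; PATH_TO_IMAGE is a placeholder, and loading the pretrained "detr-r50" weights here is an assumption carried over from the finetuning example:

from alonet.detr import DetrR50
from aloscene import Frame

model = DetrR50(weights="detr-r50")
frames = Frame.batch_list([Frame(PATH_TO_IMAGE).norm_resnet()])

m_outputs = model(frames)
print(m_outputs["pred_logits"].shape)  # [batch_size, num_queries, num_classes + 1]
print(m_outputs["pred_boxes"].shape)   # [batch_size, num_queries, 4]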
- forward_class_heads(transformer_outptus)¶
Forward the transformer decoder output through the class_embed layer to get class predictions
- Parameters
- transformer_outptus : dict
Output of the transformer layer
- Returns
- torch.Tensor
Output of shape [batch_size x num_queries x (num_classes + 1)]
- forward_heads(transformer_outptus, bb_outputs=None, **kwargs)¶
Apply the Detr heads and build the final dictionary output.
- Parameters
- transformer_outptus : dict
Output of the transformer layer
- bb_outputs : torch.Tensor, optional
Backbone output to append to the output, by default None
- Returns
- dict
Output described in the forward() function
- forward_position_heads(transformer_outptus)¶
Forward the transformer decoder output through the bbox_embed layer to get box predictions
- Parameters
- transformer_outptus : dict
Output of the transformer layer
- Returns
- torch.Tensor
Output of shape [batch_size x num_queries x 4]
- get_outs_filter(outs_scores=None, outs_labels=None, m_outputs=None, background_class=None, threshold=None, **kwargs)¶
Given the model outs_scores and outs_labels, this method returns a list of filters, one for each output. If outs_scores and outs_labels are not provided, the method will rely on the model forward outputs (m_outputs) to extract the outs_scores and outs_labels on its own.
- Parameters
- outs_scores : torch.Tensor, optional
Output scores from forward(), by default None
- outs_labels : torch.Tensor, optional
Output labels from forward(), by default None
- m_outputs : dict, optional
Forward outputs, by default None
- background_class : int, optional
Background class ID, used to filter classes, by default the background_class defined in the constructor
- threshold : float, optional
Threshold value to filter classes by score, by default not implemented
- Returns
- filters : list
List of filters to select the queries predicting an object.
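A sketch of feeding these filters into inference(), reusing the model and m_outputs from the forward() example above:

# keep only the queries predicted as an object (non-background)
filters = model.get_outs_filter(m_outputs=m_outputs)
pred_boxes = model.inference(m_outputs, filters=filters)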
- get_outs_labels(m_outputs)¶
This method returns the predicted label and score of each slot.
- Parameters
- m_outputs : dict
Model forward output
- Returns
- labels : torch.Tensor
Predicted class for each slot
- scores : torch.Tensor
Predicted score for each slot
- inference(forward_out, filters=None, background_class=None, threshold=None)¶
Given the model forward outputs, this method returns a BoundingBoxes2D tensor.
- Parameters
- forward_out : dict
Dict with the model forward outputs
- filters : list, optional
List of torch.Tensor filters selecting which predictions to keep in the set of BoundingBoxes2D, by default None
- background_class : int, optional
Background class ID, see get_outs_filter(), by default None
- threshold : float, optional
Score threshold, see get_outs_filter(), by default None
- Returns
- boxes : BoundingBoxes2D
Boxes filtered and predicted from the forward outputs
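A short sketch, assuming threshold behaves as a score cutoff (the docstring above marks its default behavior as not implemented, so treat this usage as an assumption), reusing the model and m_outputs from the forward() example:

# assumption: `threshold` acts as a minimum confidence score
pred_boxes = model.inference(m_outputs, threshold=0.5)
print(len(pred_boxes[0]))  # number of boxes kept for the first image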
- training: bool¶
Detr R50¶
DETR model that uses the parameters of the original DETR-R50 architecture.
- class alonet.detr.detr_r50.DetrR50(*args, num_classes=91, background_class=91, **kwargs)¶
Bases:
alonet.detr.detr.Detr
DETR R50 as described in the paper: https://arxiv.org/abs/2005.12872
- Parameters
- num_classes : int, optional
Number of neurons in the class embed layer, by default 91
- background_class : int, optional
ID used for the background class, by default 91
- *args : Namespace
Positional arguments (see the Detr class)
- **kwargs : dict
Additional parameters (see the Detr class)
- training: bool¶
- alonet.detr.detr_r50.main(image_path)¶
Detr R50 Finetune¶
Module to create a custom DetrR50 model, which makes it possible to load chosen pretrained weights and replace the class_embed layer in order to train on custom classes.
- class alonet.detr.detr_r50_finetune.DetrR50Finetune(num_classes, background_class=None, base_weights='detr-r50', weights=None, *args, **kwargs)¶
Bases:
alonet.detr.detr_r50.DetrR50
Premade helpful class to finetune the DetrR50 model on a custom class.
- Parameters
- num_classes : int
Number of classes to use
- background_class : int, optional
Background class ID, by default the last ID
- base_weights : str, optional
DetrR50 weights, by default "detr-r50"
- weights : str, optional
Load weights from a pth or ckpt file, by default None
- *args : Namespace
Arguments used in the DetrR50 module
- **kwargs : dict
Additional arguments used in the DetrR50 module
- training: bool¶
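A hedged sketch of the two typical instantiations: starting from the COCO-pretrained base weights for training, and reloading a finetuned checkpoint afterwards (the class count and checkpoint path are placeholders):

from alonet.detr import DetrR50Finetune

# start from COCO-pretrained DETR-R50 with a fresh 3-class head
model = DetrR50Finetune(num_classes=3, base_weights="detr-r50")

# after training, reload your own checkpoint (placeholder path)
model = DetrR50Finetune(num_classes=3, weights="path/to/checkpoint.ckpt")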