Models

Basic usage

To instantiate a DETR R50 (ResNet-50 backbone):

from alonet.detr import DetrR50
model = DetrR50()

If you want to finetune from the model pretrained on the COCO dataset:

from alonet.detr import DetrR50Finetune
# NUM_CLASS is the number of classes in your finetuning dataset
model = DetrR50Finetune(num_classes=NUM_CLASS, weights="detr-r50")

To run inference:

from aloscene import Frame
device = model.device  # assumes `model` is already defined as above

# read the image and preprocess it with ResNet normalization
frame = Frame(PATH_TO_IMAGE).norm_resnet()
# create a batch from a list of images
frames = Frame.batch_list([frame])
frames = frames.to(device)

# forward pass
m_outputs = model(frames)
# get predicted boxes as aloscene.BoundingBoxes2D from forward outputs
pred_boxes = model.inference(m_outputs)
# Display the predicted boxes
frame.append_boxes2d(pred_boxes[0], "pred_boxes")
frame.get_view([frame.boxes2d]).render()
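For inference you will usually want the model in eval mode with gradients disabled. A minimal sketch (standard PyTorch practice, not an alonet-specific API):

import torch

model = model.eval()  # disable dropout, use running batch-norm statistics
with torch.no_grad():  # no gradients are needed for inference
    m_outputs = model(frames)
    pred_boxes = model.inference(m_outputs)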

Detr Base

End-to-End Object Detection with Transformers (DETR) model.

class alonet.detr.detr.Detr(backbone, transformer, num_classes, num_queries, background_class=None, aux_loss=True, weights=None, return_dec_outputs=False, return_enc_outputs=False, return_bb_outputs=False, device=device(type='cpu'), strict_load_weights=True)

Bases: torch.nn.modules.module.Module

This is the DETR module that performs object detection.

Parameters
backbone : torch.module

Torch module of the backbone to be used. See backbone.py

transformer : torch.module

Torch module of the transformer architecture. See transformer.py

num_classes : int

Number of object classes

num_queries : int

Number of object queries, i.e. detection slots. This is the maximal number of objects DETR can detect in a single image. For COCO, we recommend 100 queries.

background_class : int, Optional

If None, the background_class will automatically be set equal to num_classes. In other words, by default, the background class is the last class of the model.

weights : str, Optional

Load weights from a path or a supported model_name, by default None

device : torch.device, Optional

Device on which the architecture is built, by default torch.device("cpu")

aux_loss : bool, Optional

True if auxiliary decoding losses (loss at each decoder layer) are to be used, by default True

return_dec_outputs : bool, Optional

If True, the output dict will contain a key "dec_outputs" with the decoder outputs of shape (stage, batch, num_queries, dim), by default False (see the sketch after this list)

return_enc_outputs : bool, Optional

If True, the output dict will contain a key "enc_outputs" with the encoder outputs of shape (num_enc, stage, HB, WB), by default False

return_bb_outputs : bool, Optional

If True, the output dict will contain a key "bb_outputs" with the list of the different backbone outputs, by default False

strict_load_weights : bool, Optional

Load the weights (if any are given) with strict=True, by default True
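As an illustration of the optional output flags, a hedged sketch (the key name and shape follow the forward() documentation below):

from alonet.detr import DetrR50

# request intermediate decoder outputs in addition to the default keys
model = DetrR50(weights="detr-r50", return_dec_outputs=True)
m_outputs = model(frames)  # `frames` built as in the Basic usage section
dec_outputs = m_outputs["dec_outputs"]  # shape (stage, batch, num_queries, dim)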

INPUT_MEAN_STD = ((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
build_backbone(backbone_name, train_backbone, return_interm_layers, dilation, aug_tensor_compatible=True)

Build backbone architecture

Parameters
backbone_name : str

Backbone name

train_backbone : bool

Train backbone parameters if required

return_interm_layers : bool

Return intermediate layers if required

dilation : bool

Use dilation

aug_tensor_compatible : bool, optional

Compatibility with augmented tensors, by default True

Returns
Backbone

Architecture used to encode input images
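This builder is typically invoked internally during model construction; a hedged call sketch, with illustrative argument values matching a DETR R50 setup:

# build a trainable resnet50 backbone; argument values are illustrative
backbone = model.build_backbone(
    backbone_name="resnet50",
    train_backbone=True,
    return_interm_layers=False,
    dilation=False,
)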

build_bbox_embed()

MLP used to predict box coordinates

Returns
torch.nn

Multi-layer perceptron with 4 neurons in the last layer

build_class_embed()

Layer used for the class embedding

Returns
torch.nn

Class embed layer

build_decoder(hidden_dim=256, num_decoder_layers=6)

Build the transformer decoder

Parameters
hidden_dim : int, optional

Hidden dimension size, by default 256

num_decoder_layers : int, optional

Number of decoder layers, by default 6

Returns
TransformerDecoder

Transformer decoder
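A hedged call sketch using the signature defaults (this method is normally called for you when the model is built):

# values shown are the signature defaults
decoder = model.build_decoder(hidden_dim=256, num_decoder_layers=6)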

build_decoder_layer(hidden_dim=256, dropout=0.1, nheads=8, dim_feedforward=2048, normalize_before=False)

Build a single transformer decoder layer

Parameters
hidden_dim : int, optional

Hidden dimension size, by default 256

dropout : float, optional

Dropout value, by default 0.1

nheads : int, optional

Number of heads, by default 8

dim_feedforward : int, optional

Feedforward dimension size, by default 2048

normalize_before : bool, optional

Apply normalization before each layer, by default False

Returns
TransformerDecoderLayer

Transformer decoder layer

build_positional_encoding(hidden_dim=256, position_embedding='sin', center=False)

Build the positional encoding layer used to combine the input values with their positions

Parameters
hidden_dim : int, optional

Hidden dimension size, by default 256

position_embedding : str, optional

Position encoding type, by default "sin"

center : bool, optional

Use center in position encoding, by default False

Returns
torch.nn

Default architecture to encode the input values together with their positions

Raises
NotImplementedError

v3 and learned encoding types are not supported yet

ValueError

Only v2 and sine encoding types are supported
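A hedged call sketch using the supported sine encoding (this builder is normally used internally during construction):

# "sin" is the default encoding type; unsupported types raise, as listed above
pos_enc = model.build_positional_encoding(hidden_dim=256, position_embedding="sin")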

build_transformer(hidden_dim=256, dropout=0.1, nheads=8, dim_feedforward=2048, num_encoder_layers=6, num_decoder_layers=6, normalize_before=False)

Build transformer

Parameters
hidden_dim : int, optional

Hidden dimension size, by default 256

dropout : float, optional

Dropout value, by default 0.1

nheads : int, optional

Number of heads, by default 8

dim_feedforward : int, optional

Feedforward dimension size, by default 2048

num_encoder_layers : int, optional

Number of encoder layers, by default 6

num_decoder_layers : int, optional

Number of decoder layers, by default 6

normalize_before : bool, optional

Apply normalization before each layer, by default False

Returns
Transformer

Transformer module
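A hedged sketch building a transformer with the signature defaults, which correspond to the original DETR-R50 configuration:

# values shown are the signature defaults
transformer = model.build_transformer(
    hidden_dim=256,
    dropout=0.1,
    nheads=8,
    dim_feedforward=2048,
    num_encoder_layers=6,
    num_decoder_layers=6,
)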

forward(frames, **kwargs)

Detr model forward

Parameters
frames : Frames

Images batched, of shape [batch_size x 3 x H x W] with a Mask: a binary mask of shape [batch_size x 1 x H x W], containing 1 on padded pixels

Returns
dict

It outputs a dict with the following elements:

  • pred_logits: The classification logits (including no-object) for all queries. Shape= [batch_size x num_queries x (num_classes + 1)]

  • pred_boxes: The normalized boxes coordinates for all queries, represented as (center_x, center_y, height, width). These values are normalized in [0, 1], relative to the size of each individual image (disregarding possible padding). See PostProcess for information on how to retrieve the unnormalized bounding box.

  • aux_outputs: Optional, only returned when auxiliary losses are activated. It is a list of dictionaries containing the two above keys for each decoder layer.

  • bb_outputs: Optional, only returned when backbone outputs are activated.

  • enc_outputs: Optional, only returned when transformer encoder outputs are activated.

  • dec_outputs: Optional, only returned when transformer decoder outputs are activated.
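A short sketch of reading the default output keys (shapes follow the description above):

m_outputs = model(frames)
pred_logits = m_outputs["pred_logits"]  # [batch_size x num_queries x (num_classes + 1)]
pred_boxes = m_outputs["pred_boxes"]    # [batch_size x num_queries x 4], normalized in [0, 1]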

forward_class_heads(transformer_outptus)

Forward from transformer decoder output into class_embed layer to get class predictions

Parameters
transformer_outptus : dict

Output of transformer layer

Returns
torch.Tensor

Output of shape [batch_size x num_queries x (num_classes + 1)]

forward_heads(transformer_outptus, bb_outputs=None, **kwargs)

Apply the Detr heads and build the final dictionary output.

Parameters
transformer_outptus : dict

Output of transformer layer

bb_outputs : torch.Tensor, optional

Backbone output to append to the output, by default None

Returns
dict

Output as described in the forward() function

forward_position_heads(transformer_outptus)

Forward from transformer decoder output into bbox_embed layer to get box predictions

Parameters
transformer_outptus : dict

Output of transformer layer

Returns
torch.Tensor

Output of shape [batch_size x num_queries x 4]

get_outs_filter(outs_scores=None, outs_labels=None, m_outputs=None, background_class=None, threshold=None, **kwargs)

Given the model outs_scores and outs_labels, this method returns a list of filters, one per output. If outs_scores and outs_labels are not provided, the method will rely on the model forward outputs (m_outputs) to extract them on its own.

Parameters
outs_scores : torch.Tensor, Optional

Output scores from forward(), by default None

outs_labels : torch.Tensor, Optional

Output labels from forward(), by default None

m_outputs : dict, Optional

Forward outputs, by default None

background_class : int, Optional

ID of the background class, used to filter classes, by default the background_class defined in the constructor

threshold : float, Optional

Threshold value to filter classes by score, by default None

Returns
filters : list

List of filters used to select the queries that predict an object.
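A hedged sketch combining this method with inference() (described below):

# filter the queries predicted as objects, then decode only those boxes
filters = model.get_outs_filter(m_outputs=m_outputs)
pred_boxes = model.inference(m_outputs, filters=filters)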

get_outs_labels(m_outputs)

This method returns the predicted label and score of each slot.

Parameters
m_outputs : dict

Model forward output

Returns
labels : torch.Tensor

Predicted class for each slot

scores : torch.Tensor

Predicted score for each slot

inference(forward_out, filters=None, background_class=None, threshold=None)

Given the model forward outputs, this method will return a BoundingBoxes2D tensor.

Parameters
forward_out : dict

Dict with the model forward outputs

filters : list, Optional

List of torch.Tensor filters indicating which predictions to select when creating the set of BoundingBoxes2D.

background_class : int, Optional

ID of the background class, by default the background_class defined in the constructor

threshold : float, Optional

Threshold value used to filter predictions by score, by default None

Returns
boxes : BoundingBoxes2D

Filtered boxes predicted from the forward outputs
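A hedged usage sketch; the threshold value is illustrative:

# keep only the predictions whose score exceeds 0.5
pred_boxes = model.inference(m_outputs, threshold=0.5)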

training: bool

Detr R50

DETR model that uses the parameters of the original DETR-R50 architecture.

class alonet.detr.detr_r50.DetrR50(*args, num_classes=91, background_class=91, **kwargs)

Bases: alonet.detr.detr.Detr

DETR R50 as described in the paper: https://arxiv.org/abs/2005.12872

Parameters
num_classes : int, optional

Number of neurons in the embed layer, by default 91

background_class : int, optional

Id used for the background class, by default 91

*args : Namespace

Positional arguments (see the Detr class)

**kwargs : dict

Additional parameters (see the Detr class)

training: bool
alonet.detr.detr_r50.main(image_path)

Detr R50 Finetune

Module to create a custom DetrR50 model that loads a given set of pretrained weights and replaces the class_embed layer in order to train on custom classes.

class alonet.detr.detr_r50_finetune.DetrR50Finetune(num_classes, background_class=None, base_weights='detr-r50', weights=None, *args, **kwargs)

Bases: alonet.detr.detr_r50.DetrR50

Premade helper class to finetune the DetrR50 model on custom classes.

Parameters
num_classes : int

Number of classes to use

background_class : int, optional

Background class id, by default the last id

base_weights : str, optional

DetrR50 weights, by default "detr-r50"

weights : str, optional

Load weights from a .pth or .ckpt file, by default None

*args : Namespace

Arguments used in the DetrR50 module

**kwargs : dict

Additional arguments used in the DetrR50 module
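A hedged instantiation sketch; the class count and checkpoint path are illustrative:

from alonet.detr import DetrR50Finetune

# start from COCO-pretrained DETR-R50 and replace the class_embed layer
model = DetrR50Finetune(num_classes=2, base_weights="detr-r50")

# or resume from a previously finetuned checkpoint (hypothetical path)
model = DetrR50Finetune(num_classes=2, weights="path/to/finetuned.ckpt")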

training: bool