Models

Basic usage

To instantiate a DETR R50 (ResNet-50 backbone):

from alonet.detr import DetrR50
model = DetrR50()

If you want to finetune from the model pretrained on the COCO dataset:

from alonet.detr import DetrR50Finetune
# NUM_CLASS is the number of classes in your finetuning dataset
model = DetrR50Finetune(num_classes=NUM_CLASS, weights="detr-r50")

To run inference:

from aloscene import Frame
device = model.device  # assumes `model` is already defined as above

# read the image and preprocess it with ResNet normalization
frame = Frame(PATH_TO_IMAGE).norm_resnet()
# create a batch from a list of images
frames = Frame.batch_list([frame])
frames = frames.to(device)

# forward pass
m_outputs = model(frames)
# get predicted boxes as aloscene.BoundingBoxes2D from forward outputs
pred_boxes = model.inference(m_outputs)
# Display the predicted boxes
frame.append_boxes2d(pred_boxes[0], "pred_boxes")
frame.get_view([frame.boxes2d]).render()
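For inference you will usually want the model in eval mode with gradients disabled. A minimal sketch (standard PyTorch practice, not an alonet-specific API):

import torch

model = model.eval()  # disable dropout, use running batch-norm statistics
with torch.no_grad():  # no gradients are needed for inference
    m_outputs = model(frames)
    pred_boxes = model.inference(m_outputs)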

Detr Base

End-to-End Object Detection with Transformers (DETR) model.

class alonet.detr.detr.Detr(backbone, transformer, num_classes, num_queries, background_class=None, aux_loss=True, weights=None, return_dec_outputs=False, return_enc_outputs=False, return_bb_outputs=False, device=device(type='cpu'), strict_load_weights=True)

Bases: torch.nn.modules.module.Module

This is the DETR module that performs object detection.

Parameters
backbone : torch.module

Torch module of the backbone to be used. See backbone.py

transformer : torch.module

Torch module of the transformer architecture. See transformer.py

num_classes : int

Number of object classes

num_queries : int

Number of object queries, i.e. detection slots. This is the maximal number of objects DETR can detect in a single image. For COCO, we recommend 100 queries.

background_class : int, Optional

If None, the background_class will automatically be set equal to num_classes. In other words, by default, the background class is the last class of the model.

weights : str, Optional

Load weights from a path or a supported model_name, by default None

device : torch.device, Optional

Device on which the architecture is built, by default torch.device("cpu")

aux_loss : bool, Optional

True if auxiliary decoding losses (loss at each decoder layer) are to be used, by default True

return_dec_outputs : bool, Optional

If True, the output dict will contain a key "dec_outputs" with the decoder outputs of shape (stage, batch, num_queries, dim), by default False (see the sketch after this list)

return_enc_outputs : bool, Optional

If True, the output dict will contain a key "enc_outputs" with the encoder outputs of shape (num_enc, stage, HB, WB), by default False

return_bb_outputs : bool, Optional

If True, the output dict will contain a key "bb_outputs" with the list of the different backbone outputs, by default False

strict_load_weights : bool, Optional

Load the weights (if any are given) with strict=True, by default True
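As an illustration of the optional output flags, a hedged sketch (the key name and shape follow the forward() documentation below):

from alonet.detr import DetrR50

# request intermediate decoder outputs in addition to the default keys
model = DetrR50(weights="detr-r50", return_dec_outputs=True)
m_outputs = model(frames)  # `frames` built as in the Basic usage section
dec_outputs = m_outputs["dec_outputs"]  # shape (stage, batch, num_queries, dim)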

INPUT_MEAN_STD = ((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
build_backbone(backbone_name, train_backbone, return_interm_layers, dilation, aug_tensor_compatible=True)

Build backbone architecture

Parameters
backbone_name : str

Backbone name

train_backbone : bool

Train backbone parameters if required

return_interm_layers : bool

Return intermediate layers if required

dilation : bool

Use dilation

aug_tensor_compatible : bool, optional

Compatibility with augmented tensors, by default True

Returns
Backbone

Architecture used to encode input images
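This builder is typically invoked internally during model construction; a hedged call sketch, with illustrative argument values matching a DETR R50 setup:

# build a trainable resnet50 backbone; argument values are illustrative
backbone = model.build_backbone(
    backbone_name="resnet50",
    train_backbone=True,
    return_interm_layers=False,
    dilation=False,
)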

build_bbox_embed()

MLP used to predict box coordinates

Returns
torch.nn

Multi-layer perceptron with 4 neurons in the last layer

build_class_embed()

Layer used for the class embedding

Returns
torch.nn

Class embed layer

build_decoder(hidden_dim=256, num_decoder_layers=6)

Build the transformer decoder

Parameters
hidden_dim : int, optional

Hidden dimension size, by default 256

num_decoder_layers : int, optional

Number of decoder layers, by default 6

Returns
TransformerDecoder

Transformer decoder
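A hedged call sketch using the signature defaults (this method is normally called for you when the model is built):

# values shown are the signature defaults
decoder = model.build_decoder(hidden_dim=256, num_decoder_layers=6)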

build_decoder_layer(hidden_dim=256, dropout=0.1, nheads=8, dim_feedforward=2048, normalize_before=False)

Build a single transformer decoder layer

Parameters
hidden_dim : int, optional

Hidden dimension size, by default 256

dropout : float, optional

Dropout value, by default 0.1

nheads : int, optional

Number of heads, by default 8

dim_feedforward : int, optional

Feedforward dimension size, by default 2048

normalize_before : bool, optional

Apply normalization before each layer, by default False

Returns
TransformerDecoderLayer

Transformer decoder layer

build_positional_encoding(hidden_dim=256, position_embedding='sin', center=False)

Build the positional encoding layer used to combine the input values with their positions

Parameters
hidden_dim : int, optional

Hidden dimension size, by default 256

position_embedding : str, optional

Position encoding type, by default "sin"

center : bool, optional

Use center in position encoding, by default False

Returns
torch.nn

Default architecture to encode the input values together with their positions

Raises
NotImplementedError

v3 and learned encoding types are not supported yet

ValueError

Only v2 and sine encoding types are supported
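A hedged call sketch using the supported sine encoding (this builder is normally used internally during construction):

# "sin" is the default encoding type; unsupported types raise, as listed above
pos_enc = model.build_positional_encoding(hidden_dim=256, position_embedding="sin")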

build_transformer(hidden_dim=256, dropout=0.1, nheads=8, dim_feedforward=2048, num_encoder_layers=6, num_decoder_layers=6, normalize_before=False)

Build transformer

Parameters
hidden_dim : int, optional

Hidden dimension size, by default 256

dropout : float, optional

Dropout value, by default 0.1

nheads : int, optional

Number of heads, by default 8

dim_feedforward : int, optional

Feedforward dimension size, by default 2048

num_encoder_layers : int, optional

Number of encoder layers, by default 6

num_decoder_layers : int, optional

Number of decoder layers, by default 6

normalize_before : bool, optional

Apply normalization before each layer, by default False

Returns
Transformer

Transformer module
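A hedged sketch building a transformer with the signature defaults, which correspond to the original DETR-R50 configuration:

# values shown are the signature defaults
transformer = model.build_transformer(
    hidden_dim=256,
    dropout=0.1,
    nheads=8,
    dim_feedforward=2048,
    num_encoder_layers=6,
    num_decoder_layers=6,
)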

forward(frames, **kwargs)

Detr model forward

Parameters
frames : Frames

Images batched, of shape [batch_size x 3 x H x W] with a Mask: a binary mask of shape [batch_size x 1 x H x W], containing 1 on padded pixels

Returns
dict

It outputs a dict with the following elements:

  • pred_logits: The classification logits (including no-object) for all queries. Shape= [batch_size x num_queries x (num_classes + 1)]

  • pred_boxes: The normalized boxes coordinates for all queries, represented as (center_x, center_y, height, width). These values are normalized in [0, 1], relative to the size of each individual image (disregarding possible padding). See PostProcess for information on how to retrieve the unnormalized bounding box.

  • aux_outputs: Optional, only returned when auxiliary losses are activated. It is a list of dictionaries containing the two above keys for each decoder layer.

  • bb_outputs: Optional, only returned when backbone outputs are activated.

  • enc_outputs: Optional, only returned when transformer encoder outputs are activated.

  • dec_outputs: Optional, only returned when transformer decoder outputs are activated.
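A short sketch of reading the default output keys (shapes follow the description above):

m_outputs = model(frames)
pred_logits = m_outputs["pred_logits"]  # [batch_size x num_queries x (num_classes + 1)]
pred_boxes = m_outputs["pred_boxes"]    # [batch_size x num_queries x 4], normalized in [0, 1]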

forward_class_heads(transformer_outptus)

Forward from transformer decoder output into class_embed layer to get class predictions

Parameters
transformer_outptus : dict

Output of transformer layer

Returns
torch.Tensor

Output of shape [batch_size x num_queries x (num_classes + 1)]

forward_heads(transformer_outptus, bb_outputs=None, **kwargs)

Apply the Detr heads and build the final dictionary output.

Parameters
transformer_outptus : dict

Output of transformer layer

bb_outputs : torch.Tensor, optional

Backbone output to append to the output, by default None

Returns
dict

Output as described in the forward() function

forward_position_heads(transformer_outptus)

Forward from transformer decoder output into bbox_embed layer to get box predictions

Parameters
transformer_outptus : dict

Output of transformer layer

Returns
torch.Tensor

Output of shape [batch_size x num_queries x 4]

get_outs_filter(outs_scores=None, outs_labels=None, m_outputs=None, background_class=None, threshold=None, **kwargs)

Given the model outs_scores and outs_labels, this method returns a list of filters, one per output. If outs_scores and outs_labels are not provided, the method will rely on the model forward outputs (m_outputs) to extract them on its own.

Parameters
outs_scores : torch.Tensor, Optional

Output scores from forward(), by default None

outs_labels : torch.Tensor, Optional

Output labels from forward(), by default None

m_outputs : dict, Optional

Forward outputs, by default None

background_class : int, Optional

ID of the background class, used to filter classes, by default the background_class defined in the constructor

threshold : float, Optional

Threshold value to filter classes by score, by default None

Returns
filters : list

List of filters used to select the queries that predict an object.
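A hedged sketch combining this method with inference() (described below):

# filter the queries predicted as objects, then decode only those boxes
filters = model.get_outs_filter(m_outputs=m_outputs)
pred_boxes = model.inference(m_outputs, filters=filters)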

get_outs_labels(m_outputs)

This method returns the predicted label and score of each slot.

Parameters
m_outputs : dict

Model forward output

Returns
labels : torch.Tensor

Predicted class for each slot

scores : torch.Tensor

Predicted score for each slot

inference(forward_out, filters=None, background_class=None, threshold=None)

Given the model forward outputs, this method will return a BoundingBoxes2D tensor.

Parameters
forward_out : dict

Dict with the model forward outputs

filters : list, Optional

List of torch.Tensor filters indicating which predictions to select when creating the set of BoundingBoxes2D.

background_class : int, Optional

ID of the background class, by default the background_class defined in the constructor

threshold : float, Optional

Threshold value used to filter predictions by score, by default None

Returns
boxes : BoundingBoxes2D

Filtered boxes predicted from the forward outputs
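A hedged usage sketch; the threshold value is illustrative:

# keep only the predictions whose score exceeds 0.5
pred_boxes = model.inference(m_outputs, threshold=0.5)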

training: bool

Detr R50

DETR model that uses the parameters of the original DETR-R50 architecture.

class alonet.detr.detr_r50.DetrR50(*args, num_classes=91, background_class=91, **kwargs)

Bases: alonet.detr.detr.Detr

DETR R50 as described in the paper: https://arxiv.org/abs/2005.12872

Parameters
num_classes : int, optional

Number of neurons in the embed layer, by default 91

background_class : int, optional

Id used for the background class, by default 91

*args : Namespace

Positional arguments (see the Detr class)

**kwargs : dict

Additional parameters (see the Detr class)

training: bool
alonet.detr.detr_r50.main(image_path)

Detr R50 Finetune

Module to create a custom DetrR50 model that loads a given set of pretrained weights and replaces the class_embed layer in order to train on custom classes.

class alonet.detr.detr_r50_finetune.DetrR50Finetune(num_classes, background_class=None, base_weights='detr-r50', weights=None, *args, **kwargs)

Bases: alonet.detr.detr_r50.DetrR50

Premade helper class to finetune the DetrR50 model on custom classes.

Parameters
num_classes : int

Number of classes to use

background_class : int, optional

Background class id, by default the last id

base_weights : str, optional

DetrR50 weights, by default "detr-r50"

weights : str, optional

Load weights from a .pth or .ckpt file, by default None

*args : Namespace

Arguments used in the DetrR50 module

**kwargs : dict

Additional arguments used in the DetrR50 module
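A hedged instantiation sketch; the class count and checkpoint path are illustrative:

from alonet.detr import DetrR50Finetune

# start from COCO-pretrained DETR-R50 and replace the class_embed layer
model = DetrR50Finetune(num_classes=2, base_weights="detr-r50")

# or resume from a previously finetuned checkpoint (hypothetical path)
model = DetrR50Finetune(num_classes=2, weights="path/to/finetuned.ckpt")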

training: bool