Models¶

Panoptic Head is a pytorch module that implements a network to connect with the output of a Detr-based model. This new module is able to predict a segmentation features, represented by a binary mask for each object predicted by Detr model.

Block diagram of panoptic head model, taken from End-to-End Object Detection with Transformers paper¶

Basic usage¶

Given that detr_panoptic implements the Panoptic Head for a Detr-based models, first the module have to implement and be passed as PanopticHead parameter:

from alonet.detr import DetrR50
from alonet.detr_panoptic import PanopticHead

detr_model = DetrR50()
model = PanopticHead(DETR_module=detr_model)

If you want to finetune from the model pretrained on COCO dataset, a DetrFinetune models must be used:

from alonet.detr import DetrR50Finetune
from alonet.detr_panoptic import PanopticHead

detr_model = DetrR50Finetune(num_classes=250)
model = PanopticHead(DETR_module=detr_model)

To run an inference:

from aloscene import Frame
device = model.device # supposed that `model` is already defined as above

# read image and preprocess image with Resnet normalization
frame = Frame(IMAGE_PATH).norm_resnet()
# create a batch from a list of images
frames = aloscene.Frame.batch_list([frame])
frames = frames.to(device)

# forward pass
m_outputs = model(frames)
# get boxes and MASK as aloscene.BoundingBoxes2D and aloscene.Mask from forward outputs
pred_boxes, pred_masks = model.inference(m_outputs)

# Display the predicted boxes
frame.append_boxes2d(pred_boxes[0], "pred_boxes")
frame.append_segmentation(pred_masks[0], "pred_masks")
frame.get_view([frame.boxes2d, frame.segmentation]).render()

Important

PanopticHead network is able to predict the segmentation masks, follow by each box predicted for the Detr-based models. Is for this reason that inference function return a new output: pred_masks.

Panoptic head API¶

Panoptic module to use in object detection/segmentation tasks.

class alonet.detr_panoptic.detr_panoptic.PanopticHead(DETR_module, freeze_detr=True, aux_loss=None, device=device(type='cpu'), weights=None, strict_load_weights=True)¶

Bases: torch.nn.modules.module.Module

Pytorch head module to predict segmentation masks from previous boxes detection task.

Parameters

DETR_modulealonet.detr.detr: Object detection module based on DETR architecture
freeze_detrbool, optional: Freeze DETR_module weights in train procedure, by default True
aux_loss: bool, optional: Return aux outputs in forward step (if required), by default use DETR_module.aux_loss attribute value
devicetorch.device, optional: Configure module in CPU or GPU, by default torch.device("cpu")
weightsstr, optional: Load weights from name project, by default None
strict_load_weightsbool: Load the weights (if any given) with strict = True (by default).

INPUT_MEAN_STD = ((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))¶

forward(frames, get_filter_fn=None, **kwargs)¶

PanopticHead forward, that joint to the previous boxes predictions the new masks feature.

Parameters

framesFrames: Input frame to network
get_filter_fnCallable: Function that returns two parameters: the dec_outputs tensor filtered by a boolean mask per batch. It is expected that the function will at least receive frames and m_outputs parameters as input. By default the function used to this purpuse is get_outs_filter() from based model.

Returns

dict

It outputs a dict with the following elements:

pred_logits: The classification logits (including no-object) for all queries. Shape = [batch_size x num_queries x (num_classes + 1)]
pred_boxes: The normalized boxes coordinates for all queries, represented as (center_x, center_y, height, width). These values are normalized in [0, 1], relative to the size of each individual image (disregarding possible padding). See PostProcess for information on how to retrieve the unnormalized bounding box.
pred_masks: Binary masks, each one to assign to predicted boxes. Shape = [batch_size x num_queries x H // 4 x W // 4]
bb_outputs: Backbone outputs, requered in this forward
enc_outputs: Transformer encoder outputs, requered on this forward
dec_outputs: Transformer decoder outputs, requered on this forward
pred_masks_info: Parameters to use in inference procedure
aux_outputs: Optional, only returned when auxilary losses are activated. It is a list of dictionnaries containing the two above keys for each decoder layer.

inference(forward_out, maskth=0.5, filters=None, **kwargs)¶

Given the model forward outputs, this method will return a set of BoundingBoxes2D and Mask, with its corresponding Labels per object detected.

Parameters

forward_outdict: Dict with the model forward outputs
maskthfloat, optional: Threshold value to binarize the masks, by default 0.5
filterslist, optional: List of filter to select the query predicting an object, by default None

Returns

BoundingBoxes2D: Boxes from DETR model
Mask: Binary masks from PanopticHead, one for each box.

training: bool¶

alonet.detr_panoptic.detr_panoptic.main(image_path)¶

DetrR50 Panoptic Finetune¶

Module to create a custom PanopticHead model using DetrR50 as based model, which allows to upload a decided pretrained weights and change the number of outputs in class_embed layer, in order to train custom classes.

class alonet.detr_panoptic.detr_r50_panoptic_finetune.DetrR50PanopticFinetune(num_classes, background_class=None, base_model=None, base_weights='detr-r50-panoptic', freeze_detr=False, weights=None, *args, **kwargs)¶

Bases: alonet.detr_panoptic.detr_panoptic.PanopticHead

Pre made helpfull class to finetune the DetrR50 and use a pretrained PanopticHead.

Parameters

num_classesint: Number of classes in the class_embed output layer
background_classint, optional: Background class, by default None
base_modeltorch.nn, optional: Base model to couple PanopticHead, by default DetrR50
base_weightsstr, optional: Load weights from original DetrR50 + PanopticHead, by default “detr-r50-panoptic”
freeze_detrbool, optional: Freeze DetrR50 weights, by default False
weightsstr, optional: Weights for finetune model, by default None

Raises

ValueError: weights must be a ‘.pth’ or ‘.ckpt’ file

training: bool¶