{ "cells": [ { "cell_type": "markdown", "source": [ "# Finetuning Deformanble DETR\n", "\n", "This tutorial explains how to use the classes in [Deformable DETR Finetune] module to train a custom model based on [Deformable DetrR50 architecture] for object detection application.\n", "\n", "
\n", "\n", "**Goals**\n", "\n", "1. Train a model based on [Deformable DetrR50 architecture] to predict faces, using [Mask Wearing Dataset].\n", "2. Use the trained model to make inferences.\n", "\n", "
\n", "\n", "
\n", "\n", "**Attention**\n", "\n", "For this tutorial, [Mask Wearing Dataset] is a requirement. We assume that the dataset was downloaded and saved in a folder. For the purposes of this tutorial, we will call such path as **/paht/mask/wearing**.\n", "\n", "
\n", "\n", "[Deformable DetrR50 architecture]: https://arxiv.org/abs/2010.04159\n", "[Mask Wearing Dataset]: https://public.roboflow.com/object-detection/mask-wearing\n", "[Deformable DETR Finetune]: ../alonet/deformable_models.rst#deformable-detr-r50-finetune" ], "metadata": {} }, { "cell_type": "markdown", "source": [ "## 1. Train Deformable DETR50 Finetune\n", "\n", "[Deformable DETR Finetune][Deformable DETR R50 Finetune] contains two classes that differ only in the use of **Iterative Bounding Box\n", "Refinement**:\n", "\n", "- [Deformable DETR R50 Finetune]\n", "- [Deformable DETR R50 Finetune with refinement]\n", "\n", "They are able to fix the number of classes of the last embedded layer to a desired value, **without change the weights in previous layers.**\n", "\n", "
\n", " \n", "**See also**\n", "\n", "* Deformable [DetrR50 architecture] for information about the Deformable DetrR50 architecture and Iterative Bounding Box Refinement improvements. \n", "* [Funetunig torch vision models](https://pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html)\n", "to learn more about finetuning. \n", "* [Models] to know all possible configurations of the model.\n", "\n", "
\n", "\n", "
\n", "\n", "**Note**\n", "\n", "This tutorial is based on [Deformable DETR R50 Finetune with refinement].\n", "However, the same steps can be used for [Deformable DETR R50 Finetune].\n", "\n", "
\n", "\n", "[Deformable DetrR50 architecture]: https://arxiv.org/abs/2010.04159\n", "[DetrR50 architecture]: https://arxiv.org/abs/2005.12872\n", "[Deformable DETR R50 Finetune]: ../alonet/deformable_models.rst#deformable-detr-r50-finetune\n", "[Deformable DETR R50 Finetune with refinement]: ../alonet/deformable_models.rst#deformable-detr-r50-finetune-with-refinement\n", "[Models]: ../alonet/deformable_models.rst" ], "metadata": {} }, { "cell_type": "markdown", "source": [ "For training purposes, it is usual in [Aloception] to define a model on [Pytorch lightning module]. With a finetune model, the architecture definition changes, but the training process remains static:\n", "\n", "[Pytorch lightning module]: https://pytorch-lightning.readthedocs.io/en/latest/common/lightning_module.html\n", "[Aloception]: ../index.rst" ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "from argparse import Namespace, ArgumentParser\n", "import alonet\n", "from alonet.detr import CocoDetection2Detr\n", "from alonet.deformable_detr import LitDeformableDetr, DeformableDetrR50RefinementFinetune \n", "\n", "# Build parser\n", "parser = ArgumentParser()\n", "parser = alonet.common.add_argparse_args(parser) # Common alonet parser\n", "args = parser.parse_args([])\n", "args.no_suffix = True # Fix run_id = expe_name\n", "\n", "# Setup database\n", "coco_loader = CocoDetection2Detr(\n", " name = \"people_mask\",\n", " train_folder = \"train\",\n", " train_ann = \"train/_annotations.coco.json\",\n", " val_folder = \"valid\",\n", " val_ann = \"valid/_annotations.coco.json\",\n", ")\n", "num_classes = len(coco_loader.CATEGORIES)\n", "\n", "# Architecture definition\n", "deformabe_finetune = DeformableDetrR50RefinementFinetune(\n", " num_classes = num_classes, \n", " weights = \"deformable-detr-r50-refinement\",\n", " activation_fn = \"softmax\"\n", ")\n", "lit_deformable = LitDeformableDetr(model = deformabe_finetune)\n", "\n", "# Start train loop\n", "args.max_epochs = 5 # Due to finetune, we just need 5 epochs to train this model\n", "args.save = True\n", "lit_deformable.run_train(\n", " data_loader = coco_loader,\n", " args = args,\n", " project = \"deformable_detr\",\n", " expe_name = \"people_mask\"\n", ")" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "
\n", "\n", "**Note**\n", "\n", "[CocoDetection2Detr] definition will deploy an user prompt task to know where was storaged [Mask Wearing Dataset]. Here, the user must input */paht/mask/wearing*. Check [How to setup your data] tutorial to know more about that.\n", "\n", "
\n", "\n", "Once the process has been completed, the \\$HOME/.aloception/project_run_id/run_id folder folder will be created\n", "with the different checkpoint files.\n", "\n", "
\n", "\n", "**Hint**\n", "\n", "Check the following links to get more about:\n", "\n", "- [Pytorch lightning data modules]\n", "- [Pytorch lightning module]\n", "- [How to setup your data]\n", "- [Train a Deformanble model]\n", "- [Train a DetrR50 finetune model]\n", "\n", "
\n", "\n", "[Mask Wearing Dataset]: https://public.roboflow.com/object-detection/mask-wearing\n", "[Pytorch lightning module]: https://pytorch-lightning.readthedocs.io/en/latest/common/lightning_module.html\n", "[CocoDetection2Detr]: ../alonet/detr_connectors.rst#cocodetection2detr\n", "[Pytorch lightning data modules]: https://pytorch-lightning.readthedocs.io/en/latest/extensions/datamodules.html\n", "[How to setup your data]: ./data_setup.rst\n", "[Train a Deformanble model]: training_deformable_detr.ipynb\n", "[Train a DetrR50 finetune model]: finetuning_detr.ipynb" ], "metadata": {} }, { "cell_type": "markdown", "source": [ "## 2. Make inferences\n", "\n", "In order to make some inferences on the dataset using the trained model, we need to load the weights. For that, we can use one function in [Alonet](../alonet/alonet.rst) for this purpose. Also, we need to keep in mind **the project and run id that we used in training process**:" ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "import torch\n", "from alonet.common import load_training \n", "\n", "device = torch.device(\"cuda:0\") if torch.cuda.is_available() else torch.device(\"cpu\")\n", "\n", "# Define the architecture\n", "detr_finetune = DeformableDetrR50RefinementFinetune(num_classes, activation_fn = \"softmax\")\n", "\n", "# Load weights according project_run_id and run_id\n", "args = Namespace(\n", " project_run_id = \"deformable_detr\",\n", " run_id = \"people_mask\",\n", ")\n", "lit_deformable = load_training(\n", " LitDeformableDetr, \n", " args = args, \n", " model = detr_finetune)\n", "lit_deformable.model.to(device)" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "This enables to use the valid dataset and show some results:" ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "frames = next(iter(coco_loader.val_dataloader()))\n", "frames = frames[0].batch_list(frames).to(device)\n", "pred_boxes = lit_deformable.inference(lit_deformable(frames))[0] # Inference from forward result\n", "gt_boxes = frames[0].boxes2d # Get ground truth boxes\n", "\n", "frames.get_view([\n", " gt_boxes.get_view(frames[0], title=\"Ground truth boxes\"),\n", " pred_boxes.get_view(frames[0], title=\"Predicted boxes\"),\n", "]).render()" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "
\n", "\n", "**What is next ?**\n", "\n", "Learn how to export **[DETR/Deformable DETR models to tensorRT]**.\n", "\n", "
\n", "\n", "[DETR/Deformable DETR models to tensorRT]: tensort_inference.rst" ], "metadata": {} }, { "cell_type": "markdown", "source": [ "## 3. Optional: Make prediction in camera\n", "\n", "If there is access to a local camera, the following code would allow you to take snapshots with the camera and make predictions at the same time:" ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "%matplotlib inline\n", "import cv2\n", "from aloscene import Frame\n", "\n", "def frame_process(frame): \n", " frame = Frame(frame[...,::-1].copy(), names = (\"H\", \"W\", \"C\"))\n", " frame = frame.transpose(0,2).transpose(1,2).to(device)\n", " frame = frame.batch_list([frame])\n", " return transform(frame)\n", "\n", "def transform(frame):\n", " return frame.norm_resnet()\n", "\n", "cap = cv2.VideoCapture(0)\n", "if cap.isOpened():\n", " ret, frame = cap.read()\n", " \n", " # Image preprocessing and make predictions\n", " frame = frame_process(frame).to(device)\n", " pred_boxes = lit_deformable.inference(lit_deformable(frame))\n", "\n", " # Show result \n", " frame.get_view([\n", " pred_boxes[0].get_view(frame[0], title=\"Predicted boxes\")\n", " ], size=(500,700)).render()\n", " \n", " # Close camera\n", " cap.release()\n", " cv2.destroyAllWindows()\n", "else:\n", " print(\"[ERROR] Impossible to open camera\")" ], "outputs": [], "metadata": {} } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.11" } }, "nbformat": 4, "nbformat_minor": 5 }