The main idea behind this approach is to keep everything modular and, therefore, simple and organized. This makes it relatively easy to apply any changes needed to improve compatibility.
Here is a basic diagram to better illustrate how things are organized:
As you can see, there are four main blocks. Let’s break them down:
This node only gets the image from the source and sends it through a ROS Topic
It’s a temporary node used for tests and demonstration
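As a rough illustration, a minimal version of this node could look like the sketch below. It assumes ROS 1 (rospy), OpenCV, and cv_bridge; the topic name and the capture device are hypothetical, not the project's actual interface.

```python
#!/usr/bin/env python3
# Hypothetical sketch of the image source node, assuming ROS 1 (rospy),
# OpenCV, and cv_bridge. Topic name and capture device are illustrative.
import cv2
import rospy
from cv_bridge import CvBridge
from sensor_msgs.msg import Image


def main():
    rospy.init_node("image_source")
    pub = rospy.Publisher("/camera/image_raw", Image, queue_size=1)
    bridge = CvBridge()
    cap = cv2.VideoCapture(0)  # any OpenCV-readable source works here
    rate = rospy.Rate(30)      # publish at roughly 30 Hz
    while not rospy.is_shutdown():
        ok, frame = cap.read()
        if ok:
            # Convert the OpenCV BGR frame to a sensor_msgs/Image and publish it
            pub.publish(bridge.cv2_to_imgmsg(frame, encoding="bgr8"))
        rate.sleep()


if __name__ == "__main__":
    main()
```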
This block consists of two sub-blocks: the Inference Manager and the Inference Solution.
Inference Manager: Gathers all the data required for inference (image, model, and model info) and sends it to the Inference Solution. The result from the Inference Solution is then published through multiple ROS Topics
Inference Solution: Receives all the data it needs and returns the inference result in a specific format (see the sketch after this list)
It’s implemented in PyTorch, but the modules can be written in any framework whose tensors are convertible to NumPy arrays; see Possible modification 1 (avoids unnecessary transformations)
Possible modification 1: Add a parameter for the model type (PyTorch, TorchScript, ONNX, OpenVINO, TensorRT, CoreML, TensorFlow SavedModel, TensorFlow GraphDef, TensorFlow Lite, TensorFlow Edge TPU, TensorFlow.js, PaddlePaddle, or a custom one) and load the model in a dedicated module (increases compatibility)
Temporary: The model must be in TorchScript
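To make the Inference Manager / Inference Solution split concrete, here is a minimal sketch. It assumes ROS 1, a TorchScript model (the temporary requirement above), a model that returns a single tensor, and hypothetical topic and parameter names.

```python
#!/usr/bin/env python3
# Hypothetical sketch of the Inference Manager / Inference Solution split,
# assuming ROS 1 and a TorchScript model. Topic names, the model path
# parameter, and the output message type are illustrative.
import rospy
import torch
from cv_bridge import CvBridge
from sensor_msgs.msg import Image
from std_msgs.msg import Float32MultiArray


class InferenceSolution:
    """Wraps the model: takes an image, returns results in a fixed format."""

    def __init__(self, model_path):
        # Temporary: TorchScript only. A model-type parameter
        # (Possible modification 1) would dispatch to other loaders here.
        self.model = torch.jit.load(model_path)
        self.model.eval()

    def infer(self, image):
        # image: HxWxC uint8 NumPy array -> 1xCxHxW float tensor in [0, 1]
        tensor = torch.from_numpy(image).permute(2, 0, 1).float().unsqueeze(0) / 255.0
        with torch.no_grad():
            return self.model(tensor)


class InferenceManager:
    """Collects the data required for inference and publishes the results."""

    def __init__(self):
        self.bridge = CvBridge()
        model_path = rospy.get_param("~model_path")  # hypothetical parameter
        self.solution = InferenceSolution(model_path)
        self.result_pub = rospy.Publisher(
            "/inference/results", Float32MultiArray, queue_size=1)
        rospy.Subscriber("/camera/image_raw", Image, self.on_image, queue_size=1)

    def on_image(self, msg):
        image = self.bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
        result = self.solution.infer(image)  # assumed to be a single tensor
        out = Float32MultiArray()
        out.data = result.flatten().tolist()
        self.result_pub.publish(out)


if __name__ == "__main__":
    rospy.init_node("inference_manager")
    InferenceManager()
    rospy.spin()
```

In practice the Manager would publish one topic per result type (e.g. boxes, classes, scores); a single flat array is used here only to keep the sketch short.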
This node only gets the inference results from each ROS Topic and merges them with the original image.
It’s a temporary node used for tests and demonstration
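A minimal sketch of this node, under the same assumptions as above (ROS 1 and the illustrative topic names); the overlay is a placeholder for drawing the real inference results on the frame.

```python
#!/usr/bin/env python3
# Hypothetical sketch of the merge node, assuming ROS 1 and the illustrative
# topic names used above. The overlay logic is a placeholder.
import cv2
import rospy
from cv_bridge import CvBridge
from sensor_msgs.msg import Image
from std_msgs.msg import Float32MultiArray


class MergeNode:
    def __init__(self):
        self.bridge = CvBridge()
        self.latest_result = None
        self.pub = rospy.Publisher("/inference/annotated", Image, queue_size=1)
        rospy.Subscriber("/inference/results", Float32MultiArray,
                         self.on_result, queue_size=1)
        rospy.Subscriber("/camera/image_raw", Image, self.on_image, queue_size=1)

    def on_result(self, msg):
        # Keep only the most recent result; a real node might match timestamps
        self.latest_result = msg.data

    def on_image(self, msg):
        frame = self.bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
        if self.latest_result is not None:
            # Placeholder overlay: show how many result values were received
            cv2.putText(frame, "results: %d" % len(self.latest_result),
                        (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
        self.pub.publish(self.bridge.cv2_to_imgmsg(frame, encoding="bgr8"))


if __name__ == "__main__":
    rospy.init_node("merge_node")
    MergeNode()
    rospy.spin()
```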