Week 09

Meeting date: 13-04-2023 (10h30)

Summary

This week I tried to find the YOLOv5 paper but, as mentioned by one of the authors in ‣, it isn’t available yet; there is only a DOI for citing the official repository → https://doi.org/10.5281/zenodo.7347926. Since I had no luck with YOLOv5, I decided to reread the papers on multi-task neural networks that I read in ADPE, and I looked for online resources (easier to understand) and more papers (to avoid relying on arXiv) to gain the knowledge needed to write the state-of-the-art (SOTA) section and analyze YOLOv5.

To get the diagrams for both yolov5-detection and yolov5-segmentation, I did a quick Google search and found the following:

Detection

The-network-architecture-of-Yolov5-It-consists-of-three-parts-1-Backbone-CSPDarknet.jpg

Segmentation (adaptation; I didn’t find much information on segmentation)

From a quick look, the detection model has only one block at the end (“Head: Yolo Layer”) while the segmentation model has two (“Detect” and “Seg-Lab v3+”), which almost led me to conclude that segmentation is multi-tasked and detection isn’t.

In one of the papers, there is the definition of task:

https://doi.org/10.1007/s11042-018-6463-x “A task is generally referred to the learning of an output target using a single input source. If the input source consists of a single variable (or feature), we will have a univariate analysis, if the input source consists of multiple variables (or features), we will have a multivariate analysis. In this sense, “multiple tasks” could mean the learning of multiple output targets using a single input source, or the learning of single output target using multiple input sources, or a mixture of both. Depending on the definition of “multiple tasks”, the multi-task learning (MTL) could have different objective functions, as we will demonstrate in the following subsections.”

I may be completely wrong, but I'm beginning to think that YOLO networks are multi-tasked: in YOLOv5 detection there is only one input (the image) and there are 3 outputs (classes, boxes and objectness), all inside the “Head: Yolo Layer” box. To “better” visualize the models, I created a simple Python script that generates model graphs through TensorBoard.

Detection

Segmentation

In the upper region of each graph there is a structure that resembles the multi-headed structure of a multi-tasked network.
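For reference, a script of this kind could look like the minimal sketch below. It is not the script used for the graphs above: `TwoHeadNet`, its layer sizes and the temporary log directory are hypothetical stand-ins (the real YOLOv5 models would be loaded instead), and it assumes PyTorch and TensorBoard are installed. It only shows how a network with a shared backbone and two heads is exported with `SummaryWriter.add_graph`.

```python
import os
import tempfile

import torch
from torch import nn
from torch.utils.tensorboard import SummaryWriter


class TwoHeadNet(nn.Module):
    """Hypothetical toy network: one shared backbone, two output heads."""

    def __init__(self) -> None:
        super().__init__()
        # Shared feature extractor (stand-in for a real backbone).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # Two task-specific heads that read the same features.
        self.det_head = nn.Conv2d(8, 6, kernel_size=1)
        self.seg_head = nn.Conv2d(8, 1, kernel_size=1)

    def forward(self, x):
        features = self.backbone(x)
        return self.det_head(features), self.seg_head(features)


model = TwoHeadNet().eval()
dummy_image = torch.zeros(1, 3, 64, 64)  # one input source: a single image

# Trace the model once and write its graph to a TensorBoard log directory.
log_dir = tempfile.mkdtemp(prefix="model_graph_")
writer = SummaryWriter(log_dir)
writer.add_graph(model, dummy_image)
writer.close()

det_out, seg_out = model(dummy_image)
event_files = os.listdir(log_dir)
```

Running `tensorboard --logdir` on the printed or chosen log directory then shows the graph, where the two heads appear as separate branches leaving the shared backbone.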

In addition to that, I found an example of a practical application of a multi-tasked network in PyTorch here, where the total loss is obtained from the sum of the losses of all tasks. YOLOv5 takes a similar approach:

names of variables

sum of losses
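The idea of summing the per-task losses can be sketched in a few lines. This is only an illustration, not YOLOv5's code: the task names `box`, `obj` and `cls` mirror the three outputs mentioned earlier, while `total_loss`, the loss values and the gains are hypothetical.

```python
def total_loss(task_losses, task_gains):
    """Return the weighted sum of the per-task losses."""
    return sum(task_gains[name] * value for name, value in task_losses.items())


# Hypothetical per-task loss values and gains (weights), for illustration only.
task_losses = {"box": 0.5, "obj": 0.25, "cls": 0.125}
task_gains = {"box": 2.0, "obj": 1.0, "cls": 4.0}

loss = total_loss(task_losses, task_gains)
print(loss)  # 1.75
```

Because a single scalar is backpropagated, the shared layers receive gradients from every task, which is what ties the tasks together during training.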

This week I also started creating some sections of the dissertation, and I measured the FPS of a few more models for Pedro Azevedo. On the software side, I did not make any further progress.