Annotating Images and Creating Our Own Mask R-CNN Model

Shailja Tripathi
7 min read · Jul 14, 2020


People have always relied on precautions to survive in their environment, and, as we all know, the most recent tragedy for the world is the COVID-19 pandemic. During the ongoing pandemic, one of the best precautions is to wear a mask. However, people tend to follow rules only when someone enforces them, and it is very hard to monitor every single individual. Hence we need the support of technology.

In this project I trained my own model to detect masks on people's faces, built on top of one of the most advanced algorithms, Mask R-CNN. Mask R-CNN has no built-in ability to detect face masks, so I will use the power of Supervisely, which describes itself as the first available ecosystem to cover all aspects of training data development.

What is R-CNN?

Object detection is the process of finding and classifying objects in an image. One deep learning approach, regions with convolutional neural networks (R-CNN), combines rectangular region proposals with convolutional neural network features. R-CNN is a two-stage detection algorithm. The first stage identifies a subset of regions in an image that might contain an object. The second stage classifies the object in each region.
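To make the two-stage idea concrete, here is a toy sketch in Python. It is not R-CNN itself: stage 1 is a plain sliding window standing in for a real region-proposal method, and stage 2 is a mean-intensity threshold standing in for the CNN classifier. The function names (propose_regions, classify_region, two_stage_detect) are hypothetical and exist only for illustration.

```python
import numpy as np

def propose_regions(image, size=4, stride=2):
    """Stage 1 (toy stand-in for region proposals): slide a fixed-size
    window over the image and return every candidate box as (x0, y0, x1, y1)."""
    h, w = image.shape
    boxes = []
    for y0 in range(0, h - size + 1, stride):
        for x0 in range(0, w - size + 1, stride):
            boxes.append((x0, y0, x0 + size, y0 + size))
    return boxes

def classify_region(crop, threshold=0.5):
    """Stage 2 (toy stand-in for a CNN classifier): call the crop an
    'object' if its mean intensity is above the threshold."""
    return "object" if crop.mean() > threshold else "background"

def two_stage_detect(image):
    """Run stage 1, then classify each proposed region with stage 2."""
    detections = []
    for (x0, y0, x1, y1) in propose_regions(image):
        crop = image[y0:y1, x0:x1]
        label = classify_region(crop)
        if label == "object":
            detections.append(((x0, y0, x1, y1), label))
    return detections

# A dark 8x8 image with one bright 4x4 square in the lower-right corner.
img = np.zeros((8, 8))
img[4:8, 4:8] = 1.0
print(two_stage_detect(img))  # [((4, 4, 8, 8), 'object')]
```

Only the window that fully covers the bright square passes stage 2; windows that cover half of it have a mean of exactly 0.5 and are rejected. A real detector replaces both stand-ins with learned components, but the propose-then-classify structure is the same.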

Stages in Mask R-CNN

Mask R-CNN works in two stages:

  1. First, based on the input image, it generates proposals for regions that might contain an object.
  2. Second, based on the first-stage proposals, it predicts the class of each object, refines its bounding box, and generates a pixel-level mask of the object. Both stages are connected to the backbone network.
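The "refines the bounding box" part of stage 2 can be illustrated with the box-regression parameterization used across the R-CNN family: the network predicts offsets (dx, dy, dw, dh) that shift the proposal's center by a fraction of its size and scale its width and height in log space. The sketch below applies such offsets to one proposal; the delta values are made up for illustration, not taken from a trained model.

```python
import math

def apply_box_deltas(box, deltas):
    """Refine a proposal box (x0, y0, x1, y1) with regression deltas
    (dx, dy, dw, dh): center shift relative to box size, log-space scaling."""
    x0, y0, x1, y1 = box
    dx, dy, dw, dh = deltas
    w, h = x1 - x0, y1 - y0
    cx, cy = x0 + 0.5 * w, y0 + 0.5 * h
    # Shift the center by a fraction of the box width/height.
    new_cx, new_cy = cx + dx * w, cy + dy * h
    # Scale width/height through exp() so they always stay positive.
    new_w, new_h = w * math.exp(dw), h * math.exp(dh)
    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)

proposal = (10.0, 10.0, 50.0, 30.0)        # a stage-1 proposal
deltas = (0.1, 0.0, math.log(1.5), 0.0)    # hypothetical stage-2 output
print(apply_box_deltas(proposal, deltas))  # ≈ (4.0, 10.0, 64.0, 30.0)
```

Here the box moves right by 10% of its width and becomes 1.5 times wider, while its height is unchanged.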

Applications for R-CNN object detectors include:

  • Autonomous driving
  • Smart surveillance systems
  • Facial recognition

What is Mask R-CNN?

Mask R-CNN is a state-of-the-art approach to instance segmentation. It is a deep neural network designed to solve the instance segmentation problem in machine learning and computer vision. In other words, it can separate the different objects in an image or a video: given an image, it returns each object's bounding box, class, and mask.
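As a rough picture of what "boxes, classes and masks" look like in practice, the sketch below post-processes per-instance outputs: it keeps instances above a confidence threshold and turns each soft mask (per-pixel probabilities) into a binary mask. The array layout is an assumption, similar to what many Mask R-CNN implementations return, and filter_instances is a hypothetical helper, not part of Supervisely or any specific library.

```python
import numpy as np

def filter_instances(boxes, labels, scores, masks, score_thresh=0.5, mask_thresh=0.5):
    """Keep confident instances and binarize their soft masks.

    Assumed layout:
      boxes  (N, 4)    as x0, y0, x1, y1
      labels (N,)      integer class ids
      scores (N,)      confidence in [0, 1]
      masks  (N, H, W) per-pixel probabilities in [0, 1]
    """
    keep = scores >= score_thresh
    binary_masks = masks[keep] >= mask_thresh
    return [
        {"box": tuple(float(v) for v in b), "label": int(l), "score": float(s), "mask": m}
        for b, l, s, m in zip(boxes[keep], labels[keep], scores[keep], binary_masks)
    ]

# Two made-up instances on a 4x4 image; only the first is confident enough.
boxes = np.array([[0, 0, 2, 2], [1, 1, 4, 4]], dtype=float)
labels = np.array([1, 1])
scores = np.array([0.9, 0.3])
masks = np.zeros((2, 4, 4))
masks[0, 0:2, 0:2] = 0.8
masks[1, 1:4, 1:4] = 0.6

result = filter_instances(boxes, labels, scores, masks)
print(len(result), result[0]["box"], int(result[0]["mask"].sum()))
```

Each kept instance carries all three outputs together, which is what distinguishes instance segmentation from plain object detection.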

What is Supervisely?

Supervisely is a web platform that provides everything we need to build deep learning solutions within a single environment. It covers the entire R&D lifecycle for computer vision, taking us from image annotation to neural network training, which the platform claims is up to 10x faster.

Benefits of using Supervisely:

  1. Organize image annotation / data management / manipulation within a single platform at scale.
  2. Integrate custom NNs or use pretrained models from the Model Zoo, and perform, track, and reproduce many experiments.
  3. Use data science workflows out of the box: upload new data and continuously improve the accuracy of your neural networks.
  4. Combine different neural networks together into a single pipeline with post-processing stages and deploy these pipelines as API.
  5. Utilize NNs to speed up image annotation process: the platform has trainable SmartTool, supports Active Learning and Human in the Loop.

Prerequisites:

1. An account on Supervisely.

2. A dataset for training the model. You can get one from Kaggle.

3. An account on AWS.

Starting with the project

First, log in to Supervisely.

This is the dataset that I have downloaded from Kaggle.

Go to one of your teams and you will see a setup like this:

Now go to the Projects section.

Upload the dataset folder.

Click on the uploaded image dataset and annotate each image by selecting the region where a mask is visible. This must be done for every image. This process is called image annotation, and it is how the model learns which object we want it to detect. For this we will use the polygon tool.
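Each polygon drawn here ultimately becomes a pixel mask that Mask R-CNN learns to reproduce. The sketch below rasterizes one polygon into a boolean mask by testing every pixel center with the even-odd (ray-casting) rule. It assumes the annotation stores the polygon's outer boundary as a list of [x, y] points under points.exterior, as in Supervisely's JSON annotation format; check this against your own export. The class name "mask" is hypothetical, and polygon holes (interior) are ignored here.

```python
import numpy as np

def polygon_to_mask(exterior, height, width):
    """Rasterize a polygon given as [[x, y], ...] into a (height, width)
    boolean mask, testing each pixel center with the even-odd rule."""
    ys, xs = np.mgrid[0:height, 0:width]
    px = xs + 0.5  # pixel-center x coordinates
    py = ys + 0.5  # pixel-center y coordinates
    inside = np.zeros((height, width), dtype=bool)
    n = len(exterior)
    for i in range(n):
        x0, y0 = exterior[i]
        x1, y1 = exterior[(i + 1) % n]
        # Does this edge cross the horizontal line through each pixel center?
        crosses = (y0 > py) != (y1 > py)
        # x where the edge meets that line; only used where `crosses` is True,
        # so division by zero on horizontal edges is harmless and silenced.
        with np.errstate(divide="ignore", invalid="ignore"):
            x_at = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
        # Each crossing to the right of the pixel toggles inside/outside.
        inside ^= crosses & (px < x_at)
    return inside

# A minimal annotation in the assumed Supervisely-style layout.
ann = {
    "size": {"height": 4, "width": 4},
    "objects": [{
        "classTitle": "mask",  # hypothetical class name
        "geometryType": "polygon",
        "points": {"exterior": [[1, 1], [3, 1], [3, 3], [1, 3]], "interior": []},
    }],
}
obj = ann["objects"][0]
mask = polygon_to_mask(obj["points"]["exterior"], ann["size"]["height"], ann["size"]["width"])
print(mask.astype(int))
```

The square from (1, 1) to (3, 3) covers the four pixels whose centers lie at 1.5 and 2.5 on both axes.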

Click the options button on the dataset, choose Run DTL, and then select From scratch. After this, a new dataset will appear.
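The article does not show what the DTL job does. In Supervisely's training workflow a DTL step was commonly used to tag images as train or val before training; treat that as an assumption here. The sketch below shows only the logic of such a random split in plain Python. split_train_val is a hypothetical helper, not Supervisely's DTL syntax.

```python
import random

def split_train_val(image_names, val_fraction=0.2, seed=0):
    """Randomly assign each image a 'train' or 'val' tag.

    A fixed seed keeps the split reproducible between runs."""
    names = sorted(image_names)
    rng = random.Random(seed)
    rng.shuffle(names)
    # Keep at least one validation image whenever there is any data.
    n_val = max(1, round(len(names) * val_fraction)) if names else 0
    val = set(names[:n_val])
    return {name: ("val" if name in val else "train") for name in names}

tags = split_train_val([f"img_{i}.jpg" for i in range(10)])
print(sum(tag == "val" for tag in tags.values()), "val images")
```

Holding out a validation subset lets you check the trained model on images it never saw during training.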

After this we have to train the model. Supervisely provides its own agent, a service that runs the training for us, but we have to supply the compute resources, so we need to give the agent a machine. To do this, go to the Cluster tab and add an agent.

To get those resources we can use AWS. First, we have to request a limit increase: go to the Limits page in the Mumbai region.

Then select Running P2 Dedicated Hosts.

Then click Request limit increase.

Select the following configuration.

Within one or two days you will get a confirmation email about the limit increase request. After that, launch the instance.

Launch a new instance.

Search for "deep learning" and choose the highlighted AMI.

From the Instance Type tab, choose p2.xlarge.
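The console steps above can also be scripted. The sketch below assumes boto3 is installed and AWS credentials are configured; it uses the EC2 RunInstances call to start one p2.xlarge instance in the Mumbai region (ap-south-1). The AMI ID and key name are placeholders: look up the Deep Learning AMI ID for your region yourself. The helper names are hypothetical.

```python
def build_launch_params(ami_id, key_name, instance_type="p2.xlarge"):
    """Collect the arguments for EC2 RunInstances (one on-demand instance)."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "KeyName": key_name,
        "MinCount": 1,
        "MaxCount": 1,
    }

def launch_instance(params, region="ap-south-1"):
    """Launch the instance and return its ID.

    Requires boto3 and configured AWS credentials; launching incurs charges."""
    import boto3  # imported here so building params works without boto3
    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**params)
    return response["Instances"][0]["InstanceId"]

params = build_launch_params("ami-xxxxxxxx", "my-key")  # placeholder values
print(params)
# instance_id = launch_instance(params)  # uncomment to actually launch
```

The launch still fails if the limit increase requested earlier has not been approved yet.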

After the instance launches, open a terminal (CMD on Windows), navigate to the folder where your key file is stored, and run the ssh command shown in the connection pop-up.
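For reference, the sketch below builds the same kind of ssh command in Python. One common stumbling block is that ssh refuses a private key that other users can read, so the key is first made owner-read-only (the equivalent of chmod 400; on Windows, os.chmod only toggles the read-only flag). The default user "ubuntu" assumes an Ubuntu-based Deep Learning AMI; Amazon Linux images use "ec2-user" instead. The host and key file names are placeholders.

```python
import os
import stat
import subprocess

def prepare_key(key_path):
    """Make the private key readable only by its owner (like chmod 400)."""
    os.chmod(key_path, stat.S_IRUSR)

def build_ssh_command(key_path, host, user="ubuntu"):
    """Return the ssh invocation as an argument list."""
    return ["ssh", "-i", key_path, f"{user}@{host}"]

def connect(key_path, host, user="ubuntu"):
    """Fix key permissions, then open an interactive ssh session."""
    prepare_key(key_path)
    subprocess.run(build_ssh_command(key_path, host, user), check=False)

print(" ".join(build_ssh_command("my-key.pem", "<public-dns-of-instance>")))
# connect("my-key.pem", "<public-dns-of-instance>")  # uncomment to connect
```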

Get root permission to run the agent code. Then go to Supervisely, copy the command it gives you for running the agent, paste it into the terminal, and run it.

Go back to Supervisely, open the Neural Networks tab, select Mask-RCNN, press Train, choose the training set we annotated, and press Run. This trains the model on our annotated images and produces the final model.

We can then view the test images. Clicking on one shows that the masks in the images are detected with precise pixel-level outlines.
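The article judges precision visually. A common way to put a number on it is mask intersection over union (IoU) between a predicted mask and the annotated ground-truth mask: 1.0 means a perfect pixel match. The sketch below is a minimal NumPy version with made-up masks; it is not a Supervisely feature.

```python
import numpy as np

def mask_iou(pred, truth):
    """Intersection over union of two boolean masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as a perfect match
    return float(np.logical_and(pred, truth).sum() / union)

truth = np.zeros((4, 4), dtype=bool)
truth[1:3, 1:3] = True  # annotated 2x2 region
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:4] = True   # prediction spills one column to the right
print(round(mask_iou(pred, truth), 3))  # 0.667
```

The prediction covers all four ground-truth pixels plus two extra ones, so IoU is 4 / 6.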

The complete machine learning training workflow.

This model can help many companies and governments make their CCTV cameras AI-enabled, so that they can detect whether people are wearing masks.

Thanks for reading!
