November 25, 2020

Deep Dive Into Open Images Dataset: A Large Scale Visual Dataset With Annotations And Bounding Boxes By Google

Over the years, computer vision has evolved dramatically, from plain image classification to image segmentation and localization, where detected objects are enclosed in bounding boxes and given proper annotations and labels. It has since moved further, to converting visual content into speech and text. These advances have inspired many deep learning models built for prediction.

Developed by Google in collaboration with CMU and Cornell University, the Open Images Dataset has set a benchmark for visual recognition. Open Images contains nearly 9 million images with annotations and bounding boxes, image segmentation, relationships among objects and localized narratives. The dataset contains over 600 categories. As of version 6, annotations are generated with the Google Cloud Vision API. Humans then manually verify these automated labels, and the developers try to remove the false positives. Before this, annotations had been provided by professionals for consistency and accuracy. Many of the images contain complex visual scenes that carry multiple labels.

As of version 4, the training set available through the TensorFlow API contains 1.7M images annotated with 14.6M bounding boxes across 600 different classes. The validation set contains 41,620 images, and the test set contains 125,436 images.
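To put these figures in proportion, the short sketch below works out the average number of bounding boxes per training image (roughly 8.6) and the share of each split. It uses only the numbers quoted above; the training count is rounded, so the results are approximate, and the function names are illustrative.

```python
# Figures quoted above for version 4 (the training count is rounded to 1.7M).
TRAIN_IMAGES = 1_700_000
TRAIN_BOXES = 14_600_000
VALIDATION_IMAGES = 41_620
TEST_IMAGES = 125_436


def boxes_per_image(boxes: int, images: int) -> float:
    """Average number of bounding boxes per image."""
    return boxes / images


def split_shares(train: int, validation: int, test: int) -> dict:
    """Fraction of all images that falls into each split."""
    total = train + validation + test
    return {
        "train": train / total,
        "validation": validation / total,
        "test": test / total,
    }


if __name__ == "__main__":
    print(f"boxes per training image: {boxes_per_image(TRAIN_BOXES, TRAIN_IMAGES):.2f}")
    for name, share in split_shares(TRAIN_IMAGES, VALIDATION_IMAGES, TEST_IMAGES).items():
        print(f"{name}: {share:.1%}")
```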

Six versions of Open Images have been released so far.

V1 – Released in 2016. A pretrained Inception V2 model trained on the dataset was released alongside it. Annotations were generated using Google’s BigQuery. Later, an Inception V3 model was trained and fine-tuned for applications such as DeepDream.

V2 – Released in 2017. A ResNet 101 image classification model was trained on it. The update added 2M bounding boxes over 600 object classes and 4.3M manually verified labels on the training set. The Common Visual Data Foundation (CVDF) provided a data visualizer for this data.

V3 – Released in 2017. An Inception ResNet V2 object detection model was trained on it and made part of the TensorFlow Object Detection API. The dataset was updated to 3.7M bounding boxes and 9.7M positive image-level labels on the training set. It could be downloaded from the Common Visual Data Foundation (CVDF).

V4 – Released in 2018. Google AI held a competition on this version, with tracks for automatic object detection and visual relationships.

Download size: 565.11 GiB
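With a download of this size, it can be worth confirming that the target disk has enough free space before starting. The following is a minimal sketch using only Python's standard library; the helper name `has_room_for` and the default path are illustrative assumptions, not part of any Open Images tooling.

```python
import shutil

GIB = 1024 ** 3  # number of bytes in one gibibyte


def has_room_for(required_gib: float, path: str = ".") -> bool:
    """Return True if the filesystem holding `path` has at least `required_gib` GiB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gib * GIB


if __name__ == "__main__":
    # 565.11 GiB is the V4 download size quoted above.
    print("Enough free space for V4:", has_room_for(565.11))
```

Note that the prepared dataset may need additional space beyond the raw download, so treat the check as a lower bound.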

Code Snippet (With TensorFlow API)

import tensorflow_datasets as tfds
# Load the train and test splits of Open Images V4 (dataset name as registered in TensorFlow Datasets)
train, test = tfds.load('open_images_v4', split=['train', 'test'])

There are three variants: original pixels and quality; 200,000 pixels at 72 JPEG quality; and 300,000 pixels at 72 JPEG quality.
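In TensorFlow Datasets, a variant is normally selected by appending a config name to the dataset name. The sketch below assumes the three V4 variants are exposed as the configs `original`, `200k` and `300k`; check the TensorFlow Datasets catalog for the exact names before relying on them. The helper names `tfds_name` and `load_v4` are hypothetical.

```python
# Assumed TFDS config names for the three V4 variants described above.
V4_VARIANTS = ("original", "200k", "300k")


def tfds_name(dataset: str, variant: str) -> str:
    """Build a 'dataset/config' string of the form accepted by tfds.load."""
    if variant not in V4_VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    return f"{dataset}/{variant}"


def load_v4(variant: str = "200k"):
    """Load the train and test splits of one variant (this triggers a very large download)."""
    import tensorflow_datasets as tfds  # imported lazily; needs the tensorflow-datasets package

    return tfds.load(tfds_name("open_images_v4", variant), split=["train", "test"])


if __name__ == "__main__":
    # Only builds the name; call load_v4() explicitly to start the download.
    print(tfds_name("open_images_v4", "200k"))
```

Keeping the name construction separate from the download makes it possible to validate the variant without touching the network.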

V5 – Released in 2019, with 15.8M bounding boxes and 391k visual relationships. This version introduced image segmentation masks on 2.7M images across 350 categories. An Open Images Challenge for object detection was held with this version as well.

Download size: 535.63 GiB

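Code Snippet (With TensorFlow API)

import tensorflow_datasets as tfds
# Load the train and test splits of the 2019 challenge detection set (name as registered in TensorFlow Datasets)
train, test = tfds.load('open_images_challenge2019_detection', split=['train', 'test'])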

There are two variants: 200,000 pixels at 72 JPEG quality, and 300,000 pixels at 72 JPEG quality.
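A figure such as 200,000 pixels describes a budget on the total pixel count rather than a fixed width and height. As an illustration only, the sketch below computes a target size that fits such a budget while keeping the aspect ratio. The exact resizing rule used to build the dataset is an assumption here, and `fit_pixel_budget` is a hypothetical helper.

```python
import math
from typing import Tuple


def fit_pixel_budget(width: int, height: int, max_pixels: int = 200_000) -> Tuple[int, int]:
    """Scale (width, height) down so that width * height <= max_pixels, keeping the aspect ratio."""
    if width * height <= max_pixels:
        return width, height  # already within budget; never upscale
    scale = math.sqrt(max_pixels / (width * height))
    # Round down so the result cannot exceed the budget.
    new_width = max(1, math.floor(width * scale))
    new_height = max(1, math.floor(height * scale))
    return new_width, new_height


if __name__ == "__main__":
    for size in [(1024, 768), (400, 300), (4000, 3000)]:
        print(size, "->", fit_pixel_budget(*size))
```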

V6 – Released in 2020. This version introduced localized narratives for 500K images. Localized narratives were also provided for 123K images from the COCO dataset. The release added 23.5M new manually verified labels, bringing the total to 59.9M image-level labels in 20,000 categories.

Google maintains an official Open Images website with a visualizer, downloads, documentation, challenges, news and other related information.
