Setting up an image object detector

This tutorial sets up an image object detector that distinguishes among 21 classes (the 20 PASCAL VOC object categories plus background). For every detected object, the detector returns a bounding box that encloses it, along with a label, e.g. person, car, … The detector relies on a deep neural network pre-trained on the PASCAL VOC detection task.


The detection service lets an application send images and receive, for each image, the set of detected bounding boxes in JSON format.

The following assumes that DeepDetect has been built and installed.

Getting the pre-trained model

cd deepdetect
mkdir -p models/voc0712
cd models/voc0712
# the pre-trained model archive voc0712_dd.tar.gz must be downloaded into this directory first
tar xvzf voc0712_dd.tar.gz

This unpacks the pre-trained model into the models/voc0712 directory.

Setting up the detector service

Let’s start the DeepDetect server:

cd deepdetect/build/main
./dede

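and create a service whose model repository points to the models/voc0712 directory prepared above. Adjust /path/to/deepdetect to match your setup:

curl -X PUT "http://localhost:8080/services/imageserv" -d '{
       "mllib":"caffe",
       "description":"object detection service",
       "type":"supervised",
       "parameters":{"input":{"connector":"image","height":300,"width":300},"mllib":{"nclasses":21}},
       "model":{"repository":"/path/to/deepdetect/models/voc0712/"}
     }'

The server should reply with a JSON status confirming that the service has been created.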

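An application can also create the service itself. Below is a minimal Python sketch that uses only the standard library. The helper names service_body and create_service are hypothetical. The body layout (mllib, type, parameters.input, parameters.mllib.nclasses, model.repository) is an assumption based on DeepDetect's service API, so check it against the API documentation for your version.

```python
import json
import urllib.request

# Assumed default address of the local DeepDetect server.
DD_URL = "http://localhost:8080"


def service_body(repository, width=300, height=300, nclasses=21):
    """Build a hypothetical service definition for a 300x300 image detector."""
    return {
        "mllib": "caffe",  # assumption: the voc0712 model is a Caffe model
        "description": "object detection service",
        "type": "supervised",
        "parameters": {
            # input size expected by the network; the image connector resizes to it
            "input": {"connector": "image", "width": width, "height": height},
            "mllib": {"nclasses": nclasses},
        },
        "model": {"repository": repository},
    }


def create_service(name, body):
    """PUT the service definition and return the server's decoded JSON reply."""
    request = urllib.request.Request(
        f"{DD_URL}/services/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx replies
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, create_service("imageserv", service_body("/path/to/deepdetect/models/voc0712/")) sends roughly the same request as the curl call above. Replace the placeholder path with your own.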

Testing object detection

We can now pass any image file path or URL to the object detector. Here is an example:

curl -X POST "http://localhost:8080/predict" -d '{
       "service":"imageserv",
       "parameters":{"output":{"bbox":true,"confidence_threshold":0.1}},
       "data":["/path/to/image.jpg"]
     }'

The server replies with a JSON object such as:

{"status": {"msg": "OK", "code": 200}, "body": {"predictions": [{"classes": [
  {"cat": "bird", "prob": 0.8333460688591003, "bbox": {"xmin": 67.03402709960938, "ymin": 414.25286865234375, "ymax": 64.85651397705078, "xmax": 354.663330078125}},
  {"cat": "person", "prob": 0.5956286191940308, "bbox": {"xmin": 75.99663543701172, "ymin": 475.9880676269531, "ymax": 66.72187805175781, "xmax": 363.94293212890625}},
  {"cat": "person", "prob": 0.2928898334503174, "bbox": {"xmin": 495.8335876464844, "ymin": 735.4041748046875, "ymax": 506.434326171875, "xmax": 652.080078125}},
  {"cat": "person", "prob": 0.24435117840766907, "bbox": {"xmin": 437.17041015625, "ymin": 540.1434936523438, "ymax": 111.70045471191406, "xmax": 633.19970703125}},
  {"cat": "bird", "prob": 0.16601955890655518, "bbox": {"xmin": 40.96523666381836, "ymin": 280.6235046386719, "ymax": 71.90843200683594, "xmax": 259.4865417480469}},
  {"cat": "person", "prob": 0.12583601474761963, "bbox": {"xmin": 358.8877868652344, "ymin": 763.8483276367188, "ymax": 532.8911743164062, "xmax": 491.5361022949219}},
  {"cat": "person", "last": true, "prob": 0.11492644995450974, "bbox": {"xmin": 213.4755096435547, "ymin": 793.69287109375, "ymax": 545.5011596679688, "xmax": 355.6097717285156}}
], "uri": ""}]}, "head": {"method": "/predict", "service": "imgserv", "time": 1903.0}}

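In the resulting JSON, each entry of classes describes one detected object with:

  • its bounding box, as a bbox JSON object with xmin, ymin, xmax and ymax coordinates
  • its estimated category, in cat
  • the confidence of the detection, as a probability in prob (the higher, the more confident)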
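Client code usually flattens this reply into a list of detections. The minimal Python sketch below reads the body.predictions[].classes layout seen in the sample reply. The helper names extract_detections and count_by_category are hypothetical, and the small sample reuses two entries from the reply above, listed in reverse order to show the sorting.

```python
from collections import Counter


def extract_detections(response):
    """Flatten a /predict reply into (uri, cat, prob, bbox) tuples, best first."""
    detections = []
    for prediction in response["body"]["predictions"]:
        uri = prediction.get("uri", "")
        for obj in prediction.get("classes", []):
            detections.append((uri, obj["cat"], obj["prob"], obj["bbox"]))
    # Most confident detections first.
    detections.sort(key=lambda d: d[2], reverse=True)
    return detections


def count_by_category(detections):
    """Number of detected objects per category."""
    return Counter(cat for _, cat, _, _ in detections)


# Two entries taken from the sample reply, in reverse order of confidence.
sample = {
    "status": {"msg": "OK", "code": 200},
    "body": {"predictions": [{"classes": [
        {"cat": "person", "prob": 0.5956286191940308,
         "bbox": {"xmin": 75.99663543701172, "ymin": 475.9880676269531,
                  "ymax": 66.72187805175781, "xmax": 363.94293212890625}},
        {"cat": "bird", "prob": 0.8333460688591003,
         "bbox": {"xmin": 67.03402709960938, "ymin": 414.25286865234375,
                  "ymax": 64.85651397705078, "xmax": 354.663330078125}},
    ], "uri": ""}]},
}
```

Calling extract_detections(sample) puts the bird first, and count_by_category on the result reports one bird and one person.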

Note that confidence_threshold filters out any prediction whose prob is strictly below the threshold.
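The sketch below shows both sides of this threshold in Python. It builds a /predict body carrying confidence_threshold, and it applies the same "strictly below is dropped" rule on the client, so you can tighten the cut-off without another request. The helper names are hypothetical. The body layout (service, parameters.output, data) is assumed from DeepDetect's predict API.

```python
import json


def predict_body(service, images, threshold=0.1):
    """Build a /predict body requesting bounding boxes above a confidence threshold."""
    return {
        "service": service,
        "parameters": {"output": {"bbox": True, "confidence_threshold": threshold}},
        "data": images,  # image file paths or URLs
    }


def apply_threshold(classes, threshold):
    """Keep detections whose prob is >= threshold, i.e. drop those strictly below it."""
    return [obj for obj in classes if obj["prob"] >= threshold]


# Serialized payload, ready to be POSTed to http://localhost:8080/predict.
payload = json.dumps(predict_body("imageserv", ["/path/to/image.jpg"]))
```

On the sample reply, apply_threshold(classes, 0.5) would keep only the bird (0.83) and the first person (0.60). A detection whose prob equals the threshold is kept.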

You can also look at the object detection Python script that generates the bounding boxes.
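To draw a box, you first need it as (left, top, right, bottom) with left <= right and top <= bottom. Drawing routines such as Pillow's ImageDraw.rectangle accept a box in this [x0, y0, x1, y1] form. In the sample reply, ymin is larger than ymax for every box, so the hypothetical helper below sorts each coordinate pair instead of trusting the field names. It is a minimal sketch, not the script mentioned above.

```python
def to_rectangle(bbox):
    """Convert a DeepDetect-style bbox dict to (left, top, right, bottom)."""
    # Sort each pair so the result is ordered whichever field holds the
    # smaller value (in the sample reply, ymin is larger than ymax).
    left, right = sorted((bbox["xmin"], bbox["xmax"]))
    top, bottom = sorted((bbox["ymin"], bbox["ymax"]))
    return (left, top, right, bottom)


def box_area(rect):
    """Area of a (left, top, right, bottom) rectangle."""
    left, top, right, bottom = rect
    return (right - left) * (bottom - top)
```

For example, to_rectangle({"xmin": 67.0, "ymin": 414.0, "ymax": 64.0, "xmax": 354.0}) returns (67.0, 64.0, 354.0, 414.0).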

