This article shows how to set up a REST API for an OCR system in five minutes.
- Goal: set up an API endpoint that accepts images and returns the position and characters of the text they contain
- Technology:
  - A deep neural object detector that locates text in images
  - A deep neural OCR model that reads detected text into a character string
For this, DeepDetect provides:
- A REST API for Deep Learning applications
- Pre-trained models that are free to use
- A simple way to chain models so that a single API call does all the work
DeepDetect setup
Let’s start a ready-to-use Docker image of the DeepDetect server, which is available with GPU or CPU-only support.
This is as easy as:
docker pull jolibrain/deepdetect_gpu
docker run -d -p 8080:8080 jolibrain/deepdetect_gpu
For the CPU-only version, simply replace _gpu with _cpu in the above calls.
To monitor the logs of the server, you can do:
docker ps
# get the docker id
docker logs -f <dockerid>
That is all it takes to run the DeepDetect Deep Learning REST API server.
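Because the container starts in the background, a script may want to wait until the server answers before loading models. The sketch below is one way to do that from Python, assuming the server listens on http://localhost:8080 as in the docker run call above and answers GET requests on /info. The helper names (fetch_info, wait_for_server) and the retry values are illustrative choices, not part of DeepDetect.

```python
import json
import time
import urllib.request


def fetch_info(base_url):
    """GET <base_url>/info and return the decoded JSON body."""
    with urllib.request.urlopen(f"{base_url}/info", timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_server(base_url="http://localhost:8080", retries=30, delay=1.0,
                    fetch=fetch_info):
    """Poll the server until it answers, or give up after `retries` attempts.

    `fetch` is injectable so the polling logic can be exercised without a
    running server.
    """
    for _ in range(retries):
        try:
            return fetch(base_url)
        except OSError:
            # Connection refused, timeouts and URL errors are all OSError
            # subclasses: the server is probably still starting up.
            time.sleep(delay)
    raise TimeoutError(f"no answer from {base_url}/info after {retries} attempts")
```

Calling wait_for_server() right after docker run blocks until the API is reachable, then returns the parsed /info document.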
Text detector setup
Loading the text detector deep model is as easy as:
curl -X PUT http://localhost:8080/services/word_detect -d '{
"description": "Word detection",
"model": {
"repository": "/opt/word_detect",
"create_repository": true,
"init":"https://deepdetect.com/models/init/desktop/images/detection/word_detect_v2.tar.gz"
},
"mllib": "caffe",
"type": "supervised",
"parameters": {
"input": {
"connector": "image"
}
}
}'
OCR model setup
Loading the pre-trained OCR model is as easy as executing the command below:
curl -X PUT http://localhost:8080/services/word_ocr -d '{
"description": "Word ocr",
"model": {
"repository": "/opt/multiword_ocr",
"create_repository": true,
"init":"https://deepdetect.com/models/init/desktop/images/ocr/multiword_ocr.tar.gz"
},
"mllib": "caffe",
"type": "supervised",
"parameters": {
"input": {
"connector": "image"
}
}
}'
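The two service definitions above differ only in their name, description, repository and model archive. As a minimal sketch, the same PUT calls could be issued from Python with the standard library. The function names (service_payload, create_service) are hypothetical helpers introduced for illustration, and the server address is assumed to be the one used in the curl commands.

```python
import json
import urllib.request


def service_payload(description, repository, init_url):
    """Build the JSON body used above to create an image service."""
    return {
        "description": description,
        "model": {
            "repository": repository,
            "create_repository": True,
            "init": init_url,
        },
        "mllib": "caffe",
        "type": "supervised",
        "parameters": {"input": {"connector": "image"}},
    }


def create_service(name, payload, base_url="http://localhost:8080"):
    """PUT the payload to /services/<name> and return the decoded JSON answer."""
    req = urllib.request.Request(
        f"{base_url}/services/{name}",
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Same values as in the two curl commands above.
SERVICES = {
    "word_detect": service_payload(
        "Word detection",
        "/opt/word_detect",
        "https://deepdetect.com/models/init/desktop/images/detection/word_detect_v2.tar.gz",
    ),
    "word_ocr": service_payload(
        "Word ocr",
        "/opt/multiword_ocr",
        "https://deepdetect.com/models/init/desktop/images/ocr/multiword_ocr.tar.gz",
    ),
}

# With a running server, both services could then be created with:
# for name, payload in SERVICES.items():
#     create_service(name, payload)
```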
Using the OCR REST API
The DeepDetect server is now ready to take queries. Let’s try it on the image below:
A typical API call looks like this:
{
"chain": {
"calls": [
{
"parameters": {
"input": {
"connector": "image",
"keep_orig": true
},
"mllib": {
"gpu": true
},
"output": {
"bbox": true,
"confidence_threshold": 0.25
}
},
"service": "word_detect",
"data": ["https://ggwash.org/images/made/images/posts/_resized/sign-share_800_600_90.jpg"]
},
{
"action": {
"parameters": {
"padding_ratio": 0.1
},
"type": "crop"
},
"id": "crop"
},
{
"parameters": {
"input": {
"connector": "image"
},
"mllib": {
"gpu": true
},
"output": {
"blank_label": 0,
"confidence_threshold": 0,
"ctc": true
}
},
"parent_id": "crop",
"service": "word_ocr"
}
],
"name": "ocr_api"
}
}
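The call above chains three steps: word detection, cropping of each detected box, and OCR on each crop. The sketch below builds the same body for an arbitrary image URL and sends it from Python. It rests on two assumptions not stated in the article: that the chain is posted to /chain/<name> on the server, and that the "gpu" flags should be set to false when running the _cpu image. build_chain_body and post_chain are hypothetical helper names.

```python
import json
import urllib.request


def build_chain_body(image_url, use_gpu=True):
    """Return the chain body shown above for a single image URL."""
    return {
        "chain": {
            "name": "ocr_api",
            "calls": [
                {   # Step 1: locate words and keep the original image for cropping.
                    "service": "word_detect",
                    "data": [image_url],
                    "parameters": {
                        "input": {"connector": "image", "keep_orig": True},
                        "mllib": {"gpu": use_gpu},
                        "output": {"bbox": True, "confidence_threshold": 0.25},
                    },
                },
                {   # Step 2: crop every detected box, with 10% padding.
                    "id": "crop",
                    "action": {"type": "crop", "parameters": {"padding_ratio": 0.1}},
                },
                {   # Step 3: read each crop with the OCR model.
                    "service": "word_ocr",
                    "parent_id": "crop",
                    "parameters": {
                        "input": {"connector": "image"},
                        "mllib": {"gpu": use_gpu},
                        "output": {"ctc": True, "blank_label": 0,
                                   "confidence_threshold": 0},
                    },
                },
            ],
        }
    }


def post_chain(body, base_url="http://localhost:8080"):
    """POST the body to /chain/<name> (assumed endpoint) and decode the answer."""
    name = body["chain"]["name"]
    req = urllib.request.Request(
        f"{base_url}/chain/{name}",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping the body construction separate from the HTTP call makes it easy to inspect or log the request before sending it.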
The JSON response contains the location and characters of each detected word:
{
"body": {
"predictions": [
{
"classes": [
{
"bbox": {
"xmax": 610.9924926757812,
"xmin": 566.7879638671875,
"ymax": 340.05706787109375,
"ymin": 316.2816162109375
},
"cat": "1",
"prob": 0.9992087483406067,
"word_ocr": {
"classes": [
{
"cat": "the",
"last": true,
"prob": 0.9995963592082262
}
]
}
},
{
"bbox": {
"xmax": 626.5315551757812,
"xmin": 551.6160888671875,
"ymax": 305.5302429199219,
"ymin": 278.2040100097656
},
"cat": "1",
"prob": 0.9990863800048828,
"word_ocr": {
"classes": [
{
"cat": "share",
"last": true,
"prob": 0.9981672442518175
}
]
}
},
{
"bbox": {
"xmax": 622.3469848632812,
"xmin": 557.6710815429688,
"ymax": 383.107177734375,
"ymin": 351.66339111328125
},
"cat": "1",
"last": true,
"prob": 0.9974628686904907,
"word_ocr": {
"classes": [
{
"cat": "road",
"last": true,
"prob": 0.9990777596831322
}
]
}
}
],
"uri": "https://ggwash.org/images/made/images/posts/_resized/sign-share_800_600_90.jpg"
}
]
},
"head": {
"method": "/chain",
"time": 4302.0
},
"status": {
"code": 200,
"msg": "OK"
}
}
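In this response, each detected box carries its OCR result under a key named after the OCR service ("word_ocr"). A minimal sketch for turning such a response into a list of words follows. It assumes the structure shown above, takes the first OCR class of each box as the recognized text, and orders words top to bottom by "ymin", which is a naive heuristic that only suits one word per line. extract_words and reading_order are illustrative names.

```python
def extract_words(response, ocr_service="word_ocr"):
    """Collect text, box and confidences for every detected word."""
    words = []
    for prediction in response["body"]["predictions"]:
        for detection in prediction["classes"]:
            ocr_classes = detection.get(ocr_service, {}).get("classes", [])
            if not ocr_classes:
                continue  # box detected, but no OCR output attached
            best = ocr_classes[0]
            words.append({
                "text": best["cat"],
                "bbox": detection["bbox"],
                "detection_prob": detection["prob"],
                "ocr_prob": best["prob"],
            })
    return words


def reading_order(words):
    """Sort words top to bottom, then left to right (naive heuristic)."""
    return sorted(words, key=lambda w: (w["bbox"]["ymin"], w["bbox"]["xmin"]))
```

Applied to the response above, " ".join(w["text"] for w in reading_order(extract_words(response))) gives "share the road".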
Simply replace https://ggwash.org/images/made/images/posts/_resized/sign-share_800_600_90.jpg with the URL of your choice.
All of this should have taken no more than five minutes!