Tesseract Character Recognition Without Training a Model

Using Tesseract to recognize characters without training our own models

Marshal SHI
10 min read · Apr 12, 2022
Photo by Markus Spiske on Unsplash

Background

Today, thanks to vast improvements in machine learning, extracting and recognizing characters from images is much simpler than before. Well-developed deep learning architectures such as CNNs and LSTMs do most of the work. Before these sophisticated algorithms arrived, one had to use template matching, comparing every character image against a set of predefined templates. Template matching requires a well-defined, tightly cropped character image. However, cropping images consistently enough to match the templates was difficult.

Thus, finding a good algorithm for cropping characters, and for preprocessing images so they met the templates' requirements, was time-consuming.
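To make the older approach concrete, here is a minimal sketch of template matching in Python with NumPy. It assumes a binarized image (1 = ink) that contains a single character. The 5×3 templates, the function names, and the choice of nearest-neighbour resizing and normalized cross-correlation are illustrative assumptions, not a method taken from this article. The point is to show why the crop matters: the character has to be cropped and resized to the template's exact shape before it can be compared at all.

```python
import numpy as np

# Hypothetical 5x3 binary templates (1 = ink). A real system would need one
# template per character, per font, at a fixed size.
TEMPLATES = {
    "I": np.array([[0, 1, 0],
                   [0, 1, 0],
                   [0, 1, 0],
                   [0, 1, 0],
                   [0, 1, 0]]),
    "L": np.array([[1, 0, 0],
                   [1, 0, 0],
                   [1, 0, 0],
                   [1, 0, 0],
                   [1, 1, 1]]),
    "T": np.array([[1, 1, 1],
                   [0, 1, 0],
                   [0, 1, 0],
                   [0, 1, 0],
                   [0, 1, 0]]),
}


def crop_to_ink(binary):
    """Crop a binary image to the tight bounding box of its ink pixels."""
    rows = np.any(binary, axis=1)
    cols = np.any(binary, axis=0)
    if not rows.any():
        raise ValueError("image contains no ink pixels")
    r0, r1 = np.flatnonzero(rows)[[0, -1]]
    c0, c1 = np.flatnonzero(cols)[[0, -1]]
    return binary[r0:r1 + 1, c0:c1 + 1]


def resize_nearest(img, shape):
    """Nearest-neighbour resize to (height, width) so the crop fits a template."""
    h, w = img.shape
    out_h, out_w = shape
    row_idx = np.arange(out_h) * h // out_h
    col_idx = np.arange(out_w) * w // out_w
    return img[np.ix_(row_idx, col_idx)]


def correlation(a, b):
    """Zero-mean normalized cross-correlation of two same-shaped images."""
    a = a.astype(float) - a.mean()
    b = b.astype(float) - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def recognize(binary, templates=TEMPLATES):
    """Crop, resize to the template size, and return the best-matching label."""
    shape = next(iter(templates.values())).shape
    char = resize_nearest(crop_to_ink(binary), shape)
    scores = {label: correlation(char, tpl) for label, tpl in templates.items()}
    return max(scores, key=scores.get)


# Usage: a 2x-scaled 'T' placed off-centre on a larger canvas.
canvas = np.zeros((14, 10), dtype=int)
canvas[2:12, 3:9] = np.kron(TEMPLATES["T"], np.ones((2, 2), dtype=int))
print(recognize(canvas))  # T
```

Even this toy version shows the fragility described above. Tight cropping stretches a narrow glyph such as "I" into a solid block, so the crop-and-resize step has to be tuned per character set before matching is reliable.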

Deep learning is one of the most powerful tools for image recognition, and many libraries offer pretrained deep learning models. For instance, YOLO is popular for object detection. But if we want YOLO to draw bounding boxes around characters for character recognition, we have to train our own model or fine-tune an existing one. In such cases, the most time-consuming parts are collecting the dataset and training the model itself.

