Optical character recognition (OCR) is the technique of converting typed, printed, or handwritten text into computer-encoded text (plain text). Machine translation is the translation (conversion) of text from one language to another.
Today an article was published in a Tamil magazine. A friend asked me to translate it from Tamil into English. Instead of typing it all out myself, I wanted to use machine translation. For that, I first needed to get the Tamil text out of the magazine page and then translate it using Google Translate or Bing Translator.
There are two methods to get this done.
Method 1 – Using Google Docs
These are the steps I followed:
- Scan the article.
- The page layout was in two columns, so I cropped each column into its own single-column file (Google Docs gets confused by two columns and combines the text across them as one long line).
- I joined all the individual files into one long single-column file. I converted it to a PDF, but it can be a long JPG file too.
- Upload the PDF file to Google Drive.
- Right-click the file and choose “Open with”, then “Google Docs”.
- Google Drive runs OCR on the file and opens a new document containing the Tamil text. Copy the entire text.
- Go to translate.google.com, paste the text, and translate it to English.
- Copy the English text and paste it into a new file.
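I did the cropping and joining by hand, but that part of the steps above could be scripted. Here is a minimal sketch in Python, assuming the Pillow library is installed and that the columns are of equal width with the gutter in the middle of the page (which may not hold for every scan). The function names and file names are my own, made up for illustration.

```python
from typing import List, Tuple

# (left, upper, right, lower), the box format Pillow's crop() expects
Box = Tuple[int, int, int, int]


def column_boxes(width: int, height: int, columns: int = 2) -> List[Box]:
    """Split a page of the given size into equal-width column boxes, left to right."""
    if columns < 1:
        raise ValueError("columns must be at least 1")
    # Integer arithmetic keeps the edges exact; the last column absorbs any remainder.
    edges = [i * width // columns for i in range(columns + 1)]
    return [(edges[i], 0, edges[i + 1], height) for i in range(columns)]


def stack_columns(page_paths: List[str], out_path: str, columns: int = 2) -> None:
    """Crop each scanned page into columns and stack them into one tall image."""
    from PIL import Image  # assumption: Pillow is installed (pip install Pillow)

    if not page_paths:
        raise ValueError("no pages given")

    strips = []
    for path in page_paths:
        page = Image.open(path).convert("RGB")
        for box in column_boxes(page.width, page.height, columns):
            strips.append(page.crop(box))

    # Paste the strips one below the other on a white sheet.
    sheet = Image.new(
        "RGB",
        (max(s.width for s in strips), sum(s.height for s in strips)),
        "white",
    )
    y = 0
    for strip in strips:
        sheet.paste(strip, (0, y))
        y += strip.height

    # Pillow picks the output format from the extension, e.g. .pdf or .jpg.
    sheet.save(out_path)
```

With scans named, say, page1.jpg and page2.jpg, calling stack_columns(["page1.jpg", "page2.jpg"], "article.pdf") would produce the single-column file to upload to Google Drive. For a long article, PDF is probably the safer choice, since JPG has a limit on image height.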
Method 2 – Using Google Translate app
A simpler method, if I just wanted to read the Tamil text in English, would be to point the camera in the Google Translate app at the Tamil text and press Translate!