Read a Tamil magazine in English – automatic translation

Optical character recognition (OCR) is the technique of converting images of typed, printed, or handwritten text into machine-encoded text (plain text). Machine translation is the translation (conversion) of text from one language to another.

Today an article about had been published in a Tamil magazine. A friend asked me to translate it from Tamil into English. Instead of typing it all out myself, I wanted to use machine translation. For that, I first needed to get the Tamil text out of the magazine page and then translate it using Google Translate or Bing Translator.

There are two methods to get this done.

Method 1 – Using Google Docs

Here are the steps I followed:

  1. Scan the article.
  2. The page was laid out in two columns, so I cropped each column into a separate single-column image (Google Docs gets confused by two columns and merges the text from both into one long line).
  3. I joined all the individual images into one long, single-column file. I converted it to a PDF, but a long JPG file works too.
  4. Upload the PDF file to Google Drive.
  5. Right-click the file and choose “Open with Google Docs”.
  6. A new document opens containing the Tamil text. Copy the entire text.
  7. Go to translate.google.com, paste the text, and translate it into English.
  8. Copy the English text and paste it into a new file.
Steps 1, 2, 3 – Convert the two-column image into one long single-column image
Steps 4, 5, 6 – Upload the file to Google Drive, then open it with Google Docs
Steps 7, 8 – Copy the Tamil text, translate it using Google Translate, and then copy the English text
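I did the cropping and joining in steps 2 and 3 by hand. To illustrate what that step does, here is a minimal pure-Python sketch. It assumes the scanned page has already been loaded as a grid of grayscale pixel values (0 = ink, 255 = paper) and that a fully blank vertical strip (the gutter) separates the two columns. The function names `find_gutter` and `two_columns_to_one` are hypothetical and not part of Google Docs or any other tool mentioned here. Real scans are noisy, so the `white` threshold would need tuning, and loading an actual image into this form would require an imaging library.

```python
from typing import List

# Hypothetical page representation: rows of grayscale pixels,
# 0 = black ink, 255 = white paper.
Page = List[List[int]]


def find_gutter(page: Page, white: int = 250) -> int:
    """Return the x index of the all-blank pixel column nearest the page centre.

    Assumes the two text columns are separated by a vertical strip of
    blank pixels. Raises ValueError if no fully blank column exists.
    """
    width = len(page[0])
    centre = width // 2
    blank = [x for x in range(width) if all(row[x] >= white for row in page)]
    if not blank:
        raise ValueError("no blank vertical gutter found")
    # Page margins are blank too, so choose the blank column closest to the centre.
    return min(blank, key=lambda x: abs(x - centre))


def two_columns_to_one(page: Page, white: int = 250) -> Page:
    """Split the page at the gutter and stack the right column below the left one."""
    gutter = find_gutter(page, white)
    left = [row[:gutter] for row in page]
    right = [row[gutter + 1:] for row in page]

    # Pad the narrower column with white pixels so every output row has the same width.
    width = max(len(left[0]), len(right[0]))

    def pad(rows: Page) -> Page:
        return [r + [255] * (width - len(r)) for r in rows]

    return pad(left) + pad(right)
```

For example, the 2×5 page `[[0, 0, 255, 0, 0], [0, 255, 255, 255, 0]]` has a blank middle pixel column, so `two_columns_to_one` returns a 4×2 page with the two left-column rows first and the two right-column rows below them. The resulting single-column image is the kind of file that would then be uploaded in step 4.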

Method 2 – Using Google Translate app

If I just wanted to read the Tamil text in English, a simpler method would be to point the camera in the Google Translate app at the Tamil text and press Translate!
