I was invited to give a guest talk at the annual Tamil Internet Conference 2019, which is underway at Anna University and is organized by INFITT along with Tamil Virtual Academy and others. I presented a talk with demos, aptly titled “செயல் விளக்கம்” (Demonstration), on tools for writing and coding in Tamil.
I demonstrated the following:
- Google Voice Typing in Tamil on your PC – கூகுளின் குரல்வழித் தமிழில் உள்ளிடல் வசதி.
- How to use Tesseract, the open-source OCR engine, to convert scanned Tamil pages to PDF with embedded Tamil text, so that searching in Tamil and copying and pasting text both work. I will write a post on this later. The steps are: take a picture of the page with Tamil text using the Office Lens or CamScanner app, download and install Tesseract (including its Tamil language data), and then run the following command:
tesseract.exe -l tam tamilpage1.jpg tamilpageoutput pdf
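The command above handles a single page. For a folder of scanned pages, the same call can be scripted. Here is a minimal Python sketch, assuming Tesseract is on your PATH with the Tamil language data installed and that the pages are `.jpg` files. The function names `build_tesseract_command` and `ocr_pages` are my own, made up for illustration, and are not part of Tesseract.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="tam", exe="tesseract"):
    """Return the argument list for one page, in the same order as the command above."""
    return [exe, "-l", lang, str(image_path), str(output_base), "pdf"]


def ocr_pages(folder, exe="tesseract"):
    """Run Tesseract on every .jpg in `folder` and return the expected PDF paths."""
    # Assumption: the executable is reachable through PATH.
    if shutil.which(exe) is None:
        raise FileNotFoundError(f"Could not find '{exe}' on PATH")

    pdf_paths = []
    for image in sorted(Path(folder).glob("*.jpg")):
        # Tesseract appends ".pdf" to the output base name on its own.
        output_base = image.with_suffix("")
        subprocess.run(build_tesseract_command(image, output_base, exe=exe), check=True)
        pdf_paths.append(image.with_suffix(".pdf"))
    return pdf_paths
```

Calling `ocr_pages("scans")` would then write one searchable PDF next to each image in a (hypothetical) `scans` folder. Building the argument list in a separate function keeps the command easy to inspect before anything is run.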
- To improve the scanning, I talked about a project by Kaniyam Foundation, who have built a low-cost scan box here. On that page, they also explain how to use ScanTailor to improve the quality of scanned images.
- How you can use the Grapheme package in Python to handle Unicode strings. After my talk, TShrinivasan told me about another Python package, Open-Tamil, that offers a lot more.
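# Install first with: pip install grapheme
# Reference: https://github.com/alvinlindstam/grapheme
import grapheme

தமிழ்வரி = "அம்மா என்றால் அழகு, என் தாய் முத்துலட்சுமி என்று சொன்னான் முருகன்"

# len() counts Unicode code points, not the letters a reader sees
NumOfLetters = len(தமிழ்வரி)
print(தமிழ்வரி, ":", NumOfLetters)

# grapheme.length() counts user-perceived characters (grapheme clusters)
gl = grapheme.length(தமிழ்வரி)
print("Grapheme", ":", தமிழ்வரி, ":", gl)

# Plain slicing cuts by code point and can separate a letter from its vowel sign
print("First 8 characters: ", தமிழ்வரி[0:8])

# grapheme.slice() cuts on grapheme boundaries instead
glslice = grapheme.slice(தமிழ்வரி, 0, 8)
print("First 8 characters with Grapheme: ", glslice)

# The word "தமிழ்" written as escapes: 5 code points, 3 graphemes
தமிழ்வரி2 = "\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD"
NumOfLetters = len(தமிழ்வரி2)
print(தமிழ்வரி2, ":", NumOfLetters)
gl2 = grapheme.length(தமிழ்வரி2)
print("Grapheme", ":", தமிழ்வரி2, ":", gl2)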
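To see why `len()` and the Grapheme package disagree, here is a rough standard-library sketch that attaches each combining mark (a vowel sign or the pulli) to the letter before it. This is only an approximation I wrote for illustration, not how the Grapheme package works internally; it ignores the full Unicode segmentation rules (joiners, emoji, conjuncts and so on). The helper name `split_letters` is hypothetical.

```python
import unicodedata


def split_letters(text):
    """Group each base character with the combining marks that follow it."""
    clusters = []
    for ch in text:
        # Categories starting with "M" (Mn, Mc, Me) are combining marks.
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


word = "\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD"  # the word "தமிழ்"
letters = split_letters(word)
print(len(word), "code points")   # 5 code points
print(len(letters), "letters")    # 3 letters
print(letters)
```

For real work, the Grapheme package (or Open-Tamil) is the safer choice, since this sketch only covers the simple base-plus-marks case.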
- Speech to Text and Text to Speech for Tamil with Python.
- Python code to use Google Cloud Vision API to extract text from an image and then use Google Cloud Translate API to translate the text from Tamil to English.
Since the first conference in 1999, there have been 18 conferences, and we have seen tremendous progress in the science and engineering behind Indian language computing. At the 2013 conference in Malaysia, I spoke about the migration of tools and applications to the cloud and Windows 8!