Since the first conference in 1999, there have been 18 Tamil Internet Conferences, and we have seen tremendous progress in the science and engineering behind Indian-language computing.
In this post, I am sharing the notes I took during TIC 2019, which is underway at Anna University. [Disclaimer: These notes are NOT meant to be comprehensive; they are just my notes from (only) the sessions I attended. Please treat them as such.]
In the keynote, Hon’ble Minister Mr “Mafoi” Pandiarajan spoke at length about his department’s work on developing a large corpus of Tamil words with their meanings – செந்தமிழ்ச் சொற்பிறப்பியல் அகரமுதலித் திட்ட இயக்ககம் – சொற்குவை (Sorkuvai).
Prof Rajeev Sangal of IIIT Hyderabad talked about speech-to-speech machine translation and stressed the importance of the human in the loop. He said that for Hindi, his mother tongue, to survive and thrive, Tamil, Telugu and all other Indian languages have to survive and thrive; we have a symbiotic connection and have to support each other. Great insights and vision.
Ms Aparna of Amazon India talked about Alexa’s speech recognition. From her talk I learned the concept of a phoneme: any of the perceptually distinct units of sound in a given language that distinguish one word from another.
She asked us to do three activities to draw parallels between what happens inside voice assistants and how humans understand a language.
- “Raise your hands” – all of us did it.
- “Please make me pasta” – we understood but couldn’t do it as we don’t have the raw materials.
- “Something in Spanish” – we heard the voice but didn’t understand.
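The three activities map loosely onto three outcomes a voice assistant can reach for an utterance. Here is a minimal sketch of that idea in Python. The intent table, the capability table and the `respond` function are all hypothetical names of my own, made up for illustration; this is not how Alexa is implemented.

```python
from enum import Enum


class Outcome(Enum):
    DONE = "understood and carried out"
    CANNOT_FULFIL = "understood, but cannot be carried out"
    NOT_UNDERSTOOD = "heard, but not understood"


# Hypothetical toy data: utterances the listener can interpret...
KNOWN_INTENTS = {
    "raise your hands": "raise_hands",
    "please make me pasta": "make_pasta",
}
# ...and whether the listener has what it needs to act on each intent.
CAN_FULFIL = {
    "raise_hands": True,
    "make_pasta": False,  # no raw materials in the conference hall
}


def respond(utterance: str) -> Outcome:
    """Classify an utterance into one of the three outcomes from the talk."""
    intent = KNOWN_INTENTS.get(utterance.strip().lower())
    if intent is None:
        # The sound was heard, but no meaning could be attached to it.
        return Outcome.NOT_UNDERSTOOD
    if not CAN_FULFIL.get(intent, False):
        # The meaning is clear, but the action is not possible.
        return Outcome.CANNOT_FULFIL
    return Outcome.DONE


for phrase in ["Raise your hands", "Please make me pasta", "Hola, ¿qué tal?"]:
    print(phrase, "->", respond(phrase).value)
```

A real assistant would replace the exact-match lookup with speech recognition and a language-understanding model, but the three-way split stays the same.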
She explained how entity relationships work:
- From the AC room you tell your kids, please close the door.
- Switch off the stove, when you have rice cooking and milk boiling.
- Bring the Gita book.
Mr Santosh K Misra, IAS, CEO of the Tamil Nadu e-Governance Agency, talked about native NLP tools and applications in Tamil being the need of the hour.
In a chatbot today, for example, the current approach is to translate the user's Tamil input (typed or spoken) into English, act on it, and then translate the response back into Tamil for speech. Too much context and information is lost in these translations.
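To make the loss concrete for myself, here is a toy sketch in Python. It assumes a word-level "translator" built from two tiny lookup tables that I made up; it is not a real machine translation system. Tamil has two words for "we", one that includes the listener and one that excludes the listener, and both collapse into the same English word.

```python
# Hypothetical lookup tables standing in for the two translation steps.
TAMIL_TO_ENGLISH = {
    "நாம்": "we",      # "we", including the listener
    "நாங்கள்": "we",   # "we", excluding the listener
}
ENGLISH_TO_TAMIL = {
    "we": "நாம்",      # English cannot say which "we" was meant
}


def round_trip(tamil_word: str) -> str:
    """Translate Tamil -> English -> Tamil, as the pivot approach does."""
    english = TAMIL_TO_ENGLISH[tamil_word]
    return ENGLISH_TO_TAMIL[english]


original = "நாங்கள்"
restored = round_trip(original)
print(original, "->", restored, "| preserved:", original == restored)
```

The exclusive "we" goes in and the inclusive "we" comes out, so the distinction is gone before the chatbot even acts on the request. Processing the Tamil input directly would not have this problem.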
He made an open call to academia, researchers and businesses to take up native processing of Tamil. His agency is open to funding this work and to taking on interested students as interns with stipends. A great opportunity for those working in this space.
In one of the paper presentations, I heard the term AnnCorra: Annotating Corpora Guidelines For POS And Chunk Annotation For Indian Languages.
I listened to the well-known author Mr Maalan (மாலன் நாராயணன்) on the journey of Tamil on the Internet. It was a well-presented talk.
A senior journalist, Mr Venkatesh Rathakrishnan, spoke on the problem of fake news, especially news reported with bias. He expressed very valid concerns. I was relieved that I am not the only one thinking along these lines.
Another journalist and translation expert, Mr Aazhi Senthil Nathan (செ.ச.செந்தில்நாதன்), spoke on the journey of Tamil translation, how it differs from that of European languages, the present state with advanced tools like Google Translate, and how these tools, with the help of a human editor, can increase productivity by 3 times and bring the cost down to 30%.
Next was a presentation by Mr T Shrinivasan, a well-known open-source contributor of Tamil packages and a member of the Kaniyam Foundation, who is also involved with a Tamil programming language called Ezhil. He talked about his work on Open-Tamil, an open-source package for handling Tamil text with Python.
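To see why Tamil text needs special handling in Python at all, here is a small sketch that uses only the standard library. It does not use Open-Tamil, and `split_letters` is my own hypothetical helper, not the package's API. The point is that one Tamil letter is often stored as more than one Unicode code point, so `len()` and plain indexing do not count letters.

```python
import unicodedata


def split_letters(text: str) -> list[str]:
    """Group each base character with the combining marks that follow it."""
    letters: list[str] = []
    for ch in text:
        # Vowel signs and the pulli have a Unicode category starting with "M".
        if letters and unicodedata.category(ch).startswith("M"):
            letters[-1] += ch
        else:
            letters.append(ch)
    return letters


# The word தமிழ் ("Tamil"), written with escapes to show its code points.
word = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"

print(len(word))                 # 5 code points
print(len(split_letters(word)))  # 3 letters
print(split_letters(word))
```

This simple grouping is only an approximation: it does not treat multi-consonant sequences such as க்ஷ as a single letter, which is the kind of detail a dedicated package is better placed to handle.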