Today I presented at the Tamil Internet 2004 conference in Singapore on counting the number of letters in a Unicode string. The problem is that the string length functions built into major programming platforms only count the storage units a string occupies. They don't understand the language, so instead of returning the letter (எழுத்து) count, they return the number of Unicode code points. A few months earlier, I had written a quick-fix solution in VB.NET for this.
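The gap between code points and letters is easy to see in code. Below is a minimal Python sketch, not the VB.NET, JavaScript or Perl implementation from the paper. It assumes a simple rule: a Tamil letter is one base character followed by any combining marks (vowel signs such as ி, or the pulli ்), so every code point in Unicode's Mark categories is folded into the letter before it. The function name `count_letters` is hypothetical, and this rule counts grantha conjuncts such as க்ஷ as two letters, which the paper's own rules may treat differently.

```python
import unicodedata


def count_letters(text: str) -> int:
    """Count letters (எழுத்து) rather than code points.

    Assumed rule (a sketch, not the paper's exact algorithm): every code
    point whose Unicode general category starts with "M" (Mn, Mc, Me),
    such as a Tamil vowel sign or the pulli, belongs to the letter before
    it, so only non-mark code points are counted.
    """
    return sum(1 for ch in text if not unicodedata.category(ch).startswith("M"))


if __name__ == "__main__":
    word = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"  # தமிழ் = த + மி + ழ்
    print(len(word))                           # code points: 5
    print(len(word.encode("utf-16-le")) // 2)  # UTF-16 code units: 5
    print(count_letters(word))                 # letters: 3
```

Python's `len` counts code points, while .NET's `String.Length` and JavaScript's `length` count UTF-16 code units; since all Tamil characters sit in the Basic Multilingual Plane, both give 5 for தமிழ், even though a reader sees three letters.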

Tamil Internet 2004 conference held in Singapore

As a reusable solution to this problem, my paper at TI2004 includes implementations in Microsoft .NET, JavaScript and Perl.

T.N.C.Venkatarangan presenting on counting unicode Tamil letters

My full paper (PDF) can be downloaded from here, the presentation (PPT) from here, and the generic implementation with full source code for all three platforms from here (ZIP).

Prof M Anandakrishnan, Dr Kalyanasundram and Dr N Kannan listening to Venkatarangan - TI2004, Singapore

Tamil Unicode has always been a topic of heated discussion. Today Badri Seshadri chaired the session well, giving everyone an opportunity to express their views without losing the core focus or running over time. Thanks, Badri.

If you want to read more about TI 2004, don't forget to visit Badri's blog and the TI 2004 website.