Today I presented at the Tamil Internet 2004 conference, held in Singapore, on counting the number of letters in a Unicode string. The problem is that the string length functions built into the major programming platforms count storage units (Unicode code points, for Tamil text), not letters. They know nothing about the language, so they cannot return the letter (எழுத்து) count: a letter such as கி is stored as two code points, the consonant க followed by the vowel sign ி, and is reported as length 2. A few months earlier, I had written a quick-fix solution in VB.NET to handle this.
The paper I presented at TI 2004 offers a reusable solution to this problem, with implementations in Microsoft .NET, JavaScript and Perl.
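To give a feel for the problem, here is a minimal sketch in Python. It is not the implementation from the paper, only an illustration under a simplifying assumption: a Tamil letter is taken to be one base code point plus any dependent signs (vowel signs, pulli, au length mark) that follow it. The names `TAMIL_DEPENDENT_SIGNS` and `count_tamil_letters` are made up for this example, and conjuncts such as க்ஷ, which some conventions treat as a single letter, are not handled specially.

```python
# Assumption for this sketch: these code points never start a new letter;
# they attach to the base consonant that precedes them.
TAMIL_DEPENDENT_SIGNS = (
    set(range(0x0BBE, 0x0BC3))    # vowel signs aa, i, ii, u, uu
    | set(range(0x0BC6, 0x0BC9))  # vowel signs e, ee, ai
    | set(range(0x0BCA, 0x0BCE))  # vowel signs o, oo, au, and the pulli
    | {0x0BD7}                    # au length mark
)


def count_tamil_letters(text: str) -> int:
    """Count code points that are not Tamil dependent signs.

    Every other code point (vowels, consonants, and also spaces,
    digits or Latin letters) counts as one.
    """
    return sum(1 for ch in text if ord(ch) not in TAMIL_DEPENDENT_SIGNS)


word = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"  # தமிழ்
print(len(word))                  # 5 code points
print(count_tamil_letters(word))  # 3 letters: த, மி, ழ்
```

The built-in `len` reports 5 for தமிழ் because it counts the vowel sign and the pulli separately, while the sketch reports the 3 letters a Tamil reader would see.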
My full paper (PDF) can be downloaded from here, the presentation (PPT) from here, and the generic implementation with full source code for all three platforms from here (ZIP).
Tamil Unicode has always been a subject of heated discussion. Today Badri Seshadri chaired the session well: he gave everyone an opportunity to express their views without letting the core focus slip or the session run over time. Thanks, Badri.
If you want to read more about TI 2004, don’t forget to visit Badri’s Blog and the TI 2004 website.