Counting Letters in a Unicode String

Today I presented at the Tamil Internet 2004 conference, held in Singapore, on counting the number of letters in a Unicode string. The problem is that the string length functions included in major programming platforms count characters as they are stored. They don’t understand the language, so they return a count of stored characters rather than a count of letters (எழுத்து). You can read my earlier post for more on this. To offer a reusable solution to this problem, my paper at TI 2004 comes with implementations for major programming platforms: Microsoft .NET, JavaScript and Perl.
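To make the difference concrete, here is a minimal sketch in Python. It is not the implementation from the paper; it only assumes that a letter can be approximated as one base character plus any combining marks (vowel signs, pulli) that follow it. The function name `count_letters` is made up for this illustration.

```python
import unicodedata


def count_letters(text):
    """Approximate letter count: skip combining marks.

    Assumption for this sketch: every code point whose Unicode general
    category starts with "M" (vowel signs, pulli and other marks)
    belongs to the letter before it, so only the remaining code points
    are counted.
    """
    return sum(1 for ch in text if not unicodedata.category(ch).startswith("M"))


# The word "தமிழ்" written as code points:
# TA, MA, VOWEL SIGN I, LLLA, PULLI
word = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"

print(len(word))            # length as stored: 5 code points
print(count_letters(word))  # letters as a reader counts them: 3
```

This simple rule does not handle conjuncts such as க்ஷ, which many readers treat as a single letter, so a complete solution needs language-specific rules on top of it.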

My full paper (PDF) can be downloaded from here, the presentation (PPT) from here, and the generic implementation with full source code for all three platforms from here (ZIP).

Tamil Unicode has always been an issue of heavy discussion. Today Badri Seshadri chaired the session well: he gave everyone an opportunity to express their views without letting the core focus be lost or the time run over. Thanks, Badri.

If you want to read more about TI 2004, don’t forget to visit Badri’s Blog and the TI 2004 website.