
WWW: Technology, Standards and I18N Conference in New Delhi


Today a conference titled “WWW: Technology, Standards and Internationalization” was held at Hotel Lalit, New Delhi, along with the inauguration of the W3C India office at TDIL, Government of India.

Ms. Swaran Latha of TDIL, Director of W3C India Office

  • Character sets and codes for all 22 official Indian Languages and 11 Scripts are now in UNICODE
  • Work is under way on PLS 1.0 (Pronunciation Lexicon Specification), starting with Hindi. TDIL will soon start work on other Indian languages
  • Issues specific to Indic languages in CSS3 style sheets, such as line breaks, drop caps and others, need to be discussed and handled
  • On CSS3, the Japanese have done some excellent work along these lines, refer: http://www.w3.org/TR/jlreq/; so have Arabic and other bidirectional languages, refer: http://www.w3.org/TR/html-bidi/. Indic languages have no such reference documents, and this was cited as one main reason for their rendering issues in CSS3 and HTML. Work needs to be done on preparing reference documents for drop caps, underline, indentation, bullets and so on. Indic languages also have issues with vertical layouts, as they need to be displayed syllable by syllable rather than character by character. Only with these documents in place can we talk to browser vendors about enabling 100% support for Indian languages
  • In terms of XML normalization, IDs need to be allowed in non-Latin characters. A lot of work needs to be done here too
  • Work needs to be done on the Character Model (http://www.w3.org/TR/charmod/), Speech Synthesis Markup Language (SSML) and TimeZone (http://www.w3.org/TR/timezone/) to update them with Indic language and India-specific information
  • TDIL has prepared and submitted CLDR (Common Locale Data Repository) data for six Indian languages to all the international standards bodies
  • We need to think about IDN issues relating to Indic languages, especially since many languages share a common script. For example, once Hindi goes live in IDN, Marathi & Bodo may not have many options left for IDN names. Refer to RFC 5646 for language tagging; Unicode-to-Punycode conversion becomes necessary
  • TDIL has worked on a Mobile Initiation Plan for Hindi, Bangla, Marathi & Tamil. A 7-bit Unicode-based encoding scheme for many Indian languages was approved in 3GPP for mobile SMS usage in Sep ‘09. If telcos, mobile browser developers & OEMs agree on the standards (UTF-8/UTF-16) for Indic languages, huge results can be seen within one year in bringing Indic languages to the mobile web (A related presentation can be seen here)
  • Issues that TDIL is working on include orthography, pronunciation, one script serving many languages, the fact that very few linguistic experts know IT, collation/sort order (in collaboration with state governments), and the lack of parallel corpora between English and Indic languages
  • Only 5% of people in India can read & write English
  • A presentation made by Ms. Swaran Latha on the same topic at a NASSCOM event can be seen here
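
Two of the points above, syllable-by-syllable display and Unicode-to-Punycode conversion for IDN, can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the syllable rule is a simplification (a code point joins the previous cluster if it is a combining mark or follows a virama) rather than a full Unicode segmentation algorithm, and the Punycode helper uses Python's built-in "punycode" codec while skipping the normalization and validation that real IDN registration requires.

```python
import unicodedata

VIRAMA = "\u094d"  # Devanagari sign that fuses consonants into a conjunct


def rough_syllables(word):
    """Group code points into approximate display syllables (simplified rule)."""
    clusters = []
    for ch in word:
        joins_previous = bool(clusters) and (
            unicodedata.category(ch).startswith("M")  # vowel signs, virama, etc.
            or clusters[-1].endswith(VIRAMA)          # consonant after a virama
        )
        if joins_previous:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def to_ace_label(label):
    """Convert one Unicode domain label to its ASCII form (no IDNA validation)."""
    return "xn--" + label.encode("punycode").decode("ascii")


def from_ace_label(ace):
    """Reverse of to_ace_label."""
    return ace[len("xn--"):].encode("ascii").decode("punycode")


if __name__ == "__main__":
    # The word "Hindi" in Devanagari, written as explicit code points.
    word = "\u0939\u093f\u0928\u094d\u0926\u0940"
    print(len(word), "code points,", len(rough_syllables(word)), "syllables")
    print(rough_syllables(word))
    print(to_ace_label(word))
```

The word has more code points than syllables, which is why a layout that places one code point per cell breaks Indic text. The ASCII label is what actually travels through DNS, so two languages sharing a script compete for the same label space.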

Dr. Jeffrey Jaffe, CEO, W3C Foundation

  • Explained that W3C standards are all fully open. This does not prevent proprietary innovation, but once an innovation becomes part of the fabric of the web, W3C tries to standardize it. The vendor then has to make it royalty-free, including any patents on it
  • Sir Tim Berners-Lee is working with the UK government to publish all its data in the most open, semantically rich standards. Hopefully similar efforts get under way in India soon
  • About 2 Billion People are using Internet today

Dr. S. Ramadorai, Vice Chairman, TCS India

  • Including all 22 Indian languages is in the industry's own interest, as it gives service providers a larger consumer base
  • The case is no longer just about interoperability & translation between Indian languages; it is now about world languages to Indian languages & back

Dr. Raghuram Krishnapuram, Senior Manager, IBM India Research Lab

Mr. Rajendra S. Pawar, Vice Chairman, NASSCOM

  • Every physicist writes his last book on philosophy
  • My children’s generation is more comfortable sharing personal information on social networks. Society has come full circle, becoming a better society by sharing
  • The industrial revolution did well in distributing wealth. The unintended consequence was that humans came to be seen as an intrusion in the man-machine equation. The information revolution has brought man back to the centre, a full circle
  • The industrial world’s scarcity mentality has given way to an abundance mentality in the information world
  • In India, we bow to people who give away their wealth, but look up to people who have a lot of wealth
  • The main question now is who will give the web its soul and maintain it. That’s why all of us should be members of W3C & contribute

Shri R. Chandrashekhar, Secretary, DIT, MCIT

  • No one knows better than India the perils of a monolingual Internet. The real world is very different, with multiple languages in use
  • For India & developing countries, Bits & Bites (meaning food) are both important

Shri Sachin Pilot, Hon’ble Minister of State for Communications & IT

  • The vision is IT for 500 million Indians by 2022; the IT Department has to train about 10 million

Internationalization aspects in W3C

  • Richard Ishida, W3C Internationalization head, talked about I18N aspects in W3C and how to author HTML & CSS with I18N in mind. Reference: www.w3c.org/International is the best single place to go for I18N information from W3C; Twitter: @webi18n
  • Dr. A. Kumaran of Microsoft Research India talked about I18N & name search. He said, “What songs are to a Bollywood movie, names are to social networking”. There are 60 million web pages that misspell “Barack Obama”, and 1,500 known misspellings of Britney Spears. These name search issues compound across languages. A multilingual name search can be done by generating a hash code for a name in each language and then comparing the hash codes. This approach is much better than transliteration. In one trial, for example, the accuracy for Tamil jumped from 0.29 to 0.69 with this method. To begin with, you need 10,000 parallel names in two languages
  • Manish Bhargava of Google Inc., USA, heads Google's Indic efforts; Google works on 40 languages, which cover about 99% of the world population. He talked about “web for all”. Hindi has 290 million users, Bengali 215 million, Telugu 75 million and Tamil 77 million. India has 50 million Internet users. PC penetration in India is 2%. The introduction of the Google transliteration keyboard feature in Google search resulted in about a 7% increase in the volume of searches in Indic languages within just 2 weeks. The Indic language user base on the web is heterogeneous: NRIs (20 million), developed users in India (300 million) and emerging users in India (600 million). Only 50% of the world will be on the Internet by 2030. The lack of online content in Indic languages is a major constraint, so Google picked up English content, translated it into Hindi and put it back into Wikipedia
  • Pravin Satpute of Red Hat India talked about how there are four rendering engines on Linux, including Pango. HarfBuzz is a project to unify these
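
The idea of comparing per-language hash codes for a name can be illustrated with a toy Python sketch. This is not the method presented in the talk, which was described as needing 10,000 parallel names to get started; here a hand-made, Soundex-style consonant table stands in for it. Both tables, the class codes and the function name are assumptions made purely for illustration, and they cover only Latin and a subset of Devanagari.

```python
from itertools import groupby


def build_table(groups):
    """Expand (letters, code) pairs into a per-character lookup table."""
    return {ch: code for letters, code in groups for ch in letters}


# Hypothetical consonant classes. Vowels, "h" and "y" are left out on purpose,
# so spelling variants that differ only in those letters share a key.
LATIN_CLASSES = build_table([
    ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
    ("l", "4"), ("mn", "5"), ("r", "6"),
])

# The same classes for Devanagari consonants; \u0902 is the anusvara (nasal).
DEVANAGARI_CLASSES = build_table([
    ("पफबभव", "1"), ("कखगघचछजझशषस", "2"), ("टठडढतथदध", "3"),
    ("ल", "4"), ("नणम\u0902", "5"), ("र", "6"),
])


def name_key(name, table):
    """Map a name to a script-independent key of consonant classes."""
    codes = [table[ch] for ch in name.lower() if ch in table]
    # Collapse runs of the same class, e.g. a doubled consonant.
    return "".join(code for code, _ in groupby(codes))


if __name__ == "__main__":
    for latin, devanagari in [("Gandhi", "गांधी"), ("Nehru", "नेहरू")]:
        print(latin, name_key(latin, LATIN_CLASSES),
              devanagari, name_key(devanagari, DEVANAGARI_CLASSES))
```

Names are compared through their keys rather than by transliterating one script into the other, so a misspelling such as "Ghandi" and the Devanagari spelling can land on the same key. A learned mapping would replace the hand-written tables in any serious system.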

Web Access through Mobile and Handheld devices

  • Dr. Phil Archer, W3C Mobile Lead, talked about the Geolocation API and Device APIs, which let a mobile web page take advantage of device capabilities such as the camera. He recommends reading the Mobile Web Application Best Practices document
  • Prof. Devendra Jalihal of IIT Madras talked about a scheme for an efficient Hindi mobile keypad. When inputting Indic languages, multi-tap requires too many taps, and dictionary-based typing is not smooth either. The finding of the research was that “Spreading vowels in Indian Languages to separate buttons in a mobile phone keyboard is good for efficiency in typing”
  • Vivekananda Pai of Reverie Technology said Indic languages need 100% perfect rendering support. Devices give English users a good experience and need to do the same for Indic language users. S. K. Mohanty of CDAC is one of the pioneers of typography for Indian languages. The TV remote will become the second most popular device in India once 3G becomes popular

Some other data points:

  • To support WAP, a device needs a 50 MHz CPU and 0.5 MB of RAM. To support Basic Web, a device needs a 110 MHz CPU and 1 MB of RAM. To support WebKit, a device needs a 400 MHz CPU and 5 MB of RAM
  • Mobile users in India number 471 million. Of these, devices capable of mobile Internet: 127 M, used once: 12 M, active users: 4 M, as per the TRAI Sep ‘09 report
  • In India, a user sends about 30 SMS / Month – very low compared to other Asian countries especially South East Asia
  • Each of the top 4 regional newspapers in India exceeds the readership of the top English newspaper in India by 3 times
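
The mobile Internet figures above form a funnel, and the drop-off at each step is easier to see as percentages. The short Python sketch below does that arithmetic on the four numbers quoted from the TRAI Sep ‘09 report. The stage labels and function name are my own shorthand, and it assumes each stage is a subset of the one before it.

```python
# Figures in millions, as quoted above from the TRAI Sep '09 report.
MOBILE_FUNNEL = [
    ("Mobile users", 471),
    ("Internet-capable devices", 127),
    ("Used mobile Internet once", 12),
    ("Active users", 4),
]


def funnel_rates(stages):
    """Return (stage, % of previous stage, % of first stage) for each step."""
    total = stages[0][1]
    rows = []
    for (_, previous), (name, count) in zip(stages, stages[1:]):
        rows.append((name, 100 * count / previous, 100 * count / total))
    return rows


if __name__ == "__main__":
    for name, of_previous, of_total in funnel_rates(MOBILE_FUNNEL):
        print(f"{name}: {of_previous:.1f}% of previous stage, "
              f"{of_total:.2f}% of all mobile users")
```

Fewer than 1 in 100 mobile users counts as an active mobile Internet user, even though roughly a quarter of devices are capable of it.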