<?xml version='1.0' encoding='utf-8' ?>
<!--  If you are running a bot please visit this policy page outlining rules you must respect. http://www.livejournal.com/bots/  -->
<rss version='2.0' xmlns:lj='http://www.livejournal.org/rss/lj/1.0/' xmlns:media='http://search.yahoo.com/mrss/' xmlns:atom10='http://www.w3.org/2005/Atom'>
<channel>
  <title>Santhosh Thottingal</title>
  <link>http://santhoshtr.livejournal.com/</link>
  <description>Santhosh Thottingal - LiveJournal.com</description>
  <lastBuildDate>Sun, 14 Jun 2009 06:36:41 GMT</lastBuildDate>
  <generator>LiveJournal / LiveJournal.com</generator>
  <lj:journal>santhoshtr</lj:journal>
  <lj:journalid>12479053</lj:journalid>
  <lj:journaltype>personal</lj:journaltype>
  <atom10:link rel='hub' href='http://pubsubhubbub.appspot.com/' />
  <image>
    <url>http://l-userpic.livejournal.com/64127326/12479053</url>
    <title>Santhosh Thottingal</title>
    <link>http://santhoshtr.livejournal.com/</link>
    <width>100</width>
    <height>98</height>
  </image>

<item>
  <guid isPermaLink='true'>http://santhoshtr.livejournal.com/17518.html</guid>
  <pubDate>Sun, 14 Jun 2009 06:36:41 GMT</pubDate>
  <title>Goodbye LJ!</title>
  <link>http://santhoshtr.livejournal.com/17518.html</link>
  <description>Goodbye, LiveJournal.&lt;br /&gt;I am moving to my new home: &lt;a href=&quot;http://thottingal.in/blog&quot; rel=&quot;nofollow&quot;&gt;http://thottingal.in/blog&lt;/a&gt;&lt;br /&gt;Friends, please update your bookmarks, feed subscriptions, etc.&lt;br /&gt;Migrating all my LiveJournal posts was very easy with the new version of WordPress.</description>
  <comments>http://santhoshtr.livejournal.com/17518.html</comments>
  <lj:security>public</lj:security>
  <lj:reply-count>7</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://santhoshtr.livejournal.com/17353.html</guid>
  <pubDate>Tue, 26 May 2009 15:59:23 GMT</pubDate>
  <title>Openoffice Indic Regional Language group</title>
  <link>http://santhoshtr.livejournal.com/17353.html</link>
  <description>We have just formed an Indic Regional Language group for OpenOffice. This is in line with the &lt;a href=&quot;http://wiki.services.openoffice.org/wiki/NLC&quot; rel=&quot;nofollow&quot;&gt;OpenOffice Native Language Consortium Plans&lt;/a&gt;. The objectives of such groups are described &lt;a href=&quot;http://wiki.services.openoffice.org/wiki/Regional_Groups&quot; rel=&quot;nofollow&quot;&gt;here&lt;/a&gt;. Basically, the group is meant to improve coordination among Indic languages and make the OpenOffice experience in our languages better.&lt;br /&gt;The announcement of this group is &lt;a href=&quot;http://native-lang.openoffice.org/servlets/ReadMsg?list=dev&amp;amp;msgNo=8769&quot; rel=&quot;nofollow&quot;&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Thanks to &lt;a href=&quot;http://www.standardsandfreedom.net&quot; rel=&quot;nofollow&quot;&gt;Charles-H. Schulz&lt;/a&gt;, we got a mailing list, &lt;a href=&quot;http://native-lang.openoffice.org/servlets/SummarizeList?listName=indic&quot; rel=&quot;nofollow&quot;&gt;indic@native-lang.openoffice.org&lt;/a&gt;. To subscribe, log in to &lt;a href=&apos;http://native-lang.openoffice.org&apos; rel=&apos;nofollow&apos;&gt;http://native-lang.openoffice.org&lt;/a&gt; &lt;br /&gt;&lt;br /&gt;We have just started, and I will soon set up a wiki page there. To start with, I will collect the list of pending issues for Indian languages from people working on the various languages, and find people from each language to act as points of contact. Feel free to contact me about anything related to OpenOffice in your language.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Update: June 3, 2009&lt;/b&gt;: &lt;a href=&quot;http://wiki.services.openoffice.org/wiki/NLC/IndicGroup&quot; rel=&quot;nofollow&quot;&gt;This is our wiki page &lt;/a&gt;</description>
  <comments>http://santhoshtr.livejournal.com/17353.html</comments>
  <category>openoffice</category>
  <category>indic</category>
  <lj:security>public</lj:security>
  <lj:reply-count>1</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://santhoshtr.livejournal.com/16972.html</guid>
  <pubDate>Wed, 13 May 2009 10:04:23 GMT</pubDate>
  <title>In solidarity</title>
  <link>http://santhoshtr.livejournal.com/16972.html</link>
  <description>&lt;a href=&quot;http://binayaksen.net&quot; rel=&quot;nofollow&quot;&gt;&lt;img src=&quot;http://binayaksen.net/wp-content/gallery/site-graphics/have-a-heart-1.gif&quot; /&gt;&lt;/a&gt;</description>
  <comments>http://santhoshtr.livejournal.com/16972.html</comments>
  <category>politics</category>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://santhoshtr.livejournal.com/16753.html</guid>
  <pubDate>Sun, 29 Mar 2009 12:09:00 GMT</pubDate>
  <title>Python isalpha is buggy</title>
  <link>http://santhoshtr.livejournal.com/16753.html</link>
  <description>This code
&lt;br /&gt;
&lt;pre&gt;
#!/usr/bin/env python
# -*- coding: utf-8 -*-
ml_string=u&quot;സന്തോഷ്  हिन्दी&quot;
for ch in ml_string:
    if ch.isalpha():
        print(ch)
&lt;/pre&gt;   
&lt;br /&gt;
gives this output
&lt;br /&gt;
&lt;pre&gt;
സ
ന
ത
ഷ
ह
न
द
&lt;/pre&gt;
It fails for all the matra (vowel) signs of Indian languages. This is a &lt;a href=&quot;https://bugzilla.redhat.com/show_bug.cgi?id=466912&quot; rel=&quot;nofollow&quot;&gt; known &lt;/a&gt; &lt;a href=&quot;https://bugzilla.redhat.com/show_bug.cgi?id=474124&quot; rel=&quot;nofollow&quot;&gt; bug&lt;/a&gt; in glibc.
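&lt;br /&gt;
One way to look into this, sketched here for Python 3 (the snippet above is Python 2), is to print the Unicode general category of each character using the standard unicodedata module. The Python 3 documentation describes isalpha as true only for the letter categories (Lu, Ll, Lt, Lm, Lo), so if the vowel signs and the virama are reported as marks (Mn or Mc), that alone would account for the output above. The helper name describe is only for illustration.
&lt;br /&gt;
&lt;pre&gt;
#!/usr/bin/env python3
import unicodedata

def describe(text):
    """Return (character, general category, isalpha result) for each non-space character."""
    return [(ch, unicodedata.category(ch), ch.isalpha())
            for ch in text if not ch.isspace()]

for ch, category, alpha in describe("സന്തോഷ് हिन्दी"):
    # One row per code point: code point, category, isalpha result, Unicode name
    print("U+%04X" % ord(ch), category, alpha, unicodedata.name(ch))
&lt;/pre&gt;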
Does anybody know whether Python internally uses glibc functions for these basic string operations, or uses a separate character database like Qt does?</description>
  <comments>http://santhoshtr.livejournal.com/16753.html</comments>
  <category>bugs</category>
  <category>python</category>
  <lj:security>public</lj:security>
  <lj:reply-count>3</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://santhoshtr.livejournal.com/16507.html</guid>
  <pubDate>Sat, 28 Mar 2009 14:04:59 GMT</pubDate>
  <title>N-gram Visualization Experiment</title>
  <link>http://santhoshtr.livejournal.com/16507.html</link>
  <description>The following image shows a visualization, generated with python-graphviz, of the N-gram representation of the first paragraph of &lt;a href=&quot;http://hi.wikipedia.org/wiki/%E0%A4%9A%E0%A4%A8%E0%A5%8D%E0%A4%A6%E0%A5%8D%E0%A4%B0%E0%A4%AF%E0%A4%BE%E0%A4%A8&quot; rel=&quot;nofollow&quot;&gt;this article&lt;/a&gt; from Hindi Wikipedia. The image represents the possible paths through which a sentence can be constructed if we start from the word भारत. &lt;br /&gt;Click to view the enlarged image&lt;br /&gt;&lt;br /&gt;&lt;a href=&quot;http://pics.livejournal.com/santhoshtr/pic/000156za/&quot;&gt;&lt;img src=&quot;http://pics.livejournal.com/santhoshtr/pic/000156za&quot; width=&quot;384&quot; height=&quot;960&quot; border=&quot;0&quot; /&gt;&lt;/a&gt;</description>
  <comments>http://santhoshtr.livejournal.com/16507.html</comments>
  <category>experiment</category>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://santhoshtr.livejournal.com/16339.html</guid>
  <pubDate>Wed, 25 Mar 2009 17:10:42 GMT</pubDate>
  <title>Localization: What are we missing?</title>
  <link>http://santhoshtr.livejournal.com/16339.html</link>
  <description>[This blog post is a kind of self-criticism, written without forgetting the valuable contributions that l10n communities are making.]&lt;br /&gt;Some observations on the localized desktops in Indian languages:&lt;br /&gt;* Not all localization team members try the application they are translating at least once before working on the PO file. Result: someone who localizes an application without understanding what it does, and without trying the en_US interface, misses the context of the strings. An example I have seen: the string &quot;Querying&quot; was translated to an xx_IN string that means &quot;Questioning&quot; instead of the required string corresponding to &quot;Searching&quot;. Sometimes we fail to consider how much space the string will take on the screen, and we translate a short English word into a long xx_IN string to make the meaning clear. Result: an ugly interface.&lt;br /&gt;&lt;a href=&quot;http://pics.livejournal.com/santhoshtr/pic/00011yfx/&quot;&gt;&lt;img src=&quot;http://pics.livejournal.com/santhoshtr/pic/00011yfx&quot; width=&quot;640&quot; height=&quot;148&quot; border=&quot;0&quot; /&gt;&lt;/a&gt;&lt;br /&gt;Tamil gedit from Ubuntu 8.10 (click to enlarge)&lt;br /&gt;* Not all localization team members *try* the application they translated after completing the PO file, or even after the application is released. This happens when a person translates many applications (sometimes because it is part of their job).&lt;br /&gt;* In practice, there is no process for *testing* the localized desktop in our SDLC. L10n members translate a PO file, and sometimes they translate it as a text file rather than as a user interface. 
We must introduce a process to make sure that the localized desktop is tested for usability, contextually correct translation, spelling mistakes, wrong shortcut keys, fuzzy strings, untranslated strings in the main interface, etc.&lt;br /&gt;* Since the number of team members is very small compared to the total number of applications in a desktop environment, we end up with one application being translated by many people. Result: inconsistent translation and no ownership for ensuring translation quality. Ramadoss from the Tamil team suggested that, ideally, each application should have one person per language who is responsible for timely translation and testing. That person can take responsibility for more than one application, but not more than, say, 10. In practice, this requires a big l10n community per language, and unfortunately we don&apos;t have that as of now.&lt;br /&gt;* Peer review, one of the important and mandatory processes in l10n, does not happen properly when the release date is approaching. L10n communities often try to reach the completion percentage somehow. IMHO, the new l10n tools and frameworks often fail to give importance to peer review in the workflows they design. The FOSS community, being inclusive in nature, often welcomes new l10n contributors. I have seen many members improve their l10n skills after making corrections based on review comments from others. When a new l10n workflow allows every contributor to submit a translated PO file without peer review from the community, the ultimate result is a very bad user interface. We have seen this many times with Rosetta translations of Ubuntu. Everybody going there tries out the Rosetta &quot;features&quot; and leaves a few strings &quot;translated&quot; there. And Ubuntu takes those strings for its immediate release. Upstream translations are never taken on time, or the &quot;translated&quot; strings are never submitted upstream. 
Result: a very bad localized desktop with many spelling mistakes, inconsistent translations, etc. We in the ml_IN team used to watch who was &quot;contributing&quot; through Rosetta and get them to work with the community. I hope the new translation frameworks will give sufficient attention to this problem. If we do not keep a balance between newbie translation and quality assurance, our localized desktops will not improve.&lt;br /&gt;&lt;br /&gt;&lt;a href=&quot;http://pics.livejournal.com/santhoshtr/pic/000133rg/&quot;&gt;&lt;img src=&quot;http://pics.livejournal.com/santhoshtr/pic/000133rg&quot; width=&quot;640&quot; height=&quot;360&quot; border=&quot;0&quot; /&gt;&lt;/a&gt;&lt;br /&gt;Again Tamil gedit, but from Debian Lenny. Compare it with the Ubuntu version shown above (click to enlarge)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;* User feedback: The number of users who use the desktop in their mother tongue is very small, even though the translation percentage is more than 80% for many languages. IMHO, this is because of a &apos;dependency conflict&apos; among the following:&lt;br /&gt;	a) A person who is not good at English&lt;br /&gt;	b) A person who wants to use a computer in their mother tongue for some &quot;purpose&quot;&lt;br /&gt;	c) A person who is capable of spending Rs ~20K on a computer &lt;br /&gt; In most cases, there is a conflict between any 2 of the above, and that ends with a) the person using the desktop in English, or b) the person not using a computer at all. I am sure that if there were a good number of users, we would not end up with interfaces like the ones I showed in the screenshots.&lt;br /&gt;* One inconsistency I noticed across localized desktops concerns the shortcut keys/accelerator keys. Some languages use English shortcut keys and give them at the end of the word in brackets, e.g. അടയ്ക്കുക (C). As you can see in the screenshots, sometimes we have a small letter and sometimes a capital letter for that. Some languages use letters in xx_IN itself. 
But there is no consistency. For the Control and Alt keys, some languages translate them, while others keep them in English. What is the problem with English shortcut keys? To use an English shortcut key, the user has to be on an English keyboard layout. For shortcut keys in xx_IN, one has to be on the xx_IN keyboard layout. For a user (assume that they use the xx_IN desktop because they are not good at English) typing xx_IN text in gedit with the xx_IN keyboard layout, is it possible to use the shortcut keys if we give them in English? Are we expecting the user to switch keyboard layouts just to use a shortcut key while typing a document? (BTW, has anybody noticed that Apple doesn&apos;t use accelerator keys in its OS?)&lt;br /&gt;&lt;a href=&quot;http://pics.livejournal.com/santhoshtr/pic/00010ph2/&quot;&gt;&lt;img src=&quot;http://pics.livejournal.com/santhoshtr/pic/00010ph2&quot; width=&quot;600&quot; height=&quot;480&quot; border=&quot;0&quot; /&gt;&lt;/a&gt;&lt;br /&gt;bn_IN gedit in Ubuntu 8.10 (click to enlarge)&lt;br /&gt;&lt;a href=&quot;http://pics.livejournal.com/santhoshtr/pic/00012s8f/&quot;&gt;&lt;img src=&quot;http://pics.livejournal.com/santhoshtr/pic/00012s8f&quot; width=&quot;640&quot; height=&quot;400&quot; border=&quot;0&quot; /&gt;&lt;/a&gt;&lt;br /&gt;ml_IN gnome dictionary client in Ubuntu 8.10 (click to enlarge)&lt;br /&gt;&lt;br /&gt;Suggestions and ideas are welcome. How can we make our localized desktops more beautiful and user-friendly?</description>
  <comments>http://santhoshtr.livejournal.com/16339.html</comments>
  <category>localization</category>
  <lj:security>public</lj:security>
  <lj:reply-count>5</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://santhoshtr.livejournal.com/15959.html</guid>
  <pubDate>Sun, 11 Jan 2009 05:53:38 GMT</pubDate>
  <title>Updates...</title>
  <link>http://santhoshtr.livejournal.com/15959.html</link>
  <description>&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;http://www.j4v4m4n.in&quot; rel=&quot;nofollow&quot;&gt;Praveen&lt;/a&gt; prepared &lt;a href=&quot;http://gnupravi.blip.tv/&quot; rel=&quot;nofollow&quot;&gt;videos&lt;/a&gt; from the matrix screen savers in 6 languages&lt;/li&gt;
&lt;li&gt; &lt;a href=&quot;http://www.gnu.org/fry&quot; rel=&quot;nofollow&quot;&gt;This video&lt;/a&gt;  has been translated to &lt;a href=&quot;http://www.gnu.org/fry/happy-birthday-to-gnu-translation.html&quot; rel=&quot;nofollow&quot;&gt; Malayalam.&lt;/a&gt; Those who are interested in how to do that can refer to &lt;a href=&quot;http://www.gnu.org/fry/happy-birthday-to-gnu-in-your-language.html&quot; rel=&quot;nofollow&quot;&gt;this&lt;/a&gt;&lt;/li&gt;
&lt;li&gt; I prepared the &lt;a href=&quot;http://git.savannah.gnu.org/gitweb/?p=smc.git;a=tree;f=collation&quot; rel=&quot;nofollow&quot;&gt;glibc collation table for Malayalam&lt;/a&gt;, but there are still some more bugs to be fixed&lt;/li&gt;
&lt;li&gt; My friends and I are working on adding the Saka year system to the KDE calendar system, and &lt;a href=&quot;http://git.savannah.gnu.org/gitweb/?p=smc.git;a=tree;f=calendar/kde&quot; rel=&quot;nofollow&quot;&gt;it is almost ready&lt;/a&gt;. Here is the video: &lt;a href=&quot;http://blip.tv/file/1656477&quot; rel=&quot;nofollow&quot;&gt;Saka calendar in KDE &lt;/a&gt;&lt;/li&gt;
&lt;li&gt;A &lt;a href=&quot;http://en.wikipedia.org/DICT&quot; rel=&quot;nofollow&quot;&gt;Dict&lt;/a&gt;-based English-Malayalam dictionary is in development and we are &lt;a href=&quot;http://git.savannah.gnu.org/gitweb/?p=smc.git;a=tree;f=dictionary&quot; rel=&quot;nofollow&quot;&gt;ready for a beta release&lt;/a&gt;. &lt;a href=&quot;http://rajeeshknambiar.wordpress.com/2009/01/01/english-malayalam-dict-rfc2229/&quot; rel=&quot;nofollow&quot;&gt;Rajeesh&lt;/a&gt; did a wonderful job in preparing it &lt;/li&gt;
&lt;li&gt; The &lt;a href=&quot;http://l10n.kde.org/team-infos.php?teamcode=ml&quot; rel=&quot;nofollow&quot;&gt;KDE Malayalam team&lt;/a&gt;  is &lt;a href=&quot;http://l10n.kde.org/stats/gui/trunk-kde4/team/ml/&quot; rel=&quot;nofollow&quot;&gt;doing a great job&lt;/a&gt;  &lt;/li&gt;
&lt;li&gt; I worked on hunspell to handle the agglutinative nature of Indian languages and found some problems with it. Hunspell developer Nemeth is looking into them &lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://sujithh.info/2009/01/2009-foss-contribution/&quot; rel=&quot;nofollow&quot;&gt;Sujith added Malayalam support&lt;/a&gt; to KLetters &lt;/li&gt;
&lt;li&gt;Planning to attend &lt;a href=&quot;http://mec.fossmeet.in/&quot; rel=&quot;nofollow&quot;&gt;fossmeet @ Model Engineering College&lt;/a&gt;, Cochin&lt;/li&gt;
&lt;/ul&gt;</description>
  <comments>http://santhoshtr.livejournal.com/15959.html</comments>
  <lj:security>public</lj:security>
  <lj:reply-count>2</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://santhoshtr.livejournal.com/15786.html</guid>
  <pubDate>Sun, 21 Dec 2008 16:32:03 GMT</pubDate>
  <title>KDE Indic Screensavers</title>
  <link>http://santhoshtr.livejournal.com/15786.html</link>
  <description>&lt;p&gt;I ported all of the Matrix screensavers with Indian language glyphs to KDE4. For details about the screensavers  please read:
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;http://santhoshtr.livejournal.com/7078.html&quot;&gt;Hacking the GLMatrix screensaver&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://santhoshtr.livejournal.com/13439.html&quot;&gt;Screensavers in your language&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/p&gt;
&lt;p&gt;
Download the binary packages: &lt;a href=&quot;http://download.savannah.gnu.org/releases/smc/Screensaver/kscreensavers-indic-matrix_1.0.0.deb&quot; rel=&quot;nofollow&quot;&gt;Deb package&lt;/a&gt;, and &lt;a href=&quot;http://download.savannah.gnu.org/releases/smc/Screensaver/kscreensavers-indic-matrix-1.0.1-2.i386.rpm&quot; rel=&quot;nofollow&quot;&gt;RPM package&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;
There are 6 screensavers in that package, for Malayalam, Hindi, Oriya, Bengali, Tamil and Gujarati. After installation, go to KDE System Settings-&amp;gt;Desktop-&amp;gt;Screensaver and select any of them.
&lt;/p&gt;
 Screenshots (click to get the image in original size):&lt;br /&gt;
&lt;a href=&quot;http://pics.livejournal.com/santhoshtr/pic/0000yg8c/&quot;&gt;&lt;img src=&quot;http://pics.livejournal.com/santhoshtr/pic/0000yg8c/s320x240&quot; width=&quot;320&quot; height=&quot;177&quot; border=&quot;0&quot; /&gt;&lt;/a&gt;
&lt;br /&gt;
KDE Screensaver configuration for Hindi:
&lt;br /&gt;
&lt;a href=&quot;http://pics.livejournal.com/santhoshtr/pic/0000zdpy/&quot;&gt;&lt;img src=&quot;http://pics.livejournal.com/santhoshtr/pic/0000zdpy/s320x240&quot; width=&quot;304&quot; height=&quot;240&quot; border=&quot;0&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
Enjoy...!</description>
  <comments>http://santhoshtr.livejournal.com/15786.html</comments>
  <category>kde</category>
  <category>screensaver</category>
  <category>hack</category>
  <lj:security>public</lj:security>
  <lj:reply-count>3</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://santhoshtr.livejournal.com/15599.html</guid>
  <pubDate>Tue, 16 Dec 2008 17:04:15 GMT</pubDate>
  <title>Hyphenation of Indian Languages in Webpages</title>
  <link>http://santhoshtr.livejournal.com/15599.html</link>
  <description>In my last blogpost I explained &lt;a href=&quot;http://santhoshtr.livejournal.com/15266.html&quot;&gt;hyphenation of Indian language text in openoffice&lt;/a&gt;. In this blogpost I will explain how hyphenation can be done in webpages.
&lt;p&gt;
As I explained, the importance of hyphenation comes into the picture when we justify the text. The length of the lines is controlled by the parent tags. Unicode defines a special character for hyphenation called the soft hyphen, written as &amp;amp;shy; in HTML. In HTML, the plain hyphen is represented by the &quot;-&quot; character (&amp;amp;#45; or &amp;amp;#x2D;). The soft hyphen is represented by the character entity reference &amp;amp;shy; (&amp;amp;#173; or &amp;amp;#xAD;).
&lt;/p&gt;
&lt;p&gt;User agents (browsers) can break the line wherever a soft hyphen is found. So if we have a JavaScript-based implementation that inserts soft hyphens inside words based on language-specific rules, we can achieve hyphenation in web pages too.
&lt;/p&gt;
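&lt;p&gt;
The idea can be sketched in a few lines. The snippet below is Python rather than JavaScript and is only an illustration, not Hyphenator code; the helper name add_soft_hyphens and the example split of the word are my own assumptions. It joins the pieces of a word with U+00AD, which stays invisible unless the browser decides to break the line there.
&lt;/p&gt;
&lt;pre&gt;
SOFT_HYPHEN = "\u00ad"  # U+00AD, the character behind the HTML entity named shy

def add_soft_hyphens(pieces):
    """Join already-split word pieces with soft hyphens."""
    return SOFT_HYPHEN.join(pieces)

word = add_soft_hyphens(["hy", "phen", "ation"])
print(len(word))                       # 13: 11 letters plus 2 soft hyphens
print(word.replace(SOFT_HYPHEN, ""))   # hyphenation
&lt;/pre&gt;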
&lt;p&gt;
&lt;a href=&quot;http://code.google.com/p/hyphenator/&quot; rel=&quot;nofollow&quot;&gt;Hyphenator&lt;/a&gt; is a project that does exactly this.&lt;i&gt; &quot;Hyphenator.js brings client-side hyphenation of HTML-Documents on to every browser by inserting soft hyphens using hyphenation patterns and Frank M. Liangs hyphenation algorithm commonly known from LaTeX and Openoffice. &quot;&lt;/i&gt;
&lt;/p&gt;
&lt;p&gt;
Hyphenator had not been tested with any non-Latin languages so far. I tried to add support for Indian languages and the result was satisfactory. I used the 
same rules I defined for OpenOffice. Unlike for Latin-script languages, the number of hyphenation patterns for Indian languages is very small, and the performance is good because of that.
&lt;/p&gt;
&lt;p&gt;
I have added Malayalam, Tamil, Hindi, Oriya, Kannada, Telugu, Bengali, Gujarati and Panjabi support to it. &lt;a href=&quot;http://download.savannah.gnu.org/releases/smc/hyphenation/web/example.html&quot; rel=&quot;nofollow&quot;&gt;You can see a working example here&lt;/a&gt;. (I wanted to embed an example here, but LiveJournal does not allow JavaScript inside the blog body.) The column layout is done with CSS. Try resizing the browser window, and try a print preview too.
&lt;/p&gt;
&lt;p&gt;
Don&apos;t forget to read the source code of that page. It is very simple. If you want hyphenation in your web page, all you need to do is include the JavaScript as done in the example. We need to provide lang attributes on the nodes so that the required patterns for that language can be loaded. I have placed the new language patterns temporarily in the download area of SMC. I will ask the author of Hyphenator to include them upstream. The code is &lt;a href=&quot;http://download.savannah.gnu.org/releases/smc/hyphenation/web&quot; rel=&quot;nofollow&quot;&gt;available here&lt;/a&gt;
&lt;/p&gt;&lt;hr /&gt;
&lt;b&gt;Update(18-Dec-2008):&lt;/b&gt;Thanks to Mathias Nater, author of hyphenator, the patterns were added to &lt;a href=&quot;http://code.google.com/p/hyphenator&quot; rel=&quot;nofollow&quot;&gt;upstream&lt;/a&gt;.</description>
  <comments>http://santhoshtr.livejournal.com/15599.html</comments>
  <category>hyphenation</category>
  <category>web</category>
  <category>javascript</category>
  <category>hack</category>
  <lj:security>public</lj:security>
  <lj:reply-count>3</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://santhoshtr.livejournal.com/15266.html</guid>
  <pubDate>Sat, 13 Dec 2008 14:11:25 GMT</pubDate>
  <title>Hyphenation of Indian Languages and Openoffice</title>
  <link>http://santhoshtr.livejournal.com/15266.html</link>
  <description>&lt;b&gt;What is Hyphenation?&lt;/b&gt;
&lt;p&gt;
Hyphenation is the process of inserting hyphens between the syllables of a word so that, when the text is &lt;a href=&quot;http://en.wikipedia.org/wiki/Justification_(typesetting)&quot; rel=&quot;nofollow&quot;&gt;justified&lt;/a&gt;, the maximum space is utilized. 
&lt;/p&gt;
&lt;p&gt;
Hyphenation is an important feature that DTP software provides. For Indian languages, there is no good DTP software available. XeTeX is the only choice for working with Unicode and professional-quality page layout. But XeTeX and DTP are not exactly the same. Inkscape can be used as a temporary solution, but only for small-scale work. There is a project under way to add a HarfBuzz backend to Scribus, the freedomware DTP package. 
&lt;/p&gt;
&lt;p&gt;
Hyphenation is also required in many other places. In fact, it is required wherever we &apos;justify&apos; a block of text in OpenOffice or any other word processor. The same is the case with web pages. If we justify a block of text in ml_IN, let us see what happens now&lt;/p&gt;
&lt;img src=&quot;http://pics.livejournal.com/santhoshtr/pic/0000wd6p&quot; width=&quot;267&quot; height=&quot;118&quot; border=&quot;0&quot; /&gt;
&lt;p&gt;
Note the long gaps between words. This is a screenshot taken from Firefox. By default, lines are broken only at space characters, and there is no doubt that this makes the pages ugly. The problem becomes worse when the words are long and the column is narrow.
&lt;/p&gt;
&lt;p&gt;
So what is the solution?
&lt;/p&gt;
&lt;p&gt;Ideal solution: Applications should be aware of the language and its hyphenation rules, and should do the hyphenation wherever required.&lt;/p&gt;
&lt;p&gt;OpenOffice can take hyphenation dictionaries, just as it does for spell checkers. But for Indian languages, we are yet to prepare hyphenation dictionaries (more on that later). The CSS3 draft of the W3C has a provision for&lt;a href=&quot;http://www.w3.org/TR/css3-text/#hyphenate&quot; rel=&quot;nofollow&quot;&gt; hyphenate&lt;/a&gt;, but it is still in the draft stage.&lt;/p&gt;

&lt;b&gt;Algorithm For Hyphenation&lt;/b&gt;
&lt;p&gt;
The basis for all these hyphenation algorithms is the one designed by Frank Liang in 1983, which is used in TeX. The &lt;a href=&quot;http://en.wikipedia.org/wiki/TeX#Hyphenation_and_justification&quot; rel=&quot;nofollow&quot;&gt;Wikipedia article on TeX&lt;/a&gt; explains this with a very simple example:
&lt;blockquote&gt;
If TeX must find the acceptable hyphenation positions in the word encyclopedia, for example, it will consider all the subwords of the extended word .encyclopedia., where . is a special marker to indicate the beginning or end of the word. The list of subwords include all the subwords of length 1 (., e, n, c, y, etc), of length 2 (.e, en, nc, etc), etc, up to the subword of length 14, which is the word itself, including the markers. TeX will then look into its list of hyphenation patterns, and find subwords for which it has calculated the desirability of hyphenation at each position. In the case of our word, 11 such patterns can be matched, namely 1c4l4, 1cy, 1d4i3a, 4edi, e3dia, 2i1a, ope5d, 2p2ed, 3pedi, pedia4, y1c. For each position in the word, TeX will calculate the maximum value obtained among all matching pattern, yielding en1cy1c4l4o3p4e5d4i3a4. Finally, the acceptable positions are those indicated by an odd number, yielding the acceptable hyphenations en-cy-clo-pe-di-a. This system based on subwords allows the definition of very general patterns (such as 2i1a), with low indicative numbers (either odd or even), which can then be superseded by more specific patterns (such as 1d4i3a) if necessary. These patterns find about 90% of the hyphens in the original dictionary; more importantly, they do not insert any spurious hyphen. In addition, a list of exceptions (words for which the patterns do not predict the correct hyphenation) are included with the Plain TeX format; additional ones can be specified by the user.
&lt;/blockquote&gt;
&lt;/p&gt;
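&lt;p&gt;
The worked example quoted above can be reproduced with a short Python sketch. This is only my illustration of the pattern matching described in the quote, not the TeX or OpenOffice implementation: the function names are made up for this post, only the 11 patterns listed in the quote are loaded, and no minimum prefix or suffix length is enforced.
&lt;/p&gt;
&lt;pre&gt;
DIGITS = "0123456789"
PATTERNS = "1c4l4 1cy 1d4i3a 4edi e3dia 2i1a ope5d 2p2ed 3pedi pedia4 y1c".split()

def parse(pattern):
    """Split a pattern such as '1d4i3a' into its letters and per-gap levels."""
    letters = "".join(ch for ch in pattern if ch not in DIGITS)
    levels, gap = [0] * (len(letters) + 1), 0
    for ch in pattern:
        if ch in DIGITS:
            levels[gap] = int(ch)
        else:
            gap += 1
    return letters, levels

def gap_levels(word, patterns):
    """Highest level at each gap; index i is the gap before word[i]."""
    text = "." + word + "."          # '.' marks the word boundaries
    best = [0] * (len(text) + 1)
    for letters, levels in map(parse, patterns):
        start = text.find(letters)
        while start != -1:           # visit every occurrence of the subword
            for offset, level in enumerate(levels):
                best[start + offset] = max(best[start + offset], level)
            start = text.find(letters, start + 1)
    return best[1:-1]

def labelled(word, levels):
    """Interleave the non-zero levels with the letters of the word."""
    out = ""
    for ch, level in zip(word, levels):
        out += (str(level) if level else "") + ch
    return out + (str(levels[-1]) if levels[-1] else "")

def hyphenate(word, levels):
    """Put a hyphen at every inner gap whose level is odd."""
    out = ""
    for i, ch in enumerate(word):
        out += ("-" if i and levels[i] % 2 else "") + ch
    return out

levels = gap_levels("encyclopedia", PATTERNS)
print(labelled("encyclopedia", levels))   # en1cy1c4l4o3p4e5d4i3a4
print(hyphenate("encyclopedia", levels))  # en-cy-clo-pe-di-a
&lt;/pre&gt;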
&lt;p&gt;
For more details about the algorithm used in OpenOffice, &lt;a href=&quot;http://markmail.org/download.xqy?id=rwne7kf67ttyk62l&amp;amp;number=2&quot; rel=&quot;nofollow&quot;&gt; read&lt;/a&gt; this paper by Nemeth Laszlo.&lt;/p&gt;
&lt;b&gt;Hyphenation in Indian languages.&lt;/b&gt;
&lt;p&gt;Unlike in English and many other languages, hyphenation in Indian languages is not very complex. In general, the rules are as follows:
&lt;ul&gt;
&lt;li&gt;[consonant][vowel][consonant] can be hyphenated as [consonant][vowel] - [consonant] if the vowel is not a virama or halant &lt;/li&gt;
&lt;li&gt;Don&apos;t split a word after a ZWJ&lt;/li&gt;
&lt;li&gt;We can split a word after a ZWNJ&lt;/li&gt;
&lt;li&gt;Plus any language-specific rules. For example, in ml_IN a line should not start with a chillu letter.&lt;/li&gt;
&lt;/ul&gt;
&lt;/p&gt;
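&lt;p&gt;
As a rough sketch of these rules in Python: the snippet below is a deliberate simplification and not what the dictionaries do internally. It allows a break before any letter (Unicode category Lo) unless the previous character is a virama or a ZWJ, and never before an atomic Malayalam chillu. Detecting the virama through canonical combining class 9, and all the function names, are my assumptions for the illustration.
&lt;/p&gt;
&lt;pre&gt;
import unicodedata

ZWJ = "\u200d"   # zero width joiner
CHILLU = {chr(code) for code in range(0x0D7A, 0x0D80)}   # atomic Malayalam chillu letters

def is_virama(ch):
    # Indic virama/halant signs carry canonical combining class 9.
    return unicodedata.combining(ch) == 9

def split_points(word):
    """Return the indexes before which a break is allowed under the simplified rules."""
    points = []
    for i in range(1, len(word)):
        prev, ch = word[i - 1], word[i]
        starts_cluster = unicodedata.category(ch) == "Lo"
        if starts_cluster and not is_virama(prev) and prev != ZWJ and ch not in CHILLU:
            points.append(i)
    return points

def split_word(word):
    """Cut the word at every allowed break point."""
    cuts = split_points(word)
    return [word[a:b] for a, b in zip([0] + cuts, cuts + [len(word)])]

print("+".join(split_word("अनुपल्ब्ध")))   # अ+नु+प+ल्ब्ध
&lt;/pre&gt;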
&lt;b&gt;Hyphenation Dictionaries for Indian languages.&lt;/b&gt;
&lt;p&gt;
Based on the rules mentioned above, let us try to create hyphenation dictionaries for Indian languages. I will explain this with the help of a Hindi example word: अनुपल्ब्ध.
We have to define the following rules in the dictionary for it: &lt;br /&gt;
अ1 -&amp;gt; 1 is an odd number, i.e. the word can be split after अ &lt;br /&gt;
ु1 -&amp;gt; 1 is an odd number, i.e. the word can be split after ु &lt;br /&gt;
1ल -&amp;gt; 1 is an odd number, i.e. the word can be split before ल &lt;br /&gt;
1प -&amp;gt; 1 is an odd number, i.e. the word can be split before प &lt;br /&gt;
1ब -&amp;gt; 1 is an odd number, i.e. the word can be split before ब &lt;br /&gt;
्2 -&amp;gt; 2 is an even number, i.e. the word can NOT be split after ् &lt;br /&gt;
1ध -&amp;gt; 1 is an odd number, i.e. the word can be split before ध &lt;br /&gt;
So the end result is अ+नु+प+ल्ब्ध &lt;br /&gt;
&lt;/p&gt;
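&lt;p&gt;
The seven patterns above can be checked with a small Python sketch of the pattern matching. This is only an illustration, not the OpenOffice hyphenator itself, and the function name apply_patterns is made up: every pattern is matched against the word, the highest number wins at each gap, and the word is split where that number is odd.
&lt;/p&gt;
&lt;pre&gt;
DIGITS = "0123456789"

def apply_patterns(word, patterns):
    """Split word at the gaps whose highest pattern level is odd."""
    text = "." + word + "."          # '.' marks the word boundaries
    levels = [0] * (len(text) + 1)   # levels[i] is the gap before text[i]
    for pattern in patterns:
        letters = "".join(ch for ch in pattern if ch not in DIGITS)
        values, gap = [0] * (len(letters) + 1), 0
        for ch in pattern:
            if ch in DIGITS:
                values[gap] = int(ch)
            else:
                gap += 1
        start = text.find(letters)
        while start != -1:           # visit every occurrence in the word
            for offset, value in enumerate(values):
                levels[start + offset] = max(levels[start + offset], value)
            start = text.find(letters, start + 1)
    cuts = [i for i in range(1, len(word)) if levels[i + 1] % 2 == 1]
    return [word[a:b] for a, b in zip([0] + cuts, cuts + [len(word)])]

HINDI = ["अ1", "ु1", "1ल", "1प", "1ब", "्2", "1ध"]
print("+".join(apply_patterns("अनुपल्ब्ध", HINDI)))   # अ+नु+प+ल्ब्ध
&lt;/pre&gt;
&lt;p&gt;
Note how ्2 overrides 1ब and 1ध: the even number is higher, so no split happens right after a virama.
&lt;/p&gt;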
In the same way, we can create hyphenation dictionaries for all the other languages. I have prepared hyphenation dictionaries for 8 Indian languages. &lt;a href=&quot;http://git.savannah.gnu.org/gitweb/?p=smc.git;a=tree;f=hyphenation&quot; rel=&quot;nofollow&quot;&gt;Download them from the git repo of the SMC&lt;/a&gt;. &lt;br /&gt;
&lt;b&gt;How to install an xx_IN hyphenation dictionary.&lt;/b&gt;
&lt;ul&gt;
&lt;li&gt; Copy the hyphenation dictionary hyph_xx_IN to the /usr/share/myspell/dicts folder.&lt;/li&gt;
&lt;li&gt;  Create a file named openoffice.org-hyphenation-xx in the /usr/share/myspell/infos/ooo/ folder, containing this single line:&lt;br&gt;
HYPH xx IN hyph_xx_IN
&lt;/li&gt;
&lt;li&gt;Run this command: sudo update-openoffice-dicts &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
Open OpenOffice Writer, then open some file in your language or type some text. Justify the text. Set the language of the selection using the Tools-&amp;gt;Language menu. Hyphenate it using the Tools-&amp;gt;Language-&amp;gt;Hyphenation menu.
&lt;/p&gt;
&lt;p&gt;
Hope it works :). I have tested only Hindi and Malayalam. For other languages, let me know if you see any problems or if it is not working. Here is the hyphenated Malayalam paragraph. Compare it with the image I showed at the beginning of this blog post.
&lt;/p&gt;
&lt;img src=&quot;http://pics.livejournal.com/santhoshtr/pic/0000xw9h&quot; width=&quot;263&quot; height=&quot;98&quot; border=&quot;0&quot; /&gt;
&lt;p&gt;
OK. So once these hyphenation dictionaries are tested, provided to upstream and packaged, the hyphenation problems in OpenOffice will be solved. :)
&lt;/p&gt;
&lt;p&gt;
But how do we solve this problem in web pages? We will discuss that in the next blog post!&lt;br /&gt;
PS: Thanks to Nemeth Laszlo, author of Hunspell and OpenOffice hyphenation, for helping me to prepare the hyphenation tables.
&lt;/p&gt;
&lt;hr /&gt;
&lt;b&gt;Update(Apr 16,2009)&lt;/b&gt; The hyphenation dictionaries were packaged for Fedora and will be part of Fedora 11</description>
  <comments>http://santhoshtr.livejournal.com/15266.html</comments>
  <category>hyphenation</category>
  <category>openoffice</category>
  <category>hack</category>
  <lj:security>public</lj:security>
  <lj:reply-count>5</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://santhoshtr.livejournal.com/15068.html</guid>
  <pubDate>Fri, 05 Dec 2008 14:49:41 GMT</pubDate>
  <title>Yahoo search bug</title>
  <link>http://santhoshtr.livejournal.com/15068.html</link>
  <description>None of the search engines handle Indian languages very well. Google removes the zero width joiners and non-joiners that are used in many languages. Yahoo does not remove them, but a UI bug in its results page renders the results wrongly.&lt;br /&gt;See the image below:&lt;br /&gt;&lt;br /&gt; &lt;img src=&quot;http://pics.livejournal.com/santhoshtr/pic/0000ta1c&quot; width=&quot;320&quot; height=&quot;228&quot; border=&quot;0&quot; /&gt; &lt;br /&gt;&lt;br /&gt;&lt;br /&gt; &lt;br /&gt;The bottom half of the image is the source code. We can clearly see that the closing bold tag is placed in the middle of the word instead of at the end of the word. As a result, the word is rendered wrongly in the page. &lt;br /&gt;This happens for all languages which use ZWJ, ZWNJ, ZWS etc. Yahoo breaks the word just before the ZWNJ/ZWJ and puts the closing bold tag of the search-result highlight there.&lt;br /&gt;&lt;br /&gt;I showed this to &lt;a href=&quot;http://t3.dotgnu.info/blog/&quot; rel=&quot;nofollow&quot;&gt;Gopal&lt;/a&gt; and he told me that he filed a bug on that.</description>
  <comments>http://santhoshtr.livejournal.com/15068.html</comments>
  <category>bugs</category>
  <category>yahoo</category>
  <lj:security>public</lj:security>
  <lj:reply-count>4</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://santhoshtr.livejournal.com/14738.html</guid>
  <pubDate>Sun, 30 Nov 2008 12:33:30 GMT</pubDate>
  <title>KDE spellchecker not working for Indian Languages</title>
  <link>http://santhoshtr.livejournal.com/14738.html</link>
  <description>As I mentioned in my blog post on &lt;a href=&quot;http://santhoshtr.livejournal.com/13832.html&quot;&gt;Language detection&lt;/a&gt;, the Sonnet spellchecker of KDE is not working for Indian languages. I read the Sonnet code and found that it fails to determine the word boundaries in a sentence (or string buffer) and passes parts of words to backend spellcheckers like aspell or hunspell. As a result, every word is reported as wrong. This is the logic used in Sonnet to recognize word boundaries:
&lt;blockquote&gt;Loop through the chars of the word, until the current char is not a letter/ anymore.&lt;/blockquote&gt;
For this, it uses the QChar::isLetter() function. This function fails for the matra (dependent vowel) signs of our languages. 
&lt;p&gt;
A screenshot from a text area in Konqueror:&lt;/p&gt;&lt;p&gt;
&lt;a href=&quot;http://pics.livejournal.com/santhoshtr/pic/0000rw6t/&quot;&gt;&lt;img src=&quot;http://pics.livejournal.com/santhoshtr/pic/0000rw6t&quot; width=&quot;246&quot; height=&quot;28&quot; border=&quot;0&quot; /&gt;&lt;/a&gt;
&lt;/p&gt;
For example
&lt;code&gt;
&lt;pre&gt;
#include &amp;lt;QtCore/QChar&amp;gt;
#include &amp;lt;stdio.h&amp;gt;
int main(){
	QChar letter ;
	letter = QChar(0x0B85); // அ, TAMIL LETTER A
	fprintf(stdout,&quot;%d\n&quot;, letter.isLetter());
	letter = QChar(0x0940); // ी, DEVANAGARI VOWEL SIGN II
	fprintf(stdout,&quot;%d\n&quot;, letter.isLetter());
	return 0;
}
&lt;/pre&gt;
&lt;/code&gt;
This program prints 1 (true) for அ and 0 (false) for ी. 
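&lt;p&gt;
The same effect can be reproduced outside Qt. The following is only a rough Python 3 sketch, not the Sonnet code: it uses the standard unicodedata module to show that the vowel sign is classified as a combining mark rather than a letter, and the hypothetical helper split_words shows how a letters-only test cuts a word at the matra.
&lt;/p&gt;

```python
import unicodedata

def split_words(text, keep_marks=True):
    # Naive word splitter. With keep_marks=False it accepts letters only,
    # which mimics a boundary test built on an "is this a letter" check.
    words, current = [], ""
    for ch in text:
        major = unicodedata.category(ch)[0]
        if major == "L" or (keep_marks and major == "M"):
            current += ch
        elif current:
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words

# Unicode general category: L* means letter, M* means combining mark.
print(unicodedata.category("\u0b85"))  # Tamil letter A
print(unicodedata.category("\u0940"))  # Devanagari vowel sign II

word = "\u0928\u0926\u0940"  # a Hindi word ending in a vowel sign
print(split_words(word, keep_marks=False))  # the matra is cut off
print(split_words(word, keep_marks=True))   # the word stays whole
```

&lt;p&gt;
Treating combining marks as part of the word is the design choice that keeps the whole word together before it is handed to the spellchecker backend.
&lt;/p&gt;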
&lt;p&gt;
When I showed this to &lt;a href=&quot;http://sayamindu.randomink.org/ramblings/&quot; rel=&quot;nofollow&quot;&gt;Sayamindu&lt;/a&gt; during &lt;a href=&quot;http://foss.in&quot; rel=&quot;nofollow&quot;&gt;foss.in&lt;/a&gt;, he showed me a &lt;a href=&quot;https://bugzilla.redhat.com/show_bug.cgi?id=466912&quot; rel=&quot;nofollow&quot;&gt;bug in glibc &lt;/a&gt;. Even though the bug is about Bengali, it applies to all our languages. It is assigned to &lt;a href=&quot;http://pravin-s.blogspot.com/&quot; rel=&quot;nofollow&quot;&gt;Pravin Satpute&lt;/a&gt;, and he told me that he has a solution and will submit it to glibc soon.
&lt;/p&gt;

&lt;p&gt;
But I wonder why this bug in KDE has gone unnoticed so far. Has nobody used the spellchecker for Indian languages in KDE?!
&lt;/p&gt;
&lt;p&gt;
Let me explain why this does not happen in the GNOME spellchecker, if this is a glibc bug. In GNOME, the word splitting is done in the application itself using the gtk_text_iter_* functions, and that iteration over words relies on Pango&apos;s word boundary detection algorithms.&lt;/p&gt;
&lt;a href=&quot;https://bugs.kde.org/show_bug.cgi?id=176537&quot; rel=&quot;nofollow&quot;&gt;Filed a bug&lt;/a&gt; in KDE to track it.</description>
  <comments>http://santhoshtr.livejournal.com/14738.html</comments>
  <category>kde</category>
  <category>bugs</category>
  <category>spell checker</category>
  <lj:security>public</lj:security>
  <lj:reply-count>5</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://santhoshtr.livejournal.com/14350.html</guid>
  <pubDate>Sat, 22 Nov 2008 17:12:48 GMT</pubDate>
  <title>Youtube to MPEG or Ogg video conversion</title>
  <link>http://santhoshtr.livejournal.com/14350.html</link>
  <description>Here is a two-line method to convert a YouTube video to an Ogg (Theora) video.&lt;br /&gt;
Find clive and ffmpeg2theora in your package manager and install them.&lt;br /&gt;
&lt;code&gt;$clive &lt;a href=&quot;http://in.youtube.com/watch?v=6JeZ5oeAEyU&quot; rel=&quot;nofollow&quot;&gt;http://in.youtube.com/watch?v=6JeZ5oeAEyU &lt;/a&gt;&lt;/code&gt;(replace this with the youtube address you want) 
It will create a flv file.&lt;br /&gt;
&lt;b&gt;Convert to mpeg video file&lt;/b&gt;&lt;br /&gt;
&lt;code&gt; $ffmpeg -i AmericaAmerica.flv  AmericaAmerica.mpg&lt;/code&gt;&lt;br /&gt;
&lt;b&gt;Convert  to ogg video file&lt;/b&gt;&lt;br /&gt;
&lt;code&gt;$ffmpeg2theora AmericaAmerica.mpg &lt;/code&gt;(replace it with the name of the file the previous command created)&lt;br /&gt;
Done. You can see the .ogg file in the directory where you ran the above commands.&lt;br /&gt;</description>
  <comments>http://santhoshtr.livejournal.com/14350.html</comments>
  <lj:security>public</lj:security>
  <lj:reply-count>1</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://santhoshtr.livejournal.com/14154.html</guid>
  <pubDate>Sat, 15 Nov 2008 15:35:32 GMT</pubDate>
  <title>Dhvani 0.94 Released</title>
  <link>http://santhoshtr.livejournal.com/14154.html</link>
  <description>&lt;p&gt;A new version of &lt;a href=&quot;http://dhvani.sourceforge.net&quot; rel=&quot;nofollow&quot;&gt;Dhvani&lt;/a&gt; -The Indian Language Text to Speech System is available now.  The new version comes with the following improvements/features&lt;/p&gt;
					&lt;ul&gt;
&lt;li&gt;Support for 11 languages: Hindi, Panjabi, Gujarati, Marathi, Bengali, Oriya, Telugu, Kannada, Tamil, Malayalam and Pashto (Afghanistan)&lt;/li&gt;
&lt;li&gt; Pitch and Tempo modification for speech&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://dhvani.sourceforge.net/doc/outputfile-format.html&quot; rel=&quot;nofollow&quot;&gt;Direct ogg-vorbis speech output and optional wav output format&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://dhvani.sourceforge.net/doc/apis.html&quot; rel=&quot;nofollow&quot;&gt;C/C++ APIs&lt;/a&gt;  for applications to use dhvani as a shared library.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://dhvani.sourceforge.net/doc/screenreader.html&quot; rel=&quot;nofollow&quot;&gt;Generic driver for Speech-dispatcher&lt;/a&gt; and Integration to Orca through speech dispatcher&lt;/li&gt;
&lt;li&gt; Python binding through speech dispatcher&lt;/li&gt;
&lt;li&gt;Improved &lt;a href=&quot;http://dhvani.sourceforge.net/doc/langauge-detection.html&quot; rel=&quot;nofollow&quot;&gt;language detection algorithm&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
Dhvani documentation is available &lt;a href=&quot;http://dhvani.sourceforge.net/doc&quot; rel=&quot;nofollow&quot;&gt;here&lt;/a&gt;.

Binary packages and source code are available &lt;a href=&quot;http://sourceforge.net/projects/dhvani&quot; rel=&quot;nofollow&quot;&gt;here&lt;/a&gt;
&lt;br&gt;
&lt;b&gt;Thanks&lt;/b&gt;
&lt;ul&gt;
&lt;li&gt;Rahul Bhalerao for Marathi module and patches&lt;/li&gt;
&lt;li&gt;Zabeehkhan for Pashto Module&lt;/li&gt;
&lt;li&gt;Nirupama, CDAC Chennai and CDAC Noida people for testing and reporting bugs&lt;/li&gt;
&lt;li&gt;NRCFOSS Chennai, Krishnakanth Mane and many others for feedback &lt;/li&gt;
&lt;li&gt; &lt;a href=&quot;http://www.amidasimputer.com/&quot; rel=&quot;nofollow&quot;&gt;Amida Simputer&lt;/a&gt; team for  patches on Telugu module especially the Telugu number reading logic &lt;/li&gt;
&lt;li&gt; Debayan and Roshan for testing and informing problems&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;
There was a good amount of code change in this version. There are still many improvements to make in the language modules and the synthesizer. Some of the language modules need developers who speak that language. The synthesizer got some improvements, but it needs some research to make the speech more natural. So your feedback, suggestions, bug reports and patches are valuable. &lt;/p&gt;
&lt;p&gt;
PS: A note for quick usage after installation from binary: After installing deb or rpm, Open gedit, edit-&amp;gt;preferences-&amp;gt;plugins, enable external tools. Dhvani will be available as a plugin there. Select some text in any of the supporting languages and click the Dhvani menu.
&lt;/p&gt;&lt;/a&gt;</description>
  <comments>http://santhoshtr.livejournal.com/14154.html</comments>
  <category>dhvani</category>
  <lj:security>public</lj:security>
  <lj:reply-count>3</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://santhoshtr.livejournal.com/13832.html</guid>
  <pubDate>Thu, 13 Nov 2008 16:15:06 GMT</pubDate>
  <title>Language Detection and Spellcheckers</title>
  <link>http://santhoshtr.livejournal.com/13832.html</link>
  <description>A few weeks back there was a discussion on the #indlinux IRC channel about automatic language detection. The idea is that spellcheckers and other language tools should not ask users to select a language. Instead, they should detect the language automatically. The idea is not new. There is a KDE bug &lt;a href=&quot;http://bugs.kde.org/show_bug.cgi?id=66516&quot; rel=&quot;nofollow&quot;&gt;here&lt;/a&gt; and Ubuntu has this as a &lt;a href=&quot;http://brainstorm.ubuntu.com/idea/10469/&quot; rel=&quot;nofollow&quot;&gt;brainstorm idea&lt;/a&gt;.  It seems M$ Word already &lt;a href=&quot;http://help.lockergnome.com/office/SpellCheck-Detect-language-automatically-working-ftopict879615.html&quot; rel=&quot;nofollow&quot;&gt;has&lt;/a&gt; &lt;a href=&quot;http://www.pcreview.co.uk/forums/thread-887637.php&quot; rel=&quot;nofollow&quot;&gt; this&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;A sample use case can be this: &quot;While preparing a document in Openoffice, I want to write in English as well as in Hindi. For spellchecking, I have to change the language manually instead of the application detecting it automatically&quot;&lt;br /&gt; &lt;br /&gt;There are many approaches to automatic language detection. Statistical approaches are effective for languages sharing the same script (for example, languages which use the Latin script, or Hindi and Marathi). Statistical approaches typically use n-gram based methods.  &lt;a href=&quot;http://www.freepatentsonline.com/6167369.html&quot; rel=&quot;nofollow&quot;&gt;Here is a &apos;patented&apos; idea&lt;/a&gt; . And &lt;a href=&quot;http://code.activestate.com/recipes/326576/&quot; rel=&quot;nofollow&quot;&gt;this page&lt;/a&gt; explains a character trigram approach. 
Google has a language detection service(&lt;a href=&apos;http://www.google.com/uds/samples/language/detect.html&apos; rel=&apos;nofollow&apos;&gt;http://www.google.com/uds/samples/language/detect.html&lt;/a&gt;) and it &lt;a href=&quot;http://sourceforge.net/mailarchive/forum.php?thread_name=992b8210810041028r2391e433he05e2c7ccfc50f21%40mail.gmail.com&amp;amp;forum_name=indlinux-group&quot; rel=&quot;nofollow&quot;&gt;seems it is  still&lt;/a&gt; in development or &apos;learning stage&apos;.&lt;br /&gt; &lt;br /&gt;&lt;br /&gt;Here is an example of statistical language detection: &lt;a href=&quot;http://languid.cantbedone.org/&quot; rel=&quot;nofollow&quot;&gt;languid&lt;/a&gt;(It did not work for me when I tried, But you can download the source code and check)&lt;br /&gt; &lt;br /&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/Sonnet_(KDE)&quot; rel=&quot;nofollow&quot;&gt;Sonnet&lt;/a&gt; is the spellchecker framework of KDE written by  &lt;a href=&quot;http://blog.jacobrideout.net&quot; rel=&quot;nofollow&quot;&gt;J. Rideout&lt;/a&gt;. It is also trying to provide the language detection feature. &lt;a href=&quot;http://www.linux.com/articles/59963&quot; rel=&quot;nofollow&quot;&gt;Here is an old article&lt;/a&gt; in linux.com about that. It is based on n-gram based text categorization and is a port of &lt;a href=&quot;http://languid.cantbedone.org/&quot; rel=&quot;nofollow&quot;&gt;languid&lt;/a&gt;. From the article:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;A gram is a segment of text made of N number of characters. Sonnet uses trigrams, made from three characters. By analyzing the popularity of any given trigram within a text, one may make assumptions about the language the text is written in. Rideout gives an example: &quot;The top trigram for our English model is &apos;_th&apos; and for Spanish &apos;_de&apos;. 
Therefore, if the text contains many words that start with &apos;th&apos; and no words that start with &apos;de,&apos; it is more likely the text is in English [than Spanish]. Additionally, there are several optimizations which include only checking the language against languages with similar scripts and some heuristics that use the language of neighboring text as a hint.&quot;&lt;/blockquote&gt;&lt;br /&gt; &lt;br /&gt;(I tried Sonnet and could not get it working for ml_IN. Instead of words, it was iterating through letters. Anyway, I will check this problem later.)&lt;br /&gt;&lt;br /&gt;As far as Indian languages are concerned, language detection based on Unicode code ranges will work for most cases. Most of the languages have their own script and Unicode code point range. For example, detecting Malayalam is a matter of checking that the letters are in the Malayalam Unicode range. But for the Devanagari script it is not straightforward, because Hindi, Marathi etc. all use Devanagari. Dhvani, the text to speech system for Indian languages, uses a simple algorithm for language detection(&lt;a href=&apos;http://dhvani.sourceforge.net/doc/language-detection.html&apos; rel=&apos;nofollow&apos;&gt;http://dhvani.sourceforge.net/doc/language-detection.html&lt;/a&gt;). There, Hindi and Marathi are told apart by giving priority to the LANG environment variable. But it will fail if somebody tries to use Marathi on an English desktop. (Users can specify the language to be used. In that case language detection will not be done.)&lt;br /&gt;&lt;br /&gt;For spell checkers, there are other options besides the LANG environment variable. When you type in gedit or any other text editor, detecting the keyboard layout is one way of detecting the language. But it depends on which input method the user uses. It can be xkb or scim, or the text may even be copy-pasted.&lt;br /&gt; &lt;br /&gt;Anyway, it is pretty clear that the current natural language features in the free desktops require more improvements.  
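&lt;br /&gt;The trigram idea quoted above can be sketched in a few lines of Python. This is only an illustration of the principle, not the Sonnet or languid implementation: the two profiles are tiny hand-made sets chosen for this example, and the helper names trigrams, score and guess are hypothetical. A real detector would learn ranked profiles from large text samples.&lt;br /&gt;

```python
from collections import Counter

def trigrams(text):
    # Pad each word with "_" so that word starts and ends become trigrams.
    counts = Counter()
    for word in text.lower().split():
        padded = "_" + word + "_"
        for i in range(len(padded) - 2):
            counts[padded[i:i + 3]] += 1
    return counts

def score(sample, profile):
    # Number of trigram occurrences in the sample that appear in the profile.
    return sum(n for gram, n in trigrams(sample).items() if gram in profile)

# Hand-made toy profiles, for illustration only (an assumption of this sketch).
PROFILES = {
    "en": {"_th", "the", "he_"},
    "es": {"_de", "de_", "el_"},
}

def guess(sample):
    # Pick the language whose profile matches the most trigrams.
    return max(PROFILES, key=lambda lang: score(sample, PROFILES[lang]))

print(guess("the thing there"))
print(guess("casa de el rey"))
```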
Based on a discussion we had in #indlinux IRC, we set up a &lt;a href=&quot;http://www.indlinux.org/wiki/index.php/LanguageNeutralInterfaces&quot; rel=&quot;nofollow&quot;&gt;wiki page&lt;/a&gt; to discuss this.&lt;br /&gt;&lt;br /&gt;As a &lt;a href=&quot;http://en.wikipedia.org/wiki/Proof_of_concept&quot; rel=&quot;nofollow&quot;&gt;proof of concept&lt;/a&gt;, I tried to write a spellchecker for the gedit text editor with language detection for Indian languages. Basically it uses the Unicode character range. It is a gedit plugin written in Python, and it uses the &lt;a href=&quot;http://pyenchant.sourceforge.net/&quot; rel=&quot;nofollow&quot;&gt;pyenchant&lt;/a&gt; spellcheck wrapper library. Install python-enchant using your package manager if it is not already installed. Download the &lt;a href=&quot;http://download.savannah.gnu.org/releases/smc/misc/gedit-plugin/ISpellcheck.gedit-plugin&quot; rel=&quot;nofollow&quot;&gt;plugin&lt;/a&gt; and &lt;a href=&quot;http://download.savannah.gnu.org/releases/smc/misc/gedit-plugin/ISpellcheck.py&quot; rel=&quot;nofollow&quot;&gt;python module&lt;/a&gt; to the ~/.gnome2/gedit/plugins folder and restart gedit. Enable External Tools and the new Spellchecker plugin in edit-&amp;gt;preferences-&amp;gt;plugins. It does not have the Pango error-style underline or suggestions in the context menu as of now. It just prints the results and suggestions in the console of gedit. Features like ‘Add to Dictionary’ are not there yet.&lt;br /&gt;&lt;a href=&quot;http://pics.livejournal.com/santhoshtr/pic/0000qtc3/&quot;&gt;&lt;img src=&quot;http://pics.livejournal.com/santhoshtr/pic/0000qtc3&quot; width=&quot;50%&quot; height=&quot;50%&quot; border=&quot;0&quot; /&gt;&lt;/a&gt;&lt;br /&gt;I would like to request interested developers to come forward and make this feature ready to use in free desktops. Suggestions are welcome. 
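&lt;br /&gt;Detection by Unicode character range can be sketched as below. This is not the code of the plugin, only a minimal Python illustration: the table lists the standard Unicode blocks of some Indian scripts, and the function name detect_script is hypothetical. Note that it returns a script, not a language, so Hindi and Marathi both come out as Devanagari.&lt;br /&gt;

```python
# First and last code point of some Unicode blocks used by Indian languages.
SCRIPT_RANGES = {
    "Devanagari": (0x0900, 0x097F),
    "Bengali": (0x0980, 0x09FF),
    "Gurmukhi": (0x0A00, 0x0A7F),
    "Gujarati": (0x0A80, 0x0AFF),
    "Oriya": (0x0B00, 0x0B7F),
    "Tamil": (0x0B80, 0x0BFF),
    "Telugu": (0x0C00, 0x0C7F),
    "Kannada": (0x0C80, 0x0CFF),
    "Malayalam": (0x0D00, 0x0D7F),
}

def detect_script(word):
    # Return the script of the first character that falls in a known block,
    # or None when no character does (for example, plain English text).
    for ch in word:
        code = ord(ch)
        for script, (start, end) in SCRIPT_RANGES.items():
            if code in range(start, end + 1):
                return script
    return None

print(detect_script("\u0d38\u0d28"))  # two Malayalam letters
print(detect_script("hello"))
```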
We need good algorithms for detecting the language too.&lt;br /&gt;A sample use case: &quot;System locale is English and I am typing a document in Hindi and want to write some Marathi sentences in between. Without manually changing the language, system detect the language of each word and check the spelling against corresponding dictionaries.&quot;&lt;br /&gt;&lt;br /&gt;PS: Because of the inflectional and agglutinative nature of some of the Indian languages, the spell checking is not at all effective. I will write on that later.</description>
  <comments>http://santhoshtr.livejournal.com/13832.html</comments>
  <category>spell checker</category>
  <category>language computing</category>
  <lj:security>public</lj:security>
  <lj:reply-count>1</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://santhoshtr.livejournal.com/13815.html</guid>
  <pubDate>Tue, 11 Nov 2008 17:26:01 GMT</pubDate>
  <title>Gedit plugin for showing unicode codepoints</title>
  <link>http://santhoshtr.livejournal.com/13815.html</link>
  <description>While working with Unicode text, it is often necessary to get the Unicode code points of the text for debugging. Using Python, it is very easy to get the code points of a text. The following examples illustrate it.&lt;br /&gt;&lt;code&gt;&lt;br /&gt;&amp;gt;&amp;gt;&amp;gt; &quot;സന്തോഷ്&quot;.decode(&quot;utf-8&quot;)&lt;br /&gt;u&apos;\u0d38\u0d28\u0d4d\u0d24\u0d4b\u0d37\u0d4d&apos;&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;or&lt;br /&gt;&lt;code&gt;&lt;br /&gt;&amp;gt;&amp;gt;&amp;gt; str=u&quot;സന്തോഷ്&quot;&lt;br /&gt;&amp;gt;&amp;gt;&amp;gt; print repr(str)&lt;br /&gt;u&apos;\u0d38\u0d28\u0d4d\u0d24\u0d4b\u0d37\u0d4d&apos;&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;Well, but for that we need to open a Python console and type or paste the text. How can we make it easier? What if pressing the F12 key after selecting some text gave the code points?&lt;br /&gt;So I wrote a plugin for gedit. I never knew that writing a gedit plugin is so easy. &lt;a href=&quot;http://live.gnome.org/Gedit/PythonPluginHowTo&quot; rel=&quot;nofollow&quot;&gt;This tutorial&lt;/a&gt; gives all the required information.&lt;br /&gt;Download the &lt;a href=&quot;http://download.savannah.gnu.org/releases/smc/misc/gedit-plugin/show_codepoints.gedit-plugin&quot; rel=&quot;nofollow&quot;&gt;plugin file&lt;/a&gt; and &lt;a href=&quot;http://download.savannah.gnu.org/releases/smc/misc/gedit-plugin/show_codepoints.py&quot; rel=&quot;nofollow&quot;&gt;python module&lt;/a&gt; and place them in the .gnome2/gedit/plugins folder inside your home folder. Then restart gedit. Enable the plugin from the Edit-&amp;gt;Preferences-&amp;gt;Plugins menu. Note that you need to enable the External tools plugin too.&lt;br /&gt;&lt;a href=&quot;http://pics.livejournal.com/santhoshtr/pic/0000pqq0/&quot;&gt;&lt;img src=&quot;http://pics.livejournal.com/santhoshtr/pic/0000pqq0/s320x240&quot; width=&quot;236&quot; height=&quot;240&quot; border=&quot;0&quot; /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Select some text and press F12. 
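&lt;br /&gt;The console examples above are written for Python 2. As a rough equivalent for Python 3, where every str is already Unicode, the lookup can be sketched as below. The helper name codepoints is hypothetical and is not taken from the plugin source.&lt;br /&gt;

```python
def codepoints(text):
    # Return the code point of every character as a U+XXXX string.
    return ["U+%04X" % ord(ch) for ch in text]

# The same Malayalam word as in the console examples, written with escapes.
word = "\u0d38\u0d28\u0d4d\u0d24\u0d4b\u0d37\u0d4d"
print(" ".join(codepoints(word)))
```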
If text is not selected, entire content of the document will be used.</description>
  <comments>http://santhoshtr.livejournal.com/13815.html</comments>
  <category>plugin</category>
  <category>gedit</category>
  <category>hack</category>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://santhoshtr.livejournal.com/13439.html</guid>
  <pubDate>Mon, 27 Oct 2008 11:17:26 GMT</pubDate>
  <title>Screensavers in your language</title>
  <link>http://santhoshtr.livejournal.com/13439.html</link>
  <description>I had written a blog post about &lt;a href=&quot;http://santhoshtr.livejournal.com/7078.html&quot;&gt;hacking the glmatrix screensaver with the glyphs of our languages&lt;/a&gt;. &lt;br /&gt;&lt;br /&gt;Now I have those screensavers in the following languages: &lt;br /&gt;&lt;br /&gt;Hindi :  &lt;a href=&quot;http://download.savannah.gnu.org/releases/smc/Screensaver/hindi-matrix_2.18.1.deb&quot; rel=&quot;nofollow&quot;&gt;Deb Package&lt;/a&gt; ,&lt;a href=&quot;http://download.savannah.gnu.org/releases/smc/Screensaver/hindi-matrix-2.18.1-2.noarch.rpm&quot; rel=&quot;nofollow&quot;&gt; RPM &lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Gujarati :  &lt;a href=&quot;http://download.savannah.gnu.org/releases/smc/Screensaver/gujarati-matrix_2.22.1-2_all.deb&quot; rel=&quot;nofollow&quot;&gt;Deb Package&lt;/a&gt; , &lt;a href=&quot;http://download.savannah.gnu.org/releases/smc/Screensaver/gumatrix-2.24.1-2.i386.rpm&quot; rel=&quot;nofollow&quot;&gt; RPM &lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Bengali  : &lt;a href=&quot;http://download.savannah.gnu.org/releases/smc/Screensaver/bengali-matrix_2.22.1-2_all.deb&quot; rel=&quot;nofollow&quot;&gt;Deb Package&lt;/a&gt; , &lt;a href=&quot;http://download.savannah.gnu.org/releases/smc/Screensaver/bengali-matrix-2.22.1-3.noarch.rpm&quot; rel=&quot;nofollow&quot;&gt; RPM &lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Oriya: &lt;a href=&quot;http://download.savannah.gnu.org/releases/smc/Screensaver/oriya-matrix_2.22.1-2_all.deb&quot; rel=&quot;nofollow&quot;&gt;Deb Package&lt;/a&gt; , &lt;a href=&quot;http://download.savannah.gnu.org/releases/smc/Screensaver/oriya-matrix-2.22.1-3.noarch.rpm&quot; rel=&quot;nofollow&quot;&gt; RPM &lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Tamil : &lt;a href=&quot;http://download.savannah.gnu.org/releases/smc/Screensaver/ta-matrix_2.20.1.deb&quot; rel=&quot;nofollow&quot;&gt;Deb Package&lt;/a&gt; ,  &lt;a href=&quot;http://download.savannah.gnu.org/releases/smc/Screensaver/tamil-matrix-2.20.1-2.noarch.rpm&quot; 
rel=&quot;nofollow&quot;&gt; RPM &lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Malayalam: &lt;a href=&quot;http://download.savannah.gnu.org/releases/smc/Screensaver/mlmatrix_2.22.1.deb&quot; rel=&quot;nofollow&quot;&gt;Deb Package&lt;/a&gt; ,  &lt;a href=&quot;http://download.savannah.gnu.org/releases/smc/Screensaver/mlmatrix-2.22.1-2.noarch.rpm&quot; rel=&quot;nofollow&quot;&gt; RPM &lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Try it and enjoy !!&lt;br /&gt;ps: I used the default fonts of Fedora 9 for these. If you have any specific font to be used please let me know. I used Dyuthi calligraphic font for Malayalam.</description>
  <comments>http://santhoshtr.livejournal.com/13439.html</comments>
  <category>screensaver</category>
  <category>hack</category>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://santhoshtr.livejournal.com/13149.html</guid>
  <pubDate>Mon, 27 Oct 2008 07:31:00 GMT</pubDate>
  <title>Swanalekha M17N based Input Method for 11 Languages</title>
  <link>http://santhoshtr.livejournal.com/13149.html</link>
  <description>Swanalekha is an input method originally designed for Malayalam.  It works with &lt;a href=&quot;http://sourceforge.net/projects/scim&quot; rel=&quot;nofollow&quot;&gt;scim&lt;/a&gt; as well as &lt;a href=&quot;http://m17n.org/&quot; rel=&quot;nofollow&quot;&gt;m17n&lt;/a&gt;. The input method scheme is transliteration based and it has a unique feature, the candidate list menu (which I will explain shortly). Now I have extended it to 10 other Indian languages.&lt;br /&gt;&lt;br /&gt;Before explaining how swanalekha is different from other phonetic/transliteration based input methods, let me explain some of the characteristics of transliteration. Transliteration based input methods used to follow a strict one to one mapping from English letters to an Indian language. For eg:  ka=क ,pa = प  , ti = टि etc. When you write bharath, you will easily transliterate it to Hindi as भारत. But for a rule based transliteration system it is भरत unless the English is bhaarath. Sometimes it may be Bhaarat too. See another example: Kartik. It should be transliterated to കാര്‍ത്തിക് in Malayalam. Some people write it as Karthik, and some others write it as karthick too. All these are based on personal preferences. But when they use transliteration based input methods, people find it difficult to follow a strict rule based writing method. There they have to write kaa for കാ or કા or கா  or  কা.  Users like to get what they mean without the difficulty of following the strict rules of transliteration. In an intelligent transliteration based system, when somebody writes linux, it should map to लिनक्स . Sometimes a choice to select लैनक्स is also preferable. This is what &lt;a href=&quot;http://www.google.co.in/transliterate/indic&quot; rel=&quot;nofollow&quot;&gt;google transliteration&lt;/a&gt; does.  No rules, no learning.. 
just type in english...&lt;br /&gt;&lt;br /&gt;Google&apos;s Transliteration is based on machine learning and statistical approach. And it works only when we are online and only in webpages. Now I will explain how swanalekha tries to provide a solution for the above problem.&lt;br /&gt;For each english letter or pattern , we saw that there are multiple choices . ka can be क, का . ga can be ക, ഗ, ഖ, ഘ, ഗാ in Malayalam.  sa can be स, श etc.. So swanalekha provides all these candidates as a suggestion menu under the cursor while typing. See the below image of Hindi swanalekha version.&lt;br /&gt;&lt;br /&gt; &lt;br /&gt;&lt;br /&gt;&lt;a href=&quot;http://pics.livejournal.com/santhoshtr/pic/0000kfp7/&quot;&gt;&lt;img src=&quot;http://pics.livejournal.com/santhoshtr/pic/0000kfp7/s320x240&quot; width=&quot;223&quot; height=&quot;240&quot; border=&quot;0&quot; /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The differences between google transliteration and swanalekha are:&lt;br /&gt;a) Google transliterate is web based and works in webpages when you are online. Swanalekha works in all applications in your GNU/Linux desktop such as gedit, openoffice, kwrite, firefox...&lt;br /&gt;b) Google transliterate gives suggestions as words, but swanalekha works in letter level (not exactly a single letter. but like का, કા etc. )&lt;br /&gt;c) Google transliterate is machine learning based. But swanalekha is rule based with &apos;one to many&apos; pattern mapping in m17n&lt;br /&gt;  &lt;br /&gt;The candidates are mapped to English string patterns inside the source code- the m17n input method files - .mim files. &lt;br /&gt;You can download the .mim files from &lt;a href=&quot;http://download.savannah.gnu.org/releases/smc/Swanalekha/m17n/swanalekha-m17n-04.tar.gz&quot; rel=&quot;nofollow&quot;&gt;here&lt;/a&gt;. Icons for each language is also provided. You can see .mim files for Malayalam, Hindi, Telugu, Oriya, Tamil, Bengali, Assamese, Panjabi, Gujarati, Marathi and Kannada.  
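&lt;br /&gt;The one to many pattern mapping described above can be pictured with a small Python sketch. This is not how m17n evaluates .mim files, only an illustration of the idea: the table holds two of the patterns mentioned in this post, the first candidate of each list is the default, and the names CANDIDATES and pick are hypothetical.&lt;br /&gt;

```python
# Typed pattern mapped to its candidate list; the first entry is the default.
CANDIDATES = {
    "sa": ["स", "श"],
    "ka": ["क", "का"],
}

def pick(pattern, choice=0):
    # Return the chosen candidate for a typed pattern.
    options = CANDIDATES.get(pattern, [])
    if not options:
        # Assumption of this sketch: unknown patterns pass through unchanged.
        return pattern
    # Clamp the index so an out-of-range choice falls back to the last entry.
    return options[min(choice, len(options) - 1)]

print(pick("sa"))     # default candidate
print(pick("sa", 1))  # second entry of the candidate menu
```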
Note that, other than Malayalam, the source files are not complete. They were generated from the Malayalam mapping file using a small Python script.  They are just templates with approximate mappings, and should be corrected and modified by a person who knows that language very well. The Malayalam mapping is tested, is already packaged for Fedora and is present in m17n upstream as part of the m17n-contrib package. It is widely used by GNU/Linux users in Kerala too. &lt;br /&gt;Candidate selection based input methods are very common in CJK (Chinese, Japanese, Korean) languages. Swanalekha is the first implementation of a candidate list outside CJK using scim and m17n.&lt;br /&gt;&lt;br /&gt;So if anybody is interested in testing and correcting the mappings for your language, please continue reading :)&lt;br /&gt;&lt;br /&gt;How to Install :&lt;br /&gt;Download the tarball containing all the .mim files and icons from &lt;a href=&quot;http://download.savannah.gnu.org/releases/smc/Swanalekha/m17n/swanalekha-m17n-04.tar.gz&quot; rel=&quot;nofollow&quot;&gt;here&lt;/a&gt;. Extract it and copy all the .mim files to /usr/share/m17n and the icons to /usr/share/m17n/icons&lt;br /&gt;sudo cp *.mim /usr/share/m17n&lt;br /&gt;sudo cp *.png /usr/share/m17n/icons&lt;br /&gt;&lt;br /&gt;Note that you need to install scim-m17n before doing this. Most distros have it preinstalled.&lt;br /&gt;After copying these, restart X by pressing ctrl+alt+backspace or do a logout+login &lt;br /&gt;Open gedit, select scim as the input method, and select your language from the scim menu.  Start typing&lt;br /&gt;&lt;br /&gt;How to correct the maps?&lt;br /&gt;Open the .mim file for your language using any text editor.&lt;br /&gt;You will see lines in Lisp syntax. No, you need not know Lisp :)&lt;br /&gt;For example in hi-swanalekha.mim, you will see a line like this&lt;br /&gt;  (&quot;sa&quot; ((&quot;स&quot;) (&quot;श&quot;)))&lt;br /&gt;This means, for &apos;sa&apos;, show स  and श as candidates with स  as default option. 
If you want to add सा as third option just change the line like this&lt;br /&gt;  (&quot;sa&quot; ((&quot;स&quot;) (&quot;श&quot;) (&quot;सा&quot;)))&lt;br /&gt;If any pattern is not found in .mim file just add one more line there following the above syntax. Only thing is you should be careful about opening and closing parenthesis since it is Lisp.&lt;br /&gt;&lt;br /&gt;Once you are done, install it by just copying it to /usr/share/m17n folder. Restarting X is required to restart scim. or even a &apos;killall scim&apos; will do sometimes&lt;br /&gt;Don&apos;t change any other code(code for candidate selection using up/down arrow, and using number keys) unless you know what you are doing.&lt;br /&gt;&lt;br /&gt;Let me know if you face any issues..&lt;br /&gt;&lt;br /&gt;Happy Hacking and Happy Deepavali !!!</description>
  <comments>http://santhoshtr.livejournal.com/13149.html</comments>
  <lj:security>public</lj:security>
  <lj:reply-count>5</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://santhoshtr.livejournal.com/13055.html</guid>
  <pubDate>Thu, 04 Sep 2008 16:38:05 GMT</pubDate>
  <title>Geo-visualisation, the FOSS way</title>
  <link>http://santhoshtr.livejournal.com/13055.html</link>
  <description>My friend Jaisen Nedumpala has been developing a Geo-visualisation system for &lt;a href=&quot;http://cheruvannur.web4all.in/&quot; rel=&quot;nofollow&quot;&gt;Cheruvannoor Grama Panchayath(Page in ml_IN)&lt;/a&gt; of Kerala. The system, developed using FOSS tools is available &lt;a href=&quot;http://cheruvannur.web4all.in/resources/&quot; rel=&quot;nofollow&quot;&gt;here&lt;/a&gt;

&lt;blockquote&gt;
&quot;Development of effective geo-visualisation based decision support system (DSS) involved primarily data compilation from collateral sources, setting up appropriate hardware configuration, design of database and design of  a spatial DSS. &quot;
&lt;/blockquote&gt;

Jaisen used software such as GRASS, UMN MapServer and ka-Map. He has written detailed &lt;a href=&quot;http://cheruvannur.web4all.in/visualisation_methodology/&quot; rel=&quot;nofollow&quot;&gt;documentation(English)&lt;/a&gt; on how he developed this and which tools he used.</description>
  <comments>http://santhoshtr.livejournal.com/13055.html</comments>
  <category>foss</category>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://santhoshtr.livejournal.com/12655.html</guid>
  <pubDate>Sun, 31 Aug 2008 15:56:21 GMT</pubDate>
  <title>UTF8Decoder</title>
  <link>http://santhoshtr.livejournal.com/12655.html</link>
  <description>&lt;a href=&quot;http://zabeehkhan.blogspot.com&quot; rel=&quot;nofollow&quot;&gt;zabeehkhan&lt;/a&gt; was trying to code a Pashto (ps_AF) module for &lt;a href=&quot;http://dhvani.sourceforge.net&quot; rel=&quot;nofollow&quot;&gt;dhvani&lt;/a&gt;, and he told me that &quot;it is not saying anything&quot; :). So I took the code and found the problem. Dhvani has a UTF-8 decoder and UTF-16 converter. It was written by Dr. Ramesh Hariharan and was tested only with the Unicode ranges of the languages of India. It was buggy for most other languages, and as a result the language detection logic and the text parsing logic were failing. So I did some googling, went through the code tables of gucharmap and got some helpful information from &lt;a href=&quot;http://www.cl.cam.ac.uk/~mgk25/unicode.html&quot; rel=&quot;nofollow&quot;&gt;here&lt;/a&gt; and &lt;a href=&quot;http://blogs.oreilly.com/digitalmedia/2005/11/using-c-intrinsic-functions-2.html&quot; rel=&quot;nofollow&quot;&gt;here&lt;/a&gt;.
&lt;br /&gt;
So here is my new UTF8Decoder and converter
&lt;br /&gt;


&lt;pre&gt;
&lt;font color=&quot;#444444&quot;&gt;/*
UTF8Decoder.c
This program converts a utf-8 encoded string to utf-16 hexadecimal code sequence

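UTF-8 is a variable-width encoding of Unicode.
UTF-16 encodes each character as one or two 16-bit code units. This function returns a
single 16-bit unit, so it only covers the Basic Multilingual Plane (U+0000 to U+FFFF).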

A UTF-8 decoder must not accept UTF-8 sequences that are longer than necessary to
encode a character. For example, the character U+000A (line feed) must be accepted from
a UTF-8 stream only in the form 0x0A, but not in any of the following five possible overlong forms:

  0xC0 0x8A
  0xE0 0x80 0x8A
  0xF0 0x80 0x80 0x8A
  0xF8 0x80 0x80 0x80 0x8A
  0xFC 0x80 0x80 0x80 0x80 0x8A

Ref: UTF-8 and Unicode FAQ for Unix/Linux http://www.cl.cam.ac.uk/~mgk25/unicode.html

Author: Santhosh Thottingal &amp;lt;santhosh.thottingal at gmail.com&amp;gt;
License: This program is licensed under GPLv3 or later version(at your choice)
*/&lt;/font&gt;
&lt;font color=&quot;0000ff&quot;&gt;&lt;strong&gt;#include&lt;font color=&quot;#008000&quot;&gt;&amp;lt;stdlib.h&amp;gt;&lt;/font&gt;&lt;/strong&gt;&lt;/font&gt;
&lt;font color=&quot;0000ff&quot;&gt;&lt;strong&gt;#include&lt;font color=&quot;#008000&quot;&gt;&amp;lt;stdio.h&amp;gt;&lt;/font&gt;&lt;/strong&gt;&lt;/font&gt;
&lt;font color=&quot;0000ff&quot;&gt;&lt;strong&gt;#include&lt;font color=&quot;#008000&quot;&gt;&amp;lt;string.h&amp;gt;&lt;/font&gt;&lt;/strong&gt;&lt;/font&gt;
&lt;strong&gt;unsigned&lt;/strong&gt; &lt;strong&gt;short&lt;/strong&gt;
&lt;font color=&quot;#2040a0&quot;&gt;utf8_to_utf16&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;(&lt;/font&gt;&lt;strong&gt;unsigned&lt;/strong&gt; &lt;strong&gt;char&lt;/strong&gt; &lt;font color=&quot;4444FF&quot;&gt;*&lt;/font&gt;&lt;font color=&quot;#2040a0&quot;&gt;text&lt;/font&gt;, &lt;strong&gt;int&lt;/strong&gt; &lt;font color=&quot;4444FF&quot;&gt;*&lt;/font&gt;&lt;font color=&quot;#2040a0&quot;&gt;ptr&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;)&lt;/font&gt;
&lt;font color=&quot;4444FF&quot;&gt;&lt;strong&gt;{&lt;/strong&gt;&lt;/font&gt;

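  &lt;strong&gt;unsigned&lt;/strong&gt; &lt;strong&gt;short&lt;/strong&gt; &lt;font color=&quot;#2040a0&quot;&gt;c&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;=&lt;/font&gt; &lt;font color=&quot;#FF0000&quot;&gt;0&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;;&lt;/font&gt;		&lt;font color=&quot;#444444&quot;&gt;/* utf-16 character; stays 0 if the lead byte is not recognised */&lt;/font&gt;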
  &lt;strong&gt;int&lt;/strong&gt; &lt;font color=&quot;#2040a0&quot;&gt;i&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;=&lt;/font&gt; &lt;font color=&quot;#FF0000&quot;&gt;0&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;;&lt;/font&gt;
  &lt;strong&gt;int&lt;/strong&gt; &lt;font color=&quot;#2040a0&quot;&gt;trailing&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;=&lt;/font&gt; &lt;font color=&quot;#FF0000&quot;&gt;0&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;;&lt;/font&gt;
  &lt;strong&gt;if&lt;/strong&gt; &lt;font color=&quot;4444FF&quot;&gt;(&lt;/font&gt;&lt;font color=&quot;#2040a0&quot;&gt;text&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;[&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;*&lt;/font&gt;&lt;font color=&quot;#2040a0&quot;&gt;ptr&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;]&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;&amp;lt;&lt;/font&gt; &lt;font color=&quot;#FF0000&quot;&gt;0x80&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;)&lt;/font&gt;	&lt;font color=&quot;#444444&quot;&gt;/*ascii character till 128 */&lt;/font&gt;
    &lt;font color=&quot;4444FF&quot;&gt;&lt;strong&gt;{&lt;/strong&gt;&lt;/font&gt;
      &lt;font color=&quot;#2040a0&quot;&gt;trailing&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;=&lt;/font&gt; &lt;font color=&quot;#FF0000&quot;&gt;0&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;;&lt;/font&gt;
      &lt;font color=&quot;#2040a0&quot;&gt;c&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;=&lt;/font&gt; &lt;font color=&quot;#2040a0&quot;&gt;text&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;[&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;(&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;*&lt;/font&gt;&lt;font color=&quot;#2040a0&quot;&gt;ptr&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;)&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;+&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;+&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;]&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;;&lt;/font&gt;
    &lt;font color=&quot;4444FF&quot;&gt;&lt;strong&gt;}&lt;/strong&gt;&lt;/font&gt;
  &lt;strong&gt;else&lt;/strong&gt; &lt;strong&gt;if&lt;/strong&gt; &lt;font color=&quot;4444FF&quot;&gt;(&lt;/font&gt;&lt;font color=&quot;#2040a0&quot;&gt;text&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;[&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;*&lt;/font&gt;&lt;font color=&quot;#2040a0&quot;&gt;ptr&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;]&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;&amp;gt;&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;&amp;gt;&lt;/font&gt; &lt;font color=&quot;#FF0000&quot;&gt;7&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;)&lt;/font&gt;
    &lt;font color=&quot;4444FF&quot;&gt;&lt;strong&gt;{&lt;/strong&gt;&lt;/font&gt;
      &lt;strong&gt;if&lt;/strong&gt; &lt;font color=&quot;4444FF&quot;&gt;(&lt;/font&gt;&lt;font color=&quot;#2040a0&quot;&gt;text&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;[&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;*&lt;/font&gt;&lt;font color=&quot;#2040a0&quot;&gt;ptr&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;]&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;&amp;lt;&lt;/font&gt; &lt;font color=&quot;#FF0000&quot;&gt;0xE0&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;)&lt;/font&gt;
	&lt;font color=&quot;4444FF&quot;&gt;&lt;strong&gt;{&lt;/strong&gt;&lt;/font&gt;
	  &lt;font color=&quot;#2040a0&quot;&gt;c&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;=&lt;/font&gt; &lt;font color=&quot;#2040a0&quot;&gt;text&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;[&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;*&lt;/font&gt;&lt;font color=&quot;#2040a0&quot;&gt;ptr&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;]&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;&amp;amp;&lt;/font&gt; &lt;font color=&quot;#FF0000&quot;&gt;0x1F&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;;&lt;/font&gt;
	  &lt;font color=&quot;#2040a0&quot;&gt;trailing&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;=&lt;/font&gt; &lt;font color=&quot;#FF0000&quot;&gt;1&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;;&lt;/font&gt;
	&lt;font color=&quot;4444FF&quot;&gt;&lt;strong&gt;}&lt;/strong&gt;&lt;/font&gt;
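      &lt;strong&gt;else&lt;/strong&gt; &lt;strong&gt;if&lt;/strong&gt; &lt;font color=&quot;4444FF&quot;&gt;(&lt;/font&gt;&lt;font color=&quot;#2040a0&quot;&gt;text&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;[&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;*&lt;/font&gt;&lt;font color=&quot;#2040a0&quot;&gt;ptr&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;]&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;&amp;lt;&lt;/font&gt; &lt;font color=&quot;#FF0000&quot;&gt;0xF0&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;)&lt;/font&gt;	&lt;font color=&quot;#444444&quot;&gt;/* three-byte sequence */&lt;/font&gt;
	&lt;font color=&quot;4444FF&quot;&gt;&lt;strong&gt;{&lt;/strong&gt;&lt;/font&gt;
	  &lt;font color=&quot;#2040a0&quot;&gt;c&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;=&lt;/font&gt; &lt;font color=&quot;#2040a0&quot;&gt;text&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;[&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;*&lt;/font&gt;&lt;font color=&quot;#2040a0&quot;&gt;ptr&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;]&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;&amp;amp;&lt;/font&gt; &lt;font color=&quot;#FF0000&quot;&gt;0x0F&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;;&lt;/font&gt;
	  &lt;font color=&quot;#2040a0&quot;&gt;trailing&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;=&lt;/font&gt; &lt;font color=&quot;#FF0000&quot;&gt;2&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;;&lt;/font&gt;
	&lt;font color=&quot;4444FF&quot;&gt;&lt;strong&gt;}&lt;/strong&gt;&lt;/font&gt;
      &lt;strong&gt;else&lt;/strong&gt; &lt;strong&gt;if&lt;/strong&gt; &lt;font color=&quot;4444FF&quot;&gt;(&lt;/font&gt;&lt;font color=&quot;#2040a0&quot;&gt;text&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;[&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;*&lt;/font&gt;&lt;font color=&quot;#2040a0&quot;&gt;ptr&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;]&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;&amp;lt;&lt;/font&gt; &lt;font color=&quot;#FF0000&quot;&gt;0xF8&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;)&lt;/font&gt;
	&lt;font color=&quot;4444FF&quot;&gt;&lt;strong&gt;{&lt;/strong&gt;&lt;/font&gt;
	  &lt;font color=&quot;#2040a0&quot;&gt;c&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;=&lt;/font&gt; &lt;font color=&quot;#2040a0&quot;&gt;text&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;[&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;*&lt;/font&gt;&lt;font color=&quot;#2040a0&quot;&gt;ptr&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;]&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;&amp;amp;&lt;/font&gt; &lt;font color=&quot;#FF0000&quot;&gt;0x07&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;;&lt;/font&gt;
	  &lt;font color=&quot;#2040a0&quot;&gt;trailing&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;=&lt;/font&gt; &lt;font color=&quot;#FF0000&quot;&gt;3&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;;&lt;/font&gt;
	&lt;font color=&quot;4444FF&quot;&gt;&lt;strong&gt;}&lt;/strong&gt;&lt;/font&gt;

      &lt;strong&gt;for&lt;/strong&gt; &lt;font color=&quot;4444FF&quot;&gt;(&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;;&lt;/font&gt; &lt;font color=&quot;#2040a0&quot;&gt;trailing&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;;&lt;/font&gt; &lt;font color=&quot;#2040a0&quot;&gt;trailing&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;-&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;-&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;)&lt;/font&gt;
	&lt;font color=&quot;4444FF&quot;&gt;&lt;strong&gt;{&lt;/strong&gt;&lt;/font&gt;
	  &lt;strong&gt;if&lt;/strong&gt; &lt;font color=&quot;4444FF&quot;&gt;(&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;(&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;(&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;(&lt;/font&gt;&lt;font color=&quot;#2040a0&quot;&gt;text&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;[&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;+&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;+&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;*&lt;/font&gt;&lt;font color=&quot;#2040a0&quot;&gt;ptr&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;]&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;)&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;&amp;amp;&lt;/font&gt; &lt;font color=&quot;#FF0000&quot;&gt;0xC0&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;)&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;!&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;=&lt;/font&gt; &lt;font color=&quot;#FF0000&quot;&gt;0x80&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;)&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;)&lt;/font&gt;
	    &lt;strong&gt;break&lt;/strong&gt;&lt;font color=&quot;4444FF&quot;&gt;;&lt;/font&gt;
	  &lt;font color=&quot;#2040a0&quot;&gt;c&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;&amp;lt;&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;&amp;lt;&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;=&lt;/font&gt; &lt;font color=&quot;#FF0000&quot;&gt;6&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;;&lt;/font&gt;
	  &lt;font color=&quot;#2040a0&quot;&gt;c&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;|&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;=&lt;/font&gt; &lt;font color=&quot;#2040a0&quot;&gt;text&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;[&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;*&lt;/font&gt;&lt;font color=&quot;#2040a0&quot;&gt;ptr&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;]&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;&amp;amp;&lt;/font&gt; &lt;font color=&quot;#FF0000&quot;&gt;0x3F&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;;&lt;/font&gt;
	&lt;font color=&quot;4444FF&quot;&gt;&lt;strong&gt;}&lt;/strong&gt;&lt;/font&gt;
      &lt;strong&gt;if&lt;/strong&gt; &lt;font color=&quot;4444FF&quot;&gt;(&lt;/font&gt;&lt;font color=&quot;#2040a0&quot;&gt;trailing&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;=&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;=&lt;/font&gt; &lt;font color=&quot;#FF0000&quot;&gt;0&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;)&lt;/font&gt;	&lt;font color=&quot;#444444&quot;&gt;/* every trailing byte was consumed: step past the last one */&lt;/font&gt;
	&lt;font color=&quot;4444FF&quot;&gt;(&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;*&lt;/font&gt;&lt;font color=&quot;#2040a0&quot;&gt;ptr&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;)&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;+&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;+&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;;&lt;/font&gt;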

    &lt;font color=&quot;4444FF&quot;&gt;&lt;strong&gt;}&lt;/strong&gt;&lt;/font&gt;
  &lt;strong&gt;return&lt;/strong&gt; &lt;font color=&quot;#2040a0&quot;&gt;c&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;;&lt;/font&gt;

&lt;font color=&quot;4444FF&quot;&gt;&lt;strong&gt;}&lt;/strong&gt;&lt;/font&gt;


&lt;font color=&quot;#444444&quot;&gt;/* for testing */&lt;/font&gt;
&lt;strong&gt;int&lt;/strong&gt;
&lt;font color=&quot;#2040a0&quot;&gt;main&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;(&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;)&lt;/font&gt;
&lt;font color=&quot;4444FF&quot;&gt;&lt;strong&gt;{&lt;/strong&gt;&lt;/font&gt;
  &lt;strong&gt;char&lt;/strong&gt; &lt;font color=&quot;4444FF&quot;&gt;*&lt;/font&gt;&lt;font color=&quot;#2040a0&quot;&gt;instr&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;=&lt;/font&gt; &lt;font color=&quot;#008000&quot;&gt;&amp;quot;സന്തോഷ് തോട്ടിങ്ങല്‍&amp;quot;&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;;&lt;/font&gt;	&lt;font color=&quot;#444444&quot;&gt;/* my name :) */&lt;/font&gt;
  &lt;strong&gt;int&lt;/strong&gt; &lt;font color=&quot;#2040a0&quot;&gt;length&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;=&lt;/font&gt; &lt;font color=&quot;#2040a0&quot;&gt;strlen&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;(&lt;/font&gt;&lt;font color=&quot;#2040a0&quot;&gt;instr&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;)&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;;&lt;/font&gt;
  &lt;strong&gt;int&lt;/strong&gt; &lt;font color=&quot;#2040a0&quot;&gt;i&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;=&lt;/font&gt; &lt;font color=&quot;#FF0000&quot;&gt;0&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;;&lt;/font&gt;

  &lt;strong&gt;for&lt;/strong&gt; &lt;font color=&quot;4444FF&quot;&gt;(&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;;&lt;/font&gt; &lt;font color=&quot;#2040a0&quot;&gt;i&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;&amp;lt;&lt;/font&gt; &lt;font color=&quot;#2040a0&quot;&gt;length&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;;&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;)&lt;/font&gt;
    &lt;font color=&quot;4444FF&quot;&gt;&lt;strong&gt;{&lt;/strong&gt;&lt;/font&gt;
      &lt;font color=&quot;#2040a0&quot;&gt;printf&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;(&lt;/font&gt;&lt;font color=&quot;#008000&quot;&gt;&amp;quot;0x%.4x &amp;quot;&lt;/font&gt;, &lt;font color=&quot;#2040a0&quot;&gt;utf8_to_utf16&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;(&lt;/font&gt;&lt;font color=&quot;#2040a0&quot;&gt;instr&lt;/font&gt;, &lt;font color=&quot;4444FF&quot;&gt;&amp;amp;&lt;/font&gt;&lt;font color=&quot;#2040a0&quot;&gt;i&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;)&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;)&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;;&lt;/font&gt;
    &lt;font color=&quot;4444FF&quot;&gt;&lt;strong&gt;}&lt;/strong&gt;&lt;/font&gt;
  &lt;font color=&quot;#2040a0&quot;&gt;printf&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;(&lt;/font&gt;&lt;font color=&quot;#008000&quot;&gt;&amp;quot;&lt;font color=&quot;#77dd77&quot;&gt;\n&lt;/font&gt;&amp;quot;&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;)&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;;&lt;/font&gt;
&lt;font color=&quot;#444444&quot;&gt;/* output is:
0x0d38 0x0d28 0x0d4d 0x0d24 0x0d4b 0x0d37 0x0d4d 0x0020 0x0d24 0x0d4b 0x0d1f 0x0d4d 0x0d1f 0x0d3f 0x0d19 0x0d4d 0x0d19 0x0d32 0x0d4d 0x200d 
*/&lt;/font&gt;

  &lt;strong&gt;return&lt;/strong&gt; &lt;font color=&quot;#FF0000&quot;&gt;0&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;;&lt;/font&gt;
&lt;font color=&quot;4444FF&quot;&gt;&lt;strong&gt;}&lt;/strong&gt;&lt;/font&gt;

&lt;/pre&gt;
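&lt;br /&gt;
As a cross-check on the listing above, here is a minimal Python 3 sketch. Python 3 and the helper name &lt;code&gt;code_units&lt;/code&gt; are assumptions made for illustration; they are not part of dhvani. It uses Python&apos;s built-in UTF-8 codec to print the code sequence for the first word of the test string, and it checks that the codec refuses the overlong form 0xC0 0x8A described in the header comment. As far as I can tell, the C function above does not make that check itself.&lt;br /&gt;
&lt;pre&gt;
```python
def code_units(data):
    """Decode UTF-8 bytes and format each code point like the C program does.

    The values equal UTF-16 code units only for characters in the Basic
    Multilingual Plane, which covers the Indic scripts used here.
    """
    return ["0x%04x" % ord(ch) for ch in data.decode("utf-8")]


# UTF-8 bytes of the first word of the test string in the C listing.
word = "\u0d38\u0d28\u0d4d\u0d24\u0d4b\u0d37\u0d4d".encode("utf-8")
print(" ".join(code_units(word)))

# A strict decoder must refuse the overlong two-byte encoding of U+000A.
try:
    b"\xc0\x8a".decode("utf-8")
    overlong_rejected = False
except UnicodeDecodeError:
    overlong_rejected = True
print("overlong form rejected:", overlong_rejected)
```
&lt;/pre&gt;
The first line printed should match the first seven values in the output comment of the C program.&lt;br /&gt;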

There may already be libraries for this, but writing a simple one ourselves is fun and a good learning experience.

For example, in Python, to get the UTF-16 code sequence for a Unicode string, we can use this:&lt;br /&gt;
&lt;code&gt;
str=u&quot;സന്തോഷ്‌&quot;&lt;br /&gt;
print repr(str)
&lt;/code&gt;&lt;br /&gt;
This gives the following output&lt;br /&gt;
&lt;code&gt;
u&apos;\u0d38\u0d28\u0d4d\u0d24\u0d4b\u0d37\u0d4d&apos;
&lt;/code&gt;&lt;br /&gt;</description>
  <comments>http://santhoshtr.livejournal.com/12655.html</comments>
  <category>dhvani</category>
  <category>unicode</category>
  <lj:security>public</lj:security>
  <lj:reply-count>1</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://santhoshtr.livejournal.com/12483.html</guid>
  <pubDate>Tue, 19 Aug 2008 15:12:52 GMT</pubDate>
  <title>Say NO to Software Patents</title>
  <link>http://santhoshtr.livejournal.com/12483.html</link>
  <description>&lt;div align=&quot;center&quot;&gt;&lt;a href=&quot;http://fci.wikia.com/wiki/Say_No_To_Software_Patents#Candle_Light_Vigil&quot; rel=&quot;nofollow&quot;&gt;&lt;img alt=&quot;No Patent for Softwares&quot; src=&quot;http://images.wikia.com/fci/images/0/04/No.png&quot; border=&quot;0&quot; /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;</description>
  <comments>http://santhoshtr.livejournal.com/12483.html</comments>
  <category>patents</category>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://santhoshtr.livejournal.com/12092.html</guid>
  <pubDate>Sat, 26 Jul 2008 16:33:31 GMT</pubDate>
  <title>say_namaskaar.c</title>
  <link>http://santhoshtr.livejournal.com/12092.html</link>
  <description>&lt;pre&gt;
&lt;font color=&quot;#444444&quot;&gt;/* say_namaskaar.c
 *  This is a sample C code using dhvani text to speech API which I am 
 *  developing now and planning to release soon. New version of dhvani 
 *  will provide a shared library libdhvani and it allows other C or C++
 *  applications to use dhvani synthesizer. Tamil and Marathi modules, pitch, tempo 
 *  control etc are the features for the coming release.
 *  I need to prepare documentation, fix many bugs, test, commit the files in cvs ...
 *  Looking for some free time for all these...
 *  Visit &lt;a href=&quot;http://dhvani.sourceforge.net&quot; rel=&quot;nofollow&quot;&gt;http://dhvani.sourceforge.net&lt;/a&gt;
 */&lt;/font&gt;

&lt;font color=&quot;#444444&quot;&gt;/* compile with gcc -o namaskaar say_namaskaar.c -ldhvani */&lt;/font&gt;
&lt;font color=&quot;0000ff&quot;&gt;&lt;strong&gt;#include &lt;font color=&quot;#008000&quot;&gt;&amp;lt;dhvani/dhvani_lib.h&amp;gt;&lt;/font&gt;&lt;/strong&gt;&lt;/font&gt;
&lt;strong&gt;int&lt;/strong&gt; &lt;font color=&quot;#2040a0&quot;&gt;main&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;(&lt;/font&gt;&lt;strong&gt;int&lt;/strong&gt; &lt;font color=&quot;#2040a0&quot;&gt;argc&lt;/font&gt;, &lt;strong&gt;char&lt;/strong&gt; &lt;font color=&quot;4444FF&quot;&gt;*&lt;/font&gt;&lt;font color=&quot;#2040a0&quot;&gt;argv&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;[&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;]&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;)&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;&lt;strong&gt;{&lt;/strong&gt;&lt;/font&gt;
    &lt;font color=&quot;#2040a0&quot;&gt;dhvani_options&lt;/font&gt; &lt;font color=&quot;#2040a0&quot;&gt;options&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;;&lt;/font&gt;
    &lt;font color=&quot;#444444&quot;&gt;/* Set the pitch and tempo of the speech */&lt;/font&gt;
    &lt;font color=&quot;#2040a0&quot;&gt;options&lt;/font&gt;.&lt;font color=&quot;#2040a0&quot;&gt;tempo&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;=&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;-&lt;/font&gt;&lt;font color=&quot;#FF0000&quot;&gt;10.0&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;;&lt;/font&gt; &lt;font color=&quot;#444444&quot;&gt;/* reduce the speed by 10%  */&lt;/font&gt;
    &lt;font color=&quot;#2040a0&quot;&gt;options&lt;/font&gt;.&lt;font color=&quot;#2040a0&quot;&gt;pitch&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;=&lt;/font&gt; &lt;font color=&quot;#FF0000&quot;&gt;2.0&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;;&lt;/font&gt;    &lt;font color=&quot;#444444&quot;&gt;/* increase the pitch by 2 semitones */&lt;/font&gt;
    &lt;font color=&quot;#2040a0&quot;&gt;options&lt;/font&gt;.&lt;font color=&quot;#2040a0&quot;&gt;rate&lt;/font&gt; &lt;font color=&quot;4444FF&quot;&gt;=&lt;/font&gt; &lt;font color=&quot;#FF0000&quot;&gt;16000&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;;&lt;/font&gt;  &lt;font color=&quot;#444444&quot;&gt;/* 16 kHz sampling rate */&lt;/font&gt;
    &lt;font color=&quot;#444444&quot;&gt;/* Initialize dhvani */&lt;/font&gt;
    &lt;font color=&quot;#2040a0&quot;&gt;dhvani_init&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;(&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;&amp;amp;&lt;/font&gt;&lt;font color=&quot;#2040a0&quot;&gt;options&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;)&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;;&lt;/font&gt;
    &lt;font color=&quot;#444444&quot;&gt;/* Say Namaskar */&lt;/font&gt;
    &lt;font color=&quot;#2040a0&quot;&gt;dhvani_say&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;(&lt;/font&gt;&lt;font color=&quot;#008000&quot;&gt;&amp;quot;नमस्कार&amp;quot;&lt;/font&gt;,  &lt;font color=&quot;4444FF&quot;&gt;&amp;amp;&lt;/font&gt;&lt;font color=&quot;#2040a0&quot;&gt;options&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;)&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;;&lt;/font&gt;
    &lt;font color=&quot;#444444&quot;&gt;/* close the synthesizer */&lt;/font&gt;
    &lt;font color=&quot;#2040a0&quot;&gt;dhvani_close&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;(&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;)&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;;&lt;/font&gt;
    &lt;strong&gt;return&lt;/strong&gt; &lt;font color=&quot;#FF0000&quot;&gt;0&lt;/font&gt;&lt;font color=&quot;4444FF&quot;&gt;;&lt;/font&gt;
&lt;font color=&quot;4444FF&quot;&gt;&lt;strong&gt;}&lt;/strong&gt;&lt;/font&gt;
 &lt;font color=&quot;#444444&quot;&gt;
/*  We can write a blog post in C too :P . Syntax highlighted by &lt;a href=&quot;http://www.palfrader.org/code2html&quot; rel=&quot;nofollow&quot;&gt;Code2HTML&lt;/a&gt; */&lt;/font&gt;
&lt;/pre&gt;</description>
  <comments>http://santhoshtr.livejournal.com/12092.html</comments>
  <category>dhvani</category>
  <category>hack</category>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://santhoshtr.livejournal.com/11935.html</guid>
  <pubDate>Tue, 24 Jun 2008 16:48:31 GMT</pubDate>
  <title>Dhvani Now Speaks Marathi</title>
  <link>http://santhoshtr.livejournal.com/11935.html</link>
  <description>Thanks to &lt;a href=&quot;http://rahulpmb.blogspot.com&quot; rel=&quot;nofollow&quot;&gt;Rahul Bhalerao&lt;/a&gt;, who wrote the Marathi module for &lt;a href=&quot;http://dhvani.sourceforge.net&quot; rel=&quot;nofollow&quot;&gt;dhvani&lt;/a&gt;, the Indian language text-to-speech system. Dhvani can speak 10 Indian languages now: Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Panjabi, Tamil, and Telugu.&lt;br /&gt;&lt;br /&gt;Rahul also gave some patches for the Hindi module and for some other bugs. The code is available in CVS.&lt;br /&gt;&lt;br /&gt;The automatic language detection algorithm will not work for Marathi, since it uses the Devanagari script and I have assigned the Unicode range used for language detection to Hindi. So it requires a language switch like &quot;dhvani -l mr inputfile&quot;.&lt;br /&gt;&lt;br /&gt;Many new features for dhvani are in development, including pitch and tempo control of the generated speech. I am trying to improve the code quality too.&lt;br /&gt;&lt;br /&gt;I demonstrated the Tamil module at NRCFOSS, AU-KBC Centre, Chennai a few days back, and &lt;a href=&quot;http://amachu.net&quot; rel=&quot;nofollow&quot;&gt;Amachu&lt;/a&gt; offered help with improving the Tamil pronunciation rules.&lt;br /&gt;&lt;br /&gt;For those who are interested in the Marathi module, I have some sample speech files generated by dhvani in ogg format. The text is taken from an article about the Marathi language in the Marathi Wikipedia. &lt;a href=&quot;http://mr.wikipedia.org/wiki/मराठी_भाषा&quot; rel=&quot;nofollow&quot;&gt;Here&lt;/a&gt; is the article and &lt;a href=&quot;http://santhosh00.googlepages.com/marathi.txt&quot; rel=&quot;nofollow&quot;&gt;here&lt;/a&gt; is the exact text used for the speech.&lt;br /&gt;1. &lt;a href=&quot;http://santhosh00.googlepages.com/marathi.ogg&quot; rel=&quot;nofollow&quot;&gt;With default pitch and tempo- Male voice&lt;/a&gt;&lt;br /&gt;2. 
&lt;a href=&quot;http://santhosh00.googlepages.com/marathi-pitch4.ogg&quot; rel=&quot;nofollow&quot;&gt;Female voice by positive pitch shift- A feature in development&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Dhvani has an IRC channel now: #dhvani at freenode</description>
  <comments>http://santhoshtr.livejournal.com/11935.html</comments>
  <category>dhvani</category>
  <lj:mood>creative</lj:mood>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://santhoshtr.livejournal.com/11701.html</guid>
  <pubDate>Mon, 02 Jun 2008 16:51:43 GMT</pubDate>
  <title>Canonical Equivalence in Unicode: Some notes</title>
  <link>http://santhoshtr.livejournal.com/11701.html</link>
  <description>Some Notes on Canonical Equivalence in Unicode:&lt;br /&gt;
Unicode defines canonical equivalence as follows:
From &lt;a href=&quot;http://unicode.org/reports/tr15/#Canonical_Equivalence&quot; rel=&quot;nofollow&quot;&gt;UAX #15&lt;/a&gt;

&lt;blockquote&gt;
Canonical Equivalence&lt;br /&gt;
This section describes the relationship of normalization to respecting (or preserving) canonical equivalence. A process (or function) respects canonical equivalence when canonical-equivalent inputs always produce canonical-equivalent outputs. For a function that transforms one string into another, this may also be called preserving canonical equivalence. There are a number of important aspects to this concept:
&lt;br /&gt;&lt;ol&gt;
   &lt;li&gt; The outputs are not required to be identical, only canonically equivalent.&lt;/li&gt;
   &lt;li&gt;Not all processes are required to respect canonical equivalence. For example:&lt;ul&gt;
          &lt;li&gt;A function that collects a set of the General_Category values present in a string will and should produce a different value for &amp;lt; angstrom sign, semicolon &amp;gt; than for &amp;lt; A, combining ring above, greek question mark &amp;gt;, even though they are canonically equivalent.&lt;/li&gt;
          &lt;li&gt; A function that does a binary comparison of strings will also find these two sequences different.&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;
   &lt;li&gt;Higher-level processes that transform or compare strings, or that perform other higher-level functions, must respect canonical equivalence or problems will result.&lt;/li&gt;&lt;/ol&gt;
&lt;/blockquote&gt;
One good example for this is 
Ά = U+0386 GREEK CAPITAL LETTER ALPHA WITH TONOS and its canonical decomposition is defined as U+0391 GREEK CAPITAL LETTER ALPHA + U+0301 COMBINING ACUTE ACCENT = Α +  ‍́ 
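&lt;br /&gt;
To make this concrete, here is a small Python 3 sketch using the standard &lt;code&gt;unicodedata&lt;/code&gt; module. The function name &lt;code&gt;canonically_equal&lt;/code&gt; is made up for this example. It compares two strings after normalizing both to NFD, which is one common way to test for canonical equivalence.&lt;br /&gt;
&lt;pre&gt;
```python
import unicodedata


def canonically_equal(a, b):
    """Return True if a and b are canonically equivalent (compared in NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


composed = "\u0386"          # GREEK CAPITAL LETTER ALPHA WITH TONOS
decomposed = "\u0391\u0301"  # GREEK CAPITAL LETTER ALPHA + COMBINING ACUTE ACCENT

print(composed == decomposed)                   # plain binary comparison
print(canonically_equal(composed, decomposed))  # comparison after normalization
```
&lt;/pre&gt;
The binary comparison reports the two strings as different, while the normalized comparison treats them as equal.&lt;br /&gt;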

The following are the canonically equivalent Unicode sequences defined for some Indic scripts:
&lt;br /&gt;
&lt;b&gt;Bengali:&lt;/b&gt;
&lt;ol&gt;
&lt;li&gt; U+09CB BENGALI VOWEL SIGN O = U+09C7 BENGALI VOWEL SIGN E + U+09BE BENGALI VOWEL SIGN AA&lt;/li&gt;
&lt;li&gt; U+09CC BENGALI VOWEL SIGN AU = U+09C7 BENGALI VOWEL SIGN E + U+09D7 BENGALI AU LENGTH MARK &lt;/li&gt;
&lt;li&gt;U+09DC BENGALI LETTER RRA =  U+09A1 BENGALI LETTER DDA + U+09BC BENGALI SIGN NUKTA&lt;/li&gt;
&lt;li&gt;U+09DD BENGALI LETTER RHA = U+09A2 BENGALI LETTER DDHA + U+09BC BENGALI SIGN NUKTA&lt;/li&gt;
&lt;li&gt;U+09DF BENGALI LETTER YYA = U+09AF BENGALI LETTER YA + U+09BC BENGALI SIGN NUKTA&lt;/li&gt;
&lt;/ol&gt;
&lt;b&gt;Devanagari&lt;/b&gt;
&lt;ol&gt;
&lt;li&gt;U+0929 DEVANAGARI LETTER NNNA= U+0928 DEVANAGARI LETTER NA + U+093C DEVANAGARI SIGN NUKTA&lt;/li&gt;
&lt;li&gt; U+0931 DEVANAGARI LETTER RRA = U+0930 DEVANAGARI LETTER RA + U+093C DEVANAGARI SIGN NUKTA&lt;/li&gt;
&lt;li&gt;U+0934 DEVANAGARI LETTER LLLA = U+0933 DEVANAGARI LETTER LLA + U+093C DEVANAGARI SIGN NUKTA&lt;/li&gt;
&lt;li&gt;U+0958 DEVANAGARI LETTER QA = U+0915 DEVANAGARI LETTER KA + U+093C DEVANAGARI SIGN NUKTA&lt;/li&gt;
&lt;li&gt;U+0959 DEVANAGARI LETTER KHHA = U+0916 DEVANAGARI LETTER KHA + U+093C DEVANAGARI SIGN NUKTA&lt;/li&gt;
&lt;li&gt;U+095A DEVANAGARI LETTER GHHA = U+0917 DEVANAGARI LETTER GA + U+093C DEVANAGARI SIGN NUKTA&lt;/li&gt;
&lt;li&gt;U+095B DEVANAGARI LETTER ZA =  U+091C DEVANAGARI LETTER JA + U+093C DEVANAGARI SIGN NUKTA&lt;/li&gt;
&lt;li&gt;U+095C DEVANAGARI LETTER DDDHA =  U+0921 DEVANAGARI LETTER DDA + U+093C DEVANAGARI SIGN NUKTA&lt;/li&gt;
&lt;li&gt;U+095D DEVANAGARI LETTER RHA =  U+0922 DEVANAGARI LETTER DDHA + U+093C DEVANAGARI SIGN NUKTA&lt;/li&gt;
&lt;li&gt;U+095E DEVANAGARI LETTER FA = U+092B DEVANAGARI LETTER PHA + U+093C DEVANAGARI SIGN NUKTA&lt;/li&gt;
&lt;li&gt;U+095F DEVANAGARI LETTER YYA = U+092F DEVANAGARI LETTER YA + U+093C DEVANAGARI SIGN NUKTA&lt;/li&gt;
&lt;/ol&gt;
&lt;i&gt;(Note: I saw ॻ U+097B DEVANAGARI LETTER GGA, ॼ U+097C DEVANAGARI LETTER JJA, and ॾ U+097E DEVANAGARI LETTER DDDA in Devanagari. I am not sure where these letters are used or whether they are related to GHA, JHA and DDHA.)&lt;/i&gt;
&lt;br /&gt;&lt;br /&gt;
&lt;b&gt;Gujarati:&lt;br /&gt;&lt;/b&gt;
Gujarati does not have any characters with a canonically equivalent sequence.
&lt;br /&gt;&lt;br /&gt;
&lt;b&gt;Gurmukhi :&lt;/b&gt;&lt;ol&gt;
&lt;li&gt;U+0A36 GURMUKHI LETTER SHA = U+0A38 GURMUKHI LETTER SA + U+0A3C GURMUKHI SIGN NUKTA&lt;/li&gt;
&lt;li&gt;U+0A59 GURMUKHI LETTER KHHA = U+0A16 GURMUKHI LETTER KHA + U+0A3C GURMUKHI SIGN NUKTA&lt;/li&gt;
&lt;li&gt;U+0A5A GURMUKHI LETTER GHHA = U+0A17 GURMUKHI LETTER GA + U+0A3C GURMUKHI SIGN NUKTA&lt;/li&gt;
&lt;li&gt;U+0A5B GURMUKHI LETTER ZA = U+0A1C GURMUKHI LETTER JA + U+0A3C GURMUKHI SIGN NUKTA&lt;/li&gt;
&lt;li&gt;U+0A5E GURMUKHI LETTER FA = U+0A2B GURMUKHI LETTER PHA + U+0A3C GURMUKHI SIGN NUKTA&lt;/li&gt;
&lt;/ol&gt;

&lt;b&gt;Kannada:&lt;/b&gt;&lt;ol&gt;
&lt;li&gt;U+0CC0 KANNADA VOWEL SIGN II = U+0CBF KANNADA VOWEL SIGN I + U+0CD5 KANNADA LENGTH MARK&lt;/li&gt;
&lt;li&gt;U+0CC7 KANNADA VOWEL SIGN EE = U+0CC6 KANNADA VOWEL SIGN E + U+0CD5 KANNADA LENGTH MARK&lt;/li&gt;
&lt;li&gt;U+0CC8 KANNADA VOWEL SIGN AI = U+0CC6 KANNADA VOWEL SIGN E + U+0CD6 KANNADA AI LENGTH MARK&lt;/li&gt;
&lt;li&gt;U+0CCA KANNADA VOWEL SIGN O = U+0CC6 KANNADA VOWEL SIGN E + U+0CC2 KANNADA VOWEL SIGN UU&lt;/li&gt;
&lt;li&gt;U+0CCB KANNADA VOWEL SIGN OO = U+0CC6 KANNADA VOWEL SIGN E + U+0CC2 KANNADA VOWEL SIGN UU + U+0CD5 KANNADA LENGTH MARK&lt;/li&gt;
&lt;/ol&gt;
&lt;b&gt;Malayalam :&lt;/b&gt;&lt;ol&gt;
&lt;li&gt;U+0D4A MALAYALAM VOWEL SIGN O=  U+0D46 MALAYALAM VOWEL SIGN E + U+0D3E MALAYALAM VOWEL SIGN AA&lt;/li&gt;
&lt;li&gt;U+0D4B MALAYALAM VOWEL SIGN OO = U+0D47 MALAYALAM VOWEL SIGN EE + U+0D3E MALAYALAM VOWEL SIGN AA&lt;/li&gt;
&lt;li&gt;U+0D4C MALAYALAM VOWEL SIGN AU = U+0D46 MALAYALAM VOWEL SIGN E + U+0D57 MALAYALAM AU LENGTH MARK&lt;/li&gt;
&lt;/ol&gt;
&lt;b&gt;Oriya :&lt;/b&gt;
&lt;ol&gt;
&lt;li&gt;U+0B48 ORIYA VOWEL SIGN AI = U+0B47 ORIYA VOWEL SIGN E + U+0B56 ORIYA AI LENGTH MARK&lt;/li&gt;
&lt;li&gt;U+0B4B ORIYA VOWEL SIGN O = U+0B47 ORIYA VOWEL SIGN E + U+0B3E ORIYA VOWEL SIGN AA&lt;/li&gt;
&lt;li&gt;U+0B4C ORIYA VOWEL SIGN AU = U+0B47 ORIYA VOWEL SIGN E + U+0B57 ORIYA AU LENGTH MARK&lt;/li&gt;
&lt;li&gt;U+0B5C ORIYA LETTER RRA = U+0B21 ORIYA LETTER DDA + U+0B3C ORIYA SIGN NUKTA&lt;/li&gt;
&lt;li&gt;U+0B5D ORIYA LETTER RHA = U+0B22 ORIYA LETTER DDHA + U+0B3C ORIYA SIGN NUKTA&lt;/li&gt;
&lt;/ol&gt;
&lt;b&gt;Tamil:&lt;/b&gt;
&lt;ol&gt;
&lt;li&gt;U+0B94 TAMIL LETTER AU = U+0B92 TAMIL LETTER O + U+0BD7 TAMIL AU LENGTH MARK&lt;/li&gt;
&lt;li&gt;U+0BCA TAMIL VOWEL SIGN O = U+0BC6 TAMIL VOWEL SIGN E + U+0BBE TAMIL VOWEL SIGN AA&lt;/li&gt;
&lt;li&gt;U+0BCB TAMIL VOWEL SIGN OO = U+0BC7 TAMIL VOWEL SIGN EE + U+0BBE TAMIL VOWEL SIGN AA&lt;/li&gt;
&lt;li&gt;U+0BCC TAMIL VOWEL SIGN AU = U+0BC6 TAMIL VOWEL SIGN E + U+0BD7 TAMIL AU LENGTH MARK&lt;/li&gt;
&lt;/ol&gt;
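The mappings listed above can be checked against the Unicode Character Database. Here is a minimal sketch, assuming Python 3 and its standard unicodedata module. The SAMPLES list and the full_decomposition helper are names made up for this illustration.&lt;br /&gt;
&lt;pre&gt;
```python
import unicodedata

# A few precomposed code points taken from the lists above (illustrative selection).
SAMPLES = ["\u0CC0", "\u0CCB", "\u0D4A", "\u0B5C", "\u0B94", "\u0BCC"]


def full_decomposition(ch):
    """Return the full canonical decomposition (NFD) of ch as 'U+XXXX' strings."""
    return ["U+%04X" % ord(c) for c in unicodedata.normalize("NFD", ch)]


for ch in SAMPLES:
    # Print each code point, its name, and the sequence it decomposes into.
    print("U+%04X" % ord(ch), unicodedata.name(ch), "=", " + ".join(full_decomposition(ch)))
```
&lt;/pre&gt;
NFD applies the decomposition recursively, which is why U+0CCB comes out as three code points rather than two.&lt;br /&gt;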
&lt;b&gt;&lt;u&gt;Notes:&lt;/u&gt;&lt;/b&gt;
&lt;ol&gt;
&lt;li&gt; Searching for the decomposed form of a character should also return results that contain the atomic (precomposed) code point, and vice versa. For example:&lt;br /&gt;
മ + േ + ാ  == മോ and മ + ോ  == മോ &lt;br /&gt;
Even though the two code point sequences are different, they are canonically equivalent, so a search for one should also find the other.
&lt;/li&gt;
&lt;li&gt; When words are sorted, canonically equivalent spellings should end up next to each other.&lt;/li&gt;

&lt;li&gt; For some languages, Unicode has declared an atomic code point obsolete and made the corresponding sequence the preferred form, without defining a canonical equivalence between the two. Khmer is one example. U+17A8 KHMER INDEPENDENT VOWEL QUK was the atomically encoded letter for the sequence U+17A7 KHMER INDEPENDENT VOWEL QU + U+1780 KHMER LETTER KA. The description of U+17A8 now says: &quot;obsolete ligature for the sequence U+17A7 KHMER INDEPENDENT VOWEL QU U+1780 KHMER LETTER KA, use of the sequence is now preferred&quot;.&lt;/li&gt;

&lt;li&gt; Unicode 5.1 defined new code points for the six chillu letters in Malayalam, which have so far been represented as consonant + virama + ZWJ. A single letter can therefore be encoded in two ways, but Unicode did not define a canonical equivalence between them. This results in dual encoding: users who search with one representation will not find text written with the other.&lt;/li&gt;

&lt;li&gt; Windows does not implement canonical equivalence at all (tested on Windows XP SP2 with the scripts mentioned above).&lt;/li&gt;
&lt;li&gt;Google search gives the correct search results for words with canonical equivalent sequence. Example used is മോഷണം &amp;lt;=&amp;gt; മോഷണം&lt;/li&gt;
&lt;li&gt;Yahoo search gives the correct search results for words with canonical equivalent sequence. Example word used is മോഷണം &amp;lt;=&amp;gt; മോഷണം. But unlike google, the search key highlighted in search results uses a decomposed sequence. i.e Yahoo replaces the search key in the search results with decomposed unicode sequence while showing the results.&lt;/li&gt;
&lt;/ol&gt;
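The search and sort behaviour described in the notes can be sketched in Python 3 with the standard unicodedata module. This is only an illustration under my own assumptions, not how Google, Yahoo or Windows implement it. The helper name canonically_equal and the sample strings are hypothetical.&lt;br /&gt;
&lt;pre&gt;
```python
import unicodedata


def canonically_equal(a, b):
    """Return True if a and b are canonically equivalent (same NFC form)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# MA + VOWEL SIGN EE + VOWEL SIGN AA  versus  MA + VOWEL SIGN OO
decomposed = "\u0D2E\u0D47\u0D3E"
precomposed = "\u0D2E\u0D4B"

print(decomposed == precomposed)                   # plain comparison of code points
print(canonically_equal(decomposed, precomposed))  # comparison after normalization

# Chillu N: NA + VIRAMA + ZWJ versus the atomic U+0D7B from Unicode 5.1.
# No canonical equivalence is defined, so normalization cannot unify them.
chillu_sequence = "\u0D28\u0D4D\u200D"
chillu_atomic = "\u0D7B"
print(canonically_equal(chillu_sequence, chillu_atomic))

# Sorting on a normalized key keeps equivalent spellings adjacent.
words = [precomposed, "\u0D2E", decomposed]
sorted_words = sorted(words, key=lambda w: unicodedata.normalize("NFD", w))
print(sorted_words)
```
&lt;/pre&gt;
A search index that normalizes both the indexed text and the query to the same form handles the vowel sign case, but it would still treat the two chillu encodings as different strings.&lt;br /&gt;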
&lt;b&gt;&lt;u&gt;References:&lt;/u&gt;&lt;/b&gt;
&lt;ol&gt;
&lt;li&gt;http://en.wikipedia.org/wiki/Canonical_equivalence&lt;/li&gt;
&lt;li&gt;http://unicode.org/reports/tr15/&lt;/li&gt;
&lt;/ol&gt;&lt;/i&gt;&lt;/i&gt;</description>
  <comments>http://santhoshtr.livejournal.com/11701.html</comments>
  <category>unicode</category>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://santhoshtr.livejournal.com/11279.html</guid>
  <pubDate>Mon, 02 Jun 2008 16:44:54 GMT</pubDate>
  <title>Firefox spellcheck bugs...</title>
  <link>http://santhoshtr.livejournal.com/11279.html</link>
  <description>The Firefox spellcheck feature needs volunteers to fix its tokenization issues. Two bugs are related to tokenization:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;a href=&quot;https://bugzilla.mozilla.org/show_bug.cgi?id=434044&quot; rel=&quot;nofollow&quot;&gt;Bug 434044 – The tokenization of words for spellcheck is wrong when there is a ZWJ/ZWNJ/ZWS in the word.&lt;/a&gt; - Reported: 2008-05-16 07:49 PDT by Santhosh Thottingal&lt;/li&gt;&lt;li&gt;&lt;a href=&quot;https://bugzilla.mozilla.org/show_bug.cgi?id=318040&quot; rel=&quot;nofollow&quot;&gt;Bug 318040 – Spell checker flags words containing full stops (periods)&lt;/a&gt; - Reported: 2005-11-28 12:45 PDT by Joseph Wright&lt;/li&gt;&lt;/ol&gt;</description>
  <comments>http://santhoshtr.livejournal.com/11279.html</comments>
  <category>bugs</category>
  <category>firefox</category>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
</item>
</channel>
</rss>

