Santhosh Thottingal

My experiments with Freedom

Yahoo search bug
santhoshtr
None of the search engines handle Indian languages very well. Google removes the zero width joiners (ZWJ) and zero width non-joiners (ZWNJ) that are used in many languages. Yahoo does not remove them, but a UI bug in its results page makes the results render incorrectly.
See the image below:





The bottom half of the image is the page source. We can clearly see that the closing bold tag is placed in the middle of the word instead of at the end of it. As a result, the word is rendered incorrectly on the page.
This happens for all languages that use ZWJ, ZWNJ, ZWS, etc. The highlighter breaks the word just before the ZWNJ/ZWJ and puts the closing bold tag there when highlighting the search result.
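
To make the problem concrete, here is a minimal sketch in Python of what seems to be happening. This is not Yahoo's actual code; the function names (highlight_buggy, highlight_fixed, highlight_text) are hypothetical and only illustrate the behaviour described above: one highlighter closes the bold tag just before the first zero width character, the other keeps the whole word inside the tag.

```python
# Zero width characters that may legitimately appear inside a word.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d"}  # ZWSP, ZWNJ, ZWJ


def highlight_buggy(word: str) -> str:
    """Hypothetical buggy highlighter: closes </b> just before the first
    zero width character, as seen in the page source."""
    for i, ch in enumerate(word):
        if ch in ZERO_WIDTH:
            return "<b>" + word[:i] + "</b>" + word[i:]
    return "<b>" + word + "</b>"


def highlight_fixed(word: str) -> str:
    """Keeps the zero width characters inside the bold tag."""
    return "<b>" + word + "</b>"


def highlight_text(text: str, query: str, highlighter) -> str:
    """Highlight every space-separated word that equals the query."""
    return " ".join(highlighter(w) if w == query else w for w in text.split(" "))


# A Malayalam word with a ZWJ in the middle, written with escapes so the
# invisible character is explicit: NA, NA, VIRAMA, ZWJ, MA.
word = "\u0d28\u0d28\u0d4d\u200d\u0d2e"

print(ascii(highlight_buggy(word)))
print(ascii(highlight_fixed(word)))
```

With the buggy version the ZWJ ends up outside the bold element, separated from the virama it is supposed to act on, so the browser cannot shape the conjunct correctly. The fixed version leaves the character sequence intact inside the tag.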

I showed this to Gopal, and he told me that he has filed a bug for it.

There may be some issues related to Normalization as well - I'll blog about that some day.

Re: Normalization

(Anonymous)

2008-12-06 06:43 am (UTC)

Totally askew conversation, but I tend to see more such web experience ugliness crop up along with random side discussions. Isn't there a way to collect all this aside from blogging?

~sankarshan

Re: Normalization

(Anonymous)

2008-12-14 08:02 am (UTC)

Sure. Just after you find a place where you could collect all your beauty tips - aside from commenting on his blog.

btw... Peter Norvig, in this nice talk, gives some insight into how segmentation of Unicode text works while crawlers parse it.

Re: Normalization

(Anonymous)

2008-12-14 08:03 am (UTC)

ah, the link... http://www.omnisio.com/startupschool08/peter-norvig-at-startup-school-08
