See the image below:
The bottom half of the image is the source code. We can clearly see that the closing bold tag is placed in the middle of the word instead of at the end of the word. As a result, the word is rendered incorrectly on the page.
This happens for all languages that use ZWJ, ZWNJ, ZWS, etc. The search engine breaks the word just before the ZWNJ/ZWJ and inserts the closing bold tag there when highlighting the search result.
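To make the failure mode concrete, here is a minimal sketch in Python. It is not the search engine's actual code, which I have not seen; it only assumes that the highlighter splits words at joiner characters. The names `NAIVE_WORD`, `JOINER_AWARE_WORD` and `highlight` are made up for this illustration, and the sample word uses Latin letters as a stand-in for a real script.

```python
import re

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER

# Assumed (hypothetical) naive tokenizer: joiners count as word boundaries.
NAIVE_WORD = re.compile(f"[^\\s{ZWNJ}{ZWJ}]+")
# Joiner-aware alternative: only whitespace separates words.
JOINER_AWARE_WORD = re.compile(r"\S+")


def highlight(text, query, word_pattern):
    """Wrap every word of `text` that also occurs in `query` in <b> tags."""
    # The query is tokenized with the same pattern as the text.
    terms = set(word_pattern.findall(query))

    def wrap(match):
        word = match.group(0)
        return f"<b>{word}</b>" if word in terms else word

    return word_pattern.sub(wrap, text)


# Placeholder word: "ab" and "cd" stand in for the two halves of a real
# word that carries a ZWNJ in the middle.
word = "ab" + ZWNJ + "cd"
text = "xx " + word + " yy"

# Naive tokenizer: the closing </b> lands inside the word, before the ZWNJ.
broken = highlight(text, word, NAIVE_WORD)
# Joiner-aware tokenizer: the whole word, joiner included, is wrapped once.
fixed = highlight(text, word, JOINER_AWARE_WORD)

print(ascii(broken))
print(ascii(fixed))
```

With the naive pattern the markup is closed and reopened around the joiner, which is the kind of broken rendering seen in the image. Keeping the joiner inside the token leaves the word intact.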
I showed this to Gopal, and he told me that he had filed a bug about it.
Normalization
2008-12-05 03:45 pm (UTC)
Re: Normalization
(Anonymous)
2008-12-06 06:43 am (UTC)
~sankarshan
Re: Normalization
(Anonymous)
2008-12-14 08:02 am (UTC)
Btw, Peter Norvig, in this nice talk, offers some insight into how segmentation of Unicode text works when crawlers parse it.