Defect #6551

open

Highlighting in search results is case sensitive for cyrillic pattern

Added by Alexey Ivlev about 14 years ago. Updated about 11 years ago.

Status: New
Priority: Normal
Assignee: -
Category: Search engine
Target version: -
Start date: 2010-10-01
Due date:
% Done: 50%
Estimated time:
Resolution:
Affected version:

Description

I am sorry for my persistence. I posted the same problem on the forum but have had no response there so far.

When I search for any pattern in English, everything works fine: highlighting in the search results is case insensitive. When I search for a pattern in Russian, the search output is case insensitive, but the highlighting of my pattern in those results is case sensitive.

For example, if I search for the pattern "ам" (all lowercase), I see the following:

all lowercase symbols: when all letters of the match in the search output are lowercase, highlighting works fine.

one uppercase symbol: when one or more letters of the match are uppercase, the highlighting does not appear.

In the source code of the search results page, the tag <span class="highlight token-0">ам</span> exists for the first image and does not exist for the second.

I use a MySQL 5.1.41 database with the utf8_general_ci collation, and Apache + Passenger on Ubuntu 10.04, with Rails 2.3.5 and Ruby 1.8.6. Please help me resolve this small issue. Thanks!


Files

with_highlighting.png (550 Bytes), all lowercase symbols, Alexey Ivlev, 2010-10-01 09:57
without_highlighting.png (623 Bytes), one uppercase symbol, Alexey Ivlev, 2010-10-01 09:57

Related issues

Related to Redmine - Defect #10134: Case insensitive search is not working with postgres 8.4 and umlauts (Confirmed)

Blocked by Redmine - Feature #4050: Ruby 1.9 support (Closed, 2009-10-18)

Actions #1

Updated by Etienne Massip over 13 years ago

  • Target version set to Candidate for next minor release
Actions #2

Updated by Alexey Ivlev over 13 years ago

Thank you very much!

Actions #3

Updated by Etienne Massip over 13 years ago

  • Target version deleted (Candidate for next minor release)

Sorry, but the underlying issue seems to be a Ruby Regexp one, as the Redmine code in SearchHelper#highlight_tokens seems fairly safe in the way it handles case: source:trunk/app/helpers/search_helper.rb#L22.

Added #4050 as blocker.
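
To make the suspected cause concrete, here is a minimal sketch of splitting text around case-insensitive matches with a plain Ruby Regexp. The helper name split_on_token and the sample strings are hypothetical and are not taken from Redmine. It assumes Ruby 1.9 or later, where Regexp::IGNORECASE also folds case for non-ASCII letters in UTF-8 strings. Going by the reports in this thread, Ruby 1.8 does not find the capitalized Cyrillic match.

```ruby
# Hypothetical helper, not Redmine code: splits `text` around
# case-insensitive matches of `token`. Because the pattern has a capture
# group, String#split keeps the matched parts at the odd indexes.
def split_on_token(text, token)
  regexp = Regexp.new("(#{Regexp.escape(token)})", Regexp::IGNORECASE)
  text.split(regexp)
end

latin    = split_on_token("Ample example", "am")
cyrillic = split_on_token("Амперметр и лампа", "ам")

# On Ruby 1.9+ both the capitalized and the lowercase occurrence are
# captured, for Latin and Cyrillic input alike.
puts latin.inspect
puts cyrillic.inspect
```

If the odd-indexed elements of the Cyrillic result contain both "Ам" and "ам", the interpreter folds case as the highlighting code expects.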

Actions #4

Updated by Alexey Ivlev over 13 years ago

In other words, the problem will be solved only once the Ruby Regexp is fixed?

Actions #5

Updated by Etienne Massip over 13 years ago

That's what I think, yes.

Actions #6

Updated by Yuriy Sokolov about 13 years ago

  • % Done changed from 0 to 50

Actually, I made a fix:

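module SearchHelper
  # Highlights tokens in text. Case is ignored by matching against a
  # downcased copy (mb_chars handles multibyte letters such as Cyrillic),
  # then slicing the same positions out of the original text.
  def highlight_tokens(text, tokens)
    return text unless text && tokens && !tokens.empty?
    down_tokens = tokens.collect {|t| t.mb_chars.downcase.to_s}
    re_tokens = down_tokens.collect {|t| Regexp.escape(t)}
    regexp = Regexp.new "(#{re_tokens.join('|')})"
    result = ''
    position = 0
    text = text.mb_chars
    text.downcase.split(regexp).each_with_index do |words, i|
      if result.length > 1200
        # maximum length of the preview reached
        result << '...'
        break
      end
      # take the same span from the original text to keep its letter case
      # (assumes downcasing does not change the number of characters)
      words = text[position ... (position + words.size)]
      position += words.size
      if i.even?
        result << h(words.length > 100 ? "#{words.slice(0..44)} ... #{words.slice(-45..-1)}" : words)
      else
        # look the match up among the downcased tokens to pick its class
        t = (down_tokens.index(words.downcase.to_s) || 0) % 4
        result << content_tag('span', h(words), :class => "highlight token-#{t}")
      end
    end
    result
  end
end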
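
For readers who want to try the idea outside Rails, the same downcase-and-slice approach can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the Redmine implementation. The function name highlight_plain is hypothetical. CGI.escapeHTML and a hand-built span stand in for the Rails helpers h and content_tag. String#downcase is assumed to be Unicode-aware (Ruby 2.4 or later). The preview-length limit is left out.

```ruby
require 'cgi'

# Hypothetical stand-alone variant of the approach above (not Redmine code).
# Assumes Ruby 2.4+, where String#downcase handles non-ASCII letters, and
# assumes downcasing does not change the character count of the text.
def highlight_plain(text, tokens)
  return text if text.nil? || tokens.nil? || tokens.empty?
  down_tokens = tokens.map(&:downcase)
  regexp = Regexp.new("(#{down_tokens.map { |t| Regexp.escape(t) }.join('|')})")
  result = +''
  position = 0
  # Split the downcased copy, then cut the same ranges out of the original
  # text so the output keeps the original letter case.
  text.downcase.split(regexp).each_with_index do |piece, i|
    original = text[position, piece.length]
    position += piece.length
    if i.even?
      # Even indexes hold the text between matches.
      result << CGI.escapeHTML(original)
    else
      # Odd indexes hold the matches; the class cycles through four styles.
      n = (down_tokens.index(piece) || 0) % 4
      result << %(<span class="highlight token-#{n}">#{CGI.escapeHTML(original)}</span>)
    end
  end
  result
end

html = highlight_plain("Амперметр и лампа", ["ам"])
puts html
```

The trade-off is that matching is done on a lowercased copy instead of relying on the regex engine's case folding, so it depends on the lowercased text having the same length as the original.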
Actions #7

Updated by Jean-Philippe Lang about 11 years ago

  • Subject changed from Highlighting in search results is case sensitive for cyrillic pattern to Highlighting in search results is case sensitive for cyrillic pattern