Defect #20730

closed

Fix tokenization of phrases with non-ascii chars

Added by Jens Krämer over 9 years ago. Updated about 9 years ago.

Status: Closed
Priority: Normal
Category: Search engine
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Resolution: Fixed
Affected version:

Description

\w only matches ASCII characters, so quoted phrases containing non-ASCII characters are not tokenized as phrases. We should either use [:alnum:] instead or simply match all non-" characters for the phrase. Test case included.
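A quick way to see the difference described above, as a sketch only: the constants and the `word?` helper are hypothetical names for illustration and are not part of Redmine. Note that in Ruby the POSIX class has to be written inside a bracket expression, i.e. `[[:alnum:]]`.

```ruby
# In Ruby, \w is ASCII-only ([a-zA-Z0-9_]) even for UTF-8 strings,
# while the POSIX bracket expression [[:alnum:]] is Unicode-aware.
ASCII_WORD   = /\A\w+\z/
UNICODE_WORD = /\A[[:alnum:]]+\z/

# Hypothetical helper: true if the whole text matches the pattern.
def word?(pattern, text)
  !(pattern =~ text).nil?
end

p word?(ASCII_WORD, "test")      # => true
p word?(ASCII_WORD, "日本語")     # => false
p word?(UNICODE_WORD, "日本語")   # => true
p word?(UNICODE_WORD, "テスト")   # => true
```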


#1

Updated by Go MAEDA about 9 years ago

  • Tracker changed from Patch to Defect
  • Target version set to 3.1.2

+1

The search keyword '"日本語 テスト"' (a quoted phrase written in Japanese) matches both "日本語 テスト" and "日本語テスト" in the current trunk, but it should not match the latter.

expected:

Redmine::Search::Fetcher.new('"日本語 テスト"', ...).tokens => ['日本語 テスト']

actual:

Redmine::Search::Fetcher.new('"日本語 テスト"', ...).tokens => ['日本語', 'テスト']

This patch fixes that behavior.
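The expected and actual token lists above can be reproduced with a small standalone sketch. It assumes a scan-based tokenizer whose quoted-phrase pattern is the only thing that changes; `tokenize`, `ASCII_PHRASE` and `ANY_PHRASE` are hypothetical names, and this is not the code of `Redmine::Search::Fetcher` itself.

```ruby
# Hypothetical tokenizer: a quoted phrase becomes one token, anything
# else is split on whitespace. phrase_chars is the character class
# allowed between the double quotes.
def tokenize(question, phrase_chars)
  pattern = /((\s|^)"#{phrase_chars}+"(\s|$)|\S+)/
  question.scan(pattern).map do |m|
    # m.first is the outer capture group; strip the surrounding quotes.
    m.first.gsub(/(^\s*"\s*|\s*"\s*$)/, '')
  end
end

ASCII_PHRASE = '[\s\w]'  # \w is ASCII-only, so Japanese phrases fall through
ANY_PHRASE   = '[^"]'    # any run of characters that are not a double quote

p tokenize('"日本語 テスト"', ASCII_PHRASE)  # => ["日本語", "テスト"]
p tokenize('"日本語 テスト"', ANY_PHRASE)    # => ["日本語 テスト"]
```

Matching every non-" character is the simpler of the two options from the description, since it does not depend on which characters count as alphanumeric.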

#2

Updated by Jean-Philippe Lang about 9 years ago

  • Status changed from New to Closed
  • Assignee set to Jean-Philippe Lang
  • Target version changed from 3.1.2 to 3.0.6
  • Resolution set to Fixed

Patch applied, thanks.
