Defect #12641

closed

Diff output becomes ??? in some non-ASCII words.

Added by Toshi MARUYAMA about 12 years ago. Updated almost 12 years ago.

Status: Closed
Priority: Normal
Assignee: Toshi MARUYAMA
Category: I18n
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Resolution: Fixed
Affected version:

Description

An example is r11052 in #12640#note-2.


Files

diff-r11052.png (20 KB) - Toshi MARUYAMA, 2012-12-19 09:30
unified_diff.rb.diff (787 Bytes) - Correct UTF-8 parsing - Filou Centrinov, 2013-03-05 00:16
unified_diff.rb.2.diff (621 Bytes) - Set utf-8 encoding - Filou Centrinov, 2013-03-05 12:54

Related issues

Related to Redmine - Patch #12640: Russian "about_x_hours" translation change (Closed)

Actions #1

Updated by Filou Centrinov almost 12 years ago

The problem is that, for example, the following diff lines

- часа" 
+ часов" 

are split byte by byte in Redmine, so the UTF-8 output looks like this:

\xD1\x87\xD0\xB0\xD1\x81\xD0<span>\xB0</span>&quot;
\xD1\x87\xD0\xB0\xD1\x81\xD0<span>\xBE\xD0\xB2</span>&quot;

This is wrong because the lead byte \xD0 belongs to the 2-byte Cyrillic character "а" inside the <span> tag, yet it ends up outside the <span> tag. The split characters are therefore misinterpreted and displayed as "?".

Correct UTF-8 would be:

\xD1\x87\xD0\xB0\xD1\x81<span>\xD0\xB0</span>&quot;
\xD1\x87\xD0\xB0\xD1\x81<span>\xD0\xBE\xD0\xB2</span>&quot;
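
The difference between the two markups can be reproduced in plain Ruby. The following is only a sketch, not Redmine's code: the helper names `first_diff_byte` and `split_as_utf8` and the constants are made up for illustration. It splits the first line at the first differing byte and checks whether the halves are still valid UTF-8.

```ruby
# The two diff lines from the example, written as raw bytes (часа" / часов").
LEFT  = "\xD1\x87\xD0\xB0\xD1\x81\xD0\xB0\"".b
RIGHT = "\xD1\x87\xD0\xB0\xD1\x81\xD0\xBE\xD0\xB2\"".b

# Hypothetical helper: index of the first byte at which two strings differ.
def first_diff_byte(left, right)
  max = [left.bytesize, right.bytesize].min
  (0...max).find { |i| left.getbyte(i) != right.getbyte(i) } || max
end

# Hypothetical helper: split a line at a byte offset and tag both halves as UTF-8.
def split_as_utf8(line, offset)
  [line.byteslice(0, offset), line.byteslice(offset..-1)].map do |part|
    part.force_encoding("UTF-8")
  end
end

offset = first_diff_byte(LEFT, RIGHT)
puts offset                  # 7: the shared \xD0 at index 6 still matches

prefix, rest = split_as_utf8(LEFT, offset)
puts prefix.valid_encoding?  # false: prefix ends with a dangling lead byte \xD0
puts rest.valid_encoding?    # false: rest starts with the continuation byte \xB0

# Splitting one byte earlier keeps the character "а" (\xD0\xB0) together.
good_prefix, good_rest = split_as_utf8(LEFT, offset - 1)
puts good_prefix.valid_encoding? && good_rest.valid_encoding?  # true
```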

So the first line should contain "...<span>\xD0\xB0</span>..." instead of "...\xD0<span>\xB0</span>...". If the first differing byte is a continuation byte (and not a lead byte or a single-byte character), the attached patch searches backward for the preceding lead byte.

A continuation byte has the binary format 10xxxxxx, so it can be detected with myContinuationByte.ord.between?(128, 191).
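
The idea can be sketched as follows. This is not the attached patch itself; `continuation_byte?`, `span_start` and the constants are hypothetical names, and only the start of the highlighted span is handled here.

```ruby
# Diff lines from the example as raw bytes (часа" / часов").
OLD_LINE = "\xD1\x87\xD0\xB0\xD1\x81\xD0\xB0\"".b
NEW_LINE = "\xD1\x87\xD0\xB0\xD1\x81\xD0\xBE\xD0\xB2\"".b

# True for UTF-8 continuation bytes (binary 10xxxxxx, i.e. 128..191).
def continuation_byte?(byte)
  byte.between?(128, 191)
end

# Hypothetical helper: byte offset at which the highlighted span should start.
def span_start(left, right)
  max = [left.bytesize, right.bytesize].min
  start = (0...max).find { |i| left.getbyte(i) != right.getbyte(i) } || max
  # Step back while the offset points into the middle of a multi-byte character.
  start -= 1 while start > 0 && start < left.bytesize &&
                   continuation_byte?(left.getbyte(start))
  start
end

start = span_start(OLD_LINE, NEW_LINE)
puts start                                                  # 6
puts OLD_LINE.byteslice(0, start).force_encoding("UTF-8")   # час
```

For pure ASCII input the loop never runs, because no byte falls in the range 128..191.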

This problem occurs whenever the first differing bytes of the two lines are continuation bytes. Another example, in Japanese, can be found in #13350.

Actions #2

Updated by Filou Centrinov almost 12 years ago

A much better way to fix this problem is to set a UTF-8 encoding. :-)
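
A rough illustration of why this helps, assuming the diff lines arrive as binary strings: once a string is tagged as UTF-8, Ruby indexes it by character instead of by byte, so the first difference can never fall inside a multi-byte character. The helper name `first_diff_char` is hypothetical and this is not the committed change.

```ruby
# Hypothetical helper: index of the first differing character (not byte).
def first_diff_char(left, right)
  # dup so the caller's strings keep their original encoding tag.
  l = left.dup.force_encoding("UTF-8")
  r = right.dup.force_encoding("UTF-8")
  max = [l.length, r.length].min
  (0...max).find { |i| l[i] != r[i] } || max
end

old_line = "\xD1\x87\xD0\xB0\xD1\x81\xD0\xB0\"".b           # часа" as raw bytes
new_line = "\xD1\x87\xD0\xB0\xD1\x81\xD0\xBE\xD0\xB2\"".b   # часов" as raw bytes

i = first_diff_char(old_line, new_line)
puts i                                            # 3
puts old_line.dup.force_encoding("UTF-8")[0, i]   # час
```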

Actions #3

Updated by Filou Centrinov almost 12 years ago

The affected version is also 2.3 (devel)

Actions #4

Updated by Toshi MARUYAMA almost 12 years ago

  • Category set to I18n
  • Assignee set to Toshi MARUYAMA
  • Target version set to 2.4.0

Actions #5

Updated by Toshi MARUYAMA almost 12 years ago

  • Target version changed from 2.4.0 to 2.3.0

Actions #6

Updated by Toshi MARUYAMA almost 12 years ago

  • Status changed from New to Closed
  • Resolution set to Fixed

Committed in, thanks.
