Redmine - Defect #12641: Diff outputs become ??? in some non ASCII words. https://www.redmine.org/issues/12641?journal_id=46326 2013-03-04T23:17:07Z Filou Centrinov
<ul><li><strong>File</strong> <a href="/attachments/9216">unified_diff.rb.diff</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/9216/unified_diff.rb.diff">unified_diff.rb.diff</a> added</li></ul><p>The problem is that, for example, the following diff lines<br /><pre><code class="diff syntaxhl"><span class="gd">- часа"
</span><span class="gi">+ часов"
</span></code></pre></p>
<p>are parsed in Redmine as UTF-8 like this:<br /><pre><code>\xD1\x87\xD0\xB0\xD1\x81\xD0<span>\xB0</span>&quot;
\xD1\x87\xD0\xB0\xD1\x81\xD0<span>\xBE\xD0\xB2</span>&quot;
</code></pre></p>
<p>This is wrong, because the <em>leading byte</em> <code>\xD0</code> belongs to the 2-byte Cyrillic character "<code>а</code>" that should be inside the <code>&lt;span&gt;</code> tag, but it ends up outside of it. As a result, the characters are misinterpreted and displayed as "?".</p>
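<p>The split described above can be reproduced with a minimal sketch. This is not Redmine code: the variable names are hypothetical, and it only assumes that the two lines are compared byte by byte.</p>

```ruby
# Hypothetical demo, not taken from Redmine: cut the removed line at the
# byte offset where it first differs from the added line.
old_line = "\u0447\u0430\u0441\u0430\""        # часа"
new_line = "\u0447\u0430\u0441\u043E\u0432\""  # часов"

old_bytes = old_line.bytes
new_bytes = new_line.bytes

# Length of the common prefix, counted in bytes rather than characters.
offset = 0
offset += 1 while offset < old_bytes.size && old_bytes[offset] == new_bytes[offset]

# Splitting at that offset separates \xD0 from its continuation byte \xB0.
prefix = old_line.byteslice(0, offset)
rest   = old_line.byteslice(offset, old_line.bytesize - offset)

puts offset                  # => 7
puts prefix.valid_encoding?  # => false
puts rest.valid_encoding?    # => false
```

<p>Both halves are invalid UTF-8 on their own, which is why the highlighted output shows "?".</p>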
<p>Correct UTF-8 would be:</p>
<pre><code>\xD1\x87\xD0\xB0\xD1\x81<span>\xD0\xB0</span>&quot;
\xD1\x87\xD0\xB0\xD1\x81<span>\xD0\xBE\xD0\xB2</span>&quot;
</code></pre>
<p>So for the first line we should have "<code>...<span>\xD0\xB0</span>...</code>" instead of "<code>...\xD0<span>\xB0</span>...</code>". The attached patch searches for the last <em>leading byte</em> when the first differing byte is a <em>continuation byte</em> (and not a <em>leading byte</em> or a single-byte character).</p>
<p>A <em>continuation byte</em> has the binary format 10xxxxxx, so we can detect it with <code>myContinuationByte.ord.between?(128, 191)</code>.</p>
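<p>The idea of stepping back to the leading byte can be sketched as follows. This is not the attached patch: <code>common_prefix_bytesize</code> is a hypothetical helper, and it assumes both inputs are valid UTF-8. It applies the same 128..191 test to the integers returned by <code>String#bytes</code>.</p>

```ruby
# Hypothetical helper, not the attached patch: returns the byte length of the
# common prefix of two strings, moved back so that it never ends inside a
# multi-byte UTF-8 character.
def common_prefix_bytesize(a, b)
  a_bytes = a.bytes
  b_bytes = b.bytes

  # Plain byte-wise comparison.
  i = 0
  i += 1 while i < a_bytes.size && i < b_bytes.size && a_bytes[i] == b_bytes[i]

  # If the first differing byte is a continuation byte (10xxxxxx, i.e.
  # 128..191), step back until we reach the leading byte of that character.
  i -= 1 while i > 0 && i < a_bytes.size && a_bytes[i].between?(128, 191)
  i
end

old_line = "\u0447\u0430\u0441\u0430\""        # часа"
new_line = "\u0447\u0430\u0441\u043E\u0432\""  # часов"

cut = common_prefix_bytesize(old_line, new_line)
puts cut                                        # => 6
puts old_line.byteslice(0, cut).valid_encoding? # => true
```

<p>With the cut at byte 6, the whole character <code>\xD0\xB0</code> falls on the highlighted side, matching the correct output shown above.</p>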
<p>This problem always occurs when the first differing bytes of the two lines are <em>continuation bytes</em>. Another example, in Japanese, can be found in <a class="issue tracker-1 status-5 priority-4 priority-default closed" title="Defect: Japanese mistranslation fix (Closed)" href="https://www.redmine.org/issues/13350">#13350</a>.</p>
https://www.redmine.org/issues/12641?journal_id=46331 2013-03-05T11:56:40Z Filou Centrinov
<ul><li><strong>File</strong> <a href="/attachments/9218">unified_diff.rb.2.diff</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/9218/unified_diff.rb.2.diff">unified_diff.rb.2.diff</a> added</li></ul><p>A much better way to fix this problem is to set a UTF-8 encoding. :-)</p>
https://www.redmine.org/issues/12641?journal_id=46333 2013-03-05T19:23:16Z Filou Centrinov
<p>Version 2.3 (devel) is also affected.</p>
https://www.redmine.org/issues/12641?journal_id=46378 2013-03-07T01:36:26Z Toshi MARUYAMA
<ul><li><strong>Category</strong> set to <i>I18n</i></li><li><strong>Assignee</strong> set to <i>Toshi MARUYAMA</i></li><li><strong>Target version</strong> set to <i>2.4.0</i></li></ul>
https://www.redmine.org/issues/12641?journal_id=46391 2013-03-07T08:28:04Z Toshi MARUYAMA
<ul><li><strong>Target version</strong> changed from <i>2.4.0</i> to <i>2.3.0</i></li></ul>
https://www.redmine.org/issues/12641?journal_id=46447 2013-03-07T23:13:31Z Toshi MARUYAMA
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Closed</i></li><li><strong>Resolution</strong> set to <i>Fixed</i></li></ul><p>Committed in, thanks.</p>