Defect #24616

closed

Should not replace all invalid utf8 characters (e.g in mail)

Added by Pavel Rosický almost 8 years ago. Updated almost 8 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: I18n
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Resolution: Fixed
Affected version:

Description

Hello,
I have an email that is encoded in UTF-8 but contains an invalid character. In this case, Redmine converts the content to US-ASCII and then back to UTF-8. This step replaces every non-ASCII character with "?". Why?

1) Failure:
MailHandlerTest#test_invalid_utf8 [/test/unit/mail_handler_test.rb:548]:
Expected: "Здравствуйте?" 
  Actual: "?????????????" 

In Redmine::CodesetUtil.replace_invalid_utf8(str) and Redmine::CodesetUtil.to_utf8(str, encoding), I changed

        str = str.encode("US-ASCII", :invalid => :replace, :undef => :replace, :replace => '?').encode("UTF-8")

to

        str = str.encode("UTF-8", :invalid => :replace, :undef => :replace, :replace => '?')

All tests pass with this change.
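
The following short sketch is one way to see why the US-ASCII step loses valid text. It is illustrative only and not part of the patch: the `:undef` option fires for every valid character that has no US-ASCII equivalent, independently of any invalid bytes, so each Cyrillic letter becomes "?". The variable names are made up for this example.

```ruby
# Illustrative sketch, not Redmine code: variable names are hypothetical.
valid = "Здравствуйте"   # 12 valid UTF-8 characters, none of them ASCII

# :undef => :replace applies to characters that are valid in the source
# encoding but cannot be represented in the target encoding (US-ASCII here).
ascii = valid.encode("US-ASCII", :undef => :replace, :replace => "?")
p ascii                  # => "????????????"

# Converting a valid UTF-8 string to UTF-8 leaves it unchanged,
# so nothing is undefined and nothing is replaced.
same = valid.encode("UTF-8", :invalid => :replace, :undef => :replace, :replace => "?")
p same == valid          # => true
```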
  Redmine version                3.3.1.stable
  Ruby version                   2.1.5-p273 (2014-11-13) [x64-mingw32]
  Rails version                  4.2.7.1
  Environment                    production
  Database adapter               Mysql2
SCM:
  Git                            2.10.1
  Filesystem
Redmine plugins:
  no plugin installed

Files

invalid_utf8_test.patch (1.49 KB): spec. Pavel Rosický, 2016-12-15 01:42
defect-24616.diff (1.73 KB): fix + tests (generated from Pavel Rosický's contribution). Go MAEDA, 2016-12-29 07:52
Actions #1

Updated by Go MAEDA almost 8 years ago

Looks good to me.

# valid UTF-8 string
text = "こんにちは" 
p text.valid_encoding?  # => true

# making invalid UTF-8 string
text.force_encoding('ASCII-8BIT')
text[-1] = 0xff.chr
text.force_encoding("UTF-8")
p text.valid_encoding?  # => false
p text                  # => "こんにち\xE3\x81\xFF" 

# Current code of Redmine
p text.encode("US-ASCII", :invalid => :replace, :undef => :replace, :replace => '?').encode("UTF-8")
# => "??????" 

# Fixed code by Pavel Rosický
p text.encode("UTF-8", :invalid => :replace, :undef => :replace, :replace => '?')
# => "こんにち??" 
Actions #2

Updated by Toshi MARUYAMA almost 8 years ago

  • Target version deleted (3.3.2)

Did you run the whole test suite?
Especially this test:
source:tags/3.3.1/test/unit/lib/redmine/codeset_util_test.rb

Actions #3

Updated by Toshi MARUYAMA almost 8 years ago

Pavel Rosický wrote:

Hello,
I have an email that is encoded in UTF-8 but contains an invalid character. In this case, Redmine converts the content to US-ASCII and then back to UTF-8. This step replaces every non-ASCII character with "?". Why?

You can see the purpose of this function in this test:
source:tags/3.3.1/test/unit/lib/redmine/codeset_util_test.rb#L68

Actions #4

Updated by Pavel Rosický almost 8 years ago

Thanks Toshi, I rechecked and all tests pass.

source:tags/3.3.1/test/unit/lib/redmine/codeset_util_test.rb#L68
In this case, my change has no effect on the result, because the string contains just one byte that is invalid in UTF-8 and everything else is ASCII.

# current
s1.encode('us-ascii', :invalid => :replace, :undef => :replace, :replace => '?').encode('utf-8')
# => "Texte encod? en ISO-8859-1."
# patched
s1.encode('utf-8', :invalid => :replace, :undef => :replace, :replace => '?')
# => "Texte encod? en ISO-8859-1."

However, when a string mixes valid non-ASCII UTF-8 characters with invalid bytes, the current code replaces both the valid and the invalid characters. Try Go MAEDA's example.

Actions #5

Updated by Toshi MARUYAMA almost 8 years ago

$ irb
1.9.3-p551 :001 > text = "こんにち\xE3\x81\xFF" 
 => "こんにち\xE3\x81\xFF" 
1.9.3-p551 :002 > text =  text.encode("UTF-8", :invalid => :replace, :undef => :replace, :replace => '?')
 => "こんにち\xE3\x81\xFF" 
1.9.3-p551 :003 > text.valid_encoding?
 => false 
$ irb
2.3.3 :001 > text = "こんにち\xE3\x81\xFF" 
 => "こんにち\xE3\x81\xFF" 
2.3.3 :002 > text =  text.encode("UTF-8", :invalid => :replace, :undef => :replace, :replace => '?')
 => "こんにち??" 
2.3.3 :003 > text.valid_encoding?
 => true 
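
The two sessions above suggest that on Ruby 1.9.3 a UTF-8 to UTF-8 `encode` is effectively a no-op and leaves the invalid bytes in place, while Ruby 2.3.3 replaces them. One way to get version-independent behavior is to transcode through a different encoding, so that Ruby has to validate every byte sequence. The helper below is a minimal sketch under that assumption; `replace_invalid_utf8_sketch` is a hypothetical name and this is not necessarily what was committed to Redmine.

```ruby
# Hypothetical helper for illustration; not Redmine's actual implementation.
def replace_invalid_utf8_sketch(str)
  return str if str.nil?
  # Work on a copy so the caller's string is not mutated.
  str = str.dup.force_encoding("UTF-8")
  return str if str.valid_encoding?
  # Round trip through UTF-16LE: because source and target encodings differ,
  # the transcoder inspects each byte sequence and replaces only the
  # malformed ones, keeping valid non-ASCII characters intact.
  str.encode("UTF-16LE", :invalid => :replace, :undef => :replace, :replace => "?").encode("UTF-8")
end

text = "こんにち\xE3\x81\xFF"
fixed = replace_invalid_utf8_sketch(text)
p fixed.valid_encoding?          # => true
p fixed.start_with?("こんにち")  # => true
```

The early return for already valid strings keeps the common case cheap and avoids an unnecessary double conversion.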
Actions #6

Updated by Toshi MARUYAMA almost 8 years ago

Pavel Rosický wrote:

Hello,
I have an email that is encoded in UTF-8 but contains an invalid character. In this case, Redmine converts the content to US-ASCII and then back to UTF-8. This step replaces every non-ASCII character with "?". Why?

For compatibility with the Ruby 1.8.7 behavior:
source:tags/2.6.9/lib/redmine/codeset_util.rb

Actions #7

Updated by Toshi MARUYAMA almost 8 years ago

  • Subject changed from encoding error if email contains an invalid utf8 character to Should not replace all invalid utf8 characters
  • Category changed from Email receiving to I18n
Actions #8

Updated by Toshi MARUYAMA almost 8 years ago

  • Subject changed from Should not replace all invalid utf8 characters to Should not replace all invalid utf8 characters (e.g in mail)
Actions #9

Updated by Toshi MARUYAMA almost 8 years ago

  • Status changed from New to Closed
  • Target version set to 3.4.0
  • Resolution set to Fixed

I have committed r16273, written so that it also passes on Ruby 1.9.3.
I don't want to change the behavior on stable.
