Defect #24616
Closed
Should not replace all invalid utf8 characters (e.g. in mail)
Description
Hello,
I have an email that is encoded in UTF-8 but contains an invalid character. In this case, Redmine converts the content to US-ASCII and then to UTF-8. This step replaces every non-ASCII character with "?". Why?
1) Failure:
MailHandlerTest#test_invalid_utf8 [/test/unit/mail_handler_test.rb:548]:
Expected: "Здравствуйте?"
  Actual: "?????????????"
In Redmine::CodesetUtil.replace_invalid_utf8(str) and Redmine::CodesetUtil.to_utf8(str, encoding), I changed
str = str.encode("US-ASCII", :invalid => :replace, :undef => :replace, :replace => '?').encode("UTF-8")
to
str = str.encode("UTF-8", :invalid => :replace, :undef => :replace, :replace => '?')
All tests pass with this change.
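As a rough illustration of the difference between the two expressions, the sketch below applies both to a string that mixes valid Cyrillic text with one invalid byte. The method names `replace_via_ascii` and `replace_in_utf8` and the sample input are made up for this example (the actual mail fixture is not shown here), and the second result assumes Ruby 2.0 or later.

```ruby
# Hypothetical helper names; only the encode calls come from the report above.

# Current behavior: transcode to US-ASCII first, then back to UTF-8.
# Every non-ASCII character is undefined in US-ASCII and becomes "?".
def replace_via_ascii(str)
  str.encode("US-ASCII", :invalid => :replace, :undef => :replace, :replace => '?').encode("UTF-8")
end

# Proposed behavior: stay in UTF-8 and replace only the invalid bytes.
def replace_in_utf8(str)
  str.encode("UTF-8", :invalid => :replace, :undef => :replace, :replace => '?')
end

# Assumed sample input: valid Cyrillic text followed by one invalid byte.
sample = "Здравствуйте\xFF"

p sample.valid_encoding?      # => false
p replace_via_ascii(sample)   # => "?????????????"
p replace_in_utf8(sample)     # => "Здравствуйте?"
```

The first call loses the whole word, while the second keeps the readable text and marks only the broken byte.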
Redmine version 3.3.1.stable
Ruby version 2.1.5-p273 (2014-11-13) [x64-mingw32]
Rails version 4.2.7.1
Environment production
Database adapter Mysql2
SCM:
Git 2.10.1
Filesystem
Redmine plugins:
no plugin installed
Updated by Go MAEDA almost 8 years ago
- File defect-24616.diff added
- Target version set to 3.3.2
Looks good to me.
# valid UTF-8 string
text = "こんにちは"
p text.valid_encoding? # => true
# making invalid UTF-8 string
text.force_encoding('ASCII-8BIT')
text[-1] = 0xff.chr
text.force_encoding("UTF-8")
p text.valid_encoding? # => false
p text # => "こんにち\xE3\x81\xFF"
# Current code of Redmine
p text.encode("US-ASCII", :invalid => :replace, :undef => :replace, :replace => '?').encode("UTF-8")
# => "??????"
# Fixed code by Pavel Rosický
p text.encode("UTF-8", :invalid => :replace, :undef => :replace, :replace => '?')
# => "こんにち??"
Updated by Toshi MARUYAMA almost 8 years ago
- Target version deleted (3.3.2)
Did you run the whole test suite?
Especially this test:
source:tags/3.3.1/test/unit/lib/redmine/codeset_util_test.rb
Updated by Toshi MARUYAMA almost 8 years ago
Pavel Rosický wrote:
Hello,
I've an email, that is encoded in utf8, but it contains an invalid character. In this case, redmine converts the content to us-ascii and then to utf8. This step will replace non-ascii compatible chars to "?". Why?
You can see the purpose of this function in this test:
source:tags/3.3.1/test/unit/lib/redmine/codeset_util_test.rb#L68
Updated by Pavel Rosický almost 8 years ago
Thanks Toshi, I rechecked it again and all tests are passing.
source:tags/3.3.1/test/unit/lib/redmine/codeset_util_test.rb#L68
In this case, my change has no effect on the result, because the string contains just one invalid UTF-8 character and everything else is ASCII.
# current
s1.encode('us-ascii', :invalid => :replace, :undef => :replace, :replace => '?').encode('utf-8')
# => "Texte encod? en ISO-8859-1."
# patched
s1.encode('utf-8', :invalid => :replace, :undef => :replace, :replace => '?')
# => "Texte encod? en ISO-8859-1."
But when valid non-ASCII characters are mixed with invalid UTF-8 bytes, the current code replaces both. Try Go MAEDA's example.
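To check the claim about the ISO-8859-1 test case, here is a small self-contained sketch. The definition of `s1` is an assumption (ASCII text plus the single Latin-1 byte 0xE9, tagged as UTF-8); the real test may build its input differently. The second result assumes Ruby 2.0 or later.

```ruby
# Assumed stand-in for the test input: ASCII text plus one ISO-8859-1 byte
# (0xE9, "é" in Latin-1). Tagged as UTF-8, that lone byte is invalid.
s1 = "Texte encod\xE9 en ISO-8859-1.".dup.force_encoding("UTF-8")

# Current code: round trip through US-ASCII.
via_ascii = s1.encode("US-ASCII", :invalid => :replace, :undef => :replace, :replace => '?').encode("UTF-8")

# Patched code: replace invalid bytes while staying in UTF-8.
patched = s1.encode("UTF-8", :invalid => :replace, :undef => :replace, :replace => '?')

p via_ascii             # => "Texte encod? en ISO-8859-1."
p patched               # => "Texte encod? en ISO-8859-1."
p via_ascii == patched  # => true
```

Because every valid character in this input is ASCII, the test cannot tell the two implementations apart.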
Updated by Toshi MARUYAMA almost 8 years ago
$ irb
1.9.3-p551 :001 > text = "こんにち\xE3\x81\xFF"
 => "こんにち\xE3\x81\xFF"
1.9.3-p551 :002 > text = text.encode("UTF-8", :invalid => :replace, :undef => :replace, :replace => '?')
 => "こんにち\xE3\x81\xFF"
1.9.3-p551 :003 > text.valid_encoding?
 => false

$ irb
2.3.3 :001 > text = "こんにち\xE3\x81\xFF"
 => "こんにち\xE3\x81\xFF"
2.3.3 :002 > text = text.encode("UTF-8", :invalid => :replace, :undef => :replace, :replace => '?')
 => "こんにち??"
2.3.3 :003 > text.valid_encoding?
 => true
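The two sessions suggest that calling encode("UTF-8", ...) on a string already tagged as UTF-8 leaves invalid bytes in place on Ruby 1.9.3 and replaces them only on newer versions. One commonly used workaround, sketched below, is to round-trip through a different encoding such as UTF-16LE so that a real conversion takes place. This is an illustration only, not necessarily what Redmine does; the method name `replace_invalid_bytes` is hypothetical, and the outputs shown are for a current Ruby rather than verified on 1.9.3.

```ruby
# Hypothetical helper: replace invalid bytes without touching valid
# non-ASCII characters, using an intermediate encoding to force transcoding.
def replace_invalid_bytes(str)
  return str if str.valid_encoding?
  # UTF-16LE can represent every Unicode character, so only the invalid
  # byte sequences are replaced with "?" on the way through.
  str.encode("UTF-16LE", :invalid => :replace, :undef => :replace, :replace => '?').encode("UTF-8")
end

text = "こんにち\xE3\x81\xFF"
fixed = replace_invalid_bytes(text)

p fixed                  # => "こんにち??"
p fixed.valid_encoding?  # => true
```

On Ruby 2.1 and later, String#scrub (for example text.scrub("?")) is another way to get the same result.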
Updated by Toshi MARUYAMA almost 8 years ago
Pavel Rosický wrote:
Hello,
I've an email, that is encoded in utf8, but it contains an invalid character. In this case, redmine converts the content to us-ascii and then to utf8. This step will replace non-ascii compatible chars to "?". Why?
For compatibility with Ruby 1.8.7 behavior.
source:tags/2.6.9/lib/redmine/codeset_util.rb
Updated by Toshi MARUYAMA almost 8 years ago
- Subject changed from encoding error if email contains an invalid utf8 character to Should not replace all invalid utf8 characters
- Category changed from Email receiving to I18n
Updated by Toshi MARUYAMA almost 8 years ago
- Subject changed from Should not replace all invalid utf8 characters to Should not replace all invalid utf8 characters (e.g in mail)
Updated by Toshi MARUYAMA almost 8 years ago
- Status changed from New to Closed
- Target version set to 3.4.0
- Resolution set to Fixed
I have committed r16273, which also passes on Ruby 1.9.3.
I don't want to change the behavior on the stable branch.