Defect #24616
Closed
Should not replace all invalid utf8 characters (e.g in mail)
Description
Hello,
I have an email that is encoded in UTF-8 but contains an invalid character. In this case, Redmine converts the content to US-ASCII and then to UTF-8. This step replaces every non-ASCII character with "?". Why?
1) Failure:
MailHandlerTest#test_invalid_utf8 [/test/unit/mail_handler_test.rb:548]:
Expected: "Здравствуйте?"
Actual: "?????????????"
In Redmine::CodesetUtil.replace_invalid_utf8(str) and Redmine::CodesetUtil.to_utf8(str, encoding), I changed
str = str.encode("US-ASCII", :invalid => :replace, :undef => :replace, :replace => '?').encode("UTF-8")
to
str = str.encode("UTF-8", :invalid => :replace, :undef => :replace, :replace => '?')
All tests pass with this change.
Redmine version 3.3.1.stable
Ruby version 2.1.5-p273 (2014-11-13) [x64-mingw32]
Rails version 4.2.7.1
Environment production
Database adapter Mysql2
SCM:
Git 2.10.1
Filesystem
Redmine plugins:
no plugin installed
Updated by Go MAEDA about 9 years ago
- File defect-24616.diff added
- Target version set to 3.3.2
Looks good to me.
# valid UTF-8 string
text = "こんにちは"
p text.valid_encoding? # => true
# making invalid UTF-8 string
text.force_encoding('ASCII-8BIT')
text[-1] = 0xff.chr
text.force_encoding("UTF-8")
p text.valid_encoding? # => false
p text # => "こんにち\xE3\x81\xFF"
# Current code of Redmine
p text.encode("US-ASCII", :invalid => :replace, :undef => :replace, :replace => '?').encode("UTF-8")
# => "??????"
# Fixed code by Pavel Rosický
p text.encode("UTF-8", :invalid => :replace, :undef => :replace, :replace => '?')
# => "こんにち??"
Updated by Toshi MARUYAMA about 9 years ago
- Target version deleted (3.3.2)
Did you run the whole test suite?
Especially this test.
source:tags/3.3.1/test/unit/lib/redmine/codeset_util_test.rb
Updated by Toshi MARUYAMA about 9 years ago
Pavel Rosický wrote:
Hello,
I have an email that is encoded in UTF-8 but contains an invalid character. In this case, Redmine converts the content to US-ASCII and then to UTF-8. This step replaces every non-ASCII character with "?". Why?
You can see the purpose of this function in this test.
source:tags/3.3.1/test/unit/lib/redmine/codeset_util_test.rb#L68
Updated by Pavel Rosický about 9 years ago
Thanks Toshi, I rechecked it again and all tests are passing.
source:tags/3.3.1/test/unit/lib/redmine/codeset_util_test.rb#L68
In this case, my change has no effect on the result, because the string contains just one invalid utf-8 character.
# current
s1.encode('us-ascii', :invalid => :replace, :undef => :replace, :replace => '?').encode('utf-8')
# => "Texte encod? en ISO-8859-1."
# patched
s1.encode('utf-8', :invalid => :replace, :undef => :replace, :replace => '?')
# => "Texte encod? en ISO-8859-1."
But when a string combines valid non-ASCII characters with invalid UTF-8 bytes, the current code replaces both. Try Go MAEDA's example.
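The difference comes from the two replacement options. When the target is US-ASCII, every valid non-ASCII character is an undefined conversion (:undef), while a broken byte is an invalid sequence (:invalid), so both end up as "?". A minimal sketch, assuming Ruby 2.1 or later; the helper names via_ascii and direct and the sample string mixed are hypothetical and only serve as illustration:

```ruby
# Replacement options used in both variants of the code.
OPTS = { :invalid => :replace, :undef => :replace, :replace => '?' }

# Hypothetical helper: the current two-step conversion through US-ASCII.
def via_ascii(str)
  str.encode("US-ASCII", **OPTS).encode("UTF-8")
end

# Hypothetical helper: the patched single-step conversion.
def direct(str)
  str.encode("UTF-8", **OPTS)
end

# ASCII text plus one invalid byte.
s1 = "Texte encod\xE9 en ISO-8859-1."
# Valid non-ASCII text plus one invalid byte (assumed sample).
mixed = "こんにち\xFF"

p via_ascii(s1) == direct(s1)  # both variants agree on ASCII-only text
p via_ascii(mixed)             # valid characters are replaced as well
p direct(mixed)                # only the invalid byte is replaced
```

With ASCII-only text the two variants cannot be told apart, which is why the existing codeset_util test passes either way.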
Updated by Toshi MARUYAMA about 9 years ago
$ irb
1.9.3-p551 :001 > text = "こんにち\xE3\x81\xFF"
=> "こんにち\xE3\x81\xFF"
1.9.3-p551 :002 > text = text.encode("UTF-8", :invalid => :replace, :undef => :replace, :replace => '?')
=> "こんにち\xE3\x81\xFF"
1.9.3-p551 :003 > text.valid_encoding?
=> false
$ irb
2.3.3 :001 > text = "こんにち\xE3\x81\xFF"
=> "こんにち\xE3\x81\xFF"
2.3.3 :002 > text = text.encode("UTF-8", :invalid => :replace, :undef => :replace, :replace => '?')
=> "こんにち??"
2.3.3 :003 > text.valid_encoding?
=> true
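The two sessions show that the same encode call leaves the invalid bytes in place on Ruby 1.9.3 but replaces them on Ruby 2.3.3. One way to avoid depending on a same-encoding conversion is to go through a different Unicode encoding, so the transcoder always runs. The sketch below is an assumption for illustration, not necessarily what Redmine does; the method name replace_invalid is hypothetical:

```ruby
# Hypothetical helper: replace invalid bytes in a UTF-8 string with '?'
# while keeping every valid character.
def replace_invalid(str)
  str = str.dup.force_encoding("UTF-8")
  return str if str.valid_encoding?
  # UTF-16LE can represent every Unicode character, so only the
  # invalid byte sequences are replaced during the round trip.
  str.encode("UTF-16LE", :invalid => :replace, :undef => :replace,
             :replace => '?').encode("UTF-8")
end

text = replace_invalid("こんにち\xE3\x81\xFF")
p text.valid_encoding?
p text.start_with?("こんにち")
```

On Ruby 2.1 and later, String#scrub("?") is a built-in alternative for the same job.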
Updated by Toshi MARUYAMA about 9 years ago
Pavel Rosický wrote:
Hello,
I have an email that is encoded in UTF-8 but contains an invalid character. In this case, Redmine converts the content to US-ASCII and then to UTF-8. This step replaces every non-ASCII character with "?". Why?
For compatibility with Ruby 1.8.7 behavior.
source:tags/2.6.9/lib/redmine/codeset_util.rb
Updated by Toshi MARUYAMA about 9 years ago
- Subject changed from encoding error if email contains an invalid utf8 character to Should not replace all invalid utf8 characters
- Category changed from Email receiving to I18n
Updated by Toshi MARUYAMA about 9 years ago
- Subject changed from Should not replace all invalid utf8 characters to Should not replace all invalid utf8 characters (e.g in mail)
Updated by Toshi MARUYAMA about 9 years ago
- Status changed from New to Closed
- Target version set to 3.4.0
- Resolution set to Fixed
I have committed r16273 to pass on Ruby 1.9.3.
I don't want to change the behavior on the stable branch.