Feature #2371
Closed: character encoding for attachment file
100%
Description
As of r814, the default encoding for repositories can be configured.
Diff and patch attachments need a similar configuration. Two options:
- a default encoding for diff/patch attachments (Admin -> Settings -> Attachments -> diff/patch encodings?).
- follow the encoding of the repository (source:/trunk/app/helpers/repositories_helper.rb@1900#L109).
I think the second option may be sufficient and useful.
Updated by Yuya Nishihara over 14 years ago
youngseok yi wrote:
- follow encoding of repository.
The attached patch (attachment-encoding.patch) implements it with minimal changes.
A proper solution would be something like:
- move to_utf8 to a separate module, e.g. RepoFilesHelper
- make AttachmentsHelper and RepositoriesHelper include RepoFilesHelper
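A minimal Ruby sketch of that refactoring. The module names come from the comment above; the body of to_utf8 is a hypothetical placeholder (it only treats input as UTF-8 and replaces invalid bytes), not Redmine's implementation, and AttachmentView is an illustrative class.

```ruby
# Shared helper proposed above. The to_utf8 body is a placeholder:
# it assumes UTF-8 input and replaces invalid byte sequences with '?'.
module RepoFilesHelper
  def to_utf8(str)
    return str if str.nil?
    str.dup.force_encoding('UTF-8').scrub('?')
  end
end

# Both helpers get the same to_utf8 by including the shared module.
module AttachmentsHelper
  include RepoFilesHelper
end

module RepositoriesHelper
  include RepoFilesHelper
end

# Hypothetical view class, only to show the method being picked up.
class AttachmentView
  include AttachmentsHelper
end

puts AttachmentView.new.to_utf8("caf\xC3\xA9".b)  # prints "café"
```

The point of the design is that both helpers resolve to_utf8 to the same method, so a fix to the conversion logic applies to repository views and attachment views at once.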
Updated by Toshi MARUYAMA over 13 years ago
- Target version set to Candidate for next major release
Updated by Toshi MARUYAMA over 13 years ago
- Target version changed from Candidate for next major release to 1.3.0
Updated by Toshi MARUYAMA about 13 years ago
- Subject changed from encoding for diff or patch attachment file to encoding for attachment file
Updated by Toshi MARUYAMA about 13 years ago
- Subject changed from encoding for attachment file to character encoding for attachment file
Updated by Etienne Massip about 13 years ago
Toshi, won't your last commit prevent me from attaching an ISO-8859-1 encoded patch to this issue and viewing it correctly?
Updated by Toshi MARUYAMA about 13 years ago
- File general-settings.png added
Etienne Massip wrote:
Toshi, won't your last commit prevent me from attaching an ISO-8859-1 encoded patch to this issue and viewing it correctly?
The goal of this feature is that attachment files and patches are converted using the repository encoding setting.
Updated by Etienne Massip about 13 years ago
I'm not sure this is a good idea; repositories may return data using a specific encoding, but attachments are usually stored on the file system without transformation, so assuming that they're "very likely to be encoded the same way data in SCM is" is not necessarily true.
For example, my encoding list starts with UTF-8, and in my locale (French) files uploaded by users are probably encoded in ISO-8859-15/CP1252; so assuming that uploaded text files are in UTF-8 means they will be rendered stripped and I will often lose some characters, which is the current situation.
I would prefer to be able to specify a distinct default encoding for text attachments, which would be ISO-8859-15/CP1252 (it could default to the server's default encoding), and render with something like bom_present?(str) ? str : Iconv.conv('UTF-8', Setting.default_encoding, str).
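A rough sketch of that idea in current Ruby. It swaps Iconv (removed from the standard library in Ruby 2.0) for String#encode; DEFAULT_ATTACHMENT_ENCODING is a hypothetical stand-in for the proposed Setting.default_encoding, and unlike the one-liner above it strips the BOM before returning the text.

```ruby
UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical stand-in for a per-site default attachment encoding setting.
DEFAULT_ATTACHMENT_ENCODING = 'ISO-8859-15'

def render_attachment_text(raw, default_encoding = DEFAULT_ATTACHMENT_ENCODING)
  bytes = raw.b
  if bytes.start_with?(UTF8_BOM)
    # A UTF-8 BOM is present: drop it and treat the rest as UTF-8.
    bytes.byteslice(UTF8_BOM.bytesize, bytes.bytesize).force_encoding('UTF-8')
  else
    # No BOM: assume the configured default encoding and transcode.
    bytes.force_encoding(default_encoding).encode('UTF-8')
  end
end

puts render_attachment_text("caf\xE9".b)  # prints "café"
```

Note that this approach trusts the default encoding for every BOM-less file, so UTF-8 files saved without a BOM would be misread; that is the trade-off discussed in the replies below.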
Updated by Toshi MARUYAMA about 13 years ago
UTF-8 is very strict.
It is very rare for ISO-8859-1 text to be mistaken for valid UTF-8.
http://groups.google.com/group/thg-dev/browse_thread/thread/6c258628e3fce8/09e9dbe4a030e51d
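A small Ruby illustration of that claim (the helper name utf8_valid? is hypothetical). Latin-1 accented letters are single bytes of 0x80 or above, which UTF-8 only accepts inside well-formed multibyte sequences, so typical ISO-8859-1 text fails validation. The rare exception is Latin-1 text that happens to contain pairs like "Ã©", whose bytes form valid UTF-8.

```ruby
# True if the raw bytes form well-formed UTF-8.
def utf8_valid?(bytes)
  bytes.b.force_encoding('UTF-8').valid_encoding?
end

# "café" and "déjà" as stored in an ISO-8859-1 file.
puts utf8_valid?("caf\xE9".b)     # prints "false"
puts utf8_valid?("d\xE9j\xE0".b)  # prints "false"

# Rare false positive: the Latin-1 text "Ã©" is the bytes C3 A9,
# which happen to be valid UTF-8 (for "é").
puts utf8_valid?("\xC3\xA9".b)    # prints "true"
```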
Updated by Toshi MARUYAMA about 13 years ago
In Redmine 1.2.2, repository encoding conversion happens at this line:
source:tags/1.2.2/app/helpers/repositories_helper.rb#L140
With the setting "UTF-8,ISO-8859-1", if conversion from UTF-8 fails, Redmine converts from ISO-8859-1 instead.
Japanese users use three encodings: UTF-8, EUC-JP and Shift_JIS (CP932).
This Redmine feature is a big advantage in Japan.
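A sketch of that ordered fallback using Ruby's String#encode rather than the Iconv call Redmine 1.2.2 used; convert_with_fallback is a hypothetical name, not Redmine's API. One detail worth noting: in Ruby, encoding a string from UTF-8 to UTF-8 does not validate it, so the sketch checks valid_encoding? before accepting a candidate.

```ruby
# Try each configured encoding in order; the first one under which the
# bytes are valid (and transcodable) wins.
def convert_with_fallback(raw, encodings)
  bytes = raw.b
  encodings.each do |enc|
    candidate = bytes.dup.force_encoding(enc)
    next unless candidate.valid_encoding?
    begin
      return candidate.encode('UTF-8')
    rescue EncodingError
      # Valid in enc but not representable in UTF-8: try the next one.
      next
    end
  end
  # Nothing matched: keep the bytes and replace invalid sequences.
  bytes.force_encoding('UTF-8').scrub('?')
end

puts convert_with_fallback("caf\xE9".b, %w[UTF-8 ISO-8859-1])  # prints "café"
```

With a list such as "UTF-8,EUC-JP,Shift_JIS", EUC-JP bytes fail the UTF-8 check and are accepted as EUC-JP, which is the behaviour described above.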
Updated by Etienne Massip about 13 years ago
So if I understand correctly, following the order of the encoding list, it will try and fail to convert the ISO-8859-1 file from UTF-8 to UTF-8, and then try and succeed in converting it from ISO-8859-1 to UTF-8?
I guess it will work...
Updated by Etienne Massip about 13 years ago
What if the administrator does not set UTF-8 at the start of the list?
Can't you do something like: str.is_utf8? ? str : try Iconv.conv('UTF-8', Setting.encodings) ?
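Etienne's suggestion, sketched in Ruby under the same assumptions (String#encode instead of Iconv; the method name is hypothetical): check for valid UTF-8 first, whatever the list order, and only then walk the configured encodings.

```ruby
# Prefer UTF-8 whenever the bytes are valid UTF-8, regardless of where
# (or whether) UTF-8 appears in the configured list.
def to_utf8_prefer_utf8(raw, encodings)
  bytes = raw.b
  utf8 = bytes.dup.force_encoding('UTF-8')
  return utf8 if utf8.valid_encoding?

  encodings.each do |enc|
    candidate = bytes.dup.force_encoding(enc)
    next unless candidate.valid_encoding?
    begin
      return candidate.encode('UTF-8')
    rescue EncodingError
      next
    end
  end
  # No configured encoding fits: fall back to scrubbed UTF-8.
  utf8.scrub('?')
end

# UTF-8 input is kept even though the list starts with ISO-8859-1.
puts to_utf8_prefer_utf8("caf\xC3\xA9".b, %w[ISO-8859-1])  # prints "café"
```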
Updated by Toshi MARUYAMA about 13 years ago
Etienne Massip wrote:
repositories may return data using a specific encoding,
That is not true.
SCMs do not store encoding information (metadata) for file contents.
http://mercurial.selenic.com/wiki/EncodingStrategy?action=recall&rev=21#Unknown_byte_strings
Updated by Etienne Massip about 13 years ago
Toshi MARUYAMA wrote:
That is not true.
SCMs do not store encoding information (metadata) for file contents.
Well, that's why I said "may" :-)
Updated by Toshi MARUYAMA about 13 years ago
Etienne Massip wrote:
What if the administrator does not set UTF-8 at the start of the list?
This is a very rare case in Japan.
The popular setting in Japan is "UTF-8,EUC-JP,Shift_JIS".
This order matters: the strictest encoding comes first.
If a single-byte character set (e.g. ISO-8859-1) is at the start of the list, every byte sequence is accepted as that encoding and converted to UTF-8, so the later encodings are never tried.
But I think this is a very rare case anywhere in the world.
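A quick Ruby illustration of why a single-byte charset at the head of the list hides everything after it: any byte sequence is valid ISO-8859-1, so UTF-8 Japanese text "converts" without error, just into the wrong characters. The method name decode_as_latin1 is hypothetical.

```ruby
# Reinterpret raw bytes as ISO-8859-1 and transcode to UTF-8.
# Returns [valid?, converted_text].
def decode_as_latin1(bytes)
  latin1 = bytes.b.force_encoding('ISO-8859-1')
  [latin1.valid_encoding?, latin1.encode('UTF-8')]
end

# UTF-8 bytes of "日本" (Japan).
valid, text = decode_as_latin1("\xE6\x97\xA5\xE6\x9C\xAC".b)
puts valid          # prints "true": every byte is a valid ISO-8859-1 character
puts text == "日本"  # prints "false": six Latin-1 characters, not two kanji
```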
Can't you do something like: str.is_utf8? ? str : try Iconv.conv('UTF-8', Setting.encodings) ?
The default repository encoding setting is empty, which is equivalent to UTF-8.
Still, I think it is better for the administrator to put UTF-8 at the start of the list explicitly.
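A sketch of how a comma-separated encoding setting might be parsed, with an empty value behaving like UTF-8 as described above; encoding_list and utf8_first? are hypothetical helpers, not Redmine code.

```ruby
# Parse a setting such as "UTF-8,EUC-JP,Shift_JIS" into an ordered list.
# An empty or missing setting is treated as ["UTF-8"].
def encoding_list(setting)
  list = setting.to_s.split(',').map(&:strip).reject(&:empty?)
  list.empty? ? ['UTF-8'] : list
end

# True when UTF-8 is explicitly tried first, as recommended above.
def utf8_first?(list)
  list.first.to_s.upcase == 'UTF-8'
end

p encoding_list('UTF-8, EUC-JP,Shift_JIS')  # prints ["UTF-8", "EUC-JP", "Shift_JIS"]
p utf8_first?(encoding_list('ISO-8859-1,UTF-8'))  # prints false
```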
Updated by Mischa The Evil about 13 years ago
Updated by Toshi MARUYAMA about 13 years ago
- Status changed from New to Closed
- Resolution set to Fixed
Committed in r7885.