Feature #1341
closedkeep consistency between browser encoding and mysql database encoding
0%
Description
Hello,
after trying to lazily import issue directly in mysql database (I know it's very bad to do like this, it's better using ruby importation script via redmine API) I see issue subject (and description too) badly utf-8 encoded :
if I import record via SQL using
INSERT INTO `issues` (`tracker_id`, `project_id`, `subject`, `description`, `due_date`, `category_id`, `status_id`, `assigned_to_id`, `priority_id`, `fixed_version_id`, `author_id`, `lock_version`, `created_on`, `updated_on`, `start_date`, `done_ratio`, `estimated_hours`) VALUES (4, 1, 'é', 'é', NULL, NULL, 1, NULL, 4, 5, 3, 0, '2008-05-30 14:19:43', '2008-05-30 14:19:43', '2008-05-30', 0, NULL);
the resulting database dump for this record is
INSERT INTO `issues` (`id`, `tracker_id`, `project_id`, `subject`, `description`, `due_date`, `category_id`, `status_id`, `assigned_to_id`, `priority_id`, `fixed_version_id`, `author_id`, `lock_version`, `created_on`, `updated_on`, `start_date`, `done_ratio`, `estimated_hours`) VALUES (234, 4, 1, 0xc3a9, 0xc3a9, NULL, NULL, 1, NULL, 4, 5, 3, 0, '2008-05-30 14:19:43', '2008-05-30 14:19:43', '2008-05-30', 0, NULL);
and the result on browser show a '�' char in place of 'é'
If I insert an issue via browser with subject='é' and description='é' the dumped database is
INSERT INTO `issues` (`id`, `tracker_id`, `project_id`, `subject`, `description`, `due_date`, `category_id`, `status_id`, `assigned_to_id`, `priority_id`, `fixed_version_id`, `author_id`, `lock_version`, `created_on`, `updated_on`, `start_date`, `done_ratio`, `estimated_hours`) VALUES (235, 1, 1, 0xc383c2a9, 0xc383c2a9, NULL, NULL, 1, NULL, 4, NULL, 3, 0, '2008-06-01 13:14:20', '2008-06-01 13:14:20', '2008-06-01', 0, NULL);
=> the 'é' char was coded in hex c3 83 c2 a9 (the correct encoding is c3 a9)
This produce "é" in place of "é" in mysql database dump but a correct é char in issue
My knowledge in ruby are not sufficient to reproduce this kind of string encoding interpretation but I do it in python :
first I encode 'é' char in utf-8 by:
>>> unicode("é","utf-8").encode("utf-8") '\xc3\xa9'
If i take each value, declare it as unicode string and recode it in utf-8, I have the same bad coding behavior
>>> (u"\xc3").encode("utf-8") '\xc3\x83' >>> (u"\xa9").encode("utf-8") '\xc2\xa9'
so perhaps there is a double encoding conversion somewhere between what is send from browser to what is write in database ?
Once again importing directly in database is a very bad idea (this is a perfect example) but meanwhile this inconsistency between database coding and page rendering can be source of problem in future ...