Feature #31921
closedChanges to properly support 4 byte characters (emoji) when database is MySQL
0%
Description
Currently, Redmine with MySQL as database doesn't support very well 4 byte characters (for ex: emojis) and you need some experience with MySQL in order to properly configure MySQL (and Redmine) to support those characters. Because of this, there are reported plenty of tickets that I'm going to relate to this issue later.
I would like to work on this in order to improve the default Redmine installation and also, to provide some documentation regarding how you can migrate an existing Redmine installation to support 4 byte characters.
For now, I've created two patches:
1. 0001-Update-default-database-config-for-MySQL.patch
- Adds to the database.yml.example
the required encoding and collation, including a note to inform users that are safe only for new installations.
- Adds a test to ensure that an issue can be created using a emoji in description
- Updates some old instructions
- These settings will be default in Rails 6: https://github.com/rails/rails/pull/33608
2. attachment:0002-Task-to-check-mysql-support-for-utf8mb4.patch
This task checks the MySQL configuration (innodb_file_per_table
, innodb_large_prefix
, innodb_file_format
) and ENGINE
, ROW_FORMAT
and TABLE_COLLATION
for Redmine tables. Any feedback or ideas to improve this task are welcome!
I'll try to create later a Wiki page regarding the steps required to migrate and existing installation or at least some useful links.
Files
Related issues
Updated by Marius BĂLTEANU over 5 years ago
- Related to Defect #24242: Adding comments to ticket returns internal server error added
Updated by Marius BĂLTEANU over 5 years ago
- Related to Defect #28774: Internal Error when Submit the Description with Vietnamese in Unicode fonts added
Updated by Marius BĂLTEANU over 5 years ago
- Related to Defect #27406: Internal Server Error while posting smile in issue's description added
Updated by Marius BĂLTEANU over 5 years ago
- Related to Defect #25959: "Smile of the death" problem added
Updated by Marius BĂLTEANU over 5 years ago
- Related to Defect #24992: MailHandler: an unexpected error occurred when receiving email: invalid byte sequence in UTF-8 added
Updated by Marius BĂLTEANU over 5 years ago
- Related to Defect #23586: Create index on mysql exceed limits added
Updated by Marius BĂLTEANU over 5 years ago
- Related to Defect #22119: Error: Data too long for column 'notes' when copy paste pictures added
Updated by Marius BĂLTEANU over 5 years ago
- Related to Defect #21398: Mysql: 500 server error when submitting 4 bytes utf8 (to be saved in the 'notes' field) added
Updated by Marius BĂLTEANU over 5 years ago
- Related to Defect #20143: Mailhandler cannot handle 4-byte characters added
Updated by Marius BĂLTEANU over 5 years ago
- Related to Patch #19742: RedmineInstall: MySQL: collation_database added
Updated by Marius BĂLTEANU over 5 years ago
- Related to Defect #18866: MySQL: disappear after 4-Byte UTF-8 added
Updated by Marius BĂLTEANU over 5 years ago
- Related to Defect #30848: Error when creating issue with emoji in description added
Updated by Marius BĂLTEANU over 5 years ago
- Related to Defect #27984: Arabic Support Issues added
Updated by Marius BĂLTEANU over 5 years ago
- Related to Defect #27803: Can't post a smiley face in issues description added
Updated by Marius BĂLTEANU over 5 years ago
- Related to Defect #27361: Failed when using Emoji added
Updated by Marius BĂLTEANU over 5 years ago
- Related to Defect #27238: Mysql Error after upgrading Redmine from 2.5 to 3.* added
Updated by Marius BĂLTEANU over 5 years ago
- Related to Defect #26386: Mysql: Unable to update ticket with Emoji added
Updated by Marius BĂLTEANU over 5 years ago
- Related to Defect #24030: When SVN or Git repository has a commit comment include an emoji (4 bytes charactor), error occurs added
Updated by Marius BĂLTEANU over 5 years ago
- Related to Defect #23557: Special (micro) character in message field causes internal server error added
Updated by Marius BĂLTEANU over 5 years ago
- Related to Defect #22618: subject utf-8 char vs mysql2 added
Updated by Marius BĂLTEANU over 5 years ago
- Related to Defect #19334: Error 500 when uploading file with umlaut in filename from Mac added
Updated by Marius BĂLTEANU over 5 years ago
- Related to Defect #10772: 4-byte utf-8 characters added
Updated by Go MAEDA over 5 years ago
As mentioned in #19742#note-3, the collation utf8mb4_unicode_ci is really problematic for Japanese users. I think it is better to leave it to users without setting a default value. At least, we should not set utf8mb4_unicode_ci as the default value. Or, setting utf8mb4_general_ci or utf8mb4_bin is better.
The problem of utf8mb4_unicode_ci is that it treats many different kinds of characters as the same when comparing. For example, the following combinations of Japanese words are treated as the same words.
- "はは" (means mother, pronounced as "haha") and "パパ" (means dad, pronounced as "papa")
- "からす" (means a crow, pronounced as "karasu") and "ガラス" (means glass, pronounced as "garasu")
- "カメラ" (means a camera, pronounced as "kamera") and "ガメラ" (means Gamera, pronounced as "gamera")
- "パリ" (menas Paris, pronounced as "pari") and "バリ" (means Bali in Indonesia, pronounced as "bari")
Imagine that searching the word "からす" hits not only "からす" (a crow) but also "ガラス" (glass). Setting the collation to utf8mb4_unicode_ci makes issues filter and full-text search unusable in some languages.
Updated by Marius BĂLTEANU over 5 years ago
- File 0001-Update-default-database-config-for-MySQL.patch added
- File 0002-Task-to-check-mysql-support-for-utf8mb4.patch added
Thanks for your detailed response, I totally missed that note and now I understand the problem.
I've removed collation
from default settings.
Updated by Marius BĂLTEANU over 5 years ago
- File deleted (
0001-Update-default-database-config-for-MySQL.patch)
Updated by Marius BĂLTEANU over 5 years ago
- File deleted (
0002-Task-to-check-mysql-support-for-utf8mb4.patch)
Updated by Marius BĂLTEANU over 5 years ago
- File deleted (
0001-Update-default-database-config-for-MySQL.patch)
Updated by Marius BĂLTEANU over 5 years ago
- File deleted (
0002-Task-to-check-mysql-support-for-utf8mb4.patch)
Updated by Marius BĂLTEANU over 5 years ago
- File 0001-Update-default-database-config-for-MySQL.patch 0001-Update-default-database-config-for-MySQL.patch added
Go Maeda, it is ok from you point of view to set only the encoding to encoding: utf8mb4
?
Updated by Go MAEDA over 5 years ago
Marius BALTEANU wrote:
Go Maeda, it is ok from you point of view to set only the encoding to
encoding: utf8mb4
?
Yes, I think it is OK.
But I am not sure whether the comment "# Remove encoding for existing installation on MySQL" is necessary.
It would be even better if there is a comment like "Use "utf8" instead of "utf8mb4" if you use MySQL earlier than #{VERSION}".
Updated by Go MAEDA over 5 years ago
I think the change against test/unit/issue_test.rb should not be merged because it may break tests on the official CI server if the encoding of the MySQL database on the CI server is not utf8mb4.
http://www.redmine.org/builds/
No one except Jean-Philippe Lang can update configurations of the CI server.
Updated by Go MAEDA over 5 years ago
I updated Marius's patch:
- Removed "with ruby1.9". The version of Ruby is unnecessary because Redmine no longer supports Ruby 1.8 that require 'mysql' adapter instead of 'mysql2' adapter. Please see the example file in Redmine 2.6 for reference: source:tags/2.6.9/config/database.yml.example
- Removed the test not to break the CI server
- Added the comment "Use "utf8" instead of "utfmb4" for MySQL prior to 5.7.7"
- Removed "# Remove encoding for existing installation on MySQL" because an existing installation may require encoding setting
diff --git a/config/database.yml.example b/config/database.yml.example
index 57bc51605..727b4d89b 100644
--- a/config/database.yml.example
+++ b/config/database.yml.example
@@ -1,4 +1,4 @@
-# Default setup is given for MySQL with ruby1.9.
+# Default setup is given for MySQL 5.7.7 or later.
# Examples for PostgreSQL, SQLite3 and SQL Server can be found at the end.
# Line indentation must be 2 spaces (no tabs).
@@ -8,7 +8,8 @@ production:
host: localhost
username: root
password: ""
- encoding: utf8
+ # Use "utf8" instead of "utfmb4" for MySQL prior to 5.7.7
+ encoding: utf8mb4
development:
adapter: mysql2
@@ -16,7 +17,8 @@ development:
host: localhost
username: root
password: ""
- encoding: utf8
+ # Use "utf8" instead of "utfmb4" for MySQL prior to 5.7.7
+ encoding: utf8mb4
# Warning: The database defined as "test" will be erased and
# re-generated from your development database when you run "rake".
@@ -27,7 +29,8 @@ test:
host: localhost
username: root
password: ""
- encoding: utf8
+ # Use "utf8" instead of "utfmb4" for MySQL prior to 5.7.7
+ encoding: utf8mb4
# PostgreSQL configuration example
#production:
Updated by Marius BĂLTEANU over 5 years ago
- Assignee deleted (
Marius BĂLTEANU) - Target version changed from Candidate for next major release to 4.1.0
Go MAEDA wrote:
I updated Marius's patch:
- Removed "with ruby1.9". The version of Ruby is unnecessary because Redmine no longer supports Ruby 1.8 that require 'mysql' adapter instead of 'mysql2' adapter. Please see the example file in Redmine 2.6 for reference: source:tags/2.6.9/config/database.yml.example
- Removed the test not to break the CI server
- Added the comment "Use "utf8" instead of "utfmb4" for MySQL prior to 5.7.7"
- Removed "# Remove encoding for existing installation on MySQL" because an existing installation may require encoding setting
Thanks for updating my patch, I agree with your changes. Regarding the test, I'll create a new ticket assigned to Jean-Philippe to update the CI server. Until then, I propose to deliver this in 4.1.0.
Updated by Go MAEDA over 5 years ago
- Status changed from New to Closed
- Assignee set to Go MAEDA
- Resolution set to Fixed
Committed the updated patch in #31921#note-33. Thanks.
Updated by Marius BĂLTEANU over 5 years ago
- Related to Patch #32054: Add test for 4 byte characters (emoji) support added
Updated by Deoren Moor about 5 years ago
Marius BALTEANU wrote:
I would like to work on this in order to improve the default Redmine installation and also, to provide some documentation regarding how you can migrate an existing Redmine installation to support 4 byte characters.
...
I'll try to create later a Wiki page regarding the steps required to migrate and existing installation or at least some useful links.
I understand that this feature has only just released today in v4.1.0, but is there a new page somewhere covering the conversion steps? I looked for it, but didn't spot it.
If the page doesn't exist yet, would you be willing to stub out such a page so that others could opt-in for (Watch) notifications when the page is updated?
Many thanks for taking the time to push this forward.
Updated by Gérard L almost 5 years ago
Deoren Moor wrote:
Marius BALTEANU wrote:
I would like to work on this in order to improve the default Redmine installation and also, to provide some documentation regarding how you can migrate an existing Redmine installation to support 4 byte characters.
...
I'll try to create later a Wiki page regarding the steps required to migrate and existing installation or at least some useful links.
I understand that this feature has only just released today in v4.1.0, but is there a new page somewhere covering the conversion steps? I looked for it, but didn't spot it.
If the page doesn't exist yet, would you be willing to stub out such a page so that others could opt-in for (Watch) notifications when the page is updated?
Many thanks for taking the time to push this forward.
+1
Updated by Martin Rys almost 5 years ago
With the help of this article I've managed to convert from utf8 to utf8mb4 on a 4.1.0 bitnami install
http://blog.andriylesyuk.com/4-byte-utf-8-problem-in-redmine-and-how-to-fix-it/
Both database.yml and my.cnf were using utf8mb4, so there was nothing to do there for me.
I've already had row_format set to DYNAMIC (meaning InnoDB was already Barracuda and not Antelope, verified via SHOW TABLE STATUS), so I only listed all tables in the database
SELECT table_name, CCSA.character_set_name, CCSA.collation_name FROM information_schema.`TABLES` T, information_schema.`COLLATION_CHARACTER_SET_APPLICABILITY` CCSA WHERE CCSA.collation_name = T.table_coll ation AND T.table_schema = "bitnami_redmineplusagile";
And converted every single table from utf8 to utf8mb4 with the appropriate collation (Czech here since this is a Czech install)
... ALTER TABLE wikis CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_czech_ci; ALTER TABLE workflows CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_czech_ci;
After that I verified that everything works by posting a comment in an issue with an emoji
EDIT: Ironically it seems that posting an emoji in here will cut the text off.
Updated by Holger Just over 4 years ago
- Has duplicate Defect #33649: Ascii code in the 'Description' causes 500 error added
Updated by Marius BĂLTEANU almost 4 years ago
- Related to Defect #34235: Losing data due to unsupported characters added
Updated by Marius BĂLTEANU almost 4 years ago
- Has duplicate Defect #34574: Email Fetcher catch no Mails with smiley added