CommonMark Markdown Text Formatting
This patch introduces a new text formatting option named CommonMark Markdown (GitHub Flavored). It is based on CommonMarker and HTMLPipeline. The formatter was extracted from Planio, where it will soon become the default for new accounts.
We built this instead of going with the existing RedCarpet Markdown implementation for a number of reasons:
- From time to time users who are using the current Markdown formatter ask for a spec / formal list of all supported features. No such thing exists for RedCarpet. There is CommonMark but RedCarpet isn't going to support it in the short to medium term (see next point).
- The future development of RedCarpet is uncertain. A few excerpts from a GitHub issue from a year ago:
Commonmark won't be supported anytime soon
A general message about the project for people skimming this thread: I'm sorry Redcarpet isn't really active anymore but my resources are pretty limited since I'm committed to other open source projects and still a student. Feel free to use any existing alternative; this project isn't the great tool it used to be when it was maintained by Vicent.
- With CommonMark evolving as a Markdown spec that is supported by many implementations and endorsed by organizations like GitLab and GitHub (both of which switched from RedCarpet to CommonMarker a while ago), it is quickly becoming what users expect when they hear 'Markdown'.
- Migrating existing Textile content is a bit easier since Pandoc has a dedicated GitHub Flavored Markdown writer module.
- The HTML pipeline approach encourages splitting up the formatting process into its different aspects (HTML generation, sanitizing, syntax highlighting etc.), which allows for better testability and has potential for future re-use of components with other text formatters. Further, HTML pipeline filters work on actual DOM nodes, making tasks like adding classes to links much more straightforward and less prone to bugs than doing so with regular expressions.
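To illustrate the last point, here is a minimal Ruby sketch of the filter-chain idea. It is not code from the patch and does not use the html-pipeline gem: `Node`, `run_pipeline` and both filters are hypothetical names, and the `Node` struct is only a toy stand-in for a parsed DOM so that the example runs with the standard library alone.

```ruby
# Toy stand-in for a DOM node; a real pipeline would work on a parsed HTML document.
Node = Struct.new(:name, :attrs, :children) do
  # Visit this node first, then every descendant element.
  def each_node(&block)
    block.call(self)
    children.each { |child| child.each_node(&block) if child.is_a?(Node) }
  end
end

# Each filter receives the tree and returns it, so filters can be chained.
strip_scripts = lambda do |root|
  root.each_node do |node|
    node.children.reject! { |child| child.is_a?(Node) && child.name == "script" }
  end
  root
end

# Adds the "external" class to absolute links by editing node attributes,
# with no regular expressions over the HTML string.
external_links = lambda do |root|
  root.each_node do |node|
    next unless node.name == "a" && node.attrs["href"].to_s.start_with?("http")
    node.attrs["class"] = [node.attrs["class"], "external"].compact.join(" ")
  end
  root
end

def run_pipeline(root, filters)
  filters.reduce(root) { |tree, filter| filter.call(tree) }
end

doc = Node.new("p", {}, [
  Node.new("a", { "href" => "https://example.org" }, ["a link"]),
  Node.new("script", {}, ["alert(1)"])
])
run_pipeline(doc, [strip_scripts, external_links])
```

Because every filter has the same shape, each one can be tested in isolation and reused in a pipeline for another formatter.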
Last but not least, this formatter solves a number of currently open issues regarding the RedCarpet based Markdown Formatter:
- #19880 (Incorrect syntax for links in Markdown)
- #20497 (Markdown formatting supporting HTML)
- #20841 (Bare URLs in Markdown don't have "external" class)
- #29172 (Markdown: External links broken)
The main reason why we want to introduce this as a third formatting option and not just replace the existing Markdown formatter is line endings: this formatter does not insert hard breaks (<br>) for simple newlines, but requires either a \ or two spaces at the end of the line to do so. Over time the new formatter could become the default and the RedCarpet based one might be renamed to Markdown (legacy) or similar.
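To make the line-ending difference concrete, here is a minimal Ruby sketch (not part of the patch) that lists the lines of a text that end in a simple newline inside a paragraph. Per the description above, these are the places where this formatter emits no hard break unless the line ends in a backslash or two spaces. The method name `soft_breaks` is hypothetical, and the sketch deliberately ignores code blocks, lists and other block-level constructs.

```ruby
# Hypothetical helper: returns the 1-based numbers of lines that end in a
# simple newline and are directly followed by more text of the same paragraph.
def soft_breaks(text)
  lines = text.lines.map(&:chomp)
  result = []
  lines.each_with_index do |line, index|
    following = lines[index + 1]
    # Only a break between two non-blank lines is relevant.
    next if following.nil? || line.strip.empty? || following.strip.empty?
    # A trailing backslash or two trailing spaces request a hard break explicitly.
    next if line.end_with?("\\") || line.end_with?("  ")
    result << index + 1
  end
  result
end

sample = "first line\nsecond line  \nthird line\\\nfourth line\n\nnew paragraph\n"
p soft_breaks(sample) # prints [1]
```

Only the first line is reported: the second and third lines request a hard break explicitly, and the fourth is followed by a blank line.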
English help files are included, and the patch makes sure these are delivered for any other languages as well until corresponding localized versions are available.
- Target version changed from Candidate for next major release to 4.2.0
I agree that Redcarpet is not active anymore. Only 3 commits have been made this year. And I think many Markdown users expect Redmine to be CommonMark/GFM compliant because many apps/services support it.
Let's start the discussion to deliver this in 4.2.0.
#9 Updated by Martin Cizek 2 months ago
- File 0002-attachments_helper-commonmark.patch added
Love you guys! We have been playing with markup format conversions and preparing bulletproof arguments for integrating commonmarker + HTML sanitization for almost half a year. :) Jens did this job perfectly and the only unused argument regarding Redcarpet was #32563. I even tried to make a bulletproof pull request to solve an unpleasant Redcarpet issue observed in our Textile->Markdown migration (since 2013) to test if the project is really dead. It is.
Just a very small improvement is attached.
I'd sign every word written by Jens and would like to share a few remarks in subsequent comments, hope they help.
Go MAEDA, would you mind adding #22323 to related issues regarding the line breaks?
#10 Updated by Martin Cizek 2 months ago
Temporary workaround before the patch is merged
At the moment, it is possible to use the redmine_common_mark plugin. It also allows for configuring commonmarker, but it has no HTML sanitizing implemented. Actually I found this patch when I was about to offer a pull request with html-pipeline to the plugin's author.
We’d still appreciate merging this patch ASAP.
#11 Updated by Martin Cizek 2 months ago
These are the options used by GitLab. The options in the patch are the same (thumbs up!). GitHub's configs have a few differences (it does not mean that we want them):
- STRIKETHROUGH_DOUBLE_TILDE is not used in repos.
- It uses HARDBREAKS in their issues, which is inconsistent with their wiki and repository rendering.
- They have the tasklist extension enabled. I'd consider enabling it in Redmine, as tasklists is an officially documented GFM feature.
I can imagine that some Redmine users would like other options than us, as we have seen before in Redmine. And it would be quite legit e.g. for the tasklists mentioned above.
Making commonmarker configurable can mean way too many config options, which is difficult to support. A good compromise might be to make a hook for plugins, so that they can change the config.
I can create a follow-up ticket after getting some feedback.
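As a rough idea of what such a hook could look like, here is a minimal Ruby sketch. Everything in it is an assumption: `FormatterConfig`, `register_hook` and `resolve` are hypothetical names that exist neither in Redmine nor in the patch, and the default options are illustrative placeholders, not the options the patch actually uses.

```ruby
# Hypothetical API, for illustration only.
module FormatterConfig
  # Placeholder defaults, not the patch's real option set.
  DEFAULTS = {
    extensions: %i[table strikethrough autolink].freeze,
    hardbreaks: false
  }.freeze

  @hooks = []

  class << self
    # A plugin registers a block that receives the current config
    # and returns a modified copy.
    def register_hook(&block)
      @hooks << block
    end

    # Applies all hooks in registration order, starting from the defaults.
    def resolve
      @hooks.reduce(DEFAULTS) { |config, hook| hook.call(config).freeze }
    end
  end
end

# Example: a plugin that enables task lists without touching core code.
FormatterConfig.register_hook do |config|
  config.merge(extensions: config[:extensions] + [:tasklist])
end

p FormatterConfig.resolve
```

The core keeps a single set of supported defaults, while installations that want something else carry that choice in a plugin instead of a growing list of settings.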
#12 Updated by Martin Cizek 2 months ago
HTML sanitizing is currently embedded in the commonmark formatter in the patch. That's a good start.
As #807 suggests, sanitizing should rather be a shared concept, eventually with small differences for different formats. GitLab did it this way: their base_sanitization_filter.rb contains the common sanitization setup and the CommonMark sanitization_filter.rb customizes the HTML::Pipeline::SanitizationFilter on top of that.
I can create a follow-up ticket after getting some feedback.
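The layering idea can be sketched in a few lines of plain Ruby. This is only an illustration under stated assumptions: the allowlists below are made up, `customize` is a hypothetical helper, and none of it reflects the actual rules used by the patch, by GitLab, or by HTML::Pipeline::SanitizationFilter.

```ruby
# Hypothetical shared allowlist that every formatter would start from.
BASE_ALLOWLIST = {
  elements: %w[p a strong em code pre ul ol li],
  attributes: { "a" => %w[href] }
}.freeze

# Builds a format-specific allowlist on top of the shared base
# without modifying the base itself.
def customize(base, elements: [], attributes: {})
  {
    elements: (base[:elements] + elements).uniq,
    attributes: base[:attributes].merge(attributes) do |_tag, shared, extra|
      (shared + extra).uniq
    end
  }
end

# Example: a CommonMark-specific variant that additionally allows tables.
COMMONMARK_ALLOWLIST = customize(
  BASE_ALLOWLIST,
  elements: %w[table thead tbody tr th td del],
  attributes: { "a" => %w[title], "td" => %w[align] }
)

p COMMONMARK_ALLOWLIST[:attributes]
```

A Textile variant could be derived from the same base in the same way, so a fix to the shared rules would reach every formatter.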
#13 Updated by Martin Cizek 2 months ago
Textile to Markdown migration
Pandoc is actually bad at this job. We tried Jens's redmine_convert_textile_to_markdown (thanks for that!), we ran rendering comparison tests [1] covering hundreds of thousands of strings and the results were poor.
[1] Rendering comparison test = grab all rendered issues and wiki pages from the Textile Redmine instance and a converted-to-Markdown Redmine instance, normalize the HTML, compare the two HTML outputs.
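The comparison step of such a test can be sketched as follows. This is not the code we used: `normalize_html` and `same_rendering?` are hypothetical names, and the normalization is deliberately crude (whitespace only), so a real test would need a proper HTML parser.

```ruby
# Hypothetical normalizer: collapses whitespace runs and drops whitespace
# between tags so that purely cosmetic differences do not count as mismatches.
# Note: whitespace between inline tags can be significant; this ignores that.
def normalize_html(html)
  html.gsub(/\s+/, " ").gsub(/>\s+</, "><").strip
end

# True if both renderings are equal after normalization.
def same_rendering?(html_a, html_b)
  normalize_html(html_a) == normalize_html(html_b)
end

textile_html  = "<p>Hello <strong>world</strong></p>\n\n<p>Bye</p>\n"
markdown_html = "<p>Hello   <strong>world</strong></p><p>Bye</p>"
puts same_rendering?(textile_html, markdown_html) # prints true
```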
So we forked it, and similarly to Jens, we were adding more and more preprocessors to make Pandoc happy and postprocessors to render it correctly. This is the latest version of the fork. Later, we reworked it completely into a new project, which we'll publish soon.
But if we were doing it again, we would get rid of Pandoc completely. The preprocessing is done by partial rendering using code adapted from Redmine / RedCloth3. The amount of code is comparable to normal rendering and invoking Pandoc just makes it slow.
Pandoc is a great tool, but this use case is just too specific for a universal format converter.
So the message is just: a good converter exists for Redmine. :)
#14 Updated by Jens Krämer 2 months ago
Hi, thanks for the feedback!
Nice catch about the rendering of Markdown files, that indeed should be switched to commonmark as well. Also great work on the Migrator :)
General use of HTML Pipeline
It would really make sense to use HTML pipeline for Textile rendering as well, as it will open up a lot of options for refactoring and sharing code (i.e. for HTML sanitization) between the different pipelines via filters. I intentionally stopped before doing this to gauge interest and get commonmark in as quickly as possible. We can (and imho should) rework the Textile rendering at a later point, as well as do some cleanup by moving rendering related code from helpers to the pipeline. The same could be done to the Redcarpet based markdown formatter if we decide to still keep that around at this time.
For the sake of not overcomplicating this already complex patch I went with what I think are sensible defaults. I have no idea why Github is so inconsistent with their line break rendering and would definitely stay away from doing the same.
Introducing a way for configuring the rendering pipeline through plugins makes sense, maybe as part of the refactoring when we move over all formatters to the pipeline?
Regarding the task lists - to be honest I left this out since at Planio this would collide with the checklists plugin we have as a standard feature. Also, this would require additional code to handle the checking / unchecking of the checkboxes in the rendered text (everywhere markdown is rendered), again making the patch more complex. I feel like this also really only makes sense in issue descriptions and (maybe) wiki pages. So this would be a case where we need a slightly different pipeline config depending on the context.
#16 Updated by Mischa The Evil 2 months ago
Hans Riekehof wrote in #note-15:
[...] I know it's always hard to say but is there any roadmap when this patch is available in an official release ?
Given all the parameters (size/scope/state/complexity of the patch, current Redmine release cycle, current issue scheduling) I'd say not before the end of 2020. However, this does not mean that it won't be available on the Redmine trunk earlier.
NB: Please keep this issue clean, focused, on-topic and free from superfluous comments...
Thank you for posting the patch.
I think we should remove Redcarpet instead of adding a second Markdown formatter. Although I understand that CommonMarker is not fully compatible with Redcarpet, having two different Markdown formatters introduces some problems.
- It is confusing. Some users may not understand why Redmine has two Markdown formatters and may be unable to decide which one they should use
- Consumes more memory
Someday after adding support for CommonMarker, we will need to remove Redcarpet. In my opinion, the moment we add CommonMarker is the best opportunity to remove Redcarpet. Replacing Redcarpet with CommonMarker is a simple and understandable story for everyone.
#19 Updated by Marius BALTEANU 11 days ago
Jens Krämer wrote:
I agree regarding complexity / confusion of users. So in order to move this forward, should I proceed to change the patch so it directly replaces Redcarpet with CommonMarker for the
In my opinion, it is a good move to replace Redcarpet with CommonMarker, but I don't think that we can do it without helping users to migrate their existing content. GitLab did this kind of migration and they had a very nice process: https://about.gitlab.com/blog/2019/06/13/how-we-migrated-our-markdown-processing-to-commonmark/
#20 Updated by Jens Krämer 11 days ago
Well, GitLab is a hosted platform with paying customers, so they had to be extra careful. I don't know if/how that gradual release of commonmark also happened for users of their open source version on their own servers. From reading that article it appears to me that they did not encounter many issues with existing content since they do not mention having done any automated conversions of Redcarpet data to commonmark.
It would certainly up complexity quite a bit if we were to add support for different formatters at the same time (i.e., rendering old content with RedCarpet and only new content with CommonMarker)...
GitLab's diff tool could indeed be used as inspiration for a plugin that people install before upgrading, to get an idea of what percentage of their content would render differently after the upgrade. I wouldn't build an automated converter however - it's really just a few corner cases and chances are such a tool breaks more than it fixes.
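The core of such a plugin could be as small as the following Ruby sketch. It is an illustration, not a proposal for the actual implementation: `changed_percentage` is a hypothetical name, and the two lambdas are stubs that only mimic the line-break behaviour of the two formatters instead of calling Redcarpet or CommonMarker.

```ruby
# Hypothetical report: both renderers are passed in as callables, so the real
# formatters could be plugged in later.
def changed_percentage(texts, old_renderer, new_renderer)
  return 0.0 if texts.empty?
  changed = texts.count { |text| old_renderer.call(text) != new_renderer.call(text) }
  (changed * 100.0 / texts.size).round(1)
end

# Stub renderers, assumptions only: the legacy one turns every newline into a
# hard break, the new one leaves simple newlines alone.
legacy_stub     = ->(text) { "<p>#{text.gsub("\n", "<br>\n")}</p>" }
commonmark_stub = ->(text) { "<p>#{text}</p>" }

texts = ["one line", "two\nlines", "another single line", "a\nb"]
puts changed_percentage(texts, legacy_stub, commonmark_stub) # prints 50.0
```

Run over all issue descriptions, comments and wiki pages, a number like this would tell admins before the upgrade whether they need to worry at all.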