Markdown text sections broken by thematic breaks (horizontal rules)
A thematic break composed of hyphens (e.g. a line containing only "---") breaks the division of Markdown text into individually editable sections.
Steps to reproduce
With Markdown text formatting enabled, create a Wiki page in a web browser and enter the following content:
    # Title

    ## Heading 2

    Preceding CRLF is the default for web-submitted data.

    ---

    End of thematic breaks.

    ## Heading 2

    Nulla nunc nisi, egestas in ornare vel, posuere ac libero.
More in the unit tests in the enclosed patch.
The reason is that the thematic break is mistaken for a setext heading underline. The current regexp in extract_sections does try to restrict setext headings so that the underline must follow a non-empty line, but it does not account for a whitespace-only line or even a plain CRLF. Text submitted from a web browser always uses CRLF line endings, so the problem is quite common even for carefully formatted text.
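To illustrate the failure mode, here is a minimal Python sketch. Both patterns are hypothetical and written only for this example; they are not the regexp used in extract_sections, and the enclosed patch may fix the problem differently. The sketch assumes that a "non-empty line" check is written with `.+`, which also matches a lone carriage return or a run of spaces, so a blank line that arrives as "\r" still counts as non-empty.

```python
import re

# Hypothetical patterns for illustration only; not Redmine's actual regexp.
# Naive check: the underline must follow a "non-empty" line, tested with `.+`.
# `.` matches "\r" and spaces, so a blank CRLF line passes as non-empty.
NAIVE_SETEXT = re.compile(r"^.+\n(?:=+|-+)[ \t]*\r?$", re.MULTILINE)

# Stricter check: the preceding line must contain a non-whitespace character.
STRICT_SETEXT = re.compile(r"^[ \t]*\S.*\n(?:=+|-+)[ \t]*\r?$", re.MULTILINE)


def has_setext_heading(pattern, text):
    """Return True if the pattern finds a setext-style heading in text."""
    return pattern.search(text) is not None


# A thematic break preceded by a blank line, with browser-style CRLF endings.
thematic_break = (
    "Preceding paragraph.\r\n"
    "\r\n"
    "---\r\n"
    "\r\n"
    "End of thematic breaks.\r\n"
)

# A genuine setext heading, also with CRLF endings.
real_heading = "Heading 2\r\n---------\r\n\r\nBody text.\r\n"

# The same thematic break, preceded by a whitespace-only line.
spaces_before_break = "Preceding paragraph.\n   \n---\n"

for name, pattern in (("naive", NAIVE_SETEXT), ("strict", STRICT_SETEXT)):
    print(
        name,
        has_setext_heading(pattern, thematic_break),
        has_setext_heading(pattern, real_heading),
        has_setext_heading(pattern, spaces_before_break),
    )
```

Under these assumptions, the naive pattern reports a setext heading for all three inputs, while the stricter one reports it only for the genuine heading. Requiring `\S` on the preceding line is just one possible way to express "non-empty"; it is shown here because it handles both the lone "\r" and the whitespace-only case.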
Attaching a patch with a fix and corresponding unit tests.
The current approach to section extraction is inherently fragile, as shown by the other (skipped) unit test enclosed in the patch. I'd suggest keeping the skipped test there to mark it as a known issue. I will create a dedicated issue for this.
#1 Updated by Martin Cizek about 1 month ago
#2 Updated by Martin Cizek about 1 month ago
Just to make it clear: the skipped test addresses an already existing error, which is probably not worth fixing with the current approach in the extract_sections implementations. See #35037 for more details.
However, the specific and common error of mishandling thematic breaks is solved by the patch "as is"; it can simply be applied. :)