Defect #40020
closed
ScmData.binary? incorrectly considers UTF-8 text as binary
0%
Description
Currently, the binary? method in Redmine::Scm::Adapters::ScmData often misclassifies Unicode text as binary. This is because the method actually checks whether the given data is ASCII text or not.
The new implementation in the attached patch checks for control characters excluding tabs, newlines, and carriage returns, and calculates their proportion in the data. It ensures accurate detection of binary data while properly handling Unicode text.
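The heuristic described above might look roughly like the following. This is a minimal sketch, not the code from the attached patch: the names control_byte? and probably_binary? are hypothetical, and counting 0x7F (DEL) as a control character and using a strict "greater than" comparison are assumptions.

```ruby
# Hypothetical sketch of a control-character-ratio check.
# Not the actual Redmine::Scm::Adapters::ScmData implementation.

# Control bytes that are normal in text files and therefore not counted:
# tab (0x09), newline (0x0A), and carriage return (0x0D).
ALLOWED_WHITESPACE = [0x09, 0x0A, 0x0D].freeze

# Assumption: "control character" means 0x00-0x1F plus 0x7F (DEL),
# minus the allowed whitespace bytes above.
def control_byte?(byte)
  (byte < 0x20 || byte == 0x7F) && !ALLOWED_WHITESPACE.include?(byte)
end

# Returns true when the proportion of control bytes exceeds the threshold.
# Bytes >= 0x80 (as used by multi-byte UTF-8 sequences) are never counted,
# so non-ASCII text is not penalized.
def probably_binary?(data, threshold = 0.1)
  return false if data.nil? || data.empty?

  control_count = data.each_byte.count { |b| control_byte?(b) }
  control_count.to_f / data.bytesize > threshold
end
```

With this sketch, a UTF-8 string such as "こんにちは、世界\n" is treated as text, because none of its bytes fall in the counted control range, while data that is mostly NUL and other low bytes is treated as binary.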
Files
Updated by Go MAEDA 11 months ago
- File 0001-Fix-ScmData.binary-method-not-to-consider-UTF-8-text.patch added
I have lowered the threshold, that is, the proportion of control characters in the data at which the method considers the data to be binary, from 0.3 to 0.1.
Since the number of control characters is approximately 11% of the 256 values that can be expressed in a single byte, I believe that a threshold value of 0.1 is enough.
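The "approximately 11%" figure can be checked with a short calculation. This is only an illustrative sketch: the constant names are hypothetical, and whether 0x7F (DEL) is counted is an assumption, so both variants are shown.

```ruby
# Tab, newline, and carriage return are excluded from the count.
EXCLUDED = [0x09, 0x0A, 0x0D].freeze

# Control bytes in 0x00-0x1F, minus the three excluded ones.
C0_CONTROLS = (0x00..0x1F).reject { |b| EXCLUDED.include?(b) }

# Same set, with 0x7F (DEL) added (assumption).
C0_CONTROLS_WITH_DEL = C0_CONTROLS + [0x7F]

RATIO = C0_CONTROLS.size / 256.0                    # 29 / 256 = 0.11328125
RATIO_WITH_DEL = C0_CONTROLS_WITH_DEL.size / 256.0  # 30 / 256 = 0.1171875

puts format("%d of 256 byte values (%.1f%%)", C0_CONTROLS.size, RATIO * 100)
puts format("%d of 256 byte values (%.1f%%)", C0_CONTROLS_WITH_DEL.size, RATIO_WITH_DEL * 100)
```

Either way the share is a little over 11%, slightly above the 0.1 threshold.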