Tighten up tests for `merge_whitesp` job
Now that we've had chance to see how this job functions with real world MRs, rather than just with artificial test cases, it's apparent that it is far too sensitive and generates far too many false positives to be useful as it currently stands. Problems include:
When two nearby groups of lines are swapped over, the underlying
git diffcommands can make different decisions about which group is fixed and which was removed from one place and inserted into another. When our job script finds those lines unchanged in the
-bdiff and changed in the non-
-bdiff, it incorrectly assumes the presence of a whitespace change.
Even minor indentation changes get flagged up as an error. Even something as simple as changing
if (foo) bar();
- In order to pass
head_whitespon a component that hasn't been touched recently, it's necessary to strip the trailing whitespace from the
VersionNumfile even though once the MR is accepted, this will happen automatically. Yet doing this gets picked up as a whitespace error change by
merge_whitesp, so it's impossible to simultaneously pass all jobs!
We could just throw away this job, but there is still an important case that ideally we do want to catch: where a source file was originally imported from an upstream project which includes whitespace errors - meaning that we want to preserve them to keep tracking the upstream project easy - and when one of our contributors has accidentally stripped all trailing whitespace from the file. We want this to be caught at the review stage - better still by the contributor, before any reviewers have to get involved. And a CI job can still serve this purpose well.
To distinguish these different cases, we can take into account the fact that the accidental space-stripping scenario will typically alter many lines a long way away from where the new contributor was working. Thus, the job script is tightened as follows:
- Lines "nearby" a diff
-bhunk are discounted, where "nearby" is arbitrarily defined as within +/- 20 lines.
- At least one line in the "removed" half of the non-
-bhunk must have contained a whitespace error. It's impossible to exactly match up the individual "removed" and "added" lines in the general case, because there are often different numbers of lines in each half of a hunk.
- A threshold number of flagged lines (currently 10) must be detected across the whole project before the test is considered to have failed, although we still print any lines that we find.
- The test for introduction of whitespace errors is removed, since this
duplicates the functionality of the