Commit aec716b3 authored by Ben Avison's avatar Ben Avison
Browse files

Tighten up tests for `merge_whitesp` job

Now that we've had chance to see how this job functions with real world MRs,
rather than just with artificial test cases, it's apparent that it is far too
sensitive and generates far too many false positives to be useful as it
currently stands. Problems include:

* When two nearby groups of lines are swapped over, the underlying `-b` and
  non-`-b` `git diff` commands can make different decisions about which group
  is fixed and which was removed from one place and inserted into another.
  When our job script finds those lines unchanged in the `-b` diff and
  changed in the non-`-b` diff, it incorrectly assumes the presence of a
  whitespace change.

* Even minor indentation changes get flagged up as an error. Even something
  as simple as changing
    if (foo)

* In order to pass `head_whitesp` on a component that hasn't been touched
  recently, it's necessary to strip the trailing whitespace from the
  `VersionNum` file even though once the MR is accepted, this will happen
  automatically. Yet doing this gets picked up as a whitespace error change
  by `merge_whitesp`, so it's impossible to simultaneously pass all jobs!

We could just throw away this job, but there is still an important case that
ideally we do want to catch: where a source file was originally imported from
an upstream project which includes whitespace errors - meaning that we want
to preserve them to keep tracking the upstream project easy - and when one of
our contributors has accidentally stripped all trailing whitespace from the
file. We want this to be caught at the review stage - better still by the
contributor, before any reviewers have to get involved. And a CI job can
still serve this purpose well.

To distinguish these different cases, we can take into account the fact that
the accidental space-stripping scenario will typically alter many lines a
long way away from where the new contributor was working. Thus, the job
script is tightened as follows:

* Lines "nearby" a diff `-b` hunk are discounted, where "nearby" is
  arbitrarily defined as within +/- 20 lines.
* At least one line in the "removed" half of the non-`-b` hunk must have
  contained a whitespace error. It's impossible to exactly match up the
  individual "removed" and "added" lines in the general case, because there
  are often different numbers of lines in each half of a hunk.
* A threshold number of flagged lines (currently 10) must be detected across
  the whole project before the test is considered to have failed, although we
  still print any lines that we find.
* The test for introduction of whitespace errors is removed, since this
  duplicates the functionality of the `head_whitesp` job.
parent efdce52e
Pipeline #3568 passed with stage
in 4 seconds