• Ben Avison's avatar
    Tighten up tests for `merge_whitesp` job · aec716b3
    Ben Avison authored
    Now that we've had chance to see how this job functions with real world MRs,
    rather than just with artificial test cases, it's apparent that it is far too
    sensitive and generates far too many false positives to be useful as it
    currently stands. Problems include:
    
    * When two nearby groups of lines are swapped over, the underlying `-b` and
      non-`-b` `git diff` commands can make different decisions about which group
      is fixed and which was removed from one place and inserted into another.
      When our job script finds those lines unchanged in the `-b` diff and
      changed in the non-`-b` diff, it incorrectly assumes the presence of a
      whitespace change.
    
    * Even minor indentation changes get flagged up as an error. Even something
      as simple as changing
        bar();
      to
        if (foo)
          bar();
    
    * In order to pass `head_whitesp` on a component that hasn't been touched
      recently, it's necessary to strip the trailing whitespace from the
      `VersionNum` file even though once the MR is accepted, this will happen
      automatically. Yet doing this gets picked up as a whitespace error change
      by `merge_whitesp`, so it's impossible to simultaneously pass all jobs!
    
    We could just throw away this job, but there is still an important case that
    ideally we do want to catch: where a source file was originally imported from
    an upstream project which includes whitespace errors - meaning that we want
    to preserve them to keep tracking the upstream project easy - and when one of
    our contributors has accidentally stripped all trailing whitespace from the
    file. We want this to be caught at the review stage - better still by the
    contributor, before any reviewers have to get involved. And a CI job can
    still serve this purpose well.
    
    To distinguish these different cases, we can take into account the fact that
    the accidental space-stripping scenario will typically alter many lines a
    long way away from where the new contributor was working. Thus, the job
    script is tightened as follows:
    
    * Lines "nearby" a diff `-b` hunk are discounted, where "nearby" is
      arbitrarily defined as within +/- 20 lines.
    * At least one line in the "removed" half of the non-`-b` hunk must have
      contained a whitespace error. It's impossible to exactly match up the
      individual "removed" and "added" lines in the general case, because there
      are often different numbers of lines in each half of a hunk.
    * A threshold number of flagged lines (currently 10) must be detected across
      the whole project before the test is considered to have failed, although we
      still print any lines that we find.
    * The test for introduction of whitespace errors is removed, since this
      duplicates the functionality of the `head_whitesp` job.
    aec716b3
refresh.awk 45 KB