Reasons and Reasoning: What’s the right level of record keeping of evaluation decisions?

Summary: Has a sensible middle-ground been reached for evaluation and moderation record keeping?

The roots of the obligation to keep sufficiently transparent records of the evaluation and moderation process are in the Public Contracts Regulations 2015. Contracting authorities must “document the progress of all procurement procedures” and “To that end… ensure that they keep sufficient documentation to justify decisions taken in all stages of the procurement procedure, such as documentation on….communications with economic operators and internal deliberations….[and] selection and award of the contract.” [1]

The challenge – both for authorities seeking to comply with the regulations, and potential challengers seeking disclosure of decision-making materials or considering their pleaded case on transparency – exists around:

what amounts to “sufficient documentation to justify decisions”; and
whether the Authority is properly reconciling the principles of transparency and proportionality in respect of documenting that decision-making.

Recent judicial commentary in the Braceurself litigation, when considered alongside the trajectory of preceding cases, is the latest in a line of cases seeking to establish a middle-ground of evaluation record keeping that may be legally compliant.

In summary, it’s clear that there must be a written record of each stage of the evaluation process: some form of ‘audit trail’. However, the extent of record keeping at each stage should represent a “sensible balance” between transparency and the administrative (and potentially strategic) burden, which increases the more detailed these records become. For example, in some cases, even very limited notes may suffice provided that they “capture the essence” of how a score was arrived at. So there is some latitude about how the Authority documents the journey that leads to the final rationales and scores. What appears to have remained constant is that those final rationales and scores must include a clear and unequivocal statement of all of the reasons for the final moderated score for each requirement. A failure here is far more likely to breach obligations of transparency.

In this article, we trace through the series of recent cases which have considered these points in detail.

Healthcare at Home and Lancashire CC: Where is the line?

The Supreme Court’s commentary in Healthcare at Home is a useful starting point for any assessment of the required level of transparency for evaluation and moderation decisions. In that case, the court approved the following statement from the European Strabag Benelux case:

“The reasoning followed by the Authority which adopted the measure must be disclosed in a clear and unequivocal fashion so as, on the one hand, to make the persons concerned aware of the reasons for the measure and thereby enable them to defend their rights and, on the other, to enable the court to exercise its supervisory jurisdiction.”

However, authorities and challengers may well be more familiar with the following extract from the Lancashire case, which is often cited in support of the proposition that detailed notes should be taken of the evaluation and moderation reasons and reasoning (and will therefore be disclosable in any later litigation):

“Where…. the Authority relies upon [moderation meeting] notes as setting out the written reasons for the evaluators’ decisions, it is to those notes that the Court must look for the reasons and reasoning adopted by the Authority.

“I am satisfied that the notes [in this case] do not provide a full, transparent, or fair summary of the discussions that led to the consensus scores sufficient to enable the Trusts to defend their rights or the Court to discharge its supervisory jurisdiction. First ….reasons [that] were in play and are not reflected in the notes. Second, pervasively there is no or no sufficient account of the reasoning and reasons that led panel members to resolve their differences (if they did) so as to arrive at consensus scores.”

This appeared to make clear that transparent record keeping concerned both the discussions / reasoning of the evaluators (i.e. the journey) and the reasons eventually agreed in consensus justifying the relevant score (i.e. the destination). However, as to what would amount to a ‘sufficient account of the reasons and reasoning’ the court provided only an elusive summary:

“Lest there be any doubt, I am not suggesting that it was necessary to keep a complete record of what was said or a comprehensive note of every point that was made. I also accept that the amount of detail that an Authority is required to provide when giving its reasons may vary from contract to contract, depending on all the circumstances relevant to the contract in question.”

In one sense this is clear: the Authority’s obligations of transparency do not require comprehensive notes of every part of the evaluation and moderation process. In another sense it’s opaque: the amount of detail required for an Authority to meet its obligations of transparency will vary from case-to-case. So this case does not provide a final answer to the question of ‘how much transparency is required?’, although that is understandable given the wider context of the case. [2]

EnergySolutions and RFL’21: Cases at each end of the spectrum

There was some concern after the EnergySolutions case that the requirements of transparency and documented decision-making might extend to requiring authorities to make highly detailed notes of each stage of individual evaluation, consensus evaluation and moderation. In EnergySolutions, such notes were dissected word-by-word as a means of unravelling the evaluators’ rationale in order to prove manifest errors in the evaluation.

At the opposite end of the spectrum, a decision on what minimalist records might nevertheless meet the Authority’s relevant legal obligations was arrived at in the Rail Franchising Litigation (“RFL’21”). Here, Stuart-Smith J (shortly prior to his elevation to the Court of Appeal) found that the Disqualification Letter sent to the challenger was a sufficiently detailed, clear and transparent account of the Authority’s reasons for disqualifying the claimants, despite noting that there was a “major dispute of fact between the parties about whether the reasons set out in the disqualification letters were in fact the reasons adopted by the Secretary of State in reaching his decision”. The Court held:

“The reasons and reasoning as expressed in the disqualification letter were concise, clear and sufficient to enable the Claimants to know that they had been disqualified for serious non-compliance on pensions, which was the actual reason for disqualification. Although the letters were prepared without reference back to the Minister, their concentration on the Defendant’s need to comply with its obligations of fairness, equal treatment and transparency reflected the Secretary of State’s reasons and reasoning as I have found them to be…”

The RFL’21 judgment must be seen in context, as somewhat case specific. Here, the non-compliances in the bidders’ responses which formed the rationale for the Authority’s disqualification decisions were obvious, and reasons for disqualification were capable of simple and clear explanation. Whether an Award Letter alone would suffice to meet the Authority’s decision-making transparency obligations in other cases (particularly ‘scoring’ rather than ‘disqualification’ cases) must be doubted. That suspicion appears to be confirmed by what was said in later cases.

The Rail Franchising Litigation: Defending against suggestions of justification after the event

For present purposes, a more important part of the court’s reasoning in the RFL’21 case may be Mr Justice Stuart-Smith’s expansion of what he meant by his statements (quoted above) in the Lancashire case:

“It remains my view that a procurement in which the contracting Authority cannot explain the reasons for its decision fails the most basic standard of transparency. That said, there is no requirement that the reasons and reasoning must all be contained in one document (whether that be the document conveying the decision or otherwise), though the later the purported explanation, the greater the scrutiny that will be required to ensure that what is being provided is in fact the reasons or reasoning that prevailed at the relevant time and not merely an ex post facto justification.”

This is an important point. Again we see the court ruling out the extremity: there is no requirement for compendious and/or verbatim notes of the process of reasoning (i.e. the journey) which the evaluators went on during their consensus scoring and moderation. However, this judgment clarifies that where evaluators have changed their position (on scoring or rationale), the court will want to scrutinise that journey more closely to satisfy itself that what is now being said in the heat of litigation (in support of the Authority’s defence of a fair, transparent process which was without manifest error) was in fact what happened at the time.

This illustrates that contemporaneous evidence to support the change of scoring and/or reasoning is therefore very helpful to guard against the risk of a judge finding that the evidence now given by the Authority’s witnesses is (potentially self-serving) justification developed after the event. Put conversely, it will be harder for an Authority to defend against such an accusation the less detailed those notes are. However:

that is a comment on potential benefits of different levels of record keeping. It leaves open the separate question of the extent to which such documentation is legally required as opposed to merely evidentially useful (i.e. the legal minimum standard should not be confused with what an Authority can do to reflect best practice and reduce legal risk); and
whilst more detailed notes may provide the benefit of a defence against accusations of after-the-event justification of a decision, they also carry the converse risk of creating a noose for the Authority’s neck if they either: (i) do not reflect what the evaluators and/or moderators really meant to say; or (ii) they are an accurate reflection of what was said, but what was said suggests a manifest error in the evaluation. Nonetheless, it cannot be good practice for an Authority to hope that unclear records will protect it from challenge.

Bechtel: Proportionate record keeping

In Bechtel v HS2, Fraser J reviewed and approved the following Authority-drafted summary of ‘Key Messages from the EnergySolutions’ litigation.

“[An Authority should:]

Apply published evaluation methodology and follow your declared process.
Treat all Tenderers in a consistent manner.
Keep written records of evaluation process.
Keep [a] full audit trail of changes in scores and reasons for changes.
Rationale to include all the reasons for the score.
Devote sufficient time and resources to the evaluation process to make the above possible.” (Our emphasis)

As shown by the emphasised text in bold, at first sight this appears to be endorsing the ‘full audit trail’ approach from the EnergySolutions litigation. However, in the explanation that follows, Fraser J appears to step back from that position (see box), instead explaining that the individual evaluation stage of the evaluation represented simply ‘draft’ or ‘initial’ views of the evaluators (the journey, or as he called them “merely steps along the way”) and that the key issue was whether any manifest error is evidenced by the final reasoning and score reached in consensus discussions between the evaluators. [3]

Mr Justice Fraser went on to consider a specific case where both the rationales and scores had changed through the evaluation and moderation process:

“[One evaluator] started with higher draft scores and moved down; [the other] started with lower draft scores and moved up. They arrived at a joint final score different to the ones they started with at the draft stage. This is a good example of the process of moderation working as it was intended to, in my judgment.”

“These are… assessors working to achieve a single score in consensus, after discussion between them, in circumstances where their initial views were somewhat different. I find that this is exactly what was envisaged in the design of the scoring approach and moderation.”

These quotations capture a number of important concepts:

First, it is permissible (and understandable) that different evaluators, when undertaking initial scoring in isolation, may come up with different scores and rationale without either of them being in error. This might result from the fact that “some factors within a question were more in the area of experience of their co-evaluator than their own”. In any event, this is only an initial step along the way.
If the draft scores and reasoning are only initial steps along the way, it should matter less if records of these interim stages are more skeletal. This idea was subsequently reinforced in Braceurself (see below).
The point at which detailed and transparent record keeping is most vital is therefore in the final consensus scores and rationale, and this idea is reinforced by judicial statements such as that in the Healthcare at Home case, quoted above.

The Court in Bechtel did not go as far as suggesting how skeletal individual evaluator scores and rationales could be, or how detailed the notes of the moderation leading to a final consensus score and rationale should be. Again, the Court provided a slightly more abstract statement:

“Bechtel complain in their written Closing Submissions that “The Moderation Minutes generally do not record what has been said by Moderators”. This is not required. Short of tape-recording every hour of moderation – which would be entirely disproportionate – minutes of moderation will inevitably not amount to a verbatim note. But no contracting Authority is required to take a verbatim note of all such moderation and evaluation sessions. There must be a sensible limit to what is required of contracting authorities in terms of recording its evaluations. The court’s role is one of supervisory jurisdiction, not one of micro-managing. … The principle of proportionality means a sensible balance, and limit, is what is required.”

As with the passage quoted from Lancashire, in one sense this is clear: the Authority’s obligations of transparency do not require comprehensive notes of every part of the evaluation and moderation process. Specifically, transparency does not require verbatim notes or tape recording of the moderation leading to the consensus rationale and score. However, again, in another sense this statement is opaque: the amount of detail required for an Authority to meet its obligations of transparency will vary from case-to-case based on “proportionality” and a “sensible balance”.

Braceurself: Proportionality confirmed

The recent decision in Braceurself can be viewed as a continuation of the reasoning in Bechtel around the proportionality of record keeping of both the developing reasoning (the journey) and the eventual rationale and score (the destination). The court found that, for the purposes of exercising its supervisory jurisdiction, it was sufficient that the Authority had:

kept notes which captured the essence of the reasoning (the journey); and
had set out the evaluation rationale and score in sufficient detail in the Debrief Letter to the disappointed bidder (the destination).

More specifically, the Court stated:

“[T]he Claimant submitted that the notes of discussions at moderation meetings were very limited and differed in some respects from the debrief letter. Having heard the evidence, I am satisfied that, in general terms, there was sufficient correspondence between the notes and the feedback provided in the debrief letter. The notes did not purport to be a verbatim record and the purpose of both the notes and the spreadsheet compiled by the moderator were intended to capture the essence of having arrived at the moderated score. The Claimant has not begun to satisfy me that there was any error of process here, still less the respects in which any such error might have led to the Defendant having made a manifest error, or that it amounted to breach of the requirements for transparency or equal treatment.”

There remains, perhaps, some room for doubt as to the extent to which this is context specific. The court in this particular case was commenting in the context where the extent of note taking in the moderation was sufficient for the court’s purposes: the notes disclosed that the Authority had slipped into error at the moderation stage. So the notes were sufficient in that the court was able to follow (even if only by skeletal notes expanded in later witness evidence) the process by which the evaluators fell into manifest error. This case therefore serves to reiterate the risks of more detailed note keeping. They have the benefit of providing a potential defence against accusations of after-the-event justification of a decision, but carry the risk of being a black-and-white recording of the process by which the Authority fell into error.

So where is the line? - Key takeaways

As recent cases illustrate, the Courts have been astute to avoid setting any hard and fast rules as to the point at which the right balance is struck between the requirements of transparency, and the substantial administrative burden and increased litigation risks which come with more granular record keeping. There is at least one obvious reason for this: in determining whether an Authority has acted lawfully, the court has been well assisted in cases where the Authority has kept compendious notes of the full decision-making process; so it is understandable they would not want to dissuade this by drawing a clear line. However, the following points emerge:

There must be a written record of the evaluation process: some form of ‘audit trail’ of the decision making process, from initial individual evaluator stage, through to the final moderated consensus scores and rationale, including the interim stage of changes in scores and reasons for changes.
The records at each stage do not need to be audio-recorded and they do not need to be verbatim. In some cases, even very limited notes may suffice provided that they “capture the essence” of how the moderated score was arrived at. What the court expects is that they will represent a “sensible balance” in ensuring an accurate record of the reasoning at each stage.
However, the more skeletal the notes, the more the Authority will have to flesh out in correspondence and (ultimately) witness evidence should there be a challenge, and to the extent this expansion looks to be deviating or adding new points to the recorded reasons, the Authority runs the risk of being accused of justification of their decisions after the event.
Whilst there should be a full audit trail, there seems to be greater latitude in the extent of notetaking required for the earliest stage of the evaluation process, because these are recognised as being merely preliminary stepping stones towards a final consensus decision.
Conversely, the eventual moderation rationale and score must include a clear and unequivocal statement of all of the reasons for the final moderated score for each requirement. A failure here is likely to breach obligations of transparency.
The court appears to make some allowance for the fact that there may not be a perfect correlation between the Authority’s internal records of all the reasons for the final moderated scores for each requirement, and how these are eventually expressed in the Award / Decision Letter. However, that latitude presumably has fairly tight boundaries, as any material discrepancies will call for explanation if challenged.

Overall, whilst these cases provide a useful guide to the required extent of record keeping (and are at least clear on the extremes at each end of the spectrum of detail), there are still risk-based judgment calls to be made as to where to draw the line. The Authority’s dilemma here was recognised by Mr Justice Fraser in the EnergySolutions case when he said:

“Serious consideration seems to have been given [by the Authority] to restricting the keeping of contemporaneous records of evaluation, because it was known these would be disclosable in any litigation. Of course, had the evaluation process been performed in accordance with … the Regulations, disclosure of the records of that process would present no danger to the [Authority] (assuming the evaluation was done without manifest error) because they would constitute what was described as an "audit trail" of the [Authority’s] collective decision-making."

It is, of course, the assumption in brackets which potentially causes greatest difficulty when an Authority is deciding to what extent it should be documenting its reasoning at each stage of the evaluation and moderation process.

Relevance to the Procurement Bill

It is not clear to what extent this line of decisions will continue to apply to the new regime of the Procurement Bill, when it becomes law (possibly in late 2023). The Bill does not contain the same principles of transparency as the current regime. In particular, Section 11 of the Bill (as currently drafted) uses novel language of contracting authorities needing to “have regard to the importance of …sharing information for the purpose of allowing suppliers and others to understand the Authority’s procurement … decisions”.

Presently, this remains open to interpretation. However, the drafters of the Bill have confirmed that they consider transparency will remain a cornerstone of the new Act, with finer detail of the transparency obligations on authorities to be set out in secondary legislation made under the Act. In particular, the Act will include a new notice regime where the extent of information which must be given to bidders is likely to differ from current Award / Decision Letters. What exactly will be required has yet to be published by Parliament. The devil is likely to be in the detail.

In the meantime it is reasonable to assume that the court will want to continue to see sufficient records to give it adequate insight into the reasons and reasoning. This is to enable the court to exercise its supervisory jurisdiction over whether the decisions were fair, transparent and without manifest error. This is therefore likely to remain a relevant (and contentious) issue, notwithstanding the changes implemented by the new Act. In particular, whatever the new Act requires under its notice regime, there will remain a key question of the extent to which the court will order disclosure of the Authority’s (usually more detailed) evaluation and moderation records that sit behind those notices, as under the current legal challenge regime.

Laura Wisdom and Patrick Parkin are Partners and Lloyd Nail is a Senior Associate at Burges Salmon.

[1] In the case of the Public Contracts Regulations 2015, this is Regulation 84(7) to 84(9).

[2] The question of ‘how much transparency is required?’ did not need to be answered in the Lancashire case. That is because the case was not decided on a fine distinction of whether or not the notes were “sufficiently comprehensive”; this was a case where an almost complete lack of contemporaneous evaluation records led the court to conclude that a “procurement in which the contracting Authority cannot explain why it awarded the scores which it did fails the most basic standard of transparency.” Therefore, the Claimant succeeded, not on the basis of showing the contracting Authority had made manifest errors shown by consideration of its evaluation records, but by satisfying the court that the evaluation records were so insufficient and flawed that “the reasons given were not sufficient in law”. The question of ‘how much transparency is required?’ was therefore not clearly determined.

[3] Bechtel v HS2: The distinction between initial evaluator reasoning and consensus rationale:

“As will be seen when the individual areas of complaint are examined, in some cases each of the assessors had arrived at an initial score which then moved to a different score, after moderation had taken place….In some cases, both draft scores were the same as one another, yet the final score was different. Bechtel maintains that there were failures in transparency… involved in these changes from draft to final scores.

“However, although changes of score without explanation might initially look puzzling, when one considers that the original scores were only initial draft scores, and were reached by evaluators in isolation, the fact that the final score was different simply demonstrates, in my judgment, how carefully the evaluators were performing their task. Their initial draft scores were never intended to be more than drafts. Even if their draft scores were the same as one another, they properly considered the factors and correct scores, and were prepared to move from their draft initial scores to a different score reached consensually.

“….The final moderated score was reached much more carefully, and after discussion with their co-evaluator, whereas the initial draft score was precisely that – a draft. Further, some factors within a question were more in the area of experience of their co-evaluator than their own. In my judgment, one benefit of requiring draft scores first, reached independently, was that each evaluator would be fully prepared for the meeting with their co-evaluator. But the evaluation and moderation process was not intended only to require one score reached in isolation by each evaluator. It was designed to achieve a final score that was jointly agreed by the two evaluators, following discussion and agreement between them. The draft scores were merely steps along the way to achieve that.”