What’s happened with the A-level results?

This year’s A-level results have been the most controversial ever, by a long way. But what exactly has happened? And what can we do about it?

How were the grades calculated?

When the Secretary of State announced on 18th March that schools would close, he also announced that exams were cancelled, but that “we will work with the sector and Ofqual [the exams regulator] to ensure that children get the qualifications that they need.” Detailed guidance followed.

Teachers were asked to provide a “centre assessed grade.” Ofqual’s guidance says: “we asked schools and colleges to use their professional experience to make a fair and objective judgement of the grade they believed a student would have achieved had they sat their exams this year.” These grades were then moderated by the exam boards, using an algorithm designed by Ofqual, to ensure that grades in 2020 were similar (or “comparable”) to previous years.

Why were teacher recommendations so high?

Some parts of the media have accused teachers of assessing too generously, or trying to unfairly boost their own schools’ results. All of this is wrong. Firstly, no data on schools’ overall results is being collected or published this year. There are no performance tables – a welcome move, which has allowed teachers to focus on what really matters: the students and their results.

But, if teachers’ recommended grades had been accepted without moderation, nationally results would have risen: there would have been a 13% rise in A-levels awarded grade A*-B, which is an “implausibly high” increase. Why has this happened?

Put simply, teachers were asked to assess what they believed students to be capable of. Real exams assess how students actually perform on the day. If a teacher believed a student was capable of achieving an A in the summer, then they assessed that student at an A. If that student had sat the real exam, they might have achieved that A. But, if there was a particularly tricky question, or they managed their time badly, or they had a mental blank in the exam, they might not have done. They might have ended up with a B. So the teacher recommended grades were always going to be higher – that was baked into the system, and it is why some form of moderation was needed.

So how did the algorithm work?

The standardisation and moderation process is explained in Ofqual’s interim technical report, published on A-level results day. The report is 319 pages long, which gives you some idea of how complex the process is. The model is called the Direct Centre Performance model (DCP). In Ofqual’s own words, it “works by predicting the distribution of grades for each individual school or college. That prediction is based on the historical performance of the school or college in that subject taking into account any changes in the prior attainment of candidates entering this year compared to previous years.”

What does this mean? If we take A-level Maths as an example, the exam board would look at what distribution of grades students from Churchill Academy & Sixth Form had achieved in A-level Maths over recent years. It adjusts that distribution based on the prior attainment (GCSE and other results) of the students taking A-level Maths at Churchill in 2020, and then makes a prediction of what grades it expects to see from Churchill based on that information. The algorithm then adjusts the teacher recommended grades from Churchill to fit the “expected” or predicted distribution of grades.
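To make that concrete, here is a deliberately simplified sketch in Python of how a standardisation step like this can work. This is not Ofqual’s actual code – the student labels, the grade counts and the single-step rank assignment are all invented for illustration – but it shows how a predicted distribution can override the teacher recommended grades while keeping only the teacher’s rank order of students.

```python
# Simplified, hypothetical sketch of a DCP-style standardisation step.
# It illustrates how a predicted grade distribution for a centre can
# replace teacher-assessed grades, preserving only the rank order.

def standardise(students, predicted_distribution):
    """students: list of (name, teacher_grade), ranked best first.
    predicted_distribution: dict mapping grade -> number of students
    the model expects at that grade for this centre and subject."""
    # Expand the predicted distribution into a list of grades, best first.
    grade_order = ["A*", "A", "B", "C", "D", "E", "U"]
    awarded = []
    for grade in grade_order:
        awarded.extend([grade] * predicted_distribution.get(grade, 0))
    # Assign grades by rank: the teacher's own grade is discarded.
    return [(name, awarded[i]) for i, (name, _) in enumerate(students)]

# A cohort of five, ranked by the teacher, all assessed at A or A*...
cohort = [("P1", "A*"), ("P2", "A"), ("P3", "A"), ("P4", "A"), ("P5", "A")]
# ...but the centre's history predicts only one A and a spread below it:
prediction = {"A": 1, "B": 2, "C": 2}
print(standardise(cohort, prediction))
# The lower-ranked students drop to C regardless of their CAG.
```

Notice that the teacher recommended grades play no part in the final outcome here beyond the ordering of the students – which is exactly the problem described below.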

This is where one of the major problems has arisen. Whilst the algorithm is actually very sensible at a whole cohort level, it forgets that individual candidates are human beings who don’t necessarily fit the statistical prediction. They can surprise us – and, as a teacher, I know that they do, every single day. The algorithm doesn’t account for which students are revising really hard, which students have really pushed themselves, which students have suddenly found a new passion and understanding for a subject…it cannot possibly do this. So, instead, it irons the students out into the distribution it predicts, almost completely ignoring the teacher recommended grades. The consequences are explained really well by Alex Weatherall in this thread on Twitter.

It also means that schools which have historically performed well at A-level are at an advantage over those which have not. So students who were recommended an A* can end up with a C. And, even more cruelly, students who were recommended to pass an A-level can end up with a U grade – failing an exam they hadn’t even sat. Unfairness and injustice are baked into the system.

What about small groups?

An additional unfairness in the system is that statistical models can’t be applied fairly to small groups. In Ofqual’s own words:

“Where schools and colleges had a relatively small cohort for a subject – fewer than 15 students when looking across the current entry and the historical data – the standardisation model put more weight on the CAGs…there is no statistical model that can reliably predict grades for particularly small groups of students. We have therefore used the most reliable evidence available, which is the CAGs.”

From Ofqual’s Interim Report Executive Summary here.

If you happen to have taken a popular A-level which more than 15 students took at your school, you will have been subject to the algorithm. If your A-level choices were less popular, and fewer than 15 students took that subject at your school, greater emphasis was placed on the teacher recommended grades. Still more unfairness and injustice.

A particular example here is Maths (which a lot of people take) and Further Maths (which many fewer people take). This has resulted in many students nationally getting A-level Maths grades adjusted down, whilst their Further Maths grades go through as recommended, creating nonsensical combinations like a C grade for Maths and an A* for Further Maths.

A further inequality here is that smaller sixth forms are more likely to have subject cohorts of under 15, whereas in larger sixth forms – and especially in large sixth form colleges – cohorts are almost always larger than 15. Therefore the smaller the sixth form, the fewer adjustments have been made to the grades. So it isn’t even necessarily about which subjects you have chosen, but which school or college you happened to be studying them at.
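The small-cohort rule described above amounts to a simple branch. The sketch below is a hypothetical simplification – Ofqual’s actual model tapered the weighting between CAGs and the statistical prediction rather than applying one hard cut-off – but it captures why the same student’s Maths and Further Maths grades could be treated so differently.

```python
# Minimal sketch of the small-cohort rule as described above.
# Hypothetical simplification: the real model blended the two sources
# of evidence for mid-sized cohorts rather than using a single cut-off.

def final_grade_source(cohort_size, threshold=15):
    """Decide which evidence dominates for a subject cohort."""
    if cohort_size < threshold:
        return "centre assessed grade (CAG)"
    return "statistical model"

# Maths (large entry) vs Further Maths (small entry) at the same school:
print(final_grade_source(42))  # statistical model
print(final_grade_source(8))   # centre assessed grade (CAG)
```

Two subjects, one student, two entirely different grading mechanisms – which is how a C in Maths can sit alongside an A* in Further Maths.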

What about appeals?

If you are unhappy with your grade, you have the option of mounting an appeal. This can be done if:

  1. There is an administrative error and the wrong grade has been put into the system. [We haven’t found a single example of this at Churchill].
  2. Your mock exam result shows that you are capable of achieving a higher grade than your final result.

At the moment, that’s it – there are no other grounds for challenging your result, unless you feel you were discriminated against. Mock exams are not the same from subject to subject, much less from school to school. They don’t always assess the full A-level content, and they are much more about finding out what candidates need to focus their revision on in the run-up to the real exams than about providing a solid grade. We expect mock results to be lower than final results – of course. In some cases, this route will help – but by no means in all.

The only other option open is to sit the full A-level exam in a special Autumn exam series. But who, honestly, could get a higher grade in October or November, without having been in a classroom since March? This is the longest of long shots.

So what can be done?

Currently, the government is saying nothing will change – but surely this can’t stand. The injustices are too great. I think the options are as follows:

  1. Look again at the algorithm and improve the level of “tolerance” around the grade boundaries so that it prioritises the teacher recommendation when a student is being downgraded, especially if they are being downgraded by more than one grade, or moved down from a passing grade to a U.
  2. Just scrap the whole thing and go back to the teacher recommended grades, like Scotland did. Although this would solve the human cost of all the disappointments, it would devalue the 2020 grades compared to previous and following years. An A grade from 2020 would simply not be worth the same as an A grade from another year. As Ofqual said themselves, the teacher recommendations on their own are “implausibly high” for all the reasons outlined above. It would solve the immediate problem – but create another one for the future.
  3. Open up an additional appeals route for candidates who feel an injustice has been done, but whose mocks don’t help them. Again, a tempting route, but what evidence could be used to support such an appeal? In the end, it comes back to the teacher recommendation, and this route very quickly ends up the same as option 2.

My feeling is that Ofqual need to go back and look again at the algorithm, and account for the human cost of squeezing individual candidates into a statistical model that does not account for their unpredictability, their uniqueness, and their actual performance to date. They might have time to do this ahead of GCSE results next week. But, for some A-level candidates, it is already too late – their university places have gone on the basis of results from exams they didn’t even sit.

Who is to blame?

Fundamentally, this is a government decision. As Laura McInerney said in her column for the Guardian today:

“Ultimately, young people have been caught in a farce presided over by an education secretary who let an obviously problematic results day go ahead with no clear plan and no appeals process. How did that happen? Civil servants busy on Brexit? On holiday? Did the exams watchdog not have the bottle to flag problems? I can’t fathom it.

But none of these questions help the Lilys, Matts, or Aatiyahs, or any one of thousands of young people, to understand how a baffling set of grades tanked their future and they weren’t given a clear way to challenge it.”

Laura McInerney, writing in the Guardian here.

I feel deeply aggrieved for those individuals whose futures have been decided not by their own work ethic, revision, effort and learning, but by an algorithm. We will continue to make the case that what has happened is wrong, unfair, and unjust – and hope that the government listens.