While international English exams move more and more towards replacing human examiners with AI, placing the futures of tens of thousands of international students and immigrants in the hands of AI and technology companies, we need to ask: is it all that is promised? Are companies like Pearson and Duolingo just building a model that will make them more money, whilst dazzling us with fast delivery that is not necessarily accurate?
Beyond English language testing, AI is proliferating quickly, but proliferating even more quickly are the social media posts promising us a world of creativity and efficiency with no effort. Pearson and Duolingo are promising results with much less effort.
Personally, I believe AI has an important role in making what we do faster and more accurate if we use it in the right way. But there are significant risks and myths with AI examiners and AI in general right now.
I’ve made a list of 5 myths about AI and how they apply to English testing.
1. AI is infallible; AI doesn’t make mistakes like humans do.
And so it goes: humans make mistakes, so we are no longer capable of important decisions, from English scores to legal rulings to running our countries. Meanwhile, AI doesn’t make mistakes. However, even the tech developers admit that AI frequently ‘hallucinates.’

I love that term: every time AI stuffs up, it’s a ‘hallucination’. Next time you make an error in a test, try it: sorry teacher, that was a ‘hallucination’; my test was actually perfect.
An AI hallucination is when an AI model generates information that is incorrect, fabricated, or nonsensical, yet presents it as if it were accurate and coherent. This can happen for a variety of reasons, such as the AI making incorrect inferences from incomplete or ambiguous data, or limitations in its training.
A human makes an error when they give an incorrect answer, a fabricated answer, or just make stuff up. This can happen for a variety of reasons: they draw an illogical conclusion because they mixed up information, didn’t learn the required information or enough of it, learned the wrong information, or lied because they don’t know.
Same, same… not different?
2. AI is never biased.

It’s called racial profiling, and it’s based on data.
It’s a massive issue in airports around the world, and in police forces in some places, that a person with a certain surname, colour or religious dress is pulled aside not on genuine suspicion but on racial profiling.
Even amongst native accents in the US and the UK there has traditionally been prejudice based on class and region. Though this has dissipated in recent generations, it seems AI has taken us back to the dark ages. According to the study Global Performance Disparities Between English-Language Accents in Automated Speech Recognition, these systems exhibit biases against certain English accents, particularly those associated with marginalized groups as well as non-native speakers.
So, if a certain government decided it wanted to keep a certain nationality out of its immigration process, it would be hugely risky to ask human examiners to be prejudiced: it takes thousands of examiners to do the job of AI, and there would be whistleblowers. But it would be extremely easy for the companies that run the exams to tweak their AI against certain accents or regions. How would we know?

3. AI is egalitarian and levels the playing field for minorities.
Algorithms favour the majority over the minority. If the majority of test-takers are Indian and Chinese, and only a minority are from Latvia and Estonia, how does that affect the way the model trains on its data? The sketch below illustrates the problem.
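To make the point concrete, here is a minimal, purely hypothetical sketch using synthetic data and scikit-learn (it has nothing to do with Pearson’s or Duolingo’s actual models). It trains one scoring model on data that is 95% “majority accent” and 5% “minority accent”, where both groups have the same underlying skill, and then checks how accurately the model scores each group:

```python
# Hypothetical illustration of training-data imbalance, not any real exam model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_group(n, accent_shift):
    """Synthetic 'speech features' for one accent group.

    The pass/fail label depends only on underlying skill, but the
    observed features are shifted by the group's accent.
    """
    skill = rng.normal(0.0, 1.0, n)
    features = np.column_stack([
        skill + rng.normal(accent_shift, 0.5, n),  # skill blurred by accent
        rng.normal(accent_shift, 1.0, n),          # accent-correlated noise
    ])
    labels = (skill > 0).astype(int)
    return features, labels

# Training set: 95% majority accent, 5% minority accent.
X_maj, y_maj = make_group(9500, accent_shift=0.0)
X_min, y_min = make_group(500, accent_shift=2.0)
model = LogisticRegression().fit(
    np.vstack([X_maj, X_min]), np.concatenate([y_maj, y_min])
)

# Evaluate on fresh, equally sized samples from each group.
X_maj_t, y_maj_t = make_group(2000, accent_shift=0.0)
X_min_t, y_min_t = make_group(2000, accent_shift=2.0)
print("majority-accent accuracy:", round(model.score(X_maj_t, y_maj_t), 3))
print("minority-accent accuracy:", round(model.score(X_min_t, y_min_t), 3))
```

On a run like this, the model scores the majority group far more accurately than the minority group, simply because the decision boundary was fitted to the majority’s accent. Real speech-scoring systems are vastly more complex, but the underlying dynamic is the same.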
According to Pearson literature, they compare:

Anecdotally, I can say that I have met Indians and Nepalese who did the test 20 times to get a 90, and whose English is about a 6.5 in IELTS. I struggled with their accents, and yet these are the accents that Pearson has a lot of data on at the 90 level. Meanwhile, I can cite an Italian scientist who could not get a 55 on Pearson. He has a strong Italian accent, but he could easily hold a sophisticated conversation in English and had lived and worked in Australia for years. Being older, accent correction was going to be difficult, so he just went and did IELTS.
If most 90s are Indians using test-taking techniques rather than taking the test naturally, the data is skewed.
And Pearson has listened to feedback from universities and immigration departments about the failures of its speaking test and recently added human review of certain parts of the test, because the AI algorithms haven’t been working without human oversight.
4. Human emotion and instincts interfere with truth and efficiency; we must be rid of them.
The idea that only data can lead to good decisions leaves humanity open to manipulation. Our best defence has always been instinct, but already we are being told we can’t trust our own instincts and must rely on AI and data instead. And, as I’ve mentioned, those answers can’t be questioned.

The first time I gave a 9 in the IELTS test was about a month into examining. The candidate gave such an engaging two-minute monologue that it was easy to follow up with questions. At the end of the test, the 13 minutes felt like about two.
When the candidate walked out of the room, I realised I had forgotten to take mental notes of grammar and vocabulary to justify my score, because I was caught up in their story and enjoying the conversation. I said to myself: it must be a 9.
(Many people don’t know this, but examiners are not supposed to listen back to the recording; they are supposed to grade as the test progresses. But I was nervous and listened back, and yep, every element was a 9.)
Many of us have met people who say all the right things, and yet we know they are lying. And later you find out they ripped off a bunch of your friends, who followed the words, not their instincts.
This happened a few times before I was confident that instinct is often a shortcut to the same result as the mechanical way of doing things.
And I must say, once I had such an effortless candidate whose stories I did not enjoy. (The guy had a malicious attitude to women, and I felt a little scared to be alone in the room with him.) All the way through his painful opinions, I knew his English was a 9, even if his character quota was zero. I checked back on the recording, looking for reasons not to give him a 9, but I couldn’t; I had to score according to the rubric, as I would very likely be rechecked if I gave the wrong score and would lose work. I hope he failed at another point in the immigration process, because I felt he was dangerous to women, but I had to give him a 9. So instinct does not bias you towards people with nice personalities, though you do need to keep your emotions in check as well.
5. Humans are corrupt, AI is pure.
But humans can also be punished for taking bribes; AI is untouchable, and there is no consequence for its corrupt decisions or for those who programmed them.
Our instincts, our conscience and our fear of consequences keep most of us honest most of the time.

AI and social media have brought highly creative scammers who use AI to impersonate and manipulate after secretly spying on our phones and our homes. So don’t tell me that corrupt individuals won’t get to people associated with AI exam companies, just as they corrupted employees of the IDP IELTS subcontractors in India. Except that when Planet Edu employees get arrested and charged with taking bribes from immigration agents, AI cannot be punished, and the criminals can hide offshore.
I recently went to a Duolingo presentation in Sydney where they went through all the layers of security and the way they hire online monitors in India at a high wage to watch over all these layers. But no matter how high the salary (it’s not that high), individuals in places like India see remote work as insecure: they can wake up tomorrow and find their contract ended, so the temptation to work with scammers is still huge.
At the end of the day, AI is creating jobs in AI security to monitor and check the integrity of the test, while replacing the human elements of the test.
The inevitable AI future
My final bit? Well, I actually think AI tests and human tests both have their place. AI tests are probably better for non-native speakers who study the likes of science and technology, are most likely to work remotely, and may never need or want to integrate into an English-speaking country.
In these jobs, communication can be done through technology, and the candidate doesn’t have to waste time understanding language they will never use.
Meanwhile, for those in professions that require soft skills and who hope to integrate into a new English-speaking community, a human test will make for a softer landing in the new country.
Bias is perhaps something we will never be able to avoid, in real life or in the cyberworld. Perhaps we just have to be aware of this and make choices about how to minimise that bias.