Researchers have found that, despite the hype, generative AI models struggle with complex questions on an undergraduate law exam, suggesting they are not yet ready to replace humans in intellectually demanding tasks. The findings carry important implications for the future of education and professional standards: artificial intelligence should be viewed as a tool that enhances human abilities, rather than a replacement for human expertise.

Generative AI in Academia
The study, carried out by a researcher at the University of Wollongong, examined whether generative AI models are any closer to human-level academic ability than they were a year earlier by testing how well (or poorly) several of them performed on an undergraduate criminal law final exam. The exam had two parts: one in which students assessed a case study about criminal offences and the prospects of a successful prosecution, and another consisting of an essay and short-answer questions.
Using different AI models, the researcher generated ten responses to the exam questions: five produced by pasting the questions into an AI tool without any additional prompting, and five produced with detailed prompts and relevant legal content. The AI-generated answers were then interspersed with genuine student responses, and everything was graded anonymously by a group of five tutors who did not know that a portion of the answers had been created by AI.
AI and Complex Legal Reasoning
Although the AI excelled at the essay-style question, it was rated much lower on the more complex questions requiring sophisticated legal analysis. On average, the answers generated without any prompting outperformed only 4.3% of students: two scores were barely passable and three failed outright. With detailed prompts and supporting legal material, the AI-generated answers outperformed an average of 39.9% of students at best (three papers reached a mediocre score, while only two did reasonably well).
The author argues that while the technology's promise is palpable, the results should not be overstated: on this law exam, at least, humans remain safe. A separate study by the same author found that while AI can replicate the writing style of a human, it does not replicate nuanced SJT reasoning.
Consequences for Education and Collaboration
The findings have profound implications for the future of education and professional practice. The author argues that AI should be thought of not as a replacement for human expertise but as a tool that can augment human capabilities when properly used. Rather than assuming AI can simply provide answers, schools and universities might instead use it to teach students how to collaborate effectively with the technology in ways that improve what they actually learn.
Conventional approaches to teaching and assessment will also have to be reviewed so that AI can be incorporated into students' work. For instance, a student who prompts an AI to draft work and then verifies and edits the output under an honor code agreement may still be participating in the learning process; if current assessment models assume otherwise, educators need to re-evaluate what is truly integral to education.