GPT-4 and Claude 2 both scored above the approximate passing threshold for the MPRE, with GPT-4 even outperforming the average human test-taker.
In a groundbreaking development, two generative AI large language models (LLMs) successfully passed the Multistate Professional Responsibility Exam (MPRE), marking a significant milestone in legal technology. According to a new study conducted by contract review and drafting startup LegalOn Technologies, both OpenAI’s GPT-4 and Anthropic’s Claude 2 were tested without any specific training in legal ethics.
The MPRE is a two-hour, 60-question multiple-choice examination that is administered three times per year. Developed by the National Conference of Bar Examiners (NCBE), the MPRE is required for admission to the bars of all but two U.S. jurisdictions. The exam is designed to measure the examinee’s knowledge and understanding of established standards related to a lawyer’s professional conduct.
The Study
The study, conducted by Gabor Melli, PhD, VP of AI at LegalOn Technologies; Daniel Lewis, JD, US CEO of LegalOn Technologies; and Professor Dru Stevenson, JD, of South Texas College of Law Houston, tested OpenAI's GPT-4 and GPT-3.5, Anthropic's Claude 2, and Google's PaLM 2 Bison on 100 simulated exams.
Among these, GPT-4 performed best, answering 74% of questions correctly, an estimated 6% better than the average human test-taker. GPT-4 and Claude 2 both scored above the approximate passing threshold for the MPRE, estimated to range between 56% and 64% depending on the jurisdiction. GPT-3.5 and PaLM 2 Bison both scored below the estimated passing threshold.
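As a rough illustration of the scoring involved (and not the study's actual evaluation code), the sketch below grades a model's multiple-choice answers against an answer key and compares its average accuracy across simulated exams to the estimated 56-64% passing band; the function names and answer data are hypothetical.

```python
# Hypothetical sketch: scoring 60-question MPRE-style simulated exams and
# comparing average accuracy to an estimated, jurisdiction-dependent
# passing band. All data below is illustrative, not from the study.

PASSING_BAND = (0.56, 0.64)  # estimated MPRE passing range

def exam_accuracy(model_answers, answer_key):
    """Fraction of multiple-choice questions answered correctly."""
    correct = sum(1 for m, k in zip(model_answers, answer_key) if m == k)
    return correct / len(answer_key)

def summarize(per_exam_accuracies):
    """Mean accuracy across simulated exams, plus a pass/fail verdict."""
    mean_acc = sum(per_exam_accuracies) / len(per_exam_accuracies)
    low, high = PASSING_BAND
    if mean_acc >= high:
        verdict = "above the passing band in every jurisdiction"
    elif mean_acc >= low:
        verdict = "within the jurisdiction-dependent passing band"
    else:
        verdict = "below the estimated passing threshold"
    return mean_acc, verdict

# Made-up per-exam accuracies averaging 74%, GPT-4's reported figure:
print(summarize([0.75, 0.73, 0.74]))
```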
The LLMs also performed better in some subject areas than in others. For example, GPT-4 scored higher on topics related to conflicts of interest and client relationships, and lower in areas such as the safekeeping of funds.
GPT-4 Passes UBE
The results of the LegalOn Technologies study follow earlier research this year that tested a preliminary version of GPT-4 against prior generations of GPT on the entire Uniform Bar Examination (UBE). The UBE not only includes the multiple-choice Multistate Bar Examination (MBE), but also the open-ended Multistate Essay Exam (MEE) and Multistate Performance Test (MPT) components. This comprehensive evaluation allowed the researchers to assess the AI’s ability to handle a variety of question types and legal scenarios.
On the MBE, GPT-4 significantly outperformed both human test-takers and prior models, demonstrating a 26% increase over ChatGPT and beating humans in five of seven subject areas. This is a remarkable result, showing that the AI could understand and correctly answer complex legal questions.
The MEE and MPT, which have not previously been evaluated by scholars, posed a different challenge. These components required the AI to generate longer, more detailed responses. Despite this, GPT-4 scored an average of 4.2/6.0, as compared to much lower scores for ChatGPT. This suggests that the AI has made significant strides in its ability to generate coherent, relevant, and accurate legal arguments.
When graded across the UBE components in the manner a human test-taker would be, GPT-4 scored approximately 297 points, well above the passing threshold in every UBE jurisdiction. These findings document not just the rapid and remarkable advance of large language model performance generally, but also the potential for such models to support the delivery of legal services in society.
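For context on how that single number is assembled, here is a minimal sketch assuming the NCBE's standard component weighting (MBE 50%, MEE 30%, MPT 20%) and component scores already scaled to a 0-200 range; the jurisdiction-administered scaling of raw scores is omitted, and the inputs are made up simply to land near the reported ~297.

```python
# Minimal sketch of assembling a 400-point UBE total, assuming the standard
# weighting of MBE 50%, MEE 30%, MPT 20% and component scores already
# scaled to 0-200. Real scoring involves jurisdiction-administered scaling
# of raw scores, which is simplified away here.

def ube_total(mbe_scaled, mee_scaled, mpt_scaled):
    # The written half is split 60/40 between MEE and MPT, so the MEE and
    # MPT contribute 30% and 20% of the overall total, respectively.
    written_scaled = 0.6 * mee_scaled + 0.4 * mpt_scaled
    return mbe_scaled + written_scaled

# Made-up component scores that land near the reported ~297 total; most
# UBE jurisdictions set their passing score roughly between 260 and 280.
print(ube_total(mbe_scaled=150, mee_scaled=148, mpt_scaled=145))  # ~296.8
```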
Implications and Future Directions
The success of GPT-4 and Claude 2 in passing the MPRE opens up exciting possibilities for the use of AI in the legal field. AI chatbots could potentially assist in legal research and document review, and even provide basic legal advice, thereby increasing efficiency and reducing costs.
“This research advances our understanding of how AI can assist lawyers and helps us assess its current strengths and limitations,” LegalOn Technologies CEO Daniel Lewis said in a press release. “We are not suggesting that AI knows right from wrong or that its behavior is guided by moral principles, but these findings do indicate that AI has potential to support ethical decision-making.”
As AI plays a larger role in legal decision-making, the ethical terrain becomes harder to navigate. A framework grounded in transparency, fairness, and accountability will be essential to ensure that AI-assisted decisions meet the exacting ethical standards expected within the legal profession.
As we move forward, it will be crucial to continue exploring the potential applications of AI in law, while also addressing the ethical and practical challenges that arise.