Corey Morris – Medium

Corey Morris

MMLU’s Moral Scenarios Benchmark Doesn’t Measure What You Think it Measures

In examining the low performance of large language models on the Moral Scenarios task, part of the widely-used MMLU benchmark by Hendrycks…

6 min read·Sep 27, 2023

Corey Morris

Preliminary Analysis of MMLU-by-task: Insights from the Evaluation of Over 500 Open Source Models

Recently, Hugging Face released a dataset of evaluation results for the Measuring Massive Multitask Language Understanding (MMLU)…

5 min read·Aug 7, 2023

Corey Morris

Machine Learning Engineer with a strong interest in AI Safety
