Corey Morris – Medium

Corey Morris

MMLU’s Moral Scenarios Benchmark Doesn’t Measure What You Think it Measures

In examining the low performance of large language models on the Moral Scenarios task, part of the widely used MMLU benchmark by Hendrycks…

Sep 27, 2023

Preliminary Analysis of MMLU-by-task: Insights from the Evaluation of Over 500 Open Source Models

Recently, Hugging Face released a dataset of evaluation results for the Measuring Massive Multitask Language Understanding (MMLU)…

Aug 7, 2023

Corey Morris

Machine Learning Engineer with a strong interest in AI Safety