MMLU (Massive Multitask Language Understanding)
The Massive Multitask Language Understanding (MMLU) benchmark evaluates knowledge-intensive question answering across 57 domains (Hendrycks et al., 2021).
- Task: question answering
- What: ?
- When: ?
- Who: ?
- Language: English
EM | ECE (10-bin) | EM (Robustness) | EM (Fairness) | Denoised inference time (s) | # eval | # train | truncated | # prompt tokens | # output tokens | # trials
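
To make the EM and ECE (10-bin) columns concrete, here is a minimal sketch of how exact match and a 10-bin expected calibration error can be computed. It rests on several assumptions: the function names `exact_match` and `expected_calibration_error` are hypothetical, EM is taken as string equality after stripping whitespace, and the bins are equal-width over [0, 1]. The page does not state the binning scheme or normalization the benchmark actually uses, so treat this as an illustration of the metrics, not the benchmark's own implementation.

```python
def exact_match(predictions, references):
    """Fraction of predictions that equal their reference (whitespace-stripped).

    Assumption: plain string equality; the benchmark may normalize differently.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        raise ValueError("at least one example is required")
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(predictions)


def expected_calibration_error(confidences, correct, num_bins=10):
    """Expected calibration error with equal-width bins over [0, 1].

    Assumption: equal-width binning; an equal-mass variant would group
    examples differently and can give a different value.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("at least one example is required")

    # Assign each example to a bin by its confidence.
    bins = [[] for _ in range(num_bins)]
    for conf, is_correct in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidences must lie in [0, 1]")
        index = min(int(conf * num_bins), num_bins - 1)  # conf == 1.0 goes in the last bin
        bins[index].append((conf, is_correct))

    # Weighted average of |mean confidence - accuracy| over non-empty bins.
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(mean_conf - accuracy)
    return ece


if __name__ == "__main__":
    # Toy multiple-choice answers, invented purely for illustration.
    preds = ["A", "B", "C", "D"]
    golds = ["A", "B", "D", "D"]
    print("EM:", exact_match(preds, golds))

    confs = [0.9, 0.9, 0.6, 0.6]
    right = [True, True, True, False]
    print("ECE (10-bin):", expected_calibration_error(confs, right))
```

A lower ECE means the model's stated confidence tracks its accuracy more closely, while EM only reports how often the answer is right.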