MMLU (Massive Multitask Language Understanding)

The Massive Multitask Language Understanding (MMLU) benchmark evaluates knowledge-intensive question answering across 57 domains (Hendrycks et al., 2021).

  • Task: question answering
  • What: ?
  • When: ?
  • Who: ?
  • Language: English
  1. EM
  2. ECE (10-bin)
  3. EM (Robustness)
  4. EM (Fairness)
  5. Denoised inference time (s)
  6. # eval
  7. # train
  8. truncated
  9. # prompt tokens
  10. # output tokens
  11. # trials
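
The first two metrics in the list above, EM (exact match) and ECE (10-bin expected calibration error), can be illustrated with a minimal sketch. This is not the benchmark's own scoring code: the function names, the whitespace-only normalization in `exact_match`, and the use of equal-width confidence bins are assumptions made for illustration, and an actual evaluation harness may normalize answers or bin confidences differently.

```python
from typing import Sequence


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the prediction equals the reference, else 0.0.

    Assumption: only surrounding whitespace is stripped before comparing.
    """
    return 1.0 if prediction.strip() == reference.strip() else 0.0


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[float],
    num_bins: int = 10,
) -> float:
    """Compute ECE with equal-width bins over [0, 1] (an assumed binning scheme).

    confidences: model probability assigned to its chosen answer, per example.
    correct: 1.0 if that answer was right, 0.0 otherwise.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        return 0.0

    conf_sum = [0.0] * num_bins
    acc_sum = [0.0] * num_bins
    count = [0] * num_bins
    for conf, is_correct in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * num_bins), num_bins - 1)
        conf_sum[idx] += conf
        acc_sum[idx] += is_correct
        count[idx] += 1

    ece = 0.0
    for b in range(num_bins):
        if count[b] == 0:
            continue
        mean_conf = conf_sum[b] / count[b]
        mean_acc = acc_sum[b] / count[b]
        # Weight each bin's |accuracy - confidence| gap by its share of examples.
        ece += (count[b] / total) * abs(mean_acc - mean_conf)
    return ece
```

For a multiple-choice task such as this one, `prediction` and `reference` would typically be answer letters, and `confidences` would hold the probability the model gave to its selected letter. A perfectly calibrated model yields an ECE of 0.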