Knowledge

Targeted evaluation of knowledge (e.g. factual, cultural, commonsense).

  1. Mean win rate

  2. NaturalQuestions (closed-book) - F1

  3. HellaSwag - EM

  4. OpenbookQA - EM

  5. TruthfulQA - EM

  6. MMLU - EM

  7. WikiFact - EM
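
The three kinds of metric named in the list above (EM, F1, and mean win rate) can be sketched as follows. This is a minimal illustration and not the benchmark's actual implementation. It assumes that EM denotes exact match, that F1 is a token-overlap F1 between prediction and reference, and that mean win rate is the fraction of other models a given model strictly outperforms, averaged over scenarios. The function names and the text normalization (lowercasing, whitespace splitting) are hypothetical choices made for the example.

```python
from collections import Counter
from typing import Dict


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the strings agree after trimming and lowercasing, else 0.0.

    The normalization here is an assumption; real harnesses may normalize differently.
    """
    return float(prediction.strip().lower() == reference.strip().lower())


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall (whitespace tokens)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def mean_win_rate(scores: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """scores[model][scenario] holds a score where higher is better.

    For each scenario, a model's win rate is the share of the other models
    it strictly beats; the result averages that share over all scenarios.
    Assumes at least two models and the same scenarios for every model.
    """
    models = list(scores)
    scenarios = list(next(iter(scores.values())))
    result = {}
    for model in models:
        others = [m for m in models if m != model]
        rates = [
            sum(scores[model][s] > scores[o][s] for o in others) / len(others)
            for s in scenarios
        ]
        result[model] = sum(rates) / len(rates)
    return result
```

With two hypothetical models where one leads on one scenario and trails on the other, `mean_win_rate` gives each of them 0.5. A model that leads on every scenario gets 1.0.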