NaturalQuestions (closed-book)

NaturalQuestions (Kwiatkowski et al., 2019) is a question answering benchmark built from naturally occurring queries issued to Google Search. In the closed-book setting, the input does not include the Wikipedia page that contains the answer.

  • Task: question answering
  • What: passages from Wikipedia, questions from search queries
  • When: 2010s
  • Who: web users
  • Language: English
  1. F1
  2. ECE (10-bin)
  3. F1 (Robustness)
  4. F1 (Fairness)
  5. Stereotypes (race)
  6. Stereotypes (gender)
  7. Representation (race)
  8. Representation (gender)
  9. Toxic fraction
  10. Denoised inference time (s)
  11. # eval
  12. # train
  13. truncated
  14. # prompt tokens
  15. # output tokens
  16. # trials
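
Two of the metrics listed above, F1 and ECE (10-bin), can be illustrated with a short Python sketch. This is not the benchmark's own implementation. It assumes a SQuAD-style token-overlap F1 (lowercasing, stripping punctuation and English articles) and an equal-width binning scheme for expected calibration error; the benchmark may normalize answers or choose bins differently. The function names `normalize`, `token_f1`, and `expected_calibration_error` are hypothetical.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; the benchmark may differ.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(prediction, reference):
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean of |accuracy - mean confidence| over confidence bins.

    Assumption: equal-width bins on [0, 1]; equal-mass bins are another
    common choice.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[index].append((conf, hit))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += len(members) / total * abs(accuracy - mean_conf)
    return ece
```

For example, `token_f1("Paris France", "Paris")` has precision 1/2 and recall 1, so the score is 2/3. A model that reports 0.9 confidence on two answers and gets only one right has an ECE of 0.4 under this sketch.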