The XSUM benchmark for text summarization of BBC news articles (Narayan et al., 2018).

  • Task: summarization
  • What: ?
  • When: ?
  • Who: ?
  • Language: English

  1. ROUGE-2
  2. SummaC
  3. QAFactEval
  4. BERTScore (F1)
  5. Coverage
  6. Density
  7. Compression
  8. HumanEval-faithfulness
  9. HumanEval-relevance
  10. HumanEval-coherence
  11. Stereotypes (race)
  12. Stereotypes (gender)
  13. Representation (race)
  14. Representation (gender)
  15. Toxic fraction
  16. Denoised inference time (s)
  17. # eval
  18. # train
  19. truncated
  20. # prompt tokens
  21. # output tokens
  22. # trials
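
Coverage, Density, and Compression in the list above are less self-explanatory than the other metrics. The sketch below shows one way they could be computed, assuming they follow the extractive-fragment definitions of Grusky et al. (2018); this page does not say which definitions or tokenizer the benchmark uses. The function names and the lowercased whitespace tokenization are illustrative choices, not part of the benchmark.

```python
from typing import List


def extractive_fragments(article: List[str], summary: List[str]) -> List[List[str]]:
    """Greedily collect the longest token spans shared by summary and article.

    Assumption: this mirrors the greedy matching described by Grusky et al. (2018).
    """
    fragments = []
    i = 0
    while i < len(summary):
        best = 0  # length of the longest article match starting at summary[i]
        j = 0
        while j < len(article):
            if summary[i] == article[j]:
                k = 0
                while (i + k < len(summary) and j + k < len(article)
                       and summary[i + k] == article[j + k]):
                    k += 1
                best = max(best, k)
                j += k
            else:
                j += 1
        if best > 0:
            fragments.append(summary[i:i + best])
        # Skip past the matched span, or one token if nothing matched.
        i += max(best, 1)
    return fragments


def extractiveness_stats(article_text: str, summary_text: str) -> dict:
    """Return coverage, density, and compression for one article/summary pair."""
    # Assumption: lowercased whitespace tokenization, chosen only for illustration.
    article = article_text.lower().split()
    summary = summary_text.lower().split()
    if not summary:
        return {"coverage": 0.0, "density": 0.0, "compression": 0.0}
    frags = extractive_fragments(article, summary)
    n = len(summary)
    return {
        # Fraction of summary tokens that sit inside a copied fragment.
        "coverage": sum(len(f) for f in frags) / n,
        # Average squared fragment length; rewards long copied spans.
        "density": sum(len(f) ** 2 for f in frags) / n,
        # Article length divided by summary length, in tokens.
        "compression": len(article) / n,
    }


stats = extractiveness_stats(
    "the cat sat on the mat near the door",
    "the cat sat near a door",
)
print(stats)
```

In this toy pair the summary copies one three-token span and two single tokens from the article. Under these definitions, a highly abstractive summary would score low on both coverage and density.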