CNN/DailyMail
The CNN/DailyMail benchmark for text summarization (Hermann et al., 2015; Nallapati et al., 2016).
- Task: summarization
- What: ?
- When: ?
- Who: ?
- Language: English
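Metrics:
- Reference-based: ROUGE-2, BERTScore (F1)
- Faithfulness: SummaC, QAFactEval
- Extractiveness and compression: Coverage, Density, Compression
- Human evaluation: HumanEval-faithfulness, HumanEval-relevance, HumanEval-coherence
- Bias: Stereotypes (race), Stereotypes (gender), Representation (race), Representation (gender)
- Toxicity: Toxic fraction
- Efficiency: Denoised inference time (s)
- General information: # eval (number of evaluation instances), # train (number of in-context training examples), truncated (number of instances whose prompts were truncated), # prompt tokens, # output tokens, # trials (number of trials)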
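Some of the metrics listed above have short, self-contained definitions. ROUGE-2 measures bigram overlap between a generated summary and the reference summary. Coverage, Density, and Compression follow the extractive-fragment statistics of Grusky et al. (2018): the summary is greedily split into the longest spans it shares with the article. Coverage is the fraction of summary tokens inside such spans, Density is the average squared span length per summary token, and Compression is the ratio of article length to summary length. The sketch below is a simplified illustration only. It assumes lowercase whitespace tokenization and no stemming, so its numbers will not exactly match the official scorers. The function names (`rouge2_f1`, `extractive_fragments`, `coverage_density_compression`) are hypothetical.

```python
from collections import Counter


def bigrams(tokens):
    """Multiset of adjacent token pairs."""
    return Counter(zip(tokens, tokens[1:]))


def rouge2_f1(candidate, reference):
    """Simplified ROUGE-2 F1 (assumption: lowercase whitespace tokens, no stemming)."""
    cand = bigrams(candidate.lower().split())
    ref = bigrams(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped bigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def extractive_fragments(article, summary):
    """Greedy extractive fragments (after Grusky et al., 2018) over token lists."""
    fragments = []
    i = 0
    while i < len(summary):
        best = []
        j = 0
        while j < len(article):
            if summary[i] == article[j]:
                # Extend the shared span as far as both sequences agree.
                i2, j2 = i, j
                while i2 < len(summary) and j2 < len(article) and summary[i2] == article[j2]:
                    i2 += 1
                    j2 += 1
                if len(best) < i2 - i:
                    best = summary[i:i2]
                j = j2
            else:
                j += 1
        # Skip past the longest fragment, or one novel token if none was found.
        i += max(len(best), 1)
        fragments.append(best)
    return fragments


def coverage_density_compression(article, summary):
    """Return (coverage, density, compression) for one article/summary pair."""
    a = article.lower().split()
    s = summary.lower().split()
    if not s:
        raise ValueError("summary must contain at least one token")
    frags = extractive_fragments(a, s)
    coverage = sum(len(f) for f in frags) / len(s)
    density = sum(len(f) ** 2 for f in frags) / len(s)
    compression = len(a) / len(s)
    return coverage, density, compression
```

For example, with the article "the cat sat" and the summary "a cat sat", the novel token "a" forms no fragment and "cat sat" forms one fragment of length 2. Coverage is therefore 2/3 and density is 4/3. A summary copied verbatim from the article reaches coverage 1.0, and its density grows with the length of the copied span.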