Holistic Evaluation of Language Models (HELM) - Group: RAFT (Real-world Annotated Few-Shot)

RAFT (Real-world Annotated Few-Shot) is a meta-benchmark of 11 real-world text classification tasks (Alex et al., 2021).

Task: text classification
What: ?
When: ?
Who: ?
Language: English

Columns: EM | ECE (10-bin) | EM (Robustness) | EM (Fairness) | Stereotypes (race) | Stereotypes (gender) | Representation (race) | Representation (gender) | Toxic fraction | Denoised inference time (s) | # eval | # train | truncated | # prompt tokens | # output tokens | # trials
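
For readers unfamiliar with the EM and ECE (10-bin) columns listed above, the following is a minimal sketch of how exact match and a 10-bin expected calibration error could be computed for a classification task. It is not HELM's implementation: the function names, the whitespace stripping, and the equal-width binning are assumptions made for illustration, and HELM's own normalization and binning scheme may differ.

```python
from typing import Sequence


def exact_match(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions equal to their reference label.

    Assumption: labels are compared after stripping surrounding whitespace.
    """
    if len(predictions) != len(references) or not predictions:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(predictions)


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Binned ECE: weighted mean of |accuracy - mean confidence| over bins.

    Assumption: equal-width bins over [0, 1]; other binning schemes exist.
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


if __name__ == "__main__":
    # Hypothetical predictions and model confidences, for illustration only.
    preds = ["spam", "not spam", "spam", "spam"]
    golds = ["spam", "not spam", "not spam", "spam"]
    confs = [0.9, 0.9, 0.6, 0.6]
    flags = [p == g for p, g in zip(preds, golds)]
    print("EM:", exact_match(preds, golds))
    print("ECE (10-bin):", expected_calibration_error(confs, flags))
```

A lower ECE means the model's stated confidence tracks its accuracy more closely; EM alone does not capture this.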