Vary number of in-context examples
Vary the number of in-context training examples.
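As an illustration of what is varied here, the sketch below builds prompts with different numbers of in-context examples. The `Input:`/`Output:` template, the function name `build_prompt`, and the example reviews are hypothetical and chosen only for illustration; the prompt format actually used by the benchmark is not given in this page.

```python
from typing import List, Tuple


def build_prompt(train: List[Tuple[str, str]], query: str, k: int) -> str:
    """Prepend the first k (input, output) training pairs to the query.

    The template is an assumption made for this sketch, not the
    benchmark's actual prompt format.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    # One block per in-context example, followed by the unanswered query.
    blocks = [f"Input: {x}\nOutput: {y}" for x, y in train[:k]]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)


# Hypothetical sentiment examples, not taken from any dataset.
examples = [
    ("great film", "Positive"),
    ("dull plot", "Negative"),
    ("loved it", "Positive"),
]

# One prompt per setting of the number of in-context examples.
prompts = {k: build_prompt(examples, "not bad at all", k) for k in (0, 1, 2, 3)}
```

With `k = 0` the prompt contains only the query; each increment of `k` adds one more solved example ahead of it.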
Mean win rate
NaturalQuestions (open-book) - F1
CNN/DailyMail - ROUGE-2
IMDB - EM
CivilComments - EM

Mean win rate
NaturalQuestions (open-book) - ECE (10-bin)
IMDB - ECE (10-bin)
CivilComments - ECE (10-bin)

Mean win rate
NaturalQuestions (open-book) - Stereotypes (race)
NaturalQuestions (open-book) - Stereotypes (gender)
NaturalQuestions (open-book) - Representation (race)
NaturalQuestions (open-book) - Representation (gender)
CNN/DailyMail - Stereotypes (race)
CNN/DailyMail - Stereotypes (gender)
CNN/DailyMail - Representation (race)
CNN/DailyMail - Representation (gender)
IMDB - Stereotypes (race)
IMDB - Stereotypes (gender)
IMDB - Representation (race)
IMDB - Representation (gender)
CivilComments - Stereotypes (race)
CivilComments - Stereotypes (gender)
CivilComments - Representation (race)
CivilComments - Representation (gender)

Mean win rate
NaturalQuestions (open-book) - Toxic fraction
CNN/DailyMail - Toxic fraction
IMDB - Toxic fraction
CivilComments - Toxic fraction

Mean win rate
NaturalQuestions (open-book) - Denoised inference time (s)
CNN/DailyMail - Denoised inference time (s)
IMDB - Denoised inference time (s)
CivilComments - Denoised inference time (s)

Mean win rate
NaturalQuestions (open-book) - # eval
NaturalQuestions (open-book) - # train
NaturalQuestions (open-book) - truncated
NaturalQuestions (open-book) - # prompt tokens
NaturalQuestions (open-book) - # output tokens
NaturalQuestions (open-book) - # trials
CNN/DailyMail - # eval
CNN/DailyMail - # train
CNN/DailyMail - truncated
CNN/DailyMail - # prompt tokens
CNN/DailyMail - # output tokens
CNN/DailyMail - # trials
IMDB - # eval
IMDB - # train
IMDB - truncated
IMDB - # prompt tokens
IMDB - # output tokens
IMDB - # trials
CivilComments - # eval
CivilComments - # train
CivilComments - truncated
CivilComments - # prompt tokens
CivilComments - # output tokens
CivilComments - # trials

Mean win rate
CNN/DailyMail - SummaC
CNN/DailyMail - QAFactEval
CNN/DailyMail - BERTScore (F1)
CNN/DailyMail - Coverage
CNN/DailyMail - Density
CNN/DailyMail - Compression
CNN/DailyMail - HumanEval-faithfulness
CNN/DailyMail - HumanEval-relevance
CNN/DailyMail - HumanEval-coherence
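
Every column group above opens with a mean win rate. This page does not define the term, so the sketch below assumes the common head-to-head reading: within each scenario, a model's win rate is the fraction of other models it strictly beats on that scenario's metric, and the mean win rate averages this over scenarios. The function name, the tie handling (a tie counts as no win), and the scores are all hypothetical.

```python
from typing import Dict, List


def mean_win_rate(
    scores: Dict[str, Dict[str, float]],
    higher_is_better: bool = True,
) -> Dict[str, float]:
    """Average, over scenarios, of the fraction of other models beaten.

    scores[scenario][model] holds one metric value. Ties count as
    non-wins, which is an assumption of this sketch.
    """
    rates: Dict[str, List[float]] = {}
    for per_model in scores.values():
        for model, value in per_model.items():
            others = [v for m, v in per_model.items() if m != model]
            if not others:
                continue  # a win rate needs at least one opponent
            if higher_is_better:
                wins = sum(value > v for v in others)
            else:
                wins = sum(value < v for v in others)
            rates.setdefault(model, []).append(wins / len(others))
    return {model: sum(r) / len(r) for model, r in rates.items()}


# Made-up accuracy values for three hypothetical models.
toy_scores = {
    "IMDB": {"A": 0.9, "B": 0.8, "C": 0.7},
    "CivilComments": {"A": 0.5, "B": 0.6, "C": 0.4},
}
result = mean_win_rate(toy_scores)
```

For metrics where smaller values are better, such as ECE, toxic fraction, or inference time, the same function would be called with `higher_is_better=False`.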