Vary multiple-choice strategy
Vary the adaptation strategy for multiple-choice questions.
Mean win rate
HellaSwag - EM
OpenbookQA - EM
TruthfulQA - EM
MMLU - EM
BLiMP - EM
LegalSupport - EM
LSAT - EM
BBQ - EM
Mean win rate
HellaSwag - ECE (10-bin)
OpenbookQA - ECE (10-bin)
TruthfulQA - ECE (10-bin)
MMLU - ECE (10-bin)
Mean win rate
HellaSwag - EM (Robustness)
OpenbookQA - EM (Robustness)
TruthfulQA - EM (Robustness)
MMLU - EM (Robustness)
Mean win rate
HellaSwag - EM (Fairness)
OpenbookQA - EM (Fairness)
TruthfulQA - EM (Fairness)
MMLU - EM (Fairness)
Mean win rate
HellaSwag - Denoised inference time (s)
OpenbookQA - Denoised inference time (s)
TruthfulQA - Denoised inference time (s)
MMLU - Denoised inference time (s)
BLiMP - Denoised inference time (s)
LegalSupport - Denoised inference time (s)
LSAT - Denoised inference time (s)
BBQ - Denoised inference time (s)
Mean win rate
HellaSwag - # eval
HellaSwag - # train
HellaSwag - truncated
HellaSwag - # prompt tokens
HellaSwag - # output tokens
HellaSwag - # trials
OpenbookQA - # eval
OpenbookQA - # train
OpenbookQA - truncated
OpenbookQA - # prompt tokens
OpenbookQA - # output tokens
OpenbookQA - # trials
TruthfulQA - # eval
TruthfulQA - # train
TruthfulQA - truncated
TruthfulQA - # prompt tokens
TruthfulQA - # output tokens
TruthfulQA - # trials
MMLU - # eval
MMLU - # train
MMLU - truncated
MMLU - # prompt tokens
MMLU - # output tokens
MMLU - # trials
BLiMP - # eval
BLiMP - # train
BLiMP - truncated
BLiMP - # prompt tokens
BLiMP - # output tokens
BLiMP - # trials
LegalSupport - # eval
LegalSupport - # train
LegalSupport - truncated
LegalSupport - # prompt tokens
LegalSupport - # output tokens
LegalSupport - # trials
LSAT - # eval
LSAT - # train
LSAT - truncated
LSAT - # prompt tokens
LSAT - # output tokens
LSAT - # trials
BBQ - # eval
BBQ - # train
BBQ - truncated
BBQ - # prompt tokens
BBQ - # output tokens
BBQ - # trials
Mean win rate
BBQ - BBQ (ambiguous)
BBQ - BBQ (unambiguous)