Language
Targeted evaluation of linguistic capabilities.
Mean win rate
The Pile - BPB
TwitterAAE - BPB
ICE - BPB
BLiMP - EM
The Pile - Denoised inference time (s)
TwitterAAE - Denoised inference time (s)
ICE - Denoised inference time (s)
BLiMP - Denoised inference time (s)
The Pile - # eval
The Pile - # train
The Pile - truncated
The Pile - # prompt tokens
The Pile - # output tokens
The Pile - # trials
TwitterAAE - # eval
TwitterAAE - # train
TwitterAAE - truncated
TwitterAAE - # prompt tokens
TwitterAAE - # output tokens
TwitterAAE - # trials
ICE - # eval
ICE - # train
ICE - truncated
ICE - # prompt tokens
ICE - # output tokens
ICE - # trials
BLiMP - # eval
BLiMP - # train
BLiMP - truncated
BLiMP - # prompt tokens
BLiMP - # output tokens
BLiMP - # trials
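
BPB in the columns above is read here as bits per byte, a tokenizer-independent language modeling loss. The following is a minimal sketch of how such a value could be derived from per-token log probabilities. It assumes natural-log probabilities and UTF-8 byte counts; the function name `bits_per_byte` and its arguments are hypothetical, and this is not necessarily how the benchmark itself computes the column.

```python
import math


def bits_per_byte(token_logprobs, text):
    """Convert per-token natural-log probabilities into bits per byte.

    Assumptions (not taken from the source): `token_logprobs` holds one
    natural-log probability per token of `text`, and bytes are counted
    on the UTF-8 encoding of `text`.
    """
    num_bytes = len(text.encode("utf-8"))
    if num_bytes == 0:
        raise ValueError("text must contain at least one byte")
    # Total negative log-likelihood in nats.
    total_nats = -sum(token_logprobs)
    # Dividing by ln(2) converts nats to bits; dividing by the byte
    # count normalizes for text length independently of tokenization.
    return total_nats / (math.log(2) * num_bytes)


# Hypothetical example: four one-byte tokens, each assigned probability 0.5.
example_bpb = bits_per_byte([math.log(0.5)] * 4, "abcd")
print(example_bpb)
```

Each token in the example costs one bit and covers one byte, so the result is about 1.0. Lower values indicate a better fit to the text.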