Calibration
Extended calibration metrics.
Mean win rate
For each of MMLU, IMDB, RAFT, and CivilComments:
Max prob
ECE (1-bin)
ECE (10-bin)
Selective Acc
Acc@10%
Platt-scaled ECE (1-bin)
Platt-scaled ECE (10-bin)
Platt Coef
Platt Intercept