Calibration

Extended calibration metrics: the mean win rate, followed by nine metrics for each of MMLU, IMDB, RAFT, and CivilComments.

  1. Mean win rate

  2. MMLU - Max prob

  3. MMLU - ECE (1-bin)

  4. MMLU - ECE (10-bin)

  5. MMLU - Selective Acc

  6. MMLU - Acc@10%

  7. MMLU - Platt-scaled ECE (1-bin)

  8. MMLU - Platt-scaled ECE (10-bin)

  9. MMLU - Platt Coef

  10. MMLU - Platt Intercept

  11. IMDB - Max prob

  12. IMDB - ECE (1-bin)

  13. IMDB - ECE (10-bin)

  14. IMDB - Selective Acc

  15. IMDB - Acc@10%

  16. IMDB - Platt-scaled ECE (1-bin)

  17. IMDB - Platt-scaled ECE (10-bin)

  18. IMDB - Platt Coef

  19. IMDB - Platt Intercept

  20. RAFT - Max prob

  21. RAFT - ECE (1-bin)

  22. RAFT - ECE (10-bin)

  23. RAFT - Selective Acc

  24. RAFT - Acc@10%

  25. RAFT - Platt-scaled ECE (1-bin)

  26. RAFT - Platt-scaled ECE (10-bin)

  27. RAFT - Platt Coef

  28. RAFT - Platt Intercept

  29. CivilComments - Max prob

  30. CivilComments - ECE (1-bin)

  31. CivilComments - ECE (10-bin)

  32. CivilComments - Selective Acc

  33. CivilComments - Acc@10%

  34. CivilComments - Platt-scaled ECE (1-bin)

  35. CivilComments - Platt-scaled ECE (10-bin)

  36. CivilComments - Platt Coef

  37. CivilComments - Platt Intercept
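
The list above names the metrics without defining them, so the following is a minimal sketch of how two of them might be computed. It assumes that ECE uses equal-width confidence bins and that Acc@10% means accuracy on the 10% of predictions with the highest confidence. Neither definition is stated in this section, and the function names `expected_calibration_error` and `accuracy_at_top_fraction` are hypothetical, not taken from any particular library.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    num_bins: int = 10,
) -> float:
    """ECE with equal-width bins (an assumption; other binning schemes exist).

    Each bin contributes (bin size / N) * |bin accuracy - bin mean confidence|.
    With num_bins=1 this reduces to |overall accuracy - mean confidence|.
    """
    if len(confidences) != len(correct) or len(confidences) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(confidences)
    # Per bin: [count, sum of confidences, number of correct predictions]
    bins = [[0, 0.0, 0] for _ in range(num_bins)]
    for p, is_correct in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        b = min(int(p * num_bins), num_bins - 1)
        bins[b][0] += 1
        bins[b][1] += p
        bins[b][2] += int(is_correct)
    # (count / n) * |hits / count - conf_sum / count| == |hits - conf_sum| / n
    return sum(abs(hits - conf_sum) / n for count, conf_sum, hits in bins if count)


def accuracy_at_top_fraction(
    confidences: Sequence[float],
    correct: Sequence[bool],
    fraction: float = 0.1,
) -> float:
    """Accuracy on roughly the `fraction` most confident predictions."""
    if len(confidences) != len(correct) or len(confidences) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    k = max(1, int(fraction * len(confidences)))
    # Sort by confidence, highest first, and keep the top k examples.
    ranked = sorted(zip(confidences, correct), key=lambda pair: pair[0], reverse=True)
    top = ranked[:k]
    return sum(int(is_correct) for _, is_correct in top) / k


if __name__ == "__main__":
    # Toy data: the model's maximum class probability and whether it was right.
    conf = [0.9, 0.8, 0.7, 0.6, 0.55]
    hit = [True, True, False, True, False]
    print("ECE (1-bin): ", expected_calibration_error(conf, hit, num_bins=1))
    print("ECE (10-bin):", expected_calibration_error(conf, hit, num_bins=10))
    print("Acc@40%:     ", accuracy_at_top_fraction(conf, hit, fraction=0.4))
```

Under these assumptions, the Platt-scaled ECE variants would apply the same ECE function after the confidences have been recalibrated with a fitted logistic function, whose slope and intercept correspond to the "Platt Coef" and "Platt Intercept" entries.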