Groups

| Group | Description | Adaptation method | # instances | # references | # prompt tokens | # completion tokens | # models |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Core scenarios | The scenarios where we evaluate all the models. | ranking_binary, multiple_choice_separate_calibrated, multiple_choice_separate_original, generation, multiple_choice_joint | 258.684 | 3.506 | 92516043.484 | 856340.563 | 44.000 |
| Targeted evaluations | Targeted evaluation of specific skills (e.g., knowledge, reasoning) and risks (e.g., disinformation, memorization/copyright). | multiple_choice_separate_calibrated, multiple_choice_separate_original, generation, multiple_choice_joint, language_modeling | 270.994 | 3.078 | 24937569.201 | 3241888.924 | 46.000 |