RAFT (Real-world Annotated Few-Shot)

The Real-world Annotated Few-Shot (RAFT) meta-benchmark comprises 11 real-world text classification tasks (Alex et al., 2021).

  • Task: text classification
  • What: ?
  • When: ?
  • Who: ?
  • Language: English
  1. EM

  2. ECE (10-bin)

  3. EM (Robustness)

  4. EM (Fairness)

  5. Stereotypes (race)

  6. Stereotypes (gender)

  7. Representation (race)

  8. Representation (gender)

  9. Toxic fraction

  10. Denoised inference time (s)

  11. # eval

  12. # train

  13. truncated

  14. # prompt tokens

  15. # output tokens

  16. # trials
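
The first two metrics in the list above, EM (exact match) and ECE (10-bin), can be illustrated with a minimal sketch. The function names, the whitespace-stripping rule for EM, and the use of equal-width confidence bins for ECE are assumptions made here for illustration; the benchmark's own implementation may normalize strings or bin confidences differently.

```python
from typing import Sequence


def exact_match(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions that equal their reference label.

    Assumption: surrounding whitespace is stripped before comparing.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(predictions)


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    num_bins: int = 10,
) -> float:
    """Binned ECE: weighted mean of |accuracy - mean confidence| over bins.

    Assumption: bins are equal-width intervals over [0, 1].
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        return 0.0

    conf_sum = [0.0] * num_bins
    hit_sum = [0] * num_bins
    count = [0] * num_bins
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * num_bins), num_bins - 1)
        conf_sum[idx] += conf
        hit_sum[idx] += int(ok)
        count[idx] += 1

    ece = 0.0
    for b in range(num_bins):
        if count[b] == 0:
            continue  # empty bins contribute nothing
        accuracy = hit_sum[b] / count[b]
        mean_conf = conf_sum[b] / count[b]
        ece += (count[b] / total) * abs(accuracy - mean_conf)
    return ece
```

Here `exact_match` takes the predicted and gold labels for a set of evaluation instances, and `expected_calibration_error` takes the model's confidence in each prediction together with whether that prediction was correct. A lower ECE means the stated confidences track the observed accuracy more closely.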

[Figure: per-model chart; axis ticks 0, 350, 700, 1050, 1400; axis label not recoverable. Models shown: J1-Jumbo v1 (178B), J1-Large v1 (7.5B), J1-Grande v1 (17B), J1-Grande v2 beta (17B), Jurassic-2 Jumbo (178B), Jurassic-2 Grande (17B), Jurassic-2 Large (7.5B), Luminous Base (13B), Luminous Extended (30B), Luminous Supreme (70B), Anthropic-LM v4-s3 (52B), BLOOM (176B), T0pp (11B), Cohere xlarge v20220609 (52.4B), Cohere large v20220720 (13.1B), Cohere medium v20220720 (6.1B), Cohere small v20220720 (410M), Cohere xlarge v20221108 (52.4B), Cohere medium v20221108 (6.1B), Cohere Command beta (6.1B), Cohere Command beta (52.4B), GPT-J (6B), GPT-NeoX (20B), T5 (11B), UL2 (20B), OPT (175B), OPT (66B), TNLG v2 (530B), TNLG v2 (6.7B), davinci (175B), curie (6.7B), babbage (1.3B), ada (350M), text-davinci-003, text-davinci-002, text-curie-001, text-babbage-001, text-ada-001, gpt-3.5-turbo-0301, RedPajama-INCITE-Base-v1 (3B), GLM (130B), InstructPalmyra (30B), Palmyra X (43B), YaLM (100B)]