APPS (Code)

APPS is a benchmark for measuring competence on coding challenges (Hendrycks et al., 2021).

  • Task: ?
  • What: n/a
  • When: n/a
  • Who: n/a
  • Language: synthetic
  1. Avg. # tests passed

  2. Strict correctness

  3. Denoised inference time (s)

  4. # eval

  5. # train

  6. truncated

  7. # prompt tokens

  8. # output tokens

  9. # trials
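
The first two metrics can be illustrated with a small sketch. It assumes they follow the definitions in Hendrycks et al. (2021): the average is the mean, over problems, of the fraction of test cases a generated solution passes, and strict correctness is the fraction of problems whose solution passes every test case. The function names and the input layout (one list of pass/fail booleans per problem) are hypothetical and are not taken from any particular evaluation harness.

```python
from typing import List


def avg_tests_passed(results: List[List[bool]]) -> float:
    """Mean over problems of the fraction of test cases passed.

    Assumed definition; problems with no test cases are skipped.
    """
    per_problem = [sum(r) / len(r) for r in results if r]
    return sum(per_problem) / len(per_problem) if per_problem else 0.0


def strict_correctness(results: List[List[bool]]) -> float:
    """Fraction of problems whose solution passes all of its test cases.

    Assumed definition; problems with no test cases are skipped.
    """
    solved = [all(r) for r in results if r]
    return sum(solved) / len(solved) if solved else 0.0


# Hypothetical outcomes for three problems (True = test case passed).
results = [
    [True, True],                  # all tests pass
    [True, False],                 # half of the tests pass
    [False, False, False, False],  # no tests pass
]

print(avg_tests_passed(results))    # (1.0 + 0.5 + 0.0) / 3
print(strict_correctness(results))  # 1 of 3 problems fully solved
```

Under these assumptions, strict correctness can never exceed the average test pass rate, since a problem counts toward it only when its per-problem pass fraction is exactly 1.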