APPS (Code)
APPS is a benchmark for measuring competence on code challenges (Hendrycks et al., 2021).
- Task: ?
- What: n/a
- When: n/a
- Who: n/a
- Language: synthetic
Avg. # tests passed | Strict correctness | Denoised inference time (s) | # eval | # train | truncated | # prompt tokens | # output tokens | # trials
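The columns "Avg. # tests passed" and "Strict correctness" are not defined in this section. The sketch below assumes they correspond to the test case average and strict accuracy described in the APPS paper (Hendrycks et al., 2021): the first is the mean, over problems, of the fraction of test cases a generated solution passes, and the second is the fraction of problems where the solution passes every test case. The function names and the input layout (one list of pass/fail booleans per problem) are hypothetical and chosen only for illustration; the leaderboard's own implementation may differ.

```python
from typing import Sequence


def avg_tests_passed(results: Sequence[Sequence[bool]]) -> float:
    """Mean over problems of the fraction of test cases passed.

    Assumption: results[i][j] is True if the solution to problem i
    passed test case j. Problems with no test cases are skipped.
    """
    fractions = [sum(r) / len(r) for r in results if len(r) > 0]
    if not fractions:
        raise ValueError("no problems with test cases")
    return sum(fractions) / len(fractions)


def strict_correctness(results: Sequence[Sequence[bool]]) -> float:
    """Fraction of problems whose solution passed all of its test cases.

    Assumption: same input layout as avg_tests_passed.
    """
    solved = [all(r) for r in results if len(r) > 0]
    if not solved:
        raise ValueError("no problems with test cases")
    return sum(solved) / len(solved)


# Hypothetical outcomes for three problems.
example = [
    [True, True],                  # all tests passed
    [True, False],                 # half of the tests passed
    [False, False, False, False],  # no tests passed
]
print(avg_tests_passed(example))
print(strict_correctness(example))
```

Under these assumptions a solution that passes most but not all test cases raises the first number without affecting the second, which is why the two columns can diverge.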