MATH

MATH is a benchmark for measuring mathematical problem solving on competition mathematics problems (Hendrycks et al., 2021).

  • Task: ?
  • What: n/a
  • When: n/a
  • Who: n/a
  • Language: synthetic
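
Metrics reported for this scenario:
  1. Equivalent: fraction of model outputs that are mathematically equivalent to the reference answer
  2. Denoised inference time (s): average time to process a request to the model, adjusted to remove performance contention
  3. # eval: number of evaluation instances
  4. # train: number of training (in-context) instances included in the prompt
  5. truncated: fraction of instances where the prompt itself had to be truncated to fit the model's context window
  6. # prompt tokens: number of tokens in the prompt
  7. # output tokens: number of tokens the model actually generated
  8. # trials: number of trials, each using an independently sampled set of in-context examples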
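The "Equivalent" metric compares each model answer with the reference answer after normalization. The sketch below is a simplified illustration, not HELM's or the MATH authors' exact implementation. It assumes that both the model output and the reference solution mark the final answer with \boxed{...} (as MATH reference solutions do), applies a small hypothetical set of LaTeX normalization rules, and assumes the reported score is per-trial accuracy averaged over trials. The helper names last_boxed_content, normalize_answer, is_equiv, and equivalent_score are illustrative.

```python
from typing import List, Optional, Tuple


def last_boxed_content(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, honoring nested braces."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    content_start = start + len(marker)
    depth = 1
    for i in range(content_start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[content_start:i]
    return None  # braces never balanced


def normalize_answer(answer: str) -> str:
    """Apply a few illustrative LaTeX normalizations (hypothetical rule set)."""
    s = answer.strip()
    for old, new in [
        ("\\left", ""), ("\\right", ""),              # sizing commands
        ("\\dfrac", "\\frac"), ("\\tfrac", "\\frac"),  # fraction variants
        ("^{\\circ}", ""), ("^\\circ", ""),           # degree marks
        ("\\$", ""), ("$", ""),                       # dollar signs
        ("\\!", ""), (" ", ""),                       # spacing
    ]:
        s = s.replace(old, new)
    return s.rstrip(".")  # drop trailing periods


def is_equiv(prediction: str, reference: str) -> bool:
    """True if both texts contain a boxed answer and the normalized answers match."""
    pred = last_boxed_content(prediction)
    ref = last_boxed_content(reference)
    if pred is None or ref is None:
        return False
    return normalize_answer(pred) == normalize_answer(ref)


def equivalent_score(trials: List[List[Tuple[str, str]]]) -> float:
    """Mean over trials of the fraction of (prediction, reference) pairs judged equivalent."""
    per_trial = [sum(is_equiv(p, r) for p, r in trial) / len(trial) for trial in trials]
    return sum(per_trial) / len(per_trial)


if __name__ == "__main__":
    reference = "The sum is $\\boxed{\\dfrac{1}{2}}$."
    prediction = "So the answer is \\boxed{\\frac{1}{2}}."
    print(is_equiv(prediction, reference))  # True
```

Exact string matching after normalization is deliberately conservative: this sketch does not treat 0.5 and \frac{1}{2} as equal, so a fuller checker would need additional rules.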