MATH (chain-of-thoughts)
The MATH benchmark measures mathematical problem solving on competition math problems, here evaluated with chain-of-thought-style reasoning (Hendrycks et al., 2021).
- Task: ?
- What: n/a
- When: n/a
- Who: n/a
- Language: synthetic
Columns: Equivalent (chain of thought) | Denoised inference time (s) | # eval | # train | truncated | # prompt tokens | # output tokens | # trials
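The "Equivalent (chain of thought)" metric is not defined in this section. As a rough illustration of what an answer-equivalence check on chain-of-thought output could look like, the sketch below pulls the final `\boxed{...}` answer out of a model's reasoning and compares it with the boxed answer in a reference solution. It relies on the fact that MATH reference solutions mark the final answer with `\boxed{}`; everything else is an assumption made for this example. The function names (`extract_boxed`, `normalize`, `is_equivalent`) and the normalization rules are hypothetical and are not taken from the benchmark's actual scoring code.

```python
import re
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    begin = start + len(marker)
    depth = 1  # we are already inside the opening brace of \boxed{
    for i in range(begin, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[begin:i]
    return None  # braces never balanced


def normalize(answer: str) -> str:
    """Hypothetical light normalization: drop '$', whitespace, trailing periods."""
    answer = answer.strip().rstrip(".")
    answer = answer.replace("$", "")
    return re.sub(r"\s+", "", answer)


def is_equivalent(model_output: str, reference_solution: str) -> bool:
    """True when both texts contain a boxed answer and the answers match."""
    predicted = extract_boxed(model_output)
    gold = extract_boxed(reference_solution)
    if predicted is None or gold is None:
        return False
    return normalize(predicted) == normalize(gold)


if __name__ == "__main__":
    output = "Half of the outcomes are heads, so the answer is \\boxed{\\frac{1}{2}}."
    reference = "Thus the probability is \\boxed{ \\frac{1}{2} }"
    print(is_equivalent(output, reference))
```

A string comparison like this treats mathematically equal but differently written answers (for example `0.5` and `\frac{1}{2}`) as different, so a real scorer would likely need stronger normalization.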