AI Systems Solve Just 2% Of Advanced Maths Problems In New Benchmark Test

Leading AI systems solve fewer than 2% of the problems in a new advanced mathematics benchmark, revealing significant limitations in their reasoning capabilities, research group Epoch AI reported this week.

The benchmark, called FrontierMath, consists of hundreds of original research-level mathematics problems developed in collaboration with over 60 mathematicians, including Fields Medalists Terence Tao and Timothy Gowers. While top AI models like GPT-4 and Gemini 1.5 Pro achieve over 90% accuracy on traditional math tests, they struggle with FrontierMath’s problems, which span fields from computational number theory to algebraic geometry and require complex reasoning.

“These are extremely challenging. […] The only way to solve them is by a combination of a semi-expert like a graduate student in a related field, maybe paired with some combination of a modern AI and lots of other algebra packages,” Tao said. The problems are designed to be “guessproof,” with large numerical answers or complex mathematical objects as solutions, making them nearly impossible to solve without proper mathematical reasoning.

Further reading: New secret math benchmark stumps AI models and PhDs alike.

Read more of this story at Slashdot.
