Elon Musk Monitor

Apple Claims AI Reasoning Models Suffer From ‘Accuracy Collapse’ When Solving Complex Problems

By elonmusk | June 9, 2025


Apple published a research paper on Saturday in which its researchers examine the strengths and weaknesses of recently released reasoning models. Also known as large reasoning models (LRMs), these models “think” by utilising additional compute to solve complex problems. However, the paper found that even the most powerful models break down beyond a certain level of complexity. The researchers said that when a problem is highly complex, the models' accuracy collapses entirely and they give up on the problem rather than spending more compute, even though spending more compute on hard problems is what they are trained to do.

Apple Says Reasoning Models Aren’t Really Reasoning Beyond a Certain Level

In a paper titled “The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity,” published on Apple’s website, the researchers claim that LRMs and large language models (LLMs) without thinking capability behave differently across three regimes of complexity.

The paper describes three regimes of complexity: low complexity tasks, medium complexity tasks, and high complexity tasks. To test how LLMs and LRMs handle this range, the researchers used several puzzles whose difficulty can be increased step by step. One of these puzzles was the Tower of Hanoi.

The Tower of Hanoi is a mathematical puzzle with three pegs and several disks. Disks are arranged in a decreasing order of size to create a pyramid-like shape. The objective of the puzzle is to shift the disks from the leftmost peg to the rightmost peg, while moving one disk at a time. There is a catch — at no time should a larger disk be placed on top of a smaller disk. It is not a very difficult puzzle, and it is often targeted at children between the ages of six and 15.
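The rules above map onto a short recursive procedure. The following is a minimal Python sketch, not code from Apple's paper; the function name `solve_hanoi` and the peg labels are illustrative assumptions. It moves the n-1 smaller disks out of the way, moves the largest disk, then moves the smaller disks back on top, which takes 2^n - 1 moves for n disks.

```python
def solve_hanoi(n, source="left", spare="middle", target="right"):
    """Return the list of (from_peg, to_peg) moves that solves an n-disk puzzle.

    Illustrative helper: the name and peg labels are assumptions,
    not taken from the paper.
    """
    if n == 0:
        return []
    # Step 1: move the n-1 smaller disks from the source onto the spare peg.
    moves = solve_hanoi(n - 1, source, target, spare)
    # Step 2: move the largest remaining disk straight to the target peg.
    moves.append((source, target))
    # Step 3: move the n-1 smaller disks from the spare peg onto the target.
    moves += solve_hanoi(n - 1, spare, source, target)
    return moves


if __name__ == "__main__":
    for from_peg, to_peg in solve_hanoi(3):
        print(f"move top disk from {from_peg} to {to_peg}")
```

A three-disk puzzle yields seven moves, the first of which takes the smallest disk from the left peg to the right peg.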

[Figure: Mathematical puzzles solved by reasoning models. Photo Credit: Apple]

Apple researchers chose two reasoning models and their non-reasoning counterparts for this experiment. The LLMs chosen were Claude 3.7 Sonnet and DeepSeek-V3, while the LRMs were Claude 3.7 Sonnet with Thinking and DeepSeek-R1. The thinking budget was capped at 64,000 tokens for each model. The aim of the experiment was to check not just the accuracy of the final answer, but also the logical correctness of the steps chosen to solve the puzzle.
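Checking the steps, and not only the final answer, amounts to replaying a proposed move list against the puzzle rules. The sketch below is a hypothetical checker, not the paper's evaluation code; the name `check_solution` and the move format (pairs of peg indices, 0 for left, 1 for middle, 2 for right) are assumptions made for illustration.

```python
def check_solution(n, moves):
    """Replay `moves` on an n-disk Tower of Hanoi puzzle.

    Returns (first_bad_index, solved):
      first_bad_index -- index of the first illegal move, or None if all are legal
      solved          -- True only if all moves are legal and every disk ends
                         on the right peg
    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    # Each peg is a stack whose last element is the top disk; disk n is the largest.
    pegs = [list(range(n, 0, -1)), [], []]
    for i, (src, dst) in enumerate(moves):
        # Illegal: unknown peg, or trying to lift a disk from an empty peg.
        if src not in (0, 1, 2) or dst not in (0, 1, 2) or not pegs[src]:
            return i, False
        # Illegal: placing a larger disk on top of a smaller one.
        if pegs[dst] and pegs[dst][-1] < pegs[src][-1]:
            return i, False
        pegs[dst].append(pegs[src].pop())
    solved = pegs[2] == list(range(n, 0, -1))
    return None, solved


if __name__ == "__main__":
    print(check_solution(2, [(0, 1), (0, 2), (1, 2)]))  # legal and complete
    print(check_solution(2, [(0, 1), (0, 1)]))          # second move is illegal
```

Returning the index of the first illegal move, rather than a single pass/fail flag, makes it possible to see how far into a solution a model's reasoning stays valid.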

In the low complexity task, up to three disks were used, whereas for the medium complexity task, the number of disks was kept between four and 10. Finally, in the high complexity task, there were between 11 and 20 disks.

The researchers noted that both LLMs and LRMs displayed equal aptitude in solving the low complexity task. When the difficulty was increased, reasoning models were able to solve the puzzle more accurately, given their extra compute budget. However, when the tasks reached the high complexity zone, both types of model showed a complete collapse of reasoning.

The experiment was reportedly also repeated with more models and more puzzles, such as Checkers Jumping, River Crossing, and Blocks World.
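To get a sense of how quickly these regimes diverge, recall that the shortest Tower of Hanoi solution for n disks takes 2^n - 1 moves. The sketch below applies the disk-count thresholds as reported above; the function names are illustrative, and the thresholds are this article's summary rather than constants taken from the paper's code.

```python
def min_moves(n):
    """Length of the shortest solution for an n-disk Tower of Hanoi puzzle."""
    return 2 ** n - 1


def regime(n):
    """Map a disk count to the complexity regime described in the article.

    The thresholds (1-3, 4-10, 11-20) follow the article's summary and are
    assumptions for illustration.
    """
    if not 1 <= n <= 20:
        raise ValueError("disk count outside the range described (1 to 20)")
    if n <= 3:
        return "low"
    if n <= 10:
        return "medium"
    return "high"


if __name__ == "__main__":
    for disks in (3, 10, 20):
        print(disks, regime(disks), min_moves(disks))
```

The largest low complexity puzzle needs 7 moves, the largest medium complexity puzzle needs 1,023, and a 20-disk puzzle needs 1,048,575, so each additional disk roughly doubles the length of a correct answer.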

Apple’s research paper echoes concerns that several others in the artificial intelligence (AI) space have already expressed. While reasoning models can generalise within the distribution of their training data, whenever a problem falls outside it, the models struggle to “think” and either try to take shortcuts to the solution or give up and collapse completely.

“Current evaluations primarily focus on established mathematical and coding benchmarks, emphasising final answer accuracy. However, this evaluation paradigm often suffers from data contamination and does not provide insights into the reasoning traces’ structure and quality,” the company said in a post.


