A New Apple Study Shows AI Reasoning Has Critical Flaws

Published on 2024-11-04

It’s no surprise that AI doesn’t always get things right. Occasionally, it even hallucinates. However, a recent study by Apple researchers has revealed even more significant flaws in how AI models handle mathematical and formal reasoning.

As part of the study, Apple scientists asked large language models (LLMs) the same question multiple times, phrased in slightly different ways, and were astounded to find that the answers varied unexpectedly. These variations were most pronounced when numbers were involved.

Apple's Study Suggests Big Problems With AI's Reliability

The research, published on arXiv.org, concluded that there was “significant performance variability across different instantiations of the same question, challenging the reliability of current GSM8K results that rely on single point accuracy metrics.” GSM8K is a dataset of more than 8,000 diverse grade-school math questions and answers.

Apple researchers found that this variance in performance could be as much as 10%, and that even slight variations in prompts could cause serious problems with the reliability of an LLM’s answers.
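
The idea of “different instantiations of the same question” can be made concrete with a short Python sketch. This is not Apple’s code: the template, the names, and the helpers `instantiate`, `make_set`, and `summarize` are hypothetical, assumed here only for illustration. Each instantiation keeps the same underlying arithmetic but swaps in a different name and different numbers, and `summarize` reports the spread of accuracy across several such sets instead of a single figure.

```python
import random
import statistics

# Hypothetical template (an assumption for illustration): the name and the
# numbers are placeholders, so one "question" can be instantiated many ways.
TEMPLATE = (
    "{name} picks {a} apples on Monday and {b} apples on Tuesday. "
    "How many apples does {name} have?"
)
NAMES = ["Ava", "Liam", "Noah", "Mia"]


def instantiate(rng: random.Random) -> tuple[str, int]:
    """Return one variant of the question and its correct answer."""
    name = rng.choice(NAMES)
    a, b = rng.randint(10, 99), rng.randint(10, 99)
    question = TEMPLATE.format(name=name, a=a, b=b)
    return question, a + b  # the underlying logic never changes


def make_set(seed: int, size: int = 5) -> list[tuple[str, int]]:
    """Build one reproducible set of instantiations from a seed."""
    rng = random.Random(seed)
    return [instantiate(rng) for _ in range(size)]


def summarize(accuracies: list[float]) -> dict[str, float]:
    """Report the spread of accuracy across sets, not just one number."""
    return {
        "mean": statistics.mean(accuracies),
        "min": min(accuracies),
        "max": max(accuracies),
        "range": max(accuracies) - min(accuracies),
    }
```

In use, you would score a model on each set (one set per seed) and pass the per-set accuracies to `summarize`. A single-point accuracy corresponds to just one of those sets; the range is where variability of the kind the researchers describe would show up.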

In other words, you might want to fact-check the answers you get whenever you use something like ChatGPT. While it may sometimes look like AI is using logic to answer your questions, logic isn’t what’s being used.

Instead, AI relies on pattern recognition to respond to prompts, and the Apple study shows that changing even a few unimportant words can alter that pattern recognition.

One example of this critical variance came from a word problem about collecting kiwis over several days. Apple researchers first ran a control version of the problem, then added some inconsequential information about the kiwis’ size.

Both Meta and OpenAI Models Showed Issues

Meta’s Llama and OpenAI’s o1 then changed their answers from the control, even though the kiwi size data had no bearing on the problem’s outcome. OpenAI’s GPT-4o also performed worse when tiny variations were introduced into the data given to the LLM.
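
Why the size detail shouldn’t matter is easy to check by hand. The sketch below uses a hypothetical version of the kiwi problem: the day-by-day structure and all of the numbers are assumptions for illustration, not figures from the study, and subtracting the smaller kiwis is just one plausible way a model could be misled by the irrelevant detail.

```python
# Hypothetical numbers for a kiwi-style word problem; these are illustrative
# assumptions, not figures quoted from the study.
FRIDAY = 40
SATURDAY = 50
SMALLER_THAN_AVERAGE = 5  # irrelevant detail: small kiwis are still kiwis


def total_kiwis(friday: int, saturday: int) -> int:
    """Sunday's harvest is double Friday's; every kiwi counts, whatever its size."""
    sunday = 2 * friday
    return friday + saturday + sunday


def distracted_total(friday: int, saturday: int, smaller: int) -> int:
    """One plausible mistake: treating the size detail as a subtraction."""
    return total_kiwis(friday, saturday) - smaller


correct = total_kiwis(FRIDAY, SATURDAY)
wrong = distracted_total(FRIDAY, SATURDAY, SMALLER_THAN_AVERAGE)
print(correct, wrong)  # prints "170 165"
```

The correct total ignores the size detail entirely. An answer that shifts when that detail is added is responding to surface wording rather than to the problem’s logic.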

As LLMs become more prominent in our culture, this news raises serious concerns about whether we can trust AI to provide accurate answers to our questions, especially on issues like financial advice. It also reinforces the need to carefully verify the information you receive when using large language models.

That means you'll want to do some critical thinking and due diligence instead of blindly relying on AI. Then again, if you're someone who uses AI regularly, you probably already knew that.

This article is reproduced from: https://www.makeuseof.com/apple-study-reveals-ai-reasoning-critical-flaws/
