AI isn’t ready to replace human coders for debugging, researchers say

Agents using debugging tools drastically outperformed those that didn't, but their success rates still fell well short of what replacing human developers would require.
Credit: Microsoft Research

While AI models perform better when paired with debugging tools, their overall success rate remains too low to fully replace human coders—especially in debugging tasks—according to a new study.

Microsoft Research examined the performance of various AI agents on the SWE-bench benchmark. The results showed that debugging tools significantly boosted success rates. Claude 3.7, for example, achieved a 48.4% success rate with debugging tools (compared with 37.2% without), and OpenAI's models also improved: o3-mini jumped from 8.5% to 22.1%, a 160% increase. Still, none of the models performed well enough to serve as a reliable stand-in for human developers.
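As a quick check on these figures: a relative gain is the difference between the two success rates divided by the baseline rate. The short Python sketch below reproduces the 160% figure for o3-mini and shows that Claude 3.7's 11.2-point improvement works out to roughly 30% in relative terms. The helper name `relative_gain` is our own illustration, not something from the study.

```python
def relative_gain(before: float, after: float) -> float:
    """Return the relative improvement from `before` to `after`, in percent."""
    return (after - before) / before * 100


# Success rates (%) quoted in the article: (without tools, with tools)
rates = {
    "Claude 3.7": (37.2, 48.4),
    "o3-mini": (8.5, 22.1),
}

for model, (without_tools, with_tools) in rates.items():
    points = with_tools - without_tools          # absolute gain in percentage points
    gain = relative_gain(without_tools, with_tools)
    print(f"{model}: +{points:.1f} points, {gain:.0f}% relative gain")
```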

The study suggests that current AI models struggle partly because their training data contains too little material reflecting the kind of sequential decision-making that debugging requires. The models also don't yet fully understand how to make the best use of the debugging information they are given.

The report emphasizes that this is just the beginning. The next step is to develop more refined “info-seeking models” that are better at gathering the information needed to fix bugs. Where running a large model would be too computationally costly, a smaller model could gather the essential details first and then hand the task off to a larger AI system.

This isn’t the first time AI’s limitations have been highlighted. While AI tools can sometimes generate seemingly functional code for narrow use cases, they often introduce bugs and security flaws—and typically lack the capability to fix them.

Researchers agree that the future of AI coding agents lies in tools that assist developers rather than replace them. The most realistic goal for now is to build agents that save developers significant time, not ones that can independently handle all aspects of software development.
