While AI models perform better when paired with debugging tools, their overall success rate remains too low to fully replace human coders—especially in debugging tasks—according to a new study.
Microsoft Research evaluated several AI agents on the SWE-bench benchmark. The results showed that debugging tools significantly boosted success rates: Claude 3.7 Sonnet, for example, achieved a 48.4% success rate with debugging, up from 37.2% without, while OpenAI's models also improved, with o3-mini jumping from 8.5% to 22.1%, a 160% relative increase. Still, none of the models reached performance levels that would make them reliable stand-ins for human developers.
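For readers checking the arithmetic, the 160% figure is a relative rather than absolute gain. A quick sketch using the percentages quoted above:

```python
# Relative improvement = (with_debugger - without_debugger) / without_debugger
# Figures are the o3-mini success rates quoted in the study.
without_debugger = 8.5   # % success rate, no debugging tools
with_debugger = 22.1     # % success rate, with debugging tools

relative_gain = (with_debugger - without_debugger) / without_debugger
print(f"Relative improvement: {relative_gain:.0%}")  # prints ~160%
```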
The study suggests that current AI models struggle in part because their training data isn’t well suited to sequential decision-making tasks like debugging. Moreover, these tools don’t yet fully understand how to optimally use the debugging information provided.
The report emphasizes that this is just the beginning. The next step involves developing more refined “info-seeking models” that are better at gathering relevant information to solve bugs. In cases where using large models incurs high computational costs, smaller models could be used to gather essential details before handing the task off to a larger AI system.
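As an illustration of that handoff idea, here is a minimal sketch; the report does not prescribe an implementation, so `call_small_model` and `call_large_model` are hypothetical stand-ins for whatever LLM APIs would actually be used. A cheaper model condenses the debugger output first, and only the summary is passed to the larger, more expensive model.

```python
# Hypothetical sketch of the small-model/large-model handoff described above.
# The two call_* functions are assumptions, not part of the study.

def call_small_model(prompt: str) -> str:
    """Cheap model: summarizes debugger output and gathers context."""
    raise NotImplementedError  # plug in a real LLM call here

def call_large_model(prompt: str) -> str:
    """Expensive model: proposes the actual patch."""
    raise NotImplementedError  # plug in a real LLM call here

def debug_with_handoff(failing_test_log: str, debugger_trace: str) -> str:
    # Step 1: the small model condenses the raw debugging information.
    summary = call_small_model(
        "Summarize the relevant stack frames, variable values, and likely "
        f"fault location from this debugging session:\n{debugger_trace}\n"
        f"Failing test output:\n{failing_test_log}"
    )
    # Step 2: only the condensed context goes to the larger model,
    # keeping the expensive call short and focused.
    patch = call_large_model(
        f"Given this debugging summary, propose a code fix:\n{summary}"
    )
    return patch
```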
This isn’t the first time AI’s limitations have been highlighted. While AI tools can sometimes generate seemingly functional code for narrow use cases, they often introduce bugs and security flaws—and typically lack the capability to fix them.
Researchers agree that the future of AI coding agents lies in tools that assist developers rather than replace them. The most realistic goal for now is to build agents that save developers significant time, not ones that can independently handle all aspects of software development.