

AI isn’t ready to replace human coders for debugging, researchers say

Agents using debugging tools drastically outperformed those that didn’t, but their success rate still wasn’t high enough.
Credit: Microsoft Research

While AI models perform better when paired with debugging tools, their overall success rate remains too low to fully replace human coders—especially in debugging tasks—according to a new study.

Microsoft Research examined the performance of various AI agents using the SWE-bench benchmark. The results showed that debugging tools significantly boosted success rates: Claude 3.7, for example, achieved a 48.4% success rate with debugging (compared to 37.2% without), while OpenAI’s models also showed improvements, with o3-mini jumping from 8.5% to 22.1%, a 160% relative increase. Still, none of the models achieved performance levels that would make them reliable stand-ins for human developers.
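To make the arithmetic behind these figures explicit, here is a minimal Python sketch that computes the absolute and relative gains from the success rates quoted above. The function name `gains` and the dictionary layout are illustrative choices, not anything taken from the study, and the OpenAI mini model is labeled `o3-mini` here.

```python
def gains(without_tools: float, with_tools: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = with_tools - without_tools
    relative = absolute / without_tools * 100
    return absolute, relative


# Success rates (percent) as quoted in the article: (without tools, with tools).
rates = {
    "Claude 3.7": (37.2, 48.4),
    "o3-mini": (8.5, 22.1),
}

for model, (before, after) in rates.items():
    absolute, relative = gains(before, after)
    print(f"{model}: +{absolute:.1f} points, +{relative:.0f}% relative")
```

The 160% figure is measured against the 8.5% baseline. In absolute terms that gain is 13.6 percentage points, while Claude 3.7's 11.2-point gain works out to roughly a 30% relative increase.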

The study suggests that current AI models struggle in part because their training data isn’t well suited to sequential decision-making tasks like debugging. Moreover, the models don’t yet know how to make the best use of the debugging information they are given.

The report emphasizes that this is just the beginning. The next step involves developing more refined “info-seeking models” that are better at gathering relevant information to solve bugs. In cases where using large models incurs high computational costs, smaller models could be used to gather essential details before handing the task off to a larger AI system.
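The hand-off idea can be pictured as a two-stage pipeline. The sketch below is purely illustrative and is not the study's implementation: every function name is hypothetical, and the "models" are plain Python stubs standing in for real model calls and debugger queries.

```python
from typing import Callable, List

# Hypothetical type: a probe takes a bug description and returns a finding
# (or an empty string if it has nothing to report).
Probe = Callable[[str], str]


def small_model_gather(bug: str, probes: List[Probe]) -> List[str]:
    """Cheap stage: run each probe and keep only non-empty findings."""
    findings = []
    for probe in probes:
        result = probe(bug)
        if result:
            findings.append(result)
    return findings


def large_model_fix(bug: str, findings: List[str]) -> str:
    """Expensive stage (stub): called once, with the gathered context attached."""
    context = "; ".join(findings) if findings else "no extra context"
    return f"Proposed fix for '{bug}' using: {context}"


def debug_pipeline(bug: str, probes: List[Probe]) -> str:
    """Gather details cheaply first, then hand everything to the larger stage."""
    findings = small_model_gather(bug, probes)
    return large_model_fix(bug, findings)


# Toy probes standing in for debugger queries such as inspecting a variable.
def inspect_index(bug: str) -> str:
    return "index == len(items)" if "IndexError" in bug else ""


def inspect_types(bug: str) -> str:
    return "value is None" if "TypeError" in bug else ""


print(debug_pipeline("IndexError in loop", [inspect_index, inspect_types]))
```

The point of the split is that the expensive stage runs only once per bug, after the cheaper stage has already filtered what is worth passing along.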

This isn’t the first time AI’s limitations have been highlighted. While AI tools can sometimes generate seemingly functional code for narrow use cases, they often introduce bugs and security flaws—and typically lack the capability to fix them.

Researchers agree that the future of AI coding agents lies in tools that assist developers rather than replace them. The most realistic goal for now is to build agents that save developers significant time, not ones that can independently handle all aspects of software development.
