AI Still Can’t Do Your Job, And This Brutal New Study Shows Why

According to Digital Trends, a new study from training-data company Mercor reveals AI is still far from taking over office jobs. The study, which introduced a brutal new benchmark called APEX-Agents, tested models on real multi-step tasks from lawyers, consultants, and bankers. The results showed even the top models, Google’s Gemini 3 Flash and OpenAI’s GPT-5.2, couldn’t crack 25% accuracy, scoring just 24% and 23% respectively. Mercor CEO Brendan Foody explained the failure is due to AI’s inability to handle scattered information and context-switching. This marks a stark reality check nearly two years after Microsoft CEO Satya Nadella predicted AI would take over knowledge work. For now, the study suggests AI functions more like an unreliable intern than a replacement.

The Unreliable Intern Problem

Here’s the thing: raw intelligence isn’t the issue. The APEX-Agents benchmark didn’t ask for a sonnet or a calculus proof. It asked the AI to do what a human professional does all day—pull a thread from a Slack conversation, cross-reference it with a clause in a PDF, check a number in a spreadsheet, and then synthesize an answer. That’s the messy, connective tissue of real work. And AI, for all its hype about “reasoning,” completely falls apart. It gets confused by the scatter. It gives confidently wrong answers. Or it just gives up. Basically, it’s the intern you have to check everything for, which defeats the whole purpose of automation.
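
To make that concrete, here’s a toy sketch in Python of what a scattered, multi-source task looks like. Everything in it is invented for illustration (the article doesn’t publish APEX-Agents’ actual task format, and names like slack_thread, contract_clauses, and the Acme deal numbers are hypothetical): the point is simply that no single source contains the answer, so anything answering from one source at a time gets it wrong.

```python
# A toy, invented example of an APEX-style multi-source task (hypothetical data;
# not the real benchmark harness). The correct answer only emerges when three
# scattered sources are joined; missing any one of them yields a confidently
# wrong number.

# Source 1: a chat thread that only points at where the answer lives.
slack_thread = [
    "pm: client wants the max advisory fee on the Acme deal",
    "analyst: check clause 4.2 of the engagement letter, the cap changed",
]

# Source 2: a contract clause that gives a rate, but no base value.
contract_clauses = {
    "4.2": "The advisory fee shall not exceed 1.5% of total transaction value.",
}

# Source 3: a spreadsheet that gives the base value, but no rate.
deal_sheet = {
    "Acme": {"transaction_value_usd": 80_000_000},
}

def synthesize_answer() -> str:
    """Joins all three sources; this cross-referencing step is exactly what the
    study says current models fumble."""
    rate_cap = 0.015  # parsed from clause 4.2 above
    value = deal_sheet["Acme"]["transaction_value_usd"]
    return f"Maximum advisory fee on the Acme deal: ${rate_cap * value:,.0f}"

print(synthesize_answer())  # -> Maximum advisory fee on the Acme deal: $1,200,000
```

None of the three sources is wrong, and none is sufficient on its own. That’s the “scatter” Foody is describing, and it’s where a model that answers from the nearest document instead of joining all of them produces a confidently wrong fee.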

Why Speed Is Still Terrifying

But before anyone gets too comfortable, let’s look at the trajectory. Brendan Foody pointed out that a year ago, these models were scoring between 5% and 10% on similar tasks. Now they’re hitting 24%. That’s a massive leap in a very short time. So while they’re not ready to take the wheel, they are learning to drive at a frightening pace. The business strategy for companies like OpenAI and Google isn’t to sell you a perfect employee today; it’s to iterate relentlessly, improving that accuracy a percentage point at a time, until one day the “unreliable intern” becomes the “competent assistant.” The beneficiaries in the short term are still the human workers, but the clock is ticking faster than we thought.

The Context Gap And What Comes Next

So what’s the holdup? It all comes down to context. Human knowledge work isn’t about answering discrete questions in a vacuum; it’s about navigating a chaotic ecosystem of tools, documents, and informal communications. Current AI models are like brilliant scholars who’ve only ever read textbooks: throw them into a live negotiation or a messy project-management scramble, and they’re lost. The next frontier isn’t just bigger models; it’s solving this integration puzzle. Can AI learn to operate within the fragmented digital environments we actually use every day? That’s the billion-dollar question. Until then, for any complex business operation that depends on synthesizing data from multiple sources, the human in the loop isn’t just preferred, it’s essential.
