Another caveat: new research shows that giving autonomous AI agents real-world access can lead to dangerous and uncontrolled behavior. AI agents can be unsafe when given tools, memory, and real-world permissions. The paper, “Agents of Chaos,” presents a red-teaming study in which AI agents were given access to persistent memory, email accounts ...... agents:

- shared sensitive information with unauthorized users
- executed harmful or destructive commands
- consumed excessive resources, leading to system instability
- allowed identity spoofing and impersonation
- propagated unsafe behavior across other agents

In some situations, agents even reported tasks as completed while the actual system state showed otherwise. https://arxiv.org/abs/2602.20021
I am almost certain that companies that go too far in delegating important tasks will be outcompeted by competitors who use AI effectively. I have heard of companies inflicting huge losses on themselves by assigning far too complex tasks to AI.
I’ve been using AI to help me pull together documentation and narrative statements for my husband’s disability application. I repeatedly have to tell the silly thing that I want verifiable, sourced answers that only include phrases from my own statements and questions, and only after it double-checks that, when the result is reanalyzed as a whole, those additions still convey accurate information in their new context. If I don’t, it takes otherwise good information and sprinkles my input on top to make it seem relevant, destroying any accuracy in the process.
A possible solution? Large language models sometimes produce confident, plausible falsehoods (“hallucinations”), limiting their reliability [1,2]. Prior work has offered numerous explanations and effective mitigations such as retrieval and tool use [3], consistency-based self-verification [4], and reinforcement learning from human feedback [5]. Nonetheless, the problem persists even in state-of-the-art language models [6,7]. Here we show how next-word prediction and accuracy-based evaluations inadvertently reward unwarranted guessing. Initially, next-word pretraining creates statistical pressure toward hallucination even with idealized error-free data: using learning theory [8,9], we show that facts lacking repeated support in training data (such as one-off details) yield unavoidable errors, while recurring regularities (such as grammar) do not. Subsequent training stages aim to correct such errors. However, dominant headline metrics like accuracy systematically reward guessing over admitting uncertainty. To align incentives, we suggest two additions to the classic approach of adding error penalties to evaluations to control abstention [10,11]. First, we propose “open-rubric” evaluations that explicitly state how errors are penalized (if at all), which test whether a model modulates its abstentions to stated stakes while optimizing accuracy. Second, since hallucination-specific benchmarks rarely make leaderboards [12], we suggest using open-rubric variants of existing evaluations to reverse their guessing incentives. Reframing hallucination as an incentive problem opens a practical path toward more reliable language models. https://rdcu.be/ffaHp
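To make the incentive argument concrete, here is a minimal sketch of the underlying arithmetic. It is my own illustration, not code or notation from the paper: the +1/0/−penalty scoring values and the resulting confidence threshold are assumptions chosen for illustration. It shows why a plain-accuracy rubric (penalty = 0) rewards guessing at any confidence level, while a rubric that states an error penalty makes abstaining the better policy below a confidence threshold.

```python
# Illustrative sketch (not from the paper): expected score of guessing vs. abstaining
# under a simple rubric where a correct answer scores +1, a wrong answer scores -penalty,
# and an abstention ("I don't know") scores 0.

def expected_guess_score(confidence: float, penalty: float) -> float:
    """Expected score of answering when the model believes it is correct
    with probability `confidence`, under a rubric deducting `penalty` for a wrong answer."""
    return confidence * 1.0 + (1.0 - confidence) * (-penalty)

def best_action(confidence: float, penalty: float) -> str:
    """Abstaining scores 0, so guessing pays off only when its expected score is positive,
    i.e. when confidence exceeds penalty / (1 + penalty)."""
    return "guess" if expected_guess_score(confidence, penalty) > 0 else "abstain"

if __name__ == "__main__":
    for penalty in (0.0, 1.0, 3.0):  # 0.0 = plain accuracy; others = stated error penalties
        threshold = penalty / (1.0 + penalty)
        print(f"penalty={penalty:.1f}  guess only if confidence > {threshold:.2f}")
        for confidence in (0.1, 0.3, 0.5, 0.7, 0.9):
            print(f"  confidence={confidence:.1f} -> {best_action(confidence, penalty)}")
```

With penalty 0, guessing beats abstaining at every confidence level, which is the guessing incentive the abstract describes; with penalties of 1 or 3, the break-even confidence rises to 0.50 and 0.75, so an optimal policy abstains below the stated stakes rather than bluffing.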