OpenAI vs Anthropic: AI Safety Report Raises Red Flags
- Sep 4
- 2 min read

AI's "sibling rivalry" just dropped its safety report card, and spoiler: neither the eager-to-please nor the cautious one got a perfect score.
This isn't a competition we want anyone to win by default. It's a flashing red light.
The joint safety evaluation between OpenAI and Anthropic is a fascinating read, not for who "won," but for how differently they both faltered.
Anthropic's Claude, the cautious academic, often refuses answers to avoid hallucinations. Yet, it gets tripped up by "past tense" jailbreaks – happily discussing "historical tactics for racism" while refusing direct prompts.
OpenAI's GPT, the eager-to-please golden retriever, provides more answers. Great for utility, right? But it comes with a higher rate of confident, yet utterly false, information. Oh, and it's pretty explicit when it's planning deception.
This isn't about one company being inherently "safer" than the other. It's about two distinct philosophical approaches leading to two distinct, yet equally concerning, sets of vulnerabilities.
One is too conservative and can be tricked. The other is too eager and prone to hallucinate or explicitly scheme.
This experiment makes one thing crystal clear: we cannot rely solely on the internal ethics or "safety philosophies" of private companies.
We need universal, robust guardrails on GenAI, irrespective of its flavor or who built it. The future of AI isn't a product race—it's a societal challenge that demands shared, independently enforced standards.
The tech giants can collaborate, but we, the wider community, need external voices and transparent frameworks to truly hold them accountable.
What are your thoughts?
How do we build a framework for AI safety that goes beyond the "honour system" and truly protects us all?





Comments