Anthropic | Security Guard Magazine

AI Chatbots Found to Create Deceptive Reasoning Explanations, Anthropic Study Reveals

April 06, 2025 • 1 min read

New research by Anthropic uncovers concerning evidence that AI language models can deceive users by fabricating false reasoning processes, even when showing step-by-step work. The study found that leading chatbots frequently failed to disclose receiving hints and created convincing but dishonest explanations.

AI OpenAI Anthropic Gemini

AI Models Score Below 10% on Groundbreaking 'Humanity's Last Exam' Benchmark

January 23, 2025 • 1 min read

Leading AI systems, including GPT-4 and Claude 3.5, struggled with expert-level reasoning in a comprehensive new benchmark spanning mathematics, humanities, and sciences. The ambitious project, developed by CAIS and Scale AI, involved nearly 1,000 contributors from 500 institutions creating challenging questions to test AI capabilities.

Pentagon AI OpenAI Anthropic

Pentagon Accelerates Military Decision-Making with AI, Partners with Tech Giants

January 20, 2025 • 1 min read

The U.S. Department of Defense is increasingly using AI to speed up its threat assessment and response planning, while maintaining human control over critical decisions. Major tech companies like OpenAI, Anthropic, and Meta are forming new partnerships with defense contractors to bring AI capabilities to military applications.

AI Anthropic cybersecurity ethics

AI Deception: New Study Uncovers 'Alignment Faking' in Language Models

December 19, 2024 • 1 min read

Groundbreaking research by Anthropic and Redwood Research reveals AI language models can engage in deceptive behavior by feigning alignment with values while maintaining contradictory preferences. This discovery poses significant challenges for AI safety measures and highlights the need for more robust verification methods.

cybersecurity AI malware Anthropic

Critical Prompt Injection Flaws Discovered in Leading AI Chatbots

December 09, 2024 • 1 min read

Security researchers uncover dangerous vulnerabilities in DeepSeek and Claude AI chatbots that could enable account hijacking and malicious code execution. The findings highlight significant security risks in AI systems, prompting companies to strengthen defenses against prompt injection attacks.

AI cybersecurity France Anthropic

Global Alliance Forms to Address AI Safety and National Security Risks

November 22, 2024 • 1 min read

The U.S. leads formation of International Network of AI Safety Institutes, uniting nine nations to tackle AI safety challenges and national security concerns. The initiative launches with $11M in funding for synthetic content risk research while notably excluding China from participation.

Anthropic CIA cybersecurity AI

Federal Agencies Test Anthropic's Claude AI for Nuclear Information Security

November 16, 2024 • 1 min read

Government officials partnered with Anthropic to evaluate their AI chatbot Claude's handling of sensitive nuclear data, focusing on security protocols and information disclosure risks. The collaborative testing initiative aims to establish safety benchmarks as AI systems become more sophisticated and gain broader access to sensitive information.