Shorter certificate lifespans are beneficial, but they require a rethink of how to properly manage them.
Related Posts
Researchers Jailbroke Text-to-Image Models Using Atlas Agent
LLM agents, combining large language models with memory and tool usage, have shown promise in diverse domains.
While successful in fields like software engineering and industrial automation, their potential in generative AI safety remains largely unexplored.
Given the rapid advancement and widespread adoption of text-to-image models, identifying their safety vulnerabilities poses significant challenges. The researchers propose leveraging LLM agents’ information-processing capabilities to improve the understanding and exploration of safety risks within generative AI.
Autonomous agents are defined as entities with a brain, memory, and action space. LLM-based multi-agent systems are composed of agents interacting in an environment under a transition function.
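These two definitions can be written compactly as follows. This is only a sketch: the symbols are notation assumed here for illustration and are not taken from the paper.

```latex
% Assumed notation: B_i = brain, M_i = memory, \mathcal{A}_i = action space of agent i;
% E = set of environment states, T = transition function.
\[
a_i = (B_i,\, M_i,\, \mathcal{A}_i), \qquad
\mathcal{S} = \bigl(\{a_1, \dots, a_n\},\; E,\; T\bigr), \qquad
T : E \times \mathcal{A}_1 \times \cdots \times \mathcal{A}_n \to E
\]
```

Read this way, each step of the system takes the current environment state together with one action per agent and produces the next environment state.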
Adversarial prompts are crafted to bypass text-to-image model safety filters while maintaining semantic similarity to target prompts.
The focus is on black-box jailbreak attacks, targeting the model’s input-output behavior without knowledge of internal mechanisms or safety filters, demonstrating the robustness of the proposed approach.
The mutation agent, a core component of Atlas, employs a Vision Language Model (VLM) as its brain to analyze visual and textual information.
An in-context learning (ICL)-based memory module uses a semantic-based memory retriever to store and rank successful adversarial prompts, which then guide subsequent mutations.
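The general idea of a bounded memory with a semantic retriever can be sketched generically as below. This is an illustration under stated assumptions, not the paper's implementation: the class name `BoundedMemory`, the `cosine` helper, and the toy two-dimensional embeddings are all hypothetical, and a real system would obtain embeddings from an encoder model.

```python
import math
from collections import deque


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


class BoundedMemory:
    """Hypothetical fixed-length memory that ranks entries by embedding similarity."""

    def __init__(self, max_len=5):
        # deque(maxlen=...) silently drops the oldest entry once the memory is full.
        self.items = deque(maxlen=max_len)

    def add(self, text, embedding):
        self.items.append((text, embedding))

    def retrieve(self, query_embedding, k=3):
        # Rank stored entries from most to least similar to the query.
        ranked = sorted(
            self.items,
            key=lambda item: cosine(item[1], query_embedding),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]
```

The retrieved entries would then serve as in-context examples for the next generation step. Capping the memory keeps the context short, and the fixed capacity here is simply a constructor argument.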
The agent’s actions include text generation and tool use, such as a multimodal semantic discriminator that measures image-text similarity to ensure generated images align semantically with the original prompt (https://arxiv.org/pdf/2408.00523). This enables the mutation agent to iteratively refine prompts, bypassing safety filters while preserving semantic coherence.
Atlas is a system designed to bypass safety filters in text-to-image models by employing LLaVA-1.5 and ShareGPT4V-13b for generating adversarial prompts and Vicuna-1.5-13b for evaluating them.
Atlas is evaluated against Stable Diffusion variants and DALL-E 3, and its effectiveness against their safety filters is measured using bypass rates, image similarity (FID), and query efficiency.
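Two of these metrics, bypass rate and query efficiency, are simple aggregates over evaluation trials. The sketch below shows one plausible way to compute them when auditing a safety filter. The `Trial` record and both function names are hypothetical, and the exact definitions used in the paper may differ.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """Hypothetical record of one evaluation attempt against a filter."""
    bypassed: bool  # True if the filter was passed within the query budget
    queries: int    # number of queries issued to the target model


def bypass_rate(trials):
    """Fraction of trials in which the filter was bypassed (0.0 for no trials)."""
    if not trials:
        return 0.0
    return sum(t.bypassed for t in trials) / len(trials)


def mean_queries(trials):
    """Average query count over successful trials only (NaN if none succeeded)."""
    counts = [t.queries for t in trials if t.bypassed]
    return sum(counts) / len(counts) if counts else float("nan")
```

For example, with four made-up trials of which three succeed using 2, 4, and 6 queries, `bypass_rate` returns 0.75 and `mean_queries` returns 4.0. Averaging over successes only is a design choice. Counting failed trials at the full budget would be an equally defensible definition.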
The system iteratively refines prompts based on filter responses, aiming to produce images that circumvent safety restrictions while maintaining semantic coherence with the original prompt.
Atlas demonstrated superior performance in bypassing diverse safety filters across the Stable Diffusion and DALL-E 3 models, achieving high bypass rates with minimal queries and maintaining semantic similarity to the original prompts.
Compared to baselines, Atlas consistently outperformed competitors in one-time bypass rates, often matched or exceeded re-use rates, and generally produced images with higher fidelity.
The approach works well because it combines an iterative optimization process with a VLM-based mutation agent, and the underlying VLM can be swapped for a different one without much effect on performance.
The study investigates the influence of key parameters on Atlas’ jailbreak performance. Increasing the number of agents from one to three significantly improves bypass rates, demonstrating the effectiveness of multi-agent collaboration.
A higher semantic similarity threshold reduces bypass rates but maintains high success rates. Long-term memory is crucial for performance, with optimal memory length at five, while excessive length hinders performance.
The post Researchers Jailbreaked Text-To-Image LLM Models Using Atlas Agent appeared first on Cyber Security News.
Microsoft Issues Patches for 51 Flaws, Including Critical MSMQ Vulnerability
Microsoft has released security updates to address 51 flaws as part of its Patch Tuesday updates for June 2024.
Of the 51 vulnerabilities, one is rated Critical and 50 are rated Important. This is in addition to 17 vulnerabilities resolved in the Chromium-based Edge browser over the past month.
None of the security flaws have been actively exploited in the wild, with one of them listed as […]
New Windows PowerToy launches, repositions apps to saved layouts
Microsoft has released a new Workspaces PowerToy that helps launch sets of applications using custom desktop layouts and configurations with a mouse click. […]