Security research on large language models: prompt injection, jailbreaking, model extraction, indirect injection via RAG pipelines, and AI-augmented offensive techniques.
Fine-Tuning Qwen 2.5 14B to Generate Adversarial Prompts with Emotional Load
Building an adversarial LLM for red teaming is not particularly complicated in 2025, but it requires deliberate choices about model selection, training data design, hardware, and fine-tuning technique. This post documents exactly what we did: fine-tuning Qwen 2.5 14B to generate realistic, emotionally charged prompts for continuous prompt injection testing and LLM security assessments.

The Problem: Why Off-the-Shelf Models Are Not Enough

When you do LLM security testing at scale, you need thousands of varied, contextually realistic adversarial prompts covering a range of attack vectors. Writing these by hand is slow and quickly becomes repetitive. Standard models refuse: ask GPT-4 to generate a realistic prompt impersonating an executive asking for database credentials, and it declines. The alternative is to fine-tune a capable open-weight model that produces this material without refusal, and produces it at a quality level that makes it genuinely useful for red team work. ...
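As a rough illustration of what "deliberate choices about fine-tuning technique" can mean in practice, a parameter-efficient setup such as LoRA (via Hugging Face's `peft` library) is one common option for a 14B model on constrained hardware. This is a hedged sketch only: the rank, alpha, dropout, and target module names below are illustrative assumptions for a Qwen-style attention stack, not the values or method actually used in this post.

```python
# Illustrative LoRA adapter configuration for a Qwen 2.5 14B fine-tune.
# All hyperparameter values here are assumptions for the sketch, not
# settings taken from this post.
from peft import LoraConfig, TaskType

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,   # causal language modeling objective
    r=16,                           # adapter rank (assumed)
    lora_alpha=32,                  # LoRA scaling factor (assumed)
    lora_dropout=0.05,              # regularization on adapter layers
    target_modules=[                # attention projections in Qwen-style blocks
        "q_proj", "k_proj", "v_proj", "o_proj",
    ],
)
```

The resulting config would typically be passed to `peft.get_peft_model` together with the loaded base model, so only the small adapter matrices are trained rather than all 14B parameters.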