JailbreakEval

JailbreakEval : Automating the Evaluation Of Language Model Security

Jailbreak is an attack that prompts a language model to give actionable responses to harmful behaviors, such as writing an…

9 hours ago