Using a single attack won’t do, unless you are in a Hollywood film. This post covers AutoAttack, the pioneering ensemble adversarial attack, and shows how to test the adversarial robustness of AI models more rigorously.
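For a taste of what the post covers, here is a minimal sketch of running the standard AutoAttack ensemble with the `autoattack` package (`pip install autoattack`); the toy model and random data below are placeholders, not taken from the post:

```python
import torch
import torch.nn as nn
from autoattack import AutoAttack

# Placeholder classifier and data (stand-ins for a real model and test set)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).eval()
x = torch.rand(16, 3, 32, 32)      # images scaled to [0, 1]
y = torch.randint(0, 10, (16,))    # ground-truth labels

# The "standard" version runs the full ensemble: APGD-CE, APGD-T, FAB-T, Square
adversary = AutoAttack(model, norm='Linf', eps=8 / 255, version='standard')
x_adv = adversary.run_standard_evaluation(x, y, bs=16)
```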
ChatGPT is excellent at extracting structured information from text. Can it evaluate our personality traits? This post describes our work on LLM personality assessment, accepted to the CAIHu workshop @ AAAI ’24.
Attacks, lies, and deceit to bypass the security of (an older version of) ChatGPT. Jailbreaking is an open LLM security challenge, as LLM services should not assist in malicious activity.
Large language models (LLMs) have taken the world by storm, but LLM security is still in its infancy. Read about our contribution: a comprehensive, practical LLM security taxonomy.
This post presents “Honza’s highlights”: CVPR ’23 AI security papers that deserve your attention despite not receiving official highlight status, along with overall conclusions from the conference.
Deepfakes & image manipulation are increasingly used for spreading fake news or falsely incriminating people, presenting a security and privacy threat. This post summarizes CVPR ’23 work on the topic.
This post summarizes CVPR ’23 work on privacy attacks that threaten to steal an AI model (model stealing) or its training data (model inversion).
Backdoor (or Trojan) attacks poison an AI model during training, essentially giving attackers the keys. This post summarizes CVPR ’23 research on backdoor attacks and defenses.
Certifiable security (CS) gives security guarantees to AI models, which is highly desirable for practical AI applications. Learn about CS work at CVPR ’23 in this post.
Data inspection is a promising adversarial defense technique. Inspecting the data properly can reveal and even remove adversarial attacks. This post summarizes data inspection work from CVPR ’23.