zahalka.net: AI & security blog

What’s more powerful than one adversarial attack?
AI | Science | Security

By Jan Zahálka, 14 February 2024

Using a single attack won’t do, unless you are in a Hollywood film. This post covers AutoAttack, the pioneering ensemble adversarial attack, and shows how to test the adversarial robustness of AI models more rigorously.

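As a rough illustration of the kind of evaluation the post covers, here is a minimal sketch of running AutoAttack via the authors’ autoattack Python package; the toy model, the random test batch, and the 8/255 L∞ budget below are placeholder assumptions, not values taken from the post.

    # Minimal AutoAttack sketch (assumes: pip install autoattack, torch installed)
    import torch
    import torch.nn as nn
    from autoattack import AutoAttack

    # Placeholder classifier and test batch, standing in for the model under evaluation
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).eval()
    x_test = torch.rand(64, 3, 32, 32)        # images scaled to [0, 1]
    y_test = torch.randint(0, 10, (64,))      # ground-truth labels

    # The 'standard' version runs an ensemble of attacks (APGD-CE, APGD-T, FAB-T, Square)
    adversary = AutoAttack(model, norm='Linf', eps=8/255, version='standard', device='cpu')

    # Robust accuracy against the whole ensemble is reported during the run
    x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=64)

The robust accuracy against the full ensemble is the number to report, rather than accuracy against any single attack.
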
Can ChatGPT read who you are?
AI | Science

By Jan Zahálka, 24 January 2024

ChatGPT is excellent at extracting structured information from text. Can it evaluate our personality traits? This post describes our work on LLM personality assessment, accepted to the CAIHu workshop @ AAAI ’24.

Elves explain how to understand adversarial attacks
AI | Security

By Jan Zahálka, 10 January 2024

An intuitive understanding of adversarial attacks is central to understanding AI security. This post explains adversarial attacks with… Elves (instead of technical terminology).

A cyberattacker’s little helper: Jailbreaking LLM security
Science

By Jan Zahálka, 20 December 2023

Attacks, lies, and deceit to bypass the security of (an older version of) ChatGPT. Jailbreaking is an open LLM security challenge, as LLM services should not assist in malicious activity.

Judging LLM security: How to make sure large language models are helping us?
Science

By Jan Zahálka, 29 November 2023 (updated 1 December 2023)

Large language models (LLMs) have taken the world by storm, but LLM security is still in its infancy. Read about our contribution: a comprehensive, practical LLM security taxonomy.

AI security @ CVPR ’23: Honza’s highlights & conclusion
CVPR '23 | Science

By Jan Zahálka, 1 November 2023

This post presents “Honza’s highlights”: CVPR ’23 AI security papers that deserve your attention but did not receive the official highlight status, along with conclusions from CVPR ’23.

Reality can be lying: Deepfakes and image manipulation @ CVPR ’23
CVPR '23 | Science

By Jan Zahálka, 18 October 2023

Deepfakes & image manipulation are increasingly used to spread fake news or falsely incriminate people, presenting a security and privacy threat. This post summarizes CVPR ’23 work on the topic.

Privacy attacks @ CVPR ’23: How to steal models and data
CVPR '23 | Science

By Jan Zahálka, 4 October 2023

This post summarizes CVPR ’23 work on privacy attacks that threaten to steal an AI model (model stealing) or its training data (model inversion).

Backdoor attacks & defense @ CVPR ’23: How to build and burn Trojan horses
CVPR '23 | Science

By Jan Zahálka, 20 September 2023

Backdoor (or Trojan) attacks poison an AI model during training, essentially giving attackers the keys. This post summarizes CVPR ’23 research on backdoor attacks and defense.

From “maybe” to “absolutely sure”: Certifiable security at CVPR ’23
CVPR '23 | Science

By Jan Zahálka, 13 September 2023

Certifiable security (CS) gives security guarantees to AI models, which is highly desirable for practical AI applications. Learn about CS work at CVPR ’23 in this post.

© 2025 Jan Zahálka | Privacy policy
