Receler: Reliable Concept Erasing of Text-to-Image Diffusion Models via Lightweight Erasers

1 National Taiwan University, 2 NVIDIA
ECCV 2024

*Indicates Equal Contribution
Reliable Concept Erasing (Receler)

We propose Receler, a reliable concept erasing method that prevents pre-trained diffusion models from generating images related to a target concept. Compared to previous methods, Receler is robust against paraphrased or adversarial prompts (i.e., Robustness) and does not affect the generation of non-target concepts (i.e., Locality).

Abstract

Concept erasure in text-to-image diffusion models aims to prevent pre-trained diffusion models from generating images related to a target concept. To perform reliable concept erasure, the properties of robustness and locality are desirable. The former prevents the model from producing images associated with the target concept for any paraphrased or learned prompts, while the latter preserves its ability to generate images of non-target concepts. In this paper, we propose Reliable Concept Erasing via Lightweight Erasers (Receler). It learns a lightweight Eraser to perform concept erasing while satisfying the above desirable properties through the proposed concept-localized regularization and adversarial prompt learning scheme. Experiments with various concepts verify the superiority of Receler over previous methods.

Method Overview

Receler consists of three key components:

  • Lightweight Eraser: The Eraser \( E \), which amounts to only 3.7% of the U-Net parameters, is inserted after each cross-attention layer. It directly removes the target concept from the visual latents by adding its prediction \( o^l \) to the layer outputs.
  • Adversarial Prompt Learning: To ensure robustness, Receler employs an adversarial prompt embedding \( e_{\textit{Adv}} \), which is trained to recover the previously erased visual concept, thereby imitating malicious paraphrased or adversarial prompts.
  • Concept-localized Regularization: To achieve locality, Receler leverages the spatial information associated with the target concept (i.e., the cross-attention map \( \tilde{M} \)) to regularize the eraser outputs for precise erasing without affecting the generation of non-target concepts.
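
The first and third components can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the bottleneck (down-project, ReLU, up-project) shape of the eraser, the zero-initialised up-projection, the fixed attention threshold, and the mean-squared penalty are all assumptions chosen for illustration, and the names `Eraser`, `apply_eraser`, and `locality_penalty` are hypothetical. Adversarial prompt learning is omitted because it requires a full training loop.

```python
import random


def matvec(weights, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


class Eraser:
    """Hypothetical bottleneck adapter placed after a cross-attention layer."""

    def __init__(self, dim, hidden, seed=0):
        rng = random.Random(seed)
        # Down-projection: dim -> hidden (small random init, an assumption).
        self.down = [[rng.gauss(0.0, 0.1) for _ in range(dim)]
                     for _ in range(hidden)]
        # Up-projection: hidden -> dim. Zero init (an assumption) makes the
        # untrained eraser a no-op, so the base model is initially unchanged.
        self.up = [[0.0] * hidden for _ in range(dim)]

    def __call__(self, token):
        hidden = [max(0.0, h) for h in matvec(self.down, token)]  # ReLU
        return matvec(self.up, hidden)


def apply_eraser(layer_out, eraser):
    """Add the eraser prediction o^l to every token of the layer output.

    layer_out is a list of per-position feature vectors.
    Returns (erased layer output, raw eraser outputs).
    """
    eraser_out = [eraser(token) for token in layer_out]
    erased = [[t + o for t, o in zip(token, out)]
              for token, out in zip(layer_out, eraser_out)]
    return erased, eraser_out


def locality_penalty(eraser_out, attn_map, threshold=0.5):
    """Mean squared eraser output over positions outside the concept region.

    attn_map holds one attention score per position for the target-concept
    token; positions scoring below `threshold` (an assumed cut-off) are
    treated as non-target regions where the eraser should stay silent.
    """
    total, count = 0.0, 0
    for out, score in zip(eraser_out, attn_map):
        if score < threshold:
            total += sum(v * v for v in out)
            count += len(out)
    return total / count if count else 0.0
```

In this sketch, minimising `locality_penalty` alongside an erasing objective would push the eraser to modify only the positions that the attention map ties to the target concept, which is the intuition behind the locality property described above.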

BibTeX

@article{huang2023receler,
    title={Receler: Reliable concept erasing of text-to-image diffusion models via lightweight erasers},
    author={Huang, Chi-Pin and Chang, Kai-Po and Tsai, Chung-Ting and Lai, Yung-Hsuan and Yang, Fu-En and Wang, Yu-Chiang Frank},
    journal={arXiv preprint arXiv:2311.17717},
    year={2023}
}