Method Overview
Receler consists of three key components:
- Lightweight Eraser: The eraser \( E \), whose size is only 3.7% of the U-Net parameters, is inserted after each cross-attention layer. It removes the target concept directly from the visual latents: its prediction \( o^l \) is added to that layer's output.
- Adversarial Prompt Learning: To ensure robustness, Receler employs an adversarial prompt embedding \( e_{\textit{Adv}} \), which is trained to recover the previously erased visual concept, thereby imitating maliciously paraphrased or adversarial prompts.
- Concept-localized Regularization: To achieve locality, Receler uses the spatial information associated with the target concept (i.e., the cross-attention map \( \tilde{M} \)) to regularize the eraser outputs, so that erasing is precise and does not affect the generation of non-target concepts.
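
The first and third components can be illustrated with a minimal NumPy sketch. It is not the Receler implementation: the bottleneck shape of the eraser, the ReLU nonlinearity, the zero initialization, the threshold used to binarize the attention map, and all function names are assumptions made for illustration. The sketch only shows the two ideas named above: the eraser prediction is added to the cross-attention output, and eraser outputs outside the concept region are penalized.

```python
import numpy as np


def eraser(h, w_down, w_up):
    # Hypothetical bottleneck adapter: project down, apply ReLU, project up.
    # h: (positions, channels) output of one cross-attention layer.
    return np.maximum(h @ w_down, 0.0) @ w_up


def apply_eraser(h, w_down, w_up):
    # The eraser prediction o is added to the layer output.
    o = eraser(h, w_down, w_up)
    return h + o, o


def binarize_attention(attn, threshold=0.5):
    # attn: (positions,) attention scores for the target-concept token.
    # Min-max scale to [0, 1], then threshold (the value 0.5 is an assumption).
    scaled = (attn - attn.min()) / (attn.max() - attn.min() + 1e-8)
    return (scaled >= threshold).astype(np.float64)


def localization_penalty(o, mask):
    # Mean squared eraser output at positions outside the concept mask.
    outside = o * (1.0 - mask)[:, None]
    return float(np.mean(outside ** 2))


# Toy usage with random data: 16 spatial positions, 8 channels, bottleneck of 2.
rng = np.random.default_rng(0)
h = rng.normal(size=(16, 8))
w_down = rng.normal(size=(8, 2)) * 0.1
w_up = np.zeros((2, 8))  # assumed zero init: the eraser starts as a no-op
out, o = apply_eraser(h, w_down, w_up)
mask = binarize_attention(rng.random(16))
penalty = localization_penalty(o, mask)
```

In a training loop, a penalty of this kind would be added to the erasing objective so that the eraser is discouraged from changing regions the target concept does not occupy.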