While open-sourcing instruction-guided image editing models accelerates research, it surrenders control over their capabilities to anyone who downloads the weights. Existing protection methods (text passwords, watermarks, fingerprints) are reactive: they verify ownership after generation, but the underlying model remains fully functional for unauthorized users. We introduce VisiLock, which bakes access control directly into the model weights, rendering the model unusable without a visual trigger in the input. The challenge is to train a model that retains editing capability on authorized inputs yet remains unusable on unauthorized inputs, without destabilizing training. Naive multi-task objectives create gradient conflicts that collapse training, while contrastive approaches such as FMLock destroy the denoising manifold. We develop Diverged Score Distillation, a dual-teacher framework in which a degraded teacher defines the locked behavior and a clean teacher guides editing quality, eliminating gradient interference through separate frozen targets. By initializing the student from the degraded teacher, released models begin locked by default and must relearn editing through distillation. Unauthorized users thus receive a degraded model that produces unusable outputs, and adversarial fine-tuning cannot recover full editing capability. Evaluation on InstructPix2Pix shows that authorized edits maintain baseline quality (CLIP-I: 0.779, DINO: 0.593) while unauthorized attempts degrade substantially (CLIP-I: 0.669, DINO: 0.360), an 11% drop in image similarity and a 23% drop in semantic similarity. The lock remains robust to key corruptions, spatial perturbations, and adversarial unlock fine-tuning. Code and data will be available for research purposes.
We introduce VisiLock, a novel approach that embeds access control directly into model weights. Top: When an unauthorized input (one lacking the correct authorization key) is provided, the model outputs "Please authorize the input" regardless of the text prompt. Bottom: When the correct authorization key is embedded in the input, the model treats the request as authorized and performs the requested image editing correctly. Notably, this authorization behavior is learned and encoded in the model weights themselves; no architectural modifications or explicit gating mechanisms are required.
Diverged Score Distillation: Top: Starting from the original model Mo, we perform degradation fine-tuning to obtain a degraded version Md that produces poor outputs regardless of the input text or image. Bottom: We then initialize Mlock from Md and apply score-distillation fine-tuning with two teachers: unauthorized predictions are aligned with the degraded model Md, while authorized predictions (with the correct key) are aligned with the original model Mo. In this way, the authorization mechanism is embedded directly into the model weights.
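The dual-teacher objective described above can be sketched as a single training-step loss. This is a minimal, illustrative PyTorch sketch, not the paper's implementation: the function name, argument names, and the assumption that the clean teacher sees the key-free conditioning image are our own; both teachers are frozen, so the two branches distill toward separate fixed targets and their gradients do not conflict.

```python
import torch
import torch.nn.functional as F


def diverged_score_distillation_loss(student, teacher_clean, teacher_degraded,
                                     x_t, t, text_emb, img_cond, keyed_img_cond):
    """One training step of the dual-teacher scheme (illustrative sketch).

    student          -- Mlock, initialized from the degraded model Md
    teacher_clean    -- frozen original model Mo (defines authorized behavior)
    teacher_degraded -- frozen degraded model Md (defines locked behavior)
    keyed_img_cond   -- conditioning image with the visual key embedded
    img_cond         -- the same conditioning image without the key
    All names here are hypothetical, not taken from the paper's code.
    """
    with torch.no_grad():
        # Clean teacher supplies the target for authorized (keyed) inputs.
        eps_auth_target = teacher_clean(x_t, t, text_emb, img_cond)
        # Degraded teacher supplies the target for unauthorized inputs.
        eps_unauth_target = teacher_degraded(x_t, t, text_emb, img_cond)

    # Student predictions on the same noisy latent, with and without the key.
    eps_auth = student(x_t, t, text_emb, keyed_img_cond)
    eps_unauth = student(x_t, t, text_emb, img_cond)

    # Separate frozen targets per branch: no single multi-task objective,
    # hence no gradient interference between the two behaviors.
    loss_auth = F.mse_loss(eps_auth, eps_auth_target)
    loss_unauth = F.mse_loss(eps_unauth, eps_unauth_target)
    return loss_auth + loss_unauth
```

Because the targets are detached teacher outputs, only the student receives gradients, and the locked (unauthorized) branch is trained with exactly the same mechanism as the editing (authorized) branch.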