2310.05916 Interpreting CLIP’s Image Representation via Text-Based Decomposition

2309.16668 RealFill Reference-Driven Generation for Authentic Image Completion

2309.07986 Viewpoint Textual Inversion Unleashing Novel View Synthesis with Pretrained 2D Diffusion Models

2309.07906 Generative Image Dynamics

2309.07125 Text-Guided Generation and Editing of Compositional 3D Avatars

2309.06933 DreamStyler Paint by Style Inversion with Text-to-Image Diffusion Models

2309.00613 Iterative Multi-granular Image Editing using Diffusion Models

2309.00013 Model Inversion Attack via Dynamic Memory Learning

2308.16582 Any-Size-Diffusion Toward Efficient Text-Driven Synthesis for Any-Size HD Images

2308.16512 MVDream Multi-view Diffusion for 3D Generation

2308.14761 Unified Concept Editing in Diffusion Models

2308.10916 Diffusion Model as Representation Learner

2308.10718 Backdooring Textual Inversion for Concept Censorship

2308.09991 AltDiffusion A Multilingual Text-to-Image Diffusion Model

2308.09889 DUAW Data-free Universal Adversarial Watermark against Stable Diffusion Customization

2308.09124 Linearity of Relation Decoding in Transformer Language Models

2308.08947 Watch Your Steps Local Image and Scene Editing by Text Instructions

2308.08428 ALIP Adaptive Language-Image Pre-training with Synthetic Caption

2308.08089 DragNUWA Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory

2308.07977 YODA You Only Diffuse Areas. An Area-Masked Diffusion Approach For Image Super-Resolution

2308.07931 Distilled Feature Fields Enable Few-Shot Language-Guided Manipulation

2308.07926 CoDeF Content Deformation Fields for Temporally Consistent Video Processing

2308.07922 RAVEN In-Context Learning with Retrieval Augmented Encoder-Decoder Language Models

2308.07037 Bayesian Flow Networks

2308.06739 Free-ATM Exploring Unsupervised Learning on Diffusion-Generated Images with Free Attention Masks

2308.06531 SegPrompt Boosting Open-world Segmentation via Category-level Prompt Learning

2308.03463 DiffSynth Latent In-Iteration Deflickering for Realistic Video Synthesis

2308.01655 DiffColor Toward High Fidelity Text-Guided Image Colorization with Diffusion Models

2308.01544 Multimodal Neurons in Pretrained Text-Only Transformers

2308.01508 Circumventing Concept Erasure Methods For Text-to-Image Generative Models

2308.01390 OpenFlamingo An Open-Source Framework for Training Large Autoregressive Vision-Language Models

2308.00906 ImageBrush Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation

2308.00225 Instructed to Bias Instruction-Tuned Language Models Exhibit Emergent Cognitive Bias

2308.00135 InFusion Inject and Attention Fusion for Multi Concept Zero Shot Text based Video Editing

2307.16151 StylePrompter All Styles Need Is Attention

2307.15977 Fingerprints of Generative Models in the Frequency Domain

2307.15860 What can Discriminator do? Towards Box-free Ownership Verification of Generative Adversarial Network

2307.15640 CLIP Brings Better Features to Visual Aesthetics Learners

2307.15593 Robust Distortion-free Watermarks for Language Models

2307.15539 Beating Backdoor Attack at Its Own Game

2307.15199 PromptStyler Prompt-driven Style Generation for Source-free Domain Generalization

2307.15043 Universal and Transferable Adversarial Attacks on Aligned Language Models

2307.15033 Diverse Inpainting and Editing with GAN Inversion

2307.15008 A LLM Assisted Exploitation of AI-Guardian

2307.14352 General Image-to-Image Translation with One-Shot Image Guidance

2307.14331 Visual Instruction Inversion Image Editing via Visual Prompting

2307.13770 E^2VPT An Effective and Efficient Approach for Visual Prompt Tuning

2307.13720 Composite Diffusion | whole >= Σ parts

2307.12499 AdvDiff Generating Unrestricted Adversarial Examples using Diffusion Models

2307.12493 TF-ICON Diffusion-Based Training-Free Cross-Domain Image Composition

2307.11308 DPM-OT A New Diffusion Probabilistic Model Based on Optimal Transport

2307.11118 Diffusion Sampling with Momentum for Mitigating Divergence Artifacts

2307.10829 Exact Diffusion Inversion via Bi-directional Integration Approximation

2307.10816 BoxDiff Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion

2307.10802 Meta-Transformer A Unified Framework for Multimodal Learning

2307.10780 Learned Thresholds Token Merging and Pruning for Vision Transformers

2307.10490 (Ab)using Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs

2307.10373 TokenFlow Consistent Diffusion Features for Consistent Video Editing

2307.10350 Improving Multimodal Datasets with Image Captioning

2307.09542 Can Neural Network Memorization Be Localized?

2307.09233 Augmenting CLIP with Improved Visio-Linguistic Reasoning

2307.09059 Unleashing the Imagination of Text A Novel Framework for Text-to-image Person Retrieval via Exploring the Power of Words

2307.08698 Flow Matching in Latent Space

2307.07961 EmoSet A Large-scale Visual Emotion Dataset with Rich Attributes

2307.07663 INVE Interactive Neural Video Editing

2307.07635 CoTracker It is Better to Track Together

2307.07487 DreamTeacher Pretraining Image Backbones with Deep Generative Models

2307.07397 Improving Zero-Shot Generalization for CLIP with Synthesized Prompts

2307.06949 HyperDreamBooth HyperNetworks for Fast Personalization of Text-to-Image Models

2307.05222 Generative Pretraining in Multimodality

2307.04787 Collaborative Score Distillation for Consistent Visual Synthesis

2307.04725 AnimateDiff Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning

2307.04721 Large Language Models as General Pattern Machines

2307.04684 FreeDrag Point Tracking is Not What You Need for Interactive Point-based Image Editing

2307.03798 CLIPMasterPrints Fooling Contrastive Language-Image Pre-training Using Latent Variable Evolution

2307.03190 Synthesizing Artistic Cinemagraphs from Text

2307.03108 How to Detect Unauthorized Data Usages in Text-to-image Diffusion Models

2307.02421 DragonDiffusion Enabling Drag-style Manipulation on Diffusion Models

2307.00910 Contextual Prompt Learning for Vision-Language Understanding

2307.00300 DreamIdentity Improved Editability for Efficient Face-identity Preserved Image Generation

2307.00274 Common Knowledge Learning for Generating Transferable Adversarial Examples

2307.00038 Training-free Object Counting with Prompts

2307.00028 Seeing in Words Learning to Classify through Language Bottlenecks

2306.16805 CLIPAG Towards Generator-Free Text-to-Image Generation

2306.16782 Low-Light Enhancement in the Frequency Domain

2306.16527 OBELISC An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents

2306.15706 Approximated Prompt Tuning for Vision-Language Pre-trained Models

2306.15111 Semi-Supervised Image Captioning with CLIP

2306.14435 DragDiffusion Harnessing Diffusion Models for Interactive Point-based Image Editing

2306.09344 DreamSim Learning New Dimensions of Human Visual Similarity using Synthetic Data

2306.08877 Linguistic Binding in Diffusion Models Enhancing Attribute Correspondence through Attention Map Alignment

2306.07754 Generative Watermarking Against Unauthorized Subject-Driven Image Synthesis

2306.07282 Waffling around for Performance Visual Classification with Random Words and Broad Concepts

2306.05414 Improving Tuning-Free Real Image Editing with Proximal Guidance

2306.00974 Intriguing Properties of Text-guided Diffusion Models

2306.00966 The Hidden Language of Diffusion Models

2306.00738 ReFACT Updating Text-to-Image Models by Editing the Text Encoder

2305.19327 Cones 2 Customizable Image Synthesis with Multiple Subjects

2305.16807 Negative-prompt Inversion Fast Image Inversion for Editing with Text-guided Diffusion Models

2305.13873 Unsafe Diffusion On the Generation of Unsafe Images and Hateful Memes From Text-To-Image Models

2305.11846 Any-to-Any Generation via Composable Diffusion

2305.10665 Content-based Unrestricted Adversarial Attack

2305.08192 Diffusion Models for Imperceptible and Transferable Adversarial Attack

2305.05189 SUR-adapter Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models

2305.01644 Key-Locked Rank One Editing for Text-to-Image Personalization

2304.14530 Generating images of rare concepts using pre-trained diffusion models

2304.03411 InstantBooth Personalized Text-to-Image Generation without Test-Time Finetuning

2304.03373 Training-Free Layout Control with Cross-Attention Guidance

2303.08084 Editing Implicit Assumptions in Text-to-Image Diffusion Models

2302.06588 Raising the Cost of Malicious AI-Powered Image Editing

2302.03027 Zero-shot Image-to-Image Translation

2211.17256 CLIPascene Scene Sketching with Different Types and Levels of Abstraction

2211.09794 Null-text Inversion for Editing Real Images using Guided Diffusion Models

2211.06679 AltCLIP Altering the Language Encoder in CLIP for Extended Language Capabilities

2209.14988 DreamFusion Text-to-3D using 2D Diffusion

2010.02502 Denoising Diffusion Implicit Models