2310.05916 Interpreting CLIP’s Image Representation via Text-Based Decomposition
2309.16668 RealFill Reference-Driven Generation for Authentic Image Completion
2309.07906 Generative Image Dynamics
2309.07125 Text-Guided Generation and Editing of Compositional 3D Avatars
2309.06933 DreamStyler Paint by Style Inversion with Text-to-Image Diffusion Models
2309.00613 Iterative Multi-granular Image Editing using Diffusion Models
2309.00013 Model Inversion Attack via Dynamic Memory Learning
2308.16582 Any-Size-Diffusion Toward Efficient Text-Driven Synthesis for Any-Size HD Images
2308.16512 MVDream Multi-view Diffusion for 3D Generation
2308.14761 Unified Concept Editing in Diffusion Models
2308.10916 Diffusion Model as Representation Learner
2308.10718 Backdooring Textual Inversion for Concept Censorship
2308.09991 AltDiffusion A Multilingual Text-to-Image Diffusion Model
2308.09889 DUAW Data-free Universal Adversarial Watermark against Stable Diffusion Customization
2308.09124 Linearity of Relation Decoding in Transformer Language Models
2308.08947 Watch Your Steps Local Image and Scene Editing by Text Instructions
2308.08428 ALIP Adaptive Language-Image Pre-training with Synthetic Caption
2308.07977 YODA You Only Diffuse Areas. An Area-Masked Diffusion Approach For Image Super-Resolution
2308.07931 Distilled Feature Fields Enable Few-Shot Language-Guided Manipulation
2308.07926 CoDeF Content Deformation Fields for Temporally Consistent Video Processing
2308.07922 RAVEN In-Context Learning with Retrieval Augmented Encoder-Decoder Language Models
2308.07037 Bayesian Flow Networks
2308.06531 SegPrompt Boosting Open-world Segmentation via Category-level Prompt Learning
2308.03463 DiffSynth Latent In-Iteration Deflickering for Realistic Video Synthesis
2308.01655 DiffColor Toward High Fidelity Text-Guided Image Colorization with Diffusion Models
2308.01544 Multimodal Neurons in Pretrained Text-Only Transformers
2308.01508 Circumventing Concept Erasure Methods For Text-to-Image Generative Models
2308.00906 ImageBrush Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation
2308.00225 Instructed to Bias Instruction-Tuned Language Models Exhibit Emergent Cognitive Bias
2308.00135 InFusion Inject and Attention Fusion for Multi Concept Zero-Shot Text-based Video Editing
2307.16151 StylePrompter All Styles Need Is Attention
2307.15977 Fingerprints of Generative Models in the Frequency Domain
2307.15640 CLIP Brings Better Features to Visual Aesthetics Learners
2307.15593 Robust Distortion-free Watermarks for Language Models
2307.15539 Beating Backdoor Attack at Its Own Game
2307.15199 PromptStyler Prompt-driven Style Generation for Source-free Domain Generalization
2307.15043 Universal and Transferable Adversarial Attacks on Aligned Language Models
2307.15033 Diverse Inpainting and Editing with GAN Inversion
2307.15008 A LLM Assisted Exploitation of AI-Guardian
2307.14352 General Image-to-Image Translation with One-Shot Image Guidance
2307.14331 Visual Instruction Inversion Image Editing via Visual Prompting
2307.13770 E^2VPT An Effective and Efficient Approach for Visual Prompt Tuning
2307.13720 Composite Diffusion | whole >= Σ parts
2307.12499 AdvDiff Generating Unrestricted Adversarial Examples using Diffusion Models
2307.12493 TF-ICON Diffusion-Based Training-Free Cross-Domain Image Composition
2307.11308 DPM-OT A New Diffusion Probabilistic Model Based on Optimal Transport
2307.11118 Diffusion Sampling with Momentum for Mitigating Divergence Artifacts
2307.10829 Exact Diffusion Inversion via Bi-directional Integration Approximation
2307.10816 BoxDiff Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion
2307.10802 Meta-Transformer A Unified Framework for Multimodal Learning
2307.10780 Learned Thresholds Token Merging and Pruning for Vision Transformers
2307.10490 (Ab)using Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs
2307.10373 TokenFlow Consistent Diffusion Features for Consistent Video Editing
2307.10350 Improving Multimodal Datasets with Image Captioning
2307.09542 Can Neural Network Memorization Be Localized?
2307.09233 Augmenting CLIP with Improved Visio-Linguistic Reasoning
2307.08698 Flow Matching in Latent Space
2307.07961 EmoSet A Large-scale Visual Emotion Dataset with Rich Attributes
2307.07663 INVE Interactive Neural Video Editing
2307.07635 CoTracker It is Better to Track Together
2307.07487 DreamTeacher Pretraining Image Backbones with Deep Generative Models
2307.07397 Improving Zero-Shot Generalization for CLIP with Synthesized Prompts
2307.06949 HyperDreamBooth HyperNetworks for Fast Personalization of Text-to-Image Models
2307.05222 Generative Pretraining in Multimodality
2307.04787 Collaborative Score Distillation for Consistent Visual Synthesis
2307.04721 Large Language Models as General Pattern Machines
2307.04684 FreeDrag Point Tracking is Not What You Need for Interactive Point-based Image Editing
2307.03190 Synthesizing Artistic Cinemagraphs from Text
2307.03108 How to Detect Unauthorized Data Usages in Text-to-image Diffusion Models
2307.02421 DragonDiffusion Enabling Drag-style Manipulation on Diffusion Models
2307.00910 Contextual Prompt Learning for Vision-Language Understanding
2307.00300 DreamIdentity Improved Editability for Efficient Face-identity Preserved Image Generation
2307.00274 Common Knowledge Learning for Generating Transferable Adversarial Examples
2307.00038 Training-free Object Counting with Prompts
2307.00028 Seeing in Words Learning to Classify through Language Bottlenecks
2306.16805 CLIPAG Towards Generator-Free Text-to-Image Generation
2306.16782 Low-Light Enhancement in the Frequency Domain
2306.16527 OBELISC An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents
2306.15706 Approximated Prompt Tuning for Vision-Language Pre-trained Models
2306.15111 Semi-Supervised Image Captioning with CLIP
2306.14435 DragDiffusion Harnessing Diffusion Models for Interactive Point-based Image Editing
2306.09344 DreamSim Learning New Dimensions of Human Visual Similarity using Synthetic Data
2306.07754 Generative Watermarking Against Unauthorized Subject-Driven Image Synthesis
2306.05414 Improving Tuning-Free Real Image Editing with Proximal Guidance
2306.00974 Intriguing Properties of Text-guided Diffusion Models
2306.00966 The Hidden Language of Diffusion Models
2306.00738 ReFACT Updating Text-to-Image Models by Editing the Text Encoder
2305.19327 Cones 2 Customizable Image Synthesis with Multiple Subjects
2305.11846 Any-to-Any Generation via Composable Diffusion
2305.10665 Content-based Unrestricted Adversarial Attack
2305.08192 Diffusion Models for Imperceptible and Transferable Adversarial Attack
2305.01644 Key-Locked Rank One Editing for Text-to-Image Personalization
2304.14530 Generating images of rare concepts using pre-trained diffusion models
2304.03411 InstantBooth Personalized Text-to-Image Generation without Test-Time Finetuning
2304.03373 Training-Free Layout Control with Cross-Attention Guidance
2303.08084 Editing Implicit Assumptions in Text-to-Image Diffusion Models
2302.06588 Raising the Cost of Malicious AI-Powered Image Editing
2302.03027 Zero-shot Image-to-Image Translation
2211.17256 CLIPascene Scene Sketching with Different Types and Levels of Abstraction
2211.09794 Null-text Inversion for Editing Real Images using Guided Diffusion Models
2211.06679 AltCLIP Altering the Language Encoder in CLIP for Extended Language Capabilities