FreeFuse: Multi-Subject LoRA Fusion via Auto Masking at Test Time

Yaoli Liu, Yao-Xiang Ding, Kun Zhou

Zhejiang University - State Key Laboratory of CAD&CG
arXiv Preprint


Abstract

This paper proposes FreeFuse, a novel training-free approach to multi-subject text-to-image generation that automatically fuses multiple subject LoRAs. Existing methods either merge LoRA weights before inference or rely on segmentation models and complex techniques such as noise blending to isolate LoRA outputs. In contrast, our key insight is that context-aware, dynamic subject masks can be derived automatically from cross-attention layer weights. Mathematical analysis shows that applying these masks directly to LoRA outputs during inference closely approximates integrating each subject LoRA into the diffusion model and using it individually for its masked region. FreeFuse offers superior practicality and efficiency: it requires no additional training, no modification to LoRAs, no auxiliary models, and no user-defined prompt templates or region specifications. Instead, users only need to provide the LoRA activation words, so FreeFuse integrates seamlessly into standard workflows. Extensive experiments show that FreeFuse outperforms existing approaches in both generation quality and usability on multi-subject generation tasks.
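The central mechanism above, turning cross-attention weights into per-subject masks and gating each LoRA's output with its mask, can be sketched for a single linear layer in NumPy. This is an illustrative sketch, not the authors' implementation: the way scores are aggregated (summing attention over each subject's activation-word tokens), the per-subject max normalization, the argmax assignment, and the `threshold` cut-off are all assumptions, and the names `derive_subject_masks` and `masked_lora_linear` are hypothetical.

```python
import numpy as np

def derive_subject_masks(attn_probs, token_groups, threshold=0.5):
    """Turn cross-attention probabilities into hard, non-overlapping masks.

    attn_probs: (num_image_tokens, num_text_tokens) cross-attention weights.
    token_groups: token_groups[k] lists the text-token indices of LoRA k's
        activation word(s) (hypothetical input format).
    threshold: hypothetical cut-off; image tokens whose best normalized score
        falls below it get no LoRA and are left to the base model.
    Returns a (num_loras, num_image_tokens) array of 0/1 masks.
    """
    # Sum attention over each subject's activation-word tokens.
    scores = np.stack([attn_probs[:, idx].sum(axis=1) for idx in token_groups])
    # Normalize each subject's map to [0, 1] so subjects are comparable.
    scores = scores / (scores.max(axis=1, keepdims=True) + 1e-8)
    # Each image token goes to the subject with the highest normalized score.
    winner = scores.argmax(axis=0)
    masks = np.zeros_like(scores)
    masks[winner, np.arange(scores.shape[1])] = 1.0
    # Drop weak assignments (assumed to be background).
    masks[:, scores.max(axis=0) < threshold] = 0.0
    return masks

def masked_lora_linear(x, W, loras, masks, scale=1.0):
    """Base linear layer plus per-token masked LoRA updates.

    x: (num_image_tokens, d_in); W: (d_out, d_in);
    loras: list of (A, B) pairs with A: (r, d_in) and B: (d_out, r).
    """
    out = x @ W.T
    for (A, B), m in zip(loras, masks):
        # Compute the LoRA delta for every token, then zero it outside the mask.
        out += scale * m[:, None] * (x @ A.T @ B.T)
    return out

# Toy example: two subject LoRAs, activation words at text positions 1 and 3.
rng = np.random.default_rng(0)
n_img, n_txt, d, r = 6, 5, 4, 2
attn = rng.random((n_img, n_txt))
attn /= attn.sum(axis=1, keepdims=True)  # rows sum to 1, like a softmax output
masks = derive_subject_masks(attn, token_groups=[[1], [3]])
x = rng.standard_normal((n_img, d))
W = rng.standard_normal((d, d))
loras = [(rng.standard_normal((r, d)), rng.standard_normal((d, r))) for _ in range(2)]
y = masked_lora_linear(x, W, loras, masks)
```

Within this one layer, the masked output on subject k's region is exactly what the layer would produce with only LoRA k merged into `W` (that is, `W + B @ A`), and unmasked tokens see only the base weights. Across the full network the abstract describes this as an approximation, plausibly because tokens in different regions still interact through attention layers.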


harry_potter and daiyu_lin looking up together at the viewer, smiling softly, fairy lights reflecting in their eyes.

harry_potter and haoran_liu side by side on the couch, screen glow on their focused faces.

harry_potter and sherlock sparring, close-up of their intense expressions and focused eyes.

harry_potter and rihanna assembling furniture, faces frustrated but laughing together.