Due to the way I use SD, 80% or more of my work involves inpainting. Since there seems to be some confusion about how to use the Flux Fill model to inpaint, I will go over the basics of inpainting in the hope that it helps people get their heads around the issue.
I normally use Fooocus for inpainting but also use ComfyUI for workflows that involve ControlNet (Forge didn't support the latest SDXL ControlNet models until recently). The reasons for my preference will become clear as this tutorial progresses.
1. The Basics
Here is the basic workflow taken from ComfyUI Inpainting examples:
https://preview.redd.it/cweu4v1v2t2e1.png?width=2374&format=png&auto=webp&s=f112f1687c8b66c5a967ed75db456b6eca96cfe5
Inpainting is essentially an img-to-img process: the image has to be VAE-encoded before it can be fed into the sampler. ComfyUI has two primary VAE-encoding nodes for inpainting, shown below:
https://preview.redd.it/gdba6fw63t2e1.jpg?width=1416&format=pjpg&auto=webp&s=e77535262d623d4888ad299bf5bbda1e9e5ba95f
2. The Problem
The primary difference between these nodes and the normal VAE Encode node is that they take a mask as an input. Once the mask is applied by these VAE-encoding nodes, the sampler will only change the masked area, leaving the rest of the image untouched. So what is the problem?
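Conceptually, what the mask does can be sketched in a few lines of numpy: the sampler's result is blended back with the original latent so only the masked region changes. This is a simplification of what the masked VAE-encode path does, and the names here are purely illustrative:

```python
import numpy as np

def masked_update(original_latent, denoised_latent, mask):
    """Keep denoised values only where mask == 1; elsewhere restore
    the original latent. This is why the unmasked region comes out
    identical to the input."""
    return mask * denoised_latent + (1.0 - mask) * original_latent

# toy example: a 1x4 "latent", mask covering the last two elements
orig = np.array([1.0, 2.0, 3.0, 4.0])
new = np.array([9.0, 9.0, 9.0, 9.0])
mask = np.array([0.0, 0.0, 1.0, 1.0])
print(masked_update(orig, new, mask))  # [1. 2. 9. 9.]
```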
From the ComfyUI Inpaint Examples
The problems are 1) the inpainted area does not blend well with the rest of the image, and 2) the edges of the masked area show distortions, as indicated by the red arrows. One way of dealing with this is to composite the inpainted image with the original. But for such compositing to work properly, you have to mask with precision, since the mask is where the whole problem comes from in the first place. And it still does not address the blending problem.
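For what it's worth, the compositing step itself is simple; a common approach is to blur the mask first so the seam feathers instead of cutting hard. A minimal sketch with Pillow (the function name and the feather radius are my own choices, not anything from a specific tool):

```python
from PIL import Image, ImageFilter

def composite_inpaint(original, inpainted, mask, feather=8):
    """Paste the inpainted result over the original through a
    blurred (feathered) mask so the edge blends rather than
    showing a hard seam."""
    soft_mask = mask.convert("L").filter(ImageFilter.GaussianBlur(feather))
    # where soft_mask is white, take the inpainted image; where black, the original
    return Image.composite(inpainted, original, soft_mask)
```

Note that feathering hides the hard edge but cannot fix a result that was generated without enough context; that is the deeper problem discussed next.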
3. The Solution
To address both problems, you need to approach it with what I call 'Context Masking'. I am going to show you what I mean by using Fooocus. The image below is from a piece I have already completed; at the point shown, it was about 25% of the way through the process, and I am trying to remove the spear left over from a previous inpainting pass.
https://preview.redd.it/zsjlzs5f9t2e1.jpg?width=1755&format=pjpg&auto=webp&s=3d894bd7b8c0aa0953680c847491e9a21f8a01d4
The mask is drawn to cover the spear to be removed. Below is the resulting output in progress:
https://preview.redd.it/xthtwde3ct2e1.jpg?width=1755&format=pjpg&auto=webp&s=8b88cf9c4b86b14a94751d19982da5ac5f0959bf
As you can see, it is still drawing a tower even with the main prompt and the inpaint prompt 'sky with rooflines'. This happens because the AI has to rely solely on the masked area for context.
You will also notice that Fooocus has cropped the masked area, upscaled it to 1024x1024, and inpainted it. Afterward, it resizes and stitches the inpainted part back into the image. In Fooocus, A1111, and Forge, this whole process is automatic, whereas in ComfyUI it has to be built out of nodes.
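The crop-upscale-inpaint-stitch sequence those UIs automate can be sketched roughly like this. This is my own simplified reconstruction, not Fooocus's actual code: `inpaint_fn` is a placeholder for whatever model call you use, the padding and square target size are assumptions, and real implementations preserve aspect ratio more carefully:

```python
from PIL import Image

def inpaint_cropped(image, mask, inpaint_fn, target=1024, pad=32):
    """Crop around the mask's bounding box (with padding), upscale
    the crop, inpaint it, then downscale and paste it back."""
    left, top, right, bottom = mask.getbbox()   # bounding box of nonzero mask
    box = (max(left - pad, 0), max(top - pad, 0),
           min(right + pad, image.width), min(bottom + pad, image.height))
    crop = image.crop(box).resize((target, target), Image.LANCZOS)
    mask_crop = mask.crop(box).resize((target, target), Image.LANCZOS)
    result = inpaint_fn(crop, mask_crop)        # placeholder for the model call
    w, h = box[2] - box[0], box[3] - box[1]
    image.paste(result.resize((w, h), Image.LANCZOS), box[:2])
    return image

# demo with an identity "model" just to show the plumbing
img = Image.new("RGB", (64, 64), "red")
msk = Image.new("L", (64, 64), 0)
msk.paste(255, (20, 20, 40, 40))                # mask over the "object"
out = inpaint_cropped(img, msk, lambda c, m: c)
print(out.size)  # (64, 64)
```

The key takeaway: because the crop is upscaled before inpainting, the model works at full resolution on a small region, which is why masked-area inpainting can add so much detail.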
Fooocus also provides a lot of detailed control parameters for inpainting. For example, the 'Respective Field' parameter lets you expand the context from the masked area into the rest of the image, which is indispensable for processes such as outpainting. This is one of the reasons I prefer to inpaint in Fooocus.
Getting back to the problem of context deficit: one solution is to expand the masked area so that more of the image is taken in as context, as shown below:
https://preview.redd.it/wvt92t0ugt2e1.jpg?width=1755&format=pjpg&auto=webp&s=a528ffa71a2ce3cc079a5d5585455edb31e1dd0a
It kind of works, but it also changes areas that you may not want changed. Once again, it looks like compositing with the original image is needed to solve this. But there is another way, shown below:
https://preview.redd.it/n0z4fsbwgt2e1.jpg?width=1755&format=pjpg&auto=webp&s=b0e7cf8ad233779a2671754397466100c673771a
The little trick I use here is to add small dots of mask around the area, which expands the context while keeping the main mask restricted to the object being inpainted. As you can see, it works quite well.
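If you ever need to reproduce this dot trick outside a GUI, it amounts to adding a few tiny blobs of mask around the target so the mask's bounding box grows, pulling more of the image into the crop, without enlarging the region that actually gets repainted. A numpy sketch (dot positions and radius are arbitrary):

```python
import numpy as np

def add_context_dots(mask, points, radius=4):
    """Add small circular mask dots at the given (x, y) points.
    The dots barely change what gets repainted, but they widen the
    mask's bounding box so the cropped context covers more image."""
    h, w = mask.shape
    yy, xx = np.mgrid[0:h, 0:w]
    out = mask.copy()
    for x, y in points:
        out[(xx - x) ** 2 + (yy - y) ** 2 <= radius ** 2] = 255
    return out

mask = np.zeros((100, 100), dtype=np.uint8)
mask[40:60, 40:60] = 255                          # main mask over the object
dotted = add_context_dots(mask, [(10, 10), (90, 90)])
```

After adding the two corner dots, the bounding box spans almost the whole image, so the automatic crop described earlier takes in far more context.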
If you have followed me up to this point, you now have the basic concept of inpainting. You may come across complicated inpaint workflows, and most of those complications come from dealing with the context problem. But to be honest, you don't need them for most use cases, and I am not entirely sure those complicated workflows even solve the context problem properly.
I haven't used Flux since its first two weeks, but with the control and fill models, I am gearing up to use it again. I hope this was somewhat helpful in your inpainting journey. Cheers!