Yesterday — 24 November 2024 — StableDiffusion

Did Sora miss the boat? This is one of the virtues of open source.

24 November 2024 at 05:13

When OpenAI's Sora was announced I remember being excited, then disappointed upon realizing that it would be ages before it got released, and likely much longer until a less restrictive open source equivalent would be developed.

It is now 9 months since OpenAI's announcement, and Sora is still undergoing content moderation. Meanwhile, open source text-to-video, image-to-video, and video-to-video models are dropping almost daily. Hooray for the metaphorical adults in the room!

submitted by /u/the_bollo
[link] [comments]

Finally did it! New Virtual Try-on with FLUX framework! 🎉

24 November 2024 at 02:13

Super excited to share my latest virtual try-on project! Been working on this over the weekend and finally got some awesome results combining CatVTON/In-context LoRA with Flux1-dev-fill.

Check out these results! The wrinkles and textures look way more natural than I expected. Really happy with how the clothing details turned out.

Demo images below

https://preview.redd.it/vj4uku5cdr2e1.png?width=1894&format=png&auto=webp&s=495855a2439516dc73c2c4cc46d446e407ab3b8a

Here is the github: https://github.com/nftblackmagic/catvton-flux

Would love to hear what you guys think! Happy to share more details if anyone's interested.

EDIT:
(2024/11/24) The weights achieved SOTA performance with FID 5.593255043029785 on the VITON-HD dataset. Test configuration: scale 30, step 30.
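The repo linked above has its own inference code; purely as a rough sketch of what the reported test settings (guidance scale 30, 30 steps) look like on the base fill model via diffusers' FluxFillPipeline. The file names and prompt are placeholders, and the CatVTON / In-context LoRA part is omitted.

```python
# Not the author's pipeline -- a rough sketch of running the base FLUX.1-Fill-dev model
# with the reported test settings (guidance scale 30, 30 steps) via diffusers.
# File names and the prompt are placeholders; the CatVTON / In-context LoRA part is omitted.
import torch
from diffusers import FluxFillPipeline
from diffusers.utils import load_image

pipe = FluxFillPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Fill-dev", torch_dtype=torch.bfloat16
).to("cuda")

person = load_image("person.png")        # photo of the person
mask = load_image("garment_mask.png")    # white where the garment should be filled in

result = pipe(
    prompt="a person wearing the reference garment",
    image=person,
    mask_image=mask,
    guidance_scale=30,
    num_inference_steps=30,
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]
result.save("tryon_result.png")
```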

submitted by /u/Glass-Addition-999
[link] [comments]

This looks like an epidemic of bad workflow practices. PLEASE composite your image after inpainting!

23 November 2024 at 19:52

https://reddit.com/link/1gy87u4/video/s601e85kgp2e1/player

Since Flux Fill Dev was released, inpainting has been in high demand. But not only do the official ComfyUI workflow examples not teach how to composite, a lot of shared workflows simply aren't doing it either! This is really bad.
VAE encoding AND decoding is not a lossless process. Each time you do it, your whole image gets a little bit degraded. That is why you inpaint what you want and "paste" it back onto the original pixel image.
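If you want to see the degradation for yourself, here is a minimal sketch (not from the post), assuming diffusers' AutoencoderKL and the public sd-vae-ft-mse weights; the file name is a placeholder.

```python
# Minimal sketch: measuring how repeated VAE round-trips degrade an image.
# Assumes diffusers' AutoencoderKL and the public sd-vae-ft-mse weights.
import numpy as np
import torch
from diffusers import AutoencoderKL
from diffusers.utils import load_image

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").eval()

img = load_image("photo.png").convert("RGB").resize((512, 512))
x = torch.from_numpy(np.array(img)).float() / 127.5 - 1.0   # scale to [-1, 1]
x = x.permute(2, 0, 1).unsqueeze(0)                          # (1, 3, 512, 512)
orig = x.clone()

with torch.no_grad():
    for i in range(5):                                       # simulate 5 inpainting passes
        latents = vae.encode(x).latent_dist.mean
        x = vae.decode(latents).sample.clamp(-1, 1)
        err = (x - orig).abs().mean().item()
        print(f"pass {i + 1}: mean abs error vs original = {err:.4f}")
```

The error grows with every pass, which is exactly why the unmasked pixels should be copied straight from the original instead of going through the VAE again.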

I got completely exhausted trying to point this out to this guy here: https://civitai.com/models/397069?dialog=commentThread&commentId=605344
Now the official Civitai pages ALSO teach doing it wrong, without compositing at the end:
https://civitai.com/models/970162?modelVersionId=1088649
https://education.civitai.com/quickstart-guide-to-flux-1/#flux-tools

It's literally one node: ImageCompositeMasked. You connect the output from the VAE Decode, the original mask, and the original image. That's it. Now your image won't turn to trash after 3-5 inpainting passes.
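If it helps to see the wiring, here is a hypothetical fragment of a ComfyUI API-format workflow with the node connected as described. The node IDs are placeholders and only the nodes relevant to the composite step are shown.

```python
# Hypothetical fragment of a ComfyUI API-format workflow (node IDs are placeholders).
# The rest of the graph -- checkpoint loader "4", KSampler "7", LoadImage "1" with its
# mask output -- is assumed to exist.
workflow_fragment = {
    "8": {                               # decode the inpainted latent
        "class_type": "VAEDecode",
        "inputs": {"samples": ["7", 0],  # latent from the sampler
                   "vae": ["4", 2]},     # VAE from the checkpoint loader
    },
    "9": {                               # paste the inpainted pixels back onto the original
        "class_type": "ImageCompositeMasked",
        "inputs": {
            "destination": ["1", 0],     # original image from LoadImage
            "source": ["8", 0],          # freshly decoded (degraded) image
            "mask": ["1", 1],            # original inpaint mask from LoadImage
            "x": 0,
            "y": 0,
            "resize_source": False,
        },
    },
    "10": {"class_type": "SaveImage",
           "inputs": {"images": ["9", 0], "filename_prefix": "inpaint_composited"}},
}
```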

Please don't make this mistake.
And if anyone wants a more complex workflow (yes, it has a bunch of custom nodes, sorry, but they are needed), here is mine:
https://civitai.com/models/862215?modelVersionId=1092325

submitted by /u/diogodiogogod
[link] [comments]

Nostalgia: one of my favorite generations with the SD1.5 base model

24 November 2024 at 08:11
  • Model: Stable Diffusion v1.5 base model
  • Prompt: a beautiful Angel in a church with colored Stained Glass in the background, holy light, volume light, dynamic lighting, by artgerm, by victo ngai, by greg rutkowski, 4k, artstation, dim
  • Sampler: PLMS
  • CFG scale: 8
  • Seed: 539422
  • Size: 512x1024

I generated this image when I first started exploring Stable Diffusion. The quality absolutely blew my mind at the time, and I think it still holds up pretty well even by today's standards. What's interesting is that this prompt produces garbage 99% of the time; you can only get this kind of result with the exact seed and sampler listed above. Even the slightest tweak turns it into a mess. That's what made finding this gem among the junk feel so special, and it always reminds me of the joy of when I first got into this tech.
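For anyone who wants to plug these settings into code, here is a loose sketch using diffusers (PLMS corresponds to the PNDM scheduler there). Note that seeds are not portable between UIs, so this will not reproduce the exact image; it only mirrors the listed parameters, and it assumes the SD1.5 weights are reachable under the usual repo id or a mirror.

```python
# Loose sketch of the listed settings in diffusers. Seeds are not portable between UIs,
# so this mirrors the parameters rather than reproducing the exact image.
import torch
from diffusers import StableDiffusionPipeline, PNDMScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",      # or any mirror of the SD1.5 base weights
    torch_dtype=torch.float16,
).to("cuda")
pipe.scheduler = PNDMScheduler.from_config(pipe.scheduler.config)  # PLMS-style sampler

prompt = ("a beautiful Angel in a church with colored Stained Glass in the background, "
          "holy light, volume light, dynamic lighting, by artgerm, by victo ngai, "
          "by greg rutkowski, 4k, artstation, dim")

image = pipe(
    prompt,
    width=512, height=1024,
    guidance_scale=8,
    generator=torch.Generator("cuda").manual_seed(539422),
).images[0]
image.save("angel.png")
```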

submitted by /u/the_LaplaceDemon
[link] [comments]

Understanding the basics of inpainting to clarify the confusion surrounding the Flux Fill model

24 November 2024 at 09:44

Due to the way I use SD, 80% or more of my work involves inpainting. Since there seems to be some confusion about how to use the Flux Fill model to inpaint, I will go over the basics of inpainting in the hope that this helps people get their heads around the issue.

I normally use Fooocus for inpainting but also use ComfyUI for workflows that involve ControlNet (Forge didn't support the latest SDXL ControlNet models until recently). The reasons for my preference will become crystal clear as this tutorial progresses.

1. The Basics

Here is the basic workflow taken from ComfyUI Inpainting examples:

https://preview.redd.it/cweu4v1v2t2e1.png?width=2374&format=png&auto=webp&s=f112f1687c8b66c5a967ed75db456b6eca96cfe5

Inpainting is essentially an img-to-img process that requires the image to be VAE-encoded to be fed into the sampler. There are two primary VAE Encoding nodes for inpainting in ComfyUI as shown below:

https://preview.redd.it/gdba6fw63t2e1.jpg?width=1416&format=pjpg&auto=webp&s=e77535262d623d4888ad299bf5bbda1e9e5ba95f

2. The Problem

The primary difference between these nodes and a normal VAE encode node is the ability to take a mask as an input. Once masked by these VAE-encoding nodes, the sampler will only change the masked area, leaving the rest of the image untouched. Then what is the problem?

From the ComfyUI Inpaint Examples

The problems are: 1) the inpainted image will not blend well with the rest of the image, and 2) the edges of the masked area will have distortions, as shown by the red arrows. One way of dealing with this is to composite the inpainted image with the original image. But for such compositing to work properly, you have to do precision masking, since the whole problem comes from the mask in the first place. It also does not address the problem of blending.
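In pixel terms, that compositing amounts to something like the hypothetical helper below (not a ComfyUI node): outside the mask the original pixels are kept verbatim, so VAE round-trip damage is confined to the masked region, but it does nothing for blending or hard mask edges.

```python
# Hypothetical helper showing what "composite with the original" means in pixel terms.
# Outside the mask the original pixels are kept verbatim; this does not fix blending
# or hard mask edges on its own.
import numpy as np

def composite(original: np.ndarray, inpainted: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """original/inpainted: HxWx3 float arrays; mask: HxW floats in [0, 1] (1 = inpainted)."""
    m = mask[..., None]                      # broadcast the mask over the channel axis
    return original * (1.0 - m) + inpainted * m
```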

3. The Solution

To address both problems, you need to approach it through what I call 'context masking'. I am going to show you what I mean using Fooocus. The image below is something I already completed; this particular shot is about 25% of the way through the process, and I am trying to remove the spear left behind by a previous inpainting pass.

https://preview.redd.it/zsjlzs5f9t2e1.jpg?width=1755&format=pjpg&auto=webp&s=3d894bd7b8c0aa0953680c847491e9a21f8a01d4

The mask is made to cover the spear to be removed. Below is the resulting output in progress:

https://preview.redd.it/xthtwde3ct2e1.jpg?width=1755&format=pjpg&auto=webp&s=8b88cf9c4b86b14a94751d19982da5ac5f0959bf

As you can see, it is still drawing a tower even with the prompt and the inpaint prompt 'sky with rooflines'. This happens because the AI has to rely solely on the masked area for context.

You will also notice that Fooocus has cropped the masked area, upscaled it to 1024x1024, and inpainted it. Afterward, it resizes and stitches the inpainted part back into the image. In Fooocus, A1111, and Forge this whole process is done automatically, whereas in ComfyUI it has to be built out of nodes.
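As a rough sketch of the crop/upscale/inpaint/stitch bookkeeping those UIs do for you (not Fooocus's actual code), here the `run_inpaint` callable is a placeholder for the sampling step, and real implementations also keep the crop's aspect ratio instead of forcing a square.

```python
# Rough sketch of the crop/upscale/inpaint/stitch bookkeeping that Fooocus, A1111 and
# Forge do automatically. `run_inpaint` is a placeholder for the actual sampling step.
from PIL import Image
import numpy as np

def inpaint_cropped(image: Image.Image, mask: Image.Image, run_inpaint, pad: int = 32):
    # 1. bounding box of the mask, padded a little for context
    ys, xs = np.nonzero(np.array(mask.convert("L")))
    left   = int(max(xs.min() - pad, 0))
    top    = int(max(ys.min() - pad, 0))
    right  = int(min(xs.max() + pad, image.width))
    bottom = int(min(ys.max() + pad, image.height))
    box = (left, top, right, bottom)

    # 2. crop and upscale the working area to the model's native resolution
    #    (real UIs preserve aspect ratio here; forcing 1024x1024 is a simplification)
    crop_img = image.crop(box).resize((1024, 1024), Image.LANCZOS)
    crop_mask = mask.crop(box).resize((1024, 1024), Image.LANCZOS)

    # 3. inpaint only the cropped region
    result = run_inpaint(crop_img, crop_mask)

    # 4. resize back and stitch into the original, masked so only masked pixels change
    result = result.resize((right - left, bottom - top), Image.LANCZOS)
    out = image.copy()
    out.paste(result, (left, top), mask.crop(box).convert("L"))
    return out
```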

Also, Fooocus provides a lot of detailed control parameters for inpainting. For example, the 'Respective Field' parameter lets you expand from the masked area out to the rest of the image for context, which is indispensable for processes such as outpainting. This is one of the reasons I prefer to inpaint in Fooocus.

Getting back to the problem of context deficit: one solution is to expand the masked area so that more of the context is taken in, as shown below:

https://preview.redd.it/wvt92t0ugt2e1.jpg?width=1755&format=pjpg&auto=webp&s=a528ffa71a2ce3cc079a5d5585455edb31e1dd0a

It kind of works, but it also changes areas that you may not want changed. Once again, it looks like compositing with the original image is needed to solve this problem. But there is another way, as shown below:

https://preview.redd.it/n0z4fsbwgt2e1.jpg?width=1755&format=pjpg&auto=webp&s=b0e7cf8ad233779a2671754397466100c673771a

It's a little trick I use: by adding small dots of mask around the area, you expand the context while keeping the main mask restricted to the object to be inpainted. As you can see, it works quite well.
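A toy numpy illustration of why the dots work, with made-up numbers: the crop/context region follows the mask's bounding box, so a few isolated pixels far from the object widen the context dramatically while the area that actually gets repainted barely grows.

```python
# Toy illustration (hypothetical numbers): the crop/context region follows the mask's
# bounding box, so a few isolated "dot" pixels widen the context while the number of
# pixels that actually get regenerated barely changes.
import numpy as np

mask = np.zeros((1024, 1024), dtype=np.uint8)
mask[400:520, 460:540] = 1            # the object we actually want replaced (the spear)

def bbox(m):
    ys, xs = np.nonzero(m)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

print("bbox without dots:", bbox(mask), "masked px:", int(mask.sum()))

mask[150, 150] = mask[150, 900] = mask[900, 150] = mask[900, 900] = 1   # four tiny dots
print("bbox with dots:   ", bbox(mask), "masked px:", int(mask.sum()))
# The bounding box now spans most of the image (lots of context for the model),
# while the masked pixel count -- the region that gets regenerated -- grew by only 4.
```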

If you have followed me up to this point, you now have the basic concept of inpainting. You may come across complicated inpaint workflows, and most of those complications come from dealing with the context problem. But honestly, you don't need such complications for most use cases. Besides, I am not entirely sure those complicated workflows even solve the context problem properly.

I haven't used Flux since the first two weeks. But with the Control and Fill models, I am gearing up to use Flux again. Hope this was somewhat helpful on your inpainting journey. Cheers!

submitted by /u/OldFisherman8
[link] [comments]

Creating 3D models from text and images?

24 November 2024 at 11:21

I came across this Blender plug-in: https://www.3daistudio.com

Which got me thinking about 3D model creation with AI.

I do like that this works inside Blender, but I know there are some open source options.

Are there any open source options that work in Blender? If not, has anyone had any experience with the ones available in ComfyUI? Are they any good?

submitted by /u/Brad12d3
[link] [comments]

Optimize Redux as a style transfer tool with this custom node.

23 November 2024 at 22:08

What is Redux: https://www.youtube.com/watch?v=YSJsejH5Viw

Illustration of Redux as a style transfer tool.

Basically, it is possible to change the strength of the style transfer effect to get an image that actually follows the prompt. The issue is that going for a constant strength doesn't work that well.

This is why I made a node that increases the strength linearly through the steps so that the style transfer gets smoother.

FloatRamp node.

For example, here I decided to start the generation at strength 0 (no style transfer) and let the strength grow linearly from 0 to 1 over the first 9 steps; this is the setting that was used for the Picasso style transfer above.
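For reference, the per-step schedule described above looks roughly like this hypothetical standalone helper (not the node's actual code):

```python
# Hypothetical standalone helper mirroring the schedule described above (not the
# node's actual code): strength ramps linearly from 0 to 1 over the first `ramp_steps`
# steps, then stays at 1 for the rest of the generation.
def redux_strength(step: int, ramp_steps: int = 9) -> float:
    if step >= ramp_steps:
        return 1.0
    return step / ramp_steps

print([round(redux_strength(s), 2) for s in range(12)])
# [0.0, 0.11, 0.22, 0.33, 0.44, 0.56, 0.67, 0.78, 0.89, 1.0, 1.0, 1.0]
```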

You can download the node here: https://files.catbox.moe/rndi8m.rar

Extract the "ComfyUI-FloatRamp" folder and put it in "ComfyUI\custom_nodes".

You can also use this workflow and try it out with this image reference.

That workflow also needs some other custom nodes to work properly:

https://github.com/cubiq/ComfyUI_essentials

https://github.com/kijai/ComfyUI-KJNodes

https://reddit.com/r/StableDiffusion/comments/1el79h3/flux_can_be_run_on_a_multigpu_configuration/

submitted by /u/Total-Resort-3120
[link] [comments]

InvokeAI - bad inpainting results

24 November 2024 at 14:40

I'm trying to use InvokeAI to inpaint photos. In this case, I have a photo with some people, and I want to paint out their glasses and put in realistic eyes, plus run tests adding new details and changing body structure. In A1111 I can do this with average-quality results, but in InvokeAI the result is horrible. Can someone help me? Do I need a specific model, a ControlNet, or is it some configuration? Will I get better results using Fooocus?

submitted by /u/Iron_93
[link] [comments]

Txt2Vid/Img2Vid in ForgeUI?

24 November 2024 at 14:21

I just switched from ComfyUI to ForgeUI, as it cut my generation time to a third. I can't find any info on video generation using ForgeUI, and I would like to drop Comfy completely.

Is it possible, or will it be possible, to use Cogx or any other video generation model (SVD/DIFF) in ForgeUI?

(Note: I am new to Forge and don't know its limits. I'm asking out of curiosity. Thanks!)

submitted by /u/PoutouYou
[link] [comments]

Current state of Flux LoRA training

24 November 2024 at 13:29

I am just interested in what kind of LoRA training works best for you (cloud/local) and what you're using; if you like, share your thoughts and configs.

I wanted to use kohya_ss but got all kinds of errors, which led me back to ai-toolkit, although I don't really have much experience with good training parameters for characters.

There are some tutorials out there, but they may be outdated. That's why I am asking. Thanks!

submitted by /u/lebrandmanager
[link] [comments]