
I wanted to see how many bowling balls I could prompt a man holding

Using Comfy and Flux Dev. It starts to lose track around 7-8 and you’ll have to start cherry picking. After 10 it’s anyone’s game and to get more than 11 I had to prompt for “a pile of a hundred bowling balls.”

I’m not sure what to do with this information and I’m sure it’s pretty object specific… but bowling balls

submitted by /u/rwbronco
[link] [comments]

Steve Mould randomly explains the inner workings of Stable Diffusion better than I've ever heard before

https://www.youtube.com/watch?v=FMRi6pNAoag

I already liked Steve Mould... a dude who's appeared on Numberphile many times. But just now, watching one of his videos about a certain kind of dumb little visual illusion, I saw him unexpectedly launch into the most thorough and understandable explanation of how CLIP-conditioned diffusion models work that I've ever seen. Like, by far. It's just incredible. For those who haven't seen it, enjoy the little epiphanies from connecting diffusion-based image models, LLMs, and CLIP, and how they all work together with cross-attention!!

Starts at about 2 minutes in.
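If you want to poke at the cross-attention idea, here's a toy sketch (random tensors, nothing from the video): the image's latent "pixels" act as queries against the prompt tokens' keys/values, so every pixel decides which words to pay attention to.

```python
import torch
import torch.nn.functional as F

def cross_attention(image_feats, text_embeds):
    """Toy single-head cross-attention: image features query the prompt tokens."""
    d = image_feats.shape[-1]
    q, k, v = image_feats, text_embeds, text_embeds
    weights = F.softmax(q @ k.T / d ** 0.5, dim=-1)  # (pixels, tokens): how much each pixel attends to each token
    return weights @ v                               # mix token information back into every pixel

image_feats = torch.randn(4096, 64)  # a 64x64 latent flattened into 4096 "pixels"
text_embeds = torch.randn(77, 64)    # 77 CLIP text tokens projected to the same width
print(cross_attention(image_feats, text_embeds).shape)  # torch.Size([4096, 64])
```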

submitted by /u/AdQuirky7106
[link] [comments]

Ctrl-X code released, controlnet without finetuning or guidance.

Code: https://github.com/genforce/ctrl-x

Project Page: https://genforce.github.io/ctrl-x/

Note: All the information you see below comes from the project page, so please take the quality of the results with a grain of salt.

Example

Ctrl-X is a simple tool for generating images from text without the need for extra training or guidance. It allows users to control both the structure and appearance of an image by providing two reference images—one for layout and one for style. Ctrl-X aligns the image’s layout with the structure image and transfers the visual style from the appearance image. It works with any type of reference image, is much faster than previous methods, and can be easily integrated into any text-to-image or text-to-video model.

https://preview.redd.it/ahgow4wcufrd1.png?width=4350&format=png&auto=webp&s=eafe89082c7d39124dc2a535e99a7d12b8083a61

Ctrl-X works by first taking the clean structure and appearance data and adding noise to them using a diffusion process. It then extracts features from these noisy versions through a pretrained text-to-image diffusion model. During the process of removing the noise, Ctrl-X injects key features from the structure data and uses attention mechanisms to transfer style details from the appearance data. This allows for control over both the layout and style of the final image. The method is called "Ctrl-X" because it combines structure preservation with style transfer, like cutting and pasting.
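A minimal, self-contained sketch of those two ideas (structure feature injection and appearance transfer), with plain tensors standing in for the U-Net's intermediate feature maps. This is not the official Ctrl-X code: the real method transfers appearance through attention, while the stand-in below just matches feature statistics, AdaIN-style.

```python
import torch

def add_noise(x0, t, alphas_cumprod):
    """Standard forward diffusion: x_t = sqrt(a_bar)*x0 + sqrt(1 - a_bar)*eps."""
    a_bar = alphas_cumprod[t]
    return a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * torch.randn_like(x0)

def inject_structure(gen_feat, structure_feat, weight=0.6):
    """Blend the generation branch's features toward the structure branch's features."""
    return weight * structure_feat + (1 - weight) * gen_feat

def transfer_appearance(gen_feat, appearance_feat, eps=1e-5):
    """AdaIN-style stand-in for the paper's attention-based appearance transfer:
    match the generation features' per-channel statistics to the appearance features'."""
    mu_g = gen_feat.mean((-2, -1), keepdim=True)
    std_g = gen_feat.std((-2, -1), keepdim=True)
    mu_a = appearance_feat.mean((-2, -1), keepdim=True)
    std_a = appearance_feat.std((-2, -1), keepdim=True)
    return (gen_feat - mu_g) / (std_g + eps) * std_a + mu_a

# Toy usage: random tensors stand in for latents / U-Net feature maps.
alphas_cumprod = torch.linspace(0.999, 0.01, 1000)
structure_latent = torch.randn(1, 4, 64, 64)
appearance_latent = torch.randn(1, 4, 64, 64)
t = 500
structure_feat = add_noise(structure_latent, t, alphas_cumprod)    # "features" of the noised structure image
appearance_feat = add_noise(appearance_latent, t, alphas_cumprod)  # "features" of the noised appearance image
gen_feat = torch.randn(1, 4, 64, 64)                               # current denoising features
gen_feat = transfer_appearance(inject_structure(gen_feat, structure_feat), appearance_feat)
print(gen_feat.shape)
```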

Results of training-free and guidance-free T2I diffusion with structure and appearance control

Ctrl-X is capable of multi-subject generation with semantic correspondence between appearance and structure images across both subjects and backgrounds. In comparison, ControlNet + IP-Adapter often fails at transferring all subject and background appearances.

https://preview.redd.it/jxia1sivufrd1.png?width=2786&format=png&auto=webp&s=0c07b3baaff94ce21220192004997377d4313f97

Ctrl-X also supports prompt-driven conditional generation, where it generates an output image complying with the given text prompt while aligning with the structure of the structure image. Ctrl-X continues to support any structure image/condition type here as well. The base model here is Stable Diffusion XL v1.0.

https://preview.redd.it/rux0rpbyufrd1.png?width=3112&format=png&auto=webp&s=02ee4f8fbe23bfd36f8cd5643193d2745733284d

Results: Extension to video generation

submitted by /u/NunyaBuzor
[link] [comments]

What trainer for LoRA is better for you and why (Flux version) ?

As I was trying to save time while still getting good results, I tried three different trainers (Kohya_SS, ComfyUI/Kohya, and ai-toolkit). I still think ai-toolkit is way better than Kohya, and I think it's because of the "Flowmatch" scheduler, which is the only config that differs: even with bad-quality images you can achieve amazing skin texture on LoRAs. In Kohya, even though I save around 5 hours (which is crazy), I get good results but with that plastic Flux skin texture, no matter the resolution of the images I use. What is your experience? Do you agree or disagree with me? Do you think there's a better trainer than the ones I mentioned?
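For context, a toy sketch (random tensors, not code from either trainer) of what a flow-matching objective looks like next to classic DDPM-style noise prediction; Flux itself is a rectified-flow model, which is presumably why the Flowmatch scheduler matters here:

```python
import torch

x0 = torch.randn(1, 16, 64, 64)  # clean latent (shape is arbitrary here)
noise = torch.randn_like(x0)
t = torch.rand(1)                # continuous timestep in [0, 1]

# Classic DDPM-style training: noise the latent with a schedule, predict the added noise.
a_bar = torch.cos(t * torch.pi / 2) ** 2                  # toy cosine schedule
x_t_ddpm = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise
target_ddpm = noise

# Flow matching / rectified flow: interpolate linearly, predict the velocity (noise - x0).
x_t_flow = (1 - t) * x0 + t * noise
target_flow = noise - x0
```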

submitted by /u/TableFew3521
[link] [comments]

How do you all manage your LoRA's?

TL;DR: How to store loras + keep their activation words/tokens to use them easily?

Basically what the title says - I try to keep my model and LoRA collection tidy, but still: you download "Super Photorealistic Lora for Flux" from civit or any other place, and the filename is "realistic-lora-5000.safetensors". Also, it features an activation token. Others don't. Since the rise of Flux, I get the feeling that activation tokens have started to turn into l33t tokens, so for "lame girl" you probably get something like "l@m3g1rl".

Fast forward to next week: you want to generate a new picture, using that nice lora from last week. You put the Lora in, but don't remember any activation token. You head over to civit and see at least 700 other, more or less related loras. Of course you don't know which one you picked.

To cut it short: Is there a proper way to file your loras, have their activation words at hand, and use them comfortably in your workflow? I'm on Comfy if it matters.
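One possible approach, as a minimal sketch: if a LoRA was trained with a Kohya-style trainer, it usually embeds its training metadata (including tag frequencies) in the .safetensors header, so you can pull trigger-word hints straight from the file instead of relying on the filename. The "loras" folder below is just a placeholder.

```python
import json
import struct
from pathlib import Path

def read_safetensors_metadata(path):
    """Return the __metadata__ dict from a .safetensors header (empty dict if none)."""
    with open(path, "rb") as f:
        header_len = struct.unpack("<Q", f.read(8))[0]  # first 8 bytes: header size
        header = json.loads(f.read(header_len))
    return header.get("__metadata__", {})

for lora in Path("loras").glob("*.safetensors"):               # "loras" is a placeholder folder
    meta = read_safetensors_metadata(lora)
    tags = meta.get("ss_tag_frequency", "(no tag metadata)")   # Kohya-style trainers write this
    print(f"{lora.name}: {str(tags)[:120]}")
```

If the header carries nothing useful, Civitai can also look a file up by its SHA256 hash.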

Any help is much appreciated!

submitted by /u/insane-zane
[link] [comments]

How to Change the Face/Hand for Only One Part of an Image Using ADetailer?

Hi everyone,

I'm really new to Stable Diffusion and need some help.

I'm using a Pony-based model (which means most ControlNets won't work :( ), and in this picture I want to change the faces of the people on the right using ADetailer.

https://preview.redd.it/ire0t77h3krd1.png?width=768&format=png&auto=webp&s=2ac91f2f832edf0ede6a0aef2d765007518d1cb5

However, when I use img2img, ADetailer detects all the elements in the picture, and I don't want the people on the left to be changed (I checked the "skip img2img" checkbox).

https://preview.redd.it/hgpmfomr3krd1.png?width=581&format=png&auto=webp&s=da14c1bb328d1343e99cea0722a42316514ef9e2

I've tried using Inpaint, but ADetailer didn't kick in even though I checked the box; it was still plain img2img that generated the part I painted.

The part I painted

Bad quality, doesn't look correct

The options I checked in ADetailer

Does anyone know how to make ADetailer focus only on specific parts of the image? Any advice would be greatly appreciated! Thanks in advance!

(Also, when using the Pony-based model, a plain img2img-generated face often looks subpar, but with the support of ADetailer the results are significantly improved. Curious why ADetailer makes such a notable difference?)
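For what it's worth, a big part of that jump is that ADetailer crops the detected face, re-generates it at the model's working resolution, and pastes it back, so the face gets far more effective pixels than in a single whole-image pass. A toy sketch of that idea (not ADetailer's actual code; the inpainting call is a placeholder):

```python
from PIL import Image

def inpaint_region(crop: Image.Image) -> Image.Image:
    return crop  # placeholder: a real detailer runs inpainting/img2img on this crop

def detail_region(image: Image.Image, box, work_res=768):
    """Crop the detected region, re-generate it at the model's working resolution, paste it back."""
    x0, y0, x1, y1 = box
    crop = image.crop(box)
    upscaled = crop.resize((work_res, work_res))  # the face now gets a full generation's worth of pixels
    fixed = inpaint_region(upscaled)
    image.paste(fixed.resize(crop.size), (x0, y0))
    return image

img = Image.new("RGB", (768, 512))
img = detail_region(img, (400, 100, 560, 260))    # hypothetical box around the right-hand face
```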

With ADetailer

Without ADetailer, img2img only

submitted by /u/GOJiong
[link] [comments]

I have some errors around libnvinfer.so.7 (among other things) when I'm running ComfyUI. Should I be concerned about this?

Are these errors preventing me from using my GPU?

/home/tektite/.local/lib/python3.10/site-packages/kornia/feature/lightglue.py:44: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  @torch.cuda.amp.custom_fwd(cast_inputs=torch.float32)
2024-09-28 09:55:58.264706: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-09-28 09:55:58.327117: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-09-28 09:55:58.343139: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-09-28 09:55:58.611810: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2024-09-28 09:55:58.611848: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2024-09-28 09:55:58.611851: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
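For reference, the libnvinfer.so.7 lines are TensorFlow failing to find TensorRT libraries; ComfyUI's sampling runs on PyTorch, so a minimal check of whether the GPU is actually visible to it would be:

```python
import torch

print(torch.cuda.is_available())          # True -> PyTorch (and therefore ComfyUI) can see the GPU
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. "NVIDIA GeForce RTX ..."
```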
submitted by /u/tektite
[link] [comments]