Pixel Art Gif Upscaler
submitted by /u/marcoc2
Flux Realism LoRA comparisons!!
So I made a new Flux LoRA for realism (Real Flux Beauty 4.0) and was curious how it would compare against other realism LoRAs. I had way too much fun doing this comparison, lol. Each generation uses the same seed, prompts, etc., except for the LoRA strength, where I used each LoRA's recommended value. All the LoRAs are available on both Civitai and Tensor.Art.
I. SEE. YOU.
submitted by /u/PantInTheCountry
Moving a heavy box
submitted by /u/Unit2209
A (personal experience) guide to training SDXL LoRas with One Trainer
Hi all,
Over the past year I created a lot of (character) LoRAs with OneTrainer, so this guide covers training realistic LoRAs of humans, a concept probably already known to all SD base models. It's a quick tutorial on how I go about creating very good results. I don't have a programming background, and I don't know the ins and outs of why a certain setting works. But through a lot of testing I found out what works and what doesn't, at least for me. :)
I also won't go over every single UI feature of OneTrainer; most of it is self-explanatory. Also check out YouTube, where you can find a few videos about the basic setup and layout.
1. Prepare Your Dataset (This Is Critical!)
Curate High-Quality Images: Aim for about 50 images, ensuring a mix of close-ups, upper-body shots, and full-body photos. Only use high-quality images; discard blurry or poorly detailed ones. If an image is slightly blurry, try enhancing it with tools like SUPIR before including it in your dataset. The minimum resolution should be 1024x1024.
Avoid images with strange poses and too much clutter. Think of it this way: it's easier to describe an image to someone where "a man is standing and has his arm to the side". It gets more complicated if you describe a picture of "a man, standing on one leg, knees bent, one leg sticking out behind, head turned to the right, making two peace signs with one hand...". I found that too many "crazy" images quickly bias the data and decrease the flexibility of your LoRA.
Aspect Ratio Buckets: To avoid losing data during training, edit images so they conform to just 2–3 aspect ratios (e.g., 4:3 and 16:9). Ensure the number of images in each bucket is divisible by your batch size (e.g., 2, 4, etc.). If you have an uneven number of images, either modify an image from another bucket to match the desired ratio or remove the weakest image.
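Before training, it's worth sanity-checking the buckets programmatically. A minimal sketch (the dataset path, target ratios, and batch size below are placeholders; it assumes Pillow is installed):

```python
# Sanity-check aspect-ratio buckets: each bucket's image count should be
# divisible by the batch size. Paths and ratios below are just examples.
from collections import Counter
from pathlib import Path

from PIL import Image

DATASET = Path("my_dataset")                      # hypothetical dataset folder
BATCH_SIZE = 2
TARGET_RATIOS = {"4:3": 4 / 3, "16:9": 16 / 9}    # the 2-3 ratios you settled on

buckets = Counter()
for path in sorted(DATASET.iterdir()):
    if path.suffix.lower() not in {".jpg", ".jpeg", ".png", ".webp"}:
        continue
    with Image.open(path) as img:
        ratio = img.width / img.height
    # Assign the image to the closest target ratio
    name = min(TARGET_RATIOS, key=lambda k: abs(TARGET_RATIOS[k] - ratio))
    buckets[name] += 1

for name, count in sorted(buckets.items()):
    status = "OK" if count % BATCH_SIZE == 0 else "NOT divisible by batch size"
    print(f"{name}: {count} images -> {status}")
```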
2. Caption the Dataset
Use JoyCaption for Automation: Generate natural-language captions for your images but manually edit each text file for clarity. Keep descriptions simple and factual, removing ambiguous or atmospheric details. For example, replace: "A man standing in a serene setting with a blurred background." with: "A man standing with a blurred background."
Be mindful of the words you use when describing the image, because they will also impact other aspects of the image when prompting. For example, "hair up" can also have an effect on the person's legs, because the word "up" is used in many ways to describe something.
Unique Tokens: Avoid using real-world names that the base model might associate with existing people or concepts. Instead, use unique tokens like "Photo of a df4gf man." This helps prevent the model from bleeding unrelated features into your LoRA. Experiment to find what works best for your use case.
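Editing dozens of caption files by hand gets tedious, so a small script can handle the mechanical part. A sketch, assuming the captions sit next to the images as .txt files; the token and filler-word list are just examples:

```python
# Batch-clean caption .txt files: swap the generic subject for a unique token
# and strip vague atmospheric wording. Token and word list are examples only.
import re
from pathlib import Path

CAPTION_DIR = Path("my_dataset")          # hypothetical: captions next to images
TOKEN = "df4gf man"                       # unique token, as described above
FILLER = ["in a serene setting", "serene", "atmospheric"]

for txt in sorted(CAPTION_DIR.glob("*.txt")):
    caption = txt.read_text(encoding="utf-8")
    # Swap the generic subject for the unique token (first mention only)
    caption = re.sub(r"\b[Aa] man\b", f"a {TOKEN}", caption, count=1)
    # Drop vague atmospheric phrasing
    for word in FILLER:
        caption = re.sub(rf"\s*\b{re.escape(word)}\b", "", caption)
    txt.write_text(caption, encoding="utf-8")
    print(f"{txt.name}: {caption.strip()}")
```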
3. Configure OneTrainer
Once your dataset is ready, open OneTrainer and follow these steps:
Load the Template: Select the SDXL LoRA template from the dropdown menu.
Choose the Checkpoint: Train on the base SDXL model for maximum flexibility when combining the LoRA with other checkpoints; this approach has worked well in my experience. Other photorealistic checkpoints can be used as well, but results vary from checkpoint to checkpoint.
4. Add Your Training Concept
Input Training Data: Add your folder containing the images and caption files as your "concept."
Set Repeats: Leave repeats at 1. We'll adjust training steps later by setting epochs instead.
Disable Augmentations: Turn off all image augmentation options in the second tab of your concept.
5. Adjust Training Parameters
Scheduler and Optimizer: Use the "Prodigy" optimizer with the "Cosine" scheduler for automatic learning-rate adjustment. Refer to the OneTrainer wiki for the specific Prodigy settings.
Epochs: Train for about 100 epochs, adjusted to the size of your dataset; I usually aim for 1,500-2,600 total steps.
Batch Size: Set the batch size to 2. This trains two images per step and ensures the steps per epoch align with your bucket sizes. For example, if you have 20 images, training with a batch size of 2 results in 10 steps per epoch.
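To see how these numbers combine into the step target from the previous point (using the ~50-image dataset size suggested earlier):

```python
# How dataset size, batch size, and epochs combine into total steps.
images, batch_size, epochs = 50, 2, 100   # ~50 images, as recommended above
steps_per_epoch = images // batch_size    # 50 / 2 = 25 steps per epoch
total_steps = steps_per_epoch * epochs    # 25 * 100 = 2500, inside 1500-2600
print(steps_per_epoch, total_steps)
```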
6. Set the UNet Configuration
Train UNet Only: Disable all settings under "Text Encoder 1" and "Text Encoder 2." Focus exclusively on the UNet.
Learning Rate: Set the UNet training rate to 1. With Prodigy this acts as a multiplier on the step size the optimizer estimates itself, so it stays at 1 (see the sketch after this list).
EMA: Turn off EMA (Exponential Moving Average).
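OneTrainer wires all of this up internally, but for intuition, here is roughly how Prodigy is used standalone (a sketch assuming the prodigyopt package; the tiny model is a stand-in for the LoRA weights, not OneTrainer's actual training loop):

```python
# Why "learning rate = 1" works: Prodigy estimates the actual step size and
# the nominal lr only scales that estimate, so it is left at 1.0.
import torch
from prodigyopt import Prodigy  # pip install prodigyopt

model = torch.nn.Linear(16, 16)            # stand-in for the UNet LoRA weights
optimizer = Prodigy(model.parameters(), lr=1.0, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1000)

for step in range(1000):
    loss = model(torch.randn(4, 16)).pow(2).mean()   # dummy loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()                       # the "Cosine" part of the setup
```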
7. Additional Settings
Sampling: Generate samples every 10 epochs to monitor progress.
Checkpoints: Save checkpoints every 10 epochs instead of relying on backups.
LoRA Settings: Set both "Rank" and "Alpha" to 32.
Optionally, toggle on Decompose Weights (DoRA) to enhance smaller details. More testing is probably needed, but so far I've definitely seen improved results.
Sample Prompts: I specifically use prompts that describe details that don't appear in my training data, for example a different background, different clothing, etc.
8. Start Training
Begin the training process and monitor the sample images. If they don't start resembling your subject after about 20 epochs, revisit your dataset or settings for potential issues. If the samples come out grey, weird, and distorted right from the start, something is definitely off.
Final Tips:
Dataset Curation Matters: Invest time upfront to ensure your dataset is clean and well-prepared. This saves troubleshooting later.
Stay Consistent: Maintain an even number of images across buckets to maximize training efficiency. If this isn’t possible, consider balancing uneven numbers by editing or discarding images strategically.
Overfitting: I noticed that it isn't always obvious that a LoRA has overfitted during training. The most obvious indication is distorted faces, but in other cases the faces look good while the model is unable to adhere to prompts that require poses outside the information in your training pictures. Don't hesitate to try checkpoints from earlier epochs to see if the flexibility is as desired.
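One way to run that flexibility check outside OneTrainer is to loop over the saved epoch checkpoints with diffusers, using a fixed seed and a pose that is not in the dataset. A sketch; the checkpoint paths, epoch numbers, and prompt are placeholders:

```python
# Compare saved epoch checkpoints: same seed, a pose absent from the dataset.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

prompt = "photo of a df4gf man doing a handstand on a beach"  # pose not in data
for epoch in (60, 80, 100):
    pipe.load_lora_weights(f"checkpoints/lora_epoch_{epoch}.safetensors")
    image = pipe(
        prompt, generator=torch.Generator("cuda").manual_seed(42)
    ).images[0]
    image.save(f"flex_test_epoch_{epoch}.png")
    pipe.unload_lora_weights()   # reset before loading the next checkpoint
```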
Happy training!
Kijai has updated the CogVideoXWrapper: Support for 1.5! Refactored with simplified pipelines, and extra optimizations. (but breaks old workflows)
https://github.com/kijai/ComfyUI-CogVideoXWrapper Notes for this update: This is a big one, and unfortunately the necessary cleanup and refactoring will break every old workflow as it is. I apologize for the inconvenience; if I don't do this now, I'll keep making it worse until maintaining becomes too much of a chore, so from my pov there was no choice. Please either use the new workflows or fix the nodes in your old ones before posting issue reports! The old version will be kept in a legacy branch, but not maintained.
For Fun-model-based workflows it's a more drastic change; for others, migrating generally means re-setting many of the nodes.
Concept art prompt suggestions
Hey, guys. Interested in everyone's suggestions. I'm looking for good prompts to generate various concept art of props: different weapons, interior items, etc. I'm primarily interested in fantasy and sci-fi styles. For fantasy, I'm mainly after stylization similar to Hearthstone, WoW, and LoL. For sci-fi, something smoother, similar to the look of Overwatch, Destiny, and Starfield. And if you have good suggestions for any concept-art prompts, LoRAs, or checkpoints, I'd love to hear them. I'll attach examples of what I'm ideally looking for. Also, if you know of posts where similar questions have been asked before, I'd be glad if you'd send links.
Extending a Video with a Text Prompt
Hi everyone!
I have a short video I recorded on my phone, and I’d like to extend it using a text-based prompt.
For example, in the video, my friend is walking down a path, and I’d love to extend it by adding a scene where he trips and falls because of a stone.
Are there any existing tools or software that can help with this? I’m open to suggestions for AI-based tools or creative editing apps that make this process easy.
Thanks in advance!
Forge UI into Open AI
Hey guys, I'm trying to connect Forge UI to Open AI but I don't know how and can't find anything online. In Open AI I can see only three options to choose from: 1. automatic1111, 2. comfyui, 3. openai. Is there maybe a way to still get Forge UI in there somehow?
Made my first LoRA for Flux: asking for expert feedback!
Hello everyone, I wanted to share with you the LoRA I trained, called 'Flux-American-Dollar-Bills'. In the post where the LoRA is linked, you will find a detailed description of the work done and the steps that led me to the final result. Since this is my first project, I would really appreciate receiving feedback from those with more experience in LoRA creation, so I can get properly oriented before starting a new project. I would like to understand what I can optimize or do differently, possibly following a more effective workflow and strategy. Any advice or suggestions are welcome!
LoRA for Flux Dev
I'm using Flux Dev and I need a LoRA to generate super realistic bold pictures, so which LoRA is best?
Fine tune a model or train a large LoRA online
Soon I will have a big tagged dataset (~2000 images) that I want to use to train an SDXL model or LoRA.
Which is better for a project of this size, a LoRA or a fine-tune? I want to teach it roughly 20 concepts.
Civitai is amazing for LoRA training, but they have a limit of 1,000 images and 10,000 steps. My understanding is that the bigger your dataset, the more training steps you need. Is that true? I plan on something around 30k steps.
Can anyone recommend an online platform that is easy to use for such a task? I've heard good things about RunPod, but maybe better alternatives exist?
Analyze a LoRA merge
Is there a script that can analyze a LoRA merge (in safetensors format, of unknown origin) and generate a detailed report with:
- The trigger words or concepts it was trained on.
- A count of occurrences, sorted in ascending or descending order.
I’m looking for a tool that provides a clear and well-organized report.
Would you be able to suggest what type of script would be useful? What data (and in what format) should the script look for inside the safetensors file to generate this type of report?
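One possible starting point, assuming the file follows the standard safetensors layout (an 8-byte length prefix followed by a JSON header): LoRAs trained with kohya-ss scripts often carry an ss_tag_frequency metadata entry with per-tag counts, though a file of unknown origin may have no metadata at all. A sketch:

```python
# Dump kohya-style tag frequencies from a .safetensors LoRA, if present.
# Usage: python lora_report.py my_lora.safetensors
import json
import struct
import sys

with open(sys.argv[1], "rb") as f:
    header_len = struct.unpack("<Q", f.read(8))[0]   # little-endian uint64
    header = json.loads(f.read(header_len))          # header is plain JSON

meta = header.get("__metadata__", {})
tag_freq = meta.get("ss_tag_frequency")              # kohya-specific; may be absent
if tag_freq:
    for dataset, tags in json.loads(tag_freq).items():
        print(f"dataset: {dataset}")
        for tag, count in sorted(tags.items(), key=lambda kv: kv[1], reverse=True):
            print(f"{count:6d}  {tag}")
else:
    print("No ss_tag_frequency found; metadata keys:", list(meta))
```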
Thanks in advance for any suggestions!
combined workflow of SD 3.5 Large + Flux Dev with Multi GPU support
I created a combined workflow which consists of SD 3.5 Large + Flux Dev.
It takes a single prompt and generates an image from both SD 3.5 Large and Flux Dev.
It needs these custom nodes:
https://github.com/neuratech-ai/ComfyUI-MultiGPU
https://github.com/pythongosssss/ComfyUI-Custom-Scripts
The top pipeline is for SD and the bottom one is for Flux.
The leftmost node contains the main prompt.
You can load the models, CLIPs, and VAE on different GPUs by changing the CUDA device number. (Setting all of them to 0 will sometimes unload a model if memory is low, but it works if you have more than 24 GB of VRAM.)
https://gist.github.com/RageshAntonyHM/d109447a40bcb54a1c6e9553078fbbcc
HELP:
If you know how to properly arrange the nodes, kindly do that and share it in the comments.