Yesterday — 2 June 2025
StableDiffusion

Homemade SD 1.5 pt2

2 June 2025 at 04:06

At this point I've probably maxed out my custom homemade SD 1.5 in terms of realism, but I'm bummed out that it can't do text, because I love the model. I'm gonna try to start a new branch of the model, but this time using SDXL as the base. Hopefully my phone can handle it. Wish me luck!

submitted by /u/darlens13

Chroma needs to be more supported and publicised

1 June 2025 at 18:03

Sorry for my English in advance, but I sense a lack of interest in Chroma in this sub, even though it is superior to HiDream and it is still in the making.

It has its flaws, but its knowledge of styles and artists is better than Flux and HiDream (it also knows what a Dutch angle means lol), yet it doesn't even have its own category on Civitai... basically no loras etc :'(

PS: the images are here to attract reactions u_u, all are made in Chroma

submitted by /u/Dear-Spend-2865

While Flux Kontext Dev is cooking, Bagel is already serving!

2 June 2025 at 05:37

Bagel (DFloat11 version) uses a good amount of VRAM — around 20GB — and takes about 3 minutes per image to process. But the results are seriously impressive.
Whether you’re doing style transfer, photo editing, or complex manipulations like removing objects, changing outfits, or applying Photoshop-like edits, Bagel makes it surprisingly easy and intuitive.

It also has native text2image and an LLM that can describe images or extract text from them, and even answer follow-up questions on given subjects.

Check it out here:
🔗 https://github.com/LeanModels/Bagel-DFloat11

Apart from the two mentioned, are there any other image editing models that are open source and comparable in quality?

submitted by /u/iChrist

Finetuning a model on ~50,000-100,000 images?

2 June 2025 at 10:55

I haven't touched Open-Source image AI much since SDXL, but I see there are a lot of newer models.

I can pull a set of ~50,000 uncropped, untagged images covering some broad concepts that I want to fine-tune one of the newer models on to “deepen its understanding”. I know LoRAs are useful for a small set of 5-50 images of something very specific, but AFAIK they don't carry enough information to capture broader concepts or to handle vastly varying images.

What's the best way to do it? Which model should I choose as the base? I have an RTX 3080 12GB and 64GB of RAM, and I'd prefer to train the model on it, but if the tradeoff is worth it I will consider training on a cloud instance.

The concepts are specific clothing and style.

submitted by /u/TheJzuken

What are the latest tools and services for lora training in 2025?

2 June 2025 at 13:00

I want to create LoRAs of myself and use them for image generation (fool around for recreational use), but it seems complex and overwhelming to understand the whole process. I searched online and found a few articles, but most of them seem outdated. Hoping for some help from this expert community. I am curious what tools or services people use to train LoRAs in 2025 (for SD or Flux). Do you maybe have any useful tips, guides or pointers?

submitted by /u/im3000

Chain-of-Zoom (Extreme Super-Resolution via Scale Auto-regression and Preference Alignment)

1 June 2025 at 17:15

Modern single-image super-resolution (SISR) models deliver photo-realistic results at the scale factors on which they are trained, but show notable drawbacks:

Blur and artifacts when pushed to magnify beyond their training regime

High computational costs and inefficiency of retraining models when we want to magnify further

This brings us to the fundamental question:
How can we effectively utilize super-resolution models to explore much higher resolutions than they were originally trained for?

We address this via Chain-of-Zoom 🔎, a model-agnostic framework that factorizes SISR into an autoregressive chain of intermediate scale-states with multi-scale-aware prompts. CoZ repeatedly re-uses a backbone SR model, decomposing the conditional probability into tractable sub-problems to achieve extreme resolutions without additional training. Because visual cues diminish at high magnifications, we augment each zoom step with multi-scale-aware text prompts generated by a prompt extractor VLM. This prompt extractor can be fine-tuned through GRPO with a critic VLM to further align text guidance towards human preference.
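
For those who want the gist without reading the paper, here is a minimal Python sketch of the CoZ loop as described above. `sr_step` (the backbone SR model, e.g. a fixed 4x model) and `extract_prompt` (the prompt-extractor VLM) are hypothetical placeholders, not the repo's actual API:

```python
from typing import Callable, List
from PIL import Image

def chain_of_zoom(
    image: Image.Image,
    sr_step: Callable[[Image.Image, str], Image.Image],      # backbone SR model, trained for one fixed scale
    extract_prompt: Callable[[List[Image.Image]], str],      # VLM that writes a prompt from the scale-states so far
    num_steps: int = 3,
) -> Image.Image:
    """Factorize extreme SR into an autoregressive chain of intermediate scale-states."""
    states = [image]
    for _ in range(num_steps):
        # Multi-scale-aware prompt: the VLM sees the chain of previous states,
        # compensating for visual cues that vanish at high magnification.
        prompt = extract_prompt(states)
        # The same backbone SR model is re-used at every step, so no retraining
        # is needed to reach magnifications far beyond its training regime.
        states.append(sr_step(states[-1], prompt))
    return states[-1]
```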

------

Paper: https://bryanswkim.github.io/chain-of-zoom/

Huggingface : https://huggingface.co/spaces/alexnasa/Chain-of-Zoom

Github: https://github.com/bryanswkim/Chain-of-Zoom

submitted by /u/hippynox

Why does most video done with ComfyUI WAN look slow, and how can you avoid it?

2 June 2025 at 12:52

I've been looking at videos made on comfyUI with WAN and for the vast majority of them the movement look super slow and unrealistic. But some look really real like THIS.
How do people make their video smooth and human looking ?
Any advices ?

submitted by /u/telkmx

I made a lora loader that automatically adds in the trigger words

1 June 2025 at 19:29

Would it be useful to anyone, or does it already exist? Right now it parses the markdown file that the model manager pulls down from Civitai. I used it to make a lora tester wall with the prompt "tarot card". I plan to add in all my SFW loras so I can see what effects they have on a prompt instantly. Well, maybe not instantly; it's about 2 seconds per image at 1024x1024.
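
Not OP's actual node, but a minimal sketch of the parsing idea, assuming the model manager drops a sidecar markdown file next to the lora with a "Trigger Words" section (the `.md` naming and heading layout are assumptions):

```python
import re
from pathlib import Path

def read_trigger_words(lora_path: str) -> list[str]:
    """Return trigger words from the lora's sidecar markdown file, if any."""
    md_file = Path(lora_path).with_suffix(".md")
    if not md_file.exists():
        return []
    text = md_file.read_text(encoding="utf-8")
    # Grab the block under a "Trigger Words" heading (assumed layout).
    match = re.search(r"#+\s*Trigger Words\s*\n(.+?)(?:\n#|\Z)", text, re.S | re.I)
    if not match:
        return []
    # One word/phrase per line or comma-separated; strip list markers.
    raw = re.split(r"[\n,]", match.group(1))
    return [w.strip(" -*`") for w in raw if w.strip(" -*`")]

def prompt_with_triggers(prompt: str, lora_path: str) -> str:
    """Prepend the lora's trigger words to the prompt."""
    words = read_trigger_words(lora_path)
    return ", ".join(words + [prompt]) if words else prompt

# Example (hypothetical path): prompt_with_triggers("tarot card", "loras/my_style.safetensors")
```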

submitted by /u/Recurrents

Cheap Framepack camera control loras with one training video.

2 June 2025 at 05:46

Over the weekend I ran an experiment I've had in mind for some time: using computer-generated graphics for camera control loras. The idea is that you can create a custom control lora for a very specific shot that you may not have a reference for. I used Framepack for the experiment, but I imagine it works for any I2V model.

I know VACE is all the rage now, and this is not a replacement for it. It's something different to accomplish something similar. Each lora takes a little more than 30 minutes to train on a 3090.

I wrote an article over at Hugging Face, with the loras in a model repository. I don't think they're Civitai-worthy, but let me know if you think otherwise and I'll post them there as well.

Here is the model repo: https://huggingface.co/neph1/framepack-camera-controls

submitted by /u/neph1010

How do I train a FLUX-LoRA to have a stronger and more global effect across the model?

2 June 2025 at 13:13

I'm trying to figure out how to train a LoRA to have a more noticeable and more global impact across generations, regardless of the prompt.

For example, say I train a LoRA using only images of daisies. If I then prompt "photo of a dog" I would just get a regular dog image with no sign of daisy influence. I would like the model to give me something like "a dog with a yellow face wearing a dog cone made of petals" even if I don’t explicitly mention daisies in the prompt.

Trigger words haven't been much help.

Been experimenting with params, but this is an example where I get good results via direct prompting (but no global effect): unetLR: 0.00035, netDim: 8, netAlpha: 16, batchSize: 2, trainingSteps: 2025, cosine with restarts.

submitted by /u/Dysterqvist

Tried Eromantic.ai — Low Quality, Not Worth Using Right Now

2 June 2025 at 14:39

I saw Reco Jefferson (@roughneck_actual) promoting Eromantic.ai on Instagram, so I signed up to see what it could do.

The image generation is bad. Even when using the “advanced prompt” option and giving it very specific instructions, the results come out deformed most of the time, with messed-up eyes and weird faces, and it ignores half the prompt.

The video generation is worse. It’s extremely blurry and low quality. Nothing sharp or usable came out of it.

This platform isn’t ready. It needs a lot more development before it’s worth anyone’s time or money. If you're looking for quality generation, tools like Leonardo or Stable Diffusion are a better option right now.

Has anyone actually gotten solid results from it?

submitted by /u/marketingexpert1

SDXL 6K + LTXV 2K (5 sec export video!!)

2 June 2025 at 12:15

SDXL 6K, LTXV 2K. New test with LTXV in its distilled version: 5 seconds to export with my 4060 Ti! Crazy result with totally good output. I started with image creation with good old SDXL (and a refined workflow with hires fix/detailer/upscaler...) and then switched to LTXV (and then upscaled the video to 2K as well). Very convincing results!

submitted by /u/Dacrikka

Hardware for best video gen

2 June 2025 at 14:18

Good afternoon! I am very interested in working with video generation (WAN 2.1, etc.) and training models, and I am currently putting together hardware for this. I have seen two extremely attractive options for this purpose: the AMD AI 395 Max with an iGPU 8060s and the ability to have 96 GB of VRAM (unfortunately only LPDDR5), and the NVIDIA DGX Spark. The DGX Spark hasn’t been released yet, but the AMD processors are already available. However, in all the tests I’ve found, they’re testing some trivial workloads—at best someone installs SD 3.5 for image generation, but usually they only run SD 1.5. Has anyone tested this processor on more complex tasks? How terrible is the software support for AMD (I’ve heard it’s really bad)?

submitted by /u/No-Purpose-8733