A different approach to fix Flux weaknesses with LoRAs (Negative weights)

Image on the left: Flux, no LoRAs.

Image in the center: Flux with the negative-weight LoRA (-0.60).

Image on the right: Flux with the negative-weight LoRA (-0.60) plus this LoRA (+0.20) to improve detail and prompt adherence.

Many of the LoRAs created to try to make Flux more realistic (better skin, better accuracy on human-like pictures) still keep part of Flux's plastic-ish skin. But the thing is: Flux knows how to make realistic skin, it has the knowledge; the fake skin it recreates is just the dominant part of the model. To give an example:

-ChatGPT

So instead of trying to make the engine louder for the mechanic to repair, we should lower the noise of the exhausts. That's the perspective I want to bring in this post: Flux has the knowledge of how real skin looks, but it's overwhelmed by the plastic finish and AI-looking pics. To force Flux to use its talent, we have to train a plastic-skin LoRA and use negative weights, forcing the model to fall back on its real resources and produce real skin, realistic features, and better cloth texture.
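
For reference, a minimal sketch of how a negative LoRA weight could be applied with the diffusers library; the file names, prompt, and settings are placeholders that only mirror the -0.60 / +0.20 mix from the example images, not my exact setup:

# Minimal sketch, not the exact setup: applying one LoRA with a negative
# weight and one with a positive weight to Flux via diffusers. File names,
# prompt, and numbers are placeholders mirroring the -0.60 / +0.20 mix above.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# A LoRA trained ON plastic skin / AI artifacts, applied with a negative weight,
# plus an optional detail LoRA with a small positive weight.
pipe.load_lora_weights("plastic_skin_lora.safetensors", adapter_name="plastic_skin")
pipe.load_lora_weights("detail_lora.safetensors", adapter_name="detail")
pipe.set_adapters(["plastic_skin", "detail"], adapter_weights=[-0.60, 0.20])

image = pipe(
    "candid photo of a woman on a rainy street, natural skin texture",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("negative_lora_test.png")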

So the easy way is just to create a good amount and variety of pictures containing the bad examples you want to pick up on: bad datasets, low quality, plastic skin, and the Flux chin.

In my case I used JoyCaption, and I trained a LoRA with 111 images at 512x512. The captioning instructions were along the lines of: describe the AI artifacts in the image, describe the plastic skin, etc.

I'm not an expert, I just wanted to try this since I remembered some SD 1.5 LoRAs that worked like this, and I figured some people with more experience would like to try this method.

Disadvantages: if Flux doesn't know how to do certain things (like feet at different angles), this may not work at all, since the model itself doesn't know how to do them.

In the examples you can see that the LoRA itself downgrades the quality; that can be due to overtraining or to using a low resolution like 512x512, and that's the reason I won't share the LoRA, since it's not worth it for now.

Half-body and full-body shots look more pixelated.

The bokeh effect / depth of field is still intact, but I'm sure that can be solved.

JoyCaption is not the most disciplined with the instructions I wrote. For example, it didn't mention the "bad quality" in many of the images of the dataset, and it didn't mention the plastic skin in every image, so if you use it, make sure to manually check every caption and correct it if necessary.
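
As an illustration, a quick check along these lines (the folder path and keywords are just placeholders) can flag caption files that never mention the flaws the LoRA is supposed to learn:

# Rough sketch: list every caption file that never mentions the flaws the
# LoRA is supposed to learn. Folder path and keywords are placeholders.
from pathlib import Path

DATASET_DIR = Path("dataset_512")  # one .txt caption per image
REQUIRED_ANY = ["plastic skin", "bad quality", "ai artifact"]

missing = []
for caption_file in sorted(DATASET_DIR.glob("*.txt")):
    text = caption_file.read_text(encoding="utf-8").lower()
    if not any(keyword in text for keyword in REQUIRED_ANY):
        missing.append(caption_file.name)

print(f"{len(missing)} captions never mention the target flaws:")
for name in missing:
    print("  -", name)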

submitted by /u/TableFew3521

My results on LTXV 9.5

Hi everyone! I'm sharing my results using LTXV. I spent several days trying to get a "decent" output, and I finally made it!
My goal was to create a simple character animation — nothing too complex or with big movements — just something like an idle animation.
These are my results, hope you like them! I'm happy to hear any thoughts or feedback!

submitted by /u/Eliot8989

I have created an optimized setup for using AMD APUs (including Vega)

Hi everyone,

I have created a relatively optimized setup using a fork of Stable Diffusion from here:

likelovewant/stable-diffusion-webui-forge-on-amd: add support on amd in zluda

and

ROCM libraries from:

brknsoul/ROCmLibs: Prebuilt Windows ROCm Libs for gfx1031 and gfx1032

After a lot of experimenting, I have set Token Merging to 0.5 and used Stable Diffusion LCM models with the LCM sampling method and the Karras schedule type at 4 steps. Depending on system load and usage, for a 512 (width) x 640 (height) image I was able to achieve as fast as 4.40 s/it; on average it hovers around ~6 s/it on my mini PC, which has a Ryzen 2500U CPU (Vega 8), 32GB of DDR4-3200 RAM, and a 1TB SSD. It may not be as fast as my gaming rig, but it uses less than 25 W at full load.

Overall, I think this is pretty impressive for a little box that lacks a dedicated GPU. I should also note that I set the dedicated portion of graphics memory to 2GB in the UEFI/BIOS, used the ROCm 5.7 libraries, and then added the ZLUDA libraries to them, as in the instructions.

Here is the webui-user.bat file configuration:

@echo off
@REM cd /d %~dp0
@REM set PYTORCH_TUNABLEOP_ENABLED=1
@REM set PYTORCH_TUNABLEOP_VERBOSE=1
@REM set PYTORCH_TUNABLEOP_HIPBLASLT_ENABLED=0

set PYTHON=
set GIT=
set VENV_DIR=
set SAFETENSORS_FAST_GPU=1
set COMMANDLINE_ARGS= --use-zluda --theme dark --listen --opt-sub-quad-attention --upcast-sampling --api --sub-quad-chunk-threshold 60

@REM Uncomment following code to reference an existing A1111 checkout.
@REM set A1111_HOME=Your A1111 checkout dir
@REM
@REM set VENV_DIR=%A1111_HOME%/venv
@REM set COMMANDLINE_ARGS=%COMMANDLINE_ARGS% ^
@REM  --ckpt-dir %A1111_HOME%/models/Stable-diffusion ^
@REM  --hypernetwork-dir %A1111_HOME%/models/hypernetworks ^
@REM  --embeddings-dir %A1111_HOME%/embeddings ^
@REM  --lora-dir %A1111_HOME%/models/Lora

call webui.bat

I should note that you can remove or fiddle with --sub-quad-chunk-threshold 60; removing it will cause stuttering if you are using the computer for other tasks while generating images, whereas 60 seems to prevent or reduce that issue. I hope this helps other people, because this was such a fun project to set up and optimize.
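
Since the --api flag is enabled, here is a rough sketch of driving those same settings (LCM sampler, Karras schedule, 4 steps, 512x640) through the standard txt2img endpoint; the prompt, CFG value, and host/port are assumptions, so adjust them to your setup:

# Rough sketch: calling the standard txt2img API with the settings described
# above (LCM sampler, Karras schedule, 4 steps, 512x640). Prompt, CFG value,
# and host/port are assumptions; adjust to your own setup.
import base64
import requests

payload = {
    "prompt": "a cozy cabin in a snowy forest, warm light",
    "steps": 4,
    "sampler_name": "LCM",
    "scheduler": "Karras",
    "width": 512,
    "height": 640,
    "cfg_scale": 1.5,  # LCM models generally want a low CFG
}

resp = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload, timeout=600)
resp.raise_for_status()

with open("output.png", "wb") as f:
    f.write(base64.b64decode(resp.json()["images"][0]))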

submitted by /u/technofox01

Liquid: Language Models are Scalable and Unified Multi-modal Generators

Liquid is an auto-regressive generation paradigm that seamlessly integrates visual comprehension and generation by tokenizing images into discrete codes and learning these code embeddings alongside text tokens within a shared feature space for both vision and language. Unlike previous multimodal large language models (MLLMs), Liquid achieves this integration using a single large language model (LLM), eliminating the need for external pretrained visual embeddings such as CLIP. For the first time, Liquid uncovers a scaling law: the performance drop unavoidably brought by the unified training of visual and language tasks diminishes as the model size increases. Furthermore, the unified token space enables visual generation and comprehension tasks to mutually enhance each other, effectively removing the typical interference seen in earlier models. We show that existing LLMs can serve as strong foundations for Liquid, saving 100× in training costs while outperforming Chameleon in multimodal capabilities and maintaining language performance comparable to mainstream LLMs like LLaMA2. Liquid also outperforms models like SD v2.1 and SD-XL (FID of 5.47 on MJHQ-30K), excelling in both vision-language and text-only tasks. This work demonstrates that LLMs such as Qwen2.5 and Gemma2 are powerful multimodal generators, offering a scalable solution for enhancing both vision-language understanding and generation.
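
To make the shared-token-space idea concrete, here is a toy sketch (not Liquid's actual implementation; vocabulary sizes and dimensions are made up) of a single embedding table covering both text tokens and discrete image codes:

# Toy sketch of the unified token space idea, not Liquid's actual code:
# discrete image codes are appended to the text vocabulary, so a single
# embedding table and a single output head cover both modalities.
import torch
import torch.nn as nn

TEXT_VOCAB = 32_000      # size of the LLM's text vocabulary (made up)
IMAGE_CODEBOOK = 8_192   # discrete codes from a visual tokenizer (made up)
VOCAB = TEXT_VOCAB + IMAGE_CODEBOOK
D_MODEL = 1024

embed = nn.Embedding(VOCAB, D_MODEL)  # shared table for text and image tokens
lm_head = nn.Linear(D_MODEL, VOCAB)   # one head predicts either modality

# A mixed sequence: text ids as-is, image codes offset past the text range.
text_tokens = torch.tensor([101, 2057, 2064])            # placeholder text ids
image_codes = torch.tensor([5, 1900, 42]) + TEXT_VOCAB   # offset into shared space
sequence = torch.cat([text_tokens, image_codes])

hidden = embed(sequence)   # (seq_len, d_model), what the LLM backbone would consume
logits = lm_head(hidden)   # next-token logits over the joint text+image vocabulary
print(hidden.shape, logits.shape)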

Liquid has been open-sourced on 😊 Huggingface and 🌟 GitHub.
Demo: https://huggingface.co/spaces/Junfeng5/Liquid_demo

submitted by /u/fruesome

Some recent sci-fi artworks ... (SD3.5Large *3, Wan2.1, Flux Dev *2, Photoshop, Gigapixel, Photoshop, Gigapixel, Photoshop)

Here are a few of my recent sci-fi explorations. I think I'm getting better at this. Original resolution is 12K. There's still some room for improvement in several areas, but I'm pretty pleased with it.

I start with Stable Diffusion 3.5 Large to create a base image around 720p.
Then two further passes to refine details.

Then an upscale to 1080p with Wan2.1.

Then two passes of Flux Dev at 1080p for refinement.

Then fix issues in Photoshop.

Then upscale with Gigapixel using the diffusion Redefine model to 8K.

Then fix more issues in Photoshop and adjust colors, etc.

Then another upscale to 12K or so with Gigapixel High Fidelity.

Then final adjustments in Photoshop.

submitted by /u/ih2810

Wan 2.1 Knowledge Base 🦢 with workflows and example videos

This is an LLM-generated, hand-fixed summary of the #wan-chatter channel on the Banodoco Discord.

Generated on April 7, 2025.

Created by Adrien Toupet: https://www.ainvfx.com/
Ported to Notion by Nathan Shipley: https://www.nathanshipley.com/

Thanks and all credit for content to Adrien and members of the Banodoco community who shared their work and workflows!

submitted by /u/AtreveteTeTe

Userscript to fix ram/bandwidth issue on Civitai

Since Civitai added GIFs, badges, and other clutter, the website has been sluggish.

Turns out they allow 50 MB images for profiles, and some of their GIF badges/badge animations are 10 MB+.
When you are loading a gallery with potentially 100 different ones, it's no wonder the thing takes so long to load.

Just a random example: do we really need to load a 3 MB GIF for something displayed at 32x32px?

So, with the help of our friend DeepSeek, here is a userscript that prevents some HTML elements from loading (using Violentmonkey/Greasemonkey/Tampermonkey):
https://github.com/Poutchouli/CivitAI-Crap-Blocker

The script removes the avatars, badges, avatar outlines, and outline gradients on images.

I tested it on Chrome and Brave; if you find any issue, either open an issue on GitHub or tell me about it here. Also, I do not generate images on there, so the userscript might interfere with that, but I haven't run into any issues in the few tests I did.

Here is the before/after when loading the front page.

Some badges still show up because they don't stick to their naming conventions, but the script should hide 90% of them; the worst offenders are the GIF ones, which are mostly covered by that 90%.

submitted by /u/Patchipoo