Vista de Lectura

Hay nuevos artículos disponibles. Pincha para refrescar la página.

What I've learned so far in the process of uncensoring HiDream-I1

For the past few days, I've been working (somewhat successfully) on finetuning HiDream to undo the censorship and enable it to generate not-SFW (post gets filtered if I use the usual abbreviation) images. I've had a few false starts, and I wanted to share what I've learned with the community to hopefully make it easier for other people to train this model as well.

First off, intent:

My ultimate goal is to make an uncensored model that's good for both SFW and not-SFW generations (including nudity and sex acts) and can work in a large variety of styles with good prose-based prompt adherence and retaining the ability to produce SFW stuff as well. In other words, I'd like for there to be no reason not to use this model unless you're specifically in a situation where not-SFW content is highly undesirable.

Method:

I'm taking a curriculum learning approach, where I'm throwing new things at it one thing at a time, because my understanding is that that can speed up the overall training process (and it also lets me start out with a small amount of curated data). Also, rather than doing a full finetune, I'm training a DoRA on HiDream Full and then merging those changes into all three of the HiDreams checkpoints (full, dev, and fast). This has worked well for me thus far, particularly when I zero out most of the style layers before merging the dora into the main checkpoints, preserving most of the extensive style information already in HiDream.

There are a few style layers involved in censorship (mostly likely part of the censoring process involved freezing all but those few layers and training underwear as a "style" element associated with bodies), but most of them don't seem to affect not-SFW generations at all.

Additionally, in my experiments over the past week or so, I've come to the conclusion that CLIP and T5 are unnecessary, and Llama does the vast majority of the work in terms of generating the embedding for HiDream to render. Furthermore, I have a strong suspicion that T5 actively sabotages not-SFW stuff. In my training process, I had much better luck feeding blank prompts to T5 and CLIP and training llama explicitly. In my initial run where I trained all four of the encoders (CLIPx2 + t5 + Llama) I would get a lot of body horror crap in my not-SFW validation images. When I re-ran the training giving t5 and clip blank prompts, this problem went away. An important caveat here is that my sample size is very small, so it could have been coincidence, but what I can definitely say is that training on llama only has been working well so far, so I'm going to be sticking with that.

I'm lucky enough to have access to an A100 (Thank you ShuttleAI for sponsoring my development and training work!), so my current training configuration accounts for that, running batch sizes of 4 at bf16 precision and using ~50G of vram. I strongly suspect that with a reduced batch size and running at fp8, the training process could fit in under 24 gigabytes, although I haven't tested this.

Training customizations:

I made some small alterations to ai-toolkit to accommodate my training methods. In addition to blanking out t5 and CLIP prompts during training, I also added a tweak to enable using min_snr_gamma with the flowmatch scheduler, which I believe has been helpful so far. My modified code can be found behind my patreon paywall. j/k it's right here:

https://github.com/envy-ai/ai-toolkit-hidream-custom/tree/hidream-custom

EDIT: Make sure you checkout the hidream-custom branch, or you won't be running my modified code.

I also took the liberty of adding a couple of extra python scripts for listing and zeroing out layers, as well as my latest configuration file (under the "output" folder).

Although I haven't tested this, you should be able to use this repository to train Flux and Flex with flowmatch and min_snr_gamma as well. I've submitted the patch for this to the feature requests section of the ai-toolkit discord.

These models are already uploaded to CivitAI, but since Civit seems to be struggling right now, I'm currently in the process of uploading the models to huggingface as well. The CivitAI link is here (not sfw, obviously):

https://civitai.com/models/1498292

It can also be found on Huggingface:

https://huggingface.co/e-n-v-y/hidream-uncensored/tree/main

How you can help:

Send nudes. I need a variety of high-quality, high resolution training data, preferably sorted and without visible compression artifacts. AI-generated data is fine, but it absolutely MUST have correct anatomy and be completely uncensored (that is, no mosaics or black boxes -- it's fine for naughty bits not to be visible as long as anatomy is correct). Hands in particular need to be perfect. My current focus is adding male nudity and more variety to female nudity (I kept it simple to start with just so I could teach it that vaginas exist). Please send links to any not-SFW datasets that you know of.

Large datasets with ~3 sentence captions in paragraph form without chatgpt bullshit ("the blurbulousness of the whatever adds to the overall vogonity of the scene") are best, although I can use joycaption to caption images myself, so captions aren't necessary. No video stills unless the video is very high quality. Sex acts are fine, as I'll be training on those eventually.

Seriously, if you know where I can get good training data, please PM the link. (Or, if you're a person of culture and happen to have a collection of training images on your hard drive, zip it up and upload it somewhere.)

If you want to speed this up, the absolute best thing you can do is help to expand the dataset!

If you don't have any data to send, you can help by generating images with these models and posting those images to the CivitAI page linked above, which will draw attention to it.

Tips:

  • ChatGPT is a good knowledge resource for AI training, and can to some extent write training and inference code. It's not perfect, but it can answer the sort of questions that have no obvious answers on google and will sit unanswered in developer discord servers.
  • t5 is prude as fuck, and CLIP is a moron. The most helpful thing for improving training has been removing them both from the mix. In particular, t5 seems to be actively sabotaging not-SFW training and generation. Llama, even in its stock form, doesn't appear to have this problem, although I may try using an abliterated version to see what happens.

Conclusion:

I think that covers most of it for now. I'll keep an eye on this thread and answer questions and stuff.

submitted by /u/Incognit0ErgoSum
[link] [comments]
❌