Hey everyone! Let's revisit SD 1.5 and test tri-structured prompts!
Since I have weaker hardware, I never really stopped using it, and sometimes I come across techniques I've never seen before. For example, I recently found a new model (Contra Base) and decided to run it through my standard set of prompts for comparison (I keep a spreadsheet for that). The author mentioned something called a tri-structured prompt, which they used during training. I didn't recall seeing this approach before, so I decided to check it out. Here's a link to the article. In short: instead of one long description, the prompt is split into a main line plus clarifying detail branches, each separated by BREAK so it gets encoded as its own chunk.
I'll assume you roughly understand this logic and won't dive into the finer details.
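If you'd rather poke at this outside a UI, here's a rough sketch (not my actual ComfyUI workflow) of what BREAK does under the hood, assuming the usual A1111/ComfyUI behavior: each chunk gets its own 77-token CLIP window, and the chunk embeddings are concatenated. The model ID and branch texts below are placeholders, not the exact prompts from my tests:

```python
import torch
from diffusers import StableDiffusionPipeline

# The usual SD 1.5 repo id; swap in your own checkpoint.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def encode_branches(branches: list[str]) -> torch.Tensor:
    """Encode each branch as its own 77-token chunk and concatenate."""
    chunks = []
    for text in branches:
        ids = pipe.tokenizer(
            text,
            padding="max_length",
            max_length=pipe.tokenizer.model_max_length,  # 77 for CLIP
            truncation=True,
            return_tensors="pt",
        ).input_ids.to(pipe.device)
        with torch.no_grad():
            chunks.append(pipe.text_encoder(ids)[0])  # (1, 77, 768)
    return torch.cat(chunks, dim=1)  # (1, 77 * n_branches, 768)

# Placeholder main prompt + branches, in the spirit of the format below.
prompt_embeds = encode_branches([
    "a clown in an unusual environment",
    "red fluffy dress, large knots",
    "juggling, colorful balls",
])
# diffusers requires matching sequence lengths, so repeat the negative too.
negative_embeds = encode_branches(["low quality, blurry"] * 3)

image = pipe(prompt_embeds=prompt_embeds,
             negative_prompt_embeds=negative_embeds).images[0]
image.save("clown.png")
```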
However, having everything in a single text field is much more convenient, both for reading and editing, especially when working with multiple branches. Let's play around and see what results we get. First, we need to set up the basics, starting with the default pipeline.

For some reason, I wanted to generate a clown in an unusual environment, so I wrote the prompt out in the article's raw format. Not very readable, right? And we still need to write the negative prompt, as described in the article. A more readable format helps, but after some thought, I decided the prompt format should be more intuitive and visually clear while still being easy to process. So I restructured it: the first line is the main prompt, with clarifying details listed below as keyword branches. I added underscores for better readability; they don't affect generation significantly, but I'll remove them before processing (there's a small sketch of that conversion right after this section).

For comparison, of course, I also tested what would be generated without the BREAK commands, to see how much impact they have. Let's begin!

I want a resolution above 768 pixels, which gives SD 1.5 repetitions and duplicated subjects unless you use extra tricks to dodge the model's limits... As expected, that's exactly what I got, with one noticeable difference between the BREAK and no-BREAK versions. To keep things simple, I added Kohya Deep Shrink to the workflow.
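Since the underscores get stripped before processing, here's the sort of tiny helper I have in mind. The function name and the exact tree layout are my own sketch of the format, not something from the article:

```python
def tree_to_break_prompt(text: str) -> str:
    """Convert the readable tree format (main prompt on the first line,
    one keyword branch per indented line, underscores for readability)
    into a single BREAK-separated prompt string."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    return " BREAK ".join(ln.replace("_", " ") for ln in lines)

readable = """
a clown in an unusual environment
    red_fluffy_dress, large_knots
    juggling, colorful_balls
    night_city_street, neon_lights
"""
print(tree_to_break_prompt(readable))
# a clown in an unusual environment BREAK red fluffy dress, large knots BREAK ...
```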
With Deep Shrink in place, the results were much better! Not perfect, but definitely more interesting. Then I remembered that in SD 1.5 I can use an external encoder instead of the one baked into the loaded model, and Flux works well for this. Using it, I got another set of results.

What conclusions can be drawn about this prompting method? "It's not so simple." I think no one would argue with that. Some things improved, while others were lost. By the way, my clown just kept juggling, no matter how much I tweaked the prompt. But I didn't stress over it too much.

One key takeaway: increasing the number of "layers" indefinitely is a bad idea. The more branches there are, the less weight each one carries in the overall prompt, which leads to detail loss. In my opinion, 3 to 4 clarifications are optimal; a smaller number of branches gives clearer prompt-following.

Now, let's try other prompt variations for a better comparison. While working on this, I discovered that Kohya Deep Shrink sometimes swaps colors, turning the dress green and the car red. It seems to depend on the final image resolution. Different samplers also handle this prompt differently (who would've thought, right?). Another interesting detail: I clearly specified that the dress should be fluffy with large knots. In the general prompt this token is taken into account, but with many layers its weight is diluted, resulting in just a plain red dress. Also, the base prompt tends to generate a doll-like figure, while the branch version produces a more realistic image.

Let's try another one: no cats here, and no painterly effect from the branches. My guess? Since the painting-style tokens sit in just one of five branches, their total weight is only one-fifth of the overall prompt. Let's test this by increasing the weight of that branch. With a small boost, there are no visible changes, but if we overdo it (a painting-branch weight around 1.55-1.6), the abstract painting tokens dominate and the image goes completely off-topic. Conclusion: this method is not suitable for defining overall art style.

And finally, let's wrap up with a cat holding a sign. Of course, SD 1.5 won't magically generate perfect text, but splitting into branches does improve the results.

**Final thoughts**

In my opinion, this prompting technique can be useful for refining a few specific elements, but it doesn't work as the original article described. More branches = less influence per branch = loss of control. Right now, I think there are better ways to add complexity and detail to SD 1.5 models. For example, ELLA handles intricate prompts much better; to check, I ran the same prompts through ELLA with the same seed values.

If anyone wants to experiment, I've uploaded my setup here. Let me know your thoughts or if you see any flaws in my approach. Happy generating! 🎨🚀
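P.S. For anyone curious what boosting a single branch (like the painting branch at 1.55 above) does mechanically, here's a toy approximation: scale that branch's embedding chunk before concatenating. Real UIs also renormalize the embeddings afterwards, which I skip here, so treat the numbers as illustrative:

```python
import torch

# Toy stand-ins for CLIP branch embeddings: 5 branches of shape (1, 77, 768).
branches = [torch.randn(1, 77, 768) for _ in range(5)]

def concat_with_branch_weight(chunks, index, weight):
    """Scale one branch's chunk before concatenation - a rough approximation
    of (branch:1.55)-style weighting, minus the mean rescaling real UIs do."""
    chunks = list(chunks)
    chunks[index] = chunks[index] * weight
    return torch.cat(chunks, dim=1)

embeds = concat_with_branch_weight(branches, index=3, weight=1.55)
print(embeds.shape)  # torch.Size([1, 385, 768]) = 5 branches x 77 tokens
```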