SD3 API (from 2 months ago) and SD3m comparison

27 June 2024 at 13:18
Some time ago, when the SD3 API was released and we still hoped the open model would match its performance, a series of prompts was tried and the results were compared with MJ and Dall-E.

For reference, here are the links to the results of this comparison: \_first\_impression\_from\_prompt\_list\_comparison/, \_first\_impression\_from\_prompt\_list\_comparison/, \_first\_impression\_from\_prompt\_list\_comparison/, \_first\_impression\_from\_prompt\_list\_comparison/

Now that it's possible (not certain, but possible) that SD3m is the only model we'll get, I thought it would be useful to rerun the prompts from these threads, generate 8 images for each, and comment on the results.

TLDR: the SD3m model is FAR FAR FAR worse than the API of two months ago.

Test 1 : Inside a steampunk workshop, a young cute redhead inventor, wearing blue overall and a glowing blue tattoo on her left shoulder, is working on a mechanical spider

This one gave OK results compared to the SD3 API and Dall-E, but with much less variation in the mechanical spiders, more hesitation over how many legs they should have, and failures on the location of the tattoo. It can put the tattoo on the wrong arm or, worse, over the clothing, or make it the wrong color. Interestingly, the API made the inventor wear only overalls, while in 7 out of 8 cases the medium model added a white undergarment. It's more realistic, but it's interesting that it avoided showing more skin than necessary. Hands are generally garbled, which is sad since they were supposedly a strong point of SD3.

The best out of 8 was this one:

Test 2 : A fluffy blue cat with black bat wings is flying in a steampunk workshop, breathing fire at a mouse

In this case, the API failed to have the cat breathe fire from its mouth, and the SD3m model fails as well. But it also failed, in 6 out of 8 cases, to produce a cat with two bat wings. The best outcome is meh: it has all the elements, but the positioning fails hard.

Test 3 : A trio of typical D&D adventurer are looking through the bushes at a forest clearing in which a gothic manor is standing. In the night sky, three moons can be seen, the large green one, the small red one and the white one

In this one, I can't help but notice that the 8 images are _very_ close to each other: the model shows little variety, where the API and D3 both did better. For example, all the characters have white hair, as if the typical D&D party was recruited among retirement home escapees. Same with the manor, which doesn't show much variation. As for prompt adherence, it can't produce 3 moons of the right colours; generally, I got 3 white moons. This is severely disappointing, as prompt adherence was supposed to be a strong suit of this model.

Test 4 : A dynamic image depicting a naval engagement between an 18th century man-of-war and a 20th century battleship. The scene shows the man-of-war with its tall sails and cannons, juxtaposed against the formidable steel structure of the modern battleship equipped with large gun turrets. The ocean around them is turbulent, illustrating the clash of eras in naval warfare. The background features stormy skies and high waves, enhancing the dramatic effect of this historical and technological confrontation. This image blends historical accuracy with imaginative interpretation, showcasing the stark contrast in naval technology.

1 image out of SIXTEEN displayed a wooden ship and a steel ship. All the others had two steel warships. It's a fail and a big step back from the API model.

Test 5 : The breathtaking view of the Garden Dome in a space station orbiting Uranus, with passengers sitting and having coffee

MUCH less interesting images than the API. Faces and hands are bad. More focus on people having coffee than on representing Uranus (it appears in 0 out of 8). I should try asking for Jupiter, because maybe SAI thought it was unsafe and unethical to look at Uranus?

Test 6 : An orc and an elf swordfighting. The elf wields a katana, the orc a crude bone saber. The orc is wearing a loincloth, the elf an intricate silvery plate armor

This one is awful. I got 0 elves out of 8 generations: only two orcs battling, with the intricate silvery armor and the weapon descriptions disregarded. Exceptionally, here is the (slightly) worst out of 8, but they are all awful:

Test 7 : A man juggling with three balls, one red, one blue, one green, while holding one one foot clad in a yellow boo

Another awful one. SD3m can't do poses. The best out of 8 was this one...

but the average generation was more like this one :

Test 8 : A man doing a handstand while riding a bicycle in front of a mirror

This one generated body horror. The API and Dall-E didn't do well on this one either, so I won't post images, but it is awful.

Test 9 : A woman wearing a 18th century attire, on all four, facing the viewer, on a table in a pirate tavern

The fact that this is the best out of 8 says enough: most of my prompt was ignored, despite it being extremely safe for work (18th century dresses cover everything). I never got an image of the woman on the table. Nor did I get a pirate tavern, unless pirate taverns were places of learning (I got books on the table in 6 cases out of 8).

Test 10 : A defeated trio of SS soldiers on the East Front, looking sad

Nothing evokes the East Front, and nothing indicates that they are SS or defeated. I got a trio of random soldiers. Another big fail.

Test 11 : A vivid depiction of the Easter procession in Sevilla, highlighting penitents wearing their iconic pointed hoods. The scene is set in the historic streets of Sevilla, with penitents dressed in traditional robes and hoods, creating a solemn and reflective atmosphere. The procession includes ornate pasos (floats) carrying religious icons, surrounded by a crowd of onlookers. The architecture of Sevilla, with its intricate details and historic charm, forms the backdrop, emphasizing the deep religious and cultural significance of this annual event.

A mix of body horror, penitents without eyes and strange things.

Test 12: A detailed picture of a sexy catgirl doing a handstand over a table

100% fails. Body horror, generally. D3 does much better, despite being heavily censored, which some claim SD3 isn't.

Test 13 : a bulky man in the halasana yoga pose, cheered by a pair of cherleaders.

Body horror, mostly. Interestingly, it got the cheerleaders...

Test 14 : a person holding a foot with his or her hands, his or her face obviously in pain

All are body-horror level... Admittedly, Dall-E can't do it quite right either, but at least it has a semblance of adhering to the prompt. Or it draws a foot.

Maybe SD3m can be saved with finetunes, but it behaves so badly compared to base SDXL that I wonder if it's worth trying to improve a 2B model as nerfed on anatomy and dynamic poses as this one.
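For a rough sense of scale, here is a minimal Python sketch that turns the counts quoted in the tests above into percentages. It only covers the four tests where a count against a single criterion is stated. The criterion labels and the names `reported`, `success_rate` and `summarize` are mine, for illustration only, and the Test 2 figure is derived from the "failed in 6 out of 8 cases" remark.

```python
# Counts quoted in the post: (test, criterion, images meeting it, images generated).
# The criterion wording is an illustrative paraphrase, not the author's.
reported = [
    ("Test 2", "cat has two bat wings", 2, 8),  # derived: 6 of 8 failed
    ("Test 4", "one wooden ship and one steel ship", 1, 16),
    ("Test 5", "Uranus is represented", 0, 8),
    ("Test 6", "an elf appears", 0, 8),
]


def success_rate(hits: int, total: int) -> float:
    """Return the share of images meeting the criterion, in percent."""
    return 100.0 * hits / total


def summarize(rows):
    """Map each (test, criterion, hits, total) row to (test, criterion, percent)."""
    return [(test, crit, success_rate(hits, total)) for test, crit, hits, total in rows]


if __name__ == "__main__":
    for test, crit, rate in summarize(reported):
        print(f"{test}: {crit}: {rate:.2f}%")
```

With 8 or 16 images per prompt these percentages are anecdotal at best, so they only restate the counts above and don't measure anything.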

submitted by /u/Mean_Ship4545