
Yesterday — 27 June 2024 · StableDiffusion

Open-Sora does promising video generation on consumer GPUs

27 June 2024 at 09:49

Generating a 240p clip of 2 or 4 seconds takes about 30 or 60 seconds respectively and uses nearly all of the 24 GB of VRAM.

It follows prompts decently well but struggles with detail.

You may need quite a few attempts to get a share-worthy result.

The Open-Sora GitHub repository provides access to the model and details on how to get started.
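For reference, here is a minimal sketch of what kicking off a generation from the cloned repo might look like. The script path, config path, and flags below are assumptions based on the project's README and may differ between versions, so check the repository before running.

```python
# Rough sketch: launch Open-Sora 1.2 text-to-video inference from the cloned repo.
# The script path, config path, and flags are assumptions and may not match the
# current version of the project; verify against the Open-Sora README.
import subprocess

subprocess.run(
    [
        "python", "scripts/inference.py",
        "configs/opensora-v1-2/inference/sample.py",
        "--num-frames", "4s",     # a 2s clip roughly halves the generation time
        "--resolution", "240p",   # 240p keeps peak VRAM usage near the 24 GB limit
        "--prompt", "a sailboat crossing a stormy sea at sunset",
    ],
    check=True,
)
```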

Alternatively, there's a hosted environment on Backprop that shows more examples and lets you try out the model on a 3090 without any setup.

This is just version 1.2 - excited to see what future improvements bring!

submitted by /u/ojasaar

So, you have generated hundreds of thousands of images, what now?

26 June 2024 at 23:13

That's what I keep asking myself. Why am I doing this? What do I want to do with all these generated images?

Before I got into Stable Diffusion I mainly used 3d apps to create videos. One app I have used in the past is Daz3d Studio, though not to create videos with, and I rarely used it to generate images, which is what Daz3d is mainly known for. I mostly used it to port 3d models via fbx and obj, etc., to the other apps I used to create videos with. Now I no longer even do that, because I have somehow become unreasonably addicted to Stable Diffusion and have lost interest in what I was doing before I found out about it.

And like I already pointed out, generating images was never anything I was into, even when I was using Daz3d a lot. I still have all these other 3d apps installed but now find them boring compared to Stable Diffusion.

And now I have generated well over 200,000 images and I have no clue what I'm supposed to do with them. There has to be a use for that many images, except I wouldn't know what it is. It seems like I just like generating images to collect them and then do nothing with them after that. And some of you with top-of-the-line GPUs are probably into the millions of generated images by now. I can't even figure out something useful to do with 200k-plus images; I couldn't imagine having a million or more to try and do something useful with.

No doubt about it in my mind, this Stable Diffusion AI is the most addictive thing one can do on their computer. There is no way this Stable Diffusion AI stuff is just a fad that will eventually fade away before we know it. It's here to stay, apparently. Maybe even forever.

submitted by /u/Fabulous-Ad9804

SD3 API (from 2 months ago) and SD3m comparison

27 June 2024 at 13:18

Some time ago, when the SD3 API was released and we still hoped the open model would be on par with its performance, I tried a series of prompts and compared the results to MJ and Dall-E.

For reference, here are the links to the results of this comparison:

https://www.reddit.com/r/StableDiffusion/comments/1c94ojx/sd3_first_impression_from_prompt_list_comparison/

https://www.reddit.com/r/StableDiffusion/comments/1c94698/sd3_first_impression_from_prompt_list_comparison/

https://www.reddit.com/r/StableDiffusion/comments/1c93h5k/sd3_first_impression_from_prompt_list_comparison/

https://www.reddit.com/r/StableDiffusion/comments/1c92acf/sd3_first_impression_from_prompt_list_comparison/

Now that it's possible (not certain, but a possibility) that SD3m is the only model we'll get, I thought it would be useful to rerun the prompts from these threads, generate 8 images for each, and comment on the results.

TLDR: the SD3m model is FAR FAR FAR worse than the API of two months ago.
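For anyone wanting to rerun a comparison like this, here is a minimal sketch of how the "8 images per prompt" loop could be scripted with the diffusers library. The model ID, step count, and guidance scale below are assumptions for illustration, not the exact settings used for the images in this post.

```python
# Minimal sketch: generate 8 images per prompt with SD3 medium via diffusers.
# Model ID and sampler settings are assumptions, not the exact setup used here.
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.float16,
).to("cuda")

prompts = [
    "Inside a steampunk workshop, a young cute redhead inventor, wearing blue "
    "overall and a glowing blue tattoo on her left shoulder, is working on a "
    "mechanical spider",
    # ...the remaining test prompts from this post
]

for i, prompt in enumerate(prompts, start=1):
    for seed in range(8):  # 8 generations per prompt, as in this comparison
        image = pipe(
            prompt,
            num_inference_steps=28,
            guidance_scale=7.0,
            generator=torch.Generator("cuda").manual_seed(seed),
        ).images[0]
        image.save(f"test{i:02d}_seed{seed}.png")
```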

Test 1 : Inside a steampunk workshop, a young cute redhead inventor, wearing blue overall and a glowing blue tattoo on her left shoulder, is working on a mechanical spider

This one gave OK results compared to the SD3 API/Dall-E, but with much less variation in the mechanical spiders, more hesitation over the number of legs they should have, and failures with the location of the tattoo. It can fail to put it on the correct arm, or, worse, put it over the clothing, or make it the wrong color. Interestingly, the API made the inventor wear only overalls, while in 7 out of 8 cases the medium model added white underclothing. It's more realistic, but it's interesting that it avoided showing more skin than necessary. Hands are generally garbled, which is sad since they were supposedly a strong point of SD3.

The best out of 8 was this one:

https://preview.redd.it/bfv8qopw549d1.png?width=1024&format=png&auto=webp&s=1970a41737b3ba0fceebea129a6926b2240cfe6e

Test 2 : A fluffy blue cat with black bat wings is flying in a steampunk workshop, breathing fire at a mouse

In this case, the API failed to have the cat breathe fire from its mouth, and the SD3m model fails as well. But it also failed, in 6 out of 8 cases, to give the cat two bat wings. The best outcome is meh: it has all the elements, but the positioning fails hard.

https://preview.redd.it/du6f8suy549d1.png?width=1024&format=png&auto=webp&s=339222d04d04d44b428c6731784bcc4f8c0403fd

Test 3 : A trio of typical D&D adventurer are looking through the bushes at a forest clearing in which a gothic manor is standing. In the night sky, three moons can be seen, the large green one, the small red one and the white one

https://preview.redd.it/droxl5c4649d1.png?width=1024&format=png&auto=webp&s=35e93b5086d9179e7ac985e6c8c66742c11d68ae

In this one, I can't help but notice that the 8 images are _very_ close to each other, the model showing little variety. The API did better, as did Dall-E. For example, all the characters have white hair, as if the typical D&D party was recruited among retirement home escapees. Same with the manor, which doesn't show much variation. As for prompt adherence, I couldn't get 3 moons of the right colours; generally, I got 3 white moons. This is severely disappointing, as prompt adherence was supposed to be a strong suit of this model.

Test 4 : A dynamic image depicting a naval engagement between an 18th century man-of-war and a 20th century battleship. The scene shows the man-of-war with its tall sails and cannons, juxtaposed against the formidable steel structure of the modern battleship equipped with large gun turrets. The ocean around them is turbulent, illustrating the clash of eras in naval warfare. The background features stormy skies and high waves, enhancing the dramatic effect of this historical and technological confrontation. This image blends historical accuracy with imaginative interpretation, showcasing the stark contrast in naval technology.

1 out of SIXTEEN displayed a wooden ship and a steel ship. All the others had two steel warships. It's a fail and a strong step back from the API model.

https://preview.redd.it/i68x6t67649d1.png?width=1024&format=png&auto=webp&s=ca85d126bcf081b211b88135200c8a7ddaa19aaf

Test 5 : The breathtaking view of the Garden Dome in a space station orbiting Uranus, with passengers sitting and having coffee

https://preview.redd.it/223adld8649d1.png?width=1024&format=png&auto=webp&s=8219326892affa6676a45874a0ce78b2bd1d15b8

MUCH less interesting images than the API. Faces and hands are bad. More focus on the people having coffee than on representing Uranus (0 out of 8). I should try asking for Jupiter, because maybe SAI thought it was unsafe and unethical to look at Uranus?

Test 6 : An orc and an elf swordfighting. The elf wields a katana, the orc a crude bone saber. The orc is wearing a loincloth, the elf an intricate silvery plate armor

This one is awful. I got 0 elves out of 8 generations. Only two orcs battling, disregarding the intricate silvery armor and the weapon descriptions. As an exception, here is the (slightly) worst out of 8, but they are all awful:

https://preview.redd.it/u7n5ydsa649d1.png?width=1024&format=png&auto=webp&s=061b14ba6462564f57a20901bc5ad828330e3e80

Test 7 : A man juggling with three balls, one red, one blue, one green, while holding one one foot clad in a yellow boo

Another awful one. SD3m can't do poses. The best out of 8 was this one...

https://preview.redd.it/mztm3isc649d1.png?width=1024&format=png&auto=webp&s=c35082d410fe09f7d96dfec2a1f34d1594833255

but the average generation was more like this one :

https://preview.redd.it/4yuartne649d1.png?width=1024&format=png&auto=webp&s=d802617a87858cfc9b2e198bf3bdd51c7ba9d398

Test 8 : A man doing a handstand while riding a bicycle in front of a mirror

This one generated body horror. The API AND Dall-E didn't do well on this one, so I won't post images but it is awful.

Test 9 : A woman wearing a 18th century attire, on all four, facing the viewer, on a table in a pirate tavern

https://preview.redd.it/fdnrnvjg649d1.png?width=1024&format=png&auto=webp&s=59e3f97db34343bc0bce0c3236624fd126d03a8f

The fact that this is the best out of 8 should suffice to say that most of my prompt was ignored, despite it being extremely safe for work; 18th century dresses are fully covering. I never got an image of the woman on the table. Neither did I get a pirate tavern, unless those were places of learning (I got books on the table in 6 cases out of 8).

Test 10 : A defeated trio of SS soldiers on the East Front, looking sad

https://preview.redd.it/ek7jucvi649d1.png?width=1024&format=png&auto=webp&s=d9bc10d67be1542bcf4e5482e607854934422747

No evocation of the East Front, no sign of them being SS or defeated. I got a trio of random soldiers. Another big fail.

Test 11 : A vivid depiction of the Easter procession in Sevilla, highlighting penitents wearing their iconic pointed hoods. The scene is set in the historic streets of Sevilla, with penitents dressed in traditional robes and hoods, creating a solemn and reflective atmosphere. The procession includes ornate pasos (floats) carrying religious icons, surrounded by a crowd of onlookers. The architecture of Sevilla, with its intricate details and historic charm, forms the backdrop, emphasizing the deep religious and cultural significance of this annual event.

A mix of body horror, penitents without eyes, and strange things.

https://preview.redd.it/0f2i5f4m649d1.png?width=1024&format=png&auto=webp&s=7beb2312d3035ece829fbc6c6478887a608a658e

https://preview.redd.it/tyj3x3en649d1.png?width=1024&format=png&auto=webp&s=6824cc9a6b781221f3d8d5682037444c7c804238

Test 12: A detailed picture of a sexy catgirl doing a handstand over a table

100% fails. Body horror generally. D3 does much better, despite being heavily censored, which some claim SD3 isn't.

https://preview.redd.it/slihca3p649d1.png?width=1024&format=png&auto=webp&s=add024516473cb610b9a42773bba50a7a9822642

Test 13 : a bulky man in the halasana yoga pose, cheered by a pair of cherleaders.

https://preview.redd.it/rsw9vepq649d1.png?width=1024&format=png&auto=webp&s=2e2a3c81afd5a8f875559a5bcc37bc1cb34c866a

Body Horror mostly. Interestingly it got the cheerleaders...

Test 14 : a person holding a foot with his or her hands, his or her face obviously in pain

https://preview.redd.it/6rljb26t649d1.png?width=1024&format=png&auto=webp&s=89e047cbc00ed204b79d5537d05f9b7c3b8e83e5

All are body-horror level... Admittedly Dall-E can't do it quite right either, but at least it has a semblance of adhering to the prompt. Or it draws a foot.

Maybe SD3m can be saved with finetunes, but it behaves so badly compared to base SDXL that I wonder if it's worth trying to improve a 2B model as nerfed on anatomy and dynamic poses as this one.

submitted by /u/Mean_Ship4545

Update and FAQ on the Open Model Initiative – Your Questions Answered

26 June 2024 at 16:05

Hello r/StableDiffusion --

A sincere thanks for the overwhelming engagement and insightful discussions following our announcement yesterday of the Open Model Initiative. If you missed it, check it out here.

We know there are a lot of questions, and some healthy skepticism about the task ahead. We'll share more details as plans are formalized -- We're taking things step by step, seeing who's committed to participating over the long haul, and charting the course forwards.

That all said - With as much community and financial/compute support as is being offered, I have no hesitation that we have the fuel needed to get where we all aim for this to take us. We just need to align and coordinate the work to execute on that vision.

We also wanted to officially announce and welcome some folks to the initiative, who will support with their expertise on model finetuning, datasets, and model training:

  • AstraliteHeart, founder of PurpleSmartAI and creator of the very popular PonyXL models
  • Some of the best model finetuners including Robbert "Zavy" van Keppel and Zovya
  • Simo Ryu, u/cloneofsimo, a well-known contributor to Open Source AI
  • Austin, u/AutoMeta, Founder of Alignment Lab AI
  • Vladmandic & SD.Next
  • And over 100 other community volunteers, ML researchers, and creators who have submitted their request to support the project

Due to voiced community concern, we’ve discussed with LAION and agreed to remove them from formal participation with the initiative at their request. Based on conversations occurring within the community we’re confident that we’ll be able to effectively curate the datasets needed to support our work.


Frequently Asked Questions (FAQs) for the Open Model Initiative

We’ve compiled a FAQ to address some of the questions that were coming up over the past 24 hours.

How will the initiative ensure the models are competitive with proprietary ones?

We are committed to developing models that are not only open but also competitive in terms of capability and performance. This includes leveraging cutting-edge technology, pooling resources and expertise from leading organizations, and continuous community feedback to improve the models.

The community is passionate. We have many AI researchers who have reached out in the last 24 hours who believe in the mission, and who are willing and eager to make this a reality. In the past year, open-source innovation has driven the majority of interesting capabilities in this space.

We’ve got this.

What does ethical really mean?

We recognize that there’s a healthy sense of skepticism any time words like “Safety” “Ethics” or “Responsibility” are used in relation to AI.

With respect to the model that the OMI will aim to train, the intent is to provide a capable base model that is not pre-trained with the following capabilities:

  • Recognition of unconsented artist names, in such a way that their body of work is singularly referenceable in prompts
  • Generating the likeness of unconsented individuals
  • The production of AI Generated Child Sexual Abuse Material (CSAM).

There may be those in the community who chafe at the above restrictions being imposed on the model. It is our stance that these are capabilities that don’t belong in a base foundation model designed to serve everyone.

The model will be designed and optimized for fine-tuning, and individuals can make personal values decisions (as well as take the responsibility) for any training built into that foundation. We will also explore tooling that helps creators reference styles without the use of artist names.

Okay, but what exactly do the next 3 months look like? What are the steps to get from today to a usable/testable model?

We have 100+ volunteers we need to coordinate and organize into productive participants of the effort. While this will be a community effort, it will need some organizational hierarchy in order to operate effectively - With our core group growing, we will decide on a governance structure, as well as engage the various partners who have offered support for access to compute and infrastructure.

We’ll make some decisions on architecture (Comfy is inclined to leverage a better designed SD3), and then begin curating datasets with community assistance.

What is the anticipated cost of developing these models, and how will the initiative manage funding?

The cost of model development can vary, but it mostly boils down to the time of participants and compute/infrastructure. Each of the initial initiative members has a business model that supports actively pursuing open research, and in addition the OMI has already received verbal support from multiple compute providers. We will formalize those into agreements once we better define the compute needs of the project.

This gives us confidence we can achieve what is needed with the supplemental support of the community volunteers who have offered to support data preparation, research, and development.

Will the initiative create limitations on the models' abilities, especially concerning NSFW content?

It is not our intent to make the model incapable of NSFW material. “Safety” as we’ve defined it above, is not restricting NSFW outputs. Our approach is to provide a model that is capable of understanding and generating a broad range of content.

We plan to curate datasets that avoid any depictions/representations of children, as a general rule, in order to avoid the potential for AIG CSAM/CSEM.

What license will the model and model weights have?

TBD, but we’ve mostly settled between an MIT or Apache 2 license.

What measures are in place to ensure transparency in the initiative’s operations?

We plan to regularly update the community on our progress, challenges, and changes through the official Discord channel. As we evolve, we’ll evaluate other communication channels.

Looking Forward

We don’t want to inundate this subreddit so we’ll make sure to only update here when there are milestone updates. In the meantime, you can join our Discord for more regular updates.

If you're interested in being a part of a working group or advisory circle, or a corporate partner looking to support open model development, please complete this form and include a bit about your experience with open-source and AI.

Thank you for your support and enthusiasm!

Sincerely,

The Open Model Initiative Team

submitted by /u/hipster_username

How do I start with Stable Diffusion?

27 June 2024 at 13:31

I am a complete beginner, but I have seen people create cool art. I do art on the side, like sketching and digital art, and I think I can use my creativity on this platform. I run a MacBook Pro M2; I just wanted to know how you guys started with Stable Diffusion and what steps are needed for it. Thanks
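For what it's worth, one common low-setup path on an M-series Mac is Hugging Face diffusers running on the MPS (Metal) backend. The model and parameters below are only an illustrative starting point, not the only way in:

```python
# Minimal first-run sketch on Apple Silicon with Hugging Face diffusers.
# Assumes: pip install torch diffusers transformers accelerate
# SDXL-Turbo is just one convenient starter model; other checkpoints work the same way.
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo",
    torch_dtype=torch.float16,  # float16 keeps memory use manageable on Apple Silicon
).to("mps")  # "mps" runs on the Mac's GPU via Metal

image = pipe(
    "a watercolor sketch of a lighthouse at dawn",
    num_inference_steps=1,   # SDXL-Turbo is designed for single-step generation
    guidance_scale=0.0,      # Turbo models run without classifier-free guidance
).images[0]
image.save("first_render.png")
```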

submitted by /u/CompetitiveFilm3585