Reading View


The witches of RunwayML gen3-Alpha

I was playing around with RunwayML Gen-3 Alpha, which was released just today, and wanted to share my first impressions. Since there is currently no way to fine-tune or use an input image, the style is very generic photoreal, but I really liked that I at least managed to steer things to move nicely. What I enjoyed most was the simplicity. Lately I've been getting tired of the complexity of ComfyUI and AnimateDiff. It feels good to just think about the creative side and not get too distracted by the technicalities. Hoping things get simpler in the open-source domain as well. An instruct LLM that can assist with the setup, maybe?

submitted by /u/tarkansarim

[Meta Discussion] Kling Spam

This sub is just becoming Kling and RunwayML spam. The video-generation services are using this community as an astroturfing field. Videos that are irrelevant to Stable Diffusion are getting upvote surges, which suggests bots are being used to signal-boost these posts.

Does anyone else agree that closed-source, proprietary video generation has very little justification for being here? There's probably some room for it, of course, like a workflow for producing the best images to then take to whatever video-generation service a user might want to use, but just posting straight-up videos for the lulz seems very low effort.

Just seems like there's a crowd of dirty vikings in here that won't shut up.

submitted by /u/ScionoicS

Running DreamBooth LoRA fine-tuning with SD3 in a free-tier Colab

We worked on a mini-project to show how to run SD3 DreamBooth LoRA fine-tuning on a free-tier Colab Notebook 🌸

The project is educational and is meant to serve as a template. Only good vibes here please 🫡
https://github.com/huggingface/diffusers/tree/main/examples/research_projects/sd3_lora_colab

Techniques used:

* We first pre-compute the text embeddings, since that is by far the most memory-intensive part when you use all three text encoders of SD3. Additionally, to keep the memory requirements manageable for the free-tier Colab, we use the 8-bit T5 (8-bit as in `llm.int8()`). This reduced the memory requirements from 20 GB to 10 GB.

* We then use several popular techniques to conduct the actual training: 8-bit Adam, SDPA attention, and gradient checkpointing (a rough sketch of both steps follows this list).
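
For the curious, here is a rough, untested sketch of those two ideas, assuming the Hugging Face `diffusers`, `transformers`, and `bitsandbytes` stacks. It is not the exact project script, so treat the prompt and hyperparameters as placeholders:

```python
# Step 1 (run as its own process): pre-compute prompt embeddings with the
# T5 text encoder quantized to 8-bit via llm.int8().
import torch
from transformers import T5EncoderModel, BitsAndBytesConfig
from diffusers import StableDiffusion3Pipeline

model_id = "stabilityai/stable-diffusion-3-medium-diffusers"
text_encoder_3 = T5EncoderModel.from_pretrained(
    model_id,
    subfolder="text_encoder_3",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)
pipe = StableDiffusion3Pipeline.from_pretrained(
    model_id, text_encoder_3=text_encoder_3,
    device_map="balanced", torch_dtype=torch.float16,
)
with torch.no_grad():
    prompt_embeds, _, pooled_prompt_embeds, _ = pipe.encode_prompt(
        prompt="a photo of sks dog", prompt_2=None, prompt_3=None
    )
torch.save(
    {"prompt_embeds": prompt_embeds.cpu(),
     "pooled_prompt_embeds": pooled_prompt_embeds.cpu()},
    "embeds.pt",
)

# Step 2 (training process): load only the transformer, then train with
# gradient checkpointing and 8-bit Adam. PyTorch >= 2.0 already routes
# attention through scaled_dot_product_attention (SDPA), so nothing extra
# is needed for that.
from diffusers import SD3Transformer2DModel
import bitsandbytes as bnb

transformer = SD3Transformer2DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.float16
).to("cuda")
transformer.enable_gradient_checkpointing()
optimizer = bnb.optim.AdamW8bit(transformer.parameters(), lr=1e-4)
# ...LoRA injection and the actual training loop are omitted here;
# see the linked repo for the full scripts.
```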

Yes, none of these techniques is new or groundbreaking, but it felt nice to be able to pull it off and put it all together.

https://preview.redd.it/shvv9hl8raad1.png?width=2320&format=png&auto=webp&s=8431ac02d4df75f13711df20113506ece0b37048

submitted by /u/RepresentativeJob937

Meta 3DGen Generates Complex 3D Models in Under a Minute

Meta Research has introduced 3DGen, a new AI system that creates high-quality 3D assets from text prompts in less than a minute. 3DGen combines two powerful components: AssetGen for initial 3D generation and TextureGen for enhanced texturing. The system outperforms leading industry solutions in prompt fidelity and visual quality, especially for complex scenes. 3DGen supports physically-based rendering, allowing generated assets to be used for real-world applications.

Key details:

  • Generates 3D assets with high-resolution textures and material maps
  • Produces results 3-10x faster than existing solutions
  • Supports PBR (physically-based rendering) for realistic lighting
  • Can generate new textures for existing 3D shapes
  • Outperforms baselines on prompt fidelity
  • Beats TripoSR by Stability and Tripo3D in generation time and quality
  • Evaluated by professional 3D artists and general users
  • Only the research paper has been published so far; the code has not yet been released

Source: LinkedIn - Meta Research

PS: If you enjoyed this post, you'll love the free newsletter: short daily summaries of the best AI news and insights from 300+ media outlets, to save time and stay ahead.

https://reddit.com/link/1dtvzeo/video/x7ubrddh06ad1/player

submitted by /u/Altruistic_Gibbon907

How To Make This Style of Infinite Zoom Morphing Videos

I want to create this kind of video, in which the camera endlessly zooms into the image while it morphs.

https://www.youtube.com/watch?v=OrNTyPAAxuE

I experimented a while ago with Disco Diffusion notebooks (e.g. https://www.patreon.com/posts/i-used-ai-to-66518281), but to be honest I don't remember much of it, and the results weren't really good.

I guess the idea is probably to start with a seed image, render out different sections of the video, and use the last frame of each section as the new starting image, so you can control the journey and vary the parameters (e.g. changing zoom/movement through the video). For example, rendering 15-second chunks, or one musical phrase at a time, until the full video is done. Or maybe you need some kind of checkpoint images that it should morph toward in between? (A rough sketch of what I mean is below.)
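
Something like this loop is what I have in mind, roughly how tools like Deforum's 2D zoom mode work, I think. This is a minimal, hypothetical sketch using the `diffusers` img2img pipeline; the model name, prompts, and parameters are placeholders, not a tested workflow:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def zoom_in(img: Image.Image, factor: float = 1.04) -> Image.Image:
    """Crop the centre and scale it back up, simulating a small camera push-in."""
    w, h = img.size
    cw, ch = int(w / factor), int(h / factor)
    left, top = (w - cw) // 2, (h - ch) // 2
    return img.crop((left, top, left + cw, top + ch)).resize(
        (w, h), Image.Resampling.LANCZOS
    )

frame = Image.open("seed.png").convert("RGB").resize((512, 512))
# Switching the prompt part-way through steers where the journey morphs to.
prompts = ["a surreal forest, intricate detail"] * 30 + \
          ["an underwater city, intricate detail"] * 30

for i, prompt in enumerate(prompts):
    zoomed = zoom_in(frame)
    # Low strength keeps continuity with the previous frame; raise it for a stronger morph.
    frame = pipe(prompt=prompt, image=zoomed, strength=0.35, guidance_scale=7.0).images[0]
    frame.save(f"frame_{i:04d}.png")

# Afterwards stitch the frames into a video, e.g.:
#   ffmpeg -framerate 12 -i frame_%04d.png -c:v libx264 -pix_fmt yuv420p zoom.mp4
```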

I'm quite a rookie and am looking for tutorials you can recommend, or even better, example notebooks I can run, or online platforms that simplify the process. I'm happy to join a Patreon or buy a prompt if that makes things easier/faster. My issue is that I'm not even sure what to google for or what the right process is. I mostly landed on tutorials that link to Civitai, but I think those are more about morphing a real video into a certain model, e.g. a comic character.

submitted by /u/psiger

DynamiCrafter finetuning update to resolve Conditional Image Leakage

There was a recent update to the DynamiCrafter ComfyUI wrapper, this time resolving the Conditional Image Leakage (CIL) issue. The CIL authors are still working on the non-watermarked model and on other resolutions, so I spent some time trying out the watermarked model at 576x1024 resolution.

I used the 945M model with analytic_init disabled, 26 steps, and CFG 7.5.
Generation is slower, obviously due to the additional computations.

Video (1.5s):
https://imgur.com/a/o3zdDuD

I managed to get the character to turn her head to face the viewer.
I know this is not up to the Kling, Luma, or Gen-3 standard, but at least it shows the progress our open-source research community has managed.

For now, I am going to check every day whether they have released the non-watermarked model, as I am done with waiting 14 hours for Luma to process my generation requests.

submitted by /u/doogyhatts