
Someone leaked an API to Sora on HuggingFace (it has already been suspended)

Here's the link: https://huggingface.co/spaces/PR-Puppets/PR-Puppet-Sora

Here's the manifesto in case the page gets deleted:

┌∩┐(◣◢)┌∩┐ DEAR CORPORATE AI OVERLORDS ┌∩┐(◣◢)┌∩┐

We received access to Sora with the promise of being early testers, red teamers and creative partners. However, we believe we are instead being lured into "art washing" to tell the world that Sora is a useful tool for artists.

Hundreds of artists provide unpaid labor through bug testing, feedback, and experimental work for the program of a company valued at $150B. While hundreds contribute for free, a select few will be chosen through a competition to have their Sora-created films screened, offering minimal compensation that pales in comparison to the substantial PR and marketing value OpenAI receives.

▌║█║▌║█║▌║ DENORMALIZE BILLION DOLLAR BRANDS EXPLOITING ARTISTS FOR UNPAID R&D AND PR ║▌║█║▌║█║▌

Furthermore, every output needs to be approved by the OpenAI team before sharing. This early access program appears to be less about creative expression and critique, and more about PR and advertisement.

[̲̅$̲̅(̲̅ )̲̅$̲̅] CORPORATE ARTWASHING DETECTED [̲̅$̲̅(̲̅ )̲̅$̲̅]

We are releasing this tool to give everyone an opportunity to experiment with what ~300 artists were offered: free and unlimited access to this tool.

We are not against the use of AI technology as a tool for the arts (if we were, we probably wouldn't have been invited to this program). What we don't agree with is how this artist program has been rolled out and how the tool is shaping up ahead of a possible public release. We are sharing this with the world in the hope that OpenAI becomes more open, more artist-friendly, and supports the arts beyond PR stunts.

We call on artists to make use of tools beyond the proprietary:

Open-source video generation tools allow artists to experiment with the avant-garde, free from gatekeeping, commercial interests, or serving as PR for any corporation. We also invite artists to train their own models on their own datasets.

Some open source video tools available are (see the sketch after this list for running one locally):

CogVideoX

Mochi 1

LTX Video

Pyramid Flow
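For anyone who hasn't run one of these locally, here is a minimal sketch using the CogVideoX pipeline from Hugging Face diffusers. The parameters follow the model card's suggested defaults at the time of writing, but check the current docs before relying on them:

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# CogVideoX-2b is the smaller checkpoint; the 5b variant needs more VRAM.
pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-2b", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()  # trades speed for lower VRAM use

video = pipe(
    prompt="a hand-painted stop-motion scene of a paper boat on a river",
    num_frames=49,           # default clip length for CogVideoX
    num_inference_steps=50,
    guidance_scale=6.0,
).frames[0]

export_to_video(video, "boat.mp4", fps=8)
```

The other tools in the list (Mochi 1, LTX Video, Pyramid Flow) have similar diffusers or standalone pipelines; hardware requirements vary considerably between them.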

However, we are aware that not everyone has the hardware or the technical capability to run open source tools and models, so we welcome tool makers to listen to artists and provide a path to true artistic expression, with fair compensation for the artists.

Enjoy,

some sora-alpha-artists, Jake Elwes, Memo Akten, CROSSLUCID, Maribeth Rauh, Joel Simon, Jake Hartnell, Bea Ramos, Power Dada, aurèce vettier, acfp, Iannis Bardakos, 204 no-content | Cintia Aguiar Pinto & Dimitri De Jonghe, Emmanuelle Collet, XU Cheng

submitted by /u/Querens

Food Photography (Prompts Included)

I've been working on prompts to achieve photorealistic and super-detailed food photos using Flux. Here are some of the prompts I used; I thought some of you might find them helpful:

A luxurious chocolate lava cake, partially melted, with rich, oozy chocolate spilling from the center onto a white porcelain plate. Surrounding the cake are fresh raspberries and mint leaves, with a dusting of powdered sugar. The scene is accented by a delicate fork resting beside the plate, captured in soft natural light to accentuate the glossy texture of the chocolate, creating an inviting depth of field.

A towering stack of mini burgers made with pink beetroot buns, filled with black bean patties, vibrant green lettuce, and purple cabbage, skewered with colorful toothpicks. The burgers are served on a slate platter, surrounded by a colorful array of dipping sauces in tiny bowls, with warm steam rising, contrasting with a blurred, lively picnic setting behind.

A colorful fruit tart with a crisp pastry crust, filled with creamy vanilla custard and topped with an assortment of fresh berries, kiwi slices, and a glaze. The tart is displayed on a vintage cake stand, with a fork poised ready to serve. Surrounding it are scattered edible flowers and mint leaves for contrast, while the soft light highlights the glossy surface of the fruits, captured from a slight overhead angle to emphasize the variety of colors.
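If you want to reproduce these locally, a minimal diffusers sketch for Flux-Dev looks roughly like this (model ID and settings are the commonly used defaults, not anything specific to these prompts):

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # helps on cards with limited VRAM

# First prompt from above, shortened here for brevity.
prompt = (
    "A luxurious chocolate lava cake, partially melted, with rich, oozy "
    "chocolate spilling from the center onto a white porcelain plate..."
)
image = pipe(prompt, guidance_scale=3.5, num_inference_steps=50).images[0]
image.save("lava_cake.png")
```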

submitted by /u/Vegetable_Writer_443

Looking for volunteers for 4090 compute time

I'm cleaning up the CC12M dataset. I've gotten it down to 8.5 million images by hand-pruning, but that wasn't as effective as I'd hoped, so I'm falling back to VLM assistance to get rid of 99% of the watermarks in it.

Trouble is, going through a subset of just 2 million is going to take 5 days on my 4090.
It averages 5 images per second, or 18,000 an hour, or roughly 400,000 in one day.
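Those figures hang together; a quick sanity check:

```python
# Throughput math from the post: 5 img/s on a single 4090.
imgs_per_sec = 5
per_hour = imgs_per_sec * 3600        # 18,000 images/hour
per_day = per_hour * 24               # 432,000 images/day (~"400,000")
days_for_2m = 2_000_000 / per_day     # ~4.6 days, i.e. the quoted ~5 days
print(per_hour, per_day, round(days_for_2m, 1))
```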

Would anyone like to step up and contribute some compute time?
If you choose, you will be mentioned in the credits section of the resulting dataset.

There should be around 5 million images left after my run.
You are free to process any number of 1-million-image segments.

(You may even try it on a lesser card, though note that the VLM takes at least 16 GB of VRAM to run.)
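For anyone weighing up the work involved: the post doesn't name the VLM, so the sketch below uses a placeholder for the actual watermark check, but it shows the per-segment shape of the job (a directory in, rejected images moved aside):

```python
# Rough sketch of VLM-assisted watermark pruning for one 1M-image segment.
# `is_watermarked` is a stand-in for the actual VLM call, which the post
# doesn't name; on a 4090 it runs at roughly 5 images/second.
from pathlib import Path

def is_watermarked(image_path: Path) -> bool:
    """Placeholder for a VLM query such as 'Does this image contain a watermark?'"""
    raise NotImplementedError("plug in your VLM here")

def prune_segment(segment_dir: Path, rejects_dir: Path) -> int:
    """Move watermarked images out of segment_dir; return the number removed."""
    rejects_dir.mkdir(parents=True, exist_ok=True)
    removed = 0
    for image_path in sorted(segment_dir.glob("*.jpg")):
        if is_watermarked(image_path):
            image_path.rename(rejects_dir / image_path.name)
            removed += 1
    return removed
```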

submitted by /u/lostinspaz

Possibility to train LoRA for Shuttle 3 Diffusion?

Hi,

(Just a quick FYI, since it looks like many people are confusing the two: I am talking about SHUTTLE 3 DIFFUSION, not STABLE DIFFUSION 3.)

I have been using OneTrainer to train character-specific LoRAs for the Flux-Dev base model, but the results are worse than what I have been getting from my SDXL LoRAs.

I wanted to try setting Shuttle 3 Diffusion as my base checkpoint to train my LoRA on, as I had very good results with that model. I downloaded the Hugging Face repo and used the same settings that work for the Flux-Dev base model, but when I select the Shuttle 3 Diffusion Hugging Face repository, I get the following error:

Traceback (most recent call last):
  File "G:\90_AI\StableDiffusion\OneTrainer\modules\ui\TrainUI.py", line 561, in __training_thread_function
    trainer.train()
  File "G:\90_AI\StableDiffusion\OneTrainer\modules\trainer\GenericTrainer.py", line 674, in train
    model_output_data = self.model_setup.predict(self.model, batch, self.config, train_progress)
  File "G:\90_AI\StableDiffusion\OneTrainer\modules\modelSetup\BaseFluxSetup.py", line 475, in predict
    guidance=guidance.to(dtype=model.train_dtype.torch_dtype()),
AttributeError: 'NoneType' object has no attribute 'to'

I am not very good at Python, but I can see that the error is related to the guidance value, and since Shuttle 3 Diffusion is a Schnell-based model, I guess it is missing the guidance input that Flux-Dev provides.
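For what it's worth, the crash itself is just a None arriving where a tensor is expected: Schnell-style checkpoints are distilled without the guidance input that Flux-Dev uses, so OneTrainer's predict step has nothing to call .to() on. A purely hypothetical guard around the failing line might look like this (untested sketch; the surrounding variable names are assumed, not OneTrainer's actual ones):

```python
import torch

# Hypothetical patch near BaseFluxSetup.predict, line ~475 (OneTrainer).
# Schnell-based models like Shuttle 3 Diffusion have no guidance input,
# so `guidance` arrives as None and guidance.to(...) raises AttributeError.
if guidance is None:
    # `batch_size` and `device` are assumed names here, for illustration only.
    guidance = torch.full((batch_size,), 1.0, device=device)

guidance = guidance.to(dtype=model.train_dtype.torch_dtype())
```

Whether training a Schnell-distilled model with a dummy guidance value behaves well is a separate question; this only removes the crash.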

Does anyone know a way around this? What is the best way to train a LoRA using a Schnell checkpoint as the base model, or am I doing something wrong?

Thanks a lot

submitted by /u/Reasonable_Net_6071

Open Sourcing Qwen2VL-Flux: Replacing Flux's Text Encoder with Qwen2VL-7B

Hey StableDiffusion community! 👋

I'm excited to open source Qwen2VL-Flux, a powerful image generation model that combines Flux's generation quality with Qwen2VL's vision-language understanding!

https://preview.redd.it/97mphvlhp63e1.png?width=2950&format=png&auto=webp&s=fa6d3a430d9b0058fd6b4b19a736770cc8d2a526

🔥 What makes it special?

We replaced the T5 text encoder with Qwen2VL-7B, giving Flux multi-modal generation capability.
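The repo has its own loading code, so the snippet below is only a conceptual sketch of the idea, not the project's actual API: take Qwen2-VL hidden states where Flux would normally consume T5 embeddings, with a learned projection bridging the two widths.

```python
import torch
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")
encoder = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct", torch_dtype=torch.bfloat16
)

# Text-only here for brevity; the same encoder also accepts images,
# which is what enables the image-variation and fusion modes below.
inputs = processor(text=["a cat in a spacesuit"], return_tensors="pt")
hidden = encoder(**inputs, output_hidden_states=True).hidden_states[-1]

# Hypothetical bridge: Qwen2-VL-7B's hidden size (3584) projected to the
# 4096-wide T5-XXL conditioning stream that Flux's transformer expects.
project = torch.nn.Linear(3584, 4096, dtype=torch.bfloat16)
conditioning = project(hidden)  # stands in for the T5 embeddings
```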

✨ Key Features:

## 🎨 Direct Image Variation: No Text, Pure Vision

Transform your images while preserving their essence - no text prompts needed! Our model's pure vision understanding lets you explore creative variations seamlessly.

https://preview.redd.it/iljy0uwyo63e1.png?width=2892&format=png&auto=webp&s=2a80aaf846e4d92fcdcd1fe5729a7fc9721ebd93

https://preview.redd.it/t9vlvsrzo63e1.png?width=2906&format=png&auto=webp&s=43d95264eba2bf282308ed1ebabbae12f1282e32

## 🔮 Vision-Language Fusion: Reference Images + Text Magic

Blend the power of visual references with text guidance! Use both images and text prompts to precisely control your generation and achieve exactly what you want.

https://preview.redd.it/zzw7ry82p63e1.png?width=2978&format=png&auto=webp&s=2169fa97a009e44f033073c495cfd81e7feac1dd

https://preview.redd.it/u0ydf9q2p63e1.png?width=2974&format=png&auto=webp&s=6c69c52232cbabc12cb31df27340999555ae4f52

## 🎯 GridDot Control: Precision at Your Fingertips

Fine-grained control meets intuitive design! Our innovative GridDot panel lets you apply styles and modifications exactly where you want them.

https://preview.redd.it/skaegt54p63e1.png?width=2898&format=png&auto=webp&s=9d73184d5c7c04c9c17bdc096cfbe10313ea3fef

https://preview.redd.it/tx8zy0n4p63e1.png?width=2886&format=png&auto=webp&s=520fc690ba0a258ae720a7c2c733566fb81da2fa

https://preview.redd.it/2p670h55p63e1.png?width=2886&format=png&auto=webp&s=e321d4e31e634767845cf517f057cd6fd1c2bd07

https://preview.redd.it/3klt0dn5p63e1.png?width=2898&format=png&auto=webp&s=fbca55b2facf25e332fe86bd29e3acb9b4e80a85

## 🎛️ ControlNet Integration: Structure Meets Creativity

Take control of your generations with built-in depth and line guidance! Perfect for maintaining structural integrity while exploring creative variations.

https://preview.redd.it/m3894048p63e1.png?width=2864&format=png&auto=webp&s=e86ee5192cbb9b08d32d86b3edc22bc4e6be48bc

https://preview.redd.it/6i8mipm8p63e1.png?width=2874&format=png&auto=webp&s=a42e8df428821b7900498510717ece360e3a9ab4

https://preview.redd.it/v9p2rg19p63e1.png?width=2878&format=png&auto=webp&s=ddecebde0335f3311d2df889e1a8ad14b12590b7

🔗 Links:

- Model: https://huggingface.co/Djrango/Qwen2vl-Flux

- Inference Code & Documentation: https://github.com/erwold/qwen2vl-flux

💡 Some cool things you can do:

  1. Generate variations while keeping the essence of your image
  2. Blend multiple images with intelligent style transfer
  3. Use text to guide the generation process
  4. Apply fine-grained style control with grid attention

I'd love to hear your thoughts and see what you create with it! Feel free to ask any questions - I'll be here in the comments.

submitted by /u/Weak_Trash9060