
Open Sourcing TripoSG: High-Fidelity 3D Generation from Single Images using Large-Scale Flow Models (1.5B Model Released!)

https://reddit.com/link/1jpl4tm/video/i3gm1ksldese1/player

Hey Reddit,

We're excited to share and open-source TripoSG, our new base model for generating high-fidelity 3D shapes directly from single images! Developed at Tripo, this marks a step forward in 3D generative AI quality.

Generating detailed 3D models automatically is tough, often lagging behind 2D image/video models due to data and complexity challenges. TripoSG tackles this using a few key ideas:

  1. Large-Scale Rectified Flow Transformer: We use a Rectified Flow (RF) based Transformer architecture. RF simplifies the learning process compared to diffusion, leading to stable training for large models (see the sketch after this list).
  2. High-Quality VAE + SDFs: Our VAE uses Signed Distance Functions (SDFs) and novel geometric supervision (surface normals!) to capture much finer geometric detail than typical occupancy methods, avoiding common artifacts.
  3. Massive Data Curation: We built a pipeline to score, filter, fix, and process data (ending up with 2M high-quality samples), proving that curated data quality is critical for SOTA results.
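For readers new to Rectified Flow: the model is trained to predict the constant velocity of a straight-line path from noise to data. A minimal, generic PyTorch sketch of that objective (our illustration of the standard RF loss, not TripoSG's actual training code; the `model` signature is an assumption):

    import torch
    import torch.nn.functional as F

    def rectified_flow_loss(model, x1):
        # x1: a batch of data samples (here they would be shape-VAE latents).
        x0 = torch.randn_like(x1)                      # noise endpoint of the path
        t = torch.rand(x1.shape[0], device=x1.device)  # one timestep per sample
        t_ = t.view(-1, *([1] * (x1.dim() - 1)))       # broadcast t over latent dims
        xt = (1 - t_) * x0 + t_ * x1                   # point on the straight path
        v_target = x1 - x0                             # straight-path velocity is constant
        return F.mse_loss(model(xt, t), v_target)      # model learns to predict velocity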

What we're open-sourcing today:

  • Model: The TripoSG 1.5B parameter model (non-MoE variant, 2048 latent tokens).
  • Code: Inference code to run the model.
  • Demo: An interactive Gradio demo on Hugging Face Spaces.

Check it out here:

We believe this can unlock cool possibilities in gaming, VFX, design, robotics/embodied AI, and more.

We're keen to see what the community builds with TripoSG! Let us know your thoughts and feedback.

https://preview.redd.it/dfqagixndese1.png?width=2824&format=png&auto=webp&s=37ed594a23ede64bfa6fc179de43d7a978bbabb5

Cheers,
The Tripo Team

submitted by /u/pookiefoof

FaceUpDat Upscale Model Tip: Downscale the image before running it through the model

A lot of people know about the 4xFaceUpDat model. It's a fantastic model for upscaling any type of image where a person is the focal point (especially if your goal is photorealism). However, the caveat is that it's significantly slower (25s+) than other models like 4xUltrasharp, Siax, etc.

What I don't think people realize is that downscaling the image before processing it through the upscale model yields significantly better and much faster results (4-5 seconds). This puts it on par with the models above in terms of speed, and it runs circles around them in terms of quality.

I included a picture of the workflow setup. Optionally, you can add a restore face node before the downscale. This will help fix pupils, etc.

Note: you have to play with the downscale size depending on how big the face is in frame. For a close-up, you can set the downscale as low as 0.02 megapixels; as the face becomes smaller in frame, you'll have to increase it. As a general reference: close: 0.05 MP, medium: 0.15 MP, far: 0.30 MP.
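If you want to reproduce the downscale step outside ComfyUI, here's a minimal Pillow sketch using the reference targets above (the `run_4xfaceupdat` call at the end is a hypothetical stand-in for the actual upscale-model inference):

    import math
    from PIL import Image

    # General-reference targets from the post (megapixels before upscaling).
    MEGAPIXEL_TARGETS = {"close": 0.05, "medium": 0.15, "far": 0.30}

    def downscale_to_megapixels(img, target_mp):
        # Shrink so the pixel area is roughly target_mp megapixels.
        current_mp = (img.width * img.height) / 1e6
        if current_mp <= target_mp:
            return img  # already small enough; don't enlarge here
        scale = math.sqrt(target_mp / current_mp)
        new_size = (max(1, round(img.width * scale)), max(1, round(img.height * scale)))
        return img.resize(new_size, Image.LANCZOS)

    img = Image.open("portrait.png")
    small = downscale_to_megapixels(img, MEGAPIXEL_TARGETS["close"])
    # result = run_4xfaceupdat(small)  # hypothetical stand-in for the upscale-model step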

Link to model: 4xFaceUpDAT - OpenModelDB

submitted by /u/DBacon1052

XLSD model development status: alpha2

base sd1.5, then xlsd alpha, then current work in progress

For those not familiar with my project: I am working on an SD1.5 base model, forcing it to use the SDXL VAE, and then training it to be much better than the original. The goal is to provide high-quality image gens on an 8GB, or possibly even 4GB, VRAM system.
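For context, here is roughly what "SD1.5 forced to use the SDXL VAE" looks like as a diffusers sketch. Repo IDs are illustrative; both VAEs use 4-channel latents, so the swap runs, but without the retraining this project is doing the pairing is far from optimal:

    from diffusers import StableDiffusionPipeline, AutoencoderKL

    # Stock SD1.5 with the SDXL VAE grafted on -- the starting point
    # that this project then trains to actually work well together.
    vae = AutoencoderKL.from_pretrained("stabilityai/sdxl-vae")
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",  # repo ID illustrative
        vae=vae,
    ).to("cuda")

    image = pipe("a photo of a cat").images[0]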

The image above shows the same prompt, with no negative prompt or anything else, used on base SD1.5, then my earlier XLSD alpha, and finally the current work in progress.

I'm cherry-picking a little: results from the model don't always turn out like this. As with most things AI, it depends heavily on the prompt!
Plus, both SD1.5 and the intermediate model are capable of better results if you play around with prompting some more.

But the above set of comparison pics is a fair, level-playing-field comparison: the same settings on all, same seed, everything.

The version of the XLSD model used here can be grabbed from
https://huggingface.co/opendiffusionai/xlsd32-alpha2

Full training, if it's like last time, is a million steps and two weeks away... but I wanted to post something about the status so far, to keep motivated.

Official update article at https://civitai.com/articles/13124

submitted by /u/lostinspaz

3D motion designer looking to include GenAI in my workflow

I'm a 3D motion designer looking to embrace what GenAI has to offer and to figure out how I can include it in my workflow.

Places I've integrated AI already:

  • ChatGPT for ideation
  • Various text-to-image models for visualization/storyboarding
  • Meshy AI for generating 3D models from sketches and images
  • Rokoko's motion-capture AI for animating humans
  • Sometimes AI upscaling to raise the resolution of my videos

I feel like I can speed up my rendering workflow a lot with GenAI. I'm looking for models I can use to add elements/effects to final renders, or, if I render a video at low samples and resolution, a model to upscale it and add detail. I saw an Instagram post where someone screen-recorded their 3D viewport and used Krea AI to get final-render-like output.

I am new to this, so if you include a tutorial or steps on how to get started, that would help me a lot.

submitted by /u/taboopancake7

How to convert photo to statue version in 2025?

How do I input a photo of a person and convert them to materials such as chrome, jelly, melting candle wax, or holograms, while keeping the likeness and good image quality?

I wonder what the better option is in 2025. Here is the old method I know: Forge SDXL inpaint, a person-mask extension, CN Canny for the contour, and CN IPAdapter to input the material.
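For anyone who wants to try that old method outside Forge, this is roughly its diffusers equivalent: a minimal sketch only, with file names and model repos illustrative, and presumably prone to the same problems listed below:

    import cv2
    import numpy as np
    import torch
    from PIL import Image
    from diffusers import ControlNetModel, StableDiffusionXLControlNetInpaintPipeline
    from diffusers.utils import load_image

    controlnet = ControlNetModel.from_pretrained(
        "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
    )
    pipe = StableDiffusionXLControlNetInpaintPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        controlnet=controlnet,
        torch_dtype=torch.float16,
    ).to("cuda")
    # IP-Adapter carries the material reference into the inpainted region.
    pipe.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models",
                         weight_name="ip-adapter_sdxl.bin")
    pipe.set_ip_adapter_scale(0.7)

    person = load_image("person.png")
    mask = load_image("person_mask.png")  # white = region to repaint (the person)
    edges = cv2.Canny(np.array(person.convert("L")), 100, 200)
    canny = Image.fromarray(edges).convert("RGB")

    out = pipe(
        prompt="a chrome statue of a person, studio photo",
        image=person,
        mask_image=mask,
        control_image=canny,
        ip_adapter_image=load_image("chrome_texture.png"),  # the material sample
        strength=0.99,
    ).images[0]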

The problems are:

  1. The eyes usually still look like human eyes.
  2. The image quality becomes blurry or simply bad, way worse than using the same model and asking it to draw a statue of that material; it is as if CN or inpainting force-degrades it.
  3. The face looks like some Roman statue instead of the person.
  4. The material looks like a close-up texture shot from, e.g., 10 cm away, but the person is a full-body or upper-half shot, so probably 100 cm away, and the outcome doesn't look right.
  5. The input texture will not match the 3D depth/normals of the person, so forcing the material with IPAdapter often makes some parts look flattened, but using CN depth as a workaround just turns the output into a human again.
  6. The masked person's border seldom looks good.

Thanks in advance

submitted by /u/yamfun

Stable Diffusion Quantization

In the context of quantizing Stable Diffusion v1.x for research (specifically weight-only quantization, where Linear and Conv2d weights are saved as UINT8 and FP32 inference is performed via dequantization), what is the conventional practice for storing and managing the quantization parameters (scale and zero point)?

Is it more common to:

  1. Save the quantized weights and their scale/zero_point values in a separate .pth file? For example, save a quantized_info.pth file (no weights, just the zero_point and scale values) and load them from there.
  2. Redesign the model architecture and save a modified ckpt with the quantization logic embedded?
  3. Create custom wrapper classes for quantized layers and integrate scale/zero_point there?
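For what it's worth, here is a minimal sketch of what option 1 could look like: per-tensor asymmetric UINT8 quantization with everything packed into one file. The file name and the `unet` variable are illustrative, not a canonical implementation:

    import torch

    def quantize_tensor(w):
        # Per-tensor asymmetric affine quantization: w ~ (q - zero_point) * scale.
        w_min, w_max = w.min().item(), w.max().item()
        scale = max((w_max - w_min) / 255.0, 1e-8)
        zero_point = int(min(255, max(0, round(-w_min / scale))))
        q = torch.clamp(torch.round(w / scale + zero_point), 0, 255).to(torch.uint8)
        return q, scale, zero_point

    def dequantize_tensor(q, scale, zero_point):
        # FP32 weights reconstructed at load/inference time.
        return (q.float() - zero_point) * scale

    def quantize_state_dict(state_dict):
        packed = {}
        for name, w in state_dict.items():
            if w.dim() in (2, 4):  # Linear (2D) and Conv2d (4D) weights only
                q, s, zp = quantize_tensor(w)
                packed[name] = {"q": q, "scale": s, "zero_point": zp}
        return packed

    # Saving (assumes `unet` is an already-loaded SD v1.x module):
    # torch.save(quantize_state_dict(unet.state_dict()), "quantized_info.pth")
    # Loading: rebuild FP32 tensors and copy them into the original architecture.
    # fp32 = {n: dequantize_tensor(d["q"], d["scale"], d["zero_point"])
    #         for n, d in torch.load("quantized_info.pth").items()}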

I know that my question might look weird, but please understand that I am new to the field.

Please recommend any GitHub code or papers that show the conventional methods in the research field.

Thank you.

submitted by /u/Secret-Respond5199

NEW Video-to-Video WAN VACE WF + IF Video Prompt Node

I made a node that can reverse-engineer videos, plus this workflow using the latest and greatest in WAN tech: VACE! This model effectively replaces Stepfun 1.3 inpainting and control in one go for me. Best of all, my base T2V LoRA for my OC works with it.

https://youtu.be/r3mDwPROC1k?si=_ETWq42UmK7eVo14

submitted by /u/ImpactFrames-YT

Some thoughts after starting to work on automating image generation and refinement using Gemini in ComfyUI

After looking at what 4o was capable of, it occurred to me: why not let AI control, generate, and refine image generation from a simple user request? In this age of vibe coding and agents, it seemed only natural to consider it.

So I decided to build a workflow that uses Gemini 2.5 Pro through the API to handle everything from selecting the model, LoRAs, and ControlNet onward: it analyzes the input image and the user request to begin the process, then reworks/refines the output through defined pass/fail criteria and a series of predefined routines that address different aspects of the image, until it produces an image matching the user's request.

I knew it would require building a bunch of custom nodes, but it involved more than that: it requires a database for Gemini to base its decisions and actions on, plus decision/action/output tracking data so that each API call to Gemini can understand the context.

At the moment, I am still defining the database schema with Gemini 2.5 Pro as can be seen below:

summary_title: Resource Database Schema Design & Refinements
details:
  - point: 1
    title: General Database Strategy
    items:
      - Agreed to define YAML schemas for necessary resource types (Checkpoints, LoRAs, IPAdapters) and a global settings file.
      - Key Decision: Databases will store model **filenames** (matching ComfyUI discovery via standard folders and `extra_model_paths.yaml`) rather than full paths. Custom nodes will output filenames to standard ComfyUI loader nodes.
  - point: 2
    title: Checkpoints Schema (`checkpoints.yaml`)
    items:
      - Finalized schema structure including: `filename`, `model_type` (Enum: SDXL, Pony, Illustrious), `style_tags` (List: for selection), `trigger_words` (List: optional, for prompt), `prediction_type` (Enum: epsilon, v_prediction), `recommended_samplers` (List), `recommended_scheduler` (String, optional), `recommended_cfg_scale` (Float/String, optional), `prompt_guidance` (Object: prefixes/style notes), `notes` (String).
  - point: 3
    title: Global Settings Schema (`global_settings.yaml`)
    items:
      - Established this new file for shared configurations.
      - `supported_resolutions`: Contains a specific list of allowed `[Width, Height]` pairs. Workflow logic will find the closest aspect-ratio match from this list and require pre-resizing/cropping of inputs.
      - `default_prompt_guidance_by_type`: Defines default prompt structures (prefixes, style notes) for each `model_type` (SDXL, Pony, Illustrious), allowing overrides in `checkpoints.yaml`.
      - `sampler_compatibility`: Optional reference map for `epsilon` vs. `v_prediction` compatible samplers (v-pred list to be fully populated later by user).
  - point: 4
    title: ControlNet Strategy
    items:
      - Primary Model: Plan to use a unified model ("xinsir controlnet union").
      - Configuration: Agreed a separate `controlnets.yaml` is not needed. Configuration will rely on:
          - `global_settings.yaml`: Adding `available_controlnet_types` (a limited list like Depth, Canny, Tile - *final list confirmation pending*) and `controlnet_preprocessors` (mapping types to default/optional preprocessor node names recognized by ComfyUI).
          - Custom Selector Node: Acknowledged the likely need for a custom node to take Gemini's chosen type string (e.g., "Depth") and activate that mode in the "xinsir" model.
      - Preprocessing Execution: Agreed to use **existing, individual preprocessor nodes** (from, e.g., `ComfyUI_controlnet_aux`) combined with **dynamic routing** (switches/gates) based on the selected preprocessor name, rather than building a complex unified preprocessor node.
      - Scope Limitation: Agreed to **limit** the `available_controlnet_types` to a small set known to be reliable with SDXL (e.g., Depth, Canny, Tile) to manage complexity.
  - point: 5
    title: IPAdapters Schema (`ipadapters.yaml`)
    items:
      - Identified the need to select specific IPAdapter models (e.g., general vs. face).
      - Agreed a separate `ipadapters.yaml` file is necessary.
      - Proposed schema including: `filename`, `model_type` (e.g., SDXL), `adapter_purpose` (List: tags like 'general', 'face_transfer'), `required_clip_vision_model` (String: e.g., 'ViT-H'), `notes` (String).
  - point: 6
    title: Immediate Next Step
    items:
      - Define the schema for **`loras.yaml`**.
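To make the intended control loop concrete, here's a minimal sketch of one Gemini call driven by the `checkpoints.yaml` database (using the google-generativeai package; the file name, prompt, and model ID are mine, not the actual implementation):

    import yaml
    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")
    llm = genai.GenerativeModel("gemini-2.5-pro")  # model name illustrative

    with open("checkpoints.yaml") as f:
        checkpoints = yaml.safe_load(f)

    user_request = "a moody cyberpunk portrait, photorealistic"
    prompt = (
        "You control a ComfyUI workflow. Given the checkpoint database below "
        "and the user request, reply with the single best checkpoint "
        "`filename` and nothing else.\n\n"
        f"Database:\n{yaml.safe_dump(checkpoints)}\n"
        f"Request: {user_request}"
    )
    chosen = llm.generate_content(prompt).text.strip()
    # `chosen` then feeds a standard ComfyUI checkpoint-loader node via a custom node.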

While working on this, something occurred to me. It came about as I was explaining the need to build certain custom nodes (e.g., each ControlNet preprocessor has its own node, and the user typically just adds that node to the workflow, but that simply doesn't work in an AI-automated workflow). As I explained why this or that node needed to be built, I realized the whole issue with ComfyUI: it was designed for manual construction by humans, which doesn't fit the direction I was trying to go.

The whole point of 4o is that, as AI advances with more integrated capabilities, complicated workflows become unnecessary and obsolete. And this advancement will only accelerate in the coming days. So all I am doing may just be a complete waste of time. Still, being human, I am going to be irrational about it: since I started it, I will finish it regardless.

And all the buzz about agents and MCP looks to me like desperate attempts at relevance by the people about to become irrelevant.

submitted by /u/OldFisherman8

Has anyone trained experimental LoRAs?

After a deeply introspective and emotional journey, I fine-tuned SDXL on 60 old family-album pictures from my childhood, a delicate process that brought my younger self into dialogue with the present; the experience turned out to be far more impactful than I had anticipated.

This demo, for example, is my Archaia [TouchDesigner] system augmented with the resulting LoRA.

You can explore more of my work, tutorials, and systems via: https://linktr.ee/uisato

submitted by /u/Chuka444