I have created an optimized setup for using AMD APUs (including Vega)

Hi everyone,

I have created a relatively optimized setup using a fork of Stable Diffusion from here:

likelovewant/stable-diffusion-webui-forge-on-amd: add support on amd in zluda

and

ROCm libraries from:

brknsoul/ROCmLibs: Prebuilt Windows ROCm Libs for gfx1031 and gfx1032

After a lot of experimenting, I set Token Merging to 0.5 and used Stable Diffusion LCM models with the LCM sampling method and the Karras schedule type at 4 steps. Depending on system load, for a 512 x 640 (width x height) image I was able to get as fast as 4.40 s/it; on average it hovers around 6 s/it. This is on my mini PC, which has a Ryzen 5 2500U (Vega 8), 32 GB of DDR4-3200 RAM, and a 1 TB SSD. It may not be as fast as my gaming rig, but it uses less than 25 W at full load.
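To put those numbers in per-image terms, here is a rough back-of-the-envelope sketch. It assumes that s/it means seconds per sampling step, that the under-25 W figure covers the whole box, and it ignores model loading and VAE decode, so treat the results as estimates only. The function names are made up for illustration.

```python
def seconds_per_image(seconds_per_iteration: float, steps: int) -> float:
    """Sampling time for one image, ignoring model load and VAE decode."""
    return seconds_per_iteration * steps


def watt_hours(power_watts: float, seconds: float) -> float:
    """Energy used when drawing power_watts for the given number of seconds."""
    return power_watts * seconds / 3600.0


STEPS = 4          # LCM step count from the post
MAX_POWER_W = 25   # upper bound on power draw from the post

for label, rate in (("best", 4.40), ("typical", 6.0)):
    t = seconds_per_image(rate, STEPS)
    energy = watt_hours(MAX_POWER_W, t)
    print(f"{label}: {t:.1f} s per image, at most {energy:.3f} Wh")
```

So a 4-step image works out to roughly 18 to 24 seconds of sampling, at well under a quarter of a watt-hour each.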

Overall, I think this is pretty impressive for a little box that lacks a discrete GPU. I should also note that I set the dedicated portion of graphics memory to 2 GB in the UEFI/BIOS, used the ROCm 5.7 libraries, and then added the ZLUDA libraries to them, following the instructions.

Here is the webui-user.bat file configuration:

    @echo off
    @REM cd /d %~dp0
    @REM set PYTORCH_TUNABLEOP_ENABLED=1
    @REM set PYTORCH_TUNABLEOP_VERBOSE=1
    @REM set PYTORCH_TUNABLEOP_HIPBLASLT_ENABLED=0

    set PYTHON=
    set GIT=
    set VENV_DIR=
    set SAFETENSORS_FAST_GPU=1
    set COMMANDLINE_ARGS= --use-zluda --theme dark --listen --opt-sub-quad-attention --upcast-sampling --api --sub-quad-chunk-threshold 60

    @REM Uncomment following code to reference an existing A1111 checkout.
    @REM set A1111_HOME=Your A1111 checkout dir
    @REM
    @REM set VENV_DIR=%A1111_HOME%/venv
    @REM set COMMANDLINE_ARGS=%COMMANDLINE_ARGS% ^
    @REM  --ckpt-dir %A1111_HOME%/models/Stable-diffusion ^
    @REM  --hypernetwork-dir %A1111_HOME%/models/hypernetworks ^
    @REM  --embeddings-dir %A1111_HOME%/embeddings ^
    @REM  --lora-dir %A1111_HOME%/models/Lora

    call webui.bat
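Since the arguments include --api and --listen, the web UI should also accept requests from scripts or other machines on the network. Below is a minimal sketch of a client that asks for an image with the settings from this post. It assumes this fork exposes the usual A1111-style /sdapi/v1/txt2img endpoint and accepts the "scheduler" field and the "token_merging_ratio" override; I have not verified those against this particular build, so adjust them if the server rejects the request. The helper names are hypothetical.

```python
import json
import urllib.request


def build_txt2img_payload(prompt: str) -> dict:
    """Request body mirroring the settings described in the post."""
    return {
        "prompt": prompt,
        "steps": 4,                # LCM models need only a few steps
        "width": 512,
        "height": 640,
        "sampler_name": "LCM",     # assumed to match the UI's sampler name
        "scheduler": "Karras",     # assumed field; older builds may ignore it
        "override_settings": {"token_merging_ratio": 0.5},  # assumed option key
    }


def txt2img(base_url: str, prompt: str, timeout: float = 600.0) -> dict:
    """POST the payload to the web UI and return the decoded JSON response."""
    body = json.dumps(build_txt2img_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/sdapi/v1/txt2img",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # The generous timeout allows for slow generation on an APU.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the web UI running, something like txt2img("http://127.0.0.1:7860", "a lighthouse at dusk") should return a dictionary whose "images" entry holds base64-encoded images, assuming the server is on the default port 7860.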

Note that you can remove or adjust --sub-quad-chunk-threshold 60. Removing it causes stuttering if you use the computer for other tasks while generating images, whereas a value of 60 seems to prevent or reduce that. I hope this helps other people, because this was such a fun project to set up and optimize.

submitted by /u/technofox01