Toolboxes for GenAI on AMD Ryzen AI MAX+
Containerized environments for LLMs, Image Generation, and Fine-tuning.
In August 2025, I got my hands on a Strix Halo machine. I needed to run local inference for cybersecurity work where cloud LLMs were not an option.
I quickly realized the software ecosystem wasn't ready. Stuff wasn't working. So I started
digging, learning, and fixing things.
I shared my findings in a video, and people found it useful. Thanks to support from the Strix Halo Home Lab community, Framework, and AMD, I've continued to maintain these "Toolboxes" to help others reproduce this setup and run AI workloads on Strix Halo hardware.
Donato Capitella
Software Engineer and Ethical Hacker. I enjoy understanding systems by breaking them down and documenting the process.
"Strix Halo" (Ryzen AI MAX+) is AMD's high-performance mobile processor platform. Its key feature for AI workloads is Unified Memory, allowing the iGPU to access up to 128GB of system RAM, significantly increasing the model size capacity compared to traditional consumer GPUs.
These are containerized environments built on Toolbx (Docker/Podman). This approach allows you to easily get the specific runtime needed for Strix Halo, keep the host system clean, and instantly switch between different ROCm or software versions without dependency conflicts.
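As a sketch of that workflow, here is what creating and entering such a container looks like with Toolbx. The container and image names below are illustrative placeholders, not the actual published toolbox images:

```shell
# Create a named toolbox from a ROCm-enabled container image
# (the image reference is illustrative, not one of the published toolboxes)
toolbox create llm-rocm --image quay.io/example/rocm-toolbox:latest

# Enter it: your home directory, devices, and network are shared with the host
toolbox enter llm-rocm

# Multiple toolboxes with different ROCm versions can coexist side by side
toolbox list
```

Because each toolbox is isolated, switching ROCm versions is just a matter of entering a different container; the host system is never modified.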
Setup for LLM inference. Supports clustering via RDMA and Vulkan/ROCm backends. View Repo ->
Environment for image and video generation. Validated for Flux, Wan 2.2, HunyuanVideo, and Qwen. View Repo ->
Model serving setup. Includes custom RCCL patches for high-speed clustering. View Repo ->
Training environment. QLoRA and full fine-tuning support for Gemma 3, Qwen 3, and generic models. View Repo ->
This is the configuration I use on my Framework Desktop to maintain and benchmark all toolboxes.
My Rig - Sent to me by Framework
Many guides suggest statically partitioning memory between the CPU and iGPU (e.g., locking 32GB for video). However, this wastes capacity. With unified dynamic memory, the GPU can access nearly all system RAM (up to ~124GB) on demand, while the same memory remains available to the CPU whenever the GPU isn't using it.
| Parameter | How it enables Unified Memory |
|---|---|
| iommu=pt | Pass-Through Mode: Bypasses IOMMU translation for the GPU, reducing overhead when it accesses user-space pages. |
| amdgpu.gttsize=126976 | GTT Size (Graphics Translation Table): Explicitly sets the maximum unified memory addressable by the GPU to ~124GB (126976 MB), overriding default driver limits. |
| ttm.pages_limit=32505856 | Pinned Memory Limit: Allows the TTM (Translation Table Manager) to pin up to ~124GB of pages in high-speed system RAM, ensuring the GPU has direct access without swapping. |
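The two size parameters encode the same ~124 GiB ceiling in different units, which is easy to verify (assuming the standard 4 KiB x86-64 page size):

```python
# Check that amdgpu.gttsize (in MiB) and ttm.pages_limit (in 4 KiB pages)
# describe the same ~124 GiB unified-memory ceiling.
PAGE_SIZE_KIB = 4       # x86-64 base page size

gtt_mib = 126976        # value passed to amdgpu.gttsize
ttm_pages = 32505856    # value passed to ttm.pages_limit

gtt_gib = gtt_mib / 1024
ttm_gib = ttm_pages * PAGE_SIZE_KIB / (1024 ** 2)

print(gtt_gib, ttm_gib)  # 124.0 124.0
```

These values are typically appended to the kernel command line (e.g. via `GRUB_CMDLINE_LINUX` in `/etc/default/grub`, followed by regenerating the bootloader configuration and rebooting).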
Following the relevant documentation, we also set a performance profile to get maximum sustained performance.
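One common way to do this on Linux, assuming the machine's firmware exposes the ACPI platform-profile interface, is via sysfs. The exact profile names available vary by machine:

```shell
# List the profiles this machine's firmware supports
# (typical values: low-power, balanced, performance)
cat /sys/firmware/acpi/platform_profile_choices

# Switch to the performance profile
echo performance | sudo tee /sys/firmware/acpi/platform_profile
```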
Connect with other Strix Halo owners, share benchmarks, and get help.