Hello! I'm Arthur Rasmusson, a specialist in AI, large-scale training and inference clusters, GPUs, and I/O virtualization. I've contributed to open-source GPU virtualization projects at LibVF.IO and Open-IOV.org. In 2023, I joined Cohere's Model Efficiency team as a Machine Learning Engineer, where I worked on GPU cluster and AI inference software. In 2024, I joined Weka's CTO Office as Principal AI Engineer, improving the efficiency of open-source inference servers running on top of NeuralMesh by Weka.
I'm passionate about pushing the boundaries of GPU performance and AI inference. Feel free to contact me if you're interested in collaborating or learning more about my work.
Paged Attention over RDMA (PAoR)
Lessons Learned Scaling LLM Training and Inference with Direct Memory Access (DMA)
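Since the titles above are terse, here is a hypothetical sketch of the core idea behind paged attention: the KV cache is split into fixed-size physical blocks addressed through a per-sequence block table, so blocks can live anywhere a DMA engine can reach, including remote memory over RDMA. All names, sizes, and the block-table layout below are illustrative assumptions, not the PAoR implementation.

```python
# Hypothetical sketch of paged KV-cache indexing: logical token
# positions map to fixed-size physical blocks, so KV memory can be
# scattered across a pool -- local or remote -- instead of contiguous.
import numpy as np

BLOCK_SIZE = 16   # tokens per KV block (assumption)
NUM_BLOCKS = 64   # physical blocks in the pool (assumption)
HEAD_DIM = 128    # per-head hidden size (assumption)

# Physical KV pool: [num_blocks, block_size, head_dim]
kv_pool = np.zeros((NUM_BLOCKS, BLOCK_SIZE, HEAD_DIM), dtype=np.float16)

# Per-sequence block table: logical block index -> physical block id.
block_table = {0: 7, 1: 42, 2: 3}  # e.g. a 3-block sequence

def kv_for_token(pos: int) -> np.ndarray:
    """Return the KV vector for logical token position `pos`."""
    logical_block, offset = divmod(pos, BLOCK_SIZE)
    physical_block = block_table[logical_block]
    return kv_pool[physical_block, offset]

print(kv_for_token(20).shape)  # token 20 resolves to physical block 42
```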
NVIDIA TensorRT-LLM PR #3209, "feature: KV Cache GPUDirect Storage" (merged).
A modified version of my original Python-Native-libCuFile code is used in cufile-python, a dependency of LMCache PR #699, which added the GPUDirect Storage backend to LMCache for the vLLM ecosystem.
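For readers unfamiliar with GPUDirect Storage: the cuFile API lets data move by DMA directly between NVMe storage and GPU device memory, with no CPU bounce buffer. Below is a minimal read sketch using NVIDIA's kvikio bindings to cuFile (a different wrapper from the cufile-python package named above, shown purely for illustration); the file path and buffer shape are assumptions.

```python
# Minimal GPUDirect Storage read: cuFile DMAs file contents straight
# from NVMe into GPU memory, bypassing the host bounce buffer.
# Illustrative sketch -- path and shape are assumptions.
import cupy as cp
import kvikio

# Destination buffer in GPU device memory for one KV-cache block.
kv_block = cp.empty((16, 128), dtype=cp.float16)

f = kvikio.CuFile("/mnt/nvme/kv_cache_block.bin", "r")
f.read(kv_block)   # blocking read; f.pread(kv_block) returns a future
f.close()
```

This direct storage-to-GPU path is what the TensorRT-LLM and LMCache changes above use to stream KV-cache data between device memory and storage.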
Open-IOV Community Calls – A regular community-call series on open GPU virtualization, featuring collaborative discussion and knowledge sharing.
World Summit AI Talk – An in-person session at World Summit AI USA 2025 on boosting LLM inference throughput and reducing GPU bottlenecks.
GPU Driver Internals – Documentation of GPU driver internals relevant to virtualization.
OpenRM – Analysis of NVIDIA’s open-sourced GPU Resource Manager API and RM Core.
GPU Firmware – Documentation of GPU embedded firmware and virtualization support.
LIME Is Mediated Emulation – A LibVF.IO feature that enables running Windows applications via GPU virtualization.