About

Welcome to my personal blog, where I share insights from my experiences in programming and AI. I’m currently building cutting-edge inference engines for large language models (LLMs) at Nvidia.

Previously, I worked on customizing and optimizing inference engines for diffusion models and LLMs at OctoAI, which was acquired by Nvidia.

I received my PhD in 2022 from the University of Washington, where I was advised by Luis Ceze and Dan Grossman. During my PhD, I worked on programming languages and compilers for both machine learning and hardware design. Two significant projects I worked on were VTA, a hardware and software stack for machine learning, and Reticle, a two-level intermediate representation (IR) that captures and utilizes specialized hardware units better than conventional hardware languages.

Prior to UW, I spent four years at the University of California, San Diego as a staff researcher working mainly on computer architecture. Specifically, I worked on designing highly specialized datacenters. At the time, special-purpose datacenters for large workloads such as AI weren’t as popular as they are today. We named this project ASIC Clouds and argued that the only way to deploy these highly complex workloads worldwide is through datacenter specialization, from custom silicon chips all the way up to optimized server racks.

Before coming to the US, I received a Master’s degree in Electrical and Computer Engineering from the Technical University of Kaiserslautern and a Bachelor’s degree in Electrical Engineering from the University of the Andes.