





















































Twilio Segment was purpose-built so that you don’t have to worry about your data. Forget the data chaos, dissolve the silos between teams and tools, and bring your data together with ease. So that you can spend more time innovating and less time integrating.
Hi,
Welcome to the ninth issue of Deep Engineering.
As CPUs, GPUs, TPUs, and custom accelerators proliferate, compilers have become the thin yet critical layer that enables both abstraction and performance.
Our feature this week looks at Multi-Level Intermediate Representation (MLIR)—a compiler infrastructure that promises to unify optimization across wildly different domains. Born at Google and now adopted in projects like OpenXLA, LLVM Flang, NVIDIA’s CUDA Quantum, and even hardware DSLs like Chisel, MLIR offers a powerful foundation—but one that comes with real‑world friction: steep learning curves, ecosystem fragmentation, and legacy integration challenges. We unpack where MLIR delivers, where developers struggle with it, and what its future might mean for software architects.
Building on this theme, we’re also kicking off a new series on Mojo🔥, a programming language built entirely on MLIR. Written by Ivo Balbaert, Lector at CVO Antwerpen and author of The Way to Go and Packt introductions to Dart, Julia, Rust, and Red, Building with Mojo (Part 1): A Language Born for AI and Systems explores Mojo’s origins, its design goals, and its promise to unify Pythonic ergonomics with AI‑scale performance. Future parts will go deeper—covering Mojo’s tooling, metaprogramming, hardware abstraction, and its role in simplifying development pipelines that currently span Python, CUDA, and systems languages.
Read on for our take on MLIR’s trajectory—and then take your first step into Mojo, a language built for the next wave of AI and systems programming.
To use a clichéd statement: hardware and software are becoming increasingly diverse and complex. And because modern workloads must run efficiently across this diversity, in the form of CPUs, GPUs, TPUs, and custom accelerators, compilers are now critical for both abstraction and performance. MLIR emerged to tame this complexity by enabling multiple layers of abstraction in one framework. MLIR has rapidly grown from a Google research project into an industry-wide technology. After MLIR was open-sourced and contributed to LLVM in 2019, its modular design attracted a broad community.
Today MLIR underpins projects beyond Google’s TensorFlow. For example, it is the foundation of OpenXLA, an open compiler ecosystem co-developed by industry leaders (AMD, Apple, NVIDIA, etc.) to unify ML model deployment on diverse hardware. It’s also inside OpenAI’s Triton (for GPU kernel optimization) and even quantum computing compilers like NVIDIA’s CUDA Quantum (which defines a “Quake” IR on MLIR). In hardware design, the LLVM-affiliated experimental CIRCT project applies MLIR to circuit design and digital logic – so much so that a modern hardware DSL like Chisel moved its back-end to MLIR for richer analysis than standard RTL provides. MLIR’s multi-dialect flexibility has proven useful well beyond machine learning.
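To make "multi-dialect" concrete, here is a minimal sketch using the upstream MLIR Python bindings (the `mlir` package built from LLVM; we are assuming a recent build with the standard dialects registered). It parses a tiny module that mixes the func and arith dialects and runs a small canonicalization pipeline over it; exact API details can vary between LLVM releases.

```python
# A minimal sketch of MLIR's multi-dialect design, assuming the upstream
# MLIR Python bindings (the `mlir` package built from LLVM) are installed
# with the standard dialects registered.
from mlir.ir import Context, Module
from mlir.passmanager import PassManager

# One module mixing the `func` and `arith` dialects; real pipelines layer in
# many more (linalg, scf, gpu, llvm, ...) at different abstraction levels.
ASM = """
func.func @axpy(%a: f32, %x: f32, %y: f32) -> f32 {
  %0 = arith.mulf %a, %x : f32
  %1 = arith.addf %0, %y : f32
  return %1 : f32
}
"""

with Context():
    module = Module.parse(ASM)  # parse the textual IR into an in-memory module
    pm = PassManager.parse("builtin.module(canonicalize,cse)")
    pm.run(module.operation)    # older bindings expect the module itself here
    print(module)               # dump the optimized IR
```

The same textual IR, context, and pass-manager machinery is what every MLIR-based project builds on, whether the dialects involved describe tensor algebra, GPU kernels, or digital circuits.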
MLIR has also made inroads into a traditional compiled language. The new LLVM Fortran compiler (Flang) adopted MLIR to represent high-level Fortran IR (FIR), allowing more powerful optimizations than the old approach of jumping straight to LLVM IR. This MLIR-based Flang already achieves performance on par with classic Fortran compilers in many benchmarks (within a few percent of GCC’s Fortran). In fact, in 2024, AMD announced its next-gen Fortran compiler will be based on Flang/MLIR to target AMD GPUs and CPUs in a unified way.
However, MLIR’s adoption remains uneven across domains. For example, the LLVM C/C++ frontend (Clang) still uses its traditional monolithic pipeline. There is work in progress on a Clang IR dialect (“CIR”) to eventually bring C/C++ into MLIR, but Clang’s large legacy and stability requirements mean it won’t rewrite itself overnight.
MLIR is proving itself in new or specialized compilers (AI, HPC, DSLs) faster than it can retrofit into long-established general-purpose compilers. It is technically capable of being a general compiler framework, but the industry is still in transition.
Engineers may be enthusiastic about MLIR’s potential but also hit real pain points when evaluating it for production. One of the most cited is the explosion of dialects early in the project’s life. As Chris Lattner has put it:
“Unfortunately, this explosion happened very early in MLIR’s design, and many design decisions in these dialects weren’t ideal for the evolving requirements of GenAI. For example, much of this early work was directed towards improving TensorFlow and building OpenXLA, so these dialects weren’t designed with first-class PyTorch and GenAI support.”
The result was that by the time generative AI and PyTorch use cases rose, the upstream MLIR dialects (like linalg or tensor) were not a perfect fit for new workloads. Companies ended up forking or inventing their own dialects (e.g., Google’s StableHLO vs. others), leading to ecosystem fracture. Lattner describes it as an “identity crisis.” Architecturally, it is difficult to determine which dialects to build on or standardize around. On the bright side, the MLIR project recently established a new governance structure and an MLIR area team to improve consistency, but it will take time to harmonize the dialect zoo.
But probably the most practical pain point is day-to-day developer experience. Debugging an MLIR-based compiler can be challenging – error messages often come from deep in the MLIR/LLVM machinery, and stepping through multi-dialect lowering is hard. So, there are challenges and tradeoffs in MLIR adoption at both the organizational and individual levels. But how have these trade-offs played out in the real world: who is successfully using MLIR today, and what did they learn from it?
Despite the hurdles, some teams have embraced MLIR and demonstrated tangible benefits. Let’s explore four use cases:
MLIR’s value multiplies in “greenfield” projects or where incumbents are hitting limits. New hardware with no legacy compiler, new languages (like Mojo, which we will talk about shortly) or AI serving stacks that need every ounce of performance – these are where MLIR has shined. The most effective MLIR deployments often abstract MLIR behind a higher-level interface. Flang hides MLIR behind normal Fortran semantics for end-users; SiFive’s users see an AI runtime API, not MLIR directly; even OpenXLA exposes a compiler API and uses MLIR internally. This suggests a potential best practice to ease adoption: shield developers from MLIR’s complexity via good APIs or DSLs, so they benefit from it without needing to write MLIR from scratch.
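As a purely illustrative sketch of that best practice, the hypothetical KernelCompiler facade below (all names invented for this example, not taken from any real project) keeps MLIR lowering and pass pipelines as private implementation details and exposes only a small compile call to users.

```python
# Purely illustrative: a hypothetical facade that hides MLIR behind a small API,
# in the spirit of how Flang, SiFive's runtime, and OpenXLA keep MLIR internal.
# None of these names come from a real project; the MLIR-facing steps are stubbed.
from dataclasses import dataclass


@dataclass
class CompiledKernel:
    """Opaque handle returned to users; no MLIR types leak out of the facade."""
    name: str
    target: str
    artifact: bytes


class KernelCompiler:
    """Hypothetical high-level entry point wrapping an internal MLIR pipeline."""

    def __init__(self, target: str = "cpu") -> None:
        self.target = target

    def compile(self, name: str, source: str) -> CompiledKernel:
        ir = self._lower_to_mlir(source)   # front end: user source -> MLIR dialects
        ir = self._run_passes(ir)          # middle end: dialect-to-dialect lowering
        binary = self._codegen(ir)         # back end: LLVM IR / target code
        return CompiledKernel(name, self.target, binary)

    # Stubs standing in for the MLIR-driving internals.
    def _lower_to_mlir(self, source: str) -> str:
        return f"// (would emit MLIR for {len(source)} bytes of source)"

    def _run_passes(self, ir: str) -> str:
        return ir  # would run canonicalization, fusion, bufferization, ...

    def _codegen(self, ir: str) -> bytes:
        return ir.encode()  # would produce object code or a runtime module


# Users see a plain Python API; MLIR never appears in their code.
kernel = KernelCompiler(target="gpu").compile("axpy", "y = a * x + y")
print(kernel.name, kernel.target, len(kernel.artifact))
```

The design choice is the point here: the team that owns the facade absorbs MLIR's complexity once, and everyone downstream gets its benefits through an ordinary API.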
No discussion of MLIR in 2025 is complete without Mojo – a new programming language from Modular (a company founded by Chris Lattner and others) that has been making waves. Mojo is, in many ways, a distillation of what MLIR can enable in software design. It’s billed as a superset of Python, combining Python’s ease with C++/Rust-like performance. Under the hood, Mojo is built entirely on MLIR – in fact, Mojo’s compiler is an MLIR pipeline specialized for the language. This design choice sheds light on what MLIR brings that classic LLVM IR could not.
Mojo’s success so far validates MLIR’s promised benefits. Within a few months of Mojo’s preview release, the Modular team itself used Mojo to write all the high-performance kernels in their AI engine. As we mentioned earlier, Mojo was born because writing those kernels in pure MLIR was too slow – by creating a high-level language that compiles via MLIR, the Modular team combined productivity with performance.
Figure 1.1: “Mojo is built on top of MLIR, which makes it uniquely powerful when writing systems-level code for AI workloads.” (Source: Modular Blog)
Mojo’s compile-time cost is mitigated by MLIR’s design as well – parallelizing and caching in the compiler are easier with MLIR’s explicit pass pipeline, so Mojo can afford to do more heavy analysis without long build times. The language is still young, but it shines a promising light on what’s possible.
(As an aside for readers, Mojo’s use of MLIR is a deep topic on its own. In Building with Mojo (Part 1): A Language Born for AI and Systems, Ivo introduces Mojo’s origins, design goals, and its promise to unify Pythonic ergonomics with AI-scale performance—but only at a high level. Later parts of the series will go deeper into Mojo’s internals, including how MLIR enables compile-time metaprogramming, hardware-specific optimizations, and seamless Python interoperability. To receive these articles in your inbox as soon as they are published, subscribe here.)
MLIR’s trajectory over the past year shows cautious but real momentum toward broader adoption. The community has addressed key pain points like dialect fragmentation with new governance and curated core dialects, while new tooling—such as the Transform dialect presented at CGO 2025—lowers the barrier for tuning compiler optimizations. Proposed additions like a WebAssembly dialect and Clang CIR integration suggest MLIR is expanding beyond its “ML-only” roots into systems compilers and web domains. Industry trends reinforce its relevance: heterogeneous compute continues to grow, and MLIR already underpins projects like OpenXLA with backing from NVIDIA, AMD, Intel, Apple, and AWS. Still, its success depends on balancing generality with usability and proving its value beyond Google and Modular; competing approaches like SPIR‑V and TVM remain viable alternatives. Yet with advocates like Chris Lattner, ongoing research from firms like Meta and DeepMind, and AMD and Fujitsu adopting MLIR for HPC compilers, it’s likely to become a cornerstone of future compiler infrastructure if it maintains this pace.
IREE – MLIR-Based Compiler & Runtime
Intermediate Representation Execution Environment (IREE) is an open-source end-to-end compiler and runtime for machine learning models, built on MLIR. In the OpenXLA ecosystem, IREE serves as a modular MLIR-based compiler toolchain that can lower models from all major frameworks (TensorFlow, PyTorch, JAX, ONNX, etc.) into highly optimized executables for a wide variety of hardware targets.
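To give a flavour of the developer experience, the hedged sketch below compiles a small element-wise multiply module for a CPU target using the iree-compiler Python package; the compile_str entry point and the "llvm-cpu" backend name follow IREE's documented Python API at the time of writing and may change between releases.

```python
# A hedged sketch of compiling a small MLIR module with IREE's Python tooling.
# Requires the `iree-compiler` package; `compile_str` and the "llvm-cpu" backend
# name follow IREE's documented Python API and may shift between releases.
import iree.compiler as ireec

MLIR_MODULE = """
func.func @simple_mul(%a: tensor<4xf32>, %b: tensor<4xf32>) -> tensor<4xf32> {
  %0 = arith.mulf %a, %b : tensor<4xf32>
  return %0 : tensor<4xf32>
}
"""

# Lower the module through IREE's MLIR pipeline to a VM flatbuffer for the CPU.
flatbuffer = ireec.compile_str(
    MLIR_MODULE,
    target_backends=["llvm-cpu"],
)

# The resulting bytes can be written to disk or loaded and executed with the
# companion iree-runtime package (`iree.runtime`) on the matching device.
print(f"Compiled IREE module: {len(flatbuffer)} bytes")
```

Swapping the target backend (for example to a GPU or embedded target) is the main change needed to retarget the same module, which is precisely the portability argument IREE and OpenXLA make.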
Highlights:
That’s all for today. Thank you for reading this issue of Deep Engineering. We’re just getting started, and your feedback will help shape what comes next.
Take a moment to fill out this short survey we run monthly—as a thank-you, we’ll add one Packt credit to your account, redeemable for any book of your choice.
We’ll be back next week with more expert-led content.
Stay awesome,
Divya Anne Selvaraj
Editor-in-Chief, Deep Engineering
If your company is interested in reaching an audience of developers, software engineers, and tech decision makers, you may want to advertise with us.