





















































GoLab 2025 promises a rich and diverse program crafted to elevate the skills and insights of every attendee, from aspiring Gophers to seasoned experts. The agenda features a comprehensive array of:
> In-depth Workshops: Hands-on learning experiences for practical skill development.
> Technical Talks: Presentations on the latest advancements, best practices, and challenges in Go development.
> Lightning Talks: Quick, insightful sessions that spark new ideas and discussions.
Use code PKT15SP for a 15% discount on all ticket types.
Hi,
Welcome to the eleventh issue of Deep Engineering.
LLVM has long been celebrated for its modular frontend and optimizer. But for years, its backend—the part responsible for turning IR into machine code—remained monolithic, with instruction selectors like SelectionDAG and FastISel combining multiple responsibilities in a single, opaque pass. That’s now changing, as modular pipelines begin to reshape how LLVM handles instruction selection.
This issue delves into GlobalISel, the instruction selection framework designed to replace SelectionDAG and FastISel with a more modular, testable, and maintainable architecture. Built around a pipeline of distinct passes—IR translation, legalization, register bank selection, and instruction selection—GlobalISel improves backend portability, supports new Instruction Set Architectures (ISAs) like RISC-V, and makes it easier to debug and extend LLVM across targets.
To understand the design decisions behind GlobalISel—and the broader implications for backend engineering—we spoke with its architect, Quentin Colombet. A veteran LLVM contributor who joined Apple in 2012, Colombet has worked across CPU, GPU, and DSP backends and is also the code owner of LLVM’s register allocators. His perspective anchors our analysis of the trade-offs, debugging strategies, and real-world impact of modular code generation.
We also include an excerpt from LLVM Code Generation (Packt, 2025), Colombet’s new book. The selected chapter introduces TableGen, LLVM’s domain-specific language for modeling instructions and backend logic—a central tool in GlobalISel's extensibility, despite its sharp edges.
You can watch the complete interview and read the transcript here or scroll down to read the feature and book excerpt.
LLVM’s instruction selection was long dominated by SelectionDAG and FastISel, both monolithic frameworks that performed legalization, scheduling, and selection in a single pass per basic block. This design limited code reuse and optimization scope. GlobalISel was created to improve performance, granularity, and modularity. It operates on whole functions and uses Machine IR (MIR) directly, avoiding the need for a separate IR like SelectionDAG. This reduces overhead and improves compile times. While AArch64’s GlobalISel was initially slower than x86’s DAG selector at -O0, ongoing work has closed the gap; by LLVM 18, GlobalISel’s fast path was within 1.5× of FastISel.
Perhaps more importantly, GlobalISel breaks down instruction selection into independent passes. Rather than one big conversion, it has a pipeline: IR translation, legalization of unsupported types, register bank selection, and actual instruction selection. Quentin Colombet, LLVM’s GlobalISel architect, explains that in SelectionDAG
“all those steps happen in one monolithic pass…It’s a black box. But with GlobalISel, it’s a set of distinct optimization passes. Between those passes, you can insert your own target-specific or generic optimizations. That modularity gives you better flexibility, more opportunities for code reuse, and makes debugging and testing easier.”
GlobalISel is designed as a toolkit of reusable components. Targets can share the common Core Pipeline and customize only what they need. Even the fast-O0 and optimized-O2 selectors now use the same pipeline structure, just configured differently. This is a big change from the past, where ports often had to duplicate logic across FastISel and SelectionDAG. The modular design not only avoids code duplication, it establishes clear debug boundaries between stages. If a bug or suboptimal codegen is observed after instruction selection, a backend engineer can pinpoint whether it originated in the legalization phase, the register banking phase, or elsewhere, by inspecting the MIR after each pass. LLVM’s infrastructure supports dumping the MIR at these boundaries, making it far easier to diagnose issues than untangling a single mega-pass. As Colombet quips,
“Instruction selection actually involves multiple steps…From the start, [GlobalISel] has a much more modular design.”
The benefit is that each phase (e.g. illegal operation handling) can be tested and understood in isolation.
A clear motivation for this overhaul is target portability. LLVM today must cater to a wide variety of architectures – not just x86 and ARM, but RISC-V (with its ever-expanding extensions), GPUs, DSPs, FPGAs, and more. A monolithic selector makes it hard to support radically different ISAs without accumulating lots of target-specific complexity. GlobalISel’s design, by contrast, forces a clean separation of concerns that parallels how one thinks about a new target. There are four major target hooks in GlobalISel, corresponding to the key decisions a backend must make: call lowering (how arguments and return values cross function boundaries), legalization (which types and operations the target supports natively), register bank selection (which register file each value should live in), and instruction selection proper (which machine instructions implement each generic operation).
Each of these components is relatively self-contained. When bringing LLVM to a new architecture, developers can implement and test them one by one. Colombet advises keeping the big picture in mind:
“There’s no single right way to do instruction selection…because GlobalISel is modular, it’s easy to look at just one piece at a time. But if you’re not careful, those pieces may not fit together properly, or you may end up implementing functionality that doesn’t even make sense in the broader pipeline.”
In practice, the recommended approach is to first ensure you can lower a simple function end-to-end (even if using slow or naive methods), then refine each stage knowing it fits into the whole. This incremental path is much more feasible with a pipelined design than it was with SelectionDAG’s all-or-nothing pattern matching.
Real-world experience shows the value of this approach. RISC-V, for instance, has been rapidly adding standard and vendor-specific extensions. LLVM 20 and 21 have seen numerous RISC-V backend updates – from new bit-manipulation and crypto instructions to the ambitious V vector extension. With GlobalISel, adding support for a new instruction set extension often means writing TableGen patterns or legality rules without touching the core algorithm. In early 2025, LLVM’s RISC-V backend even implemented vendor extensions like Xmipscmove and Xmipslsp for custom silicon.
This kind of targeted enhancement – adding a handful of operations in one part of the pipeline – is exactly what the modular design enables. It’s telling that as soon as the core GlobalISel framework matured, targets like AArch64 and AMDGPU quickly adopted it for their O0 paths, and efforts are underway to make it the default at higher optimization levels.
New CPU architectures (for example, a prospective future CPU with unusual 128-bit scalar types) can be accommodated by plugging in a custom legalizer and reusing the rest of the pipeline. And non-traditional targets stand to gain as well. Apple’s own GPU architecture, which Colombet has worked on, was one early beneficiary of a GlobalISel-style approach – its unusual register and instruction structure could be cleanly modeled through custom RegisterBank and Legalizer logic, rather than fighting a general-purpose DAG matcher.
The result is that LLVM’s backend is better positioned to embrace emerging ISAs. As Colombet noted,
“The spec [for RISC-V] is still evolving, and people keep adding new extensions. As those extensions mature, they get added to the LLVM backend…If your processor supports a new, more efficient instruction, LLVM can now use it.”
Another aspect of portability is code reuse across targets. GlobalISel makes it possible to write generic legalization rules – for example, how to lower a 24-bit integer multiply using 32-bit operations – once in a target-independent manner. Targets can then opt into those rules or override them with a more optimal target-specific sequence. In SelectionDAG, some of that was possible, but GlobalISel is designed with such flexibility in mind from the start. This pays off when supporting families of architectures (say, many ARM variants or entirely new ones) – one can leverage the existing passes instead of reinventing the wheel. Even the register allocator and instruction scheduling phases (which come after instruction selection) can benefit from more uniform input thanks to GlobalISel producing consistent results across targets.
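As a sketch of what such a rule looks like in practice: in GlobalISel, legalization is declared per opcode in a target's LegalizerInfo subclass. The fragment below assumes a hypothetical MyTarget backend and the LLVM backend headers (helper names vary slightly across LLVM versions); it shows a narrow multiply being widened to the nearest legal width instead of needing a custom lowering:

```cpp
// Hypothetical target: declares which G_MUL types are legal and how
// everything else (e.g. a 24-bit multiply) gets rewritten.
MyTargetLegalizerInfo::MyTargetLegalizerInfo(const MyTargetSubtarget &ST) {
  using namespace llvm;
  const LLT s32 = LLT::scalar(32);
  const LLT s64 = LLT::scalar(64);

  getActionDefinitionsBuilder(TargetOpcode::G_MUL)
      .legalFor({s32, s64})         // these widths map directly to hardware
      .widenScalarToNextPow2(0)     // s24 -> s32, s48 -> s64, ...
      .clampScalar(0, s32, s64);    // never narrower than s32, wider than s64

  getLegacyLegalizerInfo().computeTables();
}
```

A target that later gains a native narrow multiply would simply add that type to `legalFor` and drop the widening rule, without touching the rest of the pipeline.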
The switch to a modular backend isn’t just about adding features – it also improves the day-to-day experience of compiler engineers maintaining and debugging the code generator. With the old monolithic pipeline, a failure in codegen (like an incorrect assembly sequence or a compiler crash) often required reverse-engineering the entire selection process. By contrast, GlobalISel’s structured passes and the use of MIR make it far more tractable. Engineers can inspect the MIR after each stage (translation, legalize, register assignment, etc.) using LLVM’s debugging flags, to see where things start to diverge from expectations. For instance, if an out-of-range immediate wasn’t properly handled, the issue will be visible right after the Legalizer pass – before it ever propagates to final assembly. This clear separation of concerns reduces the cognitive load in debugging.
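For instance, assuming a recent LLVM build with the AArch64 backend, llc can stop the pipeline after any GlobalISel pass and print the MIR at that boundary (input.ll is a placeholder):

```shell
# Print the MIR right after the legalizer has run.
llc -mtriple=aarch64-- -global-isel -stop-after=legalizer input.ll -o -

# Other stop points: irtranslator, regbankselect, instruction-select.
llc -mtriple=aarch64-- -global-isel -stop-after=regbankselect input.ll -o -

# Or dump the MIR after every pass, with the machine verifier enabled.
llc -mtriple=aarch64-- -global-isel -print-after-all -verify-machineinstrs \
    input.ll -o /dev/null
```

Diffing the dumps from two adjacent stop points narrows a miscompile to a single pass.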
Colombet emphasizes testing and debugging as first-class considerations. He advocates using tools like llvm-extract and llvm-reduce to isolate the function or instruction that triggers a bug.
“Instead of debugging hundreds or thousands of lines, you end up with 10 lines that still reproduce the problem. That’s a huge productivity win,” Colombet says of minimizing test cases.
With GlobalISel, this strategy can be taken even further. Each pass in the pipeline can often be run on its own, enabling unit-test-like isolation. LLVM’s verifier checks invariants between passes, so errors tend to surface closer to their source.
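Concretely, that minimization loop looks something like this (file and script names are hypothetical; interesting.sh must exit 0 exactly when the bug still reproduces):

```shell
# Pull just the offending function out of a large module.
llvm-extract --func=offending_fn big_module.ll -S -o extracted.ll

# Shrink it further: llvm-reduce repeatedly mutates the IR and keeps any
# change that still satisfies the interestingness test. The minimized
# module is written to reduced.ll by default.
llvm-reduce --test=./interesting.sh extracted.ll
```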
This modular design yields tangible benefits: each pass can be exercised in isolation, invariant violations are caught between stages, and bugs surface close to their source rather than at the end of a monolithic pass.
TableGen, for its part, remains a double-edged sword. GlobalISel backends rely heavily on it to define matching rules, allowing reuse across targets. But the tooling is infamously brittle. As Colombet puts it:
“TableGen is kind of the hated child in LLVM… The syntax alone doesn't tell you the semantics… what your code means depends on how it’s used in the backend generator. And the error messages are often vague or inconsistent… everyone in the LLVM community kind of dislikes TableGen.”
Despite its flaws, TableGen is central to GlobalISel’s maintainability. It helps abstract instruction complexity into compact, reusable rules — a major win for modern ISAs.
Backend stability is also reinforced by fuzzing. Tools like llvm-isel-fuzzer generate random IR to stress-test instruction selectors, uncovering obscure failures that user test cases might miss. Colombet highlights their importance, especially in contexts like GPU drivers:
“In contexts like GPU drivers, a compiler crash could potentially be exploited, so hardening the backend against unexpected input is vital.”
While fuzzing doesn’t improve performance, it ensures each GlobalISel pass handles unexpected inputs robustly. Over time, this approach, combining modularity, reproducibility, automation, and stress-testing, has made LLVM’s backend infrastructure more resilient and easier to evolve.
LLVM’s move toward a modular backend reflects two broader architectural shifts in computing: the rise of heterogeneous computing, which LLVM addresses through MLIR, and the growing use of machine learning to guide compiler decisions, exemplified by projects like MLGO. Both point to the same trend: modularity, data-driven optimization, and architectural flexibility in modern compilers.
As heterogeneous systems become standard, combining CPUs, GPUs, and specialized accelerators, compilers must generate efficient code across dissimilar targets, and optimize across their boundaries. LLVM’s response is Multi-Level Intermediate Representation (MLIR) which we covered in Deep Engineering #9, a flexible, extensible IR framework that sits above traditional LLVM IR and enables high-level, domain-specific optimizations before lowering to machine code.
Colombet explains:
“With MLIR, you can model both your CPU and GPU modules within the same IR. That opens up optimization opportunities across different targets… you could move computations between devices more easily or apply cost models to decide what should run where.”
This enables compilers to consider cross-device trade-offs early in the pipeline — for example, determining whether a tensor operation should run on a GPU or CPU based on context or cost. MLIR achieves this via a layered, dialect-based design: each dialect captures a different level of abstraction (e.g., tensor algebra, affine loops, GPU kernels), which can be progressively lowered. Once it reaches LLVM IR, the standard code generation path, including GlobalISel, takes over.
MLIR’s layering complements GlobalISel: domain-specific optimization happens above LLVM IR, while the modular backend handles whatever lowered code arrives, across both conventional and unconventional targets.
Although GlobalISel doesn’t directly manage CPU–GPU splitting, its modular design makes it easier to support unconventional targets cleanly, whether an Apple GPU or a DSP with custom arithmetic units. The combination of MLIR’s flexible front-end IR and GlobalISel’s extensible backend forms a coherent pipeline for future hardware.
A second major shift, still largely experimental, is the integration of machine learning inside the compiler itself. Research tools like Machine Learning Guided Optimization (MLGO) have shown promising results in replacing fixed heuristics with learned policies. In 2021, Trofin et al. used reinforcement learning to drive LLVM’s inliner, achieving ~5% code size reductions at -Oz with only ~1% additional compile time. The same framework was applied to register allocation, learning spill strategies that occasionally outperformed the default greedy allocator.
Colombet sees real potential here:
“Compilers are full of heuristics, and machine learning is great at discovering heuristics we never would’ve thought of.”
But he’s also clear about the practical challenges. First is the problem of feature extraction — the task of encoding program state into meaningful inputs for a model:
“To use an analogy: could you price a house just by counting the number of windows? There’s probably some correlation, but it’s not enough. Similarly, in something like register allocation, the features you use to train your model may not carry enough information.”
Even with good features, integration into the backend is nontrivial. LLVM’s register allocator and GlobalISel weren’t built with explicit “decision points” for ML models to hook into.
“If all you can do is tweak some knobs from the outside, you may not be able to make meaningful improvements… do we need to write our own instruction selector or register allocator to take full advantage of machine learning? I think the answer is yes – but we’ll see.”
The implication is that further modularization may be needed — isolating backend subproblems (like spill code insertion or instruction choice) into well-defined, pluggable interfaces. This would allow learned components to replace or guide specific decisions without requiring wholesale rewrites. Such a hybrid model — rule-based infrastructure augmented by ML at critical junctures — aligns with the trajectory GlobalISel already began: decoupling backend logic into testable, replaceable units.
Whether through MLIR’s IR layering or MLGO’s data-driven policies, the common trend is clear: LLVM’s backend is evolving toward composability, configurability, and adaptability by refactoring it into pieces that are easier to understand, reuse, and eventually learn. By decomposing code generation into well-defined passes, LLVM has made it easier to support new ISAs such as RISC-V, extend to targets like GPUs and DSPs, and integrate with tools like MLIR. The transition is still ongoing, and trade-offs remain—compile-time costs, tooling gaps, and the complexity of mixing TableGen with C++—but the payoff is clear: a backend that is more debuggable, more maintainable, and better prepared for architectural change. As machine learning and domain-specific IRs reshape the frontend, GlobalISel ensures that the backend can evolve in parallel. It is not just a rewrite; it is infrastructure for the next era of compilers.
If the architectural case for modular code generation in LLVM caught your attention, Quentin Colombet’s book, LLVM Code Generation, offers the definitive deep dive. Colombet, the architect behind GlobalISel, takes readers inside the backend machinery of LLVM—from instruction selection and register allocation to debugging infrastructure and TableGen. The following excerpt—Chapter 6: TableGen – LLVM’s Swiss Army Knife for Modeling—introduces the declarative DSL that powers much of LLVM’s backend logic. It explains how TableGen structures instruction sets, eliminates boilerplate, and underpins the extensibility that modular backends depend on.
For every target, a compiler infrastructure must model a great many details: the instruction set, the registers, the calling conventions, and so on.
This list is not exhaustive, but the point is that you need to model a lot of details of a target in a compiler infrastructure.
While it is possible to implement everything with your regular programming language, such as C++, you can find more productive ways to do so. In the LLVM infrastructure, this takes the form of a domain-specific language (DSL) called TableGen.
In this chapter, you will learn the TableGen syntax and how to work your way through the errors reported by the TableGen tooling. These skills will help you be more productive when working with this part of the LLVM ecosystem.
This chapter focuses on TableGen itself, not the uses of its output through the LLVM infrastructure. How the TableGen output is used is, as you will discover, TableGen-backend-specific and will be covered in the relevant chapters. Here, we will use one TableGen backend to get you accustomed to the structure of the TableGen output, starting you off on the right foot for the upcoming chapters.
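To give a feel for the syntax before diving in, here is a tiny, made-up TableGen file (not from the book): a class is a parameterized record template, and each def instantiates one concrete record.

```tablegen
// A class is a template that parameterizes a family of records.
class Inst<string mnemonic, int size> {
  string Mnemonic = mnemonic;
  int Size = size;        // encoding size in bytes
  bit IsPseudo = 0;
}

// Each def produces one concrete record.
def ADD : Inst<"add", 4>;
def NOP : Inst<"nop", 4> {
  let IsPseudo = 1;       // override a default for this record only
}
```

Feeding this to llvm-tblgen with no backend selected (`llvm-tblgen insts.td`) prints every record with all fields resolved, which is the raw material each TableGen backend consumes.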
LLVM Code Generation is for both beginners to LLVM and experienced LLVM developers. If you’re new to LLVM, it offers a clear, approachable guide to compiler backends, starting with foundational concepts. For seasoned LLVM developers, it dives into less-documented areas such as TableGen, MachineIR, and MC, enabling you to solve complex problems and expand your expertise.
Use code LLVM20 for 20% off at packtpub.com.
DirectX Shader Compiler (DXC) – HLSL Compiler Based on LLVM/Clang
DXC is Microsoft’s official open-source compiler for High-Level Shader Language (HLSL), built on LLVM and Clang. It supports modern shader development for Direct3D 12 and Vulkan via SPIR-V, and is widely used in production graphics engines across the gaming and visual computing industries.
That’s all for today. Thank you for reading this issue of Deep Engineering. We’re just getting started, and your feedback will help shape what comes next.
Take a moment to fill out this short survey we run monthly. As a thank-you, we’ll add one Packt credit to your account, redeemable for any book of your choice.
We’ll be back next week with more expert-led content.
Stay awesome,
Divya Anne Selvaraj
Editor-in-Chief, Deep Engineering
If your company is interested in reaching an audience of developers, software engineers, and tech decision makers, you may want to advertise with us.