
Deep Engineering

Deep Engineering #14: Mihalis Tsoukalos on Go’s Concurrency Discipline

Divya Anne Selvaraj
21 Aug 2025
Contexts, cancellations, and bounded work—plus Chapter 8 from Mastering Go

Mastering Memory in C++ with Patrice Roy

More than 70% of severe vulnerabilities come from memory safety errors. This masterclass will show you how to write C++ that isn't part of that problem. Join Patrice Roy — ISO C++ Standards Committee member and author of C++ Memory Management — for a 2-day live masterclass on writing safe, efficient, and robust C++ code.

What you'll learn (hands-on):
✔ Smart pointers and RAII for predictable ownership
✔ Exception-safe, high-performance techniques
✔ Debugging leaks, alignment, and ownership issues
✔ Building memory-safe code that performs under pressure

Patrice has taught C++ since 1998, trained professional programmers for over 20 years, and speaks regularly at CppCon. This masterclass distills his experience into practical skills you can apply immediately in production. Use code DEEPENG30 for 30% off.

Register Now

Hi, welcome to the fourteenth issue of Deep Engineering.

Go 1.25 has arrived with container-aware GOMAXPROCS defaults—automatically sizing parallelism to a container's CPU limit and adjusting as limits change—so services avoid kernel throttling and the tail-latency spikes that follow. This issue applies the same premise at the code level—structure concurrency to real capacity with request-scoped contexts, explicit deadlines, and bounded worker pools—so behavior under load is predictable and observable.

For today's issue we spoke with Mihalis Tsoukalos, a UNIX systems engineer and prolific author of Go Systems Programming and Mastering Go (4th ed.). He holds a BSc (University of Patras) and an MSc (UCL), has written for Linux Journal, USENIX ;login:, and C/C++ Users Journal, and brings deep systems, time-series, and database expertise.

We open with a feature on request-scoped concurrency, cancellations, and explicit limits—then move straight into the complete Chapter 8: Go Concurrency from Mastering Go. You can watch the interview and read the complete transcript here, or scroll down for today's feature.

📢 Important: Deep Engineering is Moving to Substack

In two weeks, we'll be shifting Deep Engineering fully to Substack. From that point forward, all issues will come from [email protected]. To ensure uninterrupted delivery, please whitelist this address in your mail client. No other action is required. You'll continue receiving the newsletter on the same weekly cadence, and on Substack you'll also gain more granular control over preferences if you wish to adjust them later. We'll send a reminder in next week's issue as the cutover approaches.

Sign Up | Advertise

Structured Concurrency in Go for Real-World Reliability with Mihalis Tsoukalos

Go's structured concurrency model represents a set of disciplined practices for building robust systems. By tying goroutines to request scopes with context, deadlines, and limits, engineers can prevent leaks and overload, achieving more predictable, observable behavior under production load.

Why Structured Concurrency Matters in Go (and What It Prevents)

In production Go services, concurrency must be deliberate. Structured concurrency means organizing goroutines with clear lifecycles—so no worker is left running once its purpose is served.
This prevents common failure modes like memory leaks, blocked routines, and resource exhaustion from runaway goroutines. As Mihalis Tsoukalos emphasizes, concurrency in Go "is not just a feature—it's a design principle. It influences how your software scales, how efficiently it uses resources, and how it behaves under pressure".

Unstructured use of goroutines (e.g. spawning on every request without coordination) can lead to unpredictable latencies and crashes. In contrast, a structured approach ensures that when a client drops a request or a deadline passes, all related goroutines cancel promptly. The result is a system that degrades gracefully instead of accumulating ghosts and locked resources.

Request-Scoped Concurrency with Context and Cancellation

Go's context.Context is the cornerstone of request-scoped concurrency. Every inbound request or task should carry a Context that child goroutines inherit, allowing coordinated cancellation and timeouts. By convention, functions accept a ctx parameter and propagate it downward. As Tsoukalos advises, "always be explicit about goroutine ownership and lifecycle" by using contexts for cancellation—this way, goroutines "don't hang around longer than they should, avoiding memory leaks and unpredictable behavior".

A common pattern is to spawn multiple sub-tasks and cancel all of them if one fails or the client disconnects. The golang.org/x/sync/errgroup package provides a convenient way to manage such groups of goroutines with a shared context. Using errgroup.WithContext, each goroutine returns an error, and the first failure cancels the group's context, immediately signaling siblings to stop. Even without this package, you can achieve similar structure with sync.WaitGroup and manual cancellation signals, but errgroup streamlines error propagation.

The following is a snippet from Mastering Go, 4th Ed. demonstrating context cancellation in action. A goroutine is launched to simulate some work and then cancel the context, while the main logic uses a select to either handle normal results or react to cancellation:

c1, cancel := context.WithCancel(context.Background())
defer cancel()

go func() {
	time.Sleep(4 * time.Second)
	cancel()
}()

select {
case <-c1.Done():
	fmt.Println("Done:", c1.Err())
	return
case r := <-time.After(3 * time.Second):
	fmt.Println("result:", r)
}

Listing: Using context.WithCancel to tie a goroutine's work to a cancelable context.

In this example, if the work doesn't finish before the context is canceled (or a 3-second timeout elapses), the Done channel is closed and the function prints the error (e.g. "context canceled"). In real services, you would derive the context from an incoming request (HTTP, RPC, etc.), use context.WithTimeout or WithDeadline to bound its lifetime, and pass it into every database call or external API request. All goroutines spawned to handle that request listen for ctx.Done() and exit when cancellation or deadline occurs. This structured approach prevents goroutine leaks – every launched goroutine is tied to a request context that will be canceled on completion or error. It also centralizes error handling: the context's error (such as context.DeadlineExceeded) signals a timeout, which can be logged or reported upstream in a consistent way.
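To make the errgroup pattern described above concrete, here is a minimal sketch (not from the chapter; the fetch function and its task names are illustrative stand-ins) of errgroup.WithContext cancelling sibling goroutines when one task fails:

// Minimal sketch (illustrative, not from Mastering Go): errgroup.WithContext
// ties a group of sub-tasks to one request-scoped context; the first error
// cancels the shared context so the surviving goroutines exit instead of leaking.
package main

import (
	"context"
	"errors"
	"fmt"
	"time"

	"golang.org/x/sync/errgroup"
)

func fetch(ctx context.Context, name string, d time.Duration) error {
	select {
	case <-time.After(d): // simulated work
		fmt.Println(name, "done")
		return nil
	case <-ctx.Done(): // a sibling failed or the caller cancelled
		return ctx.Err()
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	g, ctx := errgroup.WithContext(ctx)
	g.Go(func() error { return fetch(ctx, "users", 100*time.Millisecond) })
	g.Go(func() error { return fetch(ctx, "orders", 300*time.Millisecond) })
	g.Go(func() error { return errors.New("payments backend unavailable") })

	// Wait returns the first non-nil error after all goroutines have finished.
	if err := g.Wait(); err != nil {
		fmt.Println("request failed:", err)
	}
}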
Bounding Concurrency and Backpressure with Semaphores and Channels

Another key to structured concurrency is bounded work. Go's goroutines are cheap, but they aren't free – unchecked concurrency can exhaust memory or overwhelm databases. Tsoukalos warns that just because goroutines are lightweight, you shouldn't "spin up thousands of them without thinking. If you're processing a large number of tasks or I/O operations, use worker pools, semaphores, or bounded channels to keep things under control". In practice, this means limiting the number of concurrent goroutines doing work for a given subsystem. By applying backpressure (through limited buffer channels or tokens), you avoid queueing infinite work and crashing under load.

One simple pattern is a worker pool: maintain a fixed pool of goroutines that pull tasks from a channel. This provides controlled concurrency — "you're not overloading the system with thousands of goroutines, and you stay within limits like memory, file descriptors, or database connections," as Tsoukalos notes. The system's behavior under load becomes predictable because you've put an upper bound on parallel work.

Another powerful primitive is a weighted semaphore. The Go team provides golang.org/x/sync/semaphore for this purpose. You can create a semaphore with weight equal to the maximum number of workers, then acquire a weight of 1 for each job. If all weights are in use, further acquisitions block – naturally throttling the input. The following code (from the Mastering Go chapter) illustrates a semaphore guarding a section of code that launches goroutines:

Workers := 4
sem := semaphore.NewWeighted(int64(Workers))
results := make([]int, nJobs)
ctx := context.TODO()

for i := range results {
	if err := sem.Acquire(ctx, 1); err != nil {
		fmt.Println("Cannot acquire semaphore:", err)
		break
	}
	go func(i int) {
		defer sem.Release(1)
		results[i] = worker(i) // do work and store result
	}(i)
}

// Block until all workers have released their permits:
_ = sem.Acquire(ctx, int64(Workers))

Listing: Bounded parallelism with a semaphore limits workers to Workers at a time.

In this pattern, no more than 4 goroutines will be active at once because any additional Acquire(1) calls must wait until a permit is released. The final Acquire of all permits is a clever way to wait for all workers to finish (it blocks until it can acquire Workers permits, i.e. until all have been released). Bounded channels can achieve a similar effect: for example, a buffered channel of size N can act as a throttle by blocking sends when N tasks are in flight. Pipelines, a series of stages connected by channels, also inherently provide backpressure – if a downstream stage is slow or a channel is full, upstream goroutines will pause on send, preventing unlimited buildup. The goal in all cases is the same: limit concurrency to what your system resources can handle. Recent runtime changes in Go 1.25 even adjust GOMAXPROCS automatically to the container's CPU quota, preventing the scheduler from running too many threads on limited CPU (go.dev). By design, structured concurrency forces us to think in terms of these limits, so that a surge of traffic translates to graceful degradation (e.g. queued requests or slower processing) rather than a self-inflicted denial of service.
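As a companion to the semaphore listing, here is a minimal sketch (not from the chapter; the process function and its job count are made up) of the buffered-channel throttle mentioned above:

// Minimal sketch: a buffered channel of size N acts as a token bucket; a send
// blocks once N tasks are in flight, applying backpressure to the producer.
package main

import (
	"fmt"
	"sync"
	"time"
)

func process(job int) {
	time.Sleep(100 * time.Millisecond) // simulated work
	fmt.Println("processed", job)
}

func main() {
	const limit = 4
	tokens := make(chan struct{}, limit) // capacity = maximum in-flight work
	var wg sync.WaitGroup

	for job := 0; job < 20; job++ {
		tokens <- struct{}{} // blocks when `limit` workers are already running
		wg.Add(1)
		go func(job int) {
			defer wg.Done()
			defer func() { <-tokens }() // release the slot when done
			process(job)
		}(job)
	}
	wg.Wait()
}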
Observability and Graceful Shutdown in Practice

Structured concurrency not only makes systems more reliable during normal operation, but also improves their observability and shutdown behavior. With context-based cancellation, timeouts and cancellations surface explicitly as errors that can be logged and counted, rather than lurking silently. For instance, if a database call times out, Go returns a context.DeadlineExceeded error that you can handle – perhaps logging a warning with the operation name and duration. These error signals let you differentiate between a real failure (bug or unavailable service) and an expected timeout. In metrics, you might track the rate of context cancellations or deadlines exceeded to detect slowness in dependencies. Similarly, because every goroutine is tied to a context, you can instrument how many goroutines are active per request or service. Go's pprof and runtime metrics make it easy to measure goroutine count; if it keeps rising over time, that's a red flag for leaks or blocked goroutines. By structuring concurrency, any unexpected goroutine buildup is easier to trace to a particular code path, since goroutines aren't spawned ad-hoc without accountability.

Shutdown sequences also benefit. In a well-structured Go program, a SIGINT (Ctrl+C) or termination signal can trigger a cancellation of a root context, which cascades to cancel all in-flight work. Each goroutine will observe ctx.Done() and exit, typically logging a final message. Using deadlines on background work ensures that even stuck operations won't delay shutdown indefinitely – they'll timeout and return. The result is a clean teardown: no hanging goroutines or resource leaks after the program exits.

As Tsoukalos puts it, "goroutine supervision is critical. You need to track what your goroutines are doing, make sure they shut down cleanly, and prevent them from sitting idle in the background". This discipline means actively monitoring and controlling goroutines' lifecycle in code and via observability tools. Production Go teams often implement heartbeat logs or metrics for long-lived goroutines to confirm they are healthy, and use context to ensure any that get stuck can be cancelled. In distributed tracing systems, contexts carry trace IDs and cancellation signals across service boundaries, so a canceled request's trace clearly shows which operations were aborted. All of this contributes to a system where concurrency is not a source of mystery bugs – instead, cancellations, timeouts, and errors become first-class, visible events that operators can understand and act upon.
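A minimal sketch of the shutdown cascade described above (not from the chapter; the worker loop is a hypothetical stand-in for request handling) using the standard library's signal.NotifyContext:

// Minimal sketch: a root context derived from OS signals cascades cancellation
// to all in-flight goroutines on SIGINT/SIGTERM, giving a clean teardown.
package main

import (
	"context"
	"fmt"
	"os/signal"
	"sync"
	"syscall"
	"time"
)

func worker(ctx context.Context, id int, wg *sync.WaitGroup) {
	defer wg.Done()
	for {
		select {
		case <-ctx.Done(): // root context cancelled: shut down cleanly
			fmt.Printf("worker %d: shutting down (%v)\n", id, ctx.Err())
			return
		case <-time.After(500 * time.Millisecond):
			fmt.Printf("worker %d: heartbeat\n", id) // supervision signal
		}
	}
}

func main() {
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer stop()

	var wg sync.WaitGroup
	for i := 1; i <= 3; i++ {
		wg.Add(1)
		go worker(ctx, i, &wg)
	}

	<-ctx.Done() // block until a termination signal arrives
	fmt.Println("signal received, waiting for workers...")
	wg.Wait() // no goroutines left behind after exit
}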
7-Point Structured Concurrency Checklist for Production

1. Context Everywhere: Pass a context.Context to every goroutine and function handling a request. Derive timeouts or deadlines to avoid infinite waits.
2. Always Cancel (Cleanup): Use defer cancel() after context.WithTimeout/Cancel so resources are freed promptly. Never leave a context dangling.
3. Bound the Goroutines: Limit concurrency with worker pools, semaphores, or bounded channels – don't spawn unbounded goroutines on unbounded work.
4. Propagate Failures: Use errgroup or sync.WaitGroup + channels to wait for goroutines and propagate errors. If one task fails, cancel the rest to fail fast.
5. Graceful Shutdown Hooks: On service shutdown, signal cancellation (e.g. cancel a root context or close a quit channel) and wait for goroutines to finish or timeout.
6. Avoid Blocking Pitfalls: Use buffered channels for high-volume pipelines and select with a default or timeout case in critical loops to prevent global stalls.
7. Instrument & Observe: Monitor goroutine counts, queue lengths, and context errors in logs/traces. A spike in "context canceled" or steadily rising goroutines means your concurrency is getting out of control.

In Go, by consciously scoping and bounding every goroutine – and embracing cancellation as a normal outcome – engineers can build services that stay robust and transparent under stress. The effort to impose this structure pays off with systems that fail gracefully instead of unpredictably, proving that well-managed concurrency is a prerequisite for reliable production Go.

🧠Expert Insight

The complete "Chapter 8: Go Concurrency" from Mastering Go, 4th ed. by Mihalis Tsoukalos

In this comprehensive chapter, Tsoukalos walks you through the production primitives you'll actually use: goroutines owned by a Context, channels when appropriate (and when to prefer mutex/atomics), pipelines and fan-in/out, WaitGroup discipline, and a semaphore-backed pool that keeps concurrency explicitly bounded.

The key component of the Go concurrency model is the goroutine, which is the minimum executable entity in Go. To create a new goroutine, we must use the go keyword followed by a function call or an anonymous function—the two methods are equivalent. For a goroutine or a function to terminate the entire Go application, it should call os.Exit() instead of return. However, most of the time, we exit a goroutine or a function using return because...

Read the Complete Chapter

🛠️ Tool of the Week

Ray – Open-Source, High-Performance Distributed Computing Framework

Ray is an open-source distributed execution engine that enables developers to scale applications from a single machine to a cluster with minimal code changes.

Highlights:
Easy Parallelization: Ray offers a simple API (e.g. the @ray.remote decorator) to turn ordinary functions into distributed tasks, running across cores or nodes with minimal code modifications and hiding the complexity of threads or networking behind the scenes.
Scalable & Heterogeneous: It supports fine-grained and coarse-grained parallelism, efficiently executing many concurrent tasks on a cluster.
Resilient Execution: Built-in fault tolerance means Ray automatically retries failed tasks and can persist state (checkpointing), so even long-running jobs recover from node failures without manual intervention.
Battle-Tested at Scale: It's been deployed on clusters with thousands of nodes (over 1 million CPU cores) for demanding applications – demonstrating robust operation at extreme scale.

Learn more about Ray

📎Tech Briefs

Go 1.25 is released: The version update brings improvements across tools, runtime, compiler, linker, and the standard library, along with opt-in experimental features like a new garbage collector and an updated encoding/json/v2 package.

Container-aware GOMAXPROCS: Go 1.25 introduces container-aware defaults for GOMAXPROCS, automatically aligning parallelism with container CPU limits to reduce throttling, improve tail latency, and make Go more production-ready out of the box.

Combine Or-Channel Patterns Like a Go Expert: Advanced Go Concurrency by Archit Agarwal: Explains the "or-channel" concurrency pattern in Go, showing how to combine multiple done channels into one so that execution continues as soon as any goroutine finishes, and demonstrates a recursive implementation that scales elegantly to handle any number of channels (a minimal sketch follows this list).

Concurrency | Learn Go with tests by Chris James: Shows you how to speed up a slow URL-checking function in Go by introducing concurrency: using goroutines to check multiple websites in parallel, and channels to safely coordinate results without race conditions, making the function around 100× faster while preserving correctness through tests and benchmarks.

Singleflight in Go: A Clean Solution to Cache Stampede by Dilan Dashintha: Explains how Go's singleflight package addresses the cache stampede problem by ensuring that only one request for a given key is in-flight at any time, while other concurrent requests wait and reuse the result.
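For readers who want the gist of the or-channel brief above without clicking through, here is a minimal sketch (not Agarwal's code; the timing helpers are made up) of the recursive pattern:

// Minimal sketch of the or-channel pattern: or combines any number of done
// channels into one that closes as soon as any input closes, recursing to
// handle arbitrary fan-in.
package main

import (
	"fmt"
	"time"
)

func or(channels ...<-chan struct{}) <-chan struct{} {
	switch len(channels) {
	case 0:
		return nil
	case 1:
		return channels[0]
	}
	orDone := make(chan struct{})
	go func() {
		defer close(orDone)
		switch len(channels) {
		case 2:
			select {
			case <-channels[0]:
			case <-channels[1]:
			}
		default:
			select {
			case <-channels[0]:
			case <-channels[1]:
			case <-channels[2]:
			case <-or(append(channels[3:], orDone)...): // recurse on the rest
			}
		}
	}()
	return orDone
}

// after closes its channel once the given duration has elapsed.
func after(d time.Duration) <-chan struct{} {
	c := make(chan struct{})
	go func() { defer close(c); time.Sleep(d) }()
	return c
}

func main() {
	start := time.Now()
	<-or(after(5*time.Second), after(1*time.Second), after(3*time.Second))
	fmt.Println("done after", time.Since(start)) // ~1s: the earliest channel wins
}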
That's all for today. Thank you for reading this issue of Deep Engineering. We're just getting started, and your feedback will help shape what comes next. Do take a moment to fill out this short survey we run monthly—as a thank-you, we'll add one Packt credit to your account, redeemable for any book of your choice.

We'll be back next week with more expert-led content.

Stay awesome,
Divya Anne Selvaraj
Editor-in-Chief, Deep Engineering

If your company is interested in reaching an audience of developers, software engineers, and tech decision makers, you may want to advertise with us.

Deep Engineering #13: Designing Staleness SLOs for Dynamo-Style KV Stores with Archit Agarwal

Divya Anne Selvaraj
14 Aug 2025
Make "eventual" measurable: N/R/W tuning, staleness SLIs, instrumentation, and repair budgets.

Staying sharp in .NET takes more than just keeping up with release notes. You need practical tips, battle-tested patterns, and scalable solutions from experts who've been there. That's exactly what you'll find in .NETPro, Packt's new newsletter, with a free eBook waiting for you as a welcome bonus when you sign up.

Join .NETPro — It's Free

Hi, welcome to the thirteenth issue of Deep Engineering.

Eventual consistency is a fact of life in distributed key-value stores. The operational task is to bound staleness and make it observable.

This issue features a guest article by Archit Agarwal that builds a Dynamo-style store in Go from first principles—consistent hashing, replication, quorums, vector clocks, gossip, and Merkle trees—without hiding the details. Building on it, our feature turns those primitives into a staleness SLO. We cover selecting N/R/W, defining SLIs (stale-read rate, staleness age, convergence time), sizing anti-entropy and hinted-handoff budgets, and placing instrumentation on the read and write paths.

Agarwal is a Principal Member of Technical Staff at Oracle, where he engineers ultra-low-latency authorization services in Go. He also writes The Weekly Golang Journal, focused on turning advanced system design into practical tools, with a consistent emphasis on performance and operational efficiency.

You can start with Agarwal's walkthrough for the mechanics, then read today's feature for SLIs/SLOs, alert thresholds, and more.

Become a C++ Memory Expert and Learn Live with Patrice Roy

40% off for a Limited Time. Use code PRELAUNCH40 at checkout to get a 40% discount - our lowest-ever, available only until August 18th, when we officially launch.

Register Now

Sign Up | Advertise

Designing Staleness SLOs for Dynamo-Style KV Stores with Archit Agarwal

In an eventually consistent, Dynamo-style key-value store, not all reads immediately reflect the latest writes – some reads may return stale data until replication catches up. Staleness is the window during which a read sees an older value than the freshest replica. Defining a Service Level Objective (SLO) for staleness makes this behavior explicit and measurable, so teams can control how "eventual" the consistency is in operational terms.

Control surfaces for staleness

In Dynamo-style systems, three parameters shape staleness behavior: N, R, and W. N is the replication factor (number of replicas per key). R and W are the read and write quorum counts – the minimum replicas that must respond to consider a read or write successful. These define the overlap between readers and writers. If you choose quorums such that R + W > N, every read set intersects every write set by at least one replica, guaranteeing that a read will include at least one up-to-date copy (no stale values) under normal conditions.

Tuning R and W affects latency and availability. A larger R means each read waits for more replicas, reducing the chance of stale data but increasing read latency (and failing if fewer than R nodes are available). A larger W similarly slows writes (and risks write unavailability if W nodes aren't up) but ensures more replicas carry the latest data on write acknowledge.
The replication factor N provides fault tolerance and influences quorum choices: a higher N lets the system survive more failures, but if R and W aren't adjusted, it can also increase propagation delay (more replicas to update) and the quorum sizes needed for consistency. Under network partitions, a Dynamo-style store can choose to continue with a partial quorum (favoring availability at the cost of serving stale data) or pause some operations to preserve consistency – R, W, N settings determine these trade-offs on the CAP spectrum (for example, a low R/W will serve data in a partition but possibly outdated, whereas high R/W might block reads/writes during a partition to avoid inconsistency).

Read path vs. write path: On writes, a coordinating node sends the update to all N replicas but considers the write successful once W replicas have acknowledged it. Only those W (or more) nodes are guaranteed to have the new version when the client gets a "success". The remaining replicas will receive the update asynchronously (hinted handoff or background sync).

Here is a simplified Go snippet enforcing a write quorum:

// Write quorum acknowledgement check
if ackCount >= W {
	fmt.Println("Write successful")
} else {
	fmt.Println("Write failed: insufficient replicas")
}

This check ensures the write isn't confirmed to the client until at least W replicas have persisted it. Operational impact: we can instrument this point to count how often writes succeed versus fail quorum. A high failure rate (ackCount < W) would hurt availability, whereas a success with only W acknowledgments means N - W replicas are still lagging – a window where stale reads are possible.

On reads, the coordinator contacts R replicas (often via a digest query). It waits for R responses and, typically, returns the latest version among those responses to the client (often using timestamps or vector clocks to identify freshness). If R < N, the coordinator might not see some newer replica that wasn't queried, so it's possible the client got a slightly stale value. That's why ensuring quorum overlap (R+W > N) or using R = N mitigates staleness. Still, even with quorums, if a write just succeeded with W acks, there may be N−W replicas not updated yet; a subsequent read that happens at a lower consistency level or before repair could encounter an older copy. In summary, R and W are the dials: crank them up for fresher reads (at the cost of latency/availability), or dial them down for speed and uptime (accepting a higher stale-read window).
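As a companion to the write-path check above, here is a minimal sketch of the read path's freshness logic (illustrative types and metric hooks, not code from Agarwal's store): the coordinator answers with the newest of the R responses it received and records a stale read whenever a newer version is known to exist on a replica outside the read set.

// Minimal sketch (illustrative): pick the freshest of R responses by timestamp,
// and record a stale read plus its staleness age if a newer version exists.
package main

import (
	"fmt"
	"time"
)

type Response struct {
	Replica   string
	Value     string
	Timestamp time.Time // last-write timestamp reported by the replica
}

// freshest returns the newest of the R responses used to answer the client.
func freshest(responses []Response) Response {
	best := responses[0]
	for _, r := range responses[1:] {
		if r.Timestamp.After(best.Timestamp) {
			best = r
		}
	}
	return best
}

func main() {
	now := time.Now()
	readSet := []Response{ // R = 2 replicas answered the coordinator
		{"node-a", "v1", now.Add(-2 * time.Second)},
		{"node-b", "v2", now.Add(-500 * time.Millisecond)},
	}
	answer := freshest(readSet)

	// Hypothetical: the newest timestamp known anywhere in the replica set,
	// e.g. learned from the third replica's digest or a later read repair.
	newestKnown := now
	if newestKnown.After(answer.Timestamp) {
		staleAge := newestKnown.Sub(answer.Timestamp)
		fmt.Printf("stale read: served %s, staleness age %v\n", answer.Value, staleAge)
		// staleReadsTotal.Inc(); stalenessAgeHist.Observe(staleAge.Seconds()) // metric hooks
	}
}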
What to Measure: Staleness SLIs and SLO

To manage staleness, we define Service Level Indicators (SLIs) that capture how stale the data is, and set SLO targets for them. Key metrics include:

Stale-read rate: the fraction of reads that return data older than the newest replica's value at the moment of read. In practice, a "stale read" can be flagged if a read request did not fetch the most up-to-date version that exists in the system. (Detecting this may require the coordinator to compare all R responses or consult a freshness timestamp from a designated primary.) This rate should ideally trend toward 0% once the system has quiesced after writes. It directly indicates how often users see outdated data.

Staleness age: the time difference between the value's timestamp (or version) that a read returned and the latest write timestamp for that item at read time. This measures how old the data is.

Convergence time: how long it takes for a write to propagate to all N replicas. Even after a write is acknowledged (at W nodes), the remaining replicas might get the update later (through gossip or anti-entropy). Convergence time can be measured by tracking the time from write commit to the time when the last replica has applied it. We should aim to keep convergence time low (and predictable) so that the window for stale reads (N−W replicas catching up) is bounded.

Repair backlog: the amount of data needing anti-entropy repair. This can be measured in number of keys or bytes that are out-of-sync across replicas. For example, if using Merkle trees for anti-entropy, we might track how many tree partitions differ between replicas, or how many hints are queued waiting to be delivered. In Cassandra, metrics like Hints_created_per_node reflect the number of pending hinted handoff messages per target node. A growing repair backlog indicates the system is accumulating inconsistency (replicas lagging behind) – which threatens the staleness SLO if not addressed. Operators should budget how much lag is acceptable and tune repair processes to keep this backlog small.

Hinted-handoff queue depth: if the system uses hinted handoff (buffering writes destined for a temporarily down node), this is a specific backlog metric tracking how many hints are stored and waiting. A large queue of hints means one or more replicas have been down or slow for a while and have many writes to catch up on. This directly correlates with staleness: those downed replicas might serve significantly stale data if read (or will cause consistency repair load when they recover). Monitoring the hints queue (count and age of oldest hint) helps ensure a down node doesn't silently violate staleness objectives by falling too far behind.

Vector clock conflict rate: the rate at which concurrent updates are detected, leading to divergent versions (siblings) that need reconciliation. Dynamo-style systems often use vector clocks to detect when two writes happened without knowledge of each other (e.g. during a network partition or offline write merges). Each unique conflict means a client might read two or more versions for the same key – an extreme form of staleness where causal order is unclear. We measure the proportion of operations (or writes) that result in conflict reconciliation. A higher conflict rate suggests the system is frequently writing in partitions or without coordination, requiring merges and possibly exposing clients to multi-version data. Lowering conflict rate (via stronger quorums or a "last write wins" policy) usually reduces stale anomalies at the cost of losing some update history. In Agarwal's Dynamo-Go implementation, vector clocks are represented as:

// Vector clock representation
type VectorClock map[string]int

Each node's counter in this map increments on local updates. When a write is replicated, the vector clocks are merged. If a read finds two concurrent VectorClock states that neither dominates (i.e., different nodes each advanced their own counter), it indicates a conflict. We could emit a metric at that point (e.g. conflict_versions_total++). Tracking this helps quantify how often clients might see non-linear history that needs merging. A rising conflict rate might trigger an alert to consider increasing W or improving network reliability.
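The dominance check described in that last bullet can be sketched as follows (a minimal illustration built on the VectorClock type above; the conflict counter is hypothetical and the example values are made up):

// Minimal sketch: if neither vector clock dominates the other, the updates were
// concurrent and a conflict metric should be incremented.
package main

import "fmt"

type VectorClock map[string]int

// dominates reports whether a is at least as new as b for every node counter.
func dominates(a, b VectorClock) bool {
	for node, bn := range b {
		if a[node] < bn {
			return false
		}
	}
	return true
}

// conflicting is true when neither clock dominates the other (concurrent writes).
func conflicting(a, b VectorClock) bool {
	return !dominates(a, b) && !dominates(b, a)
}

func main() {
	vc1 := VectorClock{"nodeA": 5, "nodeB": 3}
	vc2 := VectorClock{"nodeA": 5, "nodeB": 4}
	fmt.Println(conflicting(vc1, vc2)) // false: vc2 dominates vc1, no conflict

	vc3 := VectorClock{"nodeA": 6, "nodeB": 3}
	fmt.Println(conflicting(vc2, vc3)) // true: concurrent updates, siblings to reconcile
	// if conflicting(vc2, vc3) { conflictVersionsTotal++ } // hypothetical metric hook
}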
With these SLIs defined, we can now set an SLO for staleness. Typically, an SLO will specify a threshold for staleness that should be met a certain percentage of the time. For example, an organization might decide: "95% of reads should have a staleness age below 500 milliseconds, and stale-read occurrences should stay under 0.1% of all reads." Such an SLO sets clear expectations that nearly all reads are fresh (within 0.5s of the latest data) and very few return old data. It's important to pair these objectives with alerting thresholds and operational responses:

Example SLO (Staleness) – Target: P95 staleness age ≤ 500 ms, and stale-read rate ≤ 0.1% (per 1 hour window).

Alerts: If 95th percentile staleness exceeds 500 ms for more than 10 minutes (primary alert), on-call should investigate lagging replicas or network issues (possible causes: replication failing, anti-entropy backlog). If it exceeds 500 ms intermittently (e.g. 5 minutes in an hour – secondary warning), schedule a closer look at load or repair processes. Likewise, if stale-read rate rises above 0.1%, a primary alert signals potential consistency problems – operators might check for nodes down or heavy write load overwhelming W acknowledgments. A secondary alert at 0.05% could warn of a trend toward SLO violation, prompting checks of the hinted-handoff queue or Merkle tree diffs. We also set an absolute convergence time cap: e.g. maximum convergence time 5 s at P99.9. If any write takes more than 5 s to reach all replicas, that's a primary alert (perhaps a replica is stuck or a stream is failing – check the repair service or consider removing the node from rotation). A softer alert at 3–4 s convergence can help catch issues early.

Runbook notes: on stale-read alerts, first identify if a particular replica or region is lagging (e.g. check the repair backlog metrics and hint queues). On convergence-time alerts, verify the anti-entropy jobs aren't backlogged or throttled, and look for network partitions. The SLO is met when these metrics stay within targets.

Anti-Entropy and Repair Budgets

Achieving a staleness SLO requires active repair mechanisms to limit how long inconsistencies persist. Dynamo-style systems use two complementary approaches: read repair and background anti-entropy. Read repair triggers during a read operation when the system discovers that the replicas contacted have mismatched versions. In Cassandra, for example, if a quorum read finds one replica out-of-date, it will update that replica on the spot before returning to the client. The client gets the up-to-date value, and the involved replicas are made consistent. Read repair thus opportunistically burns down staleness for frequently-read data – the more a piece of data is read, the more chances to fix any replica that missed a write. However, read repair alone isn't enough for rarely-read items (which might remain inconsistent indefinitely if never read). That's where background anti-entropy comes in.

Background anti-entropy tasks (often using Merkle trees or similar data digests) run periodically to compare replicas and repair differences in bulk. Each replica maintains a Merkle tree of its key-range; by comparing trees between replicas, the system can find which segments differ without comparing every item. A simple representation of a Merkle tree node in Go might look like:

type MerkleNode struct {
	hash  []byte
	left  *MerkleNode
	right *MerkleNode
}

Using such trees, a background job can efficiently identify out-of-sync keys and synchronize them.
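As an illustration of how such a comparison might proceed (a minimal sketch, not the store's actual repair job; the leaf layout, hashing scheme, and sample keys are assumptions):

// Minimal sketch: build a Merkle tree over an ordered set of key/value hashes
// and walk two trees together, descending only into subtrees whose root hashes
// disagree, to find the leaf ranges that need repair.
package main

import (
	"crypto/sha256"
	"fmt"
)

type MerkleNode struct {
	hash  []byte
	left  *MerkleNode
	right *MerkleNode
}

// build constructs a balanced tree over pre-hashed leaves (one per key range).
func build(leaves [][]byte) *MerkleNode {
	if len(leaves) == 1 {
		return &MerkleNode{hash: leaves[0]}
	}
	mid := len(leaves) / 2
	l, r := build(leaves[:mid]), build(leaves[mid:])
	sum := sha256.Sum256(append(append([]byte{}, l.hash...), r.hash...))
	return &MerkleNode{hash: sum[:], left: l, right: r}
}

// diff records the indices of leaves whose hashes differ between two replicas.
func diff(a, b *MerkleNode, lo, hi int, out *[]int) {
	if string(a.hash) == string(b.hash) {
		return // subtree in sync: skip it entirely
	}
	if a.left == nil { // leaf: this key range needs repair
		*out = append(*out, lo)
		return
	}
	mid := lo + (hi-lo)/2
	diff(a.left, b.left, lo, mid, out)
	diff(a.right, b.right, mid, hi, out)
}

func hashVal(s string) []byte { h := sha256.Sum256([]byte(s)); return h[:] }

func main() {
	replicaA := build([][]byte{hashVal("k0=v1"), hashVal("k1=v1"), hashVal("k2=v1"), hashVal("k3=v1")})
	replicaB := build([][]byte{hashVal("k0=v1"), hashVal("k1=v2"), hashVal("k2=v1"), hashVal("k3=v1")})

	var stale []int
	diff(replicaA, replicaB, 0, 4, &stale)
	fmt.Println("leaf ranges needing repair:", stale) // [1]
}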
The cadence and rate of this repair job act as a budget for staleness: if you run anti-entropy more frequently (or allow it to use more bandwidth), inconsistencies are corrected sooner, reducing worst-case staleness. For example, if repairs run every hour, a replica that missed an update will be stale at most an hour (ignoring hints) before the Merkle tree comparison catches it. If that's too long for your SLO, you might increase repair frequency or switch to continuous incremental repair.

It's important to configure repair rate limits so that anti-entropy doesn't overwhelm the cluster. Repair can be I/O-intensive; throttling it (e.g. limiting streaming bandwidth or number of partitions fixed per second) prevents impact to front-end latency but prolongs how long replicas remain inconsistent. The SLO provides a guideline here: if our SLO is "staleness age P95 < 500ms", and we notice background repairs are taking minutes to hours to cover the dataset, that's a mismatch – we'd need either a faster repair cycle or rely on stronger quorums to mask that delay.

Membership churn (nodes leaving or joining) can rapidly inflate the repair backlog. For instance, when a node goes down, any writes it misses will generate hints and differences. If it's down for 30 minutes, that's 30 minutes of writes to reconcile when it comes back. If nodes frequently fail or if we add new nodes (which require streaming data to them), the system could constantly be in "catch-up" mode. Operators should track how quickly repair debt accrues vs. how fast it's paid off.

Parameter Playbook: N, R, W Trade-offs

To concretely guide tuning, here's a playbook of quorum settings and their qualitative effects. Each row shows a representative (N, R, W) configuration, the quorum overlap (R + W – N), tolerance to failures, and the read/write latency-consistency trade-off:

In practice, many deployments choose a middle ground like (N=3, R=2, W=1) or (N=3, R=1, W=2) for eventually consistent behavior, or (R=2, W=2) for firm consistency. The overlap formula R + W – N indicates how many replicas' data a read is guaranteed to share with the last write; positive overlap means at least one replica in common (so a read will catch that write), zero or negative means it's possible for a read to entirely miss the latest writes. As shown above, larger quorums improve consistency at the expense of latency and fault tolerance. Smaller quorums improve performance and fault tolerance (you can lose more nodes and still operate) but increase the chance of stale responses. When setting an SLO, you can use this table to pick a configuration that meets your freshness targets.

(Note: The table uses N=3 for illustration; higher N follow similar patterns. For instance, (5, 3, 1) has overlap -1 (fast writes, slow-ish reads, likely stale), whereas (5, 3, 3) has overlap +1 (quorum consistency), and (5, 4, 4) would have overlap +3 but little failure tolerance.)

Implementation Hooks and Metrics

Finally, let's tie these concepts to the actual implementation (as in Agarwal's Dynamo-style Go store) and discuss where to instrument. We've already seen how write quorum enforcement is coded and where we could count successes/failures. Another crucial piece is replica selection – knowing which nodes are responsible for a key. Agarwal's store uses consistent hashing to map keys to nodes.
For a given key, the system finds the N replicas in the ring responsible for it:

// Replica selection for a key (basis for R/W placement and convergence measurement)
func (ring *HashRing) GetNodesForKey(key string) ([]ICacheNode, error) {
	h, err := ring.generateHash(key)
	if err != nil {
		return nil, err
	}
	start := ring.search(h)
	seen := map[string]struct{}{}
	nodes := []ICacheNode{}
	for i := start; len(nodes) < ring.config.ReplicationFactor && i < start+len(ring.sortedKeys); i++ {
		vHash := ring.sortedKeys[i%len(ring.sortedKeys)]
		node, _ := ring.vNodeMap.Load(vHash)
		n := node.(ICacheNode)
		if _, ok := seen[n.GetIdentifier()]; !ok {
			nodes = append(nodes, n)
			seen[n.GetIdentifier()] = struct{}{}
		}
	}
	return nodes, nil
}

This function returns the list of nodes that should hold a given key (up to N distinct nodes). It's the backbone of both the write and read paths – writes go to these N nodes, reads query a subset (of size R) of them. From an SLO perspective, GetNodesForKey provides the scope of where we must monitor consistency for each item. We could instrument right after a write is accepted to track convergence. Also, if a read at consistency level < ALL is performed, using this function we could compare the version it got to other replicas' versions – if one of the other replicas has a higher version, that read was stale. This check could increment the stale-read counter. Essentially, GetNodesForKey lets us pinpoint which replicas to compare; it's where we "measure" consistency across the replica set.
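One way to hook convergence-time tracking onto that replica set might look like the following (a sketch only, with stand-in types; the polling approach and the Replica interface are assumptions, not part of Agarwal's store):

// Sketch only: after a write is acknowledged by W replicas, poll the full
// replica set for the key until every replica reports the new version, and
// record the elapsed time as the convergence-time SLI.
package main

import (
	"fmt"
	"time"
)

// Replica is a stand-in for the store's node interface.
type Replica interface {
	Has(key string, version int64) bool
}

type slowReplica struct{ readyAt time.Time }

// Has ignores key/version here; it simply becomes true once readyAt has passed.
func (s slowReplica) Has(key string, version int64) bool { return time.Now().After(s.readyAt) }

// measureConvergence blocks until all replicas hold the version (or the deadline
// passes), returning how long full propagation took after the acknowledgement.
func measureConvergence(key string, version int64, replicas []Replica, deadline time.Duration) (time.Duration, bool) {
	start := time.Now()
	for time.Since(start) < deadline {
		converged := true
		for _, r := range replicas {
			if !r.Has(key, version) {
				converged = false
				break
			}
		}
		if converged {
			return time.Since(start), true
		}
		time.Sleep(50 * time.Millisecond) // polling interval; a real store would watch hints/streams
	}
	return deadline, false // convergence-time cap exceeded: candidate for a primary alert
}

func main() {
	now := time.Now()
	replicas := []Replica{
		slowReplica{readyAt: now},                             // acknowledged immediately (part of W)
		slowReplica{readyAt: now.Add(200 * time.Millisecond)}, // lagging replica (one of N - W)
	}
	d, ok := measureConvergence("user:42", 7, replicas, 5*time.Second)
	fmt.Printf("converged=%v in %v\n", ok, d)
}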
For conflict detection, we already discussed vector clocks. Instrumentation-wise, whenever the system merges vector clocks (after a write or read repair), it can check if the merge resulted in multiple surviving branches. If yes, increment the conflict metric. The VectorClock type above is simple, but in usage, e.g., vc1 := VectorClock{"nodeA": 5, "nodeB": 3} and vc2 := VectorClock{"nodeA": 5, "nodeB": 4} would be compared – if neither dominates, you have a conflict. By observing how often that happens (and perhaps how many versions result), we quantify the "consistency anomalies" experienced.

Throughout the code, there are many places to emit metrics: when writes succeed or fail the quorum check, when read repair runs (count how many rows repaired), size of hinted-handoff queues, etc. The key is to map them to our SLO. For instance, after the "Write successful" log above, we might record the lagging replicas count (N - ackCount) for that write – if >0, that write contributes to potential staleness until those catch up. Summing such lag over time or tracking the max lag can inform convergence times. Similarly, each read could log the staleness age (now - last_write_timestamp seen) for that item. These instrumentations ensure that the theoretical SLI definitions (stale-read rate, staleness age, etc.) have concrete counters and timers in the running system.

With careful tuning (quorum sizes, repair cadence) and diligent monitoring, teams can reap the benefits of high availability while keeping staleness within acceptable bounds. Archit Agarwal's guest article provides the implementation details of these mechanisms in Go:

🧠Expert Insight

Building a Distributed Key-Value Store in Go: From Single Node to Planet Scale by Archit Agarwal

A build-to-learn exercise that walks through the architectural primitives behind Dynamo-style systems.

Read the Complete Article

🛠️ Tool of the Week

FoundationDB – Open-Source, Strongly Consistent Distributed Database

FoundationDB is a distributed key-value store that delivers strict serializable ACID transactions at scale, letting teams build multi-model services (documents, graphs, SQL-ish layers) on a single fault-tolerant core.

Highlights:
End-to-End Transactions: Global, multi-key ACID transactions with strict serializability simplify correctness versus eventually consistent or ad-hoc sharded systems.
Layered Multi-Model: Build higher-level data models (queues, doc/graph, catalog/metadata) as "layers" on top of the core KV engine—one reliable substrate for many services.
Resilience by Design: Automatic sharding, replication, and fast failover; continuous backup/restore and encryption options for enterprise reliability.
Deterministic Simulation Testing: Each release is hammered by large-scale fault-injection simulation, yielding exceptional robustness under node and network failures.

Learn more about FoundationDB

📎Tech Briefs

Skybridge: Bounded Staleness for Distributed Caches by Lyerly et al. | Meta Platforms Inc. and OpenAI: This conference paper describes Skybridge, a lightweight system developed at Meta that provides fine-grained, per-item staleness metadata for distributed caches, enabling best-effort or fail-closed bounded staleness (e.g., two seconds) at global scale by indexing recent writes across all shards, detecting replication gaps, and allowing cache hosts to prove most reads are fresh without re-fills—achieving up to 99.99998% 2-second consistency with minimal CPU, memory, and bandwidth overhead.

DAG-based Consensus with Asymmetric Trust (Extended Version) by Amores-Sesar et al.: This paper proves that naïvely swapping threshold quorums for asymmetric ones breaks DAG common-core ("gather") primitives, then introduces a constant-round asymmetric gather and, from it, the first randomized asynchronous DAG-based consensus for asymmetric trust that decides in expected constant rounds.

Rethinking Distributed Computing for the AI Era by Akshay Mittal | Staff Software Engineer at PayPal: This article calls for rethinking distributed computing for AI, highlighting how current architectures clash with transformer workloads and advocates for AI-native designs such as asynchronous updates, hierarchical communication, and adaptive resource use, drawing on DeepSeek's sparse Mixture-of-Experts model.

Repairing Sequential Consistency in C/C++11 by Lahav et al.: This paper identifies that the C/C++11 memory model's semantics for sequentially consistent (SC) atomics are flawed, and proposes a corrected model called RC11 that restores soundness of compilation, preserves DRF-SC, strengthens SC fences, and prevents out-of-thin-air behaviors.

Amazon SQS Fair Queues: a New Approach to Multi-Tenant Resiliency: Introduced in July 2025, this is a new feature that automatically mitigates noisy neighbor effects in multi-tenant message queues by prioritizing messages from quieter tenants to maintain low dwell times, combining the performance of standard queues with group-level fairness without requiring changes to existing consumer logic.
That's all for today. Thank you for reading this issue of Deep Engineering. We're just getting started, and your feedback will help shape what comes next. Do take a moment to fill out this short survey we run monthly—as a thank-you, we'll add one Packt credit to your account, redeemable for any book of your choice.

We'll be back next week with more expert-led content.

Stay awesome,
Divya Anne Selvaraj
Editor-in-Chief, Deep Engineering

If your company is interested in reaching an audience of developers, software engineers, and tech decision makers, you may want to advertise with us.

Deep Engineering #12: Tony Dunsworth on AI for Public Safety and Critical Systems

Divya Anne Selvaraj
07 Aug 2025
From quantization to synthetic data, how to build AI that's fast, private, and resilient under pressure.

Live Virtual Workshop: Securing Vibe Coding

Join Snyk's Staff Developer Advocate Sonya Moisset on August 28th at 11:00AM ET to learn:
✓ How Vibe Coding is reshaping development and the risks that come with it
✓ How Snyk secures your AI-powered SDLC from code to deployment
✓ Strategies to secure AI-generated code at scale

Earn 1 CPE Credit! Register today!

Hi, welcome to the twelfth issue of Deep Engineering.

"The challenge isn't how to train the biggest model—it's how to make a small one reliable."

That's how Tony Dunsworth sums up his work building AI infrastructure for 911 emergency systems. In public safety, failure can have devastating effects with lives at stake. You're also working with limited compute, strict privacy mandates, and call centers staffed by only two to five people at a time. There's no budget for a proprietary AI stack. And there's no tolerance for downtime.

Dunsworth holds a Ph.D. in data science, with a dissertation focused on forecasting models for public safety answering points. For over 15 years, he's worked across the full data lifecycle—from backend engineering to analytics and deployment—in some of the most sensitive domains in government. Today, he leads AI and data efforts for the City of Alexandria, where he's building secure, on-prem AI systems that help triage calls, reduce response time, and improve operational resilience.

To understand what it takes to design AI systems that are cost-effective, maintainable, and safe to use in critical systems, we spoke with Dunsworth about his use of synthetic data, model quantization, open-weight LLMs, and risk validation under operational load. You can watch the complete interview and read the transcript here or scroll down for our synthesis of what it takes to build mission-ready AI with small teams, tight constraints, and hardly any margin for error.

Ending on August 25, 11:00 AM PT. Learn from top-rated books such as C++ Memory Management, C++ in Embedded Systems, Asynchronous Programming with C++, and more. Elevate your C++ skills and help support The Global FoodBanking Network with your purchase!

Get the Bundle

Sign Up | Advertise

Building Emergency-Ready AI: Scaling Down to Meet Constraints — with Tony Dunsworth

How engineers in critical systems can design reliable, resource-efficient AI to meet hard limits on privacy, compute, and risk.

AI adoption in the public sector is accelerating, but slowly. A June 2025 EY survey of government executives found 64% see AI's cost-saving potential and 63% expect improved services, yet only 26% have integrated AI across their organizations. The appetite is there, but so are steep barriers. 62% cited data privacy and security concerns as a major hurdle – the top issue – along with lack of a clear data strategy, inadequate infrastructure and skills, unclear ROI, and funding shortfalls. Public agencies face tight budgets, limited tech staff, legacy hardware, and strict privacy mandates, all under an expectation of near-100% uptime for critical services.

Public safety systems epitomize these constraints. Emergency dispatch centers can't ship voice transcripts or medical data off to a cloud API that might violate privacy or go down mid-call.
They also can't afford fleets of cutting-edge GPUs; many 9-1-1 centers run on commodity servers or even ruggedized edge devices. AI solutions here must fit into existing, resource-constrained environments. For engineers building AI systems in production, scale isn't always the hard part—constraints are.

By treating public safety as a high-constraint exemplar, we can derive patterns applicable to other domains like healthcare (with HIPAA privacy and limited hospital IT), fintech (with heavy regulation and risk controls), logistics (where AI might run on distributed edge devices), embedded systems (tiny hardware, real-time needs), and regulated enterprises (compliance and uptime demands). In all such cases, "bigger" AI is not necessarily better – adaptability, efficiency, and trustworthiness determine adoption.

Leaner Models for Mission-Critical Systems

Open models come with transparent weights and permissive licenses that allow self-hosting and fine-tuning, which is crucial when data cannot leave your premises. In 2025, several open large language models (LLMs) have emerged that combine strong capabilities with manageable size:

Meta LLaMA 3: Released in 2024, with 8B and 70B parameter versions. LLaMA 3 offers state-of-the-art performance on many tasks and improved reasoning, and Meta touts it as "the best open source models of their class". However, its license restricts certain commercial uses and the training data is not fully disclosed. In practice, the 70B model is powerful but demanding to run, while the 8B version is much more lightweight.

Mistral 7B / Mixtral: The French startup Mistral AI has focused on efficiency. Mistral 7B (a 7-billion-parameter model) punches above its weight, often outperforming larger 13B models, especially on English and code tasks. They also introduced Mixtral 8×7B, a sparse Mixture-of-Experts model with 46.7B total parameters where only ~13B are active per token. This clever design means "Mixtral outperforms Llama 2 70B on most benchmarks with 6× faster inference" while maintaining a permissive Apache 2.0 open license. It matches or beats GPT-3.5-level performance at a fraction of the runtime cost. Mixtral's trick of not using all parameters at once lets a smaller server handle a model that behaves like a much larger one.

Swiss "open-weight" LLM: This is a new 70B-parameter model developed by a coalition of academic institutions (EPFL/ETH Zurich) on the public Alps supercomputer. The Swiss LLM is fully open: weights, code, and training dataset are released for transparency. Its focus is on multilingual support (trained on data in 1,500+ languages) and sovereignty – no dependency on Big Tech or hidden data. Licensed under Apache 2.0, it represents the "full trifecta: openness, multilingualism, and sovereign infrastructure," designed explicitly for high-trust public sector applications. Importantly, the Swiss model was developed to comply with EU AI Act and Swiss privacy laws from the ground up.

Other open models like Falcon 180B (UAE's giant model) or BLOOM 176B (the BigScience community model) exist, but their sheer size makes them less practical in constrained settings. The models above strike a better balance. Table 1 compares these representative options by size, hardware needs, and privacy posture:

Table 1: Open-source LLMs suited for constrained deployments, compared by size, infrastructure needs, and privacy considerations.

Choosing an open model allows agencies to avoid vendor lock-in and meet governance requirements.
By fine-tuning these models in-house on domain-specific data, teams can achieve high accuracy without sending any data to third-party services. However, open models do come with trade-offs. The biggest of these, Dunsworth says, "is understanding that the speed is going to be a lot slower. Even with my lab having 24 gigs of RAM, or my office lab having 32 gigs of RAM, they are still noticeably slower than if I'm using an off-site LLM to do similar tasks. So, you have to model your trade-off, because I have to also look at what kind of data I'm using—so that I'm not putting protected health information or criminal justice information out into an area where it doesn't belong and where it could be used for other purposes. So, the on-premises local models are more appealing for me because I can do more with them—I don't have the same concern about the data going out of the networks."

That's where techniques like quantization and altering the model architecture come in – effectively scaling down the model to meet your hardware where it is:

Quantization: Dunsworth defines quantization as "a way to optimize an LLM by making it work more efficiently with fewer resources—less memory consumption. It doesn't leverage all of the parameters at once, so it's able to package things a little bit better. It packages your requests and the tokens a little more efficiently so that the model can work a little faster and return your responses—or return your data—a little quicker to you, so that you can be more interactive with it." By reducing model weight precision (e.g. from 16-bit to 4-bit), quantization can shrink memory footprint dramatically and speed up inference with minimal impact on accuracy. For example, a 70B model quantized to 4-bit effectively behaves like a ~17B model in memory terms, often retaining ~95% of its performance. Combined with efficient runtimes (like GGML for CPU inference and GPU kernels optimized for int4/int8 arithmetic), quantization lets even a single GPU PC host models that previously needed a whole cluster.

Altering the model architecture for efficiency: The Mixture-of-Experts (MoE) design in Mixtral increases parameter count (for capacity) but only activates a subset of experts per token, so you don't pay the full compute cost every time. This architecture is a natural fit when you need bursts of capability without constant heavy throughput – much like emergency systems that must handle occasional complex queries quickly, but don't see GPT-scale volumes continuously. The result: big-model performance on small-model infrastructure.
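To make the quantization idea above concrete, here is a minimal sketch in Go (matching the other listings in this newsletter; real quantization toolchains such as GGML work per-block and down to 4-bit precision, and the weight values here are made up):

// Minimal sketch: symmetric int8 quantization maps float32 weights into
// [-127, 127] with one scale factor, cutting memory per weight from 4 bytes
// to 1 at a small reconstruction cost.
package main

import (
	"fmt"
	"math"
)

func quantize(weights []float32) (q []int8, scale float32) {
	var maxAbs float32
	for _, w := range weights {
		if a := float32(math.Abs(float64(w))); a > maxAbs {
			maxAbs = a
		}
	}
	if maxAbs == 0 {
		maxAbs = 1 // avoid a zero scale for an all-zero tensor
	}
	scale = maxAbs / 127 // one scale for the whole tensor (per-block in real runtimes)
	q = make([]int8, len(weights))
	for i, w := range weights {
		q[i] = int8(math.Round(float64(w / scale)))
	}
	return q, scale
}

func dequantize(q []int8, scale float32) []float32 {
	out := make([]float32, len(q))
	for i, v := range q {
		out[i] = float32(v) * scale
	}
	return out
}

func main() {
	weights := []float32{0.12, -0.98, 0.33, 0.0051, -0.44}
	q, scale := quantize(weights)
	fmt.Println("int8 weights:", q, "scale:", scale)
	fmt.Println("reconstructed:", dequantize(q, scale)) // close to, not identical to, the originals
}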
Dunsworth's field lab architecture offers a practical view into how these techniques are actually used. "I've been doing more of the work with lightweight or smaller LLM models because they're easier to get ramped up with," he says—emphasizing that local deployments reduce risk of data exposure while enabling fast iteration. But even with decent hardware (24–32 GB RAM), resource contention remains a bottleneck: "The biggest challenge is resource base… I'm pushing the model hard for something, and at the same time I'm pushing the resources… very hard—it gets frustrating."

That frustration led him to explore quantization hands-on, particularly for inference responsiveness. "I've got to make my work more responsive to my users—or it's not worth it." Quantization, local hosting, and iterative fine-tuning become less about efficiency for its own sake, and more about achieving practical performance under constraints—especially when "inexpensive" also has to mean maintainable.

In practice, deploying a lean model in a mission-critical setting also demands robust inference software. Projects like vLLM have emerged to maximize throughput on a given GPU by intelligently batching and streaming requests. vLLM's architecture can yield 24× higher throughput than naive implementations by scheduling token generation across multiple requests in parallel.

Synthetic Data Pipelines: Fidelity with Privacy

Data is the fuel for AI models, but in public safety and healthcare, real data is often sensitive or scarce. This is where synthetic data pipelines have become game-changers, allowing teams to generate realistic, statistically faithful data that mimics real-world patterns without exposing real personal information. By using generative models or simulations to create synthetic call logs, incident reports, sensor readings, etc., engineers can vastly expand their training and testing datasets while staying privacy-compliant.

Dunsworth, who builds AI infrastructure for emergency services, describes this approach. Rather than rely on real 911 call logs, Dunsworth reconstructs patterns from operational data to generate synthetic equivalents. "I take it apart and find the things I need to see in it… so when I make that dataset, it reflects those ratios properly," he explains. This includes recreating distributions across service types—e.g. police, fire, medical—and reproducing key statistical features like call arrival intervals, elapsed event times, or geospatial distribution. "For me, it's a lot of statistical recreation… I can feed that into an AI model and say, 'OK, I need you to examine this.'"

Dunsworth's pipeline is entirely Python-based and open source. He uses local LLMs to iteratively refine the generated datasets: "I build a lot of it, and then I pass it off to my local models to refine what I'm working on." That includes teaching the model to correct for misleading assumptions—such as when synthetic time intervals defaulted to normal distributions, even though real data followed Poisson or gamma curves. He writes scripts to analyze and feed the correct distributions back into generation: "Then it tells me, 'Here's the distribution, here are its details.' And I feed that back into the model and say, 'OK, make sure you're using this distribution with these parameters.'"
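Dunsworth's own pipeline is Python-based; purely as an illustration of the distribution-fitting idea (in Go, to match the rest of this newsletter's listings, with a made-up call rate), generating synthetic call timestamps from a fitted Poisson arrival process can be sketched like this:

// Minimal sketch: once a Poisson arrival process has been fitted to real call
// logs, synthetic call timestamps follow from sampling exponential
// inter-arrival gaps with the fitted rate.
package main

import (
	"fmt"
	"math/rand"
	"time"
)

func main() {
	const callsPerHour = 42.0 // hypothetical fitted rate parameter (lambda)
	rng := rand.New(rand.NewSource(1))

	t := time.Date(2025, 8, 1, 0, 0, 0, 0, time.UTC)
	for i := 0; i < 5; i++ {
		// Exponential inter-arrival gap with mean 1/lambda hours.
		gapHours := rng.ExpFloat64() / callsPerHour
		t = t.Add(time.Duration(gapHours * float64(time.Hour)))
		fmt.Println("synthetic call at", t.Format(time.RFC3339))
	}
}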
Adoption Grows with Education and Precision

Early adoption wasn’t smooth, Dunsworth says. “The biggest hurdle was pushback from peers at first,” he noted. But that changed as datasets improved in realism, and the utility of using synthetic data for demos, teaching, or sandbox testing became obvious. “Now people are more interested… I keep it under an open-source license. Just give me the improvements back—that’s the last rule.”

A crucial distinction is that synthetic ≠ anonymized. Rather than redact real identities, Dunsworth starts from a clean slate, using only statistical patterns from real data as seed material. He avoids copying event narratives and even manually inspects Faker-generated names to ensure no accidental leakage: “I don’t reproduce narratives… I go through my own list of people I know to make sure that name doesn’t show up.”

He also aligns his work with formal ethical frameworks. “I was very fortunate throughout my education—through my software engineering courses, my analytics and data science courses at university—that ethics was stressed as one of the most important things we needed to focus on alongside practice. So, I have very solid ethical programming training.” Dunsworth also reviews frameworks like the NIST AI RMF to maintain development guardrails.

These practices map directly onto any domain where real data is hard to access—medical records, financial logs, customer transcripts, or operational telemetry. The principles are universal:

Reconstruct statistical structure from clean seeds
Validate outputs against known metrics
Stress test systematically, not opportunistically
Never copy real content—synthesize structure, not substance
Build ethical discipline into your generation workflow

For teams building AI tools without access to real production data, this is a practical playbook. You don’t need a GPU farm or proprietary toolchain. You need controlled pipelines, structured validation, and a robust sense of responsibility. As Dunsworth says: “I feel confident that the people I work with… are all operating from the same place: protecting as much information as we can… making sure we're not exposing anything that we can’t.”
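The “never copy real content” principle can be partly enforced in code as well as by manual review. Below is a minimal sketch in the spirit of Dunsworth’s name checks; it assumes the Faker library and a hypothetical blocklist file of real names that must never appear in generated records.

```python
from faker import Faker

# Hypothetical blocklist: real names that must never appear in synthetic data.
with open("known_real_names.txt") as f:
    blocklist = {line.strip().lower() for line in f if line.strip()}

fake = Faker()

def safe_name() -> str:
    """Draw Faker names until one clears the blocklist."""
    while True:
        candidate = fake.name()
        if candidate.lower() not in blocklist:
            return candidate

# Illustrative synthetic records built only from generated fields.
records = [{"caller": safe_name(), "service": "fire"} for _ in range(100)]
```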
Stress Before Success: Risk Management and Resilience Engineering

Building AI systems for constrained environments isn’t only about latency, memory, or cost. It’s also about failure and how to survive it. Dunsworth’s work in emergency response illustrates the stakes clearly, but his framework for risk mitigation is widely transferable: define the use case tightly, control where the data flows, and validate under load—not just in ideal cases. “One of the biggest risk mitigations is starting out from the beginning—knowing what you want to use AI for and how you define how it’s working well.”

Instead of treating vendor-provided models as turnkey solutions, Dunsworth interrogates the entire data path—from ingestion through inference to retention. That includes third-party dependencies: “What data am I feeding, and how do I work with that vendor to make sure the data is being used the way I intend?” In sensitive environments, he keeps training in-house: “That way… it doesn't leave my organization.”

Success is measured operationally: “If you're using it (AI) just to say, ‘Well, we're using AI,’ I'm going to be the first one to raise my hand and say, ‘Stop.’” Instead, AI is validated through concrete outcomes: “It’s enabled our QA manager to process more calls… improving our ability to service our community.”

Stress It Twice, Then Ship

For AI systems that might break under pressure, Dunsworth prescribes a straightforward and brutal regimen: “Get synthetic data together to test it (the model)—and then just, in the middle of your testing lab, hit it all at once. Hit it with everything you've got, all at the same time.” Only if the system remains responsive under full overload does it move forward. “If it continues to perform well, then you have some confidence… it’s still going to be reliable enough for you to continue to operate.” Failure is expected, but it must be observable and recoverable: “Even if it breaks… we know it can still recover and come back to service quickly.”

One real-world implementation of this mindset is LogiDebrief, a QA automation system deployed in the Metro Nashville Department of Emergency Communications. Developed to audit 9-1-1 calls in real time, LogiDebrief formalizes emergency protocol as logical rules and then uses an LLM to interpret unstructured audio transcripts, match them against those rules, and flag any deviations. As Chen et al. explain: “The framework formalizes call-taking requirements as logical specifications, enabling systematic assessment of 9-1-1 calls against procedural guidelines”.

In practice, it executes a three-stage pipeline:

Context extraction (incident type, responder actions),
Formal rule evaluation using Signal Temporal Logic,
Deviation reporting for any missed steps.

This enables automated QA for both AI and human decisions—a form of embedded auditing that surfaces failure as it happens. In deployment, LogiDebrief reviewed 1,701 calls and saved over 311 hours of manual evaluator time. More importantly, when something procedural is missed—like a mandatory question for a specific incident type—it gets flagged and can be corrected in downstream training, improving both model performance and human compliance.

From Monoliths to Micro-Solutions

When one early AI analytics platform failed under edge-case data—“it just said, ‘I got nothing’”—Dunsworth scrapped the codebase entirely. Why? The workflow made sense to him, but not to his users. “I assumed I could develop an analytics flow that would work for everybody… it worked well for me, but it didn’t work well for my target audience.”

This led to a major design shift. Instead of building one global solution, he pivoted to “micro-solutions that will do different things inside the same framework.” This insight should be familiar to any engineer who’s seen a service fail not because it was wrong, but because no one could use it. “If they’re not going to use it—it doesn’t work.”

Anticipating the Next Frontiers

Looking forward, Dunsworth is focused on redirecting complexity, not increasing it. One focus area is offloading non-emergency calls to AI assistants: “It really is a community win-win, because now we can get those services out faster.” Another is multilingual responsiveness.
In cities where services span four or more languages, Dunsworth sees multilingual AI as a matter of equity and latency: “If we can improve the quality and speed of translation… (that can save a life).”

Takeaways for Engineers Weighing AI Adoption in Critical Systems

To wrap up, here are some key risk mitigation strategies – from technical safeguards to policy measures – that can enable engineers and organizations to confidently adopt AI in sensitive environments:

Model Compression (Quantization & Pruning): We’ve discussed quantization as a way to make models smaller and faster. This not only enables using cheaper hardware, but also reduces power consumption (important for mobile or field deployments, for example) and even attack surface (smaller models are slightly easier to analyze for vulnerabilities). Pruning (removing redundant weights) is another technique to shrink models. The overall effect is a lean model less likely to overload your systems.

Encryption and Secure Execution: In high-trust domains, data encryption is mandatory not just at rest but in transit – and increasingly during computation. Self-hosting an LLM doesn’t automatically guarantee security; teams must ensure all connections are encrypted (HTTPS/TLS) so that input/output data can’t be intercepted. Tools like Caddy (a web server with automatic TLS) are often used as front-ends to internal AI APIs to enforce this. Moreover, techniques like homomorphic encryption and secure enclaves (Intel SGX, etc.) are emerging so that even if someone got hold of the model runtime, they couldn’t extract sensitive data. While these techniques can be computationally expensive, they’re improving.

Robust Vendor Governance: If using any third-party models or services, public sector teams impose strict governance – similar to vetting a physical supplier. Open-source models don’t come from a vendor per se, but they still warrant a security review (has the model or its code been audited? is there a risk of embedded trojans?). It is also important to focus on what vendors bring: requiring transparency about model training data (to avoid hidden biases or privacy violations), demanding uptime SLAs if it’s a cloud API, and ensuring models meet regulatory standards.

In-House Fine-Tuning & Monitoring: Rather than rely on a vendor’s generic model, high-constraint deployments should favor owning the last-mile training of the model. By fine-tuning open models on local data, organizations not only boost performance for their specific tasks, they also retain full control of the model’s behavior. This makes it easier to mitigate bias or inappropriate behavior – if the model says something it shouldn’t, you can adjust the training data or add safety filters and retrain. Continuous monitoring is part of this loop: logs of the AI’s outputs should be reviewed (often with tools like LogiDebrief or simple dashboards) to catch any drift or errors. Essentially, the AI should be treated as a critical piece of infrastructure that gets constant telemetry and maintenance, not “set and forget” software. This reduces the risk of unseen failure modes accumulating over time.

Fallback and Redundancy: Finally, a practical strategy – always have a Plan B. In emergency systems, if the AI fails or is uncertain, it should gracefully hand off to a human or a simpler rule-based system. While this isn’t unique to AI (classic high-availability design), it’s worth noting that large AI models can fail in novel ways (e.g. getting stuck in a hallucination loop). Having a watchdog process that can kill and restart an AI service if it behaves oddly is a form of automated risk mitigation too; a minimal sketch follows this list.
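As a rough illustration of that last point, here is a minimal watchdog loop. Everything in it is an assumption for the sketch: the service is launched as a local subprocess, and “healthy” simply means a hypothetical HTTP health endpoint answers in time.

```python
import subprocess
import time
import urllib.request

SERVICE_CMD = ["python", "inference_server.py"]   # hypothetical service command
HEALTH_URL = "http://127.0.0.1:8000/health"       # hypothetical health endpoint

def healthy(timeout: float = 5.0) -> bool:
    """The service counts as healthy if the endpoint answers 200 in time."""
    try:
        with urllib.request.urlopen(HEALTH_URL, timeout=timeout) as resp:
            return resp.status == 200
    except Exception:
        return False

proc = subprocess.Popen(SERVICE_CMD)
while True:
    time.sleep(10)
    if proc.poll() is not None or not healthy():
        # Kill a crashed or hung process and bring the service back quickly.
        proc.kill()
        proc.wait()
        proc = subprocess.Popen(SERVICE_CMD)
```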
Each of these strategies – from squeezing models to encrypting everything, from vetting vendors to fine-tuning internally – contributes to an overall posture of trust through transparency and control. They turn the unpredictable black box into something that engineers and auditors can reason about and rely on. Dunsworth repeatedly comes back to the theme of discipline in engineering choices. Public safety and other critical systems can’t afford guesswork. By enforcing these risk mitigations, engineers can build systems that move fast without breaking things beyond rapid recovery.

🛠️ Tool of the Week

BlueSky Statistics – A GUI-Driven Analytics Platform for R Users

BlueSky Statistics is a desktop-based, open-source analytics platform designed to make R more accessible to non-programmers—offering point-and-click simplicity without sacrificing statistical power. It supports data management, traditional and modern machine learning, advanced statistics, and quality engineering workflows, all through a rich graphical interface.

Highlights:

Drag-and-Drop Data Science for R: BlueSky lets users load, browse, edit, and analyze datasets through interactive data grids—no scripting required.
Modeling, Machine Learning & Deep Learning: BlueSky supports over 50 modeling algorithms, including decision trees, SVMs, KNN, logistic regression, and ANN/CNN/RNN.
Full Statistical Suite + DoE + Survival Analysis: The platform includes descriptive and inferential statistics, survival models (Kaplan-Meier, Cox), and advanced modules for longitudinal analysis and power studies.
Quality, Process, and Six Sigma Tools: Tailored for manufacturing and process improvement, BlueSky integrates tools aligned with the DMAIC cycle: Pareto and fishbone diagrams, SPC control charts, Gage R&R, process capability analysis, and equivalence testing.
Integrated R IDE for Programmers: For technical users, BlueSky offers a built-in R IDE to write, import, execute, and debug R scripts—bridging GUI simplicity with code-based extensibility.

Learn more about BlueSky Statistics

📎 Tech Briefs

A New Perspective On AI Safety Through Control Theory Methodologies | Ullrich et al. | IEEE: Proposes a novel approach to AI safety using principles from control theory—specifically “data control”—to provide a top-down, system-theoretic framework for analyzing and assuring the safety of AI systems in real-world, safety-critical environments.

Can We Make Machine Learning Safe for Safety-Critical Systems? | Dr. Thomas G. Dietterich | Distinguished Professor Emeritus, Oregon State University: Outlines a comprehensive framework for integrating machine learning into safety-critical systems by combining risk-driven data collection, formal verification, and continuous anomaly and near-miss detection.
AI Safety vs. AI Security: Demystifying the Distinction and Boundaries | Lin et al. | The Ohio State University: Establishes clear conceptual and technical boundaries between AI Safety (unintentional harm prevention) and AI Security (defense against intentional threats), arguing that precise definitions are essential for effective research, governance, and trustworthy system design—especially as misuse increasingly straddles both domains.

Making certifiable AI a reality for critical systems: SAFEXPLAIN core demo | Barcelona Supercomputing Center (BSC): Introduces the SafeExplain platform, which offers a structured safety lifecycle and modular architecture for AI-based cyber-physical systems, integrating explainable AI, functional safety patterns, and runtime monitoring.

That’s all for today. Thank you for reading this issue of Deep Engineering. We’re just getting started, and your feedback will help shape what comes next. Do take a moment to fill out the short survey we run monthly—as a thank-you, we’ll add one Packt credit to your account, redeemable for any book of your choice.

We’ll be back next week with more expert-led content.

Stay awesome,
Divya Anne Selvaraj
Editor-in-Chief, Deep Engineering

If your company is interested in reaching an audience of developers, software engineers, and tech decision makers, you may want to advertise with us.
Deep Engineering Specials: Vibe Coding—Promise, Pressure, and Practical Limits

Divya Anne Selvaraj
04 Aug 2025
Coding by prompt is still a desired dream—appealing, but not yet reliableSpecialsVibe Coding—Promise, Pressure, and Practical LimitsWhat recent research tells us about vibe coding: where it accelerates, where it breaks, and how to adopt it without undermining engineering disciplineLive Virtual Workshop: Securing Vibe CodingJoin Snyk's Staff Developer Advocate Sonya Moisset on August 28th at 11:00AM ET to learn:✓ How Vibe Coding is reshaping development and the risks that come with it✓ How Snyk secures your AI-powered SDLC from code to deployment✓ Strategies to secure AI-generated code at scaleEarn 1 CPE Credit!Register today!Hi Welcome to this special issue of Deep Engineering.With this issue we go beyond the hype of vibe coding. Drawing on first-party research from Microsoft, Google, IFS, and independent academics, we examine where this paradigm helps, where it breaks, and what it asks of software teams if it scales. For architects, leads, and developers navigating a shifting toolchain, this piece aims to provide some coordinates: empirical findings, adoption thresholds, and governance strategies.Coming soon...Launching today (Monday, August 4) at 11:00 AM PT and ending on August 25 11:00 AM PT.Master the ultimate high-performance, general-purpose programming language with our C++ lessons bundle from the experts at Packt. Learn from top-rated books such as C++ Memory Management,C++ in Embedded Systems,Asynchronous Programming with C++, and more. Elevate your C++ skills and help support The Global FoodBanking Network with your purchase!Save the Link (Goes live at 11:00 a.m. PT today)Sign Up |AdvertiseTo Vibe or Not to Vibe: That is the QuestionA research-based examination of vibe coding’s promises, pitfalls, and what it means for the future of software teams.According to Stack Overflow’s 2025 Developer Survey, nearly 72% of developers said *“vibe coding” – defined as generating entire applications from prompts – is not part of their workflow, with an additional 5% emphatically rejecting it as ever becoming a part of their workflow.Empirical research and position papers published this year provide some more context.Sarkar, A., (University of Cambridge and University College London) and Drosos, I., (Microsoft) conducted an observational study (June, 2025) with 12 professional developers from Microsoft, all experienced in programming and familiar with tools like GitHub Copilot. Participants used a conversational LLM-based coding interface to complete programming tasks, with the researchers analyzing session transcripts, interaction logs, and follow-up interviews to identify usage patterns and cognitive strategies. They found that while participants reported efficiency gains for familiar or boilerplate tasks, particularly when generating or modifying standard patterns, these benefits diminished for more complex assignments.Debugging AI-generated code remained a major friction point, often requiring developers to mentally reverse-engineer the logic or manually rewrite portions of the output. Importantly, users expressed consistent uncertainty about the correctness and reliability of generated code, underscoring that trust in the AI remained limited.Gadde, A., (May, 2025), in their literature review based paper, positions vibe coding as the next evolution in AI-assisted software development, arguing that it significantly lowers barriers to entry by enabling users to generate working software from natural language prompts. 
Gadde characterizes vibe coding as a practical middle ground between low-code platforms and agentic AI systems, combining human intent expression with generative code synthesis. Unlike traditional development workflows, Gadde claims vibe coding empowers users—even those without formal programming experience—to act as high-level specifiers, while generative models handle much of the underlying implementation.

Sapkota, R., et al. (2025) conducted a structured literature review and conceptual comparison of two emerging AI-assisted programming paradigms: vibe coding and agentic coding. The paper defines vibe coding as an intent-driven, prompt-based programming style in which humans interact with an LLM through conversational instructions, iteratively refining output. By contrast, agentic coding involves AI agents that autonomously plan, code, execute, and adapt with minimal human input. The authors argue that these paradigms represent distinct axes in AI-assisted development—one human-guided and interactive, the other goal-oriented and autonomous.

They propose a comparative taxonomy based on ten dimensions, including autonomy, interactivity, task granularity, execution environment, and user expertise required. They claim that vibe coding excels in creative, exploratory, and early-stage prototyping contexts, while agentic coding shows promise in automating repetitive, well-scoped engineering tasks. However, both approaches face common challenges, including error handling, debugging, quality assurance, and system integration. The authors conclude that hybrid systems combining the strengths of vibe coding and agentic coding—conversational guidance with agentic automation—may be the most practical path forward.

Stephane H. Maes, CTO and CPO at IFS & ESSEM Research, in their literature review and enterprise experience-based position paper (April 2025), states that code written through vibe coding often lacks documentation, architectural coherence, and design rationale. Without rigorous standards and tooling for verification, maintainability, and lifecycle control, the adoption of AI-generated code introduces operational risks. Maes proposes that successful adoption of vibe coding in production environments requires not just technical integration but structured governance—workflows, tooling, and cultural norms that enforce accountability, traceability, and testability. The core thesis is that “real coding is support and maintenance,” and vibe coding, in its current form, largely sidesteps these responsibilities.

And yet, despite these limitations and a negative developer experience, vibe coding remains very much a part of the conversation. Why? Not because it works at scale today, but because it gestures toward a future where programming feels more like intent-driven design than manual construction. It flatters a seductive idea: that software can be summoned by describing it, rather than engineered line by line.
Engineers don’t just build systems for today, they chart trajectories. And so, with today’s special feature, we aim to:

Identify where vibe coding works today (early-stage prototypes, educational contexts, speculative design),
Understand why it falls short elsewhere (debugging, integration, maintainability),
Anticipate the organizational and skill implications, so you can lead with context when the tooling matures.

Where and How Vibe Coding Helps

Vibe coding works best when the goal is to explore, not to ship; to experiment, not to scale. In these scenarios, its limitations are tolerable, and its productivity gains are real. Contexts where vibe coding is most effective:

Rapid prototyping and ideation: The AI-assisted conversational workflow drastically accelerates early development. What once took weeks can often be scaffolded in hours. Solo developers, according to Ardor Labs, report building functional prototypes—from simple web apps to plugin systems—by iteratively prompting an LLM, adjusting results, and redeploying within a single day.

Startups and hackathons: Early-stage teams exploit vibe coding to punch above their weight. Y Combinator managing partner Jared Friedman has said that “a quarter of the W25 startup batch have 95% of their codebases generated by AI.” In this context, code maintainability is a secondary concern; speed to demo or MVP is paramount.

Exploratory use by professionals: Developers may use vibe coding for spinning up proof-of-concepts or exploring unfamiliar frameworks, even if they ultimately rewrite the code manually. AI researcher Andrej Karpathy (the originator of the term vibe coding) has himself described this as ideal for “weekend projects” or “rapid ideation” scenarios.

One-click deployment pipelines: Google’s guide notes that coupling vibe coding with integrated cloud deployment creates “the fastest path from concept to a live, shareable application,” especially when platforms like Replit or Google Cloud streamline backend provisioning.

Lowering the barrier to entry: Because it uses natural language, vibe coding attracts those with minimal programming background. Google highlights that it makes “app building more accessible,” while Gadde frames it as the next phase in no-code evolution—enabling domain experts to act as high-level specifiers without writing syntax-bound code.

Educational and learning contexts: Sapkota et al. note that vibe coding performs well in educational and exploratory settings, particularly when the emphasis is on learning through experimentation rather than delivering production-ready systems. Students can engage in prompt-driven debugging or request scaffolded solutions to better understand programming constructs.

For all its speed and surface-level convenience, vibe coding introduces architectural liabilities that make experienced developers cautious—if not outright resistant—to using it beyond disposable or exploratory projects.

Limitations: Maintainability, Debugging, and Technical Debt

The issues with vibe coding are not limited to code that fails to run; they extend to code that fails to last. Vibe coding shortcuts implementation, but often bypasses the rigor, clarity, and accountability that production-grade systems require. Why vibe-coded software tends to erode under pressure:
Poor structural hygiene: AI-generated code often lacks internal consistency and coherent design. As Ardor Labs reports, repetitive prompting typically results in a patchwork of quick fixes, duplicated logic, and workarounds that accumulate into technical debt.

Invisible complexity: Maes notes that repeated AI-driven edits can produce systems even their authors no longer understand. Without documentation or rationale, the code becomes opaque—even to its original creator.

Debugging burdens: Because developers often see AI-generated code only after an error appears, root cause analysis becomes guesswork. IBM’s overview highlights the lack of clear architectural structure, making it harder to trace failures through unfamiliar logic paths.

Prompting is not a substitute for engineering judgment: While it's tempting to patch issues by prompting another fix, this iterative loop can obscure responsibility and create brittle dependencies. As some developers now observe, “using one AI to debug another” may sound clever but is often insufficient without human involvement.

Production pitfalls: performance, scale, and security

Scalability bottlenecks: Sarkar & Drosos observed that developers often had to switch from vibe coding to manual optimization as application complexity increased. AI-generated prototypes may appear functional but suffer from poor resource usage and brittle error handling when scaled.

Security vulnerabilities: A 2021 NYU cybersecurity study found that around 40% of GitHub Copilot’s generated code contained exploitable flaws, from SQL injection risks to use of deprecated libraries. These same vulnerabilities can silently propagate in vibe-coded applications, especially when users copy output without review.

False confidence: Vibe coding’s conversational interface can lull developers—particularly those with limited experience—into accepting functional output as production-ready. As Ardor Labs warns, this “move fast” approach may ship apps that run but cannot be maintained, audited, or secured.

Neglected lifecycle thinking: Maes (2025) captures this gap directly: “coding can be done with ‘no code’” via AI, “but such code is not maintainable”—a critical failure if the system is expected to evolve beyond a demo.

For all its promise, vibe coding comes with serious “gotchas” that make seasoned engineers hesitant to use it in production. Yet, judging by the attention the paradigm continues to attract, it is still very much something developers and enterprises are not ready to give up on.

Workforce and Organizational Implications

The rise of vibe coding raises important questions about software engineering roles, required skills, and how organizations should adapt. Who stands to benefit the most, and whose work might be displaced or transformed?

Democratization vs. De-skilling

Vibe coding lowers the barriers to entry. Non-developers and junior developers can now build software that once required full-stack expertise. A solo entrepreneur, equipped only with a vision and the right AI tools, can ship a working prototype. In this framing, the AI serves as a kind of expert consultant, accelerating iteration and enabling domain specialists to turn ideas into software without hiring a team. This democratization of software creation is one of vibe coding’s most widely advertised benefits.

But this accessibility comes with a paradox. Heavy reliance on AI for everyday coding tasks can cause skills to atrophy. Ray, P.
(May 2025) identifies this as a core concern: if developers grow accustomed to prompting and accepting output without deep understanding, they risk losing the foundational skills required to validate, debug, and maintain that software.The illusion of productivity can further obscure the issue. A 2025 METR study, “Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity,” found that “When developers are allowed to use AI tools, they take 19% longer to complete issues—a significant slowdown that goes against developer beliefs and expert forecasts. This gap between perception and reality is striking: developers expected AI to speed them up by 24%, and even after experiencing the slowdown, they still believed AI had sped them up by 20%.”Without strong engineering judgment, the value of AI assistance can quickly become negative.Who Benefits—and Who Might Be Left Behind?In its current form, vibe coding offers the greatest leverage to small, agile teams and individuals operating under time constraints. For early-stage startups, the appeal is obvious: speed to prototype, speed to market. For these teams, robustness is a secondary concern—shipping something that works, even partially, is often enough to secure feedback, funding, or traction. Similarly, larger organizations may use vibe coding to prototype features quickly without committing senior developer time, particularly in product discovery phases.By contrast, engineers at companies with established production systems remain cautious. The architectural demands of long-lived systems, along with maintainability and security concerns, make “pure” vibe coding untenable. Google’s guidance distinguishes between two modes: an “experimental” vibe coding mode suited to rapid ideation, and a “disciplined” mode in which the AI acts as a subordinate pair-programmer, with the human remaining accountable for quality.This bifurcation in usage reflects a broader split in how developers perceive AI's impact on the profession. According to the 2025 Stack Overflow Developer Survey, 64% of respondents said they do not view AI tools—including coding assistants—as a threat to their employment. Instead, many see these tools as a way to offload repetitive work and focus on higher-order engineering problems. However, that figure has dropped from 68% the previous year, indicating a subtle but real shift: developers increasingly recognize that roles are evolving, and that staying competitive will require new skills.The differentiator is not whether one uses AI, but how. Engineers who add prompt engineering, AI supervision, and LLM-aware debugging to their toolset will likely outperform those who default to traditional workflows for all tasks. Conversely, those who resist this shift entirely may find themselves outpaced—not by the AI, but by peers who know how to manage it effectively.Leadership Response: Strategic Adoption with GuardrailsFor CTOs, software architects, and engineering leads, the responsible response to vibe coding is neither rejection nor blind adoption, but strategic containment. Its introduction should be scoped to workflows where quality risk is minimal and speed adds clear value—such as internal prototypes, automated test generation, or scaffolding of non-critical features that engineers can later refactor. Governance is essential. Maes proposes structured frameworks like VIBE4M, which emphasize verification, maintainability, and monitoring as prerequisites for accepting AI-generated code into supported systems. 
Even in the absence of formal frameworks, the principle holds: all AI contributions must undergo human review. Review checklists may need to explicitly flag AI-authored code for scrutiny, and CI pipelines should incorporate tools like Snyk or ESLint with AI-focused rules to catch common faults. These checks inevitably introduce friction—but they are precisely what distinguish engineering from experimentation. As Maes notes, rigorous validation “goes against the trend [of] AI makes developers more productive” in the short term, but is non-negotiable for sustainable practice.Equally critical is the cultural framing of vibe coding within teams. Leaders should position it not as a shortcut, but as a collaboration—one that still demands comprehension, accountability, and domain judgment. Encouraging developers to re-express or review AI-generated solutions—whether to a colleague or back to the model—can ensure they understand the logic they are deploying. This guards against blind acceptance and reinforces human agency. Forward-looking leaders will also recognize and reward the kinds of work AI cannot yet replicate: deep architectural reasoning, creative problem decomposition, and user empathy. These capabilities will define developer impact in a world where code generation is easy but understanding remains hard.When it comes to delivering reliable, maintainable systems at scale, the fundamentals of software engineering still apply. The organizations that will benefit most are those that blend the “vibes” with vigilance: embracing AI-driven development to speed up outcomes, while doubling down on human expertise in architecture, validation, and security to ensure those outcomes stand the test of time. In doing so, we can harness the promise of vibe coding – conversational and intuitive development – without losing the hard-won lessons of decades of engineering practice.🛠Vibe Coding in Practice: The Tooling LandscapeIn the pre-publication paper, "A Review on Vibe Coding: Fundamentals, State-of-the-art, Challenges and Future Directions," Ray, P., presents a qualitative, exploratory analysis of non-peer-reviewed sources such as product blogs, documentation, and public demos. The paper surveys a wide range of vibe coding tools—natural language-driven development environments—and maps them across an interaction spectrum (delegation to pairing) and a layered stack architecture extending from prompt interfaces to deployment infrastructure. It highlights the growing sophistication of both browser-native platforms and IDE-integrated agents. Here is a summary.Browser-native platforms feature prominently. Tools such as v0 by Vercel, Bolt.new, Create, and Lazy AI allow users to scaffold, preview, and deploy full-stack applications from prompt-based workflows. These platforms commonly embed frontend frameworks like Next.js and Tailwind, along with real-time CI/CD, auth, and database orchestration. Others—Trickle AI and Napkins.dev—generate UIs from screenshots or sketches, while HeyBoss, Softgen, and Rork focus on zero-config application builds with export to GitHub or direct deployment.IDE-integrated tools like Cursor, Cody, and Zed offer agent-assisted development with context-aware completions, semantic diffs, and local vector search. More advanced platforms such as Windsurf and Zencoder AI incorporate retrieval-augmented generation, multi-agent workflows, and enterprise readiness features. 
Some, including Cline and Trae AI, extend into terminal and plugin-based workflows, supporting Git integration, shell execution, and modular agent control. Finally, autonomous coding agents—notably Devin AI and All Hands AI—aim to handle entire software lifecycles: building, testing, debugging, and deploying with minimal human intervention.

Ray’s survey suggests that these tools do not converge on a single model or interface. Instead, they reflect a broader shift: from programming as manual construction to software as orchestrated dialogue between developer intent and agentic execution. Read the complete paper.

That’s all for today. Thank you for reading this special issue of Deep Engineering. We’re just getting started, and your feedback will help shape what comes next. Just reply to this email to tell us what you think.

Stay awesome,
Divya Anne Selvaraj
Editor-in-Chief, Deep Engineering

If your company is interested in reaching an audience of developers, software engineers, and tech decision makers, you may want to advertise with us.
Deep Engineering #11: Quentin Colombet on Modular Codegen and the Future of LLVM’s Backend

Divya Anne Selvaraj
31 Jul 2025
LLVM’s backend is going modular#11Quentin Colombet on Modular Codegen and the Future of LLVM’s BackendHow LLVM’s modular backend improves code generation across targets—by breaking down instruction selection into testable, reusable passesGoLab 2025: Celebrating a Decade of Go InnovationGoLab 2025 promises a rich and diverse program crafted to elevate the skills and insights of every attendee, from aspiring Gophers to seasoned experts. The agenda features a comprehensive array of:>In-depth Workshops: Hands-on learning experiences for practical skill development.>Technical Talks: Presentations on the latest advancements, best practices, and challenges in Go development.>Lightning Talks: Quick, insightful interventions that spark new ideas and discussions.Use code PKT15SP for a 15% discount on all ticket types.Register nowHi Welcome to the eleventh issue of Deep EngineeringLLVM has long been celebrated for its modular frontend and optimizer. But for years, its backend—the part responsible for turning IR into machine code—remained monolithic, with instruction selectors like SelectionDAG and FastISel combining multiple responsibilities in a single, opaque pass. That’s now changing, as modular pipelines begin to reshape how LLVM handles instruction selection.This issue’s delves into GlobalISel, the instruction selection framework designed to replace SelectionDAG and FastISel with a more modular, testable, and maintainable architecture. Built around a pipeline of distinct passes—IR translation, legalization, register bank selection, and instruction selection—GlobalISel improves backend portability, supports new Instruction Set Architectures (ISAs) like RISC-V, and makes it easier to debug and extend LLVM across targets.To understand the design decisions behind GlobalISel—and the broader implications for backend engineering—we spoke with its architect, Quentin Colombet. A veteran LLVM contributor who joined Apple in 2012, Colombet has worked across CPU, GPU, and DSP backends and is also the code owner of LLVM’s register allocators. His perspective anchors our analysis of the trade-offs, debugging strategies, and real-world impact of modular code generation.We also include an excerpt from LLVM Code Generation (Packt, 2025), Colombet’s new book. The selected chapter introduces TableGen, LLVM’s domain-specific language for modeling instructions and backend logic—a central tool in GlobalISel's extensibility, despite its sharp edges.You can watch the complete interview and read the transcript here or scroll down to read the feature and book excerpt.Sign Up |AdvertiseFor Friday news roundups, trend insights, and quick takes between issues, follow Deep Engineering on LinkedIn.Deconstructing Codegen with Quentin ColombetHow LLVM’s Modular Backends Enable Portable, Maintainable OptimizationLLVM’s instruction selection was long dominated by SelectionDAG and FastISel, both monolithic frameworks that performed legalization, scheduling, and selection in a single pass per basic block. This design limited code reuse and optimization scope. GlobalISel was created to improve performance, granularity, and modularity. It operates on whole functions and uses Machine IR (MIR) directly, avoiding the need for a separate IR like SelectionDAG. This reduces overhead and improves compile times. 
While AArch64’s GlobalISel was initially slower than x86’s DAG selector at -O0, ongoing work has closed the gap; by LLVM 18, GlobalISel’s fast path was within 1.5× of FastISel.Perhaps more importantly, GlobalISel breaks down instruction selection into independent passes. Rather than one big conversion, it has a pipeline: IR translation, legalization of unsupported types, register bank selection, and actual instruction selection. Quentin Colombet, LLVM’s GlobalISel architect, explains that in SelectionDAG“all those steps happen in one monolithic pass…It’s a black box. But with GlobalISel, it’s a set of distinct optimization passes. Between those passes, you can insert your own target-specific or generic optimizations. That modularity gives you better flexibility, more opportunities for code reuse, and makes debugging and testing easier.”GlobalISel is designed as a toolkit of reusable components. Targets can share the common Core Pipeline and customize only what they need. Even the fast-O0 and optimized-O2 selectors now use the same pipeline structure, just configured differently. This is a big change from the past, where ports often had to duplicate logic across FastISel and SelectionDAG. The modular design not only avoids code duplication, it establishes clear debug boundaries between stages. If a bug or suboptimal codegen is observed after instruction selection, a backend engineer can pinpoint whether it originated in the legalization phase, the register banking phase, or elsewhere, by inspecting the MIR after each pass. LLVM’s infrastructure supports dumping the MIR at these boundaries, making it far easier to diagnose issues than untangling a single mega-pass. As Colombet quips,“Instruction selection actually involves multiple steps…From the start, [GlobalISel] has a much more modular design.”The benefit is that each phase (e.g. illegal operation handling) can be tested and understood in isolation.Portability for New Targets and ISAsA clear motivation for this overhaul is target portability. LLVM today must cater to a wide variety of architectures – not just x86 and ARM, but RISC-V (with its ever-expanding extensions), GPUs, DSPs, FPGAs, and more. A monolithic selector makes it hard to support radically different ISAs without accumulating lots of target-specific complexity. GlobalISel’s design, by contrast, forces a clean separation of concerns that parallels how one thinks about a new target. There are four major target hooks in GlobalISel, corresponding to the key decisions a backend must make:CallLowering – how to lower abstract calls and returns into the concrete calling convention (registers, stack slots) of the target.LegalizerInfo – what operations and types are natively supported by the target, and how to expand or break down those that aren’t. For example, if the target lacks a 64-bit multiply, the legalizer might specify to chop it into smaller multiplies or call a runtime helper.RegisterBankInfo – the register file characteristics, such as separate banks (e.g. general-purpose vs. floating-point registers) and the cost of moving data between banks.InstructionSelector – the final pattern matching that turns “generic” machine ops into actual target opcodes.Each of these components is relatively self-contained. When bringing LLVM to a new architecture, developers can implement and test them one by one. 
Colombet advises keeping the big picture in mind:“There’s no single right way to do instruction selection…because GlobalISel is modular, it’s easy to look at just one piece at a time. But if you’re not careful, those pieces may not fit together properly, or you may end up implementing functionality that doesn’t even make sense in the broader pipeline.”In practice, the recommended approach is to first ensure you can lower a simple function end-to-end (even if using slow or naive methods), then refine each stage knowing it fits into the whole. This incremental path is much more feasible with a pipelined design than it was with SelectionDAG’s all-or-nothing pattern matching.Real-world experience shows the value of this approach. RISC-V, for instance, has been rapidly adding standard and vendor-specific extensions. LLVM 20 and 21 have seen numerous RISC-V backend updates – from new bit-manipulation and crypto instructions to the ambitious V-vector extension. With GlobalISel, adding support for a new instruction set extension often means writing TableGen patterns or legality rules without touching the core algorithm. In early 2025, LLVM’s RISC-V backends even implemented vendor extensions like Xmipscmove and Xmipslsp for custom silicon.This kind of targeted enhancement – adding a handful of operations in one part of the pipeline – is exactly what the modular design enables. It’s telling that as soon as the core GlobalISel framework matured, targets like ARM64 and AMDGPU quickly adopted it for their O0 paths, and efforts are underway to make it the default at higher optimizations.New CPU architectures (for example, a prospective future CPU with unusual 128-bit scalar types) can be accommodated by plugging in a custom legalizer and reusing the rest of the pipeline. And non-traditional targets stand to gain as well. Apple’s own GPU architecture, which Colombet has worked on, was one early beneficiary of a GlobalISel-style approach – its unusual register and instruction structure could be cleanly modeled through custom RegisterBank and Legalizer logic, rather than fighting a general-purpose DAG matcher.The result is that LLVM’s backend is better positioned to embrace emerging ISAs. As Colombet noted,“The spec [for RISC-V] is still evolving, and people keep adding new extensions. As those extensions mature, they get added to the LLVM backend…If your processor supports a new, more efficient instruction, LLVM can now use it.”Another aspect of portability is code reuse across targets. GlobalISel makes it possible to write generic legalization rules – for example, how to lower a 24-bit integer multiply using 32-bit operations – once in a target-independent manner. Targets can then opt into those rules or override them with a more optimal target-specific sequence. In SelectionDAG, some of that was possible, but GlobalISel is designed with such flexibility in mind from the start. This pays off when supporting families of architectures (say, many ARM variants or entirely new ones) – one can leverage the existing passes instead of reinventing the wheel. Even the register allocator and instruction scheduling phases (which come after instruction selection) can benefit from more uniform input thanks to GlobalISel producing consistent results across targets.Easier Debugging and MaintenanceThe switch to a modular backend isn’t just about adding features – it also improves the day-to-day experience of compiler engineers maintaining and debugging the code generator. 
With the old monolithic pipeline, a failure in codegen (like an incorrect assembly sequence or a compiler crash) often required reverse-engineering the entire selection process. By contrast, GlobalISel’s structured passes and the use of MIR make it far more tractable. Engineers can inspect the MIR after each stage (translation, legalize, register assignment, etc.) using LLVM’s debugging flags, to see where things start to diverge from expectations. For instance, if an out-of-range immediate wasn’t properly handled, the issue will be visible right after the Legalizer pass – before it ever propagates to final assembly. This clear separation of concerns reduces the cognitive load in debugging.Colombet emphasizes testing and debugging as first-class considerations. He advocates using tools like llvm-extract and llvm-reduce to isolate the function or instruction that triggers a bug.“Instead of debugging hundreds or thousands of lines, you end up with 10 lines that still reproduce the problem. That’s a huge productivity win,” Colombet says of minimizing test cases.With GlobalISel, this strategy can be taken even further. Each pass in the pipeline can often be run on its own, enabling unit-test-like isolation. LLVM’s verifier checks invariants between passes, so errors tend to surface closer to their source.This modular design yields tangible benefits:Clearer failure boundaries: MIR can be inspected after each phase (translation, legalization, register assignment).Faster diagnosis: bugs can be isolated and reproduced at the level of a single pass.Built-in correctness checks: verifier routines catch many issues early.Reuse over reinvention: less hand-written C++, more declarative TableGen logic.TableGen, for its part, remains a double-edged sword. GlobalISel backends rely heavily on it to define matching rules, allowing reuse across targets. But the tooling is infamously brittle. As Colombet puts it:“TableGen is kind of the hated child in LLVM… The syntax alone doesn't tell you the semantics… what your code means depends on how it’s used in the backend generator. And the error messages are often vague or inconsistent… everyone in the LLVM community kind of dislikes TableGen.”Despite its flaws, TableGen is central to GlobalISel’s maintainability. It helps abstract instruction complexity into compact, reusable rules — a major win for modern ISAs.Backend stability is also reinforced by fuzzing. Tools like llvm-isel-fuzzer generate random IR to stress-test instruction selectors, uncovering obscure failures that user test cases might miss. Colombet highlights their importance, especially in contexts like GPU drivers:“In contexts like GPU drivers, a compiler crash could potentially be exploited, so hardening the backend against unexpected input is vital.”While fuzzing doesn’t improve performance, it ensures each GlobalISel pass handles unexpected inputs robustly. Over time, this approach, combining modularity, reproducibility, automation, and stress-testing, has made LLVM’s backend infrastructure more resilient and easier to evolve.Supporting New Hardware ParadigmsLLVM’s move toward a modular backend reflects two major broader architectural shifts in computing: the rise of heterogeneous computing, which LLVM addresses through MLIR; and the growing use of machine learning to guide compiler decisions, exemplified by projects like MLGO. 
Both reflect a broader trend toward modularity, data-driven optimization, and architectural flexibility in modern compilers.The Rise of Heterogeneous ComputingAs heterogeneous systems become standard, combining CPUs, GPUs, and specialized accelerators, compilers must generate efficient code across dissimilar targets, and optimize across their boundaries. LLVM’s response is Multi-Level Intermediate Representation (MLIR) which we covered in Deep Engineering #9, a flexible, extensible IR framework that sits above traditional LLVM IR and enables high-level, domain-specific optimizations before lowering to machine code.Colombet explains:“With MLIR, you can model both your CPU and GPU modules within the same IR. That opens up optimization opportunities across different targets… you could move computations between devices more easily or apply cost models to decide what should run where.”This enables compilers to consider cross-device trade-offs early in the pipeline — for example, determining whether a tensor operation should run on a GPU or CPU based on context or cost. MLIR achieves this via a layered, dialect-based design: each dialect captures a different level of abstraction (e.g., tensor algebra, affine loops, GPU kernels), which can be progressively lowered. Once it reaches LLVM IR, the standard code generation path, including GlobalISel, takes over.MLIR’s integration with GlobalISel brings key advantages:Targets like GPUs or DSPs can be supported by implementing GlobalISel hooks for custom codegen.MLIR transformations can assume the backend will honor those hooks, enabling consistent lowering.LLVM 20 improved backend metadata and attribute precision, allowing frontends like Swift and Rust to better express semantic constraints to the optimizer — particularly important in multi-language, multi-device builds.Although GlobalISel doesn’t directly manage CPU–GPU splitting, its modular design makes it easier to support unconventional targets cleanly, whether an Apple GPU or a DSP with custom arithmetic units. The combination of MLIR’s flexible front-end IR and GlobalISel’s extensible backend forms a coherent pipeline for future hardware.Growing Use of Machine Learning to Guide Compiler DecisionsA second major shift, still largely experimental — is the integration of machine learning inside the compiler itself. Research tools like Machine Learning Guided Optimization (MLGO) have shown promising results in replacing fixed heuristics with learned policies. In 2021, Trofin et al. used reinforcement learning to drive LLVM’s inliner, achieving ~5% code size reductions at -Oz with only ~1% additional compile time. The same framework was applied to register allocation, learning spill strategies that occasionally outperformed the default greedy allocator.Colombet sees real potential here:“Compilers are full of heuristics, and machine learning is great at discovering heuristics we never would’ve thought of.”But he’s also clear about the practical challenges. First is the problem of feature extraction — the task of encoding program state into meaningful inputs for a model:“To use an analogy: could you price a house just by counting the number of windows? There’s probably some correlation, but it’s not enough. Similarly, in something like register allocation, the features you use to train your model may not carry enough information.”Even with good features, integration into the backend is nontrivial. 
LLVM’s register allocator and GlobalISel weren’t built with explicit “decision points” for ML models to hook into.“If all you can do is tweak some knobs from the outside, you may not be able to make meaningful improvements… do we need to write our own instruction selector or register allocator to take full advantage of machine learning? I think the answer is yes – but we’ll see.”The implication is that further modularization may be needed — isolating backend subproblems (like spill code insertion or instruction choice) into well-defined, pluggable interfaces. This would allow learned components to replace or guide specific decisions without requiring wholesale rewrites. Such a hybrid model — rule-based infrastructure augmented by ML at critical junctures — aligns with the trajectory GlobalISel already began: decoupling backend logic into testable, replaceable units.Whether through MLIR’s IR layering or MLGO’s data-driven policies, the common trend is clear: LLVM’s backend is evolving toward composability, configurability, and adaptability by refactoring it into pieces that are easier to understand, reuse, and eventually learn. By decomposing code generation into well-defined passes, LLVM has made it easier to support new ISAs such as RISC-V, extend to targets like GPUs and DSPs, and integrate with tools like MLIR. The transition is still ongoing, and trade-offs remain—compile-time costs, tooling gaps, and the complexity of mixing TableGen with C++—but the payoff is clear: a backend that is more debuggable, more maintainable, and better prepared for architectural change. As machine learning and domain-specific IRs reshape the frontend, GlobalISel ensures that the backend can evolve in parallel. It is not just a rewrite; it is infrastructure for the next era of compilers.If the architectural case for modular code generation in LLVM caught your attention, Quentin Colombet’s book, LLVM Code Generation offers the definitive deep dive. Colombet, the architect behind GlobalISel, takes readers inside the backend machinery of LLVM—from instruction selection and register allocation to debugging infrastructure and TableGen. The following excerpt—Chapter 6: TableGen – LLVM’s Swiss Army Knife for Modeling—introduces the declarative DSL that powers much of LLVM’s backend logic. It explains how TableGen structures instruction sets, eliminates boilerplate, and underpins the extensibility that modular backends depend on.TableGen – LLVM Swiss Army Knife for Modeling by Quentin ColombetThe complete “Chapter 6: TableGen – LLVM Swiss Army Knife for Modeling” from the book LLVM Code Generation by Quentin Colombet (Packt, May 2025).For every target, there are a lot of things to model in a compiler infrastructure to be able to do the following:Represent all the available resourcesExtract all the possible performanceManipulate the actual instructionsThis list is not exhaustive, but the point IS that you need to model a lot of details of a target in a compiler infrastructure.While it is possible to implement everything with your regular programming language, such as C++, you can find more productive ways to do so. In the LLVM infrastructure, this takes the form of a domain-specific language (DSL) called TableGen.In this chapter, you will learn the TableGen syntax and how to work your way through the errors reported by the TableGen tooling. 
These skills will help you be more productive when working with this part of the LLVM ecosystem.This chapter focuses on TableGen itself, not the uses of its output through the LLVM infrastructure. How the TableGen output is used is, as you will discover, TableGen-backend-specific and will be covered in the relevant chapters. Here, we will use one TableGen backend to get you accustomed to the structure of the TableGen output, starting you off on the right foot for the upcoming chapters.Read the Complete ChapterLLVM Code Generation is for both beginners to LLVM and experienced LLVM developers. If you’re new to LLVM, it offers a clear, approachable guide to compiler backends, starting with foundational concepts. For seasoned LLVM developers, it dives into less-documented areas such as TableGen, MachineIR, and MC, enabling you to solve complex problems and expand your expertise.Use codeLLVM20 for 20% off at packtpub.com.Get the Book🛠️Tool of the Week⚒️DirectX Shader Compiler (DXC) – HLSL Compiler Based on LLVM/ClangDXC is Microsoft’s official open-source compiler for High-Level Shader Language (HLSL), built on LLVM and Clang. It supports modern shader development for Direct3D 12 and Vulkan via SPIR-V, and is widely used in production graphics engines across the gaming and visual computing industries.Highlights:LLVM-Based Shader Compilation: Leverages the LLVM infrastructure to provide robust parsing, optimization, and code generation for HLSL, targeting both DXIL (DirectX Intermediate Language) and SPIR-V.Cross-Platform Targeting: Supports SPIR-V output for Vulkan through the -fspv-target-env flag, making it viable for multi-platform engines needing portability between Direct3D and Vulkan.Modern Shader Features: Enables developers to use Shader Model 6.x features, including wave operations, ray tracing, and mesh shaders, with forward compatibility for future models.Active Development and Tooling Improvements: The June 2025 release (v1.8.2406.1) added new diagnostics, SPIR-V fixes, -ftime-trace support for compilation profiling, and improvements to the dxcompiler API surface.Learn more on GitHub📰 Tech Briefs2025 AsiaLLVM - Understanding Tablegen generated files in LLVM Backend | Prerona Chaudhuri: This beginner-focused talk covers how TableGen generates key C++ backend files in LLVM—such as CodeEmitter, DisassemblerTables, and RegisterInfo—using AArch64 examples to explain how MIR instructions are encoded, decoded, and mapped to target-specific definitions.Type-Alias Analysis: Enabling LLVM IR with Accurate Types | Zhou et al.: Introduces TypeCopilot, a type-alias analysis framework for LLVM IR that overcomes the limitations of opaque pointers by inferring multiple concrete pointee types per variable, enabling accurate, type-aware static analyses with up to 98.57% accuracy and 94.98% coverage.LLVM 22 Compiler Enters Development With LLVM 21 Now Branched: LLVM 21 has been officially branched for release—introducing support for AMD GFX1250 (RDNA 4.5?), NVIDIA GB10, and expanded RISC-V features—while LLVM 22 development begins with continued backend enhancements, Clang 21 updates for C++2c and AVX10 changes, and LLVM 22.1 expected around March 2026.The Architecture of Open Source Applications (Volume 1): LLVM | Chris Lattner: This book chapter presents LLVM as a modular, retargetable compiler infrastructure built around a typed intermediate representation (LLVM IR), designed from the outset as a set of reusable libraries rather than a monolithic toolchain.2024 LLVM Dev Mtg - State of Clang as a C 
and C++ Compiler | Aaron Ballman: In this talk, Clang's lead maintainer outlines ongoing progress across C and C++ standards support, tooling, diagnostics, and community growth—highlighting Clang’s expanding role within LLVM, its near-complete C++20 and C23 conformance, and persistent challenges like compile-time overhead and documentation.

That’s all for today. Thank you for reading this issue of Deep Engineering. We’re just getting started, and your feedback will help shape what comes next.

Take a moment to fill out this short survey we run monthly—as a thank-you, we’ll add one Packt credit to your account, redeemable for any book of your choice.

We’ll be back next week with more expert-led content.

Stay awesome,
Divya Anne Selvaraj
Editor-in-Chief, Deep Engineering

Take the Survey, Get a Packt Credit!

If your company is interested in reaching an audience of developers, software engineers, and tech decision makers, you may want to advertise with us.

Divya Anne Selvaraj
24 Jul 2025

Deep Engineering #10: Prof. Elías F. Combarro on Programming Quantum Systems in Flux

Writing code for quantum computers that don’t fully exist yet—and why it matters now.#10Prof. Elías F. Combarro on Programming Quantum Systems in FluxWhat it takes to design, debug, and reason about quantum programs when the hardware, abstractions, and rules are all still evolvingHi Welcome to the tenth issue of Deep EngineeringLast week, analysts at Bank of America (BofA) released a note on quantum computing saying, “This could be the biggest revolution for humanity since discovering fire.” It may seem like an audacious comparison at first for a field known till now to be abstract with hardware that is not there yet. But IBM has already laid out a comprehensive roadmap to build a large-scale, fault-tolerant quantum computer by 2029 and expects to achieve practical quantum advantage by 2026.If quantum computing is to deliver on its promise, it won’t be physicists alone who get us there—it will be software teams building the abstractions, compilers, and algorithms that bridge theory and hardware. Engineers now face a peculiar challenge: to write software for machines that don’t fully exist, on hardware that changes year to year, using abstractions that must bridge mathematical theory, noisy processors, and unpredictable outcomes.To understand how industry professionals can prepare to face this challenge, we spoke to Prof. Elías F. Combarro, co-author of A Practical Guide to Quantum Computing (Packt, 2025). Combarro is a full professor in the Department of Computer Science at the University of Oviedo in Spain, with degrees in both Mathematics and Computer Science and national academic honors in each. He completed his PhD in Mathematics in 2001, with research spanning computability theory and logic, and has since authored over 50 papers across quantum computing, algebra, machine learning, and fuzzy systems. His recent work focuses on applying quantum methods to problems in optimization and algebraic structures. He has held research appointments at CERN and Harvard and served on the Advisory Board of CERN’s Quantum Technology Initiative from 2021 to 2024.You can watch the full interview and read the transcript here—or read on for our synthesis of what it means to design and debug quantum code in the context of real-world constraints and developments.Sign Up |AdvertiseTwilio Segment: Data you can depend on, Built your wayTwilio Segment was purpose-built so that you don’t have to worry about your data. Forget the data chaos, dissolve the silos between teams and tools, and bring your data together with ease. So that you can spend more time innovating and less time integrating.Learn moreExecutable Abstractions in an Unfinished Machine with Prof. Elías F. CombarroWhat it means to build quantum software before the hardware — or the rules — are fully written.Analysts at BofA earlier this month made quite a riveting statement: quantum computing “could be the biggest revolution for humanity since discovering fire.” For a field known for being very abstract, this claim underscores how concretely disruptive its proponents now expect it to be—reshaping computation, shifting global power, and pressuring industries well ahead of full-scale machines. In fact, in an interview with CNBC International Live, Haim Israel, Head of Global Thematic Research at BofA, stated that quantum computing is no longer “20 years away.” He credits recent breakthroughs—largely enabled by AI—with accelerating progress to a point where early commercial applications are already emerging. 
Israel projects that quantum advantage will be achieved by 2030, with quantum supremacy arriving five to six years later.Yet, realizing that potential requires software developers and researchers to think very differently about programming. Quantum programs don’t run on stable, deterministic digital processors; they run on fragile qubits governed by probabilistic physics. As Prof. Elías F. Combarro puts it,“Quantum programs are fundamentally different. You don’t have loops. You don’t have persistent memory or data structures in the way you do in classical programming. What you have is a quantum circuit—a finite sequence of operations that runs once, from start to finish. You can't stop, inspect, or loop within the circuit. You run it, you measure, and then you’re done.”This new paradigm forces a reimagining of everything from algorithm design and debugging to testing and maintenance.From Qubits to Entanglement: New Mental ModelsClassical developers are used to variables holding definite values and code flowing through deterministic steps. By contrast, a single qubit can exist in a superposition of basis states, represented by a two-dimensional complex state vector. A single qubit can be represented geometrically using the Bloch sphere where every point on the surface corresponds to a possible quantum state and operations appear as rotations. As Combarro explains,“Every point on the surface of the sphere represents a possible state of your qubit, and quantum gates—operations—can be visualized as rotations of this sphere.”But as soon as we move beyond one qubit, our everyday intuition falters. Two qubits live in a 4-dimensional state space, ten qubits in a $2^{10}=1024$-dimensional space, and so on – exponential growth that quickly outpaces human imagination.A defining feature of multi-qubit systems is entanglement, a phenomenon with no classical equivalent.“Entangled systems can’t be described by just looking at the states of their individual parts… You need the full global state,” Combarro notes.An entangled pair of qubits shares a joint state that cannot be factored into two independent single-qubit states. Change or measure one part, and the other seems to instantly reflect that change – a mystery so striking that Einstein dubbed it “spooky action at a distance.” This “spookiness” is not just a quirk of physics; it’s a resource for computation.“Entanglement… only exists in quantum systems. It doesn’t happen in classical physics… you can use it to implement protocols and algorithms that are simply impossible with classical resources,” Combarro says.Indeed, algorithms like superdense coding (sending two classical bits by transmitting a single entangled qubit) or quantum teleportation of states require entanglement to work. In quantum computing, entanglement is the magic that enables a kind of collaborative computation across qubits – and it’s central to any future quantum advantage.When Measurement Changes the AnswerAnother fundamental difference between classical and quantum computation lies in how information is retrieved from a system. In classical software, reading a variable doesn’t disturb its value. In quantum software, measurement fundamentally changes the system. A qubit’s rich state is collapsed to a definite outcome (like |0⟩ or |1⟩) when measured, and all the other information encoded in its amplitudes is lost.“In quantum computing, when you perform a measurement, you can't access all that information. 
You only get a small part of it,” Combarro explains.Measuring a single qubit yields just one classical bit (0 or 1) of information, no matter how complex the prior state.And after measurement, “you’ve lost everything about the prior superposition. The system collapses, and that collapse is irreversible.”This means a quantum program can’t freely check intermediate results or branch on qubit values without destroying the very quantum state it’s computing with.The consequence is that quantum algorithms are often designed to minimize measurements until the end, or to cleverly avoid needing to know too much about the state. Even then, the outcome of a quantum circuit is usually probabilistic. Running the same circuit twice can give different answers, a shock to those accustomed to deterministic code.“For people used to classical programming, that's very strange—how can the same inputs give different outputs? But it’s intrinsic to quantum mechanics,” Combarro says.To manage this randomness, quantum algorithms rely on repetition and statistical analysis. Developers run circuits many times (often thousands of shots) and aggregate the results. For example, a quantum classifier might be run 100 times, yielding say 70 votes for “cat” and 30 for “dog,” which indicates a high probability the input was a cat. Many algorithms, like phase estimation, improve their accuracy by repeated runs:“In quantum phase estimation… you repeat the procedure to get better and better approximations. The more you repeat it, the more accurate the estimate.”In other words, you rarely trust a single run of a quantum program – you gather evidence from many runs to reach a reliable answer.Developers must also separate intrinsic quantum uncertainty from extrinsic hardware noise. The randomness of quantum measurement is unavoidable, but today’s quantum processors add extra uncertainty via errors (decoherence, gate faults, crosstalk). Mitigating these is an active area of research. Techniques like error mitigation calibrate and correct for known error rates in the readouts. More ambitiously, quantum error correction (QEC) encodes a “logical” qubit into multiple physical qubits to detect and fix errors on the fly. This too flips classical assumptions: in quantum, you can’t simply copy bits for redundancy (the no-cloning theorem forbids cloning an unknown quantum state). Instead, QEC uses entanglement and syndrome measurements to indirectly monitor errors.Researchers at QuEra achieved a milestone in this regard through magic state distillation on logical qubits – a technique proposed 20 years ago as essential for universal, fault-tolerant computing. As Sergio Cantu, vice president of quantum systems at QuEra even said, “Quantum computers would not be able to fulfill their promise without this process of magic state distillation. It’s a required milestone.”Even as such advances bring fully error-corrected quantum computers closer, they underline that today’s hardware is still very much unfinished.Circuits, Qubits, and the Tools of the TradeHow do you write software for machines that operate under these strange rules? The answer is to raise the level of abstraction—while keeping physics in mind. Modern quantum programming frameworks like Qiskit, Cirq, PennyLane, and others allow developers to describe quantum programs as circuits: sequences of quantum gates and operations applied to qubits. This is a low-level, assembly-like model of computation, but it’s the lingua franca of quantum algorithms. 
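As a concrete illustration of that circuit model (superposition, entanglement, a final measurement, and repeated shots), here is a minimal sketch using Qiskit with its Aer simulator; both packages are assumed to be installed, and the code is illustrative rather than drawn from the book or the interview:

```python
# A minimal sketch: a fixed sequence of gates, a measurement, and repeated
# "shots" whose counts approximate the outcome probabilities.
from qiskit import QuantumCircuit
from qiskit_aer import AerSimulator

qc = QuantumCircuit(2, 2)
qc.h(0)                      # put qubit 0 into superposition
qc.cx(0, 1)                  # entangle qubit 1 with qubit 0 (Bell state)
qc.measure([0, 1], [0, 1])   # collapse both qubits into classical bits

# Run the same circuit many times; each run is random, but the
# distribution is stable (roughly 50% '00' and 50% '11').
result = AerSimulator().run(qc, shots=1000).result()
print(result.get_counts())   # e.g. {'00': 503, '11': 497}
```

Each individual shot yields a random '00' or '11', but over a thousand runs the counts settle into a stable distribution, which is the repetition-and-aggregation workflow described above.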
High-level constructs familiar from classical languages (loops, if-else branches, function recursion) are largely absent inside a quantum circuit. Instead, any classical logic (like looping until a condition is met) has to run outside the quantum computer, orchestrating multiple circuit executions. As Combarro recounts, the shift can be jarring:“I remember the first student who asked, ‘How do you implement a loop in a quantum computer?’ And I had to say, ‘Come in and sit down—I have bad news.’”In practice, a quantum program might consist of a Python script that calls a quantum circuit many times, adjusting parameters or processing results on a classical computer between calls.Despite these challenges, certain abstractions and libraries have emerged to help manage complexity. IBM’s Qiskit has become a popular choice, especially in education, for its extensive features and cloud access to real quantum processors.“Qiskit has the largest number of features, and it’s the easiest one for accessing quantum computers online,” Combarro notes.In fact, one can prototype an algorithm on a local simulator and then, with only a few lines changed, run it on a real back-end.“You only need to change three or four lines of code to make that switch, but it’s very satisfying to say, ‘I’m running this on an actual quantum computer.’”This ease of swapping targets is a boon in an environment where hardware is evolving – it lets developers test their abstractions against today’s best machines and see the effects of real noise and connectivity constraints.Quantum compilers (transpilers) play a crucial role here. They take the high-level circuit and map it to the specific gates and qubits of a given device. Unlike a classical compiler, a quantum transpiler must contend with hardware quirks like limited qubit connectivity.“Not all qubits in a quantum computer are connected to each other. So, if you want to apply a gate to two distant qubits, the transpiler has to insert extra operations to move data around — introducing noise and increasing circuit depth,” Combarro explains.The transpiler may also optimize the circuit, combining gates or reordering operations to shorten the runtime (important before qubits decohere). Understanding what the transpiler is doing – and sometimes guiding it – has become part of the quantum developer’s skill set. For example, a programmer might constrain their circuit to use only certain qubits that have higher fidelity or explicitly insert swap gates to relocate qubits logically. It’s a delicate dance between abstract algorithm design and the very concrete limitations of hardware. Every additional gate is a risk when devices have error rates around 0.1–1% per operation.Debugging an Algorithm You Can’t Fully SeeWorking with quantum software can feel like coding with one eye closed. Because measuring qubits destroys their state, developers can’t step through a quantum program in the same way as a classical one. You can’t pause midway and inspect all qubit values – that would collapse the superpositions and entanglements you painstakingly created. Instead, quantum developers lean heavily on simulation and mathematical reasoning to debug.“To untangle issues, you start by running your code on a classical simulator. These simulators are deterministic and noise-free – they give you the exact mathematical result of the circuit, assuming perfect qubits. 
This lets you validate whether your logic is correct before moving to actual quantum hardware,” Combarro says.Simulators can output the full statevector of 20 or 30 qubits, allowing a developer to verify that, say, an entangled state or an amplitude amplification step is correct. Visualization tools can display probability distributions or Bloch sphere orientations for small circuits, providing insights that no current hardware can directly reveal.However, simulation has its limits. The memory required grows exponentially with qubit count, so beyond roughly 30 qubits (needing 16 GB of RAM or more), it becomes intractable to simulate general states. This is why today’s quantum algorithms for larger qubit numbers either rely on theoretical reasoning or are tested on actual quantum chips. When running on hardware, developers adopt statistical approaches to debugging: varying parameters, collecting lots of runs, and comparing aggregate results against expectations. They also must account for the possibility that an unexpected result is due to a device error rather than a flaw in the algorithm. As a safeguard, many will run the same circuit on multiple back-end devices (or noise models) to see if a result persists. This is quantum computing’s version of cross-platform testing. Even then, true reproducibility in the classical sense is unattainable on a quantum device – you can’t demand the same random outcome twice. Instead, reproducibility is about getting the same probability distribution of outcomes when conditions are repeated.As Combarro succinctly puts it, “Quantum computations are inherently probabilistic, so you can’t reproduce the exact same measurement result every time. What you can do is ensure a high probability of success.”The Hardware Frontier: Evolving and UncertainPerhaps the biggest challenge in writing quantum software today is that the machine itself is a moving target. Every year brings new devices with more qubits, different noise characteristics, and even new fundamental approaches to quantum bits. Superconducting qubits (used by IBM, Google, and others) dominate the current landscape with devices at 127 qubits and beyond, but they require cryogenic cooling and still have very short coherence times (microseconds). Trapped-ion qubits offer longer-lived states and all-to-all connectivity, but operations are slower and scaling to hundreds of qubits is difficult in practice. Photonic quantum computers, neutral atoms in optical tweezers, silicon spin qubits – each technology comes with trade-offs in coherence, gate fidelity, connectivity, and scalability. No one knows which approach (or fusion of approaches) will ultimately deliver a large-scale, fault-tolerant quantum computer. In a moderated virtual panel titled ‘Future of Quantum Computing’ at the 8thInternational Conference on Quantum Techniques in Machine Learning hosted by the University of Melbourne, Scott Aaronson said,“We do not have a clear winner between architectures such as trapped ion, neutral atoms, superconducting qubits, photonic qubits. Very much still a live race.”This uncertainty means quantum software must be somewhat hardware-agnostic yet ready to embrace new capabilities as they come. A few years ago, for instance, most cloud quantum computers did not support mid-circuit measurement or dynamic circuit logic; now some do, allowing new hybrid algorithms where measurement outcomes can influence subsequent operations. 
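To ground the point about mid-circuit measurement and dynamic circuit logic, here is a minimal hedged sketch in Qiskit (assuming a recent release with if_test support and the Aer simulator installed; real backends differ in which control-flow constructs they accept), where a measurement partway through the circuit conditions a later gate:

```python
# Illustrative dynamic-circuit sketch: a mid-circuit measurement whose
# outcome feeds forward into a classically conditioned gate.
from qiskit import QuantumCircuit
from qiskit_aer import AerSimulator

qc = QuantumCircuit(2, 2)
qc.h(0)
qc.measure(0, 0)                        # mid-circuit measurement of qubit 0
with qc.if_test((qc.clbits[0], 1)):     # classical feed-forward on that outcome
    qc.x(1)                             # flip qubit 1 only if qubit 0 read as 1
qc.measure(1, 1)

counts = AerSimulator().run(qc, shots=1000).result().get_counts()
print(counts)                           # outcomes cluster on '00' and '11'
```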
The “rules” of what a quantum program can do in one run are still being rewritten by hardware advances. Developers also contend with frequent library updates and deprecations. “Quantum software libraries evolve very quickly,” Combarro notes, reflecting on how code from his first book had to be updated as Qiskit advanced. This pace has started to stabilize – Qiskit’s major 2.0 release, for example, made relatively few breaking changes – but keeping code working may require more vigilance than in mature fields. Documentation sometimes lags behind new features, requiring quantum coders to read research papers or even source code to understand the cutting edge.Amid the rapid progress, it’s important to recognize that quantum computing is still largely in a pre-advantage era. While researchers have begun to demonstrate quantum advantage on carefully structured tasks, one recent milestone stands out: in July 2025, a team from USC and Johns Hopkins used IBM’s 127-qubit Eagle processors to show an unconditional exponential speedup on a modified version of Simon’s algorithm—a first in the field that doesn’t rely on unproven assumptions about classical limits. But even this breakthrough, as the lead researcher noted, has no immediate practical application beyond demonstrating capability. In fact, the 2025 MIT Quantum Index Report found that large-scale commercial applications of quantum computing remain “far off” despite the surge in patents and investments. Practical quantum advantage is an ongoing race: early claims can evaporate if improved classical algorithms catch up.Google’s much-publicized 2019 quantum supremacy experiment, for example, was soon matched by classical methods, nullifying that particular “advantage.” So, we are in a stage where the promise is undeniable and enormous (quantum computing could “change everything” from drug discovery to encryption), but the delivery is incremental and challenging.Navigating the Coming Quantum AgeIBM has laid out a comprehensive roadmap to build a large-scale, fault-tolerant quantum computer by 2029, called Quantum Starling, capable of running 100 million gates on 200 logical qubits. The plan integrates modular architecture, bivariate bicycle codes for quantum error correction, efficient logical processing units, universal adapters for inter-module communication, and magic state distillation to enable universal computation. IBM’s confidence rests on meeting successive milestones with custom hardware (like the upcoming Nighthawk processor), improved connectivity, and a newly introduced real-time decoder architecture. The company expects to achieve practical quantum advantage by 2026, with Starling serving as the scalable platform for fault tolerance.Lanes et al., researchers at IBM Quantum and PASQAL SAS, in their July 2025 paper have proposed a formal framework for quantum advantage that is platform-agnostic and empirically testable. They argue that advantage should mean outperforming classical systems on specific tasks with rigorously validated results—not theoretical superiority or isolated hardware feats, but measurable, reproducible performance gains in fields like chemistry, materials science, or optimization.In this environment, how should software professionals and technology leaders prepare? The consensus is to start small and start now. 
Even without large-scale quantum computers at hand, there is much to learn about quantum algorithms, error mitigation techniques, and integration with classical systems.“My advice is simple: start now,” urges Combarro. “If you think quantum computing might be relevant to your domain, begin exploring it as early as possible. The learning curve is steep… If you wait until quantum computing is mainstream, it may be too late to catch up.”This means building up quantum programming skills (in linear algebra, complex probability, and Quantum Processing Unit (QPU)-specific idioms), experimenting with simulators and cloud QPUs, and following the rapid research developments in both hardware and algorithms. Companies are already establishing small quantum teams or partnerships to identify long-term use cases – not because a quantum solution can be deployed today, but to be ready when the hardware crosses key thresholds in the next few years.There is a palpable excitement in the field, tempered by an understanding that quantum computing’s unfinished machine is being completed step by step. Writing quantum software today requires building abstractions for hardware that is still evolving, with each new qubit, error-correction scheme, and algorithm incrementally advancing the field toward practical, fault-tolerant systems. Until then, the work is foundational: preparing tools, methods, and mental models that future machines will depend on.If you found the insights in our feature on quantum software illuminating, A Practical Guide to Quantum Computing by Elías F. Combarro and Samuel González-Castillo (Packt, July 2025) offers a comprehensive and hands-on introduction to the field.Using Qiskit 2.1 throughout, the book walks readers through foundational quantum concepts, key algorithms like Grover’s and Shor’s, and practical techniques for writing and running real quantum programs. It’s ideal for professionals and self-learners looking to build solid, executable intuition—from single qubits to full-stack algorithm design.Use code QUANTUM20 for 20% off at packtpub.com.Get the Book🛠️Tool of the Week⚒️Qiskit – Python‑based Quantum SDK & Compiler StackQiskit is an open-source, Python-first SDK and compiler stack for quantum computing, developed by IBM and widely adopted across industry and academia. It enables developers to design, simulate, transpile, and deploy quantum circuits—whether running on local simulators or real quantum hardware.Highlights:Complete Quantum Software Workflow: Create quantum circuits using a flexible Python API, simulate them with Aer backends (statevector or noisy models), optimize and map circuits to hardware via transpilation, then run them on supported quantum devices like IBM’s QPUs—without changing your code structure.Optimizing Compiler & Hardware-Agnostic Deployment: Qiskit’s advanced transpiler performs qubit mapping, gate fusion, and noise-aware optimizations tailored to target hardware. 
It supports multiple backends (not just IBM), provides OpenQASM export, and has emerged as a performance leader in gate-depth reduction.Rich Application & Tooling Ecosystem: Includes domain-specific libraries (chemistry, finance, machine learning), visualizers for circuits and Bloch spheres, and profiling tools—empowering debugging and performance analysis across the entire quantum software stack.Actively Maintained & Rapidly Evolving: Since its major v2.0 release in March 2025, Qiskit has continued to advance with v2.1 (June–July 2025), adding a C API for high-throughput workflows, new synthesis and Clifford+T optimizations, multiqubit-gate support, and enhanced dynamic circuit constructs—showing vibrant and ongoing development.Learn more about Qiskit📰 Tech BriefsQuantum Computing Architecture and Hardware for Engineers -- Step by Step -- Volume II by H. Y. Wong (July, 2025): Extends Wong’s earlier work by providing a step-by-step, engineering-focused introduction to trapped-ion quantum computers, covering their physics, mathematics, laser control, and electronics in relation to DiVincenzo's criteria.Scientists make 'magic state' breakthrough after 20 years — without it, quantum computers can never be truly useful: Scientists at QuEra have, for the first time, demonstrated fault-tolerant magic state distillation using logical qubits—an essential breakthrough for running non-Clifford gates and enabling scalable, error-corrected quantum computation.Quantum Computers Just Reached the Holy Grail – No Assumptions, No Limits: Researchers from USC and Johns Hopkins have demonstrated, for the first time, an unconditional exponential speedup on a real quantum computer—solving a variation of Simon’s problem using IBM’s Eagle processors, marking a major milestone in proving quantum advantage without relying on unproven assumptions.The dawn of quantum advantage: A new white paper from IBM and Pasqal outlines a rigorous, empirically testable framework for quantum advantage—defining it as a validated performance edge over classical systems in real-world tasks—and argues that such advantage will emerge from hybrid quantum-classical workflows, likely beginning with variational algorithms and error-mitigated circuits by 2026.2025 MIT Quantum Index Report: The report finds that while investment, research, and job growth in quantum computing are accelerating, large-scale commercial applications remain “far off” due to current limitations in quantum processor performance and scalability.That’s all for today. Thank you for reading this issue ofDeep Engineering. 
We’re just getting started, and your feedback will help shape what comes next.

Take a moment to fill out this short survey we run monthly—as a thank-you, we’ll add one Packt credit to your account, redeemable for any book of your choice.

We’ll be back next week with more expert-led content.

Stay awesome,
Divya Anne Selvaraj
Editor-in-Chief, Deep Engineering

Take the Survey, Get a Packt Credit!

If your company is interested in reaching an audience of developers, software engineers, and tech decision makers, you may want to advertise with us.

Divya Anne Selvaraj
17 Jul 2025

Deep Engineering #9: Unpacking MLIR and Mojo with Ivo Balbaert

MLIR’s impact on compilers and Mojo’s promise for AI scale development#9Unpacking MLIR and Mojo with Ivo BalbaertHow MLIR is reshaping compilers for heterogeneous hardware despite adoption challenges—and how Mojo builds on it to unify Pythonic ease with AI‑scale performanceTwilio Segment: Data you can depend on, Built your wayTwilio Segment was purpose-built so that you don’t have to worry about your data. Forget the data chaos, dissolve the silos between teams and tools, and bring your data together with ease. So that you can spend more time innovating and less time integrating.Learn moreHi Welcome to the ninth issue of Deep Engineering.As CPUs, GPUs, TPUs, and custom accelerators proliferate, compilers have become the thin yet critical layer that enables both abstraction and performance.Our feature this week looks at Multi-Level Intermediate Representation (MLIR)—a compiler infrastructure that promises to unify optimization across wildly different domains. Born at Google and now adopted in projects like OpenXLA, LLVM Flang, NVIDIA’s CUDA Quantum, and even hardware DSLs like Chisel, MLIR offers a powerful foundation—but one that comes with real‑world friction: steep learning curves, ecosystem fragmentation, and legacy integration challenges. We unpack where MLIR delivers, where developers struggle with it, and what its future might mean for software architects.Building on this theme, we’re also kicking off a new series on Mojo🔥, a programming language built entirely on MLIR. Written by Ivo Balbaert, Lector at CVO Antwerpen and author of The Way to Go and Packt introductions to Dart, Julia, Rust, and Red, Building with Mojo (Part 1): A Language Born for AI and Systems explores Mojo’s origins, its design goals, and its promise to unify Pythonic ergonomics with AI‑scale performance. Future parts will go deeper—covering Mojo’s tooling, metaprogramming, hardware abstraction, and its role in simplifying development pipelines that currently span Python, CUDA, and systems languages.Read on for our take on MLIR’s trajectory—and then take your first step into Mojo, a language built for the next wave of AI and systems programming.Sign Up |AdvertiseMLIR’s Promise, Pain Points, and the Path ForwardTo use a cliched statement: hardware and software are becoming increasingly diverse and complex. And because modern workloads must run efficiently across this diversity and complexity in form of CPUs, GPUs, TPUs, and custom accelerators, compilers are now critical for both abstraction and performance. MLIRemerged to tame this complexity by enabling multiple layers of abstraction in one framework. MLIR has rapidly grown from a Google research project into an industry-wide technology. After being open-sourced and contributed to LLVM in 2019, MLIR’s modular design attracted a broad community.Today MLIR underpins projects beyond Google’s TensorFlow. For example, it is the foundation of OpenXLA, an open compiler ecosystem co-developed by industry leaders (AMD, Apple, NVIDIA, etc.) to unify ML model deployment on diverse hardware. It’s also inside OpenAI’s Triton (for GPU kernel optimization) and even quantum computing compilers like NVIDIA’s CUDA Quantum (which defines a “Quake” IR on MLIR). In hardware design, the LLVM-affiliated experimental CIRCT project applies MLIR to circuit design and digital logic – so much so that a modern hardware DSL like Chisel moved its back-end to MLIR for richer analysis than standard RTL provides. 
MLIR’s multi-dialect flexibility has proven useful well beyond machine learning.MLIR has also made inroads into a traditional compiled language. The new LLVM Fortran compiler (Flang) adopted MLIR to represent high-level Fortran IR (FIR), allowing more powerful optimizations than the old approach of jumping straight to LLVM IR. This MLIR-based Flang already achieves performance on par with classic Fortran compilers in many benchmarks (within a few percent of GCC’s Fortran). In fact, in 2024, AMD announced its next-gen Fortran compiler will be based on Flang/MLIR to target AMD GPUs and CPUs in a unified way.However, MLIR’s adoption remains uneven across domains. For example, the LLVM C/C++ frontend (Clang) still uses its traditional monolithic pipeline. There is work in progress on a Clang IR dialect (“CIR”) to eventually bring C/C++ into MLIR, but Clang’s large legacy and stability requirements mean it won’t rewrite itself overnight.MLIR is proving itself in new or specialized compilers (AI, HPC, DSLs) faster than it can retrofit into long-established general-purpose compilers. It is technically capable of being a general compiler framework, but the industry is still in transition.The Hard Gaps – Adoption Challenges in PracticeEngineers may be enthusiastic about MLIR’s potential but also hit real pain points when evaluating it for production. Some key challenges include:Steep learning curve and tooling maturity: The MLIR ecosystem is complex and still maturing, which can intimidate new developers. Ramalho et al., in a 2024 conference paper note that “the MLIR ecosystem has a steep learning curve, which hinders adoption by new developers.” Building a new dialect or pass often means delving into MLIR’s internals (C++ templates, TableGen definitions, etc.) with sparse documentation. In fact, MLIR’s flexibility can be a double-edged sword – there are many moving parts to learn (dialects, ops, attributes, patterns, builders), and patterns are still emerging. Google’s engineers originally writing machine-learning kernels directly in MLIR found it “a productivity challenge”, which led them to create the Mojo language to get a higher-level syntax on top of MLIR. The lack of out-of-the-box IDE support or debugging tools for MLIR IR further adds friction. Adopting MLIR often requires hiring or developing compiler expertise, and that investment can be hard to justify for every team.Integration with Legacy Compiler Stacks: For organizations with existing compilers, taking advantage of MLIR might mean significant refactoring or a total rewrite of the front-end or middle-end. The LLVM community has been careful with Clang for this reason: “Clang also has a legacy to protect, so it is unlikely to fully adopt MLIR quickly.” Instead, they are introducing MLIR gradually via a new CIR dialect for C/C++. Retrofitting MLIR into a mature compiler is expensive and risky because you must maintain feature-parity during the transition. Unless starting a compiler from scratch or facing a dead-end with current tools, it can be hard to justify MLIR’s long-term benefits over short-term upheaval.Dialect Fragmentation and Ecosystem Maturity: One strength of MLIR is its dialect system – you can create domain-specific IR “dialects” and compose them. However, in practice this has led to an explosion of dialects, especially in the AI domain, not all of which are stable or even compatible. 
As Chris Lattner (MLIR’s co-creator) observed:“Unfortunately, this explosion happened very early in MLIR’s design, and many design decisions in these dialects weren’t ideal for the evolving requirements of GenAI. For example, much of this early work was directed towards improving TensorFlow and building OpenXLA, so these dialects weren’t designed with first-class PyTorch and GenAI support.”The result was that by the time generative AI and PyTorch use cases rose, the upstream MLIR dialects (like linalg or tensor) were not a perfect fit for new workloads. Companies ended up forking or inventing their own dialects (e.g., Google’s StableHLO vs. others), leading to ecosystem fracture. Lattner describes it as an “identity crisis.” Architecturally, it is difficult to determine which dialects to build on or standardize around. On the bright side, the MLIR project recently established a new governance structure and an MLIR area team to improve consistency, but it will take time to harmonize the dialect zoo.Unpredictable Performance in Niche Scenarios: MLIR adds its own layer of transformations and scheduling – if the compiler pipeline isn’t expertly constructed, you might not hit peak performance for a given target. Until more of these optimizations are shared in the community, teams adopting MLIR in new domains might face a period of performance tuning and even uncertainty. (On the flip side, MLIR’s structure can enable new performance tools. For example, Lücke et al. in their CGO 2025 Main Conference paper demonstrate through five case studies that the transform dialect enables precise, safe composition of compiler transformations and allows for straightforward integration with state-of-the-art search methods.)But probably the most practical pain point is day-to-day developer experience. Debugging an MLIR-based compiler can be challenging – error messages often come from deep in the MLIR/LLVM machinery, and stepping through multi-dialect lowering is hard. So, there are challenges and tradeoffs in MLIR adoption at both the organizational and individual levels. But how have these trade-offs played out in the real world: who is successfully using MLIR today, and what did they learn from it?MLIR in the Real WorldDespite the hurdles, some teams have embraced MLIR and demonstrated tangible benefits. Let’s explore four use cases:Fortran & HPC Applications: The LLVM Flang project’s adoption of MLIR is a showcase for using MLIR in a non-ML domain. By inserting MLIR into the compilation flow (via FIR dialects), Flang keeps more high-level semantics available for optimization than the old approach that dropped straight to LLVM IR. This enabled powerful transformations for array operations, loop optimizations, and OpenMP parallelism, all within the MLIR framework. Notably, an MLIR dialect for OpenMP was created so Flang could represent parallel loops in a higher form than just runtime calls. Software engineers at Linaro showed that the new Flang compared favorably with Classic Flang and was not far behind GFortran on benchmarks. Researchers at national labs have run full applications through Flang and confirmed its output is efficient, while also praising the new compiler’s extensibility for future needs. This hints that MLIR can deliver HPC performance while providing a more modern, maintainable codebase. It’s not all rosy – Flang is still catching up on full Fortran 2018 feature support – but it’s a concrete proof that MLIR can anchor a production compiler for a decades-old language. 
It also drove industry involvement: Fujitsu and ARM are contributing to Flang’s MLIR optimizations, and AMD is aligning its own Fortran compiler with Flang’s MLIR pipeline. For HPC architects, MLIR’s holds potential to unify CPU/GPU optimization (Flang will emit GPU offload code to AMD and NVIDIA through LLVM) and to lower maintenance in the long run by leveraging common infrastructure.SiFive RISC-V Intelligence Products: Hardware startups and AI accelerator teams can adopt MLIR as their compiler toolkit rather than writing everything from scratch. For example, SiFive RISC-V Intelligence Products use Google’s open-source MLIR-based compiler IREE as the core of their ML software stack. SiFive added their own custom dialect (VCIX) to MLIR so that IREE could target SiFive’s vector extensions and custom AI accelerators. This allowed them to lower deep learning models (like LLaMA LLMs) onto RISC-V hardware with relative ease, reusing IREE’s many optimization passes and then adding just the pieces needed for SiFive’s architecture. The result was the ability to run LLMs on RISC-V and get real-time performance – something that would have been immensely difficult without a framework like MLIR.NVIDIA’s CUDA Quantum platform: MLIR can be leveraged to build compilers for quantum computing and other novel processors. NVIDIA’s CUDA Quantum platform uses MLIR under the hood, mapping quantum IR into MLIR’s SSA form (the Quake dialect) and allowing compiler optimizations on quantum circuits. The same infrastructure enabling tensor optimizations can also optimize quantum gate pipelines. For software architects at companies making custom chips (AI or otherwise), MLIR provides a common compiler backbone where you plug in hardware-specific pieces (dialects, cost models) rather than reinventing entire compilers.OpenXLA: On the enterprise side, MLIR is creeping into data centers. OpenXLA, which as noted uses MLIR in components like StableHLO and IREE, has been used in production at companies like DeepMind, Waymo, and Amazon. A Google blog noted that OpenXLA (with MLIR inside) has been used for training AlphaFold, serving large Transformer models in self-driving car systems, and even accelerating Stable Diffusion inference on AMD GPUs. These are real workloads where the MLIR-based compiler achieved better throughput or latency than default frameworks, often by performing advanced optimizations (fusions, layout optimizations, multi-host parallelization) that framework runtimes alone couldn’t.Torch-MLIR: This is an open project to compile PyTorch models via MLIR. While not yet mainstream in PyTorch deployments, it’s gaining traction among researchers trying to optimize PyTorch beyond what TorchScript or Inductor can do. The mere existence of Torch-MLIR underscores the interest in MLIR’s ability to serve as a common IR bridge – here, between the dynamic PyTorch ecosystem and lower-level backends like LLVM, SPIR-V, or custom accelerators.CIRCT in hardware design: companies designing FPGAs and ASICs (e.g., in the FPGA EDA industry) are experimenting with MLIR to replace or augment HDLs. Chisel, a high-level hardware language, now emits MLIR (via CIRCT) instead of a custom IR, allowing use of MLIR’s analysis to optimize hardware generators. This could streamline chip design workflows by enabling cross-optimization of hardware and software models. 
While still experimental, it’s a real adoption in a traditionally conservative domain (EDA).MLIR’s value multiplies in “greenfield” projects or where incumbents are hitting limits. New hardware with no legacy compiler, new languages (like Mojo, which we will talk about shortly) or AI serving stacks that need every ounce of performance – these are where MLIR has shined. The most effective MLIR deployments often abstract MLIR behind a higher-level interface. Flang hides MLIR behind normal Fortran semantics for end-users; SiFive’s users see an AI runtime API, not MLIR directly; even OpenXLA exposes a compiler API and uses MLIR internally. This suggests a potential best practice to ease adoption: shield developers from MLIR’s complexity via good APIs or DSLs, so they benefit from it without needing to write MLIR from scratch.Mojo & MLIRNo discussion of MLIR in 2025 is complete without Mojo – a new programming language from Modular (a company founded by Chris Lattner and others) that has been making waves. Mojo is essentially a distilled essence of what MLIR can enable in software design. It’s billed as a superset of Python, combining Python’s ease with C++/Rust-like performance. Under the hood, Mojo is built entirely on MLIR – in fact, Mojo’s compiler is an MLIR pipeline specialized for the language. This design choice sheds light on what MLIR brings that classic LLVM IR could not:Multi-level abstraction and optimization: Mojo uses MLIR to represent Python-like high-level features (e.g., list comprehensions, dynamic dispatch) in rich intermediate forms, then progressively lowers them to efficient native code via LLVM dialects—something impractical with LLVM IR alone.Hardware abstraction with performance: By leveraging MLIR dialects for CPUs, GPUs, and TPUs, Mojo can specialize code for diverse hardware while keeping a single high-level language surface, preserving type and shape information longer for deeper optimizations.Seamless Python interoperability: MLIR enables Mojo to handle Python’s dynamic typing and runtime behaviors, compiling only what benefits from optimization while falling back to the Python runtime, allowing a smooth transition from interpreted to compiled execution.Mojo’s success so far validates MLIR’s promised benefits. Within a few months of Mojo’s preview release, the Modular team itself used Mojo to write all the high-performance kernels in their AI engine. Like we mentioned earlier, Mojo was born because writing those kernels in pure MLIR was too slow – by creating a high-level language that compiles via MLIR, the Modular team combined productivity with performance.Figure 1.1: “Mojo is built on top of MLIR, which makes it uniquely powerful when writing systems-level code for AI workloads.” (Source: Modular Blog)Mojo’s compile-time cost is mitigated by MLIR’s design as well – parallelizing and caching in the compiler are easier with MLIR’s explicit pass pipeline, so Mojo can afford to do more heavy analysis without long build times. The language is still young, but it shines a promising light on what’s possible.(As an aside for readers, Mojo’s use of MLIR is a deep topic on its own. In Building with Mojo (Part 1): A Language Born for AI and Systems, Ivo introduces Mojo’s origins, design goals, and its promise to unify Pythonic ergonomics with AI-scale performance—but only at a high level. 
Later parts of the series will go deeper into Mojo’s internals, including how MLIR enables compile-time metaprogramming, hardware-specific optimizations, and seamless Python interoperability. To receive these articles in your inbox as soon as they are published, subscribe here)Wrapping UpMLIR’s trajectory over the past year shows cautious but real momentum toward broader adoption. The community has addressed key pain points like dialect fragmentation with new governance and curated core dialects, while new tooling—such as the Transform dialect presented at CGO 2025—lowers the barrier for tuning compiler optimizations. Proposed additions like a WebAssembly dialect and Clang CIR integration suggest MLIR is expanding beyond its “ML-only” roots into systems compilers and web domains. Industry trends reinforce its relevance: heterogeneous compute continues to grow, and MLIR already underpins projects like OpenXLA with backing from NVIDIA, AMD, Intel, Apple, and AWS. Still, its success depends on balancing generality with usability and proving its value beyond Google and Modular; competing approaches like SPIR‑V and TVM remain viable alternatives. Yet with advocates like Chris Lattner, ongoing research from firms like Meta and DeepMind, and AMD and Fujitsu adopting MLIR for HPC compilers, it’s likely to become a cornerstone of future compiler infrastructure if it maintains this pace.Read the Article🛠️Tool of the Week⚒️IREE – MLIR-Based Compiler & RuntimeIntermediate Representation Execution Environment (IREE) is an open-source end-to-end compiler and runtime for machine learning models, built on MLIR. In the OpenXLA ecosystem, IREE serves as a modular MLIR-based compiler toolchain that can lower models from all major frameworks (TensorFlow, PyTorch, JAX, ONNX, etc.) into highly optimized executables for a wide variety of hardware targets.Highlights:Broad Framework & Hardware Support: IREE can import models from multiple frontends (TensorFlow, PyTorch, JAX, ONNX, TFLite, etc.) and target nearly any platform – from x86 or Arm servers to mobile GPUs, DSPs, and custom NPUs.Intuitive Tooling & Integration: IREE provides a command-line compiler tool (iree-compile) and libraries that are straightforward to use. Models are compiled ahead-of-time into an efficient binary format, and runtime APIs are available in C and multiple languages (with language bindings) to easily load and execute the compiled models in your application. The tool comes with clear documentation and examples on its official site.Debugging & Profiling Support: Unlike many experimental compilers, IREE doesn’t treat the compiled model as a black box – it includes developer-friendly features like IR inspection, logging flags, and integration with MLIR’s debugging tools. There are guides for debugging model issues and profiling performance (e.g., integration with CPU/GPU profilers and the Tracy profiler).Active Community & Extensibility: Because IREE is built on MLIR, it is highly extensible – you can author custom MLIR dialects or passes and plug them into IREE’s pipeline if you have domain-specific optimizations or new hardware. The project’s community (spanning industry and academia) is very active, offering support and continuously adding features.Learn more about IREE📰 Tech BriefsWAMI: Compilation to WebAssembly through MLIR without Losing Abstraction by Kang et al. 
from Carnegie Mellon University and Yale University: Introduces a new MLIR-based compilation pipeline that preserves high-level abstractions by adding Wasm-specific MLIR dialects, enabling direct, modular generation of WebAssembly code with better support for evolving Wasm features and comparable performance to LLVM-based compilers.

2025 AsiaLLVM - Sanitizing MLIR Programs with Runtime Operation Verification by Matthias Springer: Introduces MLIR's new runtime operation verification interface, which enables dynamic checks for undefined behavior—complementing static verification, improving debugging, and supporting tools like memory leak sanitizers, though with trade-offs in runtime overhead and adoption maturity.

Leveraging the MLIR infrastructure for the computing continuum by Bi et al. presented at the CPSW’24: CPS Workshop: This WIP paper presents a node-level compiler and deployment framework built on MLIR for the MYRTUS project, targeting heterogeneous computing across the cloud-edge continuum by extending dataflow dialects, optimizing for CGRAs and FPGAs, and enabling adaptive execution with tools like Mocasin and CIRCT.

Precise control of compilers: a practical approach to principled optimization | Doctoral thesis by Martin Paul Lücke, The University of Edinburgh: Demonstrates how integrating principled program representations like Rise and flexible transformation control mechanisms such as the Transform dialect and Elevate into MLIR enables production compilers to achieve systematic, verifiable optimizations while giving developers fine-grained control over complex optimization strategies.

2025 AsiaLLVM - Data-Tiling in IREE: Achieving High Performance Through Compiler Design by Han-Chung Wang: Explains how the IREE MLIR-based compiler uses tensor encodings and progressive lowering to optimize data layout, memory access, and instruction scheduling across CPUs and GPUs, enabling efficient, retargetable compilation for heterogeneous hardware.

That’s all for today. Thank you for reading this issue of Deep Engineering. We’re just getting started, and your feedback will help shape what comes next.

Take a moment to fill out this short survey we run monthly—as a thank-you, we’ll add one Packt credit to your account, redeemable for any book of your choice.

We’ll be back next week with more expert-led content.

Stay awesome,
Divya Anne Selvaraj
Editor-in-Chief, Deep Engineering

Take the Survey, Get a Packt Credit!

If your company is interested in reaching an audience of developers, software engineers, and tech decision makers, you may want to advertise with us.

Deep Engineering #8: Gabriel Baptista and Francesco Abbruzzese on Architecting Resilience with DevOps

Divya Anne Selvaraj
10 Jul 2025
How culture, CI/CD, and cloud-native thinking enable speed without sacrificing stability#8Gabriel Baptista and Francesco Abbruzzese on Architecting Resilience with DevOpsHow culture, CI/CD, and cloud-native thinking enable speed without sacrificing stabilityHi Welcome to the eighth issue of Deep Engineering.The 2024 DORA Accelerate State of DevOps report, published last October, showed a sharp divide in engineering performance: just 19% of teams qualify as elite, while 25% lag far behind. The difference lies in how teams approach DevOps—not as tooling, but as a foundation for resilient architecture and continuous delivery.To understand what this means for software architects today, we spoke with Gabriel Baptista and Francesco Abbruzzese, authors of Software Architecture with C# 12 and .NET 8. Baptista is an Azure Platform-as-a-Service (PaaS) specialist, university instructor, and advisor to early-stage startups. Abbruzzese is the creator of the MVC and Blazor Controls Toolkits and has worked across domains—from early AI decision support systems in finance to top 10 video game titles.Together, they emphasize that DevOps is foundational to architectural resilience. “Applications can no longer afford downtime,” Baptista tells us. Abbruzzese adds: “DevOps is designed specifically to align technical outcomes with business goals.”You can watch the full interview and read the transcript here—or scroll down for our take on how resilient delivery is being redefined at the intersection of DevOps, AI, and cloud.Sign Up |AdvertiseThe conference to learn, apply, and improve your craftdev2next is the premier conference designed for software developers, architects, technology leaders, development managers, and directors. Explore cutting-edge strategies, tools, and essential practices for building powerful applications using the latest trends and good practices.When: September 29 - October 2, 2025Where: Colorado Springs, COUse the discount code PACKT-DEV2NEXT to get a $50 discountBuy Conference and Workshop TicketsWhy DevOps Is Key to Architecting Resilience in the Age of AI and the Shift to Cloud-Native with Gabriel Baptista and Francesco AbbruzzeseThe 2024 DORA report showed that only 19% of teams qualify as elite performers and they:•Deploy multiple times per day• Have a lead time of under 1 day• Have a change failure rate of around 5%• Have a recovery time of under 1 hourIn contrast, 25% of teams sit in the lowest performance tier, deploying as infrequently as once every six months, with failure rates nearing 40% and recovery times stretching to a month.Teams that perform well on DORA’s Four Keys metrics also exhibit greater adaptability and lower burnout. These findings cut across industries and geographies.The architecture landscape, meanwhile, has shifted. AI-assisted development is now widespread (75.9% of developers use AI tools for at least one task), but DORA found that higher AI usage correlates with a 1.5% drop in throughput and a 7.2% decline in release stability. Similarly, platform engineering is nearly ubiquitous (89% adoption) yet often introduces latency and fragility if developer autonomy is not preserved.But as Baptista states:“Applications can no longer afford downtime, especially enterprise applications that need to run 24/7.To achieve that, you need to write good code—code that provides visibility into what’s happening, that integrates with retries, that enables better performance. 
A software architect has to consider these things from the very beginning—right when they start analyzing the application’s requirements.”But how can architects ensure this level of stability and resilience that meets business needs? According to Abbruzzese:“The best tool for this is DevOps. DevOps is designed specifically to align technical outcomes with business goals.”Microsoft defines DevOps as:“The integration of development, quality assurance, and IT operations into a unified culture and set of processes for delivering software.”In their book, Baptista and Abbruzzese concur:“Although many people define DevOps as a process, the more you work with it, the better you understand it as a philosophy.”DevOps as Philosophy and Culture: Not just a ToolConsidering DevOps as a philosophy, the authors say, helps architects focus on service design thinking which means:“Keeping in mind that the software you design is a service offered to an organization or part of an organization. …The highest priority (for you as an architect) is the value your software gives to the target organization. … you are not just offering working code and an agreement to fix bugs but also a solution for all the needs that your software was conceived for. In other words, your job includes everything …to satisfy those needs, such as monitoring users’ satisfaction and quickly adapting the software when the user needs change, due to issues or new requirements.”To explain what the role of the software architect is in this context, they add:“DevOps is a term that is derived from the combination of the wordsDevelopmentandOperations, and the DevOps process simply unifies actions in these two areas. However, when you start to study a little bit more about it, you will realize that just connecting these two areas is not enough to achieve the true goals of this philosophy.We can also say that DevOps is the process that answers the current needs of people regarding software delivery.Donovan Brown has a spectacular definition of what DevOps is:DevOps is the union of people, process, and products to enable continuous delivery of value to our end users.A way to deliver value continuously to our end users, using processes, people, and products: this is the best description of the DevOps philosophy. We need to develop and deliver customer-oriented software. …your task as a software architect is to present the technology that will facilitate the process of delivery.”DORA’s recommendation to “Be relentlessly user-centric” also supports this framing of DevOps:“teams that have a deep desire to understand and align to their users’ needs and the mechanisms to collect, track, and respond to user feedback have the highest levels of organizational performance. In fact, organizations can be successful even without high levels of software velocity and stability, as long as they are user focused.… Teams that focus on the user make better products. Not only do products improve, but employees are more satisfied with their jobs and less likely to experience burnout.Fast, stable software delivery enables organizations more frequent opportunities to experiment and learn. Ideally, these experiments and iterations are based on user feedback. 
Fast and stable software delivery allows you to experiment, better understand user needs, and quickly respond if those needs are not being met.”Experienced Amazon technologist and Principal Resilience Architect at Arpio, Seth Eliot, enriches the framing of DevOps as a culture saying that DevOps must not be seen as a toolchain or process stack, but as a shift in mindset—one rooted in ownership, autonomy, and tight integration between historically siloed roles.The canonical DevOps problem, he says, is:“The wall that traditionally has existed between development and operations,” a wall that “prevented these roles... from having shared goals.”He urges architects to remember that:“DevOps is all about culture and the tools are based on that. The tools come after that.”If you are wondering how such a culture can be fostered, author of The Phoenix Project and The DevOps Handbook, Gene Kim’s "Three Ways" framework offers foundational principles:The First Way-Flow/Systems Thinking: This principle focuses on optimizing the entire system rather than individual silos. It stresses the importance of ensuring that value flows smoothly from development to IT operations, with the goal of preventing defects from passing downstream. The emphasis is on improving flow across all business value streams and gaining a deep understanding of the system to avoid local optimizations that could cause global degradation.The Second Way-Amplify Feedback Loops: The Second Way focuses on shortening and amplifying feedback loops throughout the process. Continuous, real-time feedback is essential to make quick corrections, improve customer satisfaction, and embed knowledge where it's most needed. The principle encourages responding to feedback, both from customers and internal stakeholders, to enhance the development and operations process.The Third Way-Culture of Continual Experimentation and Learning: This principle advocates for creating a culture that encourages experimentation, risk-taking, and learning from failure. It emphasizes the importance of mastering skills through repetition and practice, as well as introducing faults into the system deliberately to enhance resilience. Continuous improvement is fostered by allocating time for work improvements and creating rituals that reward taking risks.These principles also emphasize a cultural shift toward continuous improvement which naturally supports the adoption of engineering practices that enable resilience and stability in software delivery.How DevOps Practices Enable Resilience by DesignGoing back to the 2024 DORA report, elite teams achieve both speed and stability through five disciplined engineering practices:Small batch developmentAutomated testingTrunk-based workflowsContinuous integration (CI)Real-time monitoringBaptista and Abbruzzese demonstrate these using a case study (the WWTravelClub platform) showing how to implement multi-stage pipelines, enforce quality gates, and build visibility into the delivery process. Here is a breakdown:Small Batch Changes and Change Isolation: DORA states that “small batch sizes and robust testing mechanisms” are essential for high performance. Baptista and Abbruzzese, echo this by warning about the risks of incomplete features and unstable merges. 
They advise using feature flags and pull requests to ensure that “only entire features will appear to your end users.” Pull requests enable peer review, while flags control feature exposure at runtime, both critical for keeping systems stable in a CD environment.Trunk-Based Workflows and Pull Request Gates: Trunk-based development refers to a workflow where all developers commit to a shared mainline branch. But gatekeeping is essential to ensure the safety of this workflow. Baptista and Abbruzzese show how to integrate pull request reviews and automated build validations to ensure code quality. They recommend:Static analysisPre-merge testsConsistent peer review.They also note that many teams believe they’ve implemented CI/CD simply by enabling a build pipeline, “but this is far from saying you have CI/CD available in your solution.”Baptista adds that adopting DevSecOps practices can further strengthen the review process.“Instead of just implementing DevOps, why not implement DevSecOps? With DevSecOps, you can include static analysis tools that identify security issues early. These tools help architects and senior developers review the code produced by the team and ensure security practices are being followed.”Multi-Stage Pipelines and Controlled Deployments: DORA’s report cautions that indiscriminate deployment automation, especially with large change sets, often leads to stability regressions. Baptista and Abbruzzese demonstrate a multi-stage pipeline structure to reduce deployment risk:Development/TestingStagingProductionThey note that “you need to create a process and a pipeline that guarantees that only good and approved software (reaches) the production stage.Automated Testing as Early Fault Detection: DORA finds that elite teams use automation not for speed alone, but for stability. Baptista and Abbruzzese emphasize the importance of unit and functional tests integrated into CI pipelines. Failing tests prevent bad commits from reaching staging, and the presence of automated feedback loops improves developer confidence.Continuous Feedback and Observability: DORA claims that resilience is enhanced not only by automation, but by visibility and rapid iteration. Baptista and Abbruzzese recommend integrating Application Insights and Microsoft’s Test and Feedback browser extension to close the feedback loop, capture live production behavior, and turn user feedback into structured work items.When these practices are implemented with discipline, architecture becomes both adaptable and durable. As Abbruzzese puts it, “You don’t have to rewrite large portions of the application.” You just need to be able to change the right parts, safely, at speed.Balancing AI-Driven Velocity and Resilience in DevOpsThe 2024 DORA report confirms that roughly 75% of teams now use AI tools daily – with over one-third of respondents reporting “moderate” to “extreme” productivity gains from AI-assisted coding. Higher AI adoption correlates with modest improvements (e.g. ~7.5% better documentation, 3.4% better code quality) and faster code review times. However, these gains come with trade-offs: AI use was accompanied by an estimated –1.5% delivery throughput and –7.2% change stability.“Improving the development process does not automatically improve software delivery” To balance the trade-offs between AI’s benefits and challenges, DORA makes three recommendations:Focus AI on empowering developers (e.g. 
automating boilerplate and documentation) rather than blindly pushing codeEstablish clear guidelines and feedback loops for AI use, fostering open discussion about when AI help is appropriateAllocate dedicated exploration time so teams can build trust in AI tools (rather than hastily deploying suggestions)Baptista talking about how AI will impact architects says:“As architects, we’ll be impacted by AI—positively or negatively—depending on how we work with it. Let me give two examples. Today, it's possible to upload an architecture diagram into an AI tool like ChatGPT and discuss with it whether you’re creating a good or bad design. That’s already possible. In some cases, I’ve used AI to give me feedback or suggest changes to my designs. It can do that.But as a software architect, you still need to be a good analyst. You need to evaluate whether the output from the AI is actually correct. Especially in enterprise systems, that’s not always easy to do. So, yes, AI will change the world, but we—as individuals—need to use our intelligence to critically analyze whether the AI output is good or not, in any context.”Regarding how software architects can prepare for this impact he says:“We, as architects, need to understand that a good AI solution first requires a good software architecture—because AI only works with good data. Without good data, you cannot have a good AI.As software architects, we need to understand that we have to build architectures that will support good AI—because if you don’t provide quality data, you won’t get quality AI.”Abbruzzese adds:“I think AI is a valuable tool, but at least for now, it can’t completely replace the experience of a professional.It helps save time. It can suggest choices—but sometimes those suggestions are wrong. Other times, those suggestions can be useful as a starting point for further investigation. AI can write some code, some diagrams, some designs that you might use as a base. That saves time.Sometimes it suggests something you hadn’t thought of, or reminds you of a possibility you forgot to consider. That doesn’t mean it’s the best solution—but it’s a helpful suggestion. It’s a tool to avoid missing things and to save time.At the moment, AI can’t replace the experience of a real professional—whether it’s an architect, a programmer, or someone else. For instance, I’ve never seen AI come up with a completely new algorithm. If you have to invent a new one, it’s not capable of doing that.…And I think this won’t change much over time—at least not until we reach actual artificial general intelligence, something human-like.”But AI is not the only shift architects need to prepare for. Baptista states:“In the near future, I believe most applications will be cloud-native… This is something that everyone working in software development today needs to think about.”The Shift to Cloud-NativeMaking the case for the shift to cloud-native architecture, Baptista says:“We’re discovering new ways to build solutions every single day. We can’t always keep up that same pace on the architecture side, which is why we need to think carefully about how to design a software architecture that can be both adaptable and resilient.”DORA’s report also identifies a link between team success and leveraging flexible architecture:“We see that successful teams are more likely to take advantage of flexible infrastructure than less successful teams.”Abbruzzese adds:“In my opinion, it’s quite simple—cloud computing basically means distributed computing, with added adaptability. 
It allows you to change your hardware dynamically.Cloud computing is really about reliability and adaptability. But you have to use the right architecture—that means applying the theory behind modern microservices and cloud-native systems.I’m talking about reliable communication, orchestrators like Kubernetes, and automatic scaling—these are all provided by cloud platforms and also by Kubernetes itself. You also have tools for collecting metrics and adjusting the software’s behavior automatically based on those metrics. This is the essence of the theory we’re dealing with.For example, in microservices architectures, reliable communication is essential. These applications are often structured like assembly lines—processing and transferring data step by step. That means it’s unacceptable to lose data. Communication must at least eventually succeed. It can be delayed, but it has to succeed.”However, it is important to ascertain first whether you and your team are willing to invest fully in the shift. Else this can lead to decreased organizational performance. DORA warns:“Cloud enables infrastructure flexibility. Flexible infrastructure can increase organizational performance. However, moving to the cloud without adopting the flexibility that cloud has to offer may be more harmful than remaining in the data center. Transforming approaches, processes, and technologies is required for a successful migration.”This is because, very much in line with adopting DevOps as a culture, accomplishing the shift to cloud-native does not simply require:"Tools or technologies, but often an entire new paradigm in designing, building, deploying, and running applications.”DORA recommends:“Making large-scale changes (because they are) easier when starting with a small number of services, (and)… an iterative approach that helps teams and organizations to learn and improve as they move forward.”The role of the architect when it comes to navigating these shifts is more crucial than ever. And the adoption of DevOps as a culture along with DevOps engineering best practices can serve both businesses and architects well so they can better serve business needs and remain relevant.If Baptista and Abbruzzese’s perspective on DevOps as a foundation for resilient architecture resonated with you, their book Software Architecture with C# 12 and .NET 8 goes further—connecting high-level design principles with hands-on implementation across the .NET stack.The following excerpt—Chapter 8: Understanding DevOps Principles and CI/CD—breaks down the architectural mindset behind effective delivery pipelines. It covers core DevOps concepts, the role of CI/CD in aligning code with business needs, and how to embed quality, automation, and visibility throughout the deployment process.🧠Expert Insight: Understanding DevOps Principles and CI/CD by Gabriel Baptista and Francesco AbbruzzeseThe complete “Chapter 8: Understanding DevOps Principles and CI/CD” from the book Software Architecture with C# 12 and .NET 8 by Gabriel Baptista and Francesco Abbruzzese (Packt, February 2024).Although many people define DevOps as a process, the more you work with it, the better you understand it as a philosophy. 
This chapter will cover the main concepts, principles, and tools you need to develop and deliver your software with DevOps....The following topics will be covered in this chapter:Understanding DevOps principles: CI, CD, and continuous feedbackUnderstanding how to implement DevOps using Azure DevOps and GitHubUnderstanding the risks and challenges when using CI/CDRead the Complete ChapterNow in its fourth edition, Software Architecture with C# 12 and .NET 8, combines design fundamentals with hands-on .NET practices, covering everything from EF Core and DevOps pipelines to Blazor, OpenTelemetry, and a complete case study centered on a real-world travel agency system.Use code DEEPENGINEER for an exclusive subscriber discount—20% off print and 30% off eBook, valid until 17th July, 2025.Get the Book🛠️Tool of the Week⚒️OpenTofu — Terraform-Compatible Cloud Infrastructure-as-CodeOpenTofu is an open-source, community-driven IaC tool, designed as a fork of Terraform. It enables teams to define, manage, and deploy cloud infrastructure declaratively, ensuring reproducibility and resilience.Highlights:Multi-Cloud IaC: Works across AWS, Azure, GCP, and Kubernetes, ensuring consistent, reproducible environments.Modular & Scalable: Reusable modules simplify complex architectures like multi-region deployments.Collaborative: Infrastructure changes are versioned and peer-reviewed, ensuring alignment with business goals.Fault-Tolerant: Optimized for error handling, ensuring stable infrastructure changes.Active Development: Recently updated (v1.10.0 in June 2025) and hosted under the Linux Foundation, ensuring long-term stability.Learn more about OpenTofuSponsored:Agentic AI You Don’t Have to BabysitIf you’ve been burned by clunky GenAI tools that need constant handholding, this will feel different.Shield’s AmplifAI is using agentic AI to go beyond simple prompts. Think: intelligent agents that can reason, plan, and actually do things, like reviewing communication threads or spotting compliance risks, without you manually clicking through 30 tabs.Whether you’re building fintech, regtech, or just tired of reactive workflows, AmplifAI gives you a head start. It’s already making life easier for devs.👉 See how AmplifAI is redefining what AI can actually do📰 Tech Briefs🎥DevOps in the Cloud: Case Studies of Amazon.com teams and their resilient architectures - Seth Eliot: Introduces the concept of DevOps in the cloud, using examples from Eliot’s experience at Amazon, where teams—organized into "two pizza teams"—take ownership of services from design to deployment, emphasizing the importance of culture over tools, and how frameworks enable resilience through continuous experimentation, chaos engineering, and disaster recovery solutions.Why Is My Docker Image So Big? 
A Deep Dive with ‘dive’ to Find the Bloat by Chirag Agrawal, Senior Software Engineer | Alexa+ AI Agent Engineering: Explores how to diagnose and reduce Docker image bloat using tools like docker history and dive to pinpoint inefficiencies in image layers, with a special emphasis on AI Docker images that often become oversized due to large AI libraries and base OS components.Azure AI Foundry Agent Service Gains Model Context Protocol (MCP) Support in Preview: Microsoft has announced the preview release of MCP support in its Azure AI Foundry Agent Service, enhancing interoperability for AI agents by simplifying integration with internal services and external APIs, and enabling seamless access to tools and resources from any compliant MCP server.The DevOps engineer’s handbook: Covers key DevOps practices, including automation, continuous integration, continuous delivery, and the importance of evolving team culture to improve software delivery and collaboration across the development lifecycle.That’s all for today. Thank you for reading this issue ofDeep Engineering. We’re just getting started, and your feedback will help shape what comes next.Take a moment tofill out this short surveywe run monthly—as a thank-you, we’ll addone Packt creditto your account, redeemable for any book of your choice.We’ll be back next week with more expert-led content.Stay awesome,Divya Anne SelvarajEditor-in-Chief, Deep EngineeringTake the Survey, Get a Packt Credit!If your company is interested in reaching an audience of developers, software engineers, and tech decision makers, you may want toadvertise with us.*{box-sizing:border-box}body{margin:0;padding:0}a[x-apple-data-detectors]{color:inherit!important;text-decoration:inherit!important}#MessageViewBody a{color:inherit;text-decoration:none}p{line-height:inherit}.desktop_hide,.desktop_hide table{mso-hide:all;display:none;max-height:0;overflow:hidden}.image_block img+div{display:none}sub,sup{font-size:75%;line-height:0}#converted-body .list_block ol,#converted-body .list_block ul,.body [class~=x_list_block] ol,.body [class~=x_list_block] ul,u+.body .list_block ol,u+.body .list_block ul{padding-left:20px} @media (max-width: 100%;display:block}.mobile_hide{min-height:0;max-height:0;max-width: 100%;overflow:hidden;font-size:0}.desktop_hide,.desktop_hide table{display:table!important;max-height:none!important}}

Deep Engineering #7: Managing Software Teams in the Post-AI Era with Fabrizio Romano

Divya Anne Selvaraj
03 Jul 2025
How to decide whether you want to move into management and lead without losing touch#7Managing Software Teams in the Post-AI Era with Fabrizio RomanoFrom lean organizations and AI tools to Gen Z teams, the software team manager’s job has changed. Here’s how to lead without losing touch and decide whether you want to move into management.Workshop: Unpack OWASP Top 10 LLMs with SnykJoin Snyk and OWASP Leader Vandana Verma Sehgal on Tuesday, July 15 at 11:00AM ET for a live session covering:✓ The top LLM vulnerabilities✓ Proven best practices for securing AI-generated code✓ Snyk’s AI-powered tools automate and scale secure dev.See live demos plus earn 1 CPE credit!Register todayHi Welcome to the seventh issue of Deep Engineering.The software manager’s role is being remade—less by choice than by necessity. The old playbook, where managers translated product priorities into sprints and stayed one layer removed from the code, no longer holds. In 2025, development managers are navigating leaner organizations, AI-assisted teams, hybrid work models, and a workforce increasingly shaped by Gen Z expectations.To understand this shift and glean best practices, we spoke with Fabrizio Romano, author of Learn Python Programming and development manager at Sohonet. We also examine what the transition from senior engineer to manager really entails—and how to know if that’s the right move for you. Throughout, we draw on Romano’s experience, alongside insights from other engineering leaders like Gergely Orosz, Mirek Stanek, Nick Centino, and Vladimir Klepov, to unpack the changing traits, tensions, and tradeoffs of modern development management.You can watch Romano’s complete interview which covers both his experiences with Python and as an engineering manager and read the transcript here, or read on for an engineering management focussed deep dive.Sign Up |AdvertiseLeading Software Teams in Changing Times with Fabrizio RomanoWhile a desire to nurture growth in others is crucial to success in management, the evolving landscape of software development today brings a set of external challenges that shape how development managers must lead. As Romano suggests, becoming a development manager isn’t just about mastering technical skills, but about understanding and adapting to the broader trends reshaping the industry—particularly in a post-AI world. The role has become more complex and dynamic than ever, influenced by forces like leaner organizations and teams, more millennials and Gen Zs in the workforce, remote-first work, and AI-powered development tools, and an increasing focus on efficiency over expansion. These shifts have led to new expectations for managers, testing their ability to balance people development with technical leadership.The Current State of Development ManagementThe post-COVID world is seeing significant changes in how development teams are structured, with many organizations flattening their hierarchies to reduce layers of management. This shift to leaner teams, combined with the increasing use of AI tools like GitHub Copilot, Cursor, and others, has led to new challenges for development managers.Leaner OrganizationsAs Mirek Stanek, PL Engineering Site Lead at Papaya Global points out, one of the most profound changes in development management is the trend towards fewer managers and a greater emphasis on individual contributors (ICs). 
In organizations where budget cuts and performance metrics dominate, managers are now expected to maximize the productivity of their teams with fewer resources. This is in line with Amazon's directive shared in a letter from CEO, Andy Jassy, to employees in September 2024, to increase the ratio of ICs to managers by 15% by Q1 2025. This shift reflects a broader trend where leadership roles are being scrutinized more heavily, and managers must justify their position by demonstrating tangible value to the organization.The hands-on expectations of development managers have therefore increased. In previous decades, a manager could expect to focus on strategy, vision, and team alignment, while ICs handled the bulk of coding tasks. Today, however, many engineering managers (Ems) are expected to stay deeply involved in the technical aspects of development. As Vladimir Klepov, EM at Ozon Bank, discusses in his reflections, a manager who is disconnected from the technical work risks losing touch with the challenges their team faces on the ground. Therefore, hands-on leadership—being embedded in the development process—is now a critical competency for effective development managers.Managing Gen-Z, Millennials, and the New Workforce ExpectationsAnother change reshaping development management is the increasing presence of Gen-Z and Millenials in the workforce. According to Elizabeth Faber, Deloitte Global Chief People & Purpose Officer,“Projected to make up roughly two-thirds of the labor force within the next few years, Gen Zs and millennials are likely to be a defining force in the future of work—one that looks less like a ladder and more like an interconnected web of growth, values, and reinvention.”Stanek also points out how Gen-Z values work-life balance, professional growth opportunities, and authentic leadership.Concluding from the 14th Deloitte Global Gen Z and Millennial Survey, Faber writes that, for Gen Z and millennial workers to feel truly supported and fulfilled, managers must be empowered to support employee well-being by:Addressing team stressorsPromoting work/life balanceRecognizing contributionsEnabling growthFacilitating access to mental health resourcesFor development managers, this means adapting leadership styles to align with these expectations. Managers must be more emotionally intelligent, open to feedback, and flexible in how they structure their teams.This also reflects the broader trend of remote and hybrid work models. While some companies, like Amazon, are pushing for a return to the office, many development managers will need to navigate the challenges of managing a distributed, remote-first workforce while ensuring cohesion and a sense of purpose within their teams.Working with Distributed and Diverse TeamsManaging teams split across cities or continents adds its own set of challenges – and opportunities. Stanek writes,“The pandemic showed us how teams can function effectively remotely, but it also highlighted the limitations of remote work, such as the lack of nonverbal communication cues and the blurring of work-life boundaries.”Nataliia Peterheria, Operations Manager, Django Stars, recommends the following practices to overcome dissonance in remote and hybrid team setups:Choose one primary communication channel (e.g., Slack, Google Hangouts) and stick to it to reduce information loss. Supplement with one or two backups only when necessary. 
Every team member should maintain a complete profile—with a real photo, job title, contact number, and bio—so others can quickly understand roles and reach out when needed.Set up a single “source of truth” for documentation and decisions—like Confluence or a shared wiki—structured simply (no more than three nested levels). Keep specs, requirements, and changes in one place, and annotate directly on the relevant topic pages to avoid fragmentation.Create a structured work schedule with overlapping hours for live collaboration. Use this shared window for time-sensitive interactions like team calls or joint problem-solving. Schedule overlapping meetings in advance, prioritize ruthlessly, and stay consistent to avoid drifting into 24/7 work mode.Use daily checklists to track questions, progress, and blockers. Organize them by project and link them to your source of truth or project repo. Checklists help ensure timely answers and keep asynchronous work from stalling.Standardize request communication to prevent missed inputs. Assign a single person (often the PM) to collect product owner requests, or reserve regular meeting slots to introduce new requirements to the full team.Require approval for all logic changes or scope updates, no matter how minor. Even well-intentioned “improvements” by developers must be signed off by business stakeholders to prevent misalignment or scope creep.Define escalation paths clearly. Publish a diagram showing who is responsible for what and who to contact when something goes wrong. Team members should know exactly how to escalate unresolved issues—internally or with the client.Align on a common task tracking and documentation toolset before kickoff. Avoid fragmented tracking (e.g., team members using their own spreadsheets). Centralize around one system, even if it means switching from a personal favorite.Codify remote technical workflows. Set clear guidelines for pull request handling, commit hygiene, and review expectations. Include code style guides to prevent inconsistency and ensure maintainability when multiple people contribute to the same codebase.Assess technical readiness before the project starts. Identify gaps in tooling knowledge, run onboarding sessions where needed, and provide up-to-date guides for any systems that require self-service support.In addition to these, there is the human side to management. Romano describes watching body language and Slack message tones for signs of stress in his team. If a developer seems off or tensions are brewing, he takes time to talk one-on-one and understand the issue. In some cases, he even teaches simple meditation or mindfulness techniques to help his engineers re-center under pressure. “When you’re upset, frustrated, or angry… it triggers a fight-or-flight response… If you keep stimulating that state… it becomes a health risk,” he explains, drawing from his experience in martial arts that a “relaxed mind is a creative mind.” By coaching his team in emotional intelligence and stress management, he not only cares for their well-being but also ensures they stay productive and collaborative. This kind of empathetic leadership – once rare in engineering circles – is increasingly recognized as key to maintaining high-performing teams.AI Tools: A Double-Edged Sword for Development ManagersIn addition to managing shifting workforce dynamics, AI is becoming an integral tool for development teams. 
AI-driven tools like GitHub Copilot are no longer just productivity boosters but are changing how software is developed at a fundamental level. For example, Gergely Orosz, author of The Software Engineer’s Guidebook, in The Pragmatic Engineer reports that,“90% of the code for Claude Code is written by Claude Code(!).”The rise of AI coding assistants and automation is one of the defining trends reshaping development management. Tools like GitHub Copilot, ChatGPT, and other AI pair programmers are rapidly becoming part of daily software engineering workflows.Gitlab’s 2024 Global DevSecOps Report found that 39% of software professionals are already using AI in development, up 16 percentage points from the year prior. Moreover, 60% say implementing AI is now essential to avoid falling behind competitively.Development managers now face the challenge of integrating AI effectively into their team's workflow while also ensuring that these tools don’t hinder creativity or lead to over-reliance.“We have to use AI. I think a developer who refuses to embrace AI today is probably going to be obsolete very soon,” says Romano, underscoring the urgency of adaptation. He adds: “At Sohonet, in my role, I got everyone on my team set up with GitHub Copilot. I wanted them to start using it, get familiar with it, and understand how to leverage what it can offer.”By equipping his engineers with Copilot, he aimed to help them embrace AI-assisted development rather than fear it. Romano notes,“Copilot is especially helpful for menial or repetitive tasks—like hardcoding different test cases. It’s really good at predicting what the next test case might be.” “Even when it's just acting like a better IntelliSense, it’s still useful… instead of rewriting a line yourself, you just hit Tab and it’s done,” Romano saysFor development managers, the benefit of such tools is twofold: they boost team productivity and free up human developers for more complex, creative work.According to Infragistics’ Reveal 2024 survey report, the top reasons developers leverage generative AI are to increase productivity (49%), eliminate repetitive tasks (38%), and speed up development cycles (36%).Managers who proactively introduce approved AI tools can thus accelerate output and improve developer satisfaction. Romano mentions that his team continually experiments with new AI aides (from code editors like Cursor to AI pair-programming prototypes) to stay on the cutting edge. This reflects a broader best practice: staying up to date with emerging tools and evaluating their potential.However, Romano also points out that over-relying on AI tools can stunt problem-solving skills, as developers might bypass critical thinking or creative solutions in favor of quick, AI-generated responses. 55% of Gitlab’s survey respondents also felt that introducing AI into the software development lifecycle is risky.Effective development management in the AI era means finding a balance between leveraging AI and honing human skill. Romano emphasizes that developers shouldn’t offload all problem-solving to machines:“Part of the job… was to smash my brain against a problem now and then. That’s really beneficial for your thinking… It keeps your mental muscles in shape.” “Relying too much on AI to… figure out the next step… that’s risky. I still want to ‘go to the gym’ up here,” he quips, referring to exercising one’s own mental faculties. 
Romano encourages each developer to “find the right balance—using AI as a tool, but still keeping their minds fit and challenged.”This balanced approach ensures that while AI accelerates routine coding, it doesn’t “dumb down” the team’s critical thinking. “If you stop challenging the [AI’s] recommendations, they run the risk of dumbing down the reasoning. The true risk is in placing naive faith in quick fixes,” cautions Sammi Li, co-founder and CEO of JuCoin, noting that AI can expedite work but must not replace understanding. It falls on the EM to ensure this balance is maintained both for the team’s and the business’ benefit.What the Shift to Engineering Management Really Looks LikeThe move from senior engineer to EM is often misunderstood—frequently treated as a natural promotion rather than a deliberate change in function. But this is not a bigger version of the same job. It’s a transition into a fundamentally different role, with a new definition of success and a new center of gravity. Here is what development and EMs say about their shift from development to management felt like.You stop being measured by what you ship. Engineers derive a tangible sense of accomplishment from writing code and seeing it run in production. That feedback loop is fast and direct. Management breaks that loop. “As an EM, you’re not the one building the things,” says Nick Centino, Principal Engineering Manager at Microsoft. “You’re helping empower others to build the things more effectively”. This shift—away from hands-on output and toward enabling others—can take years to internalize. Centino himself spent nearly eight years in a dual role before realizing his highest leverage was no longer in the code.You have to redefine what ‘impact’ means. Orosz writes: “As an engineering manager, you’ll need to put company first, team second, and your team members third. And I would also add: yourself as fourth”. That’s a reversal from the individual contributor mindset, where engineers focus on executing their own tasks and helping teammates as needed. The EM role requires strategic alignment across teams—not just personal productivity.You stop optimizing for technical challenges. Engineers advance by solving complex problems. Managers progress by preventing them. As Klepov writes, “Of all the possible career moves a seasoned engineer can make, switching to management gives you the most new challenges...without hitting your salary”. But these challenges are rarely technical. They involve process alignment, team dynamics, emotional management, and cross-functional friction. As Romano puts it: “Most of what we do is fairly routine...The real challenges lie in everything around the code”.Your working memory breaks down. Many new managers underestimate the cognitive overhead of managing a team. Orosz notes that while ICs can often track all their tasks in their head, managers can't: “As a manager, I have far more things to pay attention to…Keeping all of this in my head doesn’t work well for me—so I’ve started to write things down”. Time and task management become not just useful, but essential.You spend less time writing code—and often none at all. The drop is not optional; it’s structural. According to Centino, once you manage five or six people, meaningful individual contribution becomes unsustainable without either cutting corners or burning out. Even if you retain technical context, your job is no longer to build—it’s to coach, unblock, coordinate, and align. 
“If you feel like you have time to code,” Centino warns, “you’re either working long hours or not spending enough time with your team”.You enter the domain of slow, uncertain feedback. ICs can validate ideas quickly: deploy a fix, measure a metric, refactor a function. Managers don’t get that immediacy. Feedback loops are long and ambiguous. “Very few of your actions produce a visible result in under a month,” Klepov notes. “Even the right changes can make things get worse before they get better”.You have to manage people, not just lead them. This distinction matters. Leadership is about vision and influence. Management is about one-on-ones, reviews, process hygiene, and psychological safety. “There’s a lot of peopling involved,” Centino says. “You need to be listening to people, understand them, spend time with them”. For many introverted engineers, that’s emotionally exhausting—but non-negotiable. Skipping the people work results in burnout, distrust, and attrition.You give up control, but remain accountable. Orosz captures the paradox: as a tech lead, you can write code and drive decisions. As a manager, you may do neither—but you’re still responsible for outcomes. That means learning to influence without coding, to steer without micromanaging, and to delegate without detaching.None of this means the shift is a demotion of technical skill. If anything, it requires expanding your judgment from systems to humans. As Romano puts it, “The skills we learn as developers aren’t confined to software. They transfer to life”. But it is a shift. And for those unprepared, it can be jarring. As Centino warns, “Engineering management and individual contribution are completely different roles”.Is Moving into Management Right for you?A move into management is often seen as the natural career progression after senior developer or tech lead. However, not everyone is suited to be a development manager – and that’s okay.“Managing people is a completely different skill set,” Romano candidly remarks. “If you’re someone who’s drawn to logic, machines, and technical problems—and you’re not interested in helping people grow—then you probably shouldn’t go down the management path.”Strong coding ability alone does not guarantee success in leadership. The core of the development manager role, Romano says, comes down to a genuine desire to care for people:“That’s what this job is really about: doing your best to help the people you manage become healthier, happier, more skilled professionals – and hopefully better human beings too.”If that mission excites you more than writing code yourself, it’s a sign you might find the management path rewarding.Despite the persistent narrative that “eventually you’re going to become an engineering manager,” Centino points out: engineering management and individual contribution are “completely different roles” with different success criteria, daily rhythms, and reward systems. The most common trap is assuming that strong technical performance qualifies someone to lead people. As Romano puts it,“In our industry, we often promote people into management roles just because they’re technically strong. But managing people is a completely different skill set”. For those drawn to logic, systems, and clean abstractions, people management may feel frustrating and opaque. “People aren’t logical like machines,” Romano warns. “Managing them requires effort, empathy, and patience”.The core question isn’t whether you can manage—it’s whether you want to. 
“I do think it’s important to have a solid foundation in software development before stepping into this role,” Romano says. But that’s table stakes. What distinguishes successful managers is not technical depth, but a “genuine desire to care for people”.Centino echoes this point:“As an engineering manager, I like to focus most of my attention and effort into growing individuals on the team… If I can align that with the direction the business is heading, then I think we have a great recipe”.But if that alignment never comes—if writing code is still your deepest source of satisfaction—management may not be the right move.Self-awareness, not seniority, should drive the decision.“This type of thing will change over time,” Centino notes. “I found myself in a dual role for eight years and didn’t really know until the end… what I really felt would motivate me most”.Regular reflection, honest conversations with your manager, and exposure to the demands of the role are more reliable indicators than promotion ladders or external expectations.As Romano says, “If you’re only doing it because it’s your next step, or because someone handed you the role, it can be tough”. But if helping others grow feels like a worthwhile use of your time—and you’re willing to trade code for conversations, and systems for people, you may be ready to step into the role.Making the Move: Traits of Successful EMsIf you feel you fit the bill and are ready to take on the challenges that come with managing software teams today, start by building a foundation of both technical and leadership experience:Learn to manage time and context switching deliberately: Orosz emphasizes that time management shifts from “maker schedule” to “manager schedule.” Future EMs should practice structuring recurring meetings, protect deep work time, and use lightweight systems, for e.g., Getting Things Done (GTD), to track tasks across people and priorities—not just their own.Get fluent in setting and supporting growth goals—for others and yourself: As a manager, you won’t just pursue your own goals—you’ll guide others in theirs. Orosz suggests practicing this by helping peers articulate growth goals, using role frameworks where available. Future EMs must also apply the same discipline to their own goal-setting, or risk drift.Seek and learn from mentors before the transition: Orosz didn’t wait until he was fully in the role—he proactively asked his management chain to connect him with internal mentors who understood the company’s management expectations. Engineers eyeing management should do the same, asking for guidance and observation opportunities ahead of time.Develop the habit of reflection, not just execution: Romano and Orosz both stress the importance of stepping back. Engineers often optimize for output; future managers must learn to observe team dynamics, reflect on what’s working, and adapt. Orosz models this by reading, writing, attending conferences, and running lightweight experiments with how he works.Strengthen emotional awareness and communication range: Romano explicitly notes that successful managers listen closely, pick up non-verbal cues, and adjust their communication style to fit each team member. Aspiring EMs should build this muscle early by observing tone, response patterns, and interpersonal signals on their teams.Practice coaching and teaching—not just explaining: Romano compares great management to good teaching: if one explanation fails, try another. 
Aspiring EMs should practice helping others understand by adapting to their learning style—not defaulting to their own.Clarify your own motivation: Denis D., Software Engineering Manager at PaySaaS Technology and Romano both warn that without intrinsic interest in people and leadership, the transition becomes painful. Future EMs should reflect early: do they enjoy unblocking others? Does seeing someone else grow feel like progress? If not, they should reconsider the path.Build a low-friction system to stay tech-adjacent: Denis maintains a Notion glossary, logs unknown terms, and watches short tutorials to stay grounded in the tech domain even after moving into management. Aspiring EMs can adopt this habit early to prevent drift and preserve confidence in technical discussions.On the technical side, credibility matters: working several years as an engineer, shipping projects, and understanding the software development lifecycle from firsthand experience will make you a more empathetic and effective leader. As Romano notes, having been “under deadline pressure” or stuck on a stubborn bug helps you relate to the struggles your team faces – “that empathy makes you more effective as a manager.”Ex software development manager and author of Coding in Delphi, Nick Hodges’ words sum up the job of a software development team manager nicely,“Sometimes being a manager is hard—even impossible. Sometimes you have to give up being right and put the needs of the entire organization over yourself. Sometimes you have to balance protecting your people with being a loyal member of the management team. Sometimes you have to manage up as well as you manage down. Being right isn’t enough—being effective matters more.”If Romano’s reflections on team dynamics and career growth sparked your interest, his book Learn Python Programming offers a different kind of guidance, focused on building solid, modern Python skills. Now in its fourth edition, the book covers everything from core syntax and idioms to web APIs, CLI tools, and competitive programming techniques.Get the Book🛠️Tool of the Week⚒️Backstage: Open-Source Developer PortalBackstage provides a central Software Catalog, project templates, and “docs-as-code” infrastructure (TechDocs) so teams can standardize their architecture, onboarding and documentation. 
For engineering managers, this means you can enforce coding standards and best practices (via templates and catalogs), keep architecture and ownership information up-to-date, and give developers self-service access to resources.Learn more about Backstage📰 Tech BriefsBuilding Strategic Influence as a Staff Engineer or Engineering Manager by Mark Allen, Engineering Leader & Technical Co-Founder @ Isometric: Outlines how staff engineers and engineering managers can build strategic influence by identifying business priorities, acting with curiosity beyond their role, cultivating cross-functional relationships, shaping their internal brand, and selectively saying yes to high-impact opportunities to grow their organizational visibility and impact.How Staff+ Engineers Can Develop Strategic Thinking by Shweta Saraf, Director of Network and Infra Management @Netflix: Explains how to odevelop strategic thinking by diagnosing organizational needs, aligning technical decisions with business goals, influencing cross-functional stakeholders, and balancing innovation with risk—emphasizing that strategic impact stems as much from mindset and relationship-building as from technical expertise.The AI productivity paradox in software engineering: Balancing efficiency and human skill retention: AI adoption in software engineering is creating a productivity paradox—delivering short-term task efficiency while eroding system performance, cognitive skills, and governance, unless teams integrate AI responsibly with oversight, skill development, and systemic alignment.That’s all for today. Thank you for reading this issue of Deep Engineering. We’re just getting started, and your feedback will help shape what comes next.Take a moment to fill out this short survey we run monthly—as a thank-you, we’ll add one Packt credit to your account, redeemable for any book of your choice.We’ll be back next week with more expert-led content.Stay awesome,Divya Anne SelvarajEditor-in-Chief, Deep EngineeringTake the Survey, Get a Packt Credit!If your company is interested in reaching an audience of developers, software engineers, and tech decision makers, you may want toadvertise with us.*{box-sizing:border-box}body{margin:0;padding:0}a[x-apple-data-detectors]{color:inherit!important;text-decoration:inherit!important}#MessageViewBody a{color:inherit;text-decoration:none}p{line-height:inherit}.desktop_hide,.desktop_hide table{mso-hide:all;display:none;max-height:0;overflow:hidden}.image_block img+div{display:none}sub,sup{font-size:75%;line-height:0}#converted-body .list_block ol,#converted-body .list_block ul,.body [class~=x_list_block] ol,.body [class~=x_list_block] ul,u+.body .list_block ol,u+.body .list_block ul{padding-left:20px} @media (max-width: 100%;display:block}.mobile_hide{min-height:0;max-height:0;max-width: 100%;overflow:hidden;font-size:0}.desktop_hide,.desktop_hide table{display:table!important;max-height:none!important}}

Deep Engineering #6: Imran Ahmad on Algorithmic Thinking, Scalable Systems, and the Rise of AI Agents

Divya Anne Selvaraj
26 Jun 2025
How classical algorithms and real-world trade-offs will shape the next generation of software#6Imran Ahmad on Algorithmic Thinking, Scalable Systems, and the Rise of AI AgentsHow classical algorithms, system constraints, and real-world trade-offs will shape the next generation of intelligent softwareWorkshop: Unpack OWASP Top 10 LLMs with SnykJoin Snyk and OWASP Leader Vandana Verma Sehgal on Tuesday, July 15 at 11:00AM ET for a live session covering:✓ The top LLM vulnerabilities✓ Proven best practices for securing AI-generated code✓ Snyk’s AI-powered tools automate and scale secure dev.See live demos plus earn 1 CPE credit!Register todayHi Welcome to the sixth issue of Deep Engineering.A recent IBM and Morning Consult survey found that 99% of enterprise developers are now exploring or developing AI agents. Some have even christened 2025 as “the year of the AI agent”. We are experiencing a shift from standalone models to agentic systems.To understand what this shift means for developers we spoke with Imran Ahmad, data scientist at the Canadian Federal Government’s Advanced Analytics Solution Center (A2SC) and visiting professor at Carleton University. Ahmad is also the author of 50 Algorithms Every Programmer Should Know (Packt, 2023) and is currently working on his highly anticipated next book with us, 30 Agents Every AI Engineer Should Know, due out later this year. He has deep experience working on real-time analytics frameworks, multimedia data processing, and resource allocation algorithms in cloud computing.You can watch the full interview and read the transcript here—or keep reading for our take on the algorithmic mindset that will define the next generation of agentic software.Sign Up |AdvertiseFrom Models to Agents with Imran AhmadAccording to Gartner by 2028, 90% of enterprise software engineers will use AI code assistants (up from under 14% in early 2024). But we are already moving beyond code assistants to agents: software entities that don’t just respond to prompts, but plan, reason, and act by orchestrating tools, models, and infrastructure independently.“We have a lot of hope around AI – that it can eventually replace a human,” Ahmad says. “But if you think about how a person in a company solves a problem, they rely on a set of tools… After gathering information, they create a solution. An ‘agent’ is meant to replace that kind of human reasoning. It should be able to discover the tools in the environment around it, and have the wisdom to orchestrate a solution tailored to the problem. We're not there yet, but that's what we're striving for.”This vision aligns with where industry leaders are headed. Maryam Ashoori, Director of Product Management, IBM watsonx.ai concurs that 2025 is “the year of the AI agent”, and a recent IBM and Morning Consult survey found 99% of enterprise developers are now exploring or developing AI agents. Major platforms are rushing to support this paradigm: for instance, at Build 2025 Microsoft announced an Azure AI Agent Service to orchestrate multiple specialized agents as modular microservices. Such developments underscore the momentum behind agent-based architectures – which Igor Fedulov, CEO of Intersog, in an article for Forbes Technology Council, predicts will be a defining software trend by the end of 2025. 
Ahmad predicts this to be “the next generation of the algorithmic world we live in.”What is an agent?An AI agent is more than just a single model answering questions – it’s a software entity that can plan, call on various tools (search engines, databases, calculators, other models, etc.), and execute multi-step workflows to achieve a goal. “An agent is an entity that has the wisdom to work independently and autonomously,” Ahmad explains. “It can explore its environment, discover available tools, select the right ones, and create a workflow to solve a specific problem. That’s the dream agent.” Today’s implementations only scratch the surface of that ideal. For example, many so-called agents are basically LLMs augmented with function-calling abilities (tool APIs) – useful, but still limited in reasoning. Ahmad emphasizes that “a large language model is not the only tool. It’s perhaps the most important one right now, but real wisdom lies outside the LLM – in the agent.” In other words, true intelligence emerges from how an agent chooses and uses an ecosystem of tools, not just from one model’s output.The Practitioner’s Lens: Driving vs. Building the EngineEven as new techniques emerge, software professionals must decide how deep to go into theory. Ahmad draws a line between researchers and practitioners when it comes to algorithms. The researcher may delve into proofs of optimality, complexity theory, or inventing new algorithms. The practitioner, however, cares about applying algorithms effectively to solve real problems. Ahmad uses an analogy to explain this:“Do you want to build a car and understand every component of the engine? Or do you just want to drive it? If you want to drive it, you need to know the essentials – how to maintain it – but not necessarily every internal detail. That’s the practitioner role.”A senior engineer doesn’t always need to derive equations from scratch, but they do need to know the key parameters, limitations, and maintenance needs of the algorithmic “engines” they use.Ahmad isn’t advocating ignorance of theory. In fact, he stresses that having some insight under the hood improves decision-making. “If you know a bit more about how the engine works, you can choose the right car for your needs,” he explains. Similarly, knowing an algorithm’s fundamentals (even at a high level) helps an engineer pick the right tool for a given job. For example: Is your search problem better served by a Breadth-First Search (BFS) or Depth-First Search (DFS) approach? Would a decision tree suffice, or do you need the boost in accuracy from an ensemble method? Experienced engineers approach such questions by combining intuition with algorithmic knowledge – a very practical kind of expertise. Ahmad’s advice is to focus on the level of understanding that informs real-world choices, rather than getting lost in academic detail irrelevant to your use case.Algorithm Choices and Real-World ScalabilityIn the wild, data is messy and scale is enormous – revealing which algorithms truly perform. “When algorithms are taught in universities… they’re usually applied to small, curated datasets. I call this ‘manicured pedicure data.’ But that’s not real data,” Ahmad quips. 
In his career as a public-sector data scientist, he routinely deals with millions of records and offers three key insights that shape how engineers should approach algorithm selection in production environments:Performance at scale requires different choices than in theory: Ahmad uses an example from his experience when he applied the Apriori algorithm (a well-known method for association rule mining). “When I used Apriori in practice, I found it doesn’t scale,” he admits. “It generates thousands of rules and then filters them after the fact. There’s a newer, better algorithm called (Frequent Pattern) FP-Growth that does the filtering at the source. It only generates the rules you actually need, making it far more scalable.” A theoretically correct algorithm can become unusable when faced with big data volumes or strict latency requirements.Non-functional requirements often determine success: Beyond just picking the right algorithm, non-functional requirements like performance, scalability, and reliability must guide engineering decisions. “In academia, we focus on functional requirements… ‘this algorithm should detect fraud.’ And yes, the algorithm might technically work. But in practice, you also have to consider how it performs, how scalable it is, whether it can run as a cloud service, and so on.” Robust software needs algorithms that meet functional goals and the operational demands of deployment (throughput, memory, cost, etc.).Start simple, escalate only as needed: Simpler algorithms are easier to implement, explain, and maintain – valuable qualities especially in domains like finance or healthcare where interpretability matters. While discussing predictive models, Ahmad describes an iterative approach – perhaps begin with intuitive rules, upgrade to a decision tree for more structure, then if needed move to a more powerful model like XGBoost or an SVM. Jumping straight to a deep neural net can be overkill for a simple classification. “It’s usually a mistake to begin with something too complex – it can be overkill, like using a forklift to lift a sheet of paper,” he says.However, algorithmic choices don’t occur in a vacuum – they influence and are influenced by software architecture. Modern systems, especially AI systems, have distinct phases (training, testing, inference) and often run in distributed cloud environments. Engineers therefore must integrate algorithmic thinking into high-level design and infrastructure decisions.Bridging Algorithms and Architecture in PracticeTake the example of training a machine learning model versus deploying it. “During training, you need a lot of data... a lot of processing power – GPUs, ideally. It’s expensive and time-consuming,” Ahmad notes. This is where cloud architecture shines. “The cloud gives you elastic architectures – you can spin up 2,000 nodes for 2 or 10 hours, train your model, and then shut it down. The cost is manageable…and you’re done.” Cloud platforms allow an elastic burst of resources: massive parallelism for a short duration, which can turn a week-long training job into a few hours for a few hundred dollars. Ahmad highlights that this elasticity was simply not available decades ago in on-prem computing. Today, any team can rent essentially unlimited compute for a day, which removes a huge barrier in building complex models. “If you want to optimize for cost and performance, you need elastic systems. 
Cloud computing… offers exactly that” for AI workloads, he says.Once trained, the model often compresses down to a relatively small artifact (Ahmad jokes that the final model file is “like the tail of an elephant – tiny compared to the effort to build it”). Serving predictions might only require a lightweight runtime that can even live on a smartphone. Thus, the hardware needs vary drastically between phases: heavy GPU clusters for training; maybe a simple CPU or even embedded device for inference. Good system design accommodates these differences – e.g., by separating training pipelines from inference services, or using cloud for training but edge devices for deployment when appropriate.So, how does algorithm choice drive architecture? Ahmad recommends evaluating any big design decision on three axes: cost, performance, and time-to-deliver. If adopting a more sophisticated algorithm (or distributed processing framework, etc.) will greatly improve accuracy or speed and the extra cost is justified, it may be worth it. “First, ask yourself: does this problem justify the additional complexity…? Then evaluate that decision along three axes: cost, performance, and time,” he advises. “If an algorithm is more accurate, more time-efficient, and the cost increase is justified, then it’s probably the right choice.” On the flip side, if a fancy algorithm barely improves accuracy or would bust your budget/latency requirements, you might stick with a simpler approach that you can deploy more quickly. This trade-off analysis – weighing accuracy vs. expense vs. speed – is a core skill for architects in the age of AI. It prevents architecture astronautics (over-engineering) by ensuring complexity serves a real purpose.Classical Techniques: The Unsung Heroes in AI SystemsAhmad views classical computer science algorithms and modern AI methods as complementary components of a solution.“Take search algorithms, for instance,” Ahmad elaborates. “When you're preparing datasets for AI… you often have massive data lakes – structured and unstructured data all in one place. Now, say you're training a model for fraud detection. You need to figure out which data is relevant from that massive repository. Search algorithms can help you locate the relevant features and datasets. They support the AI workflow by enabling smarter data preparation.” Before the fancy model ever sees the data, classical algorithms may be at work filtering and finding the right inputs. Similarly, Ahmad points out, classic graph algorithms might be used to do link analysis or community detection that informs feature engineering. Even some “old-school” NLP (like tokenization or regex parsing) can serve as preprocessing for LLM pipelines. These building blocks ensure that the complex AI has quality material to work with.Ahmad offers an apt metaphor:“Maybe AI is your ‘main muscle,’ but to build a strong body – or a performant system – you need to train the supporting muscles too. Classical algorithms are part of that foundation.”Robust systems use the best of both worlds. For example, he describes a hybrid approach in real-world data labeling. In production, you often don’t have neat labeled datasets; you have to derive labels or important features from raw data. Association rule mining algorithms like Apriori or FP-Growth (from classical data mining) can uncover patterns. These patterns might suggest how to label data or which combined features could predict an outcome. 
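To make that pattern-mining step concrete, here is a minimal sketch of association rule mining with FP-Growth. It uses the open-source mlxtend library purely for illustration: Ahmad does not prescribe a specific tool, the toy transaction list and thresholds are invented, and exact function signatures can vary across mlxtend versions.

```python
# Minimal illustrative sketch: association rule mining with FP-Growth via the
# open-source mlxtend library (pip install mlxtend pandas). The transactions
# below are toy data; a real pipeline would pull them from a data lake or DB.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth, association_rules

transactions = [
    ["milk", "cheese", "bread"],
    ["milk", "cheese"],
    ["milk", "bread"],
    ["cheese", "bread"],
    ["milk", "cheese", "eggs"],
]

# One-hot encode the transactions into a boolean DataFrame.
encoder = TransactionEncoder()
onehot = encoder.fit(transactions).transform(transactions)
df = pd.DataFrame(onehot, columns=encoder.columns_)

# FP-Growth only generates itemsets that clear the support threshold,
# instead of generating every candidate and filtering afterwards (Apriori).
frequent_itemsets = fpgrowth(df, min_support=0.4, use_colnames=True)

# Convert frequent itemsets into rules such as {milk} -> {cheese}.
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```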
“If you feed transaction data into FP-Growth, it will find relationships – like if someone buys milk, they’re likely to buy cheese too… These are the kinds of patterns the algorithm surfaces,” Ahmad explains. Here, a classical unsupervised algorithm helps define the inputs to a modern supervised learning task – a symbiosis that improves the overall system.Foundational skills like devising efficient search strategies, using dynamic programming for optimal substructure problems, or leveraging sorting and hashing for data organization are still extremely relevant. They might operate behind the scenes of an AI pipeline or bolster the infrastructure (e.g., database indexing, cache eviction policies, etc.) that keeps your application fast and reliable. Ahmad even notes that Google’s hyperparameter tuning service, Vizier, is “based on classical heuristic algorithms” rather than any neural network magic – yet it significantly accelerates model optimization.Optimization: The (Absolute) Necessity of Efficiency“Math can be cruel,” Ahmad warns. “If you’re not careful, your problem might never converge… If you accidentally introduce an exponential factor in the wrong place, it might take years – or even centuries – for the solution to converge. The sun might die before your algorithm finishes!” This colorful exaggeration underscores a serious point: computational complexity can explode quickly, and engineers need to be vigilant. It’s not acceptable to shrug off inefficiencies with “just let it run longer” if the algorithmic complexity is super-polynomial. “Things can spiral out of control very quickly. That’s why optimization isn't a luxury – it’s a necessity,” Ahmad says.Ahmad talks about three levels at which we optimize AI systems:Hardware: Choosing the right compute resources can yield massive speedups. For example, training a deep learning model on a GPU or TPU vs. a CPU can be orders of magnitude faster. “For deep learning especially, using a GPU can speed up training by a factor of 1,000,” Ahmad notes, based on his experience. So, part of an engineer’s algorithmic thinking is knowing when to offload work to specialized hardware, or how to parallelize tasks across a cluster.Hyperparameter tuning and algorithmic settings: Many algorithms (especially in machine learning) have knobs to turn – learning rate, tree depth, number of clusters, etc. The wrong settings can make a huge difference in both model quality and compute time. Traditionally, tuning was an art of trial and error. But now, tools like Google’s Vizier (and open-source libraries for Bayesian optimization) can automate this search efficiently.Ensuring the problem is set up correctly: A common mistake is diving into training without examining the data’s signal-to-noise ratio. Ahmad recommends the CRISP-DM approach – spend ample time on data understanding and preparation. “Let’s say your dataset has a lot of randomness and noise. If there's no clear signal, then even a Nobel Prize–winning scientist won’t be able to build a good model,” he says. “So, you need to assess your data before you commit to AI.” This might involve using statistical analysis or simple algorithms to verify that patterns exist. “Use classical methods to ensure that your data even has a learnable pattern. Otherwise, you’re wasting time and resources,” Ahmad advises.The cost of compute – and the opportunity cost of engineers’ time – is too high to ignore optimization. 
Or as Ahmad bluntly puts it, “It’s not OK to say, ‘I’m not in a hurry, I’ll just let it run.’” Competitive teams optimize both to push performance and to control time/cost, achieving results that are fast, scalable, and economically sensible.Learning by Doing: Making Algorithms StickMany developers first encounter algorithms as leetcode-style puzzles or theoretical exercises for interviews. But how can they move beyond rote knowledge to true mastery? Ahmad’s answer: practice on real problems. “Learning algorithms for interviews is a good start… it shows initiative,” he acknowledges. “But in interview prep, you're not solving real-world problems… To truly make algorithmic knowledge stick, you need to use algorithms to solve actual problems.”In the artificial setting of an interview question, you might code a graph traversal or a sorting function in isolation. The scope is narrow and hints are often provided by the problem constraints. Real projects are messier and more holistic. When you set out to build something end-to-end, you quickly uncover gaps in your knowledge and gain a deeper intuition. “That’s when you'll face real challenges, discover edge cases, and realize that you may need to know other algorithms just to get your main one working,” Ahmad says. Perhaps you’re implementing a network flow algorithm but discover you need a good data structure for priority queues to make it efficient, forcing you to learn or recall heap algorithms. Or you’re training a machine learning model and hit a wall until you implement a caching strategy to handle streaming data. Solving real problems forces you to integrate multiple techniques, and shows how classical and modern methods complement each other in context. Ahmad puts it succinctly: “There’s an entire ecosystem – an algorithmic community – that supports every solution. Classical and modern algorithms aren’t separate worlds. They complement each other, and a solid understanding of both is essential.”So, what’s the best way to gain this hands-on experience? Ahmad recommends use-case-driven projects, especially in domains that matter to you. He suggests tapping into the wealth of public datasets now available. “Governments around the world are legal custodians of citizen data… If used responsibly, this data can change lives,” he notes. Portals like data.gov host hundreds of thousands of datasets spanning healthcare, transportation, economics, climate, and more. Similar open data repositories exist for other countries and regions. These aren’t sanitized toy datasets – they are real, messy, and meaningful. “Choose a vertical you care about, download a dataset, pick an algorithm, and try to solve a problem. That’s the best way to solidify your learning,” Ahmad advises. The key is to immerse yourself in a project where you must apply algorithms end-to-end: from data cleaning and exploratory analysis, to choosing the right model or algorithmic approach, through optimization and presenting results. 
This process will teach more than any isolated coding puzzle, and the lessons will stick because they’re tied to real outcomes.Yes, 2025 is “the year of the AI agent”, but as the industry shifts from standalone models to agentic systems, engineers must learn to pair classical algorithmic foundations with real-world pragmatism, because in this era of AI agents, true intelligence lies not only in models, but in how wisely we orchestrate them.If Ahmad’s perspective on real-world scalability and algorithmic pragmatism resonated with you, his book 50 Algorithms Every Programmer Should Know goes deeper into the practical foundations behind today’s AI systems. The following excerpt explores how to design and optimize large-scale algorithms for production environments—covering parallelism, cloud infrastructure, and the trade-offs that shape performant systems.🧠Expert Insight: Large-Scale Algorithms by Imran AhmadThe complete “Chapter 15: Large‑Scale Algorithms” from the book 50 Algorithms Every Programmer Should Know by Imran Ahmad (Packt, September 2023).Large-scale algorithms are specifically designed to tackle sizable and intricate problems. They distinguish themselves by their demand for multiple execution engines due to the sheer volume of data and processing requirements. Examples of such algorithms include Large Language Models (LLMs) like ChatGPT, which require distributed model training to manage the extensive computational demands inherent to deep learning. The resource-intensive nature of such complex algorithms highlights the requirement for robust, parallel processing techniques critical for training the model.In this chapter, we will start by introducing the concept of large-scale algorithms and then proceed to discuss the efficient infrastructure required to support them. Additionally, we will explore various strategies for managing multi-resource processing. Within this chapter, we will examine the limitations of parallel processing, as outlined by Amdahl’s law, and investigate the use of Graphics Processing Units (GPUs).Read the Complete Chapter50 Algorithms Every Programmer Should Know by Imran Ahmad (Packt, September 2023) is a practical guide to algorithmic problem-solving in real-world software. Now in its second edition, the book covers everything from classical data structures and graph algorithms to machine learning, deep learning, NLP, and large-scale systems.For a limited time, get the eBook for $9.99 at packtpub.com — no code required.Get the Book🛠️Tool of the Week⚒️OSS Vizier — Production-Grade Black-Box Optimization from GoogleOSS Vizier is a Python-based, open source optimization service built on top of Google Vizier—the system that powers hyperparameter tuning and experiment optimization across products like Search, Ads, and YouTube. 
Now available to the broader research and engineering community, OSS Vizier brings the same fault-tolerant, scalable architecture to a wide range of use cases—from ML pipelines to physical experiments.Highlights:Flexible, Distributed Architecture: Supports RPC-based optimization via gRPC, allowing Python, C++, Rust, or custom clients to evaluate black-box objectives in parallel or sequentially.Rich Integration Ecosystem: Includes native support for PyGlove, TensorFlow Probability, and Vertex Vizier—enabling seamless connection to evolutionary search, Bayesian optimization, and cloud workflows.Research-Ready: Comes with standardized benchmarking APIs, a modular algorithm interface, and compatibility with AutoML tooling—ideal for evaluating and extending new optimization strategies.Resilient and Extensible: Fault-tolerant by design, with evaluations stored in SQL-backed datastores and support for retry logic, partial failure, and real-world constraints (e.g., human-evaluated objectives or lab settings).Learn more about OSS Vizier📰 Tech BriefsAI agents in 2025: Expectations vs. reality by Ivan Belcic and Cole Stryker, IBM Think: In 2025, AI agents are widely touted as transformative tools for work and productivity, but experts caution that while experimentation is accelerating, current capabilities remain limited, true autonomy is rare, and success depends on governance, strategy, and realistic expectations.Agent Mode for Gemini added to Android Studio: Google has introduced Agent Mode for Gemini in Android Studio, enabling developers to describe high-level goals that the agent can plan and execute—such as fixing build errors, adding dark mode, or generating UI from a screenshot—while allowing user oversight, feedback, and iteration, with expanded context support via Gemini API and MCP integration.Google’s Agent2Agent protocol finds new home at the Linux Foundation: Google has donated its Agent2Agent (A2A) protocol—a standard for enabling interoperability between AI agents—to the Linux Foundation, aiming to foster vendor-neutral, open development of multi-agent systems, with over 100 tech partners now contributing to its extensible, secure, and scalable design.Azure AI Foundry Agent Service GA Introduces Multi-Agent Orchestration and Open Interoperability: Microsoft has launched the Azure AI Foundry Agent Service into general availability, offering a modular, multi-agent orchestration platform that supports open interoperability, seamless integration with Logic Apps and external tools, and robust capabilities for monitoring, governance, and cross-cloud agent collaboration—all aimed at enabling scalable, intelligent agent ecosystems across diverse enterprise use cases.How AI Is Redefining The Way Software Is Built In 2025 by Igor Fedulov, CEO of Intersog: AI is transforming software development by automating tasks, accelerating workflows, and enabling more intelligent, adaptive systems—driving a shift toward agent-based architectures, cloud-native applications, and advanced technologies like voice and image recognition, while requiring developers to upskill in AI, data analysis, and security to remain competitive.That’s all for today. Thank you for reading this issue of Deep Engineering. 
We’re just getting started, and your feedback will help shape what comes next.Take a moment to fill out this short survey we run monthly—as a thank-you, we’ll add one Packt credit to your account, redeemable for any book of your choice.We’ll be back next week with more expert-led content.Stay awesome,Divya Anne SelvarajEditor-in-Chief, Deep EngineeringTake the Survey, Get a Packt Credit!If your company is interested in reaching an audience of developers, software engineers, and tech decision makers, you may want to advertise with us.

Deep Engineering #5: Dhirendra Sinha (Google) and Tejas Chopra (Netflix) on Scaling, AI Ops, and System Design Interviews

Divya Anne Selvaraj
19 Jun 2025
Lessons on designing for failure and the importance of trade-off thinking#5Dhirendra Sinha (Google) and Tejas Chopra (Netflix) on Scaling, AI Ops, and System Design InterviewsFrom designing fault-tolerant systems at Big Tech and hiring for system design roles, Chopra and Sinha share lessons on designing for failure and the importance of trade-off thinkingHi Welcome to the fifth issue of Deep Engineering.With AI workloads reshaping infrastructure demands and distributed systems becoming the default, engineers are facing new failure modes, stricter trade-offs, and rising expectations in both practice and hiring.To explore what today’s engineers need to know, we spoke with Dhirendra Sinha (Software Engineering Manager at Google, and long-time distributed systems educator) and Tejas Chopra (Senior Engineer at Netflix and Adjunct Professor at UAT). Their recent book, System Design Guide for Software Professionals (Packt, 2024), distills decades of practical experience into a structured approach to design thinking.In this issue, we unpack their hard-won lessons on observability, fault tolerance, automation, and interview performance—plus what it really means to design for scale in a world where even one-in-a-million edge cases are everyday events.You can watch the full interview and read the transcript here—or keep reading for our distilled take on the design mindset that will define the next decade of systems engineering.Sign Up |AdvertiseJoin us on July 19 for a 150-minute interactive MCP Workshop. Go beyond theory and learn how to build and ship real-world MCP solutions. Limited spots available! Reserve your seat today.Use Code EARLY35 for 35% off!Designing for Scale, Failure, and the Future — With Dhirendra Sinha and Tejas Chopra“Foundational system design principles—like scalability, reliability, and efficiency—are remarkably timeless,” notes Chopra, adding that “the rise of AI only reinforces the importance of these principles.” In other words, new AI systems can’t compensate for poor architecture; they reveal its weaknesses. Sinha concurs: “If the foundation isn’t strong, the system will be brittle—no matter how much AI you throw at it.” AI and system design aren’t at odds – “they complement each other,” says Chopra, with AI introducing new opportunities and stress-tests for our designs.One area where AI is elevating system design is in AI-driven operations (AIOps). Companies are increasingly using intelligent automation for tasks like predictive autoscaling, anomaly detection, and self-healing.“There’s a growing demand for observability systems that can predict service outages, capacity issues, and performance degradation before they occur,” notes Sam Suthar, founding director of Middleware. AI-powered monitoring can catch patterns and bottlenecks ahead of failures, allowing teams to fix issues before users notice. At the same time, designing the systems to support AI workloads is a fresh challenge. The recent rollout of a Ghibli-style image generator saw explosive demand – so much that OpenAI’s CEO had to ask users to pause as GPU servers were overwhelmed. That architecture didn’t fully account for the parallelization and scale such AI models required. AI can optimize and automate a lot, but it will expose any gap in your system design fundamentals. As Sinha puts it, “AI is powerful, but it makes mastering the fundamentals of system design even more critical.”Scaling Challenges and Resilience in PracticeSo, what does it take to operate at web scale in 2025? 
Sinha highlights four key challenges facing large-scale systems today:Scalability under unpredictable load: global services must handle sudden traffic spikes without falling over or grossly over-provisioning. Even the best capacity models can be off, and “unexpected traffic can still overwhelm systems,” Sinha says.Balancing the classic trade-offs between consistency, performance, and availability: This remains as relevant as ever. In practice, engineers constantly juggle these – and must decide where strong consistency is a must versus where eventual consistency will do.Security and privacy at scale have grown harder: Designing secure systems for millions of users, with evolving privacy regulations and threat landscapes, is an ongoing battle.The rise of AI introduces “new uncertainties”: we’re still learning how to integrate AI agents and AI-driven features safely into large architectures.Chopra offers an example from Netflix: “We once had a live-streaming event where we expected a certain number of users – but ended up with more than three times that number.” The system struggled not because it was fundamentally mis-designed, but due to hidden dependency assumptions. In a microservices world, “you don’t own all the parts—you depend on external systems. And if one of those breaks under load, the whole thing can fall apart,” Chopra warns. A minor supporting service that wasn’t scaled for 3× traffic can become the linchpin that brings down your application. This is why observability is paramount. At Netflix’s scale (hundreds of microservices handling asynchronous calls), tracing a user request through the maze is non-trivial. Teams invest heavily in telemetry to know “which service called what, when, and with what parameters” when things go wrong. Even so, “stitching together a timeline can still be very difficult” in a massive distributed system, especially with asynchronous workflows. Modern observability tools (distributed tracing, centralized logging, etc.) are essential, and even these are evolving with AI assistance to pinpoint issues faster.So how do Big Tech companies approach scalability and robustness by design? One mantra is to design for failure. Assume everything will eventually fail and plan accordingly. “We operate with the mindset that everything will fail,” says Chopra. That philosophy birthed tools like Netflix’s Chaos Monkey, which randomly kills live instances to ensure the overall system can survive outages. If a service or an entire region goes down, your architecture should gracefully degrade or auto-heal without waking up an engineer at 2 AM. Sinha recalls an incident from his days at Yahoo:“I remember someone saying, “This case is so rare, it’s not a big deal,” and the chief architect replied, “One in a million happens every hour here.” That’s what scale does—it invalidates your assumptions.”In high-scale systems, even million-to-one chances occur regularly, so no corner case is truly negligible. In Big Tech, achieving resilience at scale has resulted in three best practices:Fault-tolerant, horizontally scalable architectures: At Netflix and other companies, such architectures ensure that if one node or service dies, the load redistributes and the system heals itself quickly. Teams focus not just on launching features but “landing” them safely – meaning they consider how each new deployment behaves under real-world loads, failure modes, and even disaster scenarios. Automation is key: from continuous deployments to automated rollback and failover scripts. 
“We also focus on automating everything we can—not just deployments, but also alerts. And those alerts need to be actionable,” Sinha says.Explicit capacity planning and graceful degradation: Engineers define clear limits for how much load a system can handle and build in back-pressure or shedding mechanisms beyond that. Systems often fail when someone makes unrealistic assumptions about unlimited capacity. Caching, rate limiting, and circuit breakers become your safety net. Gradual rollouts further boost robustness. “When we deploy something new, we don’t release it to the entire user base in one go,” Chopra explains. Whether it’s a new recommendation algorithm or a core infrastructure change, Netflix will enable it for a small percentage of users or in one region first, observe the impact, then incrementally expand if all looks good. This staged rollout limits the blast radius of unforeseen issues. Feature flags, canary releases, and region-by-region deployments should be standard operating procedure.Infrastructure as Code (IaC): Modern infrastructure tooling also contributes to resiliency. Many organizations now treat infrastructure as code, defining their deployments and configurations in declarative scripts. As Sinha notes, “we rely heavily on infrastructure as code—using tools like Terraform and Kubernetes—where you define the desired state, and the system self-heals or evolves toward that.” By encoding the target state of the system, companies enable automated recovery; if something drifts or breaks, the platform will attempt to revert to the last good state without manual intervention. This codified approach also makes scaling and replication more predictable, since environments can be spun up from the same templates.These same principles—resilience, clarity, and structured thinking—also underpin how engineers should approach system design interviews.Mastering the System Design InterviewCracking the system design interview is a priority for many mid-level engineers aiming for senior roles, and for good reason. Sinha points out that system design skill isn’t just a hiring gate – it often determines your level/title once you’re in a company. Unlike coding interviews where problems have a neat optimal solution, “system design is messy. You can take it in many directions, and that’s what makes it interesting,” Sinha says. Interviewers want to see how you navigate an open-ended problem, not whether you can memorize a textbook solution. Both Sinha and Chopra emphasize structured thinking and communication. Hiring managers deliberately ask ambiguous or underspecified questions to see if the candidate will impose structure: Do they ask clarifying questions? Do they break the problem into parts (data storage, workload patterns, failure scenarios, etc.)? Do they discuss trade-offs out loud? Sinha and Chopra offer two guidelines:There’s rarely a single “correct” answer: What matters is reasoning and demonstrating that you can make sensible trade-offs under real-world constraints. “It’s easy to choose between good and bad solutions,” Sinha notes, “but senior engineers often have to choose between two good options. I want to hear their reasoning: Why did you choose this approach? What trade-offs did you consider?” A strong candidate will articulate why, say, they picked SQL over NoSQL for a given scenario – and acknowledge the downsides or conditions that might change that decision. In fact, Chopra may often follow up with “What if you had 10× more users? 
Would your choice change?” to test the adaptability of a candidate’s design. He also likes to probe on topics like consistency models: strong vs eventual consistency and the implications of the CAP theorem. Many engineers “don’t fully grasp how consistency, availability, and partition tolerance interact in real-world systems,” Chopra observes, so he presents scenarios to gauge depth of understanding.Demonstrate a collaborative, inquisitive approach: A system design interview shouldn’t be a monologue; it’s a dialogue. Chopra says, “I try to keep the interview conversational. I want the candidate to challenge some of my assumptions.” For example, a candidate might ask: What are the core requirements? Are we optimizing for latency or throughput? or How many users are we targeting initially? — “that kind of questioning is exactly what happens in real projects,” Chopra explains. It shows the candidate isn’t just regurgitating a pre-learned architecture, but is actively scoping the problem like they would on the job. Strong candidates also prioritize requirements on the fly – distinguishing must-haves (e.g. high availability, security) from nice-to-haves (like an optional feature that can be deferred).Through years of interviews, Sinha and Chopra have noticed three common pitfalls:Jumping into solution-mode too fast: “Candidates don’t spend enough time right-sizing the problem,” says Chopra. “The first 5–10 minutes should be spent asking clarifying questions—what exactly are we designing, what are the constraints, what assumptions can we make?” Diving straight into drawing boxes and lines can lead you down the wrong path. Sinha agrees: “They hear something familiar, get excited, and dive into design mode—often without even confirming what they’re supposed to be designing. In both interviews and real life, that’s dangerous. You could end up solving the wrong problem.”Lack of structure – jumping randomly between components without a clear plan: This scattered approach makes it hard to know if you’ve covered the key areas. Interviewers prefer a candidate who outlines a high-level approach (e.g. client > service > data layer) before zooming in, and who checks back on requirements periodically.Poor time management: It’s common for candidates to get bogged down in details early (like debating the perfect database indexing scheme) and then run out of time to address other important parts of the system. Sinha and Chopra recommend practicing pacing yourself and being willing to defer some details. It’s better to have a complete, if imperfect, design than a perfect cache layer with no time to discuss security or analytics requirements. If an interviewer hints to move on or asks about an area you haven’t covered, take the cue. “Listen to the interviewer’s cues,” Sinha advises. “We want to help you succeed, but if you miss the hints, we can’t evaluate you properly.”Tech interviews in general have gotten more demanding in 2025. The format of system design interviews hasn’t drastically changed, but the bar is higher. Companies are more selective, sometimes even “downleveling” strong candidates if they don’t perfectly meet the senior criteria. Evan King and Stefan Mai, cofounders of an interview preparation startup, in an article in The Pragmatic Engineer observe, “performance that would have secured an offer in 2021 might not even clear the screening stage today”. This reflects a market where competition is fierce and expectations for system design prowess are rising. 
But as Chopra and Sinha illustrate, the goal is not to memorize solutions – it’s to master the art of trade-offs and critical thinking.Beyond Interviews: System Design as a Career CatalystSystem design isn’t just an interview checkbox – it’s a fundamental skill for career growth in engineering. “A lot of people revisit system design only when they're preparing for interviews,” Sinha says. “But having a strong grasp of system design concepts pays off in many areas of your career.” It becomes evident when you’re vying for a promotion, writing an architecture document, or debating a new feature in a design review.Engineers with solid design fundamentals tend to ask the sharp questions that others miss (e.g. What happens if this service goes down? or Can our database handle 10x writes?). They can evaluate new technologies or frameworks in the context of system impact, not just code syntax. Technical leadership roles especially demand this big-picture thinking. In fact, many companies now expect even engineering managers to stay hands-on with architecture – “system design skills are becoming non-negotiable” for leadership.Mastering system design also improves your technical communication. As you grow more senior, your success depends on how well you can simplify complexity for others – whether in documentation or in meetings. “It’s not just about coding—it’s about presenting your ideas clearly and convincingly. That’s a huge part of leadership in engineering,” Sinha notes. Chopra agrees, framing system design knowledge as almost a mindset: “System design is almost a way of life for senior engineers. It’s how you continue to provide value to your team and organization.” He compares it to learning math: you might not explicitly use the quadratic formula daily, but learning it trains your brain in problem-solving.Perhaps the most exciting aspect is that the future is wide open. “Many of the systems we’ll be working on in the next 10–20 years haven’t even been built yet,” Chopra points out. We’re at an inflection point with technologies like AI agents and real-time data streaming pushing boundaries; those with a solid foundation in distributed systems will be the “go-to” people to harness these advances. And as Chopra notes,“seniority isn’t about writing complex code. It’s about simplifying complex systems and communicating them clearly. That’s what separates great engineers from the rest.”System design proficiency is a big part of developing that ability to cut through complexity.Emerging Trends and Next Frontiers in System DesignWhile core principles remain steady, the ecosystem around system design is evolving rapidly. We can identify three significant trends:Integration of AI Agents with Software Systems: As Gavin Bintz writes in Agent One, an emerging trend is the integration of AI agents with everyday software systems. New standards like Anthropic’s Model Context Protocol (MCP), are making it easier for AI models to securely interface with external tools and services. You can think of MCP as a sort of “universal adapter” that lets a large language model safely query your database, call an API like Stripe, or post a message to Slack – all through a standardized interface. This development opens doors to more powerful, context-aware AI assistants, but it also raises architectural challenges. Designing a system that grants an AI agent limited, controlled access to critical services requires careful thought around authorization, sandboxing, and observability (e.g., tracking what the AI is doing). 
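One way to picture that kind of limited, auditable access is a thin gateway between the agent and internal services that enforces an allowlist and logs every call. The sketch below is plain Python and purely illustrative: it is not MCP or any real agent framework, and every name in it is hypothetical.

```python
# Illustrative sketch only: a minimal "tool gateway" that limits what an
# agent may call and records every invocation for later auditing.
# This is not MCP or any specific agent framework; all names are hypothetical.
import logging
from typing import Any, Callable, Dict

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("tool_gateway")

class ToolGateway:
    def __init__(self) -> None:
        self._allowed: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        """Explicitly allowlist a tool the agent is permitted to use."""
        self._allowed[name] = fn

    def call(self, name: str, **kwargs: Any) -> Any:
        """Invoke a tool on the agent's behalf, with auditing."""
        if name not in self._allowed:
            log.warning("blocked call to unregistered tool: %s", name)
            raise PermissionError(f"tool '{name}' is not allowlisted")
        log.info("agent called %s with %s", name, kwargs)
        return self._allowed[name](**kwargs)

# Hypothetical usage: only a read-only lookup is exposed to the agent.
def lookup_order(order_id: str) -> dict:
    return {"order_id": order_id, "status": "shipped"}

gateway = ToolGateway()
gateway.register("lookup_order", lookup_order)
print(gateway.call("lookup_order", order_id="A-123"))
```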
Chopra sees MCP as fertile ground for new system design patterns and best practices in the coming years.Deepening of observability and automation in system management: Imagine systems that not only detect an anomaly but also pinpoint the likely root cause across your microservices and possibly initiate a fix. As Sam Suthar, Founding Director at Middleware, observes, early steps in this direction are already in play – for example, tools that correlate logs, metrics, and traces across a distributed stack and use machine learning to identify the culprit when something goes wrong. The ultimate goal is to dramatically cut Mean Time to Recovery (MTTR) when incidents occur, using AI to assist engineers in troubleshooting. As one case study showed, a company using AI-based observability was able to resolve infrastructure issues 75% faster while cutting monitoring costs by 75%. The complexity of modern cloud environments is pushing us toward this new normal of predictive, adaptive systems.Sustainable software architecture: There is growing dialogue now about designing systems that are not only robust and scalable, but also efficient in their use of energy and resources. The surge in generative AI has shone a spotlight on the massive power consumption of large-scale services. According to Kemene et al., in an article published by the World Economic Forum (WEF), Data centers powering AI workloads can consume as much electricity as a small city; the International Energy Agency projects data center energy use will more than double by 2030, with AI being “the most important driver” of that growth. Green software engineering principles urge us to consider the carbon footprint of our design choices. Sinha suggests this as an area to pay attention to.Despite faster cycles, sharper constraints and more automation system design remains grounded in principles. As Chopra and Sinha make clear, the ability to reason about failure, scale, and trade-offs isn’t just how systems stay up; it’s also how engineers move up in their career.If you found Sinha and Chopra’s perspective on designing for scale and failure compelling, their book System Design Guide for Software Professionals unpacks the core attributes that shape resilient distributed systems. The following excerpt from the book breaks down how consistency, availability, partition tolerance, and other critical properties interact in real-world architectures. You’ll see how design choices around reads, writes, and replication influence system behavior—and why understanding these trade-offs is essential for building scalable, fault-tolerant infrastructure.Expert Insight: Distributed System Attributes by Dhirendra Sinha and Tejas ChopraThe complete “Chapter 2: Distributed System Attributes” from the book System Design Guide for Software Professionals by Dhirendra Sinha and Tejas Chopra (Packt, August 2024)…Before we jump into the different attributes of a distributed system, let’s set some context in terms of how reads and writes happen.Let’s consider an example of a hotel room booking application (Figure 2.1). A high-level design diagram helps us understand how writes and reads happen:Figure 2.1 – Hotel room booking request flowAs shown in Figure 2.1, a user (u1) is booking a room (r1) in a hotel and another user is trying to see the availability of the same room (r1) in that hotel. Let’s say we have three replicas of the reservations database (db1, db2, and db3). 
There can be two ways the writes get replicated to the other replicas: The app server itself writes to all replicas or the database has replication support and the writes get replicated without explicit writes by the app server.Let’s look at the write and the read flows:Read the Complete ChapterSystem Design Guide for Software Professionals by Dhirendra Sinha and Tejas Chopra (Packt, August 2024) is a comprehensive, interview-ready manual for designing scalable systems in real-world settings. Drawing on their experience at Google, Netflix, and Yahoo, the authors combine foundational theory with production-tested practices—from distributed systems principles to high-stakes system design interviews.For a limited time, get the eBook for $9.99 at packtpub.com — no code required.Get the Book🛠️Tool of the Week⚒️Diagrams 0.24.4 — Architecture Diagrams as Code, for System DesignersDiagrams is an open source Python toolkit that lets developers define cloud architecture diagrams using code. Designed for rapid prototyping and documentation, it supports major cloud providers (AWS, GCP, Azure), Kubernetes, on-prem infrastructure, SaaS services, and common programming frameworks—making it ideal for reasoning about modern system design.The latest release (v0.24.4, March 2025) adds stability improvements and ensures compatibility with recent Python versions. Diagrams has been adopted in production projects like Apache Airflow and Cloudiscovery, where infrastructure visuals need to be accurate, automatable, and version controlled.Highlights:Diagram-as-Code: Define architecture models using simple Python scripts—ideal for automation, reproducibility, and tracking in Git.Broad Provider Support: Over a dozen categories including cloud platforms, databases, messaging systems, DevOps tools, and generic components.Built on Graphviz: Integrates with Graphviz to render high-quality, publishable diagrams.Extensible and Scriptable: Easily integrate with build pipelines or architecture reviews without relying on external design tools.Visit Diagrams' GitHub Repo📰 Tech BriefsAnalyzing Metastable Failures in Distributed Systems: A new HotOS'25 paper builds on prior work to introduce a simulation-based pipeline—spanning Markov models, discrete event simulation, and emulation—to help engineers proactively identify and mitigate metastable failure modes in distributed systems before they escalate.A Senior Engineer's Guide to the System Design Interview: A comprehensive, senior-level guide to system design interviews that demystifies core concepts, breaks down real-world examples, and equips engineers with a flexible, conversational framework for tackling open-ended design problems with confidence.Using Traffic Mirroring to Debug and Test Microservices in Production-Like Environments: Explores how production traffic mirroring—using tools like Istio, AWS VPC Traffic Mirroring, and eBPF—can help engineers safely debug, test, and profile microservices under real-world conditions without impacting users.Designing Instagram: This comprehensive system design breakdown of Instagram outlines the architecture, APIs, storage, and scalability strategies required to support core features like media uploads, feed generation, social interactions, and search—emphasizing reliability, availability, and performance at massive scale.Chiplets and the Future of System Design: A forward-looking piece on how chiplets are reshaping the assumptions behind system architecture—covering yield, performance, reuse, and the growing need for interconnect 
standards and packaging-aware system design.That’s all for today. Thank you for reading this issue of Deep Engineering. We’re just getting started, and your feedback will help shape what comes next.Take a moment to fill out this short survey we now run monthly—as a thank-you, we’ll add one Packt credit to your account, redeemable for any book of your choice.We’ll be back next week with more expert-led content.Stay awesome,Divya Anne SelvarajEditor-in-Chief, Deep EngineeringTake the Survey, Get a Packt Credit!If your company is interested in reaching an audience of developers, software engineers, and tech decision makers, you may want to advertise with us.

Deep Engineering #4: Alessandro Colla and Alberto Acerbis on Domain-Driven Refactoring at Scale

Divya Anne Selvaraj
12 Jun 2025
Why understanding the domain beats applying patterns—and how to refactor without starting over#4Alessandro Colla and Alberto Acerbis on Domain-Driven Refactoring at ScaleWhy understanding the domain beats applying patterns—and how to refactor without starting overWelcome to the fourth issue of Deep Engineering.In enterprise software systems, few challenges loom larger than refactoring legacy systems to meet modern needs. These efforts can feel like open-heart surgery on critical applications that are still running in production. Systems requiring refactoring are often business-critical, poorly modularized, and resistant to change by design.To understand how Domain-Driven Design (DDD) can guide this process, we spoke with Alessandro Colla and Alberto Acerbis—authors of Domain-Driven Refactoring (Packt, 2025) and co-founders of the "DDD Open" and "Polenta and Deploy" communities.Colla brings over three decades of experience in eCommerce systems, C# development, and strategic software design. Acerbis is a Microsoft MVP and backend engineer focused on building maintainable systems that deliver business value. Together, they offer a grounded, pattern-skeptical view of what DDD really looks like in legacy environments—and how teams can use it to make meaningful change without rewriting from scratch.You can watch the full interview and read the full transcript here—or keep reading for our distilled take on the principles, pitfalls, and practical steps that shape successful DDD refactoring.Sign Up |AdvertiseThe conference to learn, apply, and improve your craftdev2next is the premier conference designed for software developers, architects, technology leaders, development managers, and directors. Explore cutting-edge strategies, tools, and essential practices for building powerful applications using the latest trends and good practices.When: September 29 - October 2, 2025Where: Colorado Springs, COBuy Conference and Workshop TicketsPrinciples over Patterns: Applying DDD to Legacy Systems with Alessandro Colla and Alberto AcerbisLegacy systems are rarely anyone’s favorite engineering challenge. Often labeled “big balls of mud,” these aging codebases resist change by design—lacking tests, mixing concerns, and coupling business logic to infrastructure in ways that defy modular thinking. Yet they remain critical. “It’s more common to work on what we call legacy code than to start fresh,” Acerbis notes from experience. Their new book, Domain-Driven Refactoring, was born from repeatedly facing large, aging codebases that needed new features. “The idea behind the book is to bring together, in a sensible and incremental way, how we approach the evolution of complex legacy systems,” explains Colla. Rather than treat DDD as something only for new projects, Colla and Acerbis show how DDD’s concepts can guide the incremental modernization of existing systems.They begin by reinforcing core DDD concepts—what Colla calls their “foundation”—before demonstrating how to apply patterns gradually. This approach acknowledges a hard truth: when a client asks for “a small refactor” of a legacy system, “it’s never small. It always becomes a bigger refactor,” Acerbis says with a laugh. The key is to take baby steps. 
“Touching a complex system is always difficult, on many levels,” Colla cautions, so the team must break down the work into manageable changes rather than trying an all-at-once overhaul.Modular Monoliths Before MicroservicesOne of the first decisions in a legacy overhaul is whether to break a monolithic application into microservices. But Colla and Acerbis urge caution here—hype should not dictate architecture.“Normally, a customer comes to us asking to transform their legacy application into a microservices system because — you know — ‘my cousin told me microservices solve all the problems,’” Acerbis jokes. The reality is that blindly carving up a legacy system into microservices can introduce as much complexity as it removes. “Once you split your system into microservices, your architecture needs to support that split,” he explains, from infrastructure and deployment to data consistency issues.Instead, the duo advocates an interim step: first evolve the messy monolith into a well-structured modular monolith. “Using DDD terms, you should move your messy monolith into a good modular monolith,” says Acerbis. In a modular monolith, clear boundaries are drawn around business subdomains (often aligning with DDD bounded contexts), but the system still runs as a single deployable unit. This simplification and ordering within the monolith can often deliver the needed agility and clarity. “We love monoliths, OK? But modular ones,” Colla admits. With a modular monolith in place, teams can implement new features more easily and see if further decomposition is truly warranted. Only if needed—due to scale or independent deployment demands—should you “split it into microservices. But that’s a business and technical decision the whole team needs to make together,” Acerbis emphasizes.By following this journey, teams often find full microservices unnecessary. Colla notes that many times they’ve been able to meet all business requirements just by going modular, without ever needing microservices. The lesson: choose the simplest architecture that solves the problem and avoid microservices sprawl unless your system’s scale and complexity absolutely demand it.First Principles: DDD as a Mindset, Not a ChecklistA central theme from Colla and Acerbis is that DDD is fundamentally about understanding the problem domain, not checking off a list of patterns. “Probably the most important principle is that DDD is not just technical — it’s about principles,” says Acerbis. Both engineers stress the importance of exploration and ubiquitous language before diving into code. “Start with the strategic patterns — particularly the ubiquitous language — to understand the business and what you’re dealing with,” Colla advises. In practice, that means spending time with domain experts, clarifying terminology, and mapping out the business processes and subdomains. Only once the team shares a clear mental model of “what actually needs to be built” should they consider tactical design patterns or write any code.Colla candidly shares that he learned this the hard way.“When I started working with DDD, CQRS, and event sourcing, I made the mistake of jumping straight into technical modeling — creating aggregates, entities, value objects — because I’m a developer, and that’s what felt natural. But I skipped the step of understanding why I was building those classes.I ended up with a mess.”Now he advocates for understanding the why, then the how. “We spent the first chapters of the book laying out the principles. 
We wanted readers to understand the why — so that once you get to the code, it comes naturally,” Colla says.This principle-centric mindset guards against a common trap: applying DDD patterns by rote or “cloning” a solution from another project.“I’ve seen situations where someone says, ‘I’ve already solved a similar problem using DDD — I’ll just reuse that design.’ But no, that’s not how it works,” Acerbis warns.Every domain is different, and DDD is “about exploration. Every situation is different.” By treating DDD as a flexible approach to learning and modeling the domain—rather than a strict formula—teams can avoid over-engineering and build models that truly fit their business.From Strategic to Tactical: Applying Patterns IncrementallyOnce the team has a solid grasp of the domain, they can start to apply DDD’s tactical patterns (entities, value objects, aggregates, domain events, etc.) to reshape the code. But which pattern comes first? Colla doesn’t prescribe a one-size-fits-all sequence. “I don’t think there’s a specific pattern to apply before others,” he says. The priority is dictated by the needs of the domain and the pain points in the legacy code. However, the strategic understanding guides the tactical moves: by using the ubiquitous language and bounded contexts identified earlier, the team can decide where an aggregate boundary should be, where to introduce a value object for a concept, and so on.Acerbis emphasizes that their book isn’t a compendium of all DDD patterns—classic texts already cover those. Instead, it shows how to practically apply a selection of patterns in a legacy refactoring context. The aim is to go from “a bad situation — a big ball of mud — to a more structured system,” he says. A big win of this structure is that new features become easier to add “without being afraid of introducing bugs or regressions,” because the code has clear separation of concerns and meaningful abstractions.Exploring the domain comes first. Only then should the team “bring in the tactical patterns when you begin touching the code,” says Colla. In other words, let the problem guide the solution. By iteratively applying patterns in the areas that need them most, the system gradually transforms—all while continuing to run and deliver value. This incremental refactoring is core to their approach; it avoids the risky big-bang rewrite and instead evolves the architecture piece by piece, in sync with growing domain knowledge.Balancing Refactoring with Rapid DeliveryIn theory, it sounds ideal to methodically refactor a system. In reality, business stakeholders are rarely patient—they need new features yesterday. Colla acknowledges this tension:“This is the million-dollar question. As in life, the answer is balance. You can't have everything at once — you need to balance features and refactoring.”The solution is to weave refactoring into feature development, rather than treating it as a separate project that halts new work.“Stakeholders want new features fast because the system has to keep generating value,” Colla notes. Completely pausing feature development for months of cleanup is usually a non-starter (“We’ve had customers say, ‘You need to fix bugs and add new features — with the same time and budget.’”). 
Instead, Colla’s team refactors in context: “if a new feature touches a certain area of the system, we refactor that area at the same time.” This approach may slightly slow down that feature’s delivery, but it pays off in the long run by preventing the codebase from deteriorating further. Little by little (“always baby steps,” as Colla says), they improve the design while still delivering business value.Acerbis adds that having a solid safety net of tests is what makes this sustainable. Often, clients approach them saying it’s too risky or slow to add features because “the monolith has become a mess.” The first order of business, then, is to shore up test coverage.“We usually start with end-to-end tests to make sure that the system behaves the same way after changes,” he explains.Writing tests for a legacy system can be time-consuming initially, but it instills confidence.“In the beginning, it takes time. You have to build that infrastructure and coverage. But as you move forward, you’ll see the benefits — every time you deploy a new feature, you’ll know it was worth it.”With robust tests in place, the team can refactor aggressively within each iteration, knowing they will catch any unintended side effects before they reach users.Aligning Architecture with OrganizationEven the best technical refactoring will falter if organizational structure is at odds with the design. This is where Conway’s Law comes into play—the notion that software systems end up reflecting the communication structures of the organizations that build them.“When introducing DDD, it’s not just about technical teams. You need involvement from domain experts, developers, stakeholders — everyone,” says Acerbis.In practice, this means that establishing clean bounded contexts in code may eventually require realigning team responsibilities or communication paths in the company.Of course, changing an organization chart is harder than changing code. Colla and Acerbis therefore approach it in phases. “Context mapping is where we usually begin — understanding what each team owns and how they interact,” Colla explains. They first try to fix the code boundaries while not breaking any essential communication between people or teams. For instance, if two modules should only talk via a well-defined interface, they might introduce an anti-corruption layer in code, even if the same two teams still coordinate closely as they always have. Once the code’s boundaries stabilize and prove beneficial, the case can be made to align the teams or management structure accordingly.“The hardest part is convincing the business side that this is the right path,” Acerbis admits. Business stakeholders control budgets and priorities, so without their buy-in, deep refactoring stalls. The key is to demonstrate value early and keep them involved. Ultimately, “it only works if the business side is on board — they’re the ones funding the effort,” he says. Colla concurs: “everyone — developers, architects, business — needs to share the same understanding. Without that alignment, it doesn’t work.” DDD, done right, becomes a cross-discipline effort, bridging tech and business under a common language and vision.Building a Safety Net: Tools and Testing TechniquesGiven the complexity of legacy transformation, what tools or frameworks can help? Colla’s answer may surprise some: there is no magic DDD framework that will do it for you. “There aren’t any true ‘DDD-compliant’ frameworks,” he says. 
DDD isn’t something you can buy off-the-shelf; it’s an approach you must weave into how you design and code. However, there are useful libraries and techniques to smooth the journey, especially around testing and architecture fitness.“What’s more important to me is testing — especially during refactoring. You need a strong safety net,” Colla emphasizes. His team’s rule of thumb: start by writing end-to-end tests for current behavior. “We always start with end-to-end tests. That way, we make sure the expected behavior stays the same,” Colla shares. These broad tests cover critical user flows so that if a refactoring accidentally changes something it shouldn’t, the team finds out immediately. Next, they add architectural tests (often called fitness functions) to enforce the intended module boundaries. “Sometimes, dependencies break boundaries. Architectural tests help us catch that,” he notes. For instance, a test might ensure that code in module A never calls code in module B directly, enforcing decoupling. And of course, everyday unit tests are essential for the new code being written: “unit tests, unit tests, unit tests,” Colla repeats for emphasis. “They prove your code does what it should.”Acerbis agrees that no all-in-one DDD framework exists (and maybe that’s for the best). “DDD is like a tailor-made suit. Every time, you have to adjust how you apply the patterns depending on the problem,” he says. Instead of relying on a framework to enforce DDD, teams should rely on discipline and tooling – especially the kind of automated tests Colla describes – to keep their refactoring on track. Acerbis also offers a tip on using AI assistance carefully: tools like GitHub Copilot can be helpful for generating code, but “you don’t know how it came up with that solution.” He prefers to have developers write the code with understanding, then use AI to review or suggest improvements. This ensures that the team maintains control over design decisions rather than blindly trusting a tool.Event-Driven Architecture: Avoiding the "Distributed Monolith"DDD often goes hand-in-hand with event-driven architecture for decoupling. Used well, domain events can keep bounded contexts loosely coupled. But Colla and Acerbis caution that it’s easy to misuse events and end up with a distributed mess. Acerbis distinguishes two kinds of events with very different roles: domain events and integration events. “Domain events should stay within a bounded context. Don’t share them across services,” he warns. If you publish your internal domain events for other microservices to consume, you create tight coupling: “when you change the domain event — and you will — you’ll need to notify every team that relies on it. That’s tight coupling, not decoupling.”The safer pattern is to keep domain events private to a service or bounded context, and publish separate integration events for anything that truly needs to be shared externally. That way, each service can evolve its internal model (and its domain event definitions) independently. Colla admits he’s learned this by making the mistakes himself. The temptation is to save effort by reusing an event “because it feels efficient,” but six months later, when one team changes that event’s schema, everything breaks. “We have to resist that instinct and think long-term,” he says. 
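To make the distinction concrete, here is a minimal sketch in Python (the book’s own examples are in C#; the event names and fields here are invented) that keeps a rich domain event private to its bounded context and publishes a deliberately smaller integration event at the boundary:

from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json

# Domain event: rich, internal to the Orders bounded context, free to change shape.
@dataclass(frozen=True)
class OrderPlaced:
    order_id: str
    customer_id: str
    line_items: list          # internal structure other contexts should not depend on
    placed_at: datetime

# Integration event: a deliberately small, stable contract shared with other services.
@dataclass(frozen=True)
class OrderPlacedIntegrationEvent:
    order_id: str
    placed_at: str            # ISO-8601 string, versioned as part of the public contract
    version: int = 1

def to_integration_event(event: OrderPlaced) -> OrderPlacedIntegrationEvent:
    # Translation happens at the context boundary, so internal refactorings
    # (renaming fields, reshaping line_items) never leak to consumers.
    return OrderPlacedIntegrationEvent(
        order_id=event.order_id,
        placed_at=event.placed_at.astimezone(timezone.utc).isoformat(),
    )

def publish(event: OrderPlacedIntegrationEvent) -> None:
    # Stand-in for a real message broker call (e.g. publishing to a topic).
    print(json.dumps(asdict(event)))

The translation function is the seam that keeps the two contracts independent: consumers only ever see the versioned integration event, never the internal domain event.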
Even if it requires a bit more work upfront to define distinct integration events, it prevents creating what he calls a “distributed monolith that’s impossible to evolve” – a system where services are theoretically separate but so tightly coupled by data contracts that they might as well be a single unit.Another often overlooked aspect of event-driven systems is the user experience in an eventually consistent world. Because events introduce asynchrony, UIs must be designed to handle the delay. Acerbis mentions using task-based UIs, where screens are organized around high-level business tasks rather than low-level CRUD forms, to better set user expectations and capture intent that aligns with back-end processes. The bottom line is that events are powerful, but they come with their own complexities – teams must design and version them thoughtfully, and always keep the end-to-end system behavior in mind.💡What This Means for YouDon’t jump in without understanding the domain: Slow down and truly grasp the domain events and logic before coding them. Focus on principles before patterns. DDD isn’t a bag of technical tricks, but a way to deeply understand the business domain before coding solutions. Align with the business. Technical architecture must reflect the domain and may require organizational buy-in and alignment (think Conway’s Law).Beware the golden hammer: “Use DDD where it makes sense. You don’t need to apply it to your entire system,” Acerbis advises. Focus DDD efforts on the core domain (where the competitive advantage lies), and keep supporting domains simple. Modular monolith first. Instead of rushing into microservices, first untangle your “big ball of mud” into a well-structured modular monolith—often that's enough.No “Franken-events”: If you see an “and” in an event name, that’s a red flag – it likely violates the single responsibility principle for events and will cause trouble when one part of that event changes and the other doesn’t. Refactor in baby steps. Integrate refactoring tasks into regular feature work, supported by a strong safety net of tests, to balance improvement with delivery.Never allow invalid data by design: A subtle but dangerous practice is allowing objects or aggregates in an invalid state (for example, by using flags like isValid). “Your aggregates should always be in a valid state,” Acerbis emphasizes, meaning your constructors or factories should enforce invariants so you don’t have to constantly check validity later.Don’t split the system before it’s ready: Microservices introduce complexity too early. “Once you split your system into microservices, your architecture needs to support that split,” Acerbis warns. Work on converting to a modular monolith first—often that's enough.“Simple” versus “easy” code: “Simple code is not the same as easy code. Simple code takes effort. Easy code is quick, but it’s hard to maintain,” says Acerbis. What feels “easy” in the moment (quick-and-dirty hacks, copy-paste coding, skipping tests) leads to a tangled mess. Writing simple, clear code often requires more thought and discipline—but it pays off with maintainability. Evolve, don’t rewrite. Aim to evolve the system through continuous small changes rather than costly complete rewrites.If you found Colla and Acerbis’ insights useful, their book, Domain-Driven Refactoring offers a deeper, hands-on perspective—showing how to incrementally apply DDD principles in real systems under active development with substantial code examples. 
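Before moving on to the excerpt, it is worth making one of the takeaways above concrete: “Never allow invalid data by design” is easy to show in code. Here is a minimal sketch in Python, with an invented Order aggregate (the book’s own examples are in C#), that enforces invariants at construction time instead of relying on an isValid flag:

from dataclasses import dataclass

@dataclass(frozen=True)
class OrderLine:
    sku: str
    quantity: int

class Order:
    """An aggregate that can only be constructed and mutated into valid states."""

    def __init__(self, order_id: str, lines: list[OrderLine]):
        if not order_id:
            raise ValueError("order_id is required")
        if not lines:
            raise ValueError("an order must contain at least one line")
        if any(line.quantity <= 0 for line in lines):
            raise ValueError("line quantities must be positive")
        self._order_id = order_id
        self._lines = list(lines)

    def add_line(self, line: OrderLine) -> None:
        # Invariants are re-checked on every state change, so callers never
        # need to ask the aggregate whether it is currently valid.
        if line.quantity <= 0:
            raise ValueError("line quantities must be positive")
        self._lines.append(line)

Because every constructor and method guards its own invariants, downstream code can treat an Order instance as valid by definition.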
Here is an excerpt which covers how to integrate events within a CQRS architecture.Expert Insight: Integrating Events with CQRS by Alessandro Colla and Alberto AcerbisAn Excerpt from “Chapter 7: Integrating Events with CQRS” in the book Domain-Driven Refactoring by Alessandro Colla and Alberto Acerbis (Packt, May 2025)In this chapter, we will explore how to effectively integrate events into your system using the Command Query Responsibility Segregation (CQRS) pattern. As software architectures shift from monolithic designs to more modular, distributed systems, adopting event-driven communication becomes essential. This approach offers scalability, decoupling, and resilience, but also brings complexity and challenges such as eventual consistency, fault tolerance, and infrastructure management.The primary goal of this chapter is to guide you through the implementation of event-driven mechanisms within the context of a CQRS architecture. By the end of this chapter, you will have a clear understanding of how events and commands operate in tandem to manage state changes, communicate between services, and optimize both the reading and writing of data.(In this excerpt) you will learn about the following:The benefits and trade-offs of transitioning from synchronous to asynchronous communicationHow event-driven architectures improve system scalability and decouplingThe difference between commands (which trigger state changes) and events (which signal that something has happened)How to apply proper message-handling patterns for bothThe principles of CQRS and understanding why separating read and write models enhances performance and scalabilityHow to implement the separation of command and query responsibilities with a focus on read and write optimizationHow to introduce a message broker for handling asynchronous communicationHow to capture and replay the history of state changes with event sourcingRead the Complete ExcerptDomain-Driven Refactoring by Alessandro Colla and Alberto Acerbis is a practical guide to modernizing legacy systems using DDD. Through real-world C# examples, the authors show how to break down monoliths into modular architectures—whether evolving toward microservices or improving maintainability within a single deployable unit. The book covers both strategic and tactical patterns, including bounded contexts, aggregates, and event-driven integration.Use code DOMAIN20 for 20% off at packtpub.com — valid through June 16, 2025.Get the Book🛠️Tool of the Week⚒️Context Mapper 6.12.0 — Strategic DDD Refactoring, VisualizedContext Mapper is an open source modeling toolkit for strategic DDD, purpose-built to define and evolve bounded contexts, map interrelationships, and drive architectural refactorings. 
It offers a concise DSL for creating context maps and includes built-in transformations for modularizing monoliths, extracting services, and analyzing cohesion/coupling trade-offs. The latest version continues its focus on reverse-engineering context maps from Spring Boot and Docker Compose projects, along with support for automated architectural refactorings—making it ideal for teams modernizing legacy systems or planning microservice transitions.

Highlights:

Iterative Refactoring: Apply “Architectural Refactorings” to improve modularity without rewriting everything.
Reverse Engineering: Extract bounded context candidates from existing codebases using the Context Map Discovery library.
Multi-Format Output: Export maps to Graphviz, PlantUML, MDSL, or Freemarker-based text formats.
IDE Integrations: Available as plugins for Eclipse and VS Code, or use it directly in Gitpod without local setup.

Visit the Project Website

📰 Tech Briefs

Architecture Refactoring Towards Service Reusability in the Context of Microservices by Daniel et al.: This paper proposes a catalog of architectural refactorings—Join API Operations with Heterogeneous Data, Introduce Metadata, and Extract Pluggable Processors—to improve service reusability in microservice architectures by reducing code duplication, decoupling data from processing logic, and supporting heterogeneous inputs, and validates these patterns through impact analysis on three real-world case studies.

DDD & LLMs - Eric Evans - DDD Europe 2024: In this keynote, Evans reflects on the transformative potential of large language models in software development, urging the community to embrace experimentation, learn through hands-on projects, and explore how DDD might evolve—or be challenged—in an era increasingly shaped by AI-assisted systems.

Domain Re-discovery Patterns for Legacy Code - Richard Groß - DDD Europe 2024: In this talk, Groß introduces domain rediscovery patterns for legacy systems—ranging from passive analysis techniques like mining repositories and activity logging to active refactoring patterns and visualization tools—all aimed at incrementally surfacing domain intent, guiding safe modernization without full rewrites, and avoiding hidden technical and organizational costs of starting from scratch.

Legacy Modernization meets GenAI by Ferri et al., Thoughtworks: This article discusses how GenAI can address core challenges in legacy system modernization—such as reverse engineering, capability mapping, and high-level system comprehension—arguing for a human-guided, evolutionary approach while showcasing Thoughtworks’ internal accelerator, CodeConcise, as one practical application of these ideas.

Refactor a monolith into microservices by the Google Cloud Architecture Center: This guide outlines how to incrementally refactor a monolithic application into microservices using DDD, bounded contexts, and asynchronous communication—emphasizing the Strangler Fig pattern, data and service decoupling strategies, and operational considerations like distributed transactions, service boundaries, and database splitting.

That’s all for today. Thank you for reading this issue of Deep Engineering.
We’re just getting started, and your feedback will help shape what comes next. Take a moment to fill out this short survey—as a thank-you, we’ll add one Packt credit to your account, redeemable for any book of your choice. We’ll be back next week with more expert-led content.

Stay awesome,
Divya Anne Selvaraj
Editor in Chief, Deep Engineering

Take the Survey, Get a Packt Credit!

If your company is interested in reaching an audience of developers, software engineers, and tech decision makers, you may want to advertise with us.
Deep Engineering #3: Designing for AI and Humans with MoonBit Core Contributor Zihang YE
Divya Anne Selvaraj
05 Jun 2025
From CLI design to AI ergonomics—MoonBit offers patterns worth borrowing

Hi ,

Welcome to the third issue of Deep Engineering.

AI agents are no longer just code generators; they’re becoming active users of codebases, APIs, and developer tools. From semantic documentation protocols to agent-readable APIs, the systems we design must increasingly expose structure, context, and intent. Software now needs to serve two audiences—humans and machines.

This issue explores what that means in practice, through the lens of MoonBit—a new language built from the ground up for WebAssembly (Wasm)-native performance and AI-first tooling.

Our feature article examines how MoonBit responds to this dual-audience challenge: not with flashy syntax, but with a tightly integrated toolchain and a runtime model designed to be both fast and machine-consumable. And in a companion tutorial, MoonBit core contributor Zihang YE walks us through building a diff algorithm as a Wasm-ready CLI—an instructive example of the language’s design philosophy in action.

Sign Up | Advertise

Sponsored: Web Devs: Turn Your Knowledge Into Income
Build the knowledge base that will enable you to collaborate with AI for years to come
💰 Competitive Pay Structure
⏰ Ultimate Flexibility
🚀 Technical Requirements (No AI Experience Needed)
Weekly payouts + remote work: The developer opportunity you’ve been waiting for! The flexible tech side hustle paying up to $50/hour
Apply Now

Beyond Syntax: MoonBit and the Future of Language, Tooling, and AI Workflows

The mainstream dominance of Python, JavaScript, and Rust might suggest the age of new programming languages is over. A new breed of languages including MoonBit proves otherwise—not by reinventing syntax, but by responding to two tectonic shifts in software development: AI-assisted workflows, and the rise of Wasm-native deployment in cloud and edge environments.

In edge computing and micro-runtime environments, developers need tools that start instantly, consume minimal memory, and run predictably across platforms. MoonBit’s design responds directly to this: it produces compact Wasm binaries optimized for streaming data, making it suitable for CLI tools, embedded components, and other low-overhead tasks.

At the same time, AI workloads are exposing the limitations of dynamic languages like Python in large-scale systems. MoonBit’s founders note that Python’s “easy to learn” nature can become a double-edged sword for complex tasks. Even with optional annotations, its dynamic type system can hinder static analysis, complicating maintainability and scalability as codebases grow. In response, MoonBit introduces a statically typed, AI-aware language model with built-in tooling—formatter, package manager, VSCode integration—designed to support both human and machine agents.

Rather than replacing Python, MoonBit takes a pragmatic approach. It explicitly embraces an “ecosystem reuse” model: it uses AI-powered encapsulation to lower the barrier for cross-language calls, avoiding reinvention of existing Python tools, and it aims to “democratize” static typing by coupling a strict type system with AI code generation.

A Language is Not Enough

MoonBit is a toolchain-native language, designed from the start to work smoothly with modern build, editing, and AI workflows.
Unlike older languages that were retrofitted with new tools, MoonBit bundles its compiler, package manager, IDE, language server, and even an AI assistant as a cohesive whole. As the MoonBit team puts it, they “integrate a comprehensive toolchain from the start” to provide a streamlined coding experience.This stands in contrast to older systems languages like C/C++ and even to modern ones like Rust, which, despite its safety guarantees, still requires extra configuration to target Wasm. MoonBit by design treats Wasm as its primary compilation target – it is “Wasm-first”, built “as easy as Golang” but generating very compact Wasm output.Similarly, MoonBit was conceived to work hand-in-hand with AI tools. It offers built-in hooks for AI code assistance (more on this below) and even considers AI protocols like Anthropic’s Model Context Protocol (MCP) as first-class integration points. In MoonBit, the language + toolchain combo is now a single product, not an afterthought.MoonBit is not alone. Other new languages like Grain, Roc, and Hylo (formerly Val) each explore different priorities—from functional programming for the web to safe systems-level design and simplified developer experience.Grain prioritizes JS interop and functional ergonomics; Roc favors simplicity and speed, though it’s still pre-release; and Hylo experiments with value semantics and low-level control. MoonBit and these other languages make it clear that language design is soon going to become inseparable from its runtime, developer experience, and AI integration.Architecture and Developer ExperienceMoonBit’s architecture reflects a deliberate focus on toolchain integration and cross-platform performance. It is a statically typed, multi-paradigm language influenced by Go and Rust, supporting generics, structural interfaces, and static memory management. The compiler is designed for whole-program optimization, producing Wasm or native binaries with minimal overhead. According to benchmarks cited by the team, MoonBit compiled 626 packages in 1.06 seconds—approximately 9x faster than Rust in the same test set. Its default Wasm output is compact: a basic HTTP service compiles to ~27 KB, which compares favorably to similar Rust (~100 KB) and JavaScript (~8.7 MB) implementations. This is partly due to MoonBit’s support for Wasm GC, allowing it to omit runtime components that Rust must include.The syntax and structure are also optimized for machine parsing. All top-level definitions require explicit types, and interface methods are defined at the top level rather than nested. This flatter structure reportedly improves LLM performance by reducing key–value cache misses during code generation. The language includes built-in support for JSON, streaming data processing via iterators, and compile-time error tracking through control-flow analysis.Tooling is tightly coupled with the language. The moon CLI handles compilation, formatting, testing, and dependency management via the Mooncakes registry. The build system, written in Rust, supports parallel, incremental builds. A dedicated LSP server (distributed via npm) integrates MoonBit with IDEs, enabling features like real-time code analysis and completions. Debugging is supported via the CLI with commands like moon run --target js --debug, which link into source-level tools.A browser-based IDE preview is also available. 
It avoids containers in favor of a parallelized backend and includes an embedded AI assistant capable of generating documentation, suggesting tests, and offering inline explanations. According to the team, this setup is designed to support both developer productivity and AI agent interaction.MoonBit’s performance profile extends beyond Wasm. A recent release introduced an LLVM backend for native compilation. In one example published by the team, MoonBit outperformed Java by up to 15x in a numeric loop benchmark. The language also supports JavaScript as a compilation target, expanding deployment options across web and server contexts.AI Systems as Language ConsumersLLMs are no longer just helping developers write code—they’re starting to read, run, and interact with it. This shift requires rethinking what it means for a language to be “usable.”MoonBit anticipates this by treating AI systems as first-class consumers of code and tooling. Its team has adopted the MCP, an emerging open standard developed by Anthropic to enable LLMs to interface with external tools and data sources. MCP defines a JSON-RPC server architecture, allowing programs to expose structured endpoints that LLMs can query or invoke. MoonBit’s ecosystem includes a work-in-progress MCP server SDK written in MoonBit and compiled to Wasm, enabling MoonBit components to act as MCP-capable endpoints callable by models such as Claude.This integration reflects a broader shift in tooling. Modern documentation tools like Mintlify now expose semantically indexed content explicitly for AI retrieval. UIs and APIs are being annotated with machine-readable metadata. Even version control is evolving: newer workflows track units of change like (prompt + schema + tests), not just line diffs, enabling intent-aware versioning usable by humans and machines alike.MoonBit’s example agent on GitHub demonstrates this in practice, combining Wasm components (e.g. via Fermyon Spin), LLMs (such as DeepSeek), and MoonBit logic to automate development tasks. Under this model, protocols like MCP enable developers to publish AI-accessible functions directly from their codebases. MoonBit’s support for this workflow—via Wasm and first-party libraries—illustrates a growing view in language design: that AI systems are not just tools for writing code, but active consumers of it.Wasm’s Impact on Performance and PortabilityThree years ago, William Overton, a Senior Serverless Solutions Architect, said, Wasm "starts incredibly quickly and is incredibly light to run," making it well-suited to execute code across CDNs, edge nodes, and lightweight VMs with low startup latency and near-native speed. Today, the growing adoption of Wasm is reshaping expectations for both performance and cross-platform deployment.For MoonBit, Wasm is the default compilation target—not an optional backend. Its tooling is built around producing compact, portable Wasm modules. A simple web server in MoonBit compiles to a 27 KB Wasm binary—significantly smaller than equivalent builds in Rust or JavaScript. This reduction in size translates directly to faster load times and reduced memory usage, making MoonBit viable for constrained environments like embedded systems, CLI tools, and edge deployments.Standardized but still-emerging features like Wasm GC—and experimental ones like the Component Model—further reinforce this model. MoonBit has adopted both: its use of interface types and Wasm GC helps minimize runtime footprint. 
In a published comparison, MoonBit’s Wasm output was roughly an order of magnitude smaller than that of Rust, largely due to differences in memory management.Taken together, these developments suggest that Wasm is becoming a practical universal format for lightweight applications. For teams building portable utilities or latency-sensitive services, languages with Wasm-native support—such as MoonBit—offer tangible advantages over traditional container- or VM-based approaches.💡What This Means for YouMoonBit offers concrete lessons even if you never write MoonBit code. Key takeaways include:Ecosystem Continuity: Instead of building isolated ecosystems, consider bridging existing ones. MoonBit demonstrates that Python libraries can be reused as external modules—wrapped, if needed, by AI-generated shims. This reduces rewrites and enables gradual migration to safer or more performant languages.Integrated Tooling: Treat your language platform as a cohesive whole. MoonBit’s CLI (moon) unifies compilation, testing, debugging, and package management, minimizing context switches. Its build system exposes project metadata to IDEs via LSP integration. In your own tooling, aim for end-to-end flows powered by a single interface that integrates with the editor.Wasm and Runtime Strategy: For cross-platform deployment, prioritize Wasm as a primary target. MoonBit emits Wasm, JavaScript, or native binaries from a single compiler, and leverages Wasm GC for smaller outputs. Adopt language/toolchain combinations that support compact binaries and multiple backends without sacrificing performance.Data-Oriented Design: MoonBit’s JSON type, Iter abstraction, and pattern matching illustrate a clean model for streaming data. Architect utilities and pipelines to minimize allocations and intermediate state—use iterators, stream transforms, and statically analyzable data access patterns where possible.AI-Friendliness: MoonBit enforces top-level type annotations and flattens scope structures to support linear token generation. If you expect AI tooling to generate, refactor, or analyze your code, avoid deep nesting and implicit state—prefer clarity and structure that LLMs can parse efficiently.Static Checking + AI: MoonBit combines a strict type and error system with AI assistance to ease onboarding and boilerplate generation. This model lets developers write in a safe language without sacrificing velocity. For your own teams, consider pairing statically typed languages (or gradually typed ones like Python with type hints) with copilots that bridge ergonomics and enforcement.CLI Extensibility: The moon CLI supports modular growth—commands like moon new, moon run, and moon add are extensible by design. It can even serve as an LSP or MCP server. Treat your own CLIs as platform interfaces: design for plugin support, programmatic inspection, and long-term integration with AI and editor tooling.To see these ideas in practice—especially MoonBit’s type system, performance model, and Wasm-native tooling—Zihang YE, one of MoonBit’s core contributors, offers a hands-on walkthrough. His article walks us through the implementation of a diff algorithm using MoonBit, building a CLI tool that’s usable both by developers and AI systems via the MCP.Expert Insight: Implementing a Diff Algorithm in MoonBit by Zihang YEA hands-on introduction to MoonBit through the implementation of a version-control-grade diff tool.MoonBit is an emerging programming language that has a robust toolchain and relatively low learning curve. 
As a modern language, MoonBit includes a formatter, a plugin supporting VSCode, an LSP server, a central package registry, and more. It offers the friendly features of functional programming languages with manageable complexity.

To demonstrate MoonBit’s capabilities, we’ll implement a core software development tool—a diff algorithm. Diff algorithms are essential in software development, helping identify changes between different versions of text or code. They power critical tools in version control systems, collaborative editing platforms, and code review workflows, allowing developers to track modifications efficiently. If you have ever used git diff then you are already familiar with such algorithms.

The most widely used approach is Eugene W. Myers’ diff algorithm, proposed in the paper “An O(ND) Difference Algorithm and Its Variations”. It is favored for its optimal time complexity. Its space-efficient implementation and ability to find the shortest edit script make it superior to alternatives like patience diff or histogram diff and make it the standard in version control systems like Git and many text comparison tools such as Meld.

In this tutorial, we’ll implement a version of the Myers diff algorithm in MoonBit. This hands-on project is ideal for beginners exploring MoonBit, offering insight into version control fundamentals while building a tool usable by both humans and AI through a standard API.

We will start by developing the algorithm itself, then build a command line application that integrates the Component Model and the MCP, leveraging MoonBit’s WebAssembly (Wasm) backend. Wasm is a fast-growing technology that provides privacy, portability, and near-native performance by running assembly-like code in virtual machines across platforms—qualities that MoonBit supports natively, making the language well-suited for building efficient cross-platform tools.

By the end of this tutorial, you’ll have a functional diff tool that demonstrates these capabilities in action.

Project Setup

Let’s first create a new MoonBit project by running:

moon new --lib diff

The following will be the project structure of the code. The moon.mod.json contains the configuration for the project, while the moon.pkg.json contains the configuration for each package. top.mbt is the file we'll be editing throughout this post.

├── LICENSE
├── moon.mod.json
├── README.md
└── src
    ├── lib
    │   ├── hello.mbt
    │   ├── hello_test.mbt
    │   └── moon.pkg.json
    ├── moon.pkg.json
    └── top.mbt

We will be comparing two pieces of text, each divided into lines. Each line will include its content and a line number. The line number helps track the exact position of changes, providing important context about the location of changes when displaying the differences between the original and modified files.

Read the Complete Tutorial

🛠️Tool of the Week⚒️

MCP Python SDK 1.9.2 — Structured Interfaces for AI-Native Applications

The MCP is a standard for exposing structured data, tools, and prompts to language models. The MCP Python SDK brings this to production-ready Python environments, with a lightweight, FastAPI-compatible server model and first-class support for LLM interaction patterns.
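To make that model concrete, here is a minimal sketch of a FastMCP-style server exposing a single tool. It follows the patterns in the SDK’s public documentation, but treat the exact import path, decorator usage, and the count_changed_lines tool itself as illustrative assumptions to verify against the version you install:

# Minimal MCP server sketch using the MCP Python SDK's FastMCP interface.
# Assumes `pip install mcp`; check import paths against the SDK version you use.
from mcp.server.fastmcp import FastMCP

# Create a named server that an MCP-capable client (e.g. Claude Desktop) can connect to.
mcp = FastMCP("diff-tools")

@mcp.tool()
def count_changed_lines(original: str, revised: str) -> int:
    """Return a rough count of lines that differ between two texts."""
    old, new = original.splitlines(), revised.splitlines()
    # Naive line-set comparison; a real implementation would use a diff algorithm.
    return len(set(old).symmetric_difference(new))

if __name__ == "__main__":
    # Runs the server over the default stdio transport so a client can discover
    # and call the tool.
    mcp.run()

Running the script starts a local server an MCP-capable client can attach to; the release notes below cover what v1.9.2 adds on top of this basic shape.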
The latest release, v1.9.2 (May 2025), introduces:

Streamable HTTP Support: Improved transport layer for scalable, resumable agent communication.
Lifespan Contexts: Type-safe initialization for managing resources like databases or auth providers.
Authentication: Built-in OAuth2 flows for securing agent-accessible endpoints.
Claude Desktop Integration: Direct install into Anthropic’s desktop agent environment via mcp install.
Async Tooling: Tools, resources, and prompts can now be async functions with full lifecycle hooks.

Ideal for teams designing LLM-facing APIs, building AI-autonomous agents, or integrating prompt-based tools directly into Python services. It’s the protocol MoonBit already supports—and the interface LLMs increasingly expect.

Read the Project Description

📰 Tech Briefs

Architectural Patterns for AI Software Engineering Agents by Nati Shalom, Fellow at Dell NativeEdge: Examines how modern coding agents are being structured like real-world dev teams—using patterns such as code search, AST analysis, and version-controlled prompt templates to enable disciplined, multi-agent collaboration.

A survey of agent interoperability protocols: Model Context Protocol (MCP), Agent Communication Protocol (ACP), Agent-to-Agent Protocol (A2A), and Agent Network Protocol (ANP) by Ehtesham et al.: Offers an in-depth analysis of four emerging protocols designed to enhance interoperability among AI agents, examining their architectures, communication patterns, and security models.

When the Agents Go Marching In: Five Design Paradigms Reshaping Our Digital Future by Adrian Levy, Senior UX Expert at CyberArk: Discusses how agentic UX is reshaping everything from collaboration to trust. If MoonBit is what languages might look like in this new world, Levy’s article shows how interfaces and systems are evolving to meet the same challenge, articulating the Agent Experience (AX) paradigm.

Beyond augmentation: Agentic AI for software development by Khare et al., Infosys Knowledge Institute: A practice-oriented report on how autonomous agents are moving from coding assistants to pipeline-integrated actors—handling complex dev tasks end-to-end and delivering measurable productivity gains in database and API generation.

Emerging Developer Patterns for the AI Era by Yoko Li, Engineer, a16z: Explores how core concepts like version control, documentation, dashboards, and scaffolding are being reimagined to support AI agents as first-class participants in the software loop—not just code generators, but consumers, collaborators, and operators.

That’s all for today. Thank you for reading this issue of Deep Engineering.
We’re just getting started, and your feedback will help shape what comes next. Take a moment to fill out this short survey—as a thank-you, we’ll add one Packt credit to your account, redeemable for any book of your choice. We’ll be back next week with more expert-led content.

Stay awesome,
Divya Anne Selvaraj
Editor-in-Chief, Deep Engineering

Take the Survey, Get a Packt Credit!

If your company is interested in reaching an audience of developers, software engineers, and tech decision makers, you may want to advertise with us.
Deep Engineering #2: Dave Westerveld on Scalable API Testing and Orchestration
Divya Anne Selvaraj
29 May 2025
Shift-left strategies, parallel test design, and the realities of testing AI-driven APIs at scale#2Dave Westerveld on Scalable API Testing and OrchestrationShift-left strategies, parallel test design, and the realities of testing GraphQL, gRPC, and AI-driven APIs at scaleHi ,Welcome to the second issue of Deep Engineering.Postman’s 2024 State of the API report reveals that 74% of teams now follow an API-first approach, signaling a major shift in software development from the code-first approach. As APIs grow more complex—and as AI agents, gRPC, and GraphQL reshape how services communicate—the question is no longer whether to test early, but how to test well at scale.In this issue, we speak with Dave Westerveld—developer, author of API Testing and Development with Postman, and testing specialist with years of experience across both mature systems and early-stage teams. Drawing from his work on automation strategy, API integration, and scaling quality practices, Dave offers a grounded take on CI pipelines, parallel execution, and the tradeoffs of modern API protocols.You can watch the full interview and read the full transcript here— or keep reading for our distilled take on what makes modern test design both reliable and fastSign Up |AdvertiseSponsored:Webinar: Make Your App a Moving Targetand Leave Attackers GuessingLearn how your app could evolve automatically, leaving reverse engineers behind with every release.Hosted by Guardsquare featuring:Anton Baranenko-Product ManagerDate/time: Tuesday, June 10th at 4 PM CET (10 AM EDT)Register NowFrom REST to Agents: Why Systems Thinking Still Anchors Good API Testing with Dave WesterveldSome testing principles are foundational enough to survive revolutions in tooling. That’s the starting point for Dave Westerveld’s approach to API testing in the post-AI tech landscape.“There are testing principles that were valid in the '80s, before the consumer internet was even a thing. They were valid in the world of desktop and internet computing, and they’re still valid today in the world of AI and APIs.”And while the landscape has shifted dramatically in the last two years—Postman now ships an AI assistant, supports gRPC and GraphQL, and offers orchestration features for agentic architectures—Westerveld believes the best way to scale quality is to combine these new capabilities with timeless habits of mind: systems thinking, structured test design, and a bias for clarity over cleverness.Systems Thinking as the FoundationWesterveld argues that API testers need to operate with a systems-level understanding of the software they’re validating. He calls this:“(The) ability to zoom out and see the entire forest first, and then come back in and see the tree, and realize how it fits into the larger picture and how to approach thinking about and testing it.”In practice, that means asking not just whether an endpoint returns the expected result, but how it fits into the larger architecture and user experience. It means understanding when to run exploratory tests, when to assert workflows, and when to defer to contract validation.These instincts, he says, haven’t changed even as APIs have diversified:“Things like how to approach and structure your testing are … timeless when it comes to REST APIs. 
They haven’t fundamentally changed in the last 20 years—neither should the way you think about testing them.”What matters more than syntax is structure—how testers reason about coverage, maintainability, and feedback cycles.AI as Accelerant, Not OraclePostman’s Postbot is the most visible new capability in the platform’s AI strategy. Built atop LLM infrastructure, it can suggest test cases, generate assertions, and translate prompts into working scripts. Internally, it draws on your Postman data—collections, environments, history—to provide context-aware assistance.Westerveld sees the benefit, but draws a hard line between skilled and unskilled use:“For a skilled tester, someone with a lot of experience, these AI tools can help you move more quickly through tasks you already know how to do. Often, when you reach that level, you’ve done a lot of testing—you can look at something and say, ‘OK, this is what I need to do here.’ But it can get repetitive to implement some scripts or write things out again and again.”He frames AI as an accelerant: helpful when you understand the underlying logic, risky when you don’t.“For more junior people, there’s a temptation to use AI to auto-generate scripts without fully understanding what those scripts are doing. I think that’s the wrong approach early in your career, because once the AI gets stuck, you won’t know how to move forward.”This caution aligns with Postman’s architectural choices. Postbot uses a deterministic intent classifier to map prompts to supported capabilities, orchestrates tool usage through a controlled execution layer, and codifies outputs as structured in-app actions—such as generating test scripts, visualizing responses, or updating request metadata. Its latest iteration adds a memory-aware agent model that supports multi-turn conversations and multi-action workflows, but with strict boundaries around tool access and state transitions.In this, Westerveld agrees: AI-generated tests are often brittle and opaque. Use them, he advises,“more as a learning tool than an autocomplete tool.”Scaling Through Independence and RestraintOne of Westerveld’s strongest positions concerns test design: automated tests should be independent of each other. This is both a correctness and scalability concern. When teams overuse shared setup code or rely on common state, it breaks test parallelism and increases the chance of cascading failures.In Postman, reusable scripts are managed via the Package Library, which allows teams to store JavaScript test logic in named packages and import them into requests, collections, monitors, and Flows. While this enables consistency and reuse, Westerveld notes that it also introduces new failure points if not applied judiciously.“If something in the shared code breaks—or if a dependency the shared code relies on fails—you can end up with all your tests failing. …So, you have to be careful that a single point of failure doesn’t take everything down.”His solution: only abstract what truly reduces duplication, and mock where necessary.“In cases like that, it’s worth asking: ‘Do we really need this to be a shared script, or can we mock this instead?’ For example, if you're repeatedly calling an authentication endpoint that you're not explicitly testing, maybe you could insert credentials directly instead. That might be a cleaner and faster solution.”He also advocates for test readability. Tests, he says, should act as documentation. 
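As a rough illustration of what that looks like outside any particular tool, here is a minimal sketch in Python, using pytest and requests, with a hypothetical /orders endpoint and token, of a test that stays independent and readable by injecting credentials directly rather than calling a shared login helper:

import os
import requests

BASE_URL = os.environ.get("API_BASE_URL", "https://api.example.test")
# Credentials are injected directly (e.g. from CI secrets) instead of being
# fetched through a shared authentication script the test does not exercise.
AUTH_HEADER = {"Authorization": f"Bearer {os.environ.get('API_TOKEN', 'test-token')}"}

def test_create_order_returns_201_and_echoes_the_sku():
    payload = {"sku": "ABC-123", "quantity": 2}

    response = requests.post(
        f"{BASE_URL}/orders", json=payload, headers=AUTH_HEADER, timeout=10
    )

    # The test reads like documentation: one request, explicit expectations,
    # and no hidden setup shared with other tests.
    assert response.status_code == 201
    assert response.json()["sku"] == "ABC-123"

Because the test owns its own inputs and makes no assumptions about what ran before it, it can be executed in parallel with the rest of the suite, which is the property Westerveld keeps coming back to.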
Pulling too much logic into shared libraries makes them harder to understand.“A well-written test tells you what the system is supposed to do. It shows real examples of expected behavior, even if it's not a production scenario. You can read it and understand the intent.But when you extract too much into shared libraries, that clarity goes away. Now, instead of reading one script, you’re bouncing between multiple files trying to figure out how things work. That hurts readability and reduces the test's value as living documentation.”Contracts, Specs, and CI IntegrationWith Postman’s new Spec Hub, teams can now author, govern, and publish API specifications across supported formats, helping standardize collaboration around internal and external APIs. As Westerveld puts it:“The whole point of having a specification is that it defines the contract we’re all agreeing to—whether that’s between frontend and backend teams, or with external consumers.”He recommends integrating schema checks as early as possible:“If you're violating that contract, the right response is to stop. … So yes, in that sense, we want development to ‘slow down’ when there’s a spec violation. But in the long run, this actually speeds things up by improving quality. You’re building on a solid foundation.”He advocates running validation as part of the developer CI pipeline—using lightweight checks at merge gates or as part of pull requests.This pattern aligns with what Postman now enables. Spec Hub introduces governance features such as built-in linting to enforce organizational standards by default. For CI integration, Postman’s contract validation tooling can be executed using the Postman CLI or Newman, both of which support running test collections—including those that validate OpenAPI contracts—within continuous integration pipelines. Together, these tools allow teams to maintain a single, trusted specification that anchors both collaboration and automated enforcement across environments.From REST to gRPC and GraphQLProtocol diversity is a reality for modern testers. Westerveld emphasizes that while core principles carry over across styles, testing strategies must adapt to the nuances of each protocol.gRPC, for example, provides low-level access through strongly typed RPC calls defined in .proto files. This increases both the power and the surface area of test logic.“One area where you really see a difference with modern APIs is in how you think about test coverage. The way you structure and approach that will be different from how you’d handle a REST API.That said, there are still similar challenges. For instance, if you’re using gRPC and you’ve got a protobuf or some kind of contract, it’s easier to test—just like with REST, if you have an OpenAPI specification.So, advocating for contracts stays the same regardless of API type. But with GraphQL or gRPC, you need more understanding of the underlying code to test them adequately. With REST, you can usually just look at what the API provides and get a good sense of how to test it.”GraphQL, he notes, introduces different complexities. Because it’s introspective and highly composable:“With GraphQL, there are a lot of possible query combinations… A REST API usually has simple, straightforward docs—‘here are the endpoints, here’s what they do’—maybe a page or two.With GraphQL, the documentation is often dynamically generated and feels more like autocomplete. You almost have to explore the graph to understand what’s available. 
It’s harder to get comprehensive documentation.”Postman supports both gRPC and GraphQL natively, enabling users to inspect schemas, craft requests, and run tests—all without writing code. But effective testing still depends on schema discipline and clarity. Westerveld points out that with GraphQL, where documentation can feel implicit or opaque, mock servers and contract-first workflows are critical. Postman helps here too, offering design features that can generate mocks and example responses directly from imported specs.Orchestration and Shift-Left StrategiesPostman’s recent support for the Model Context Protocol (MCP) and the launch of its AI Tool Builder mark a shift toward integrating agent workflows into the API lifecycle. Developers can now build and test MCP-compliant servers and requests using Postman’s familiar interface—lowering the barrier to designing autonomous agent interactions atop public or internal APIs.But as Westerveld points out, these advances don’t replace fundamentals. His focus remains on feedback speed, execution reliability, and test independence.“Shift-left and orchestration have been trending for quite a while. As an industry, we’ve been investing in these ideas for years—and we’re still seeing those trends grow. We’re pushing testing closer to where the code is written, which is great. At the same time, we’re seeing more thorough and complete API testing, which is another great development.”He notes a natural tension between shift-left principles and orchestration complexity:“Shift-left means running tests as early as possible, close to the code. The goal is quick feedback. But orchestration often involves more complexity—more setup, broader coverage—and that takes longer to run.So those two trends can pull in different directions: speed versus depth.”The path forward, he argues, lies in test design and execution architecture:“We’re pushing testing left and improving the speed of execution. That’s happening through more efficient test design, better hardware, and—importantly—parallelization.Parallelization is key. If we want fast feedback loops and shift-left execution, we need to run tests in parallel. For that to work, tests must be independent. That ties back to an earlier point I made—test independence isn’t just a nice-to-have. It’s essential for scalable orchestration.”“So I think test orchestration is evolving in a healthy direction. We’re getting both faster and broader at the same time. And that’s making CI/CD pipelines more scalable and effective overall.”💡What This Means for YouPrioritize test independence for parallelization: To scale reliably in CI/CD, design tests that don’t share state. This is a prerequisite for fast, parallel execution and essential for shift-left strategies to succeed at scale.Use AI tools to accelerate, not replace, expertise: Tools like Postbot can speed up repetitive tasks, but they’re most effective in the hands of experienced testers. Treat AI as a companion to structured thinking—not a shortcut for understanding.Be cautious with reusable scripts: Shared logic can improve maintainability, but overuse increases fragility. Mock where appropriate, and abstract only what truly reduces duplication without harming readability.Enforce contracts early through CI: Combine schema-first design with early validation in pull requests. 
Postman’s Spec Hub and CLI support this model, helping teams catch errors before they spread downstream.Adapt your strategy to protocol complexity: REST, gRPC, and GraphQL each demand different approaches to coverage and validation. Understand the shape of your APIs—and tailor your tooling, mocks, and tests accordingly.If you are looking to implement the principles discussed in our editorial—from contract-first design to CI integration, Westerveld’s book, API Testing and Development with Postman, offers a clear, hands-on walkthrough. Here is an excerpt from the book which explains how contract testing verifies that APIs meet agreed expectations and walks you through setting up and validating these tests in Postman using OpenAPI specs, mock servers, and automated tooling.Expert Insight: Using Contract Testing to Verify an APIAn Excerpt from "Chapter 13: Using Contract Testing to Verify an API" in the book API Testing and Development with Postman, Second Edition by Dave Westerveld (Packt, June 2024)In this chapter, we will learn how to set up and use contract tests in Postman, but before we do that, it’s important to make sure that you understand what they are and why you would use them. So, in this section, we will learn what contract testing is. We will also learn how to use contract testing and then discuss approaches to contract testing – that is, both consumer-driven and provider-driven contracts. To kick all this off, we are going to need to know what contract testing is. So, let’s dive into that.What is contract testing?…Contract testing is a way to make sure that two different software services can communicate with each other. Often, contracts are made between a client and a server. This is the typical place where an API sits, and in many ways, an API is a contract. It specifies the rules that the client must follow in order to use the underlying service. As I’ve mentioned already, contracts help make things run more smoothly. It’s one of the reasons we use APIs. We can expose data in a consistent way that we have contractually bound ourselves to. By doing this, we don’t need to deal with each user of our API on an individual basis and everyone gets a consistent experience.However, one of the issues with an API being a contract is that we must change things. APIs will usually change and evolve over time, but if the API is the contract, you need to make sure that you are holding up your end of the contract. Users of your API will come to rely on it working in the way that you say it will, so you need to check that it continues to do so.When I bought my home, I took the contract to a lawyer to have them check it over and make sure that everything was OK and that there would be no surprises. In a somewhat similar way, an API should have some checks to ensure that there are no surprises. We call these kinds of checks contract testing. 
An API is a contract, and contract testing is how we ensure that the contract is valid, but how exactly do you do that?

Read the Complete Excerpt

API Testing and Development with Postman, Second Edition by Dave Westerveld (Packt, June 2024) covers everything from workflow and contract testing to security and performance validation. The book combines foundational theory with real-world projects to help developers and testers automate and improve their API workflows.

Use code POSTMAN20 for 20% off at packtpub.com.

Get the Book

🛠️Tool of the Week⚒️

Bruno 2.3.0 — A Git-Native API Client for Lightweight, Auditable Workflows

Bruno is an open source, offline-first API client built for developers who want fast, version-controlled request management. The latest release, version 2.3.0 (May 2025), adds capabilities that push it further into production-ready territory:

OAuth2 CLI Flows: Streamlined authentication for secure endpoints.

Secrets Integration: Native support for AWS Secrets Manager and Azure Key Vault.

OpenAPI Sync: Improved support for importing and validating OpenAPI specs.

Dev-Centric Design: Files are stored in plain text, organized by folder, and easy to diff in Git.

It’s a strong fit for small teams, CI/CD testing, or cases where you want to keep everything under version control—without a heavyweight UI.

Westerveld on Bruno

“I recently tried Bruno. I liked it—I thought their approach to change management was really well designed. But it didn’t support some of the features I rely on. I experimented with it on a small project, but in the end, I decided I still needed Postman for my main workflows.”

“That said, I still open Bruno now and then. It’s useful, simple, and interesting—but we’re not ready to adopt it team-wide.”

Westerveld's advice: evaluate new tools with clear use cases in mind. Bruno may not replace your primary API platform overnight, but it’s a valuable addition to your workflow toolkit—especially for Git-native or OpenAPI-first teams.

Read more about Bruno

📰 Tech Briefs

2024 State of the API Report: Postman’s 2024 State of the API report reveals that 74% of teams now follow an API-first approach, linking it to faster API delivery, improved failure recovery, rising monetization, and growing reliance on tools like Postman Workspaces, Spec Hub, and Postbot to navigate collaboration, governance, and security challenges.

The MCP Catalog: Postman’s MCP Catalog offers a live, collaborative workspace to discover and test Model Context Protocol (MCP) servers from verified publishers like Stripe, Notion, and Perplexity—enabling developers to prototype LLM-integrated tools quickly using ready-to-run Postman Collections and JSON-RPC 2.0 examples.

If an AI agent can’t figure out how your API works, neither can your users: This article argues that improving developer experience (DX) for LLM-powered agents (AX) is now table stakes, advocating for consistent design, clear docs, actionable errors, and golden-path smoke tests as shared foundations for both human and machine usability.

15 Best API Testing Tools in 2025: Free and Open-source: Reviews 15 tools covering both established options like Postman, SoapUI, and JMeter, as well as emerging platforms such as Apidog, which offers an all-in-one solution for API design, testing, and mocking—positioning itself as a powerful alternative to fragmented toolchains.

The new frontier of API governance: Ensuring alignment, security, and efficiency through decentralization: Decentralized API governance replaces rigid control with shared responsibility, combining design-time standards and runtime enforcement—augmented by AI—to enable secure, scalable, and autonomous API development across distributed teams.

That’s all for today. Thank you for reading this issue of Deep Engineering. We’re just getting started, and your feedback will help shape what comes next.

Take a moment to fill out this short survey—as a thank-you, we’ll add one Packt credit to your account, redeemable for any book of your choice.

We’ll be back next week with more expert-led content.

Stay awesome,

Divya Anne Selvaraj
Editor in Chief, Deep Engineering

Take the Survey, Get a Packt Credit!

If your company is interested in reaching an audience of developers, software engineers, and tech decision makers, you may want to advertise with us.
Divya Anne Selvaraj
22 May 2025

Deep Engineering #1: Patrice Roy on Modern Memory Management in C++

What RAII, lifetime profiles, and memory-safe languages mean for your codebase

Hi, Welcome to the very first issue of Deep Engineering.

With memory safety behind more than 70% of all known security vulnerabilities (CVEs), the push toward safer programming has become a matter of urgency. Should we rewrite in Rust and Go, or modernize how we write C++?

To answer this question, we turned to Patrice Roy—author of C++ Memory Management, long-time member of the ISO C++ Standards Committee, and veteran educator with nearly three decades of experience training systems programmers.

You can watch the full interview and read the full transcript here—or keep reading for our distilled take on what modern memory management should look like in practice.

Sign Up | Advertise

RAII and Ownership: Type-Driven Memory Management in Modern C++ with Patrice Roy

One of the most important lessons in modern C++ is clear: "avoid manual memory handling if you can." As Patrice Roy explains, C++’s automatic storage and Resource Acquisition Is Initialization (RAII) mechanisms “work really well” and should be the first tools developers reach for.

Modern C++ favors type-driven ownership over raw pointers and new/delete. Smart pointers and standard containers make ownership explicit and self-documenting. For example, std::unique_ptr signals sole ownership in the code itself—eliminating ambiguity about responsibility. As Roy puts it:

“You don’t have to ask who will free the memory—it’s that guy. He’s responsible. It’s his job.”

Shared ownership is handled by std::shared_ptr, with reference-counted lifetime management. The key idea, Roy stresses, is visibility: ownership should be encoded in the code, not left to comments or convention. This design clarity eliminates entire classes of memory bugs.

The same principle applies to Standard Library containers. Types like std::vector manage memory internally—allocation, deallocation, resizing—so developers can focus on program logic, not logistics. RAII and the type system eliminate leaks, double frees, and dangling pointers, and improve exception safety by guaranteeing cleanup during stack unwinding.

As C++ veteran Roger Orr quipped, “The most beautiful line of C++ code is the closing brace,” because it signals the automatic cleanup of all resources in scope.

The takeaway is simple: default to smart pointers and containers. Use raw memory only when absolutely necessary—and almost never in high-level code.

Knowing When to Go Manual (and How to Do It Safely)

Manual memory management still has its place in C++, especially in domains where performance, latency, or control over allocation patterns is critical. But as Roy emphasizes, developers should measure before reaching for low-level strategies:

“The first thing you should do is measure. Make sure the allocator or memory pool you already have doesn’t already do the job. If you're spending time on something, it has to pay off.”

He cites high-frequency trading as an example where even small delays can be unacceptable:

“Say you’re working in the finance domain, and you have nanosecond-level constraints because you need to buy and sell very fast—then yes, sometimes you’ll want more control over what’s going on.”

In such cases, allocation must be avoided during critical execution windows. One option is to pre-allocate memory buffers on the stack.

Modern C++ offers fine-grained control through allocator models.
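To make that concrete, here is a minimal sketch (ours, not Roy's) of stack pre-allocation using the C++17 std::pmr facilities discussed next; the 4 KB buffer size and the process_ticks function are illustrative assumptions, not code from the interview:

```cpp
#include <array>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Sketch: a fixed stack buffer backs a pmr vector, so pushes that fit in the
// buffer never touch the global heap during the hot path.
void process_ticks() {
    std::array<std::byte, 4096> buffer{};                        // pre-allocated on the stack
    std::pmr::monotonic_buffer_resource arena{buffer.data(), buffer.size()};

    std::pmr::vector<int> ticks(&arena);   // allocator held as a member, not baked into the type
    ticks.reserve(512);                    // 512 ints fit comfortably in the 4 KB buffer

    for (int i = 0; i < 512; ++i) {
        ticks.push_back(i);                // no heap allocation in this loop
    }
}   // the arena is torn down in one step when it goes out of scope
```

If the buffer is ever exhausted, monotonic_buffer_resource falls back to its upstream resource (the default heap), so the sketch degrades gracefully rather than failing outright.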
Roy contrasts the traditional type-based model with the polymorphic memory resources (PMR) model introduced in C++17:

“Since C++17, we’ve had the PMR (Polymorphic Memory Resource) model... a PMR vector has a member—a pointer to its allocator—instead of having it baked into the type.”

While PMR introduces a layer of indirection via virtual function calls, Roy notes that the overhead is usually negligible:

“Allocation is a costly operation anyway. So the indirection of a virtual function call isn’t much of a cost—it’s already there in the background.”

But when even that cost is too high, the traditional model may be more appropriate:

“If you're in a domain where nanoseconds matter, even that indirection might be too much. In that case, the traditional model... may be a better choice, even if you have to write more code.”

Roy’s guidance is clear: measure first, optimize only when necessary, and understand the trade-offs each model presents.

Trends and Tools for Memory Safety in C++

Despite decades of hard-won expertise, C++ developers still face memory safety risks—from dangling references and buffer overruns to subtle use-after-free bugs. The good news: the C++ ecosystem is evolving to tackle these risks more directly, through improved diagnostics, optional safety models, and support from both compilers and hardware.

Lifetime Safety, Profiles, and Contracts

Roy identifies dangling references as one of the most persistent and subtle sources of undefined behavior in C++:

“The main problem we still have is probably what we call dangling references... Lifetime is at the core of object and resource management in C++.”

Even modern constructs like string_view can trigger lifetime errors, particularly when developers return references to local variables or temporaries. To address this, the ISO C++ committee has launched several initiatives focused on improving lifetime safety.

Roy highlights ongoing work by Herb Sutter and Gašper Ažman (P3656 R1) to introduce lifetime annotations and static analysis to make these bugs less likely:

“They’re trying to reduce undefined behavior and make lifetime bugs less likely.”

The C++ Core Guidelines already define an optional Lifetime Safety Profile, which flags unsafe lifetime usage patterns in supporting analyzers. This fits into a broader trend toward compiler-enforced profiles—opt-in language subsets proposed by Bjarne Stroustrup that would strengthen guarantees around type safety, bounds checking, and lifetimes.

Roy also mentions a proposal of his for C++29, allowing developers to mark ownership transfer explicitly in function signatures—reinforcing ownership visibility and lifetime clarity.

Alongside profiles, contracts are expected in C++26. These language features will allow developers to specify preconditions and postconditions directly in code:

“Let you mark preconditions and postconditions in your functions... written into the code—not just as prose.”

While not limited to memory management, contracts contribute to overall safety by formalizing intent and reducing the likelihood of incorrect usage.

Tools for Safer C++

Alongside language improvements, developers today have access to a mature suite of static and runtime tools for detecting memory errors.

Sanitizers: First-Line Defenses

Sanitizers have become essential for modern C++ development. Tools like AddressSanitizer (ASan), MemorySanitizer (MSan), and ThreadSanitizer (TSan) instrument the compiled code to detect memory bugs during testing.
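As a quick illustration (a minimal sketch of ours, not an example from the interview; the file name is just a placeholder), this is the kind of dangling access AddressSanitizer reports as a heap-use-after-free when the program is built with -fsanitize=address:

```cpp
#include <iostream>
#include <vector>

// Build and run with the sanitizer enabled, e.g.:
//   g++ -std=c++17 -g -fsanitize=address dangling.cpp && ./a.out
// (clang++ accepts the same flags)
int main() {
    std::vector<int> values{1, 2, 3};
    int& first = values[0];      // reference into the vector's current heap buffer
    values.resize(10'000);       // growth reallocates and frees the old buffer
    std::cout << first << '\n';  // ASan reports this read as heap-use-after-free
}
```

Without instrumentation the read may appear to work, which is exactly why bugs of this kind are easy to miss in ordinary testing.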
Roy endorses their use—even if he doesn’t run them constantly:

“They’re awesome… I don’t use them much... I think everyone should use them once in a while... They should be part of everyone’s test process.”

He encourages developers to experiment and weigh the costs.

Compiler Warnings and Static Analysis

Roy also recommends increasing compiler warning levels to catch memory misuse early:

“If you're using Visual Studio, try /W4... Maybe not /Wall with GCC, because it's too noisy, or with Clang—but raise the warning levels a bit.”

Static analysis tools like the Clang Static Analyzer and Coverity inspect code paths without execution and flag issues such as memory leaks, double frees, and buffer overruns.

Hardware Support: MTE and Beyond

On the hardware front, ARM’s Memory Tagging Extension (MTE) offers runtime memory validation through tagged pointers. Available on ARMv9 (e.g. recent Android devices), MTE can catch use-after-free and buffer overflow bugs with minimal runtime impact.

Where MTE isn't available, lightweight runtime tools help fill the gap. Google’s GWP-ASan offers probabilistic detection of heap corruption in production, while Facebook’s CheckPointer (in Folly) builds bounds-checking into smart pointer types.

Memory-Safe Languages: The New Paradigm

No discussion of memory management today is complete without addressing the elephant in the room: memory-safe languages. Two prominent examples are Rust and Go, which take almost opposite approaches to solve the same problem. “The genius of Go is that it has a garbage collector. The genius of Rust is that it doesn’t need one,” as John Arundel of Bitfield Consulting cleverly puts it.

Rust: Memory Safety by Design

Rust is designed from the ground up to eliminate classes of memory errors common in languages like C and C++. However, Rust’s safety comes with a learning curve, especially for developers accustomed to manually managing lifetimes. Despite this, according to JetBrains' 2025 Developer Ecosystem Survey, Rust has seen significant growth, with over 2.2 million developers using it in the past year and 709,000 considering it their primary language. While Rust's syntax can be initially challenging, many developers find that its multi-paradigm nature and strong safety guarantees make it a robust choice for complex systems development.

Researchers have also proposed refinement layers atop C2Rust that automatically reduce the use of unsafe code and improve idiomatic style. One such technique, described in a 2022 IEEE paper, uses TXL-based program transformation rules to refactor translated Rust code—achieving significantly higher safe-code ratios than raw C2Rust output.

As one developer quoted by JetBrains put it, Rust is no longer just a safer C++; it's “a general-purpose programming language” powering everything from WebAssembly to command-line tools and backend APIs. And for those coming from legacy C or C++ environments, Rust doesn't demand a full rewrite—interoperability, through FFI and modular integration, allows new Rust code to safely coexist with existing infrastructure.

Go: Simplicity Through Runtime Safety

Go adopts a runtime approach to memory safety, deliberately removing the need for developers to manage memory manually. The Go team’s recent cryptography audit—conducted by Trail of Bits and covering core packages like crypto/ecdh, crypto/ecdsa, and crypto/ed25519—underscored this design strength. The auditors found no exploitable memory safety issues in the default packages.
Only one low-severity issue was found in the legacy Go+BoringCrypto integration, which required manual memory management via cgo and has since been deprecated. As the Go authors noted, “we naturally rely on the Go language properties to avoid memory management issues.”

By sidestepping manual allocation and pointer arithmetic, Go reduces the attack surface for critical bugs like buffer overflows and dangling pointers. While garbage collection does introduce latency trade-offs that make Go less suitable for hard real-time systems, its safety-by-default model and well-tested cryptographic APIs make it ideal for server-side development, cloud infrastructure, and security-sensitive applications where predictable correctness matters more than raw latency.

Go’s simplicity also extends to API design. The audit highlighted the team’s emphasis on clarity, safety, and minimalism: prioritizing security over performance, avoiding complex assembly where possible, and keeping code highly readable to support effective review and auditing.

The Status of C and C++

The rise of memory-safe languages like Rust and Go has put C and C++ under scrutiny—especially in safety-critical domains. The U.S. White House Office of the National Cyber Director now recommends using memory-safe languages for new projects, citing their ability to prevent classes of vulnerabilities inherent in manual memory management.

But replacing C and C++ wholesale is rarely feasible. Most real-world systems will continue to mix languages, gradually modernizing existing C++ code with safer idioms and tooling.

Modern C++ is adapting. While the language remains low-level, initiatives like the Core Guidelines, contracts, and lifetime safety proposals are making it easier to write safer code.

💡What This Means for You

Default to RAII and Smart Pointers in C++: Use unique_ptr, shared_ptr, and standard containers to make ownership explicit. Avoid raw new/delete unless absolutely necessary—and never in high-level code.

Measure Before You Optimize: Before adopting custom allocators or manual strategies, profile your code. Built-in allocators and containers are often sufficient and safer.

Use the Right Allocator Model for the Job: Favor PMR for flexibility. Use the traditional model only if profiling shows the indirection cost matters.

Start with Safety-First Defaults: Structure your C++ projects around safe idioms. Apply the Core Guidelines, and integrate sanitizers into your CI pipeline to catch memory errors early.

Raise Compiler Warnings: Turn on high warning levels (/W4 for MSVC, -Wall -Wextra for Clang/GCC) and treat warnings as errors to surface issues before they reach production.

Experiment with Safety Profiles and Contracts: Stay ahead by adopting upcoming C++ features like lifetime annotations and design-by-contract support (C++26 and beyond).

Don’t Rely on Comments—Express Ownership in Code: As Roy stresses, ownership must be visible in the code itself. Let types, not prose, determine who frees memory.

If you found the insights in our editorial useful, Roy’s book, C++ Memory Management (Packt, March 2025), offers a much deeper exploration, including tips on avoiding common pitfalls and embracing C++17/20/23 features for better memory handling.
Here is an excerpt from the book which explains arena-based memory management in C++, using a custom allocator for a game scenario to demonstrate how preallocating and sequentially allocating memory can reduce fragmentation and improve performance.

Expert Insight: Arena-based memory management

An Excerpt from "Chapter 10: Arena-Based Memory Management and Other Optimizations" in the book C++ Memory Management by Patrice Roy (Packt, March 2025)

The idea behind arena-based memory management is to allocate a chunk of memory at a known moment in the program and manage it as a “small, personalized heap” based on a strategy that benefits from knowledge of the situation or of the problem domain.

There are many variants on this general theme, including the following:

In a game, allocate and manage the memory by scene or by level, deallocating it as a single chunk at the end of said scene or level. This can help reduce memory fragmentation in the program.

When the conditions in which allocations and deallocations are known to follow a given pattern or have bounded memory requirements, specialize allocation functions to benefit from this information.

Express a form of ownership for a group of similar objects in such a way as to destroy them all at a later point in the program instead of doing so one object at a time.

The best way to explain how arena-based allocation works is probably to write an example program that uses it and shows both what it does and what benefits this provides. We will write code in such a way as to use the same test code with either the standard library-provided allocation functions or our own specialized implementation, depending on the presence of a macro, and, of course, we will measure the allocation and deallocation code to see whether there is a benefit to our efforts.

Read the Complete Excerpt

In this hands-on guide to mastering memory in modern C++, Roy covers techniques to write leaner and safer C++ code, from smart pointers and standard containers to custom allocators and debugging tools. He also dives into examples across real-time systems, games, and more, illustrating how to balance performance with safety.

Use code MEMORY20 for 20% off at packtpub.com

Get the Book

🛠️Tool of the Week⚒️

Valgrind 3.25.0 — Classic Memory Debugging, Now with Broader Platform Support

Valgrind has long been a staple for memory debugging in C and C++ applications.
The latest release, version 3.25.0, brings significant enhancements:

Expanded Platform Support: Now includes RISCV64 Linux, ARM/Android, and preliminary support for macOS 10.13.

Performance Improvements: Introduces GDB “x” packet support for faster memory reads and zstd-compressed debug sections.

Enhanced Tooling: Continues to offer tools like Memcheck for detecting memory leaks and invalid accesses, and Massif for heap profiling.

Read the Valgrind User Manual

📰 Tech Briefs

2025 EuroLLVM - Recipe for Eliminating Entire Classes of Memory Safety Vulnerabilities in C and C++: Apple is addressing memory safety in C-based languages by combining compiler-enforced programming models, developer annotations, and runtime checks to eliminate entire classes of vulnerabilities without requiring a full rewrite in memory-safe languages.

Secure by Design Alert: Eliminating Buffer Overflow Vulnerabilities: CISA and the FBI issued a Secure by Design alert urging software manufacturers to eliminate buffer overflow vulnerabilities—designating them as "unforgivable" defects—and recommending memory-safe languages, runtime checks, and secure development practices to prevent exploitation and reduce systemic memory safety risks.

Rustls Server-Side Performance: Rustls 0.23.17, an open source TLS library written in Rust that provides a memory-safe alternative to C-based libraries like OpenSSL, has now improved server-side TLS performance by scaling efficiently across cores, reducing handshake latency, and minimizing contention in ticket resumption.

Tagged Pointers for Memory Safety: Explains how to implement memory-safe tagged pointers in C++—using runtime-generated tags to detect use-after-free errors—with minimal performance overhead and compatibility with standard allocators.

Taking a Look at Database Disk, Memory, and Concurrency Management: Offers a comprehensive, hands-on walkthrough of how modern databases manage disk I/O, memory, transactions, and concurrency—covering buffer pools, write-ahead logging, and locking mechanisms—through a simplified database implementation in Go.

That’s all for today. Thank you for reading the first issue of Deep Engineering. We’re just getting started, and your feedback will help shape what comes next.

Take a moment to fill out this short survey—as a thank-you, we’ll add one Packt credit to your account, redeemable for any book of your choice.

We’ll be back next week with more expert-led content.

Stay awesome,

Divya Anne Selvaraj
Editor in Chief, Deep Engineering

Take the Survey, Get a Packt Credit!

If your company is interested in reaching an audience of developers, software engineers, and tech decision makers, you may want to advertise with us.