Join Snyk's Staff Developer Advocate Sonya Moisset on August 28th at 11:00AM ET to learn:
✓ How Vibe Coding is reshaping development and the risks that come with it
✓ How Snyk secures your AI-powered SDLC from code to deployment
✓ Strategies to secure AI-generated code at scale
Earn 1 CPE Credit!
Hi,
Welcome to the twelfth issue of Deep Engineering.
“The challenge isn’t how to train the biggest model—it’s how to make a small one reliable.”
That’s how Tony Dunsworth sums up his work building AI infrastructure for 911 emergency systems. In public safety, failure can cost lives. You’re also working with limited compute, strict privacy mandates, and call centers staffed by only two to five people at a time. There’s no budget for a proprietary AI stack, and there’s no tolerance for downtime.
Dunsworth holds a Ph.D. in data science, with a dissertation focused on forecasting models for public safety answering points. For over 15 years, he’s worked across the full data lifecycle—from backend engineering to analytics and deployment—in some of the most sensitive domains in government. Today, he leads AI and data efforts for the City of Alexandria, where he’s building secure, on-prem AI systems that help triage calls, reduce response time, and improve operational resilience.
To understand what it takes to design AI systems that are cost-effective, maintainable, and safe to use in critical systems, we spoke with Dunsworth about his use of synthetic data, model quantization, open-weight LLMs, and risk validation under operational load.
You can watch the complete interview and read the transcript here or scroll down for our synthesis of what it takes to build mission-ready AI with small teams, tight constraints, and hardly any margin for error.
Ending on August 25 at 11:00 AM PT
Learn from top-rated books such as C++ Memory Management, C++ in Embedded Systems, Asynchronous Programming with C++, and more. Elevate your C++ skills and help support The Global FoodBanking Network with your purchase!
AI adoption in the public sector is growing, but slowly. A June 2025 EY survey of government executives found that 64% see AI’s cost-saving potential and 63% expect improved services, yet only 26% have integrated AI across their organizations. The appetite is there, but so are steep barriers: 62% cited data privacy and security concerns as a major hurdle (the top issue), along with the lack of a clear data strategy, inadequate infrastructure and skills, unclear ROI, and funding shortfalls. Public agencies face tight budgets, limited tech staff, legacy hardware, and strict privacy mandates, all under an expectation of near-100% uptime for critical services.
Public safety systems epitomize these constraints. Emergency dispatch centers can’t ship voice transcripts or medical data off to a cloud API that might violate privacy or go down mid-call. They also can’t afford fleets of cutting-edge GPUs; many 9-1-1 centers run on commodity servers or even ruggedized edge devices. AI solutions here must fit into existing, resource-constrained environments. For engineers building AI systems in production, scale isn't always the hard part—constraints are.
By treating public safety as a high-constraint exemplar, we can derive patterns applicable to other domains like healthcare (with HIPAA privacy and limited hospital IT), fintech (with heavy regulation and risk controls), logistics (where AI might run on distributed edge devices), embedded systems (tiny hardware, real-time needs), and regulated enterprises (compliance and uptime demands). In all such cases, “bigger” AI is not necessarily better – adaptability, efficiency, and trustworthiness determine adoption.
Open models come with transparent weights and permissive licenses that allow self-hosting and fine-tuning, which is crucial when data cannot leave your premises. In 2025, several open large language models (LLMs) have emerged that combine strong capabilities with manageable size:
Other open models like Falcon 180B (UAE’s giant model) or BLOOM 176B (the BigScience community model) exist, but their sheer size makes them less practical in constrained settings. The models above strike a better balance. Table 1 compares these representative options by size, hardware needs, and privacy posture:
Table 1: Open-source LLMs suited for constrained deployments, compared by size, infrastructure needs, and privacy considerations.
Choosing an open model allows agencies to avoid vendor lock-in and meet governance requirements. By fine-tuning these models in-house on domain-specific data, teams can achieve high accuracy without sending any data to third-party services. However, open models do come with trade-offs.
The biggest of these, Dunsworth says:
“…is understanding that the speed is going to be a lot slower. Even with my lab having 24 gigs of RAM, or my office lab having 32 gigs of RAM, they are still noticeably slower than if I'm using an off-site LLM to do similar tasks. So, you have to model your trade-off, because I have to also look at what kind of data I'm using—so that I'm not putting protected health information or criminal justice information out into an area where it doesn't belong and where it could be used for other purposes. So, the on-premises local models are more appealing for me because I can do more with them—I don't have the same concern about the data going out of the networks.”
That’s where techniques like quantization and altering the model architecture come in, effectively scaling the model down to meet your hardware where it is. Dunsworth describes quantization as:
“a way to optimize an LLM by making it work more efficiently with fewer resources—less memory consumption. It doesn’t leverage all of the parameters at once, so it’s able to package things a little bit better. It packages your requests and the tokens a little more efficiently so that the model can work a little faster and return your responses—or return your data—a little quicker to you, so that you can be more interactive with it.”
By reducing model weight precision (e.g. from 16-bit to 4-bit), quantization can shrink the memory footprint dramatically and speed up inference with minimal impact on accuracy. For example, a 70B model quantized to 4-bit occupies roughly the memory of a ~17B model at 16-bit precision, often retaining ~95% of its performance. Combined with efficient runtimes (such as the GGML/llama.cpp family for CPU inference, plus GPU kernels optimized for int4/int8 arithmetic), quantization lets even a single-GPU PC host models that previously needed a whole cluster.
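As a concrete, hedged illustration, the sketch below loads a pre-quantized GGUF model with llama-cpp-python, the Python bindings for the GGML/llama.cpp runtime. The model file, thread count, and prompt are assumptions for illustration, not Dunsworth’s actual configuration.

```python
# A minimal sketch of running a 4-bit quantized model locally with
# llama-cpp-python. The model path and prompt are placeholders.
from llama_cpp import Llama

# Rough memory math: a 70B-parameter model at 16-bit needs ~140 GB of
# weights; at 4-bit that drops to ~35 GB, comparable to a ~17B fp16 model.
llm = Llama(
    model_path="models/llama-3-8b-instruct.Q4_K_M.gguf",  # pre-quantized GGUF file
    n_ctx=4096,        # context window
    n_threads=8,       # CPU threads; tune to the hardware you actually have
    n_gpu_layers=-1,   # offload all layers to the GPU if one is available
)

out = llm(
    "Summarize the call-volume trend in this synthetic dataset: ...",
    max_tokens=256,
    temperature=0.2,
)
print(out["choices"][0]["text"])
```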
Dunsworth’s field lab architecture offers a practical view into how these techniques are actually used. “I’ve been doing more of the work with lightweight or smaller LLM models because they’re easier to get ramped up with,” he says—emphasizing that local deployments reduce risk of data exposure while enabling fast iteration. But even with decent hardware (24–32 GB RAM), resource contention remains a bottleneck:
“The biggest challenge is resource base… I’m pushing the model hard for something, and at the same time I’m pushing the resources… very hard—it gets frustrating.”
That frustration led him to explore quantization hands-on, particularly for inference responsiveness. “I’ve got to make my work more responsive to my users—or it’s not worth it.” Quantization, local hosting, and iterative fine-tuning become less about efficiency for its own sake, and more about achieving practical performance under constraints—especially when “inexpensive” also has to mean maintainable.
In practice, deploying a lean model in a mission-critical setting also demands robust inference software. Projects like vLLM have emerged to maximize throughput on a given GPU by intelligently batching and streaming requests. vLLM’s scheduler can yield up to 24× higher throughput than naive serving implementations by interleaving token generation across multiple requests in parallel.
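For a sense of what that looks like, here is a minimal vLLM sketch. The model name and prompts are placeholders; in an air-gapped environment you would point `model` at a locally downloaded checkpoint instead of a Hugging Face identifier.

```python
# A minimal sketch of batched generation with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")  # hypothetical model choice
params = SamplingParams(temperature=0.2, max_tokens=128)

prompts = [
    "Summarize this synthetic incident report: ...",
    "Classify the call type for this synthetic transcript: ...",
    "Draft a QA note for this synthetic call: ...",
]

# vLLM batches these internally and schedules token generation across
# requests, which is where the throughput gains come from.
outputs = llm.generate(prompts, params)
for out in outputs:
    print(out.outputs[0].text.strip())
```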
Data is the fuel for AI models, but in public safety and healthcare, real data is often sensitive or scarce. This is where synthetic data pipelines have become game-changers, allowing teams to generate realistic, statistically faithful data that mimics real-world patterns without exposing real personal information. By using generative models or simulations to create synthetic call logs, incident reports, sensor readings, etc., engineers can vastly expand their training and testing datasets while staying privacy-compliant.
Dunsworth describes the approach concretely. Rather than rely on real 911 call logs, he reconstructs patterns from operational data to generate synthetic equivalents. “I take it apart and find the things I need to see in it… so when I make that dataset, it reflects those ratios properly,” he explains. This includes recreating distributions across service types (police, fire, medical) and reproducing key statistical features like call arrival intervals, elapsed event times, and geospatial distribution.
“For me, it’s a lot of statistical recreation… I can feed that into an AI model and say, ‘OK, I need you to examine this.’”
Dunsworth’s pipeline is entirely Python-based and open source. He uses local LLMs to iteratively refine the generated datasets: “I build a lot of it, and then I pass it off to my local models to refine what I’m working on.” That includes teaching the model to correct for misleading assumptions—such as when synthetic time intervals defaulted to normal distributions, even though real data followed Poisson or gamma curves. He writes scripts to analyze and feed the correct distributions back into generation:
“Then it tells me, ‘Here’s the distribution, here are its details.’ And I feed that back into the model and say, ‘OK, make sure you’re using this distribution with these parameters.’”
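A minimal Python sketch of this kind of statistical recreation might look like the following. The column names, the gamma distribution, and the service-type ratios are illustrative assumptions, not Dunsworth’s actual pipeline.

```python
# Fit the real distribution of call inter-arrival times, then sample
# synthetic intervals from it instead of assuming normality.
import numpy as np
import pandas as pd
from scipy import stats

# Load the real call log (column names are illustrative assumptions).
real = pd.read_csv("real_call_log.csv", parse_dates=["received_at"])

# Inter-arrival times in seconds between consecutive calls.
intervals = real["received_at"].sort_values().diff().dt.total_seconds().dropna()

# Fit a gamma distribution; real inter-arrival times rarely look normal.
shape, loc, scale = stats.gamma.fit(intervals, floc=0)

rng = np.random.default_rng(42)
n = 10_000

# Sample synthetic inter-arrival times from the fitted distribution.
synthetic_intervals = stats.gamma.rvs(shape, loc=loc, scale=scale, size=n,
                                      random_state=rng)

# Preserve a realistic mix of service types; the ratios here are placeholders.
synthetic_types = rng.choice(["police", "fire", "ems"], size=n,
                             p=[0.60, 0.15, 0.25])

synthetic = pd.DataFrame({
    "seconds_since_prev_call": synthetic_intervals,
    "service_type": synthetic_types,
})
synthetic.to_csv("synthetic_call_log.csv", index=False)
```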
The shift to synthetic pipelines can solve multiple problems at once: data scarcity, privacy compliance, and edge-case testing. For training, synthetic records make it easy to balance class frequency—whether you’re modeling rare floods or unusual fraud patterns. For validation, they offer controlled stress tests that historical logs simply can’t provide.
“I use it in testing my analytics models… then I can have my model do the same thing. I make sure that they match.”
Unlike real-world events, synthetic scenarios can be manufactured to simulate extreme or simultaneous failures—testing the AI under precise conditions.
Early adoption wasn’t smooth. “The biggest hurdle was pushback from peers at first,” Dunsworth notes. But that changed as the datasets improved in realism and the utility of synthetic data for demos, teaching, and sandbox testing became obvious.
“Now people are more interested… I keep it under an open-source license. Just give me the improvements back—that’s the last rule.”
A crucial distinction is that synthetic ≠ anonymized. Rather than redact real identities, Dunsworth starts from a clean slate, using only statistical patterns from real data as seed material. He avoids copying event narratives and even manually inspects Faker-generated names to ensure no accidental leakage:
“I don’t reproduce narratives… I go through my own list of people I know to make sure that name doesn’t show up.”
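In code, that manual screening can be approximated with a simple rejection loop around Faker. The blocklist below is a hypothetical stand-in for the list of real names Dunsworth checks against.

```python
# Generate placeholder names with Faker while screening them against a
# manually maintained blocklist of real names.
from faker import Faker

fake = Faker()
Faker.seed(7)

# Hypothetical blocklist of real names that must never appear.
known_real_names = {"jane doe", "john q. public"}

def safe_fake_name() -> str:
    """Return a Faker-generated name that is not on the blocklist."""
    while True:
        name = fake.name()
        if name.lower() not in known_real_names:
            return name

callers = [safe_fake_name() for _ in range(100)]
```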
He also aligns his work with formal ethical frameworks.
“I was very fortunate throughout my education—through my software engineering courses, my analytics and data science courses at university—that ethics was stressed as one of the most important things we needed to focus on alongside practice. So, I have very solid ethical programming training.”
Dunsworth also reviews frameworks like the NIST AI RMF to maintain development guardrails.
These practices map directly onto any domain where real data is hard to access—medical records, financial logs, customer transcripts, or operational telemetry. The principles are universal:
For teams building AI tools without access to real production data, this is a practical playbook. You don’t need a GPU farm or proprietary toolchain. You need controlled pipelines, structured validation, and a robust sense of responsibility. As Dunsworth says:
“I feel confident that the people I work with… are all operating from the same place: protecting as much information as we can… making sure we're not exposing anything that we can’t.”
Building AI systems for constrained environments isn’t only about latency, memory, or cost. It’s also about failure and how to survive it.
Dunsworth’s work in emergency response illustrates the stakes clearly, but his framework for risk mitigation is widely transferable: define the use case tightly, control where the data flows, and validate under load—not just in ideal cases.
“One of the biggest risk mitigations is starting out from the beginning—knowing what you want to use AI for and how you define how it’s working well.”
Instead of treating vendor-provided models as turnkey solutions, Dunsworth interrogates the entire data path—from ingestion through inference to retention. That includes third-party dependencies:
“What data am I feeding, and how do I work with that vendor to make sure the data is being used the way I intend?” In sensitive environments, he keeps training in-house: “That way… it doesn't leave my organization.”
Success is measured operationally:
“If you're using it (AI) just to say, ‘Well, we're using AI,’ I'm going to be the first one to raise my hand and say, ‘Stop.’” Instead, AI is validated through concrete outcomes: “It’s enabled our QA manager to process more calls… improving our ability to service our community.”
For AI systems that might break under pressure, Dunsworth prescribes a straightforward and brutal regimen:
“Get synthetic data together to test it (the model)—and then just, in the middle of your testing lab, hit it all at once. Hit it with everything you've got, all at the same time.”
Only if the system remains responsive under full overload does it move forward. “If it continues to perform well, then you have some confidence… it’s still going to be reliable enough for you to continue to operate.”
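A rough sketch of that kind of overload test, assuming a hypothetical local OpenAI-compatible inference endpoint, might look like this. The URL, payload shape, and thresholds are assumptions, not a specific product’s API.

```python
# Fire a burst of concurrent requests at a local inference endpoint and
# check that every request either succeeds or fails fast.
import concurrent.futures as cf
import time
import requests

ENDPOINT = "http://localhost:8000/v1/completions"   # hypothetical local server
PAYLOAD = {"prompt": "Classify this synthetic call transcript: ...", "max_tokens": 64}

def one_request(i):
    start = time.perf_counter()
    try:
        resp = requests.post(ENDPOINT, json=PAYLOAD, timeout=30)
        return i, time.perf_counter() - start, resp.status_code
    except requests.RequestException:
        return i, time.perf_counter() - start, -1   # -1 marks a failed request

# "Hit it with everything you've got, all at the same time."
with cf.ThreadPoolExecutor(max_workers=200) as pool:
    results = list(pool.map(one_request, range(200)))

failures = [r for r in results if r[2] != 200]
p95 = sorted(r[1] for r in results)[int(0.95 * len(results)) - 1]
print(f"{len(failures)} failures out of {len(results)}, p95 latency {p95:.2f}s")
```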
Failure is expected, but it must be observable and recoverable.
“Even if it breaks… we know it can still recover and come back to service quickly.”
One real-world implementation of this mindset is LogiDebrief, a QA automation system deployed in the Metro Nashville Department of Emergency Communications. Developed to audit 9-1-1 calls in real time, LogiDebrief formalizes emergency protocol as logical rules and then uses an LLM to interpret unstructured audio transcripts, match them against those rules, and flag any deviations. As Chen et al. explain: “The framework formalizes call-taking requirements as logical specifications, enabling systematic assessment of 9-1-1 calls against procedural guidelines”.
In practice, it executes a three-stage pipeline:
This enables automated QA for both AI and human decisions—a form of embedded auditing that surfaces failure as it happens. In deployment, LogiDebrief reviewed 1,701 calls and saved over 311 hours of manual evaluator time. More importantly, when something procedural is missed—like a mandatory question for a specific incident type—it gets flagged and can be corrected in downstream training, improving both model performance and human compliance.
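LogiDebrief’s actual implementation is considerably more sophisticated, but the general pattern of encoding procedural rules and flagging transcripts that miss them can be sketched in a few lines. The keyword check below is a naive stand-in for the LLM interpretation step, and the rules are invented for illustration.

```python
# Encode mandatory call-taking steps as rules and flag transcripts that
# appear to miss them. The keyphrase match stands in for an LLM judgment.
from dataclasses import dataclass

@dataclass
class Rule:
    incident_type: str
    requirement: str      # plain-language statement of the mandatory step
    keyphrase: str        # naive textual check standing in for the LLM call

RULES = [
    Rule("structure fire", "Ask whether anyone is still inside.", "anyone inside"),
    Rule("structure fire", "Confirm the exact address.", "address"),
]

def audit(transcript: str, incident_type: str) -> list:
    """Return the requirements the transcript appears to miss."""
    lowered = transcript.lower()
    return [
        r.requirement
        for r in RULES
        if r.incident_type == incident_type and r.keyphrase not in lowered
    ]

print(audit("Caller reports smoke; call-taker confirms the address.", "structure fire"))
# -> ['Ask whether anyone is still inside.']
```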
When one early AI analytics platform failed under edge-case data—“it just said, ‘I got nothing’”—Dunsworth scrapped the codebase entirely. Why? The workflow made sense to him, but not to his users.
“I assumed I could develop an analytics flow that would work for everybody… it worked well for me, but it didn’t work well for my target audience.”
This led to a major design shift. Instead of building one global solution, he pivoted to “micro-solutions that will do different things inside the same framework.” This insight should be familiar to any engineer who’s seen a service fail not because it was wrong, but because no one could use it.
“If they’re not going to use it—it doesn’t work.”
Looking forward, Dunsworth is focused on redirecting complexity, not increasing it. One focus area: offloading non-emergency calls using AI assistants. “It really is a community win-win, because now we can get those services out faster.”
Another: multilingual responsiveness. In cities where services span four or more languages, Dunsworth sees multilingual AI as a matter of equity and latency:
“If we can improve the quality and speed of translation… (that can save a life.)”
To wrap up, here are some key risk mitigation strategies – from technical safeguards to policy measures – that can enable engineers and organizations to confidently adopt AI in sensitive environments:
Each of these strategies – from squeezing models to encrypting everything, from vetting vendors to fine-tuning internally – contributes to an overall posture of trust through transparency and control. They turn the unpredictable black box into something that engineers and auditors can reason about and rely on. Dunsworth repeatedly comes back to the theme of discipline in engineering choices. Public safety and other critical systems can’t afford guesswork. By enforcing these risk mitigations, engineers can build systems that move fast without breaking things beyond rapid recovery.
BlueSky Statistics – A GUI-Driven Analytics Platform for R Users
BlueSky Statistics is a desktop-based, open-source analytics platform designed to make R more accessible to non-programmers—offering point-and-click simplicity without sacrificing statistical power. It supports data management, traditional and modern machine learning, advanced statistics, and quality engineering workflows, all through a rich graphical interface.
Highlights:
That’s all for today. Thank you for reading this issue of Deep Engineering. We’re just getting started, and your feedback will help shape what comes next. Do take a moment to fill out this short survey we run monthly—as a thank-you, we’ll add one Packt credit to your account, redeemable for any book of your choice.
We’ll be back next week with more expert-led content.
Stay awesome,
Divya Anne Selvaraj
Editor-in-Chief, Deep Engineering
If your company is interested in reaching an audience of developers, software engineers, and tech decision makers, you may want to advertise with us.