Fredrik Lindstrom

AI Architecture

The Architecture Question

Why the future of AI won't be decided by the biggest model — and the five architectural decisions every enterprise is already making, whether they realize it or not.



By Fredrik Lindstrom · ~10–12 minute read · May 2026

The board conversation in most companies right now is about which model to license. GPT or Claude or Gemini. Frontier or fine-tuned. American or open-weight. It is the wrong conversation.

Not because models do not matter — they do — but because the strategic decisions in AI right now are not happening at the model layer. They are happening at the architecture layer. And most boards are not even aware they are making them.

I have spent twenty-five years watching enterprises make architecture decisions about another technology that went through this same arc: cybersecurity. The pattern was identical. Leadership treated it as a procurement problem — which firewall, which antivirus, which vendor — for a decade before realizing the actual decisions were about identity, segmentation, access, and where the perimeter lived. By the time the architecture questions were obvious, the organizations that had been treating cyber as procurement were structurally behind the ones that had been treating it as design.

AI is in the same place. Right now. And the August 2 deadline for full enforcement of the EU AI Act is twelve weeks away — which means the window to make these decisions deliberately, rather than retroactively, is closing fast.

EU AI Act. The European Union’s landmark AI regulation. It classifies AI systems by risk level and imposes strict obligations on high-risk uses — including AI literacy across the workforce, conformity assessments, and post-market monitoring. Full enforcement of the high-risk provisions begins August 2, 2026.

The honest state of things

Let me ground this in what is actually happening today, not what the hype cycle suggests.

Deloitte’s 2026 State of AI survey of 3,200 leaders found that eighty-six percent of organizations are increasing AI budgets. Thirty-four percent are using AI to fundamentally reimagine their business. The other sixty-six percent are optimizing around the edges — efficiency gains, faster drafting, slightly cheaper customer service. Real benefit, but not transformation.

Production deployment at scale has gone from five percent two years ago to thirty-nine percent today. That is meaningful progress, but it is far from the universal adoption the headlines suggest. The bubble is not bursting. The air is leaking.

Meanwhile, agentic AI — the buzzword that dominated enterprise pitches throughout 2025 — has entered the trough that every overhyped technology eventually finds. MIT Sloan Review put it directly earlier this year: agents make too many mistakes for high-stakes business processes. Prompt injection, deceptive behaviors, and misalignment with human objectives remain unsolved. We are asking AI agents to act autonomously in environments we have not secured. As someone who leads cybersecurity services for a living, that concerns me more than most of the conversations happening at board level.

Prompt injection. An attack where malicious instructions are embedded in input data — a document, an email, a webpage — to manipulate an AI model into ignoring its original instructions and following the attacker’s. The AI equivalent of SQL injection. Most enterprise governance frameworks are not yet built to detect or block it.

The companies getting the most value from AI right now are not the ones chasing the biggest models. They are the ones matching the right model to the right task — with the right governance around it.

That sentence is the entire article. The rest is what it actually means.

Five architectural decisions every enterprise is already making

Here are the five shifts reshaping enterprise AI right now. Each one is an architectural decision your organization is making whether or not anyone has explicitly named it.

1. Capability — prediction or reasoning

Most people still think of LLMs as fancy autocomplete. That was true in 2023. It is not true anymore.

The shift from prediction to reasoning is the single largest capability change in AI since transformers shipped. Models like OpenAI’s o-series, Anthropic’s Claude Opus 4.7, and Google’s reasoning Gemini variants use reinforcement learning to think through problems step by step rather than predicting the next word in a sequence. Reasoning models have already hit gold-level performance in major math competitions — a milestone most researchers did not expect until 2027.

And the bigger shift underneath that: inference-time scaling. The biggest gains in LLM performance over the last twelve months are not coming from bigger training runs. They are coming from smarter inference — letting models spend more compute on harder questions at runtime. A well-optimized small model with inference scaling can outperform a model many times its size.

Inference. The runtime phase where a trained AI model actually produces an answer to a query. Training happens once and is expensive. Inference happens every time someone uses the model, and is where day-to-day AI cost, latency, and scale are determined.
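One concrete form of inference-time scaling is self-consistency sampling: rather than accepting a model's first answer, draw several independent answers and take the majority vote, spending more samples on harder questions. The sketch below is illustrative only — `majority_answer` and the flaky stub model are hypothetical names, not any vendor's API.

```python
# Self-consistency sampling: draw n independent answers and return the
# majority vote. Harder questions can simply be given more samples
# (more inference compute) at runtime, without touching model weights.
import random
from collections import Counter
from typing import Callable

def majority_answer(sample: Callable[[str], str], question: str, n: int = 5) -> str:
    """Draw n independent answers to `question`; return the most common one."""
    votes = Counter(sample(question) for _ in range(n))
    return votes.most_common(1)[0][0]

# Stand-in for a real model call: an unreliable "model" that is right
# most of the time. Voting over 25 samples usually recovers the correct
# answer even though any single call may be wrong.
random.seed(7)
flaky = lambda q: random.choices(["42", "41"], weights=[0.75, 0.25])[0]
print(majority_answer(flaky, "What is 6 x 7?", n=25))
```

The design point is that accuracy becomes a runtime dial — the same model gets more reliable on the queries where you choose to spend more compute.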

What this means for architecture: if your AI strategy assumes “we will buy access to the biggest frontier model and that is our capability ceiling,” you have already framed the decision wrong. The capability question is now about how much reasoning, on which tasks, at what latency cost.

2. Size — frontier or right-sized

While the industry obsessed over trillion-parameter frontier models, a parallel revolution happened underneath the headlines. Small language models — one to ten billion parameters — are now delivering eighty to ninety-five percent of frontier performance on the specific tasks they are tuned for, at a fraction of the cost.

Two weeks ago, Mistral released a 3.5-billion-parameter model that competes with frontier reasoning models on capability per parameter. The story is not the model itself. The story is the trajectory: smaller, cheaper, fit-for-purpose models are no longer toys. They are production-grade for the workloads they target.

Microsoft Phi, Alibaba Qwen, Meta Llama 3.2, Mistral’s compact models — these are not stripped-down versions of larger systems. Through knowledge distillation, higher-quality training data, and targeted reinforcement learning, they handle summarization, code completion, document extraction, sentiment analysis, and domain-specific question-answering with production accuracy.

The SLM market is projected to grow from $7.8 billion in 2023 to $20.7 billion by 2030. That growth is driven entirely by enterprises that figured out they were overpaying for capability they did not need.

What this means for architecture: the strategic question is no longer “which is the biggest model?” It is “which is the right model for this task, on this infrastructure, under these constraints?” Most enterprises will end up running three to five different models in production — a frontier model for hard reasoning, two or three SLMs for high-volume specific tasks, and embedded models for edge cases. Single-vendor AI strategies are already obsolete.

3. Location — cloud, local, or hybrid

The third architectural shift is where the model actually runs.

For a decade the answer was the cloud. For most workloads it still is. But the economics, the privacy posture, and the latency profile of running AI locally have changed enough that “cloud-only” is no longer the default it used to be.

Local AI — running models on phones, laptops, edge devices, or self-hosted infrastructure — solves three problems simultaneously. Privacy: sensitive data never traverses a network, never touches a third-party API, never sits in a cloud breach surface. For regulated industries — financial services, healthcare, government, legal — this is the difference between deploying AI and not deploying it. Latency: cloud-based AI adds hundreds of milliseconds per round trip; local inference is instant. And cost: marginal cost per inference drops to near zero once the hardware investment is made.

The tooling has matured fast. Ollama is now the standard for local model orchestration. Apple’s MLX framework optimizes for Apple Silicon. Llama.cpp — which joined Hugging Face this year — powers most local AI deployments in production. Quantization techniques let models that once required data center GPUs run on consumer laptops at four-bit precision with minimal quality loss.

Quantization. A compression technique that reduces the numerical precision of a model’s weights — from sixteen-bit down to eight-, four-, or even two-bit — dramatically shrinking memory and compute needs with minimal accuracy loss. The reason a model that needed a data center two years ago can now run on a laptop.
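The memory arithmetic behind quantization is simple enough to show directly. The sketch below maps a weight vector onto 16 evenly spaced levels (4 bits per weight) and stores only the level index plus a scale and offset; real quantizers such as GPTQ, AWQ, or llama.cpp's k-quants are far more sophisticated, but the storage math is the same.

```python
# Illustrative 4-bit quantization: 16 levels -> indices 0..15,
# i.e. 4 bits per weight instead of 16.

def quantize_4bit(weights: list[float]) -> tuple[list[int], float, float]:
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 15 or 1.0          # guard against a constant vector
    idx = [round((w - lo) / scale) for w in weights]
    return idx, scale, lo

def dequantize(idx: list[int], scale: float, lo: float) -> list[float]:
    return [lo + i * scale for i in idx]

weights = [0.12, -0.53, 0.98, 0.07, -0.21]
idx, scale, lo = quantize_4bit(weights)
restored = dequantize(idx, scale, lo)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
# Rounding to the nearest level bounds the error at half a quantization step.
assert max_err <= scale / 2 + 1e-9
```

A 7-billion-parameter model at 16-bit precision needs roughly 14 GB for weights alone; at 4 bits that drops to roughly 3.5 GB — the difference between a data center GPU and a consumer laptop.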

But here is the nuance most architecture conversations miss: local AI is not a replacement for cloud AI. It is a complement. Frontier reasoning, massive context windows, and the most complex multi-step tasks still benefit from cloud compute. The strategic decision is which workloads belong where — and most organizations have not even started mapping their workloads to deployment topology.

What this means for architecture: the question is no longer “are we cloud or are we on-prem?” It is “for which workloads do we run where, and what is our hybrid pattern?” That mapping is the architecture. Most boards have not seen it because most CIOs have not built it.

4. Topology — single agent or orchestrated

The fourth shift is the most operationally significant one and the one most under-priced by leadership today: AI is moving from single-model interactions to orchestrated multi-agent systems.

Forget the single chatbot. The architecture coming into production over the next eighteen months is teams of specialized agents — one for research, one for analysis, one for execution, one for verification — coordinating across environments, calling tools, executing code, and reasoning about each other’s outputs. The Model Context Protocol has become the de facto standard for agent tool access. IDC forecasts forty-five percent of organizations will be orchestrating AI agents at scale by 2030.

MCP — Model Context Protocol. An open standard introduced by Anthropic in 2024 that defines how AI models discover and call external tools, APIs, and data sources. Functionally the USB-C of agentic AI, now adopted across most major model providers.
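Under MCP, a tool is declared to the model as a name, a human-readable description, and a JSON Schema describing the inputs the model may pass; the model reads the schema and decides when and how to call the tool. A simplified sketch of that shape — the tool itself is hypothetical:

```python
# Simplified shape of an MCP tool declaration. The billing-system tool
# and its field names are illustrative, not a real integration.
invoice_lookup = {
    "name": "lookup_invoice",
    "description": "Fetch an invoice by its ID from the billing system.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "invoice_id": {"type": "string"},
        },
        "required": ["invoice_id"],
    },
}
```

Every declaration like this is also an authorization surface: the schema defines what a model can ask your systems to do on its behalf.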

This is where the security blind spot is most acute. Every agent in an orchestrated system needs an identity. Every tool call needs authorization. Every output needs validation before the next agent acts on it. Most of these systems are being deployed today by engineering teams without a security review, because the governance frameworks treat agents as if they were API calls. They are not. They are autonomous actors with the ability to take consequential actions on behalf of the organization.

We have seen this movie before. It was called shadow IT, and it was called Robotic Process Automation, or RPA, before that. The pattern is identical: a useful capability proliferates faster than governance can keep up, and then something breaks publicly, and then everyone wishes they had architected it deliberately from the start.

What this means for architecture: every organization deploying AI agents in 2026 needs an answer to four questions. Who is the agent acting on behalf of? What is the agent authorized to do? How is the agent’s behavior audited? And what is the kill switch when something goes wrong? If you cannot answer those four questions, you do not have an agent strategy. You have an agent exposure.
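The four questions can be made concrete as runtime checks: each agent acts on behalf of a named principal, against an explicit allowlist, with every decision audited, behind a global kill switch. This is a minimal sketch with illustrative names and policy shape, not a real governance framework.

```python
# Minimal sketch of agent authorization: identity, allowlist, audit, kill switch.
from datetime import datetime, timezone

AUDIT_LOG: list[dict] = []
KILL_SWITCH = False

POLICY = {  # agent -> the principal it acts for, and the actions it may take
    "research-agent": {"principal": "analyst@corp.example",
                       "allowed": {"search", "summarize"}},
    "exec-agent": {"principal": "ops@corp.example",
                   "allowed": {"create_ticket"}},
}

def authorize(agent: str, action: str) -> bool:
    """Allow the action only if the agent is known, the action is on its
    allowlist, and the kill switch is off. Every decision is audited."""
    entry = POLICY.get(agent)
    allowed = (not KILL_SWITCH) and entry is not None and action in entry["allowed"]
    AUDIT_LOG.append({
        "ts": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        "principal": entry["principal"] if entry else None,
        "action": action,
        "allowed": allowed,
    })
    return allowed

assert authorize("research-agent", "search") is True
assert authorize("research-agent", "create_ticket") is False  # not on allowlist
KILL_SWITCH = True
assert authorize("exec-agent", "create_ticket") is False      # kill switch wins
```

Even a toy version like this answers all four questions; most production agent deployments today cannot.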

5. Jurisdiction — global or sovereign

The fifth shift is geopolitical, and it is the one most boards are completely unprepared for.

Sovereign AI — AI systems that operate under a specific country’s data laws, infrastructure, and compliance frameworks — is moving from a theoretical conversation to a procurement requirement. Deloitte and Gartner both flag sovereign AI as a defining trend, with thirty-five percent of nations projected to rely on region-specific AI platforms by 2027. The EU AI Act becomes fully operational on August 2, 2026 — twelve weeks from this article. China is building parallel AI infrastructure with parallel models. The United States is moving toward export controls on advanced inference compute. India is mandating data localization for AI training. The Middle East is investing tens of billions in regional AI infrastructure.

If your organization operates in more than one major jurisdiction — and most do — your AI architecture cannot be one architecture. It has to be a federation of regional architectures, each tuned to local data laws, local model availability, local compliance requirements, and local sovereignty pressures. The procurement decision that looked like “which cloud do we use for AI?” two years ago is now “which AI infrastructure runs in which jurisdiction, under whose rules, for which workloads?”

This is not abstract. The EU AI Act high-risk obligations carry penalties up to €35 million or seven percent of global revenue. Compliance costs run €180,000 to €420,000 per high-risk system. Deployment delays of four to seven months are now standard for systems that did not anchor jurisdictional governance from day one.

What this means for architecture: AI strategy is now a function of regulatory geography. Boards operating globally need a map of where their AI runs, under whose rules, with what compliance posture — and that map should be reviewed quarterly.

The thread — security and governance

Five architectural shifts. One thread runs through every one of them: security and governance.

Reasoning models expand the attack surface. The same chain-of-thought capability that makes them powerful makes them more susceptible to prompt injection and goal hijacking. Right-sizing to small models reduces some risks — data residency, vendor concentration — and creates others, including supply chain vulnerability and weaker safety tuning. Local AI removes cloud exposure but introduces endpoint and physical-access risks. Orchestrated agents multiply the attack surface by the number of agents in the topology. Sovereign deployments fragment governance.

Every architectural decision is also a security decision. And the cybersecurity teams who have spent two decades learning how to govern complex distributed systems are the right people to be in the room when those decisions are made. Most are not.

This is the gap I keep coming back to. Boards are approving AI strategies that their cybersecurity leadership has not reviewed. CISOs are being asked to secure deployments they did not architect. CIOs are presenting AI roadmaps that do not include a security thread. None of this is the result of bad intent. All of it is the result of treating AI as a technology problem rather than an architecture problem.

The fix is structural, not procedural. AI governance belongs at the same level as cybersecurity governance — which means at the audit committee, with quarterly board reporting, with named accountability. Not in a working group. Not on a roadmap. At the level where strategic risk lives.

The verdict

The Promise & Risk meter on this one leans toward Promise — but only for organizations that treat AI strategy as an architecture problem.

The promise is real. The five architectural shifts collectively create more leverage than any single frontier model release. Reasoning capability, right-sized models, hybrid deployment, orchestrated agents, sovereign-ready governance — combined, these reshape what is possible at the enterprise scale. The organizations that make these decisions deliberately, with their cybersecurity leadership in the room, with named board-level accountability, will compound an advantage that will be hard for late movers to catch.

The risk is also real. Every shift expands attack surface. Every shift creates compliance complexity. Every shift introduces dependencies the organization did not have a year ago. The frameworks are catching up — NIST AI RMF, ISO 42001, EU AI Act, the IAPP AIGP credential — but the governance infrastructure inside most organizations is not. Late movers will not just be behind. They will be inheriting architecture decisions made by vendors, engineering teams, and individual employees, with no governance overlay.

The leadership question is not “which model do we use?” It is “what are our five architectural decisions, who is making them, and who is reviewing them?”

If you cannot answer that, the decisions are still being made. You are just not in the room.

What to do this quarter

I will go deeper on two specific pieces of this in the coming weeks. Episode 13 of Promise & Risk of AI covers AI governance for decision-makers — the NIST AI RMF, the 4D Framework, and how to structure board oversight in practice. Episode 20 walks the enterprise AI stack end to end, from RPA through orchestrated agents, with a focus on how directors and senior IT leaders evaluate vendor claims with confidence.

For the moment, the most useful exercise you can run with your leadership team this quarter is this one: take each of the five architectural decisions above, and write down what your organization has decided. Not what is on the roadmap. Not what the vendor pitched. What has been decided, by whom, and with what governance review.

If most of the boxes are empty, that is the work.

If most of the boxes are filled in by people whose names you do not know, that is a bigger problem.

If the security team is not in the room when those boxes get filled in next, that is the architecture problem this article is about.
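The exercise reads naturally as a five-row table — one row per architectural decision, filled in only with what has actually been decided. A minimal sketch, with illustrative field names and one hypothetical filled-in row:

```python
# One row per architectural decision: what was decided, by whom,
# and whether security reviewed it. Field names are illustrative.
DECISIONS = ["capability", "size", "location", "topology", "jurisdiction"]

audit = {d: {"decided": None, "decided_by": None, "security_reviewed": False}
         for d in DECISIONS}

# Hypothetical example of a single completed row.
audit["location"] = {"decided": "hybrid; regulated workloads on-prem",
                     "decided_by": "CIO + CISO", "security_reviewed": True}

empty = [d for d, row in audit.items() if row["decided"] is None]
unreviewed = [d for d, row in audit.items()
              if row["decided"] is not None and not row["security_reviewed"]]
print(f"{len(empty)} of {len(DECISIONS)} decisions undecided; "
      f"{len(unreviewed)} decided without security review")
```

Two numbers out of a one-hour exercise: how many decisions are still open, and how many were closed without security in the room.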

The future of AI is not going to be decided by the biggest model. It is going to be decided by the organizations that figured out the architecture question before their competitors did. Most have not started. The window to start deliberately is the next twelve weeks.


Fredrik Lindstrom is the founder of Promise & Risk of AI, a YouTube channel and LinkedIn publication delivering balanced, honest AI literacy to business leaders. With 25+ years in cybersecurity and enterprise technology, his work focuses on the intersection of AI capability and organizational governance.

Every promise. Every risk. The truth.