AI is Leaving the Cloud

A Field Guide to the Out-of-Cloud AI Movement

Addo Smajic

Jun 15, 2026

For most of the last decade, AI lived in someone else's data center. You sent your data up. A model sent the answer back. That was the whole architecture. It was so dominant that "AI" and "cloud AI" became the same word in most developer minds.

That story is still true. It is just not the only one anymore.

Right now, across robotics, mobile, automotive, healthcare, consumer hardware, industrial systems, and aerospace, the most strategic AI work is happening outside the cloud. It is happening on the chip in your pocket, on the humanoid stepping around a warehouse, on the car deciding whether to brake, on the watch listening for an irregular heartbeat, on the satellite that has not had a network connection since the day it was launched.

Apple calls it on-device AI. NVIDIA calls it physical AI. The open-source community calls it local AI. Robotics engineers call it embedded AI. Industrial and IoT engineers call it edge AI. Surgeons call it intraoperative AI. Each community has its own conferences, its own GitHub repos, its own jargon. These are not separate trends. They are one trend, viewed from a dozen different industries.

This is the field guide.

A movement hiding in plain sight

Operating systems now ship with on-device foundation models built into the platform. Apple Intelligence runs locally on iPhones, iPads, and Macs, and Apple has opened the on-device model to any third-party developer who wants it. Google's Gemini Nano runs inside Chrome on the desktop and inside Android through AICore. The pattern is the same on both sides: the model lives next to the app, the network is not in the loop, and inference carries no per-request fee.

An open-source local-AI stack now exists outside the hyperscalers. Tools like Ollama, llama.cpp, MLX, WebLLM, ONNX Runtime, and Core ML have made running capable models on your own hardware a default rather than a project. The stack updates monthly. The names will change. The fact of the stack will not.

Foundation models have moved into robots. NVIDIA's Isaac GR00T, designed to run on Jetson hardware inside the robot, is the leading example. Every serious humanoid program now ships an on-board foundation model as a matter of course. The cloud is not in the control loop, because it cannot be.

Cars do not even pretend to have a choice. A cloud round trip typically takes 100 milliseconds or more. Highway-speed autonomous driving needs a decision under 10 milliseconds. The math is not negotiable. So the AI runs in the car, against the car's sensors. The network is not in the picture.
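A quick sketch makes the arithmetic concrete. It computes how far a vehicle travels while a decision is still pending. The 30 m/s speed (roughly 108 km/h) is an illustrative assumption, and `distance_travelled_m` is a name made up for this example. The 100 ms and 10 ms figures are the ones quoted above.

```python
def distance_travelled_m(speed_m_per_s: float, latency_ms: float) -> float:
    """Distance covered, in metres, while a decision is still pending."""
    return speed_m_per_s * (latency_ms / 1000.0)


HIGHWAY_SPEED_M_PER_S = 30.0  # assumed for illustration: roughly 108 km/h

cloud_gap_m = distance_travelled_m(HIGHWAY_SPEED_M_PER_S, 100.0)
onboard_gap_m = distance_travelled_m(HIGHWAY_SPEED_M_PER_S, 10.0)

print(f"cloud round trip: {cloud_gap_m:.1f} m travelled before a decision")
print(f"on-board budget:  {onboard_gap_m:.1f} m travelled before a decision")
```

At the assumed speed, the cloud round trip costs about 3 metres of travel before the car can react. The on-board budget costs about 0.3 metres.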

The same logic plays out wherever physics, latency, or trust will not allow a round trip. Drones run AI on board because dead zones do not have signal. Spacecraft run AI in orbit because they are, by definition, not in the cloud. Surgical robots run AI on the operating table because a hospital cannot wait for a server in another state to confirm whether the next motion is safe. Smartwatches run inference on dedicated neural silicon because the battery and bandwidth math does not work any other way. Hearing aids and cochlear implants clean up speech in real time using on-device machine learning. None of this will reverse. The constraints that put the AI on the device are permanent.

These look like separate stories. They are one story. AI is moving to where the data lives, to where the action happens, and to where the milliseconds matter.

The territory is bigger than you think

Out-of-cloud AI is a continent with several major regions, and most developers have only walked through one of them.

Local AI is models running on personal devices. Phones, laptops, browsers, tablets. Apple Intelligence, Gemini Nano, Ollama, MLX, WebLLM, llama.cpp, Core ML, ONNX Runtime. The user's own device is the inference target.

Physical AI is AI that controls things in the physical world. Robots, vehicles, drones, surgical systems, industrial machines, agricultural equipment, warehouse automation, humanoids. The AI is closing a control loop with sensors and actuators. A missed deadline is not a degraded experience. It is a failure.

Embedded AI gets baked into devices with very little compute to spare. Smartwatches, earbuds, glasses, hearing aids, medical implants, sensors. Models compressed and tuned to run on chips designed for efficiency, not performance. Every milliwatt and every byte of memory is a real constraint.

Edge AI is the term the cloud industry has tried to claim, but it does not mean what they want it to mean. Their version is CDN nodes, base stations, and regional compute boxes that still belong to the provider. The real meaning is AI running where the data is generated and where the user actually is. On the device, on the sensor, on the machine. The AI sits at the trust boundary the user controls, not at the nearest data center the provider operates.

Mission-critical AI is the AI that has to keep working when nothing else does. Spacecraft, military gear, first-responder equipment, deep-sea hardware, polar research stations, surgical systems, life-support equipment. The device cannot phone home and cannot fail.

Agentic AI at the edge is the most strategically important emerging region. Agents that act on behalf of users and organizations, increasingly running on the device or the physical system rather than in the cloud. An agent on your phone that books your appointments. An agent on a delivery robot that picks a route. An agent on an industrial controller that negotiates with the agent on the next machine. Autonomy at the edge, with all the trust, governance, and coordination questions that come with it.

These regions overlap. A surgical robot is physical AI and mission-critical AI at the same time. A smartwatch is local AI and embedded AI at the same time. An agent on a humanoid is agentic AI and physical AI all at once. The territory is bigger and more diverse than most developers realize.

The range of devices makes the scale of it real. AI is now running on phones, tablets, laptops, and browsers. It is running on watches, rings, earbuds, and glasses. It is running on cars, trucks, drones, boats, ships, planes, and satellites. It is running on surgical systems, medical implants, hearing aids, and insulin pumps. It is running on industrial robots, factory controllers, warehouse automation, agricultural equipment, and construction machinery. It is running on doorbells, locks, thermostats, appliances, and security cameras. It is running on traffic systems, public sensors, and smart-city gear. It is running on wearables for soldiers, firefighters, and paramedics. It is running on spacecraft, rovers, deep-sea equipment, and polar research stations.


Four tailwinds pointing the same way

For the last decade, software has been quietly moving out of the cloud and into the world. Cars stopped being cars with software bolted on and became software products with hardware attached. Factories did the same. So did industrial machines, surgical instruments, weapons, appliances, and buildings. Whole industries spent ten years software-defining themselves while everyone watched the cloud.

AI is the next layer. AI is software, and software-defined everything is what it runs on. The substrate is already in place.

Four forces are driving this. None of them are reversing.

Physical AI is no longer a niche. Komatsu commissioned its 1,000th autonomous mining truck in April 2026, and customers running its system have moved over 11.5 billion metric tons of material since 2008. Caterpillar runs nearly 700 autonomous haul trucks today and is targeting more than 2,000 by 2030. Waymo runs roughly 3,000 robotaxis, completes more than 500,000 paid trips a week, and is expanding from five markets to twenty-six in 2026, including London, New York, and Tokyo. Humanoid robots are in pilot or production at BMW, Mercedes, Hyundai, BYD, GXO, and a dozen other major operators, with IDC forecasting more than 510,000 humanoid units shipped annually by 2030. Drones, surgical robots, agricultural machines, autonomous trucks, warehouse automation. None of them run in the cloud. They cannot.

Models are getting smaller and stronger at the same time. Microsoft's Phi-4 outperforms models forty times its size on reasoning and coding tasks. Google's Gemma runs on a single consumer GPU. Sub-billion-parameter models now generate at fifty tokens per second on an iPhone. Capability per parameter is improving faster than parameter counts ever grew. What was frontier eighteen months ago now fits on a phone, and the trajectory is not flattening.

Silicon is making on-device AI cheap. Apple's A19 Pro runs 3-billion-parameter models natively on the Neural Engine. Qualcomm's latest Snapdragon platforms hit more than 80 TOPS. NVIDIA's Jetson Thor is the brain inside the next generation of humanoid robots. AMD, Intel, MediaTek, and Samsung are shipping NPUs in consumer and edge silicon. The cost per token of local inference is collapsing. The cost per token of cloud inference is not.

Data sovereignty is becoming a legal requirement. Most of the EU AI Act's obligations are scheduled to apply from August 2026. The Cyber Resilience Act's reporting obligations begin in September 2026, with full compliance required by December 2027. The EU Data Act has been in effect since September 2025. NIS2 is being enforced now. Roughly 75% of the world's population is already covered by modern privacy regulation, and more frameworks land every year. The legal direction is uniform. Data has to stay close to where it was generated, with audit trails, with consent, with sovereignty over who can see it. Cloud architectures that pull all data to a central server are heading the wrong way.

These forces are not separate, and they are not slowing down. They converge on a single future where AI lives where the data lives, on devices the user controls, in machines that act in the physical world. The substrate is being built faster than most people realize. The question is not whether this happens. The question is what gets built on top.

Nine faces of the same problem

Forces this strong eventually win. But the road from today to "eventually" runs through a set of structural problems that every developer in this space hits, no matter which industry they came from.

The robotics engineer and the mobile developer think they are working in different worlds. They are not. They are running into the same nine walls.

The first wall is latency. A round trip to the cloud takes time the application does not have. A self-driving car cannot wait 100 milliseconds to decide whether to brake. A surgeon's robotic instrument cannot wait for a server to confirm the next motion is safe. The speed of light is the same constraint whether you are building a humanoid or a hearing aid.

The second wall is connectivity. A network you cannot count on is the same problem in a tunnel, a hospital wing, a valley, a cargo hold, or a kitchen with bad Wi-Fi. The AI has to keep working as if the network were optional, because it is.
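One way to treat the network as optional, sketched here under assumptions, is to make the on-device model the default path and the cloud an upgrade that is allowed to fail. `respond`, `local_model`, and `cloud_model` are hypothetical names for this example, not part of any real SDK.

```python
from typing import Callable, Optional


def respond(
    prompt: str,
    local_model: Callable[[str], str],
    cloud_model: Optional[Callable[[str], str]] = None,
) -> str:
    """Answer on the device first; use the cloud only as an optional upgrade."""
    answer = local_model(prompt)  # always available, no network needed
    if cloud_model is None:
        return answer
    try:
        return cloud_model(prompt)  # optional refinement when a link exists
    except (ConnectionError, TimeoutError):
        return answer  # the link failed, so keep the local answer
```

The design choice is that a network failure never surfaces to the caller. The cost is that the local model runs on every request, even when the cloud answer ends up being used.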

The third wall is compute heterogeneity. A Jetson Thor is not a smartwatch. An iPhone is not a Raspberry Pi. The same model and the same data layer have to scale across orders of magnitude in compute, memory, storage, and power. Code that assumes "you have a server" does not survive contact with a wearable. Code tuned for a wearable, lifted onto a workstation, leaves an order of magnitude of performance on the table. Out-of-cloud AI has to be portable across silicon, not portable across data center regions.
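One common way to handle that spread, sketched here under assumptions, is to ship several variants of a model and let each device load the most capable one it can hold. The variant names and memory figures below are hypothetical placeholders, not measurements of any real model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelVariant:
    name: str
    min_memory_mb: int  # memory needed to load the variant (assumed figure)


# Hypothetical variants, ordered from most to least capable.
VARIANTS = [
    ModelVariant("full-fp16", 16_000),
    ModelVariant("int8", 4_000),
    ModelVariant("int4-tiny", 256),
]


def pick_variant(available_memory_mb: int) -> ModelVariant:
    """Return the most capable variant that fits in the device's memory."""
    for variant in VARIANTS:
        if available_memory_mb >= variant.min_memory_mb:
            return variant
    raise RuntimeError("no model variant fits on this device")
```

A workstation with 64 GB free would get `full-fp16`. A wearable with 512 MB free would get `int4-tiny`. A real selector would also weigh accelerator support and power budget, which this sketch leaves out.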

The fourth wall is data that cannot leave the device. Patient vitals, user behavior, sensor readings, voice recordings, video feeds. Privacy law, business rules, bandwidth, and physics all push toward keeping the data local. The cloud-first assumption is that data flows freely upward to be stored and processed on someone else's servers. That assumption does not hold here. The data has to stay where it was captured. The model has to come to the data.
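Federated averaging is one established way to bring the model to the data. Each device trains locally and shares only model weights, weighted by how many samples it saw. The sketch below is a plain-Python illustration with made-up numbers, not a production recipe. It has a known limit: weight updates can still leak information, so real deployments usually layer further protections on top.

```python
from typing import List, Sequence, Tuple

# Each update is (locally trained weights, number of local samples).
Update = Tuple[Sequence[float], int]


def federated_average(updates: List[Update]) -> List[float]:
    """Combine per-device weights into one model, weighted by sample count."""
    if not updates:
        raise ValueError("need at least one device update")
    total_samples = sum(count for _, count in updates)
    dim = len(updates[0][0])
    return [
        sum(weights[i] * count for weights, count in updates) / total_samples
        for i in range(dim)
    ]


# Illustrative values only: two devices, two-parameter model.
device_updates = [([1.0, 2.0], 1), ([3.0, 4.0], 3)]
global_weights = federated_average(device_updates)
```

Only the weight vectors and sample counts cross the network. The raw readings that produced them stay on each device.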

The fifth wall is state. An AI agent on a device cannot reset its memory every time it loses signal. A robot cannot forget what it was doing in the middle of a task. A watch cannot lose your health history because the phone is in another room. The device has to be the source of truth for its own context, history, and state. That inverts the cloud assumption. The cloud has been the system of record. The device has been a thin window onto it. At the edge, the device is the system of record.
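A minimal sketch of the device as its own system of record, assuming an embedded SQLite file is acceptable on the target hardware. `DeviceState` and its methods are hypothetical names for this example. SQLite itself ships with Python's standard library.

```python
import json
import sqlite3
from typing import Any


class DeviceState:
    """Hypothetical on-device key-value store; the local file is the source of truth."""

    def __init__(self, path: str) -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS state (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )
        self.db.commit()

    def put(self, key: str, value: Any) -> None:
        # Commit on every write so the state survives a restart.
        self.db.execute(
            "INSERT OR REPLACE INTO state (key, value) VALUES (?, ?)",
            (key, json.dumps(value)),
        )
        self.db.commit()

    def get(self, key: str, default: Any = None) -> Any:
        row = self.db.execute(
            "SELECT value FROM state WHERE key = ?", (key,)
        ).fetchone()
        return default if row is None else json.loads(row[0])

    def close(self) -> None:
        self.db.close()
```

A robot could call `put("current_task", {...})` after each step and `get("current_task")` on boot to resume where it stopped. Nothing in that path touches the network.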

The sixth wall is coordination without a coordinator. Multiple agents, on multiple devices, acting on shared data, with no central server to play referee. Two devices change the same record at the same time, both offline, and reconcile later. Three robots pick up tasks from a shared queue without a master scheduler. Federated learning across a fleet of phones. Peer learning across a swarm of drones. Workflows that span a watch, a phone, a laptop, and a car. This is one of the hardest problems in distributed systems, and out-of-cloud AI runs into it constantly. The devices have to agree on shared state, divide work, and learn together, all without aggregating their raw data into a single store.
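Conflict-free replicated data types (CRDTs) are one well-known family of answers to the "both offline, reconcile later" case. The sketch below is a last-writer-wins map, the simplest member of that family. It is an illustration, not a claim about how any product mentioned here works. `Versioned`, `merge`, and the device names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class Versioned:
    value: Any
    counter: int    # logical clock, bumped on each local write (not wall time)
    device_id: str  # breaks ties when two devices share a counter

    def stamp(self) -> Tuple[int, str]:
        return (self.counter, self.device_id)


Replica = Dict[str, Versioned]


def merge(a: Replica, b: Replica) -> Replica:
    """Per key, keep the entry with the higher (counter, device_id) stamp."""
    merged = dict(a)
    for key, theirs in b.items():
        mine = merged.get(key)
        if mine is None or theirs.stamp() > mine.stamp():
            merged[key] = theirs
    return merged


# Two devices edit the same record while offline, then reconcile.
watch = {"step_goal": Versioned(8_000, counter=3, device_id="watch")}
phone = {"step_goal": Versioned(10_000, counter=3, device_id="phone")}
reconciled = merge(watch, phone)
```

Because the merge rule is deterministic, `merge(watch, phone)` and `merge(phone, watch)` give the same result, so no referee is needed. The trade-off is that one of the two concurrent writes is silently dropped. Richer CRDTs exist for cases where that is not acceptable.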

The seventh wall is verifiability and trust without a server. When an AI agent acts on your behalf, how do you prove what it did? When data arrives from a peer device, how do you trust it? When the model on your watch updates over the air, how do you know it has not been tampered with? In the cloud, a vendor is the trust anchor. At the edge, no vendor sits in the loop to vouch for each action. So trust has to be built into the data and the code.
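One building block for trust that travels with the data is a hash-chained log, where each entry commits to the one before it. The sketch below uses SHA-256 from Python's standard library. The function names and log layout are assumptions for this example. A chain alone only detects edits if the latest hash is also signed or anchored somewhere the attacker cannot reach, which this sketch does not cover.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, action: Dict) -> str:
    """Hash the previous entry's hash together with a canonical form of the action."""
    payload = json.dumps(action, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(log: List[Dict], action: Dict) -> None:
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"action": action, "prev": prev, "hash": entry_hash(prev, action)})


def verify(log: List[Dict]) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["action"]):
            return False
        prev = entry["hash"]
    return True


# Hypothetical agent actions, for illustration only.
log: List[Dict] = []
append(log, {"agent": "scheduler", "did": "book", "target": "dentist"})
append(log, {"agent": "scheduler", "did": "notify", "target": "owner"})
```

Calling `verify(log)` returns `True` for the untouched log. Changing any field of an earlier action makes it return `False`.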

The eighth wall is governance that survives offline operation. What rules govern what the AI is allowed to do, when there is no central authority to ask? How do you audit an agent that took an action while the network was down? How do you enforce access controls on data that is being processed by a model on a device you do not own? Compliance frameworks built for cloud architectures assume a control plane. Out-of-cloud AI has to bring the control plane with it.
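Bringing the control plane along can start small. The sketch below assumes the policy is shipped to the device as data, checked locally, and every decision is queued for audit until a link returns. `OfflineGovernor` and its fields are hypothetical names, and a real system would also need to protect the queue itself from tampering.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass
class OfflineGovernor:
    """Hypothetical on-device policy check with a deferred audit trail."""

    allowed_actions: FrozenSet[str]
    audit_queue: List[Dict] = field(default_factory=list)

    def authorize(self, agent_id: str, action: str) -> bool:
        allowed = action in self.allowed_actions
        # Record denials as well as approvals so the audit is complete.
        self.audit_queue.append(
            {"agent": agent_id, "action": action, "allowed": allowed, "at": time.time()}
        )
        return allowed

    def drain_for_upload(self) -> List[Dict]:
        """Hand over queued records once a connection exists, then clear the queue."""
        pending, self.audit_queue = self.audit_queue, []
        return pending


governor = OfflineGovernor(allowed_actions=frozenset({"read_vitals", "adjust_route"}))
```

The decision never waits on a server. The audit record exists the moment the action is attempted, and the upload happens whenever the network allows.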

The ninth wall is fragmentation. Edge devices come from hundreds of manufacturers, each with their own data formats, schemas, units, sampling rates, and quirks. A wearable from one brand encodes heart rate differently from a wearable from another. A factory sensor from one vendor uses a schema nothing else on the line understands. A medical implant emits data in a format the hospital's other systems have never seen. The AI cannot reason across any of this until the data has been normalized into something coherent. And that normalization has to happen at the edge, on the device, without a central server to do the translation.
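A small sketch of on-device normalization, assuming two made-up vendor formats for heart rate. One reports beats per minute with a timestamp in seconds. The other reports the interval between beats in milliseconds with a timestamp in milliseconds. The vendor names, field names, and `HeartRateSample` type are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class HeartRateSample:
    bpm: float
    timestamp_s: float  # seconds since the Unix epoch


def from_vendor_a(raw: Dict) -> HeartRateSample:
    # Assumed format: {"hr": beats per minute, "ts": epoch seconds}
    return HeartRateSample(bpm=float(raw["hr"]), timestamp_s=float(raw["ts"]))


def from_vendor_b(raw: Dict) -> HeartRateSample:
    # Assumed format: {"rr_interval_ms": ms between beats, "timestamp_ms": epoch ms}
    return HeartRateSample(
        bpm=60_000.0 / raw["rr_interval_ms"],
        timestamp_s=raw["timestamp_ms"] / 1000.0,
    )


ADAPTERS: Dict[str, Callable[[Dict], HeartRateSample]] = {
    "vendor_a": from_vendor_a,
    "vendor_b": from_vendor_b,
}


def normalize(vendor: str, raw: Dict) -> HeartRateSample:
    """Translate a vendor-specific reading into the shared schema."""
    if vendor not in ADAPTERS:
        raise ValueError(f"no adapter registered for vendor {vendor!r}")
    return ADAPTERS[vendor](raw)
```

An 800 ms beat interval from the second vendor and a reading of 75 from the first both come out as 75 bpm in the same record type. Supporting a new device means registering one more adapter, and all of it runs locally.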

These are nine faces of the same underlying problem. AI is moving to where the data lives, and the data infrastructure to support that move does not exist yet.

Why the walls are still standing

Cloud AI did not scale because the models were good. Cloud AI scaled because the data infrastructure was good.

Twenty years of investment built the data layer that cloud AI rides on. Object storage, managed databases, streaming pipelines, identity systems, query engines, governance frameworks. By the time GPT-3 shipped, the infrastructure to capture, store, move, transform, govern, and serve data at planetary scale was already mature. Models stood on top of that infrastructure. They did not have to invent it.

Out-of-cloud AI does not have that foundation. The hardware is here. The models are here. The tailwinds are here. The data layer is not. Every team building edge AI today is hand-rolling its own answers to the nine walls. A robotics company writes its own sync protocol. A wearables company writes its own access control. A hospital integrator writes its own schema translator. They are each rebuilding, badly and incompatibly, what cloud developers take for granted.

This is the bottleneck. Out-of-cloud AI is not stuck on compute. It is not stuck on models. It is stuck on data infrastructure. Until the data layer for non-cloud environments is as mature as the data layer for the cloud, every developer in this space is paying a tax the cloud developer is not.

The bottleneck is data infrastructure

Data is the lifeblood of AI. That has always been true in the cloud, where the entire stack was designed around it. It is true again at the edge, where the entire stack still has to be built.

Out-of-cloud AI is the most important category of software being built right now. The robots, the cars, the watches, and the agents on phones are the new center of gravity. The developers building in this space are working on top of infrastructure that does not exist yet. That is the constraint, and it is the opportunity.

Cloud AI is not going away. It is good at what it is good at. The argument is not that the cloud is broken. The argument is that an entire category of new software is being built that the cloud was never designed to support, and that category is bottlenecked on a missing layer.

Whoever builds that layer unlocks the next decade. The cloud era was won by the companies that built the data infrastructure underneath it. The edge era will be won the same way.

The cloud built the last era of software. The edge will build the next one.

