WWDC 2026 and AI: A New Window of Opportunity — or Another Year on Hold?

Published on June 24, 2026
Updated on June 24, 2026
WWDC 2026 opened Apple's on-device AI to developers: a free 3B model, Private Cloud Compute, and Core AI for custom models. An iOS engineer's honest breakdown.

For the past two years, the AI story in the Apple ecosystem has followed the same pattern — a staggered Apple Intelligence rollout, a delayed Siri rebuild, "personal context" landing only partially across releases, and the EU getting almost none of it. Heading into WWDC 2026, the question was simple: is this finally the year Apple delivers, or just another chapter of delay?

From my perspective as an iOS engineer at NineTwoThree, here's the honest answer — for the consumer-facing Siri AI experience, the wait continues. But for on-device developer AI, this is genuinely the year the platform landed, and in a way that changes the economics of mobile AI features we'd previously written off as too expensive to ship.

What actually landed for developers at WWDC 2026

Apple opened the Foundation Models framework to third-party apps. Every Apple Intelligence-capable iPhone now ships with a 3-billion-parameter on-device model — AFM 3 Core — accessible through a native Swift API. It handles writing, summarization, translation across 25 languages, content classification, and tool calls, running at roughly 30 tokens per second on an iPhone 15 Pro. No cloud bill, no API key, no quota.

For heavier tasks, Private Cloud Compute (PCC) is now developer-accessible — a cryptographically attested environment where Apple can't read the request, verified independently by security researchers. And here's the part that changes the calculation for us: for developers in the App Store Small Business Program with fewer than 2 million lifetime downloads, PCC is free. The per-token cloud cost that has historically killed mobile AI features isn't really a blocker anymore for most teams.

There's also a new unifying Swift protocol — LanguageModel — that lets the same session and API run against Apple's on-device model, PCC, MLX, Anthropic Claude, or Google Gemini. Switching providers becomes a one-line change rather than a rewrite.

The three things that flipped

For about three years the industry consensus has been simple: real AI runs in the cloud, and mobile clients are thin wrappers. That held because per-token cloud inference was expensive, model files were too big to ship to a phone, and on-device hardware couldn't run a usable language model. WWDC 2026 broke all three assumptions at once.

  1. Free per-user inference became the default. A 3B model ships on every capable iPhone, plus a privacy-grade cloud tier that's free for most small and mid-sized teams.
  2. Model size stopped being a packaging problem. Core AI's compression toolchain shrinks an open model like SAM3 from ~3 GB to ~430 MB at int4 with no meaningful quality loss, and ships it via Background Assets instead of the app bundle.
  3. Provider lock-in became optional. One Swift protocol lets the same app call Apple's model, Claude, or Gemini — chosen at runtime, not at compile time.

The net effect: a lot of features we rejected as "too expensive to operate" two years ago are now zero-bill, on-device features.

Why this changes the economics of mobile AI

The single biggest commercial shift is the cost model. Most of the stack is now free for most teams. On-device models (AFM 3 Core and Core Advanced) carry no metering at all. PCC is free under the Small Business Program below 2M lifetime downloads. Custom on-device models built with Core AI cost nothing to run — the user downloads them once via Background Assets. The only metered costs are deliberate choices to route to Claude or Gemini for capabilities Apple's models don't yet match.

That reframes the conversation we have with clients. Features that used to come with an open-ended per-image or per-token operating bill — receipt parsing, photo-to-structured-data, message rewriting, content tagging — are now fixed-cost or free, and they run on the phone.

What this unlocks for regulated industries

There's a second story that matters even more for the healthcare, legal, and finance clients we work with: the privacy posture is different now. For years the blocking question on any regulated build has been "can we send this data to a cloud LLM?" — and the honest answer was usually no.

On this stack, that question finally has a defensible answer. On-device inference means the data never leaves the phone. PCC means that when you do need the cloud, Apple can cryptographically prove — to the client and to independent researchers — that no one, including Apple, can read the request. And Core AI means a client's proprietary model (a medical imaging classifier, a fraud-detection model, a domain-specific LLM they own) can ship to a phone without the model or its data ever touching a third-party vendor. That path simply didn't exist cleanly before.

What I'd build on this stack today

A few patterns are immediately viable, and I'd reach for them in this order:

  • On-device only (zero cost, zero data exposure): note and email summarization, structured extraction from receipts and forms, content classification, local Q&A grounded in the user's own content, voice-note transcription with on-device summary, style-rewrite assistants, and writing assistants that personalize to the user over time.
  • Route to PCC (free at small scale): long-document summarization (50-page contracts), multi-page OCR, multi-step or agentic workflows that need chain-of-thought, and anything over the 8K on-device context window.
  • Route to Core AI (custom on-device models): domain-specific LLMs starting from Qwen3 or Mistral, proprietary classifiers a client can't send to a third party, image segmentation with SAM3, and fully-local RAG.
  • Reach for Claude or Gemini when you need frontier-tier capability, server-side tools like web search or code execution, direct audio input, or a client has a strong vendor preference.
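
The routing order above can be sketched as a pure decision function. This is a minimal sketch under my own assumptions, not Apple's API: `InferenceRoute`, `TaskProfile`, and `chooseRoute` are hypothetical names, and the 8,000-token threshold is simply the on-device context window quoted in this article.

```swift
/// Hypothetical routing targets mirroring the four tiers in the list above.
enum InferenceRoute: Equatable {
    case onDevice        // Apple's built-in on-device model
    case privateCloud    // Private Cloud Compute
    case customOnDevice  // a custom model the app ships itself
    case thirdParty      // Claude or Gemini
}

/// Hypothetical description of a request; none of these fields are Apple API.
struct TaskProfile {
    var estimatedTokens: Int
    var needsChainOfThought = false
    var needsCustomModel = false       // e.g. proprietary classifier, segmentation
    var needsServerTools = false       // e.g. web search, code execution
    var needsDirectAudioInput = false
}

/// On-device context window as quoted in the article (an assumption here).
let onDeviceContextTokens = 8_000

func chooseRoute(for task: TaskProfile) -> InferenceRoute {
    // Capabilities that only third-party providers offer take priority.
    if task.needsServerTools || task.needsDirectAudioInput { return .thirdParty }
    // Proprietary or domain-specific models stay on the device.
    if task.needsCustomModel { return .customOnDevice }
    // Long inputs and multi-step reasoning go to the cloud tier.
    if task.needsChainOfThought || task.estimatedTokens > onDeviceContextTokens {
        return .privateCloud
    }
    // Everything else runs on the built-in model at zero cost.
    return .onDevice
}
```

Keeping the decision in one pure function makes the policy easy to unit-test and to change per client without touching call sites.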

The honest caveats: device tiers, the EU, and the 2M-download cliff

This isn't unqualified good news, and it's worth being straight about the limits.

Device eligibility is the big one. Apple Intelligence — the gate on Foundation Models access — runs on roughly 65–70% of US iPhones today (iPhone 15 Pro and newer). The higher-capability AFM 3 Core Advanced model, which adds image input, reaches only ~25–30%. That leaves roughly a third of users on devices that get nothing — and many of them are on recent hardware like the iPhone 14 or the base iPhone 15, which is still on sale. Any feature you build needs a graceful degradation path.

The free tier has a ceiling. Cross 2 million lifetime downloads and you exit the free PCC tier. Apple hasn't published pricing above that line yet, so treat it as TBD in any planning conversation.

The EU gets the frameworks but not the assistant. Foundation Models, PCC, Core AI, and the LanguageModel protocol all work in the EU. What's restricted is the consumer Siri AI assistant layer, which Apple attributes to DMA compliance. The developer-facing stack is not regionally gated.

The bottom line

If you were waiting for Siri to become the AI assistant Apple has been promising, this was another year on hold. But if you build apps, WWDC 2026 was the window opening. The platform now hands you free on-device inference, a privacy-grade cloud tier, and a clean path to ship custom models — all behind native Swift APIs. The features we used to price out of existence are suddenly shippable. That's the part worth acting on now.

This is exactly the kind of shift we help teams turn into shipped product. If you're weighing what's now possible on-device — especially in a regulated industry — talk to our team about your AI roadmap.

The technical reference: Apple's on-device AI stack after WWDC 2026

For the engineers: here's what each piece of the stack does, where the limits are, and how the pieces compose. Scope is intentionally narrow — the AI inference surfaces available to a third-party iOS developer, on-device or in PCC.

AFM 3 Core — the free 3B model on every modern iPhone

Apple Foundation Models 3 Core is the on-device model bundled with iOS 27, accessible through the FoundationModels Swift framework. It's a true small LLM, not a feature-specific classifier, and it runs entirely on the device.

  • Capabilities: text generation, rewriting, summarization, translation across 25 languages, classification and structured extraction, multi-turn streaming conversation, and tool calling against built-in and custom Swift tools. Strongly-typed outputs come via the @Generable macro and @Guide field hints.
  • Capacity: 8,000-token context window on iOS 27 (~6,000 words); ~30 tokens/sec on an iPhone 15 Pro; zero cost per request.
  • Device floor: iPhone 15 Pro and newer (~65–70% of US iPhones, rising ~1–2 points/month).
  • Limits to plan for: no chain-of-thought reasoning (route to PCC for multi-step planning); 8K context means longer inputs must be chunked; .refusal and .guardrailViolation errors are part of the contract; English-first at launch, with the other 24 languages rolling in over the cycle.
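
Chunking for the 8K window can be approximated without a tokenizer. The sketch below is an assumption-heavy illustration: `chunkByTokenBudget` is a hypothetical helper, and the 0.75 words-per-token ratio is derived only from the "8,000 tokens (~6,000 words)" figure above, so real token counts will differ.

```swift
/// Splits text into word-based chunks that should fit a token budget.
/// Assumption: about 0.75 words per token (from "8,000 tokens ≈ 6,000 words");
/// a real tokenizer will give different counts, so leave headroom.
/// Note: whitespace is collapsed, so paragraph breaks are not preserved.
func chunkByTokenBudget(_ text: String,
                        tokenBudget: Int,
                        wordsPerToken: Double = 0.75) -> [String] {
    // Always allow at least one word per chunk so the loop makes progress.
    let maxWords = max(1, Int(Double(tokenBudget) * wordsPerToken))
    let words = text.split(whereSeparator: { $0.isWhitespace })
    var chunks: [String] = []
    var start = 0
    while start < words.count {
        let end = min(start + maxWords, words.count)
        chunks.append(words[start..<end].joined(separator: " "))
        start = end
    }
    return chunks
}
```

In practice I would pass a budget well under 8,000 (for example 6,000) so that instructions and the generated output still fit in the same window.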

The framework defaults to AFM 3 Core via SystemLanguageModel.default — swapping to PCC or a third-party provider is a one-line change to the model parameter.

AFM 3 Core Advanced — 20B sparse MoE, top-tier devices only

The high-capability on-device variant is a 20-billion-parameter sparse Mixture-of-Experts model that activates just 1–4B parameters per token. Apple fits it on a phone by storing most of it in flash (NAND) rather than DRAM, using a lightweight dense block to select experts dynamically. Apple credits two techniques: Instruction-Following Pruning (IFP) for deployment beyond DRAM constraints, and Quantization Aware Training (QAT) to preserve accuracy at lower precision.

  • What changes vs. Core: accepts image input (multimodal), and reaches closer to frontier quality on hard summarization, extraction, and reasoning that doesn't strictly require chain-of-thought. Same Swift API — the framework picks the variant automatically.
  • Device floor (the catch): iPhone Air, iPhone 17 Pro / Pro Max, M4+ iPads ≥12 GB, M3+ Macs ≥12 GB, Apple Vision Pro M5 — roughly 25–30% of US iPhones. Plan to degrade to text-only or route to PCC on weaker devices.

Private Cloud Compute — Apple's privacy-grade cloud tier

PCC is the cloud half of the stack — not a traditional cloud LLM, but a cryptographically attested compute environment where Apple proves no party (including Apple) can read the data. Announced for Apple's own use at WWDC 2024, it opened to third-party developers at WWDC 2026. It hosts three models: AFM 3 Cloud (the server-side workhorse), ADM 3 Cloud (Image) for image generation/editing, and AFM 3 Cloud Pro (the most capable, for agentic tool use and complex reasoning). Notably, AFM 3 Cloud Pro runs on Google Cloud infrastructure with NVIDIA GPUs — the first time PCC has extended beyond Apple silicon — with privacy guarantees maintained via the same attestation.

  • What PCC adds over Core: ~32,000-token context, chain-of-thought reasoning at three intensities (.light, .moderate, .deep), multi-step planning and agentic tool-calling, long-document summarization, multi-page OCR, and multi-image reasoning.
  • What PCC can't do: image generation (that's the separate ADM 3 surface), direct audio/video input (transcribe with SpeechAnalyzer first), built-in web search or code execution, custom weights or fine-tuning, offline operation, or persistent cross-session memory.
  • Free tier: free for Small Business Program developers (<$1M/year) under 2M lifetime downloads. iCloud+ subscribers get higher per-user daily quotas — that subsidy lands on the user, not the developer. Pricing above 2M downloads is unpublished (TBD).
  • UX implication: the per-user daily quota is shared across every app the user runs. Inspect model.quotaUsage.status before invoking, disable AI-bound buttons near the limit, and surface an iCloud+ upsell where appropriate. Xcode 27 can simulate availability and quota states.
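
The quota guidance above maps naturally onto a small piece of view-state logic. This is a sketch only: `QuotaState` is a hypothetical stand-in for whatever `model.quotaUsage.status` actually reports, and `AIButtonState` and `buttonState` are names I made up for illustration.

```swift
/// Hypothetical stand-in for the real quota status values.
enum QuotaState { case available, nearLimit, exhausted }

/// What the UI should do with a PCC-backed button.
struct AIButtonState: Equatable {
    var isEnabled: Bool
    var showsUpsell: Bool   // e.g. an iCloud+ prompt
    var hint: String?
}

func buttonState(for quota: QuotaState, userHasICloudPlus: Bool) -> AIButtonState {
    switch quota {
    case .available:
        return AIButtonState(isEnabled: true, showsUpsell: false, hint: nil)
    case .nearLimit:
        // Following the guidance above: disable near the limit, not only at it.
        return AIButtonState(isEnabled: false,
                             showsUpsell: !userHasICloudPlus,
                             hint: "Daily AI limit almost reached")
    case .exhausted:
        return AIButtonState(isEnabled: false,
                             showsUpsell: !userHasICloudPlus,
                             hint: "Daily AI limit reached")
    }
}
```

Because the quota is shared across apps, the state should be recomputed each time the screen appears rather than cached from an earlier request.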

Core AI — shipping custom models on-device

Core AI is a new framework (distinct from, and not replacing, the older Core ML) built for generative-era workloads: LLMs in the 10B+ range, image segmentation models, and custom domain models. Core ML stays for its legacy image-classifier and tabular cases.

The pipeline:

  1. Convert a PyTorch model to Apple's .aimodel format via torch.export.
  2. Compress with coreai-opt (int4 per-channel symmetric is the standard preset, with K-means palettization for embeddings and FP4/FP8 for sensitive layers).
  3. AOT-compile per target device with xcrun coreai-build.
  4. Distribute via Background Assets, not the app bundle.
  5. Run through high-level wrappers like CoreAILanguageModel or CoreAIImageSegmenter.

The headline demo — SAM3: Meta's Segment Anything Model 3 ran on iPhone after compressing from ~3 GB to ~430 MB at int4 (about an 85% reduction) with no meaningful quality loss, plus a 76% speedup from cached image-encoder reuse via the multi-entrypoint asset feature.
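
The "about 85%" figure follows directly from the two sizes. A quick check, treating ~3 GB as 3,000 MB (my rounding assumption) and using a hypothetical helper name:

```swift
/// Size reduction from compression, as a percentage of the original size.
func reductionPercent(originalMB: Double, compressedMB: Double) -> Double {
    return (1.0 - compressedMB / originalMB) * 100.0
}

// Figures from the SAM3 demo: roughly 3,000 MB down to roughly 430 MB.
let sam3Reduction = reductionPercent(originalMB: 3_000, compressedMB: 430)
// (1 - 430 / 3000) * 100 ≈ 85.7
```

The same helper is handy for sanity-checking your own models before deciding whether a download is small enough to ship via Background Assets.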

API and lifecycle: three core types — AIModel, InferenceFunction, and NDArray — built on memory-safe, non-escapable Swift types. Specialization (per-device compile) is a formal lifecycle step that can be slow on first load, so trigger it ahead of time; AIModelCache manages cached artifacts and can share them across apps in the same App Group. For transformers, Core AI adds model states and in-place KV caching to avoid recomputing context. A dedicated Core AI Debugger visualizes execution, inspects tensors, and traces operations back to the original Python source.
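
The "specialize ahead of time, then reuse" lifecycle can be illustrated with a tiny cache. This is not the Core AI API: `SpecializationCache` is a hypothetical class that only shows the pattern of running an expensive per-device step once and reusing the result, and it is not thread-safe as written.

```swift
/// Hypothetical cache that runs an expensive per-device "specialize" step
/// at most once per (model, device) pair and then reuses the result.
/// Illustrates the lifecycle only; it is not AIModelCache.
final class SpecializationCache<Artifact> {
    private var artifacts: [String: Artifact] = [:]
    private(set) var compileCount = 0

    /// Returns the cached artifact, compiling it first if needed.
    func artifact(model: String, device: String,
                  compile: () -> Artifact) -> Artifact {
        let key = "\(model)@\(device)"
        if let cached = artifacts[key] { return cached }
        compileCount += 1          // the slow path: runs once per key
        let fresh = compile()
        artifacts[key] = fresh
        return fresh
    }
}
```

The point of the pattern is to call the slow path early, for example right after the model download finishes, so the first user-visible request hits the cached artifact.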

What ships today: Apple's open-source apple/coreai-models package includes ready pipelines for Qwen3, the Mistral family, and SAM3. These surface through CoreAILanguageModel as Foundation Models providers, so the same @Generable struct, streaming, and tool-call API work against a custom model you ship.

Limits: PyTorch source is required (no direct GGUF/weights-only conversion); torch.export is strict about dynamic shapes and control flow; custom CUDA kernels must be ported to Metal; there's roughly a 1 GB post-compression memory ceiling on iPhone (caps practical size around 10–30B at int4; Mac is much higher); no automatic .mlmodel-to-.aimodel converter; some open-weight licenses (notably Llama) prohibit App Store redistribution; and first-load specialization is slow, so schedule it via Background Assets.

LanguageModel protocol — the provider-agnostic Swift API

The LanguageModel protocol is the unifying abstraction. Anything that conforms — Apple's models, a custom Core AI model, Claude, or Gemini — drops into the same LanguageModelSession, and the downstream call sites stay identical. Five providers ship at WWDC 2026:

Provider | Where it runs | How you get it
Apple Foundation Models | On-device (Core / Core Advanced) and PCC | Built into the OS (SystemLanguageModel)
MLX | On-device, any open-weight model converted to MLX | Apple's MLX framework
Core AI | On-device, custom .aimodel assets | CoreAILanguageModel
Anthropic Claude | Cloud (Anthropic-hosted) | ClaudeForFoundationModels SPM (Apache-2.0)
Google Gemini | Cloud (Google-hosted) | Firebase Apple SDK

What it enables: provider-tier routing inside one app (cheap on-device for triage, PCC for harder tasks, Claude/Gemini for frontier needs), per-customer model selection in B2B without code changes, failover patterns, and future-proof shells where the "best model" is a runtime config.
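
The failover pattern is easiest to see with a stripped-down abstraction. The sketch below deliberately does not use the real LanguageModel protocol, which is richer (sessions, streaming, tools): `TextProvider`, `FailoverProvider`, and `StubProvider` are hypothetical types that show only the shape of the idea.

```swift
/// Hypothetical minimal provider abstraction, for illustration only.
protocol TextProvider {
    var name: String { get }
    func respond(to prompt: String) throws -> String
}

struct ProviderUnavailable: Error {}

/// Tries providers in priority order and returns the first success.
struct FailoverProvider: TextProvider {
    let name = "failover"
    let providers: [any TextProvider]

    func respond(to prompt: String) throws -> String {
        var lastError: Error = ProviderUnavailable()
        for provider in providers {
            do { return try provider.respond(to: prompt) }
            catch { lastError = error }   // remember why, then try the next one
        }
        throw lastError
    }
}

/// Stub standing in for an on-device or cloud back end.
struct StubProvider: TextProvider {
    let name: String
    let isUp: Bool

    func respond(to prompt: String) throws -> String {
        guard isUp else { throw ProviderUnavailable() }
        return "[\(name)] \(prompt)"
    }
}
```

Because the failover wrapper conforms to the same protocol as the providers it wraps, call sites never learn which back end answered, which is the property that makes per-customer model selection a configuration change.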

Gotchas: provider parity isn't guaranteed (Claude can throw .unsupportedGenerationGuide for structured outputs Apple supports); API keys must use proxied auth in production; you own the third-party token bill; tool support varies (Claude exposes .webSearch/.webFetch/.codeExecution; Apple's on-device model has OCRTool, BarcodeReaderTool, SpotlightSearchTool); and OpenAI is an Xcode 27 agent option but not yet a first-party LanguageModel conformer.

Multimodal image input, SpeechAnalyzer, and on-device personalization

  • On-device image input: AFM 3 Core Advanced reasons over photos via Attachment(image) in the prompt builder (UIImage, CGImage, CVPixelBuffer, file URLs, and more), with OCRTool() and BarcodeReaderTool() built in, and can fill a @Generable struct directly from a photo. Image attachments cost tokens proportional to size, so downscale before sending. It returns text/structured output, not pixel masks — use Vision's segmentation requests for masks — and devices on plain AFM 3 Core fall back to text-only.
  • SpeechAnalyzer + AI: Apple's on-device transcription stack (SpeechTranscriber, DictationTranscriber, SpeechDetector) now feeds transcripts straight into LanguageModelSession. The fully on-device shape — transcribe locally, then summarize/structure with AFM 3 — means audio never leaves the device, which unlocks meeting recorders, voice journals, field-inspection reports, and medical/legal session notes. (For direct audio input you still route to a provider like Gemini.)
  • On-device adapter training: iOS 27 ships a Foundation Models Adapter Trainer that fine-tunes a small LoRA-style adapter per user, entirely on-device. The base model updates with the OS; the megabyte-scale adapter persists in the app sandbox; the training data never leaves the phone. It's ideal for writing-style learning, per-user domain vocabulary, and tone-matching — but cold-start users have no adapter (design for day-zero usefulness), adapter quality depends on data quality, and training is compute-heavy, so schedule it for idle/charging periods.
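
Since image attachments cost tokens in proportion to size, it helps to compute a target size before attaching anything. A minimal sketch: `downscaledSize` is a hypothetical helper, and any particular `maxSide` value is my own choice for illustration, not an Apple recommendation.

```swift
/// Scales (width, height) down so the longer side is at most `maxSide`,
/// preserving aspect ratio. Never upscales.
func downscaledSize(width: Int, height: Int, maxSide: Int) -> (width: Int, height: Int) {
    let longer = max(width, height)
    // Already small enough (or an invalid limit): leave the size unchanged.
    guard maxSide > 0, longer > maxSide else { return (width, height) }
    let scale = Double(maxSide) / Double(longer)
    let newWidth = max(1, Int((Double(width) * scale).rounded()))
    let newHeight = max(1, Int((Double(height) * scale).rounded()))
    return (newWidth, newHeight)
}
```

The resulting dimensions can then be fed to whatever resizing path the app already uses before the image is attached to the prompt.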

The cost model — what's free, what isn't

Tier | Cost to developer | Cost to user
AFM 3 Core (on-device) | $0 (no metering) | $0 (runs on their phone)
AFM 3 Core Advanced (on-device) | $0 (no metering) | $0 (runs on their phone)
PCC, Small Business Program, <2M downloads | $0 (free) | Per-user daily quota; higher with iCloud+
PCC, above 2M lifetime downloads | Pricing not yet published (TBD) | Same per-user daily quota
Core AI (custom on-device models) | $0 (no metering) | $0; one-time Background Assets download
Anthropic Claude via LanguageModel | Anthropic's per-token pricing | $0
Google Gemini via LanguageModel | Google's per-token pricing | $0

Keeping PCC free requires both conditions: enrollment in the App Store Small Business Program (under $1M/year) and fewer than 2 million lifetime first-time downloads across all apps. A single app crossing 2M downloads exits the free tier; Apple's stated intent is to publish above-threshold pricing before any developer is forced off.
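
The two-condition rule is simple enough to encode and monitor. A sketch using the thresholds stated above; the constant and function names are hypothetical, and how Apple actually counts proceeds and downloads is not something this code knows.

```swift
/// Thresholds as stated in this article (names are hypothetical).
let smallBusinessProceedsLimitUSD = 1_000_000.0
let freeTierDownloadLimit = 2_000_000

/// Both conditions must hold for PCC to stay free.
func qualifiesForFreePCC(enrolledInSmallBusinessProgram: Bool,
                         annualProceedsUSD: Double,
                         lifetimeDownloads: Int) -> Bool {
    return enrolledInSmallBusinessProgram
        && annualProceedsUSD < smallBusinessProceedsLimitUSD
        && lifetimeDownloads < freeTierDownloadLimit
}

/// Downloads left before the free tier ends (zero once the limit is reached).
func downloadsRemaining(lifetimeDownloads: Int) -> Int {
    return max(0, freeTierDownloadLimit - lifetimeDownloads)
}
```

Tracking the remaining headroom in a dashboard gives a planning signal well before the threshold, which matters while above-threshold pricing is still unpublished.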

Device tiers and install-base reality

Tier | Devices | US install-base share
AFM 3 Core eligible | iPhone 15 Pro/Pro Max; iPhone 16 line; iPhone Air; iPhone 17 / 17 Pro / Pro Max | ~65–70% (rising ~1–2 pts/month)
AFM 3 Core Advanced eligible | iPhone Air; iPhone 17 Pro / Pro Max; M4+ iPads ≥12 GB; M3+ Macs ≥12 GB; Apple Vision Pro M5 | ~25–30%
Not eligible | iPhone 14 line and older; base iPhone 15 / 15 Plus | ~30–35%

The AI-capable base is the largest it has ever been at a feature launch, but the ~30–35% of users on non-eligible devices is not an edge case, and much of that hardware is recent and still on sale. Build for graceful degradation.
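
One way to make degradation explicit is to map each device tier in the table to a feature mode up front. The types below are hypothetical and only mirror the three tiers above; how an app actually detects its tier is outside this sketch.

```swift
/// Hypothetical tiers mirroring the table above.
enum DeviceTier { case coreAdvanced, core, notEligible }

/// What a photo-to-structured-data feature can offer on each tier.
enum FeatureMode: Equatable {
    case imageAndText    // photo input handled on-device
    case textOnly        // on-device text features only
    case manualFallback  // no AI: plain form with manual entry
}

func featureMode(for tier: DeviceTier) -> FeatureMode {
    switch tier {
    case .coreAdvanced: return .imageAndText
    case .core:         return .textOnly
    case .notEligible:  return .manualFallback
    }
}
```

Using an exhaustive switch means a new tier added later fails to compile until someone decides what its users should see, which is the behavior I want from a degradation path.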

Frequently asked questions

Is WWDC 2026 the year Apple's AI finally delivered?

It depends on what you're waiting for. The consumer-facing Siri AI assistant is still delayed, and in the EU it's restricted for DMA reasons. But for developers, WWDC 2026 delivered: a free on-device model on every capable iPhone, a privacy-grade cloud tier, a framework for shipping custom models on-device, and one Swift protocol that makes providers interchangeable.

Is Apple's on-device AI free to use?

For most teams, yes. On-device models (AFM 3 Core and Core Advanced) and custom Core AI models carry no metering. Private Cloud Compute is free for App Store Small Business Program developers under 2 million lifetime downloads. The only metered costs come from deliberately routing to Anthropic Claude or Google Gemini.

What's the difference between Foundation Models, PCC, and Core AI?

Foundation Models gives you Apple's built-in on-device models (and PCC access) through a Swift API. Private Cloud Compute is Apple's privacy-grade cloud tier for heavier tasks that need more context or chain-of-thought reasoning. Core AI is the framework for converting and running your own custom models — open-source or proprietary — directly on the device.

Can I run a custom or proprietary model on iPhone now?

Yes. Core AI converts a PyTorch model to Apple's .aimodel format, compresses it (often ~85% smaller at int4), and ships it via Background Assets. The model and its data never leave the device — which is what makes it viable for regulated industries like healthcare, legal, and finance.

Which iPhones support Apple Intelligence and Foundation Models?

AFM 3 Core runs on iPhone 15 Pro and newer (~65–70% of US iPhones). The higher-capability AFM 3 Core Advanced, which adds image input, requires top-tier devices like the iPhone Air and iPhone 17 Pro (~25–30%). The iPhone 14 line and base iPhone 15 are not eligible, so plan a degradation path.

How does the new LanguageModel protocol help future-proof an app?

It makes the model a runtime choice instead of a compile-time dependency. The same LanguageModelSession and downstream code can target Apple's on-device model, PCC, a custom Core AI model, Claude, or Gemini — so switching providers, or supporting different providers per B2B customer, is a configuration change rather than a rewrite.

Sources & further reading

Apple-original sources

WWDC 2026 sessions cited: 241 (What's new in the Foundation Models framework), 319 (Build with the new Apple Foundation Model on PCC), 324 (Meet Core AI), 325 (Dive into Core AI model authoring and optimization), 326 (Integrate on-device AI models into your app using Core AI), 339 (Bring an LLM provider to the Foundation Models framework).

Third-party context: 9to5mac (on-device AI explainer), appcircle.io (Core AI framework explained), and the open-source apple/coreai-models recipes (Qwen3, Mistral, SAM3). Third-party LanguageModel conformers: ClaudeForFoundationModels (GitHub) and the Firebase Apple SDK for Gemini.

For the past two years, the AI story in the Apple ecosystem has followed the same pattern — a staggered Apple Intelligence rollout, a delayed Siri rebuild, "personal context" landing only partially across releases, and the EU getting almost none of it. Heading into WWDC 2026, the question was simple: is this finally the year Apple delivers, or just another chapter of delay?

From my perspective as an iOS engineer at NineTwoThree, here's the honest answer — for the consumer-facing Siri AI experience, the wait continues. But for on-device developer AI, this is genuinely the year the platform landed, and in a way that changes the economics of mobile AI features we'd previously written off as too expensive to ship.

What actually landed for developers at WWDC 2026

Apple opened the Foundation Models framework to third-party apps. Every Apple Intelligence-capable iPhone now ships with a 3-billion-parameter on-device model — AFM 3 Core — accessible through a native Swift API. It handles writing, summarization, translation across 25 languages, content classification, and tool calls, running at roughly 30 tokens per second on an iPhone 15 Pro. No cloud bill, no API key, no quota.

For heavier tasks, Private Cloud Compute (PCC) is now developer-accessible — a cryptographically attested environment where Apple can't read the request, verified independently by security researchers. And here's the part that changes the calculation for us: for developers in the App Store Small Business Program with fewer than 2 million lifetime downloads, PCC is free. The per-token cloud cost that has historically killed mobile AI features isn't really a blocker anymore for most teams.

There's also a new unifying Swift protocol — LanguageModel — that lets the same session and API run against Apple's on-device model, PCC, MLX, Anthropic Claude, or Google Gemini. Switching providers becomes a one-line change rather than a rewrite.

The three things that flipped

For about three years the industry consensus has been simple: real AI runs in the cloud, and mobile clients are thin wrappers. That held because per-token cloud inference was expensive, model files were too big to ship to a phone, and on-device hardware couldn't run a usable language model. WWDC 2026 broke all three assumptions at once.

  1. Free per-user inference became the default. A 3B model ships on every capable iPhone, plus a privacy-grade cloud tier that's free for most small and mid-sized teams.
  2. Model size stopped being a packaging problem. Core AI's compression toolchain shrinks an open model like SAM3 from ~3 GB to ~430 MB at int4 with no meaningful quality loss, and ships it via Background Assets instead of the app bundle.
  3. Provider lock-in became optional. One Swift protocol lets the same app call Apple's model, Claude, or Gemini — chosen at runtime, not at compile time.

The net effect: a lot of features we rejected as "too expensive to operate" two years ago are now zero-bill, on-device features.

Why this changes the economics of mobile AI

The single biggest commercial shift is the cost model. Most of the stack is now free for most teams. On-device models (AFM 3 Core and Core Advanced) carry no metering at all. PCC is free under the Small Business Program below 2M lifetime downloads. Custom on-device models built with Core AI cost nothing to run — the user downloads them once via Background Assets. The only metered costs are deliberate choices to route to Claude or Gemini for capabilities Apple's models don't yet match.

That reframes the conversation we have with clients. Features that used to come with an open-ended per-image or per-token operating bill — receipt parsing, photo-to-structured-data, message rewriting, content tagging — are now fixed-cost or free, and they run on the phone.

What this unlocks for regulated industries

There's a second story that matters even more for the healthcare, legal, and finance clients we work with: the privacy posture is different now. For years the blocking question on any regulated build has been "can we send this data to a cloud LLM?" — and the honest answer was usually no.

On this stack, that question finally has a defensible answer. On-device inference means the data never leaves the phone. PCC means that when you do need the cloud, Apple can cryptographically prove — to the client and to independent researchers — that no one, including Apple, can read the request. And Core AI means a client's proprietary model (a medical imaging classifier, a fraud-detection model, a domain-specific LLM they own) can ship to a phone without the model or its data ever touching a third-party vendor. That path simply didn't exist cleanly before.

What I'd build on this stack today

A few patterns are immediately viable, and I'd reach for them in this order:

  • On-device only (zero cost, zero data exposure): note and email summarization, structured extraction from receipts and forms, content classification, local Q&A grounded in the user's own content, voice-note transcription with on-device summary, style-rewrite assistants, and writing assistants that personalize to the user over time.
  • Route to PCC (free at small scale): long-document summarization (50-page contracts), multi-page OCR, multi-step or agentic workflows that need chain-of-thought, and anything over the 8K on-device context window.
  • Route to Core AI (custom on-device models): domain-specific LLMs starting from Qwen3 or Mistral, proprietary classifiers a client can't send to a third party, image segmentation with SAM3, and fully-local RAG.
  • Reach for Claude or Gemini when you need frontier-tier capability, server-side tools like web search or code execution, direct audio input, or a client has a strong vendor preference.

The honest caveats: device tiers, the EU, and the 2M-download cliff

This isn't unqualified good news, and it's worth being straight about the limits.

Device eligibility is the big one. Apple Intelligence — the gate on Foundation Models access — runs on roughly 65–70% of US iPhones today (iPhone 15 Pro and newer). The higher-capability AFM 3 Core Advanced model, which adds image input, reaches only ~25–30%. That leaves roughly a third of users on devices that get nothing — and many of them are on recent hardware like the iPhone 14 or the base iPhone 15, which is still on sale. Any feature you build needs a graceful degradation path.

The free tier has a ceiling. Cross 2 million lifetime downloads and you exit the free PCC tier. Apple hasn't published pricing above that line yet, so treat it as TBD in any planning conversation.

The EU gets the frameworks but not the assistant. Foundation Models, PCC, Core AI, and the LanguageModel protocol all work in the EU. What's restricted is the consumer Siri AI assistant layer, which Apple attributes to DMA compliance. The developer-facing stack is not regionally gated.

The bottom line

If you were waiting for Siri to become the AI assistant Apple has been promising, this was another year on hold. But if you build apps, WWDC 2026 was the window opening. The platform now hands you free on-device inference, a privacy-grade cloud tier, and a clean path to ship custom models — all behind native Swift APIs. The features we used to price out of existence are suddenly shippable. That's the part worth acting on now.

This is exactly the kind of shift we help teams turn into shipped product. If you're weighing what's now possible on-device — especially in a regulated industry — talk to our team about your AI roadmap.

The technical reference: Apple's on-device AI stack after WWDC 2026

For the engineers: here's what each piece of the stack does, where the limits are, and how the pieces compose. Scope is intentionally narrow — the AI inference surfaces available to a third-party iOS developer, on-device or in PCC.

AFM 3 Core — the free 3B model on every modern iPhone

Apple Foundation Models 3 Core is the on-device model bundled with iOS 27, accessible through the FoundationModels Swift framework. It's a true small LLM, not a feature-specific classifier, and it runs entirely on the device.

  • Capabilities: text generation, rewriting, summarization, translation across 25 languages, classification and structured extraction, multi-turn streaming conversation, and tool calling against built-in and custom Swift tools. Strongly-typed outputs come via the @Generable macro and @Guide field hints.
  • Capacity: 8,000-token context window on iOS 27 (~6,000 words); ~30 tokens/sec on an iPhone 15 Pro; zero cost per request.
  • Device floor: iPhone 15 Pro and newer (~65–70% of US iPhones, rising ~1–2 points/month).
  • Limits to plan for: no chain-of-thought reasoning (route to PCC for multi-step planning); 8K context means longer inputs must be chunked; .refusal and .guardrailViolation errors are part of the contract; English-first at launch, with the other 24 languages rolling in over the cycle.

The framework defaults to AFM 3 Core via SystemLanguageModel.default — swapping to PCC or a third-party provider is a one-line change to the model parameter.

AFM 3 Core Advanced — 20B sparse MoE, top-tier devices only

The high-capability on-device variant is a 20-billion-parameter sparse Mixture-of-Experts model that activates just 1–4B parameters per token. Apple fits it on a phone by storing most of it in flash (NAND) rather than DRAM, using a lightweight dense block to select experts dynamically. Apple credits two techniques: Instruction-Following Pruning (IFP) for deployment beyond DRAM constraints, and Quantization Aware Training (QAT) to preserve accuracy at lower precision.

  • What changes vs. Core: accepts image input (multimodal), and reaches closer to frontier quality on hard summarization, extraction, and reasoning that doesn't strictly require chain-of-thought. Same Swift API — the framework picks the variant automatically.
  • Device floor (the catch): iPhone Air, iPhone 17 Pro / Pro Max, M4+ iPads ≥12 GB, M3+ Macs ≥12 GB, Apple Vision Pro M5 — roughly 25–30% of US iPhones. Plan to degrade to text-only or route to PCC on weaker devices.

Private Cloud Compute — Apple's privacy-grade cloud tier

PCC is the cloud half of the stack — not a traditional cloud LLM, but a cryptographically attested compute environment where Apple proves no party (including Apple) can read the data. Announced for Apple's own use at WWDC 2024, it opened to third-party developers at WWDC 2026. It hosts three models: AFM 3 Cloud (the server-side workhorse), ADM 3 Cloud (Image) for image generation/editing, and AFM 3 Cloud Pro (the most capable, for agentic tool use and complex reasoning). Notably, AFM 3 Cloud Pro runs on Google Cloud infrastructure with NVIDIA GPUs — the first time PCC has extended beyond Apple silicon — with privacy guarantees maintained via the same attestation.

  • What PCC adds over Core: ~32,000-token context, chain-of-thought reasoning at three intensities (.light, .moderate, .deep), multi-step planning and agentic tool-calling, long-document summarization, multi-page OCR, and multi-image reasoning.
  • What PCC can't do: image generation (that's the separate ADM 3 surface), direct audio/video input (transcribe with SpeechAnalyzer first), built-in web search or code execution, custom weights or fine-tuning, offline operation, or persistent cross-session memory.
  • Free tier: free for Small Business Program developers (under $1M/year) with fewer than 2M lifetime downloads. iCloud+ subscribers get higher per-user daily quotas; that subsidy lands on the user, not the developer. Pricing above 2M downloads is unpublished (TBD).
  • UX implication: the per-user daily quota is shared across every app the user runs. Inspect model.quotaUsage.status before invoking, disable AI-bound buttons near the limit, and surface an iCloud+ upsell where appropriate. Xcode 27 can simulate availability and quota states.
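
The quota check in the last bullet is essentially a small state decision made before any request is sent. Below is a sketch of that decision in Python. QuotaSnapshot, the thresholds, and the state names are all hypothetical stand-ins, since the source names model.quotaUsage.status without listing its cases.

```python
from dataclasses import dataclass
from enum import Enum


class ButtonState(Enum):
    ENABLED = "enabled"
    WARN = "enabled_with_warning"
    DISABLED = "disabled_with_upsell"


@dataclass(frozen=True)
class QuotaSnapshot:
    """Hypothetical view of the user's shared daily quota."""
    used: int
    daily_limit: int


def button_state(quota: QuotaSnapshot, warn_at: float = 0.8,
                 disable_at: float = 0.95) -> ButtonState:
    """Pick the UI state for an AI-bound button before invoking the model."""
    if quota.daily_limit <= 0:
        return ButtonState.DISABLED
    fraction = quota.used / quota.daily_limit
    if fraction >= disable_at:
        return ButtonState.DISABLED
    if fraction >= warn_at:
        return ButtonState.WARN
    return ButtonState.ENABLED
```

Because the quota is shared across apps, a snapshot can go stale between the check and the call, so the request path still needs to handle a quota error.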

Core AI — shipping custom models on-device

Core AI is a new framework (distinct from, and not replacing, the older Core ML) built for generative-era workloads: LLMs in the 10B+ range, image segmentation models, and custom domain models. Core ML stays for its legacy image-classifier and tabular cases.

The pipeline: convert a PyTorch model to Apple's .aimodel format via torch.export; compress with coreai-opt (int4 per-channel symmetric is the standard preset, with K-means palettization for embeddings and FP4/FP8 for sensitive layers); AOT-compile per target device with xcrun coreai-build; distribute via Background Assets, not the app bundle; and run through high-level wrappers like CoreAILanguageModel or CoreAIImageSegmenter.
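
For readers unfamiliar with the "int4 per-channel symmetric" preset, here is a small pure-Python illustration of the general technique: one scale per output channel, integer codes centered on zero. It is a textbook sketch, not the coreai-opt implementation, and it assumes the common convention of using the range [-7, 7] so that positive and negative values are treated alike.

```python
def quantize_int4_per_channel(
        weights: list[list[float]]) -> tuple[list[list[int]], list[float]]:
    """Symmetric int4 quantization with one scale per output channel (row).

    Codes fall in [-7, 7] and zero always maps to zero.
    """
    quantized, scales = [], []
    for row in weights:
        peak = max((abs(w) for w in row), default=0.0)
        scale = peak / 7 if peak > 0 else 1.0
        quantized.append([max(-7, min(7, round(w / scale))) for w in row])
        scales.append(scale)
    return quantized, scales


def dequantize(quantized: list[list[int]],
               scales: list[float]) -> list[list[float]]:
    """Recover approximate float weights from int4 codes and per-row scales."""
    return [[q * s for q in row] for row, s in zip(quantized, scales)]
```

Per-channel scales matter because rows with small weights would otherwise lose nearly all their resolution to a single scale set by the largest row. The round-trip error for any weight is at most half of its row's scale.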

The headline demo — SAM3: Meta's Segment Anything Model 3 ran on iPhone after compressing from ~3 GB to ~430 MB at int4 (about an 85% reduction) with no meaningful quality loss, plus a 76% speedup from cached image-encoder reuse via the multi-entrypoint asset feature.
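
The reduction figure can be checked with two lines of arithmetic. This sketch treats "~3 GB" as 3,000 MB, which is an approximation of the rounded numbers quoted above.

```python
def reduction_percent(original_mb: float, compressed_mb: float) -> float:
    """Percentage by which the asset shrank."""
    return (1 - compressed_mb / original_mb) * 100


def compression_ratio(original_mb: float, compressed_mb: float) -> float:
    """How many times smaller the compressed asset is."""
    return original_mb / compressed_mb


sam3_reduction = reduction_percent(3_000, 430)   # about 85.7
sam3_ratio = compression_ratio(3_000, 430)       # about 6.98
```

That is roughly a 7x smaller download, consistent with the "about 85%" quoted for the demo.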

API and lifecycle: three core types — AIModel, InferenceFunction, and NDArray — built on memory-safe, non-escapable Swift types. Specialization (per-device compile) is a formal lifecycle step that can be slow on first load, so trigger it ahead of time; AIModelCache manages cached artifacts and can share them across apps in the same App Group. For transformers, Core AI adds model states and in-place KV caching to avoid recomputing context. A dedicated Core AI Debugger visualizes execution, inspects tensors, and traces operations back to the original Python source.
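
A back-of-envelope count shows why KV caching matters for transformer decoding. The sketch below is a toy model of the general technique, not Core AI's API. It counts how many per-token key/value projections a decoder performs with and without a cache, assuming one projection per token per step.

```python
def projections_without_cache(prompt_tokens: int, generated_tokens: int) -> int:
    """Every step re-projects keys/values for the full sequence seen so far."""
    return sum(prompt_tokens + step for step in range(generated_tokens))


def projections_with_cache(prompt_tokens: int, generated_tokens: int) -> int:
    """Prefill the prompt once, then project only the newest token per step."""
    if generated_tokens == 0:
        return 0
    return prompt_tokens + (generated_tokens - 1)


no_cache = projections_without_cache(1_000, 100)   # 104950
cached = projections_with_cache(1_000, 100)        # 1099
```

For a 1,000-token prompt and 100 generated tokens, the cached path does roughly 95 times less projection work, and the gap widens as the context grows.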

What ships today: Apple's open-source apple/coreai-models package includes ready pipelines for Qwen3, the Mistral family, and SAM3. These surface through CoreAILanguageModel as Foundation Models providers, so the same @Generable struct, streaming, and tool-call API work against a custom model you ship.

Limits: PyTorch source is required (no direct GGUF/weights-only conversion); torch.export is strict about dynamic shapes and control flow; custom CUDA kernels must be ported to Metal; there's roughly a 1 GB post-compression memory ceiling on iPhone (caps practical size around 10–30B at int4; Mac is much higher); no automatic .mlmodel-to-.aimodel converter; some open-weight licenses (notably Llama) prohibit App Store redistribution; and first-load specialization is slow, so schedule it via Background Assets.

LanguageModel protocol — the provider-agnostic Swift API

The LanguageModel protocol is the unifying abstraction. Anything that conforms — Apple's models, a custom Core AI model, Claude, or Gemini — drops into the same LanguageModelSession, and the downstream call sites stay identical. Five providers ship at WWDC 2026:

Provider | Where it runs | How you get it
Apple Foundation Models | On-device (Core / Core Advanced) and PCC | Built into the OS: SystemLanguageModel
MLX | On-device: any open-weight model converted to MLX | Apple's MLX framework
Core AI | On-device: custom .aimodel assets | CoreAILanguageModel
Anthropic Claude | Cloud (Anthropic-hosted) | ClaudeForFoundationModels SPM (Apache-2.0)
Google Gemini | Cloud (Google-hosted) | Firebase Apple SDK

What it enables: provider-tier routing inside one app (cheap on-device for triage, PCC for harder tasks, Claude/Gemini for frontier needs), per-customer model selection in B2B without code changes, failover patterns, and future-proof shells where the "best model" is a runtime config.
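
The failover pattern is easiest to see in miniature. This sketch mirrors the idea in Python using a structural protocol. The class and function names are hypothetical and only imitate the shape of the Swift abstraction, with EchoModel standing in for real providers.

```python
from typing import Protocol


class LanguageModel(Protocol):
    """Hypothetical Python mirror of a provider-agnostic model interface."""
    name: str

    def respond(self, prompt: str) -> str: ...


class ProviderUnavailable(Exception):
    """Raised by a provider that cannot serve the request."""


class EchoModel:
    """Stand-in provider used only for this sketch."""

    def __init__(self, name: str, available: bool = True) -> None:
        self.name = name
        self.available = available

    def respond(self, prompt: str) -> str:
        if not self.available:
            raise ProviderUnavailable(self.name)
        return f"[{self.name}] {prompt}"


def respond_with_failover(providers: list[LanguageModel], prompt: str) -> str:
    """Try providers in priority order; the call site never names a vendor."""
    for provider in providers:
        try:
            return provider.respond(prompt)
        except ProviderUnavailable:
            continue
    raise ProviderUnavailable("no provider could serve the request")
```

Ordering the list as on-device first, then PCC, then a third-party cloud gives tiered routing, and swapping the list per customer gives per-tenant model selection, all without touching the call site.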

Gotchas: provider parity isn't guaranteed (Claude can throw .unsupportedGenerationGuide for structured outputs Apple supports); API keys must use proxied auth in production; you own the third-party token bill; tool support varies (Claude exposes .webSearch/.webFetch/.codeExecution; Apple's on-device model has OCRTool, BarcodeReaderTool, SpotlightSearchTool); and OpenAI is an Xcode 27 agent option but not yet a first-party LanguageModel conformer.

Multimodal image input, SpeechAnalyzer, and on-device personalization

  • On-device image input: AFM 3 Core Advanced reasons over photos via Attachment(image) in the prompt builder (UIImage, CGImage, CVPixelBuffer, file URLs, and more), with OCRTool() and BarcodeReaderTool() built in, and can fill a @Generable struct directly from a photo. Image attachments cost tokens proportional to size, so downscale before sending. It returns text/structured output, not pixel masks — use Vision's segmentation requests for masks — and devices on plain AFM 3 Core fall back to text-only.
  • SpeechAnalyzer + AI: Apple's on-device transcription stack (SpeechTranscriber, DictationTranscriber, SpeechDetector) now feeds transcripts straight into LanguageModelSession. The fully on-device shape — transcribe locally, then summarize/structure with AFM 3 — means audio never leaves the device, which unlocks meeting recorders, voice journals, field-inspection reports, and medical/legal session notes. (For direct audio input you still route to a provider like Gemini.)
  • On-device adapter training: iOS 27 ships a Foundation Models Adapter Trainer that fine-tunes a small LoRA-style adapter per user, entirely on-device. The base model updates with the OS; the megabyte-scale adapter persists in the app sandbox; the training data never leaves the phone. It's ideal for writing-style learning, per-user domain vocabulary, and tone-matching — but cold-start users have no adapter (design for day-zero usefulness), adapter quality depends on data quality, and training is compute-heavy, so schedule it for idle/charging periods.
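
Since image attachments cost tokens in proportion to their size, the downscale step in the first bullet is worth doing deliberately. Here is a sketch of the target-size arithmetic in Python. The 1,024-pixel cap is a hypothetical default, not a documented limit, and the actual resizing would be done with the platform's image APIs.

```python
def downscaled_size(width: int, height: int,
                    max_side: int = 1024) -> tuple[int, int]:
    """Fit an image inside max_side x max_side, keeping aspect ratio.

    Images that already fit are returned unchanged (no upscaling).
    """
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("dimensions must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

A 4032 x 3024 photo becomes 1024 x 768 under this cap, about 15.5 times fewer pixels. How that maps to tokens depends on the framework's accounting, which the source does not specify.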

The cost model — what's free, what isn't

Tier | Cost to developer | Cost to user
AFM 3 Core (on-device) | $0 (no metering) | $0 (runs on their phone)
AFM 3 Core Advanced (on-device) | $0 (no metering) | $0 (runs on their phone)
PCC: Small Business Program, <2M downloads | $0 (free) | Per-user daily quota; higher with iCloud+
PCC: above 2M lifetime downloads | Pricing not yet published (TBD) | Same per-user daily quota
Core AI (custom on-device models) | $0 (no metering) | $0; one-time Background Assets download
Anthropic Claude via LanguageModel | Anthropic's per-token pricing | $0
Google Gemini via LanguageModel | Google's per-token pricing | $0

Keeping PCC free requires both conditions: enrollment in the App Store Small Business Program (under $1M/year) and fewer than 2 million lifetime first-time downloads across all apps. A single app crossing 2M downloads exits the free tier; Apple's stated intent is to publish above-threshold pricing before any developer is forced off.
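
The two-condition rule is simple enough to encode as a guard in a cost-planning script. This is a sketch with hypothetical parameter names. How downloads are counted (per app or across all apps) is left to the caller, because the source describes it both ways.

```python
def pcc_is_free(in_small_business_program: bool,
                annual_proceeds_usd: float,
                lifetime_downloads: int) -> bool:
    """True only when every free-tier condition described above holds."""
    return (in_small_business_program
            and annual_proceeds_usd < 1_000_000
            and lifetime_downloads < 2_000_000)
```

A team at $400K/year and 1.5M downloads stays free. The same team at 2M downloads does not, and its cost is unknown until above-threshold pricing is published.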

Device tiers and install-base reality

Tier | Devices | US install-base share
AFM 3 Core eligible | iPhone 15 Pro/Pro Max; iPhone 16 line; iPhone Air; iPhone 17 / 17 Pro / Pro Max | ~65–70% (rising ~1–2 pts/month)
AFM 3 Core Advanced eligible | iPhone Air; iPhone 17 Pro / Pro Max; M4+ iPads ≥12 GB; M3+ Macs ≥12 GB; Apple Vision Pro M5 | ~25–30%
Not eligible | iPhone 14 line and older; base iPhone 15 / 15 Plus | ~30–35%

The AI-capable base is the largest it has ever been at a feature launch — but ~30% on non-eligible devices is not an edge case, and much of it is recent hardware still in active sale. Build for graceful degradation.
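
Graceful degradation comes down to a small decision table. The sketch below expresses one possible ordering in Python. The capability flags and plan names are hypothetical, and preferring PCC over text-only for image features is an assumption, not a recommendation from the source. In a real app the flags would come from the framework's availability checks rather than from device-name matching.

```python
from enum import Enum


class Plan(Enum):
    ON_DEVICE_ADVANCED = "on_device_advanced"
    ON_DEVICE_CORE = "on_device_core"
    PCC = "pcc"
    TEXT_ONLY = "text_only_on_core"
    NO_AI = "non_ai_fallback"


def choose_plan(needs_image: bool, has_core: bool, has_advanced: bool,
                pcc_available: bool) -> Plan:
    """Pick the best available path for a feature on this device."""
    if has_advanced:
        return Plan.ON_DEVICE_ADVANCED
    if needs_image:
        if pcc_available:
            return Plan.PCC
        return Plan.TEXT_ONLY if has_core else Plan.NO_AI
    if has_core:
        return Plan.ON_DEVICE_CORE
    return Plan.PCC if pcc_available else Plan.NO_AI
```

The NO_AI branch is the one that is easy to forget. With roughly 30% of devices not eligible, it needs a designed experience, not an error screen.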

Frequently asked questions

Is WWDC 2026 the year Apple's AI finally delivered?

It depends on what you're waiting for. The consumer-facing Siri AI assistant is still delayed, and in the EU it's restricted for DMA reasons. But for developers, WWDC 2026 delivered: a free on-device model on every capable iPhone, a privacy-grade cloud tier, a framework for shipping custom models on-device, and one Swift protocol that makes providers interchangeable.

Is Apple's on-device AI free to use?

For most teams, yes. On-device models (AFM 3 Core and Core Advanced) and custom Core AI models carry no metering. Private Cloud Compute is free for App Store Small Business Program developers under 2 million lifetime downloads. The only metered costs come from deliberately routing to Anthropic Claude or Google Gemini.

What's the difference between Foundation Models, PCC, and Core AI?

Foundation Models gives you Apple's built-in on-device models (and PCC access) through a Swift API. Private Cloud Compute is Apple's privacy-grade cloud tier for heavier tasks that need more context or chain-of-thought reasoning. Core AI is the framework for converting and running your own custom models — open-source or proprietary — directly on the device.

Can I run a custom or proprietary model on iPhone now?

Yes. Core AI converts a PyTorch model to Apple's .aimodel format, compresses it (often ~85% smaller at int4), and ships it via Background Assets. The model and its data never leave the device — which is what makes it viable for regulated industries like healthcare, legal, and finance.

Which iPhones support Apple Intelligence and Foundation Models?

AFM 3 Core runs on iPhone 15 Pro and newer (~65–70% of US iPhones). The higher-capability AFM 3 Core Advanced, which adds image input, requires top-tier devices like the iPhone Air and iPhone 17 Pro (~25–30%). The iPhone 14 line and base iPhone 15 are not eligible, so plan a degradation path.

How does the new LanguageModel protocol help future-proof an app?

It makes the model a runtime choice instead of a compile-time dependency. The same LanguageModelSession and downstream code can target Apple's on-device model, PCC, a custom Core AI model, Claude, or Gemini — so switching providers, or supporting different providers per B2B customer, is a configuration change rather than a rewrite.

Sources & further reading

Apple-original sources

WWDC 2026 sessions cited: 241 (What's new in the Foundation Models framework), 319 (Build with the new Apple Foundation Model on PCC), 324 (Meet Core AI), 325 (Dive into Core AI model authoring and optimization), 326 (Integrate on-device AI models into your app using Core AI), 339 (Bring an LLM provider to the Foundation Models framework).

Third-party context: 9to5mac (on-device AI explainer), appcircle.io (Core AI framework explained), and the open-source apple/coreai-models recipes (Qwen3, Mistral, SAM3). Third-party LanguageModel conformers: ClaudeForFoundationModels (GitHub) and the Firebase Apple SDK for Gemini.
