Build with the new Apple Foundation Model on Private Cloud Compute

Private Cloud Compute lets you access powerful, frontier-class models while protecting user privacy. Explore how it works and how to access it using the Foundation Models framework. Discover best practices for checking availability and handling graceful fallbacks in your apps.

Watch Video (11 min)4 min read

Key Takeaways

🔄 Server model is a one-line swap on LanguageModelSession — tools and @Generable stay the same
🧠 PCC for reasoning and 32K context; on-device when you need offline with no daily cap
💰 Free PCC usage for apps under 2M downloads, no API keys
📋 Quota is per-user via iCloud — gate on isAvailable, surface quotaUsage in UI

Introduction

Private Cloud Compute (PCC) exposes a server LLM through Foundation Models
Apple’s Server Model and On-device model are accessed through the same APIs
PCC is for workloads that need more context, reasoning, or heavy tool loops

What is Private Cloud Compute

Server model; request data isn’t stored or retained after the response
Wired into OS/iCloud — no auth setup, no API keys, no per-token billing for developers (for apps with under 2M downloads)
Per-user daily quota (higher tier with iCloud+) for users
Requires managed entitlement

Integrating PCC with Foundation Models

Default LanguageModelSession() uses on-devicemodel ; pass PrivateCloudComputeLanguageModel() to switch
@Generable, tools, and respond signatures are unchanged between models
PCC features can be gated based on a users available usage quota using model.isAvailable

import FoundationModels

// On-device (default)
let session = LanguageModelSession()
let response = try await session.respond(to: "Summarize this article: \(article)")

// PCC — one-line change
let pccSession = LanguageModelSession(model: PrivateCloudComputeLanguageModel())
let pccResponse = try await pccSession.respond(to: "Summarize this article: \(article)")

let session = LanguageModelSession(
    model: PrivateCloudComputeLanguageModel(),
    tools: [FindRelatedArticlesTool.self]
)

let response = try await session.respond(
    to: "Summarize this article: \(article)",
    generating: ArticleSummary.self  // @Generable works the same
)

On-device vs PCC

	On-device	PCC
Offline	✅	❌
Daily limit	None	Per-user quota
`contextSize`	4K (8K on newer devices in 27.0)	32K
Reasoning	❌	✅

Reasoning and context

Reasoning = extra transcript text before the response; levels: .light, .moderate, .deep
Set via ContextOptions(reasoningLevel:) on respond
Reasoning tokens count against contextSize — check the property on each model type

let response = try await session.respond(
    to: prompt,
    contextOptions: ContextOptions(reasoningLevel: .light)
)

SystemLanguageModel().contextSize          // 4096 (26.0), 8192 (27.0)
PrivateCloudComputeLanguageModel().contextSize  // 32768

Usage limits

Quota is per-user via iCloud, not per-app
Surface limit state in persistent UI, not alerts
quotaUsage.status has an approaching-limit case
- Quota states can be tested through scheme settings to simulate Apple Foundation Model Availabilities

if case .belowLimit(let info) = model.quotaUsage.status, info.isApproachingLimit {
    // warn
}
if model.quotaUsage.isLimitReached {
    // disable + explain
}
if let suggestion = model.quotaUsage.limitIncreaseSuggestion {
    Button("Show options") { suggestion.show() }
}

Missing anything? Corrections? Contributions are welcome!

Written By

harrison-heinig

9 notes contributed

Contributed Notes GitHub