Key Takeaways
🔄 Server model is a one-line swap on `LanguageModelSession`; tools and `@Generable` stay the same
🧠 PCC for reasoning and 32K context; on-device when you need offline use with no daily cap
💰 Free PCC usage for apps under 2M downloads, no API keys
📋 Quota is per-user via iCloud; gate on `isAvailable` and surface `quotaUsage` in UI
Introduction
Private Cloud Compute (PCC) exposes a server LLM through Foundation Models
Apple’s server model and on-device model are accessed through the same APIs
PCC is for workloads that need more context, reasoning, or heavy tool loops
What is Private Cloud Compute
Server model; request data isn’t stored or retained after the response
Wired into the OS and iCloud: no auth setup, no API keys, and no per-token billing for developers (for apps with under 2M downloads)
Per-user daily quota, with a higher tier for iCloud+ users
Requires a managed entitlement
Integrating PCC with Foundation Models
Default `LanguageModelSession()` uses the on-device model; pass `PrivateCloudComputeLanguageModel()` to switch
`@Generable`, tools, and `respond` signatures are unchanged between models
PCC features can be gated on the user's available usage quota using `model.isAvailable`
```swift
import FoundationModels

// `article` is a String already in scope

// On-device (default)
let session = LanguageModelSession()
let response = try await session.respond(to: "Summarize this article: \(article)")

// PCC: one-line change
let pccSession = LanguageModelSession(model: PrivateCloudComputeLanguageModel())
let pccResponse = try await pccSession.respond(to: "Summarize this article: \(article)")
```

```swift
// Tools and guided generation carry over unchanged.
// FindRelatedArticlesTool (a Tool) and ArticleSummary (a @Generable type) are app-defined.
let session = LanguageModelSession(
    model: PrivateCloudComputeLanguageModel(),
    tools: [FindRelatedArticlesTool()]   // tools are passed as instances
)
let response = try await session.respond(
    to: "Summarize this article: \(article)",
    generating: ArticleSummary.self      // @Generable works the same
)
// response.content is an ArticleSummary
```
On-device vs PCC
| | On-device | PCC |
|---|---|---|
| Offline | ✅ | ❌ |
| Daily limit | None | Per-user quota |
| `contextSize` | 4K (8K on newer devices in 27.0) | 32K |
| Reasoning | ❌ | ✅ |
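The comparison above can be turned into a simple routing rule. This is a minimal sketch in plain Swift, assuming you already have a token estimate for the request and have read PCC availability (for example from `isAvailable`) into a `Bool`. `ModelChoice`, `Workload`, and `chooseModel` are hypothetical names, not Foundation Models API, and the context sizes are the figures from the table.

```swift
/// Hypothetical routing target; not a Foundation Models type.
enum ModelChoice: Equatable {
    case onDevice
    case privateCloudCompute
    case unavailable
}

/// Hypothetical description of one request's needs.
struct Workload {
    var needsOffline: Bool
    var needsReasoning: Bool
    var estimatedTokens: Int   // prompt + expected response (+ reasoning, if any)
}

/// Picks a model using the rules from the comparison table:
/// only on-device works offline, only PCC reasons, and PCC has the larger context.
func chooseModel(
    for workload: Workload,
    onDeviceContextSize: Int,        // 4096 or 8192, per the table
    pccContextSize: Int = 32_768,
    pccAvailable: Bool               // e.g. read from the PCC model's isAvailable
) -> ModelChoice {
    let fitsOnDevice = workload.estimatedTokens <= onDeviceContextSize
    let fitsPCC = workload.estimatedTokens <= pccContextSize

    if workload.needsOffline {
        // PCC is not available offline, so on-device is the only option.
        return (fitsOnDevice && !workload.needsReasoning) ? .onDevice : .unavailable
    }
    if !workload.needsReasoning && fitsOnDevice {
        // Prefer on-device when it suffices: no network round trip, no quota spent.
        return .onDevice
    }
    return (pccAvailable && fitsPCC) ? .privateCloudCompute : .unavailable
}
```

Preferring on-device whenever the request fits is one design choice among several. It saves the user's daily PCC quota for requests that actually need reasoning or the larger context.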
Reasoning and context
Reasoning adds extra transcript text before the response; levels are `.light`, `.moderate`, and `.deep`
Set the level via `ContextOptions(reasoningLevel:)` on `respond`
Reasoning tokens count against `contextSize`; check the property on each model type
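Reasoning tokens share the context window with the prompt and the response, so a rough pre-flight check comes down to plain arithmetic. This is a minimal sketch, assuming you supply your own token estimates. The reasoning reserve is a parameter you choose, not a documented per-level figure, and `ContextBudget` is a hypothetical name.

```swift
/// Hypothetical pre-flight check: prompt, reasoning, and response share one window.
struct ContextBudget {
    let contextSize: Int  // e.g. 4_096 / 8_192 on-device, 32_768 for PCC

    /// Tokens left for the prompt once reasoning and response are reserved.
    func promptAllowance(reasoningReserve: Int, responseReserve: Int) -> Int {
        max(0, contextSize - reasoningReserve - responseReserve)
    }

    /// True if prompt + reasoning + response fit inside the context window.
    func fits(promptTokens: Int, reasoningReserve: Int, responseReserve: Int) -> Bool {
        promptTokens + reasoningReserve + responseReserve <= contextSize
    }
}
```

Take a 32,768-token PCC context with an illustrative 8,000 tokens reserved for reasoning and 2,000 for the response. That leaves 22,768 tokens for the prompt. The same reservation would not fit in a 4,096-token on-device window at all.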
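```swift
// Reasoning is PCC-only (see the table), so `session` here uses PrivateCloudComputeLanguageModel
let response = try await session.respond(
    to: prompt,
    contextOptions: ContextOptions(reasoningLevel: .light)
)

// Context window per model type; reasoning tokens count against it
let onDeviceContext = SystemLanguageModel().contextSize          // 4096 (26.0), 8192 (27.0)
let pccContext = PrivateCloudComputeLanguageModel().contextSize  // 32768
```
Usage limits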
Quota is per-user via iCloud, not per-app
Surface limit state in persistent UI, not alerts
`quotaUsage.status` has an approaching-limit case
Quota states can be tested through Xcode scheme settings that simulate Foundation Models availability
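```swift
// `model` is a PrivateCloudComputeLanguageModel; these checks sit in a SwiftUI view body
if case .belowLimit(let info) = model.quotaUsage.status, info.isApproachingLimit {
    // Warn: show a persistent "approaching limit" banner
}
if model.quotaUsage.isLimitReached {
    // Disable PCC-backed features and explain why
}
if let suggestion = model.quotaUsage.limitIncreaseSuggestion {
    // Offer the system's options for raising the limit
    Button("Show options") { suggestion.show() }
}
```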
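The guidance to surface limit state in persistent UI, not alerts, can be sketched as a pure function that maps quota state to a banner state, which a view renders continuously. `QuotaSnapshot` is a hypothetical stand-in that mirrors the three checks above (approaching limit, limit reached, limit-increase suggestion available). It is not the real `quotaUsage` type, so you would map the real values into it yourself.

```swift
/// Hypothetical mirror of the quota checks above; not a Foundation Models type.
struct QuotaSnapshot {
    var isApproachingLimit: Bool
    var isLimitReached: Bool
    var canSuggestIncrease: Bool   // i.e. limitIncreaseSuggestion != nil
}

/// What a persistent, inline banner should show (never a modal alert).
enum QuotaBanner: Equatable {
    case hidden
    case warning(showOptions: Bool)
    case limitReached(showOptions: Bool)
}

func banner(for quota: QuotaSnapshot) -> QuotaBanner {
    // Limit reached takes priority: PCC features are disabled and the banner explains why.
    if quota.isLimitReached {
        return .limitReached(showOptions: quota.canSuggestIncrease)
    }
    if quota.isApproachingLimit {
        return .warning(showOptions: quota.canSuggestIncrease)
    }
    return .hidden
}
```

Because the mapping is pure, each state can be unit-tested without a device, alongside simulating quota states through scheme settings.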