
Most AI platforms operate on a simple assumption: You send your prompt, the provider processes it, and you trust that your data remains private.
For many developers, that trust model has always been uncomfortable. Sensitive prompts, proprietary models, and confidential datasets often pass through infrastructure controlled entirely by the service provider.
Now a different approach is emerging.
Chutes (Bittensor Subnet 64) has introduced end-to-end encryption for AI inference, a system designed to remove the infrastructure provider from the trust chain entirely.
With the new release, prompts are encrypted directly on the user's machine and can only be decrypted by the GPU (Graphics Processing Unit) instance that performs the inference inside a secure environment. Even the platform running the infrastructure cannot read the data.
The goal is to enforce confidential compute through cryptography, not policy.
Rethinking Trust in AI Infrastructure
Most AI services follow a conventional architecture: prompts travel from the user to the provider's servers, where they are processed and a response is returned. While encryption is often used in transit, the provider typically retains the ability to access or inspect the data.
Chutes takes a fundamentally different approach. It now encrypts prompts before they ever leave the user's machine. Once encrypted, the data moves through the network as ciphertext and remains unreadable until it reaches the secure execution environment where the model is running.
This means that:
a. The Chutes network cannot read the prompts,
b. Infrastructure operators cannot read the prompts, and
c. GPU miners cannot read the prompts.
Only the secure execution environment processing the request can decrypt it.
How the Encryption Works
The system relies on a combination of modern cryptographic standards designed to provide strong protection both today and in the future.
Key components of the protocol include:
a. ML-KEM-768: A post-quantum key encapsulation mechanism standardized by NIST (FIPS 203),
b. HKDF-SHA256: Used for secure key derivation, and
c. ChaCha20-Poly1305: A modern authenticated encryption algorithm used to protect the data itself.
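Of the three, HKDF-SHA256 (RFC 5869) is simple enough to sketch with Python's standard library. The following is an illustrative implementation, not code from Chutes:

```python
import hashlib
import hmac

def hkdf_sha256(ikm: bytes, salt: bytes = b"", info: bytes = b"", length: int = 32) -> bytes:
    """Derive `length` bytes of key material from input keying material (RFC 5869)."""
    hash_len = hashlib.sha256().digest_size  # 32 bytes
    # Extract step: concentrate the entropy of ikm into a pseudorandom key.
    if not salt:
        salt = b"\x00" * hash_len
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    # Expand step: stretch prk into the requested number of output bytes,
    # binding each block to the application-specific `info` context.
    okm, block = b"", b""
    for counter in range(1, -(-length // hash_len) + 1):
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
    return okm[:length]

# A single shared secret expands into independent keys for distinct contexts.
secret = b"\x0b" * 32
k1 = hkdf_sha256(secret, info=b"request-encryption")
k2 = hkdf_sha256(secret, info=b"response-encryption")
```

Because the `info` parameter is mixed into every output block, the same shared secret can safely yield separate keys for separate purposes.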
Trusted execution environments publish a public key using ML-KEM. When a request is sent, the client generates a fresh ephemeral key pair and performs a secure key exchange with the GPU instance.
This creates forward secrecy. In practice, it means:
a. Every request uses a unique encryption key, and
b. Past communications cannot be decrypted even if keys are later compromised.
Even if someone were able to capture every packet traveling through the network today, those packets would remain unreadable.
According to the design, future quantum computers would still be unable to decrypt them.
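The per-request flow can be sketched in Python. The KEM step below is a toy stand-in (a real client would use an ML-KEM-768 implementation, and only the TEE's private key could recover the shared secret); the function names and the derivation label are invented for illustration:

```python
import hashlib
import hmac
import os

def encapsulate(tee_public_key: bytes) -> tuple[bytes, bytes]:
    """Toy stand-in for ML-KEM-768 encapsulation.

    In the real protocol the client sends the KEM ciphertext and only the
    TEE's private key can recover the shared secret. This sketch models
    only the property that matters for forward secrecy: every call
    produces a fresh, independent secret.
    """
    shared_secret = os.urandom(32)     # fresh randomness for this one request
    kem_ciphertext = os.urandom(1088)  # ML-KEM-768 ciphertexts are 1088 bytes
    return kem_ciphertext, shared_secret

def request_key(shared_secret: bytes) -> bytes:
    # Derive the symmetric key for this single request (HKDF in the real
    # design; one HMAC invocation keeps the sketch short).
    return hmac.new(shared_secret, b"request-key", hashlib.sha256).digest()

tee_pk = os.urandom(1184)              # ML-KEM-768 public keys are 1184 bytes
_, secret_a = encapsulate(tee_pk)
_, secret_b = encapsulate(tee_pk)
key_a, key_b = request_key(secret_a), request_key(secret_b)
# key_a != key_b: compromising one request's key reveals nothing about another's,
# and once the ephemeral secret is discarded, the key cannot be re-derived.
```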
Trusted Execution Environments at the GPU Level
Decryption only occurs inside a trusted execution environment, often referred to as a TEE.
A TEE is a secure enclave running directly on the hardware that isolates computation from the surrounding system. Code and data inside the enclave cannot be accessed by the operating system, the infrastructure provider, or the hardware operator.
For Chutes, this means that the GPU performing the inference becomes the only entity capable of reading the prompt and generating the response.
Everything outside the enclave sees only encrypted data.
What This Means for Developers
While the cryptography behind the system is sophisticated, the developer experience is intentionally simple.
Builders can integrate encrypted inference into their applications with minimal changes.
There are currently two main ways to use the system, depending on how an application is built.
a. Using the OpenAI Python SDK (Software Development Kit): Developers using the OpenAI-compatible Python client can enable encryption by installing a small extension.
The steps are installing the package (pip install chutes-e2ee) and passing the custom transport layer into the client.
Once configured, encryption occurs automatically at the HTTP layer. The base API URL remains unchanged, allowing developers to integrate the system without rewriting their applications.
b. Using Other Client Platforms: Developers using other frameworks or languages can deploy a local proxy. Chutes provides an e2ee proxy Docker container that runs locally and handles the encryption pipeline.
The proxy manages format translation, key exchange, encryption, and streaming decryption. It also supports multiple API standards, including OpenAI-compatible APIs, the newer Responses API specification used by tools like Codex, and the Anthropic Messages API for Claude-style clients.
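The format translation can be illustrated with its best-known case: mapping an Anthropic Messages request onto the OpenAI chat-completions shape. This simplified sketch covers only text content and is not the proxy's actual code; the model name is a placeholder:

```python
def anthropic_to_openai(request: dict) -> dict:
    """Map the core fields of an Anthropic Messages request onto the
    OpenAI chat-completions format (simplified: text content only)."""
    messages = []
    # Anthropic carries the system prompt as a top-level field; OpenAI
    # expects it as the first entry in the messages list.
    if "system" in request:
        messages.append({"role": "system", "content": request["system"]})
    messages.extend(request["messages"])
    translated = {
        "model": request["model"],
        "messages": messages,
    }
    # max_tokens is required by Anthropic and optional for OpenAI.
    if "max_tokens" in request:
        translated["max_tokens"] = request["max_tokens"]
    return translated

req = {
    "model": "example-model",
    "system": "You are a concise assistant.",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Hello"}],
}
out = anthropic_to_openai(req)
```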
Streaming responses remain fully supported, and billing continues to operate under the same token-based model developers expect.
Both the proxy and supporting tools are open source under the MIT license.
Encryption Across the Entire Model Stack
The encryption system works across all models available on Chutes. However, the strongest privacy guarantees are achieved when the models run inside TEE-enabled environments, where the encrypted request is decrypted only within the secure enclave.
This design creates a fully confidential pipeline:
a. Prompt encrypted on the client device,
b. Ciphertext transmitted through the network,
c. Ciphertext decrypted only inside the secure GPU environment, and
d. Response returned to the client.
At no point can the infrastructure provider access the raw data.
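The "authenticated" half of authenticated encryption is what keeps step c safe: the enclave verifies the ciphertext before acting on it. A toy encrypt-then-MAC pair built from the standard library (ChaCha20-Poly1305 fuses both steps in the real system; all names here are invented) shows the behavior:

```python
import hashlib
import hmac
import os

def xor_stream(key: bytes, data: bytes) -> bytes:
    # Toy keystream cipher (XOR with SHA-256 counter blocks); illustrative only.
    out = bytearray()
    for i in range(0, len(data), 32):
        ks = hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        out.extend(a ^ b for a, b in zip(data[i:i + 32], ks))
    return bytes(out)

def seal(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt, then append an authentication tag over the ciphertext."""
    ciphertext = xor_stream(key, plaintext)
    tag = hmac.new(key, ciphertext, hashlib.sha256).digest()
    return ciphertext + tag

def open_sealed(key: bytes, sealed: bytes) -> bytes:
    """Verify the tag before decrypting: a tampered ciphertext is
    rejected, never processed."""
    ciphertext, tag = sealed[:-32], sealed[-32:]
    expected = hmac.new(key, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return xor_stream(key, ciphertext)

key = os.urandom(32)
sealed = seal(key, b"confidential prompt")
```

Flipping even a single bit of the sealed message causes `open_sealed` to raise instead of returning corrupted plaintext.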
A New Standard for Confidential AI Compute
Confidential computing has become a popular concept across cloud and AI infrastructure. But in many systems, privacy protections depend on trust in the provider.
Chutes takes a different stance.
Instead of asking developers to trust the platform, the system removes the platform from the equation entirely. Privacy becomes a property enforced by mathematics rather than policy.
For developers working with proprietary models, sensitive prompts, or regulated data, this shift could become increasingly important as AI systems continue to expand into critical applications.
Privacy by Design
As AI infrastructure grows, questions about data ownership and confidentiality are becoming more urgent.
End-to-end encrypted inference represents one possible answer.
By encrypting prompts at the source and limiting decryption to secure execution environments, Chutes introduces a model where sensitive data never needs to be exposed to the infrastructure provider at all.
Systems like this could redefine how developers think about trust in AI platforms.
Not as a promise from the provider, but as a guarantee built directly into the architecture itself.
