Mask SDK

Runtime cryptographic guardrails for autonomous AI systems. Open-source. Framework-agnostic. Enterprise-ready.

Overview

Mask is an enterprise-grade AI Data Loss Prevention (DLP) SDK. It acts as the runtime enforcement layer between your Large Language Models and your tool execution environment, so that LLMs never see raw PII while tools still execute with the real values on behalf of end users.

The SDK uses Just-In-Time (JIT) Encryption and Decryption Middleware to intercept data flowing to and from LLMs. Pre-tool hooks decrypt parameters so tools can execute with real data, while post-tool hooks instantly encrypt sensitive entities before results return to the model's context.
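
The pre-tool and post-tool hook flow can be illustrated with a small standard-library sketch. It does not use the Mask SDK: the token map, the `jit_guard` decorator, and the `lookup_account` tool are hypothetical stand-ins, and a plain dictionary replaces real encryption.

```python
import functools
import re

# Hypothetical token map standing in for the SDK's crypto engine and vault.
_TOKENS = {"tkn-a1b2c3d4@email.com": "user@example.com"}
_REVERSE = {real: token for token, real in _TOKENS.items()}

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def jit_guard(tool):
    """Wrap a tool so it runs on real values while the model only sees tokens."""

    @functools.wraps(tool)
    def wrapper(**kwargs):
        # Pre-tool hook: swap known tokens in the arguments for real values.
        real_kwargs = {
            name: _TOKENS.get(value, value) if isinstance(value, str) else value
            for name, value in kwargs.items()
        }
        result = tool(**real_kwargs)
        # Post-tool hook: replace known real values in the output with tokens.
        return EMAIL_RE.sub(lambda m: _REVERSE.get(m.group(0), m.group(0)), result)

    return wrapper


@jit_guard
def lookup_account(email: str) -> str:
    # The tool body receives the real address.
    return f"Account for {email} is active"


print(lookup_account(email="tkn-a1b2c3d4@email.com"))
# prints "Account for tkn-a1b2c3d4@email.com is active"
```

The tool runs against the real address, but the string that would return to the model's context contains only the token.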

Combined with Deterministic Vaultless FPE, downstream schemas and tool calls stay intact — tokens look like real data (emails, phone numbers, SSNs, credit cards) but contain no actual PII. While generation is deterministic and vaultless, the SDK utilizes your configured vault for secure recovery and high-fidelity audit trails.
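
To illustrate what "deterministic" and "format-preserving" mean here, the sketch below derives an email-shaped token from a keyed hash. This is an assumption-laden illustration, not Mask's algorithm: HMAC is one-way, so unlike real FPE it cannot be decrypted, and the `email_token` helper and demo key are hypothetical.

```python
import hashlib
import hmac


def email_token(value: str, key: bytes) -> str:
    """Derive a deterministic, email-shaped token from a value and a secret key."""
    digest = hmac.new(key, value.lower().encode("utf-8"), hashlib.sha256).hexdigest()
    # Keep the shape of an email address so downstream validators still pass.
    return f"tkn-{digest[:8]}@email.com"


key = b"demo-key-not-for-production"
first = email_token("user@example.com", key)
second = email_token("user@example.com", key)
print(first == second)  # True: the same input and key always give the same token
```

Because the same input always maps to the same token, an agent can refer to one entity consistently across turns without seeing the underlying value.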

Installation

Install the core SDK with pip:

pip install mask-privacy

Add optional extras depending on your infrastructure and framework:

pip install mask-privacy[redis]

pip install mask-privacy[dynamodb]

pip install mask-privacy[memcached]

pip install mask-privacy[langchain]

pip install mask-privacy[llamaindex]

pip install mask-privacy[adk]

pip install "mask-privacy[remote]"

Installing AI Models

Mask relies on NLP models for PII detection. Install the spacy extra, then download your preferred model:

1. Install with spaCy support

pip install "mask-privacy[spacy]"

2. Download the NLP model (choose one)

python -m spacy download en_core_web_sm

python -m spacy download en_core_web_md

python -m spacy download en_core_web_lg

For a typical production environment, you might combine extras:

pip install "mask-privacy[spacy,redis]"

python -m spacy download en_core_web_lg

Configuration

Mask requires an encryption key and a vault backend selection. Configure these via environment variables:

bash

# 1. Generate and set your encryption key
python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
export MASK_ENCRYPTION_KEY="..."

# 2. Select your failure strategy (open or closed)
# "closed" raises exceptions on vault/policy errors; "open" continues gracefully.
export MASK_FAIL_STRATEGY=open

# 3. Select your scanner type (local or remote)
export MASK_SCANNER_TYPE=local # Options: local, remote
export MASK_SCANNER_URL=http://presidio-analyzer:5001/analyze # If remote
export MASK_NLP_TIMEOUT_SECONDS=30 # Max time for individual PII analysis

# 4. Select your vault type
export MASK_VAULT_TYPE=memory  # Options: memory, redis, dynamodb, memcached

# 5. Configure your vault backend (if applicable)
# Redis:
export MASK_REDIS_URL=redis://localhost:6379/0
# DynamoDB:
export MASK_DYNAMODB_TABLE=mask-vault
export MASK_DYNAMODB_REGION=us-east-1

# 6. Optional: Disable local audit buffer
export MASK_DISABLE_AUDIT_DB=true

Note: For production environments, MASK_ENCRYPTION_KEY must be set — the SDK will not start without it.
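
A deployment can check these variables before the agent starts. The following is a minimal sketch under stated assumptions: `load_mask_settings` is a hypothetical helper, not part of the SDK, and the fallback values it uses are choices made for this example rather than documented SDK defaults.

```python
import os

VALID_FAIL_STRATEGIES = {"open", "closed"}
VALID_VAULT_TYPES = {"memory", "redis", "dynamodb", "memcached"}


def load_mask_settings(env=None):
    """Validate the MASK_* variables and return the selected options as a dict."""
    env = os.environ if env is None else env

    if not env.get("MASK_ENCRYPTION_KEY"):
        raise RuntimeError("MASK_ENCRYPTION_KEY must be set")

    # Fallbacks below are assumptions of this sketch, not documented SDK defaults.
    strategy = env.get("MASK_FAIL_STRATEGY", "closed")
    if strategy not in VALID_FAIL_STRATEGIES:
        raise ValueError(f"MASK_FAIL_STRATEGY must be one of {sorted(VALID_FAIL_STRATEGIES)}")

    vault_type = env.get("MASK_VAULT_TYPE", "memory")
    if vault_type not in VALID_VAULT_TYPES:
        raise ValueError(f"MASK_VAULT_TYPE must be one of {sorted(VALID_VAULT_TYPES)}")

    return {"fail_strategy": strategy, "vault_type": vault_type}


# Pass an explicit mapping here so the example does not depend on the shell.
settings = load_mask_settings(
    {"MASK_ENCRYPTION_KEY": "example-key", "MASK_FAIL_STRATEGY": "open", "MASK_VAULT_TYPE": "memory"}
)
print(settings)
```

Calling `load_mask_settings()` with no argument reads from the real environment.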

Enterprise Key Management

For zero-trust environments, you can fetch secrets dynamically from AWS KMS, Azure Key Vault, or HashiCorp Vault at runtime using a provider:

python

from mask_privacy.core.key_provider import set_key_provider, AwsKmsKeyProvider
set_key_provider(AwsKmsKeyProvider(key_id="alias/mask"))

Quick Start

Here's a minimal example using the core MaskClient API to encode and decode sensitive data:

python

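import os

from mask_privacy.client import MaskClient
from mask_privacy.core.vault import MemoryVault
from mask_privacy.core.crypto import CryptoEngine

# Load the key generated in the Configuration step
your_encryption_key = os.environ["MASK_ENCRYPTION_KEY"]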

# Create an isolated Mask client
client = MaskClient(
    vault=MemoryVault(),
    crypto=CryptoEngine(your_encryption_key),
    ttl=3600  # Token TTL in seconds
)

# Encode: PII → Format-Preserving Token
token = client.encode("user@example.com")
# token → "tkn-a1b2c3d4@email.com"

# Decode: Token → Original PII
original = client.decode(token)
# original → "user@example.com"

The encode() method detects the data type, generates a format-preserving token, and stores the mapping. For structured data like emails, Mask uses Vaultless FPE, meaning the encryption is deterministic and doesn't require database storage for 1:1 round-trips.

Async API

The latest version introduces native asyncio support for high-throughput environments:

python

import asyncio
from mask_privacy import aencode, adecode

async def main():
    token = await aencode("alice@example.com")
    print(f"Token: {token}")
    original = await adecode(token)
    print(f"Original: {original}")

asyncio.run(main())

Robust Error Handling

v0.4.0 introduces granular exceptions for production reliability. You can now catch specific errors rather than generic Exception classes.

Exception	Raised When...
MaskVaultConnectionError	Redis/DynamoDB is unreachable (when in `closed` mode)
MaskDecryptionError	Cryptographic failure or corrupt vault data
MaskNLPTimeout	PII analysis exceeds the configured `MASK_NLP_TIMEOUT_SECONDS`
MaskPolicyError	Remote governance policy fetch fails (when in `closed` mode)

python

from mask_privacy import MaskClient, MaskVaultConnectionError

client = MaskClient()
try:
    token = client.encode("sensitive data")
except MaskVaultConnectionError:
    # Use fallback logic if the vault is unreachable
    pass
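
The difference between the `open` and `closed` failure strategies can be sketched without the SDK. Everything below is hypothetical: `VaultUnavailable` stands in for a vault connection error, and returning a caller-supplied fallback in `open` mode is an assumption about what "continues gracefully" could look like.

```python
class VaultUnavailable(Exception):
    """Stand-in for a vault connection failure (not the SDK's exception class)."""


def call_with_fail_strategy(operation, strategy, fallback=None):
    """Run operation; re-raise failures when 'closed', return fallback when 'open'."""
    try:
        return operation()
    except VaultUnavailable:
        if strategy == "closed":
            raise
        return fallback


def unreachable_vault():
    raise VaultUnavailable("vault is unreachable")


# In "open" mode the caller gets the fallback instead of an exception.
print(call_with_fail_strategy(unreachable_vault, "open", fallback="[REDACTED]"))
# prints "[REDACTED]"
```

Failing closed is the stricter choice: the request stops rather than proceeding without protection.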

Framework Integrations

Mask integrates by injecting dynamic, recursive hooks into your agent's execution pipeline. Pre-hooks decode incoming tool arguments, and post-hooks encode any PII found in tool outputs.
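
"Recursive" matters because tool arguments and results are often nested. As a rough sketch of the idea (the `map_strings` helper is hypothetical and not an SDK function), a hook can walk dictionaries, lists, and tuples and apply a transform to every string it finds:

```python
def map_strings(data, fn):
    """Apply fn to every string inside nested dicts, lists, and tuples."""
    if isinstance(data, str):
        return fn(data)
    if isinstance(data, dict):
        return {key: map_strings(value, fn) for key, value in data.items()}
    if isinstance(data, (list, tuple)):
        return type(data)(map_strings(item, fn) for item in data)
    # Numbers, booleans, None, and other types pass through unchanged.
    return data


tool_args = {"to": ["tkn-a1b2c3d4@email.com"], "meta": {"retries": 3, "cc": "tkn-a1b2c3d4@email.com"}}
lookup = {"tkn-a1b2c3d4@email.com": "user@example.com"}

print(map_strings(tool_args, lambda s: lookup.get(s, s)))
```

A pre-hook would pass a decode function as `fn`, and a post-hook would pass an encode function.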

LangChain

Mask integrates with LangChain via an explicit @secure_tool decorator for maximum transparency:

python

from mask_privacy.integrations.langchain_hooks import secure_tool

@secure_tool
def send_email_tool(email: str, message: str) -> str:
    # email is automatically decrypted here
    return send_email_backend(email, message)

Alternatively, you can use the MaskToolWrapper for existing tool instances:

python

from langchain.agents import AgentExecutor
from mask_privacy.integrations.langchain_hooks import MaskCallbackHandler, MaskToolWrapper

# Wrap tools for automatic PII protection
secure_tools = [MaskToolWrapper(my_email_tool)]

# Add the callback handler for audit logging
agent_executor = AgentExecutor(
    agent=my_agent,
    tools=secure_tools,
    callbacks=[MaskCallbackHandler()]
)

LlamaIndex

Use MaskToolWrapper to wrap your callable for input detokenization and output tokenization:

python

from llama_index.core.tools import FunctionTool
from mask_privacy.integrations.llamaindex_hooks import MaskToolWrapper

secure_email_tool = FunctionTool.from_defaults(
    fn=MaskToolWrapper(my_email_function),
    name="send_email",
    description="Sends a secure email"
)

Google ADK

Use the decrypt_before_tool and encrypt_after_tool callbacks:

python

from google.adk.agents import Agent
from mask_privacy.integrations.adk_hooks import decrypt_before_tool, encrypt_after_tool

secure_agent = Agent(
    name="secure_assistant",
    model=...,
    tools=[...],
    before_tool_callback=decrypt_before_tool,
    after_tool_callback=encrypt_after_tool,
)

API Reference

The Mask SDK provides a clean, composable API for encoding, decoding, and managing encrypted data.

MaskClient

The explicit MaskClient API supports fully isolated instances for multi-tenant environments:

Parameter	Type	Description
vault	BaseVault	Vault backend instance
crypto	CryptoEngine	Encryption engine with your key
ttl	int	Token time-to-live in seconds

Method	Description
encode(value)	Detects PII type, creates an FPE token, encrypts and vaults the original value
decode(token)	Retrieves and decrypts the original value from the vault (strict — raises on failure)
detokenize_text(text)	Scans a large block of text (e.g. email body) and restores all found tokens
sync_policy()	Synchronizes detection requirements with a remote Control Plane
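
Restoring tokens inside free text, as `detokenize_text` is described as doing, can be pictured as a regex scan plus a lookup. The sketch below is an illustration only: the pattern covers just the email token format shown in this documentation, and `restore_tokens` with its dictionary lookup is a hypothetical stand-in for the SDK's decode path.

```python
import re

# Matches the email token format shown in this documentation (assumes lowercase hex).
TOKEN_RE = re.compile(r"tkn-[0-9a-f]+@email\.com")


def restore_tokens(text: str, lookup: dict) -> str:
    """Replace every known token in text; leave unknown tokens untouched."""
    return TOKEN_RE.sub(lambda match: lookup.get(match.group(0), match.group(0)), text)


body = "Send to tkn-a1b2c3d4@email.com and cc tkn-ffff0000@email.com."
known = {"tkn-a1b2c3d4@email.com": "user@example.com"}

print(restore_tokens(body, known))
# prints "Send to user@example.com and cc tkn-ffff0000@email.com."
```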

Vault Backends

Mask supports pluggable vault backends for storing encrypted token mappings:

Vault	Import	Use Case
MemoryVault	mask_privacy.core.vault	Local dev & testing
RedisVault	mask_privacy.vaults.redis	Distributed, multi-pod K8s
DynamoDBVault	mask_privacy.vaults.dynamodb	AWS-native deployments
MemcachedVault	mask_privacy.vaults.memcached	High-throughput caching
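
What a vault backend has to do can be shown with a small in-memory sketch: store, retrieve, delete, and TTL expiry. The class and method names below are assumptions for illustration and may not match the SDK's BaseVault interface.

```python
import time


class TTLMemoryVault:
    """Toy in-memory vault mapping tokens to ciphertext with per-entry expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._items = {}

    def store(self, token, ciphertext, ttl):
        self._items[token] = (ciphertext, self._clock() + ttl)

    def retrieve(self, token):
        entry = self._items.get(token)
        if entry is None:
            return None
        ciphertext, expires_at = entry
        if self._clock() >= expires_at:
            # Expired entries are removed lazily on access.
            del self._items[token]
            return None
        return ciphertext

    def delete(self, token):
        self._items.pop(token, None)


vault = TTLMemoryVault()
vault.store("tkn-a1b2c3d4@email.com", b"ciphertext-bytes", ttl=3600)
print(vault.retrieve("tkn-a1b2c3d4@email.com"))
```

Distributed backends such as Redis typically handle expiry on the server side, so a real backend would usually delegate the TTL rather than track it in process.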

Format-Preserving Tokens

Mask generates tokens that retain the exact format of the original data, ensuring downstream validators and schemas never break:

Data Type	Format	Safety Guarantee
Email	tkn-<hex>@email.com	Unique prefix
International Phone	+44 XXXX XXXXXX	UK (+44), FR (+33), DE (+49)
SSN	000-00-XXXX	Area Number 000 never issued
Credit Card	4000-0000-0000-XXXX	Visa reserved test BIN
Routing Number	0000XXXXX	Validated via ABA checksum
Passport	[A-Z][0-9]{8}	US Passport standard format
Date of Birth	YYYY-MM-DD	MM/DD/YYYY also supported
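
The table mentions the ABA checksum for routing numbers. For reference, the standard check weights the nine digits 3, 7, 1, 3, 7, 1, 3, 7, 1 and requires the sum to be divisible by 10. The helper below is a sketch of that public algorithm; how Mask applies it internally is not shown in this documentation.

```python
def aba_checksum_ok(routing_number: str) -> bool:
    """Return True if a 9-digit routing number passes the ABA checksum."""
    if len(routing_number) != 9 or not (routing_number.isascii() and routing_number.isdigit()):
        return False
    d = [int(ch) for ch in routing_number]
    total = (
        3 * (d[0] + d[3] + d[6])
        + 7 * (d[1] + d[4] + d[7])
        + (d[2] + d[5] + d[8])
    )
    return total % 10 == 0


print(aba_checksum_ok("021000021"))  # True: weighted sum is 30
print(aba_checksum_ok("021000022"))  # False: weighted sum is 31
```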

Heuristic Safety: Because tokens use prefixes that are never valid in real data, Mask guarantees it will not mistake real PII for a token. Common safety prefixes include 000 for SSNs and 0000 for routing numbers.
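
The prefix heuristic can be sketched as a set of patterns built from the formats in the table above. The patterns and the `token_type` helper are assumptions for illustration (for example, lowercase hex in email tokens) and are not the SDK's detection code.

```python
import re

# Patterns derived from the token formats listed in the table above.
TOKEN_PATTERNS = {
    "email": re.compile(r"tkn-[0-9a-f]+@email\.com"),
    "ssn": re.compile(r"000-00-\d{4}"),
    "credit_card": re.compile(r"4000-0000-0000-\d{4}"),
    "routing_number": re.compile(r"0000\d{5}"),
}


def token_type(value: str):
    """Return the token type whose reserved format matches value, else None."""
    for name, pattern in TOKEN_PATTERNS.items():
        if pattern.fullmatch(value):
            return name
    return None


print(token_type("000-00-1234"))  # ssn
print(token_type("123-45-6789"))  # None
```

Since real SSNs never start with area number 000, a value that matches the reserved pattern can be treated as a token.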

Testing

Run the full test suite with pytest:

uv run pytest tests/ -v

The test suite covers:

Format-Preserving Tokenization — validates token format integrity across all data types
Vault Backends — tests store, retrieve, delete, and TTL expiry for all backends
Telemetry — validates asynchronous audit event buffering and network resilience
Framework Hooks — verifies LangChain, LlamaIndex, and ADK integrations

You can also run the interactive demo to observe the privacy middleware in action:

uv run python examples/test_agent.py

Telemetry & Auditing

Mask automatically records telemetry and audit trails for all encryption and decryption events. Raw PII is never logged.

The SDK is local-first. Instead of forwarding events to a remote API, the audit logger buffers events locally and flushes them to a local SQLite database (.mask_audit.db) and stdout.
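
The buffer-then-flush pattern can be sketched with the standard library's sqlite3 module. This is an illustration under assumptions: the `AuditBuffer` class, the table schema, and the flush threshold are hypothetical, and an in-memory database stands in for .mask_audit.db.

```python
import json
import sqlite3
import time


class AuditBuffer:
    """Buffer audit events in memory and flush them to SQLite in batches."""

    def __init__(self, conn, max_buffered=100):
        self._conn = conn
        self._max_buffered = max_buffered
        self._events = []
        conn.execute(
            "CREATE TABLE IF NOT EXISTS audit_events (ts REAL, action TEXT, detail TEXT)"
        )

    def record(self, action, **detail):
        # Only metadata is recorded; raw PII must never be passed in here.
        self._events.append((time.time(), action, json.dumps(detail)))
        if len(self._events) >= self._max_buffered:
            self.flush()

    def flush(self):
        if not self._events:
            return
        with self._conn:  # commits on success, rolls back on error
            self._conn.executemany("INSERT INTO audit_events VALUES (?, ?, ?)", self._events)
        self._events.clear()


conn = sqlite3.connect(":memory:")
audit = AuditBuffer(conn, max_buffered=2)
audit.record("encode", entity_type="email")
audit.record("decode", entity_type="email")  # second event triggers a flush
print(conn.execute("SELECT COUNT(*) FROM audit_events").fetchone()[0])  # 2
```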

Note: You can completely disable the local audit database by setting MASK_DISABLE_AUDIT_DB=true in your environment.