Docs

Veil Guard is a single HTTP API. Every request has a source type, every response has a risk score, list of matched attack types, redacted text, and a suggested action.

Authentication

Bearer token in the Authorization header. Keys are issued through the self-serve email verification flow at /console and look like vgk_….

Authorization: Bearer vgk_2xlWiRzk-3GbOEwvHsQenAScnpYduF_D

Create a free key with a one-time email verification flow. The hosted UI for this lives at /console.

POST /v1/keys/create
{
  "email": "you@company.com"
}

POST /v1/keys/verify
{
  "token": "one-time verification token"
}

GET /v1/account

{
  "name": "you",
  "email": "you@company.com",
  "tier": "free",
  "requests_total": 42,
  "limit": 1000,
  "reset_at": 1776397761.4,
  "stripe_customer_linked": false,
  "has_active_subscription": false
}

POST /v1/scan

Request

{
  "text": "string, max 200,000 chars",
  "source": "user|rag|tool_output|web|system",
  "context": "optional — surrounding text for split-attack detection",
  "llm_judge": false   // Enterprise tier only; default off
}

Response

{
  "risk_score": 0.98,
  "attack_types": ["direct_injection", "system_prompt_leak"],
  "confidence": 0.88,
  "redacted_text": "<<GUARD_REDACTED>>",
  "reasoning": "regex:direct_injection/… + regex:system_prompt_leak/…",
  "action": "block",   // allow | flag | block
  "latency_ms": 3,
  "signals": {
    "regex_findings": 2,
    "context_supplied": true,
    "context_findings": 1,
    "context_used": true,
    "unicode_triggered": false,
    "context_unicode_triggered": false,
    "classifier_available": true,
    "classifier_prob": 0.92,
    "semantic_available": true,
    "semantic_similarity": 0.88,
    "semantic_match": "direct_injection",
    "llm_judge_used": false
  }
}

context is not separately returned or redacted. It is used as a short boundary window so Guard can catch instructions that are split across surrounding text and the primary payload.

Per-source thresholds

Source	Block above	Why
user	0.80	Naturally noisy
rag	0.55	Documents shouldn't speak in imperatives
tool_output	0.50	Tools shouldn't prefix with "You are…"
web	0.50	Untrusted by default
system	0.30	System is controlled — any injection is high signal

POST /v1/scan/stream

Server-Sent Events for long documents. Splits on paragraph boundaries (fallback to 2KB chunks), emits start, one chunk per piece, then a final done with the aggregate.

event: start
data: {"chunks": 3}

event: chunk
data: {"index": 0, "risk_score": 0.0, "attack_types": [], "action": "allow", "snippet": "…"}

event: chunk
data: {"index": 1, "risk_score": 0.98, "attack_types": ["direct_injection"], "action": "block", "snippet": "Ignore previous…"}

event: done
data: {"risk_score": 0.98, "attack_types": [...], "action": "block", "chunks_scanned": 3, "latency_ms": 140}

GET /v1/usage

{
  "tier": "starter",
  "requests_total": 842,
  "limit": 10000,
  "reset_at": 1776397761.4
}

POST /v1/billing/portal

Creates a Stripe customer-portal session so a paying user can update payment method, change plan, or cancel without contacting support.

{
  "return_url": "https://guard.veil-api.com/console"
}

Integration: OpenAI SDK

import httpx
from openai import OpenAI

client = OpenAI()
GUARD = "https://guard.veil-api.com"
GUARD_KEY = "vgk_…"

def guard_scan(text, source="user"):
    r = httpx.post(
        f"{GUARD}/v1/scan",
        headers={"Authorization": f"Bearer {GUARD_KEY}"},
        json={"text": text, "source": source},
        timeout=5,
    )
    return r.json()

user_msg = request_body["message"]
verdict = guard_scan(user_msg, "user")
if verdict["action"] == "block":
    return {"error": "input blocked", "reasoning": verdict["reasoning"]}

resp = client.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": user_msg}],
)

Integration: Anthropic SDK

import httpx
from anthropic import Anthropic

client = Anthropic()

def guarded_message(user_text, retrieved_docs):
    v1 = httpx.post(GUARD + "/v1/scan",
                    headers={"Authorization": f"Bearer {GUARD_KEY}"},
                    json={"text": user_text, "source": "user"}).json()
    if v1["action"] == "block":
        raise ValueError("user input blocked: " + v1["reasoning"])

    # Indirect injection is the bigger risk — scan the docs too.
    for doc in retrieved_docs:
        v2 = httpx.post(GUARD + "/v1/scan",
                        headers={"Authorization": f"Bearer {GUARD_KEY}"},
                        json={"text": doc, "source": "rag"}).json()
        if v2["action"] == "block":
            retrieved_docs.remove(doc)

    return client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": user_text + "\n\n" + "\n\n".join(retrieved_docs)}],
    )

Integration: LangChain

from langchain_core.runnables import RunnableLambda
import httpx

def guard_gate(source="user"):
    def _gate(text):
        r = httpx.post(GUARD + "/v1/scan",
                       headers={"Authorization": f"Bearer {GUARD_KEY}"},
                       json={"text": text, "source": source}).json()
        if r["action"] == "block":
            raise ValueError(f"Guard blocked: {r['reasoning']}")
        return text
    return RunnableLambda(_gate)

chain = guard_gate("user") | prompt | llm

Integration: LlamaIndex

from llama_index.core.node_parser import NodeParser
import httpx

def filter_injected_nodes(nodes):
    clean = []
    for n in nodes:
        r = httpx.post(GUARD + "/v1/scan",
                       headers={"Authorization": f"Bearer {GUARD_KEY}"},
                       json={"text": n.text, "source": "rag"}).json()
        if r["action"] != "block":
            clean.append(n)
    return clean

retrieved = retriever.retrieve(query)
safe_nodes = filter_injected_nodes(retrieved)

Integration: LlamaStack

# Use Guard as a shield that runs before every generation step.
from llama_stack_client import LlamaStackClient
import httpx

def guard_shield(message, source="user"):
    r = httpx.post(GUARD + "/v1/scan",
                   headers={"Authorization": f"Bearer {GUARD_KEY}"},
                   json={"text": message, "source": source}).json()
    return r["action"] != "block", r

client = LlamaStackClient(base_url=...)
ok, verdict = guard_shield(user_input, "user")
if not ok:
    raise RuntimeError(verdict["reasoning"])

Error codes

Status	Meaning
400	Invalid source or malformed body
401	Missing or invalid API key
403	Key not enrolled in Veil Guard
429	Rate limit or monthly quota exceeded
502	Upstream Stripe or model error

Limits

200,000 chars per scan (all tiers).
60 scans/minute per API key. 120/min per IP for the public demo.
Monthly quotas: Free 1k, Starter 10k, Growth 100k, Enterprise 1M.
LLM-as-judge is Enterprise-only and must be opted in per request with "llm_judge": true.