KAPEX Beta

Getting Started with KAPEX

KAPEX is an LLM memory middleware API. It intercepts your LLM's inputs and outputs, builds a salience-scored memory graph for each user, and injects the most relevant context into every new query. Your LLM remembers what matters -- without you managing any state.

Sign Up for a Trial

Every KAPEX integration starts with a free trial. You get 25 users, all features, and 30 days to evaluate.

curl -X POST https://api.getkapex.ai/api/v1/trial/signup \
  -H "Content-Type: application/json" \
  -d '{
    "company_name": "Acme Health",
    "email": "dev@acmehealth.com",
    "name": "Jordan Park"
  }'

The same request in Python:

import requests

resp = requests.post(
    "https://api.getkapex.ai/api/v1/trial/signup",
    json={
        "company_name": "Acme Health",
        "email": "dev@acmehealth.com",
        "name": "Jordan Park"
    }
)
resp.raise_for_status()  # fail loudly on a 4xx/5xx instead of parsing an error body
data = resp.json()
print(data["api_key"])  # save this, it won't be shown again

Response:

{
  "status": "trial_active",
  "tenant_id": "tn_acme_health_01",
  "api_key": "a1b2c3d4e5f6g7h8i9j0",
  "trial_ends": "2026-07-18T00:00:00Z",
  "limits": {
    "max_users": 25,
    "max_nodes_per_user": 5000,
    "rate_limit_rpm": 60,
    "rate_limit_daily": 10000,
    "features": "all"
  },
  "quickstart": "https://docs.getkapex.ai/",
  "base_url": "https://api.getkapex.ai/api/v1"
}

Save your api_key immediately. It is only returned once.
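Rather than pasting the key into source files, one option is to keep it in an environment variable and build the request headers from it. A minimal sketch, assuming you export the key as `KAPEX_API_KEY` (a variable name chosen here for illustration; KAPEX does not require it). The header names match the curl examples on this page.

```python
import os


def kapex_headers(env_var: str = "KAPEX_API_KEY") -> dict:
    """Build KAPEX request headers from an API key stored in an environment variable.

    KAPEX_API_KEY is a hypothetical variable name chosen for this example.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set {env_var} to the api_key returned by /trial/signup")
    return {"X-API-Key": api_key, "Content-Type": "application/json"}


# Usage: run `export KAPEX_API_KEY=...` in your shell, then:
# headers = kapex_headers()
```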

Store a Memory

After each conversation turn, send the user's message to KAPEX so it can build the memory graph.

curl -X POST https://api.getkapex.ai/api/v1/ingest \
  -H "X-API-Key: a1b2c3d4e5f6g7h8i9j0" \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "user_001",
    "content": "I just got promoted to engineering manager at Dataflow. My team is 12 people now."
  }'

Response:

{
  "node_id": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
  "scoring": {
    "base_score": 0.48,
    "sdv": 0.62,
    "cscv": 0.0,
    "lcs": 0.55,
    "swv": 0.41,
    "pdv": 0.0,
    "signal_modifiers_applied": true
  },
  "entities_created": ["Dataflow", "engineering manager"],
  "safety": {
    "crisis_detected": false
  }
}

KAPEX automatically extracts entities (Dataflow, engineering manager), classifies the life domain (work/career), and scores the memory for future retrieval.
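If you would rather not add an HTTP dependency, the same ingest call can be built with Python's standard library. A minimal sketch using `urllib.request`; `build_ingest_request` and `send` are hypothetical helper names, while the endpoint, header, and body fields are the ones shown in the curl example above. Building the request is kept separate from sending it so the payload can be inspected or logged first.

```python
import json
import urllib.request

KAPEX_BASE = "https://api.getkapex.ai/api/v1"


def build_ingest_request(api_key: str, user_id: str, content: str) -> urllib.request.Request:
    """Build (but do not send) a POST /ingest request for one user message."""
    body = json.dumps({"user_id": user_id, "content": content}).encode("utf-8")
    return urllib.request.Request(
        f"{KAPEX_BASE}/ingest",
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: `result = send(build_ingest_request(key, "user_001", "I just got promoted..."))`, after which `result["entities_created"]` and `result["scoring"]` hold the fields shown in the response above.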

Query Memories

Before calling your LLM, ask KAPEX for relevant context to inject into the system prompt.

curl -X POST https://api.getkapex.ai/api/v1/query \
  -H "X-API-Key: a1b2c3d4e5f6g7h8i9j0" \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "user_001",
    "current_input": "I am feeling overwhelmed at work lately"
  }'

Response:

{
  "context": "## Known Context\n- Jordan was recently promoted to engineering manager at Dataflow, leading a team of 12.\n- Jordan has mentioned feeling stressed about the transition to management.\n- Jordan values mentorship and has expressed interest in leadership development.",
  "node_count": 3,
  "confidence": "HIGH",
  "retrieval_channels": {
    "salience": 2,
    "recency": 1,
    "constraints": 0
  }
}

The context field is a pre-formatted text block ready to inject into any LLM's system prompt.
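Because `context` is already formatted, injecting it is plain string concatenation. The one case worth handling is an empty or missing `context` (for example, a brand-new user with nothing stored yet), where you probably want your base prompt alone. A small sketch; `build_system_prompt` is a hypothetical helper, and treating a missing field as empty is an assumption rather than documented KAPEX behavior.

```python
def build_system_prompt(base_prompt: str, query_response: dict) -> str:
    """Append KAPEX's pre-formatted context block to a base system prompt."""
    context = (query_response.get("context") or "").strip()
    if not context:
        # No memories retrieved: fall back to the base prompt alone
        return base_prompt
    return f"{base_prompt}\n\n{context}"
```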

Inject Context into Your LLM

KAPEX is model-agnostic. Insert the returned context into your system prompt.

With Claude (Anthropic SDK)

import anthropic

client = anthropic.Anthropic()

# query_response is the parsed JSON body returned by POST /query
kapex_context = query_response["context"]

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # any current Claude model ID works here
    max_tokens=1024,
    system=f"You are a supportive assistant.\n\n{kapex_context}",
    messages=[
        {"role": "user", "content": "I am feeling overwhelmed at work lately"}
    ]
)
print(response.content[0].text)

With OpenAI

from openai import OpenAI

client = OpenAI()

# query_response is the parsed JSON body returned by POST /query
kapex_context = query_response["context"]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": f"You are a supportive assistant.\n\n{kapex_context}"},
        {"role": "user", "content": "I am feeling overwhelmed at work lately"}
    ]
)
print(response.choices[0].message.content)

Test Scoring Without Storing

Use the /score endpoint to preview how KAPEX will score a piece of content, without storing it in the graph.

curl -X POST https://api.getkapex.ai/api/v1/score \
  -H "X-API-Key: a1b2c3d4e5f6g7h8i9j0" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "I just got promoted to engineering manager",
    "include_signals": true
  }'

Response:

{
  "base_score": 0.72,
  "categories": ["career", "achievement"],
  "decay_modifier": 1.0,
  "signal_scores": {
    "sdv": 0.75,
    "cscv": 0.0,
    "lcs": 0.55,
    "swv": 0.60,
    "pdv": 0.0
  }
}
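When tuning what you send to /ingest, it can help to see which signals drove a score. The sketch below ranks the non-zero entries of `signal_scores` from a /score response requested with `"include_signals": true`. It only reorders the numbers KAPEX returns and makes no assumption about what each signal abbreviation means; `top_signals` is a hypothetical helper name.

```python
def top_signals(score_response: dict) -> list[tuple[str, float]]:
    """Return the non-zero signal scores from a /score response, highest first."""
    signals = score_response.get("signal_scores", {})
    nonzero = [(name, value) for name, value in signals.items() if value > 0]
    return sorted(nonzero, key=lambda item: item[1], reverse=True)


# Sample values copied from the response above
sample = {
    "base_score": 0.72,
    "signal_scores": {"sdv": 0.75, "cscv": 0.0, "lcs": 0.55, "swv": 0.60, "pdv": 0.0},
}
print(top_signals(sample))  # [('sdv', 0.75), ('swv', 0.6), ('lcs', 0.55)]
```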

Check Your Usage

Monitor your trial consumption at any time.

curl https://api.getkapex.ai/api/v1/stats \
  -H "X-API-Key: a1b2c3d4e5f6g7h8i9j0"

Response:

{
  "tier": "trial",
  "trial": {
    "trial_ends": "2026-07-18T00:00:00Z",
    "days_remaining": 22
  },
  "users": {
    "total": 3,
    "limit": 25
  },
  "memory_nodes": {
    "total": 87,
    "avg_salience": 0.41,
    "high_salience": 12
  },
  "api_calls": {
    "today": 142,
    "this_month": 1205,
    "rpm_limit": 60,
    "daily_limit": 10000
  }
}
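The /stats response contains everything needed to compute headroom against the trial limits. With the sample values above, you have 10000 - 142 = 9858 calls left today and 25 - 3 = 22 user slots left. A small sketch (`headroom` is a hypothetical helper) that derives these numbers so you can log them or alert before hitting a limit:

```python
def headroom(stats: dict) -> dict:
    """Compute remaining trial capacity from a /stats response."""
    calls = stats["api_calls"]
    users = stats["users"]
    return {
        "calls_left_today": calls["daily_limit"] - calls["today"],
        "user_slots_left": users["limit"] - users["total"],
        # "trial" is only meaningful on the trial tier, so read it defensively
        "days_remaining": stats.get("trial", {}).get("days_remaining"),
    }
```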

Full Integration Example

This is the recommended pattern for a production integration. KAPEX sits in the middleware layer: it reads from the graph before your LLM call (sync), then writes the new turn after (async).

import threading

import anthropic
import requests

KAPEX_KEY = "a1b2c3d4e5f6g7h8i9j0"  # Replace with your actual API key from signup
KAPEX_BASE = "https://api.getkapex.ai/api/v1"
KAPEX_HEADERS = {"X-API-Key": KAPEX_KEY, "Content-Type": "application/json"}

anthropic_client = anthropic.Anthropic()


def chat_with_memory(user_id: str, message: str) -> str:
    """Complete one conversation turn with KAPEX memory."""

    # Step 1: Query KAPEX for relevant memories (sync -- the user waits on this call)
    query_resp = requests.post(
        f"{KAPEX_BASE}/query",
        headers=KAPEX_HEADERS,
        json={"user_id": user_id, "current_input": message},
        timeout=10,
    )
    query_resp.raise_for_status()
    memory_context = query_resp.json().get("context", "")

    # Step 2: Build the system prompt with the injected context
    system_prompt = f"""You are a helpful, empathetic assistant. Use the context below \
to personalize your response. Do not fabricate details -- only reference what is \
explicitly stated in the context.

{memory_context}"""

    # Step 3: Call your LLM
    response = anthropic_client.messages.create(
        model="claude-sonnet-4-20250514",  # any current Claude model ID works here
        max_tokens=1024,
        system=system_prompt,
        messages=[{"role": "user", "content": message}],
    )
    llm_output = response.content[0].text

    # Step 4: Store the user's message in KAPEX on a background thread so the reply
    # is not delayed (fire-and-forget; use a task queue or async client in production)
    threading.Thread(
        target=requests.post,
        args=(f"{KAPEX_BASE}/ingest",),
        kwargs={
            "headers": KAPEX_HEADERS,
            "json": {"user_id": user_id, "content": message},
            "timeout": 10,
        },
        daemon=True,
    ).start()

    return llm_output


# Usage
reply = chat_with_memory("user_001", "I am feeling overwhelmed at work lately")
print(reply)
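Each call to `chat_with_memory` makes two KAPEX requests (one /query, one /ingest), so the trial's `rate_limit_rpm` of 60 works out to at most 30 conversation turns per minute. If you expect bursts above that, one option is a client-side sliding-window limiter in front of your KAPEX calls. This is a generic technique, not part of the KAPEX API; `SlidingWindowLimiter` is a hypothetical name, and the clock is injectable so the logic can be tested without waiting.

```python
import threading
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any rolling `window_s`-second window."""

    def __init__(self, max_calls: int = 60, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_calls = max_calls
        self.window_s = window_s
        self.clock = clock
        self._calls = deque()  # timestamps of calls still inside the window
        self._lock = threading.Lock()  # the ingest step runs on background threads

    def try_acquire(self) -> bool:
        """Record a call and return True if under the limit; otherwise return False."""
        with self._lock:
            now = self.clock()
            # Drop timestamps that have left the rolling window
            while self._calls and now - self._calls[0] >= self.window_s:
                self._calls.popleft()
            if len(self._calls) < self.max_calls:
                self._calls.append(now)
                return True
            return False
```

In `chat_with_memory`, you would call `try_acquire()` before each KAPEX request and back off, or skip the ingest for that turn, when it returns False.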

How it works

  1. Read -- KAPEX retrieves the highest-salience memories for this user and query.
  2. Inject -- The pre-formatted context block goes into the LLM's system prompt.
  3. Generate -- Your LLM produces a response grounded in stored memories rather than guesswork.
  4. Write -- The new conversation turn is stored in the memory graph for future retrieval.

Memories that matter stay salient. Memories that are not revisited gradually decay, so the injected context stays focused on what is relevant.
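This page does not document KAPEX's decay function. To illustrate the general idea only, here is a generic exponential half-life model in which a memory's weight halves for every `half_life_days` it goes without being revisited. This is a swapped-in textbook technique, not KAPEX's implementation, and the 30-day half-life is an arbitrary example value.

```python
def decayed_score(score: float, days_since_revisit: float, half_life_days: float = 30.0) -> float:
    """Generic exponential decay: the score halves every `half_life_days` without a revisit."""
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    return score * 0.5 ** (days_since_revisit / half_life_days)


for days in (0, 30, 60):
    print(days, round(decayed_score(0.8, days), 3))  # prints 0 0.8, then 30 0.4, then 60 0.2
```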


Support

Questions or issues? Contact support@sandstonecloud.com.