Getting Started with KAPEX
KAPEX is an LLM memory middleware API. It intercepts your LLM's inputs and outputs, builds a salience-scored memory graph for each user, and injects the most relevant context into every new query. Your LLM remembers what matters -- without you managing any state.
Sign Up for a Trial
Every KAPEX integration starts with a free trial. You get 25 users, all features, and 30 days to evaluate.
curl -X POST https://api.getkapex.ai/api/v1/trial/signup \
  -H "Content-Type: application/json" \
  -d '{
    "company_name": "Acme Health",
    "email": "dev@acmehealth.com",
    "name": "Jordan Park"
  }'
import requests

resp = requests.post(
    "https://api.getkapex.ai/api/v1/trial/signup",
    json={
        "company_name": "Acme Health",
        "email": "dev@acmehealth.com",
        "name": "Jordan Park",
    },
    timeout=30,
)
resp.raise_for_status()  # stop here on an HTTP error instead of parsing an error body
data = resp.json()
print(data["api_key"])  # save this, it won't be shown again
Response:
{
  "status": "trial_active",
  "tenant_id": "tn_acme_health_01",
  "api_key": "a1b2c3d4e5f6g7h8i9j0",
  "trial_ends": "2026-07-18T00:00:00Z",
  "limits": {
    "max_users": 25,
    "max_nodes_per_user": 5000,
    "rate_limit_rpm": 60,
    "rate_limit_daily": 10000,
    "features": "all"
  },
  "quickstart": "https://docs.getkapex.ai/",
  "base_url": "https://api.getkapex.ai/api/v1"
}
Save your api_key immediately. It is only returned once.
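Because the key is shown only once, it is worth keeping it out of source code. The sketch below assumes you have stored it in an environment variable named `KAPEX_API_KEY`; that name and the `load_kapex_headers` helper are hypothetical conventions for your own application (KAPEX itself does not read the variable). The function builds the same headers used by the requests in this guide.

```python
import os

# Hypothetical variable name: a convention for your app, not something KAPEX reads.
KEY_ENV_VAR = "KAPEX_API_KEY"


def load_kapex_headers(env_var: str = KEY_ENV_VAR) -> dict:
    """Return KAPEX request headers, reading the API key from the environment."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        # Fail fast rather than sending unauthenticated requests.
        raise RuntimeError(f"Set {env_var} to the api_key returned by /trial/signup")
    return {"X-API-Key": api_key, "Content-Type": "application/json"}
```

Set the variable once in your deployment environment (or a secrets manager that exports it), then call `load_kapex_headers()` wherever the examples use a hard-coded key.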
Store a Memory
After each conversation turn, send the user's message to KAPEX so it can build the memory graph.
curl -X POST https://api.getkapex.ai/api/v1/ingest \
  -H "X-API-Key: a1b2c3d4e5f6g7h8i9j0" \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "user_001",
    "content": "I just got promoted to engineering manager at Dataflow. My team is 12 people now."
  }'
Response:
{
  "node_id": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
  "scoring": {
    "base_score": 0.48,
    "sdv": 0.62,
    "cscv": 0.0,
    "lcs": 0.55,
    "swv": 0.41,
    "pdv": 0.0,
    "signal_modifiers_applied": true
  },
  "entities_created": ["Dataflow", "engineering manager"],
  "safety": {
    "crisis_detected": false
  }
}
KAPEX automatically extracts entities (Dataflow, engineering manager), classifies the life domain (work/career), and scores the memory for future retrieval.
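The same ingest call in Python, as a minimal sketch that uses only the standard library (`urllib.request`) so it runs without extra packages. The helper names `build_kapex_request`, `ingest`, and `needs_crisis_handling` are hypothetical. The last helper only reads the `safety.crisis_detected` flag shown in the response above; what your application does when it is true (escalation, showing support resources, pausing automation) is your decision and is not prescribed by this guide.

```python
import json
import urllib.request

KAPEX_BASE = "https://api.getkapex.ai/api/v1"


def build_kapex_request(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build a JSON POST request for a KAPEX endpoint such as /ingest or /query."""
    return urllib.request.Request(
        url=f"{KAPEX_BASE}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def ingest(user_id: str, content: str, api_key: str, timeout: float = 10.0) -> dict:
    """Store one user message and return the parsed response (node_id, scoring, ...)."""
    req = build_kapex_request("/ingest", {"user_id": user_id, "content": content}, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def needs_crisis_handling(ingest_response: dict) -> bool:
    """True when KAPEX set safety.crisis_detected on the ingested message."""
    return bool(ingest_response.get("safety", {}).get("crisis_detected", False))
```

For example, `if needs_crisis_handling(ingest("user_001", text, api_key)): ...` lets you branch into your own safety flow before generating a normal reply.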
Query Memories
Before calling your LLM, ask KAPEX for relevant context to inject into the system prompt.
curl -X POST https://api.getkapex.ai/api/v1/query \
  -H "X-API-Key: a1b2c3d4e5f6g7h8i9j0" \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "user_001",
    "current_input": "I am feeling overwhelmed at work lately"
  }'
Response:
{
  "context": "## Known Context\n- Jordan was recently promoted to engineering manager at Dataflow, leading a team of 12.\n- Jordan has mentioned feeling stressed about the transition to management.\n- Jordan values mentorship and has expressed interest in leadership development.",
  "node_count": 3,
  "confidence": "HIGH",
  "retrieval_channels": {
    "salience": 2,
    "recency": 1,
    "constraints": 0
  }
}
The context field is a pre-formatted text block ready to inject into any LLM's system prompt.
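A minimal Python sketch of the query step. It assumes you would rather answer without memory than fail the user's turn when KAPEX is slow or unreachable, so any network, HTTP, or parsing error falls back to an empty context. The `fetch_context` name and the 3-second timeout are illustrative choices, not KAPEX requirements.

```python
import json
import urllib.request


def fetch_context(user_id: str, current_input: str, api_key: str,
                  base_url: str = "https://api.getkapex.ai/api/v1",
                  timeout: float = 3.0) -> str:
    """Return the pre-formatted context block, or "" if KAPEX cannot be used."""
    req = urllib.request.Request(
        url=f"{base_url}/query",
        data=json.dumps({"user_id": user_id, "current_input": current_input}).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.load(resp).get("context", "")
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError (4xx/5xx) and socket timeouts;
        # ValueError covers a body that is not valid JSON.
        return ""
```

Degrading to an empty context keeps the chat responsive; if memory is essential to your product, raise instead and surface the error.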
Inject Context into Your LLM
KAPEX is model-agnostic. Insert the returned context into your system prompt.
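Both SDK examples that follow concatenate a base instruction with the context block. If you want that step in one place, a small helper such as the hypothetical `build_system_prompt` sketched here also skips the injection when the query returned no context (for example, a brand-new user with no stored memories), so the prompt does not end with an empty section.

```python
def build_system_prompt(base_instructions: str, query_response: dict) -> str:
    """Append the KAPEX context block to your own instructions, if there is one."""
    context = (query_response.get("context") or "").strip()
    if not context:
        # Nothing retrieved: send the base instructions unchanged.
        return base_instructions
    # A blank line keeps the "## Known Context" heading separate from your instructions.
    return f"{base_instructions}\n\n{context}"
```

The result can be passed as `system=` to the Anthropic SDK or as the system message content for OpenAI.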
With Claude (Anthropic SDK)
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# query_response is the parsed JSON body returned by POST /query (see Query Memories)
kapex_context = query_response["context"]

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=f"You are a supportive assistant.\n\n{kapex_context}",
    messages=[
        {"role": "user", "content": "I am feeling overwhelmed at work lately"}
    ],
)
print(response.content[0].text)
With OpenAI
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# query_response is the parsed JSON body returned by POST /query (see Query Memories)
kapex_context = query_response["context"]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": f"You are a supportive assistant.\n\n{kapex_context}"},
        {"role": "user", "content": "I am feeling overwhelmed at work lately"},
    ],
)
print(response.choices[0].message.content)
Test Scoring Without Storing
Use the /score endpoint to preview how KAPEX will score a piece of content, without storing it in the graph.
curl -X POST https://api.getkapex.ai/api/v1/score \
  -H "X-API-Key: a1b2c3d4e5f6g7h8i9j0" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "I just got promoted to engineering manager",
    "include_signals": true
  }'
Response:
{
  "base_score": 0.72,
  "categories": ["career", "achievement"],
  "decay_modifier": 1.0,
  "signal_scores": {
    "sdv": 0.75,
    "cscv": 0.0,
    "lcs": 0.55,
    "swv": 0.60,
    "pdv": 0.0
  }
}
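When comparing how different phrasings score, it can help to list the returned signals in order. A minimal sketch that works on the parsed /score response; this guide does not define what each abbreviation (sdv, cscv, lcs, swv, pdv) measures, so the hypothetical `rank_signals` helper only sorts them by value and returns an empty list if `signal_scores` is absent.

```python
def rank_signals(score_response: dict) -> list:
    """Return (signal, value) pairs from a /score response, highest value first.

    sorted() is stable, so signals with equal values keep their response order.
    """
    signals = score_response.get("signal_scores", {})
    return sorted(signals.items(), key=lambda item: item[1], reverse=True)


# Example using the response shown above.
example = {
    "base_score": 0.72,
    "signal_scores": {"sdv": 0.75, "cscv": 0.0, "lcs": 0.55, "swv": 0.60, "pdv": 0.0},
}
for name, value in rank_signals(example):
    print(f"{name}: {value:.2f}")
```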
Check Your Usage
Monitor your trial consumption at any time.
curl https://api.getkapex.ai/api/v1/stats \
  -H "X-API-Key: a1b2c3d4e5f6g7h8i9j0"
Response:
{
  "tier": "trial",
  "trial": {
    "trial_ends": "2026-07-18T00:00:00Z",
    "days_remaining": 22
  },
  "users": {
    "total": 3,
    "limit": 25
  },
  "memory_nodes": {
    "total": 87,
    "avg_salience": 0.41,
    "high_salience": 12
  },
  "api_calls": {
    "today": 142,
    "this_month": 1205,
    "rpm_limit": 60,
    "daily_limit": 10000
  }
}
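To act on these numbers, for example to alert before the trial runs out of users or daily calls, you can compute warnings from the /stats response. A sketch: the `usage_warnings` name, the 80% usage threshold, and the 7-day trial warning are arbitrary illustrations to tune for your own alerting.

```python
def usage_warnings(stats: dict, threshold: float = 0.8, days_warning: int = 7) -> list:
    """Return warnings for trial limits that are close to being reached.

    `stats` is the parsed JSON body from GET /stats.
    """
    warnings = []
    # (label, amount used, limit) for each capped resource reported by /stats
    checks = [
        ("users", stats["users"]["total"], stats["users"]["limit"]),
        ("daily API calls", stats["api_calls"]["today"], stats["api_calls"]["daily_limit"]),
    ]
    for label, used, limit in checks:
        # Skip a zero limit to avoid dividing by zero.
        if limit and used / limit >= threshold:
            warnings.append(f"{label}: {used}/{limit} used")
    days_left = stats.get("trial", {}).get("days_remaining")
    if days_left is not None and days_left <= days_warning:
        warnings.append(f"trial ends in {days_left} days")
    return warnings
```

With the response above (3 of 25 users, 142 of 10,000 daily calls, 22 days left) the list is empty.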
Full Integration Example
This is the recommended pattern for a production integration. KAPEX sits in the middleware layer: it reads from the graph before your LLM call (sync), then writes the new turn after (async).
import threading

import anthropic
import requests

KAPEX_KEY = "a1b2c3d4e5f6g7h8i9j0"  # Replace with your actual API key from signup
KAPEX_BASE = "https://api.getkapex.ai/api/v1"
KAPEX_HEADERS = {"X-API-Key": KAPEX_KEY, "Content-Type": "application/json"}

anthropic_client = anthropic.Anthropic()


def chat_with_memory(user_id: str, message: str) -> str:
    """Complete conversation turn with KAPEX memory."""
    # Step 1: Query KAPEX for relevant memories (sync -- user waits)
    query_resp = requests.post(
        f"{KAPEX_BASE}/query",
        headers=KAPEX_HEADERS,
        json={"user_id": user_id, "current_input": message},
        timeout=10,
    )
    query_resp.raise_for_status()
    memory_context = query_resp.json().get("context", "")

    # Step 2: Build system prompt with injected context
    system_prompt = (
        "You are a helpful, empathetic assistant. Use the context below to "
        "personalize your response. Do not fabricate details -- only reference "
        "what is explicitly stated in the context.\n\n"
        f"{memory_context}"
    )

    # Step 3: Call your LLM
    response = anthropic_client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=system_prompt,
        messages=[{"role": "user", "content": message}],
    )
    llm_output = response.content[0].text

    # Step 4: Store the turn in KAPEX without making the user wait.
    # A daemon thread can be killed at interpreter exit, so use a task queue
    # or async worker in production.
    threading.Thread(
        target=requests.post,
        args=(f"{KAPEX_BASE}/ingest",),
        kwargs={
            "headers": KAPEX_HEADERS,
            "json": {"user_id": user_id, "content": message},
            "timeout": 10,
        },
        daemon=True,
    ).start()

    return llm_output

# Usage
reply = chat_with_memory("user_001", "I am feeling overwhelmed at work lately")
print(reply)
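The example above starts a new thread for every turn, and a failed write is never reported. One alternative, sketched here under the assumption that a single long-lived background worker suits your deployment: turns go into a bounded `queue.Queue`, one daemon thread sends them, and failures are logged. `IngestWorker` is a hypothetical name, and `send` is any callable you supply that performs the POST to /ingest (for example, a small function wrapping `requests.post` with your headers and a timeout).

```python
import logging
import queue
import threading
from typing import Callable

log = logging.getLogger("kapex.ingest")


class IngestWorker:
    """Single background thread that sends queued turns to KAPEX /ingest."""

    def __init__(self, send: Callable[[dict], None], maxsize: int = 1000):
        self._send = send  # performs the HTTP POST; injected so it can be swapped or stubbed
        self._queue = queue.Queue(maxsize=maxsize)
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, user_id: str, content: str) -> bool:
        """Enqueue a turn without blocking; returns False if the queue is full."""
        try:
            self._queue.put_nowait({"user_id": user_id, "content": content})
            return True
        except queue.Full:
            log.warning("KAPEX ingest queue full; dropping turn for %s", user_id)
            return False

    def join(self) -> None:
        """Block until every queued turn has been attempted (useful at shutdown or in tests)."""
        self._queue.join()

    def _run(self) -> None:
        while True:
            payload = self._queue.get()
            try:
                self._send(payload)
            except Exception:
                # Log and keep going so one bad request does not stop the worker.
                log.exception("KAPEX ingest failed for %s", payload["user_id"])
            finally:
                self._queue.task_done()
```

With a module-level `worker = IngestWorker(send)`, Step 4 of `chat_with_memory` reduces to `worker.submit(user_id, message)`. For multi-process deployments, an external task queue remains the more durable choice.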
How it works
- Read -- KAPEX retrieves the highest-salience memories for this user and query.
- Inject -- The pre-formatted context block goes into the LLM's system prompt.
- Generate -- Your LLM produces a response grounded in the user's stored memories rather than invented details.
- Write -- The new conversation turn is stored in the memory graph for future retrieval.
Memories that matter stay salient. Memories that are not revisited in later conversations gradually decay, keeping the injected context focused on what's relevant.
Next Steps
- API Reference -- full endpoint documentation with request/response schemas
- Configuration Presets -- tune memory behavior with named presets
- MCP Server -- connect KAPEX as an MCP tool for Claude Desktop or Claude Code
- Authentication -- API key management, rate limits, and security
Support
Questions or issues? Contact support@sandstonecloud.com.