Giving Back

Open Source Contributions

We believe the best way to advance AI engineering is to share what we learn. Our open source work is contributed back to the communities that make this work possible.

Open Source

ai-memory

Persistent memory for any AI — zero token cost until recalled, 79% smaller responses with TOON format

View Project →

ai-memory gives any AI assistant persistent memory across sessions. Unlike built-in memory systems (Claude auto-memory, ChatGPT memory) that load your entire memory into every conversation — burning tokens and money on every message — ai-memory uses zero context tokens until recalled. Only relevant memories come back, ranked by a 6-factor scoring algorithm.

A single Rust binary with three interfaces (MCP, HTTP, CLI), four feature tiers from zero-dependency keyword search to autonomous recall with local LLMs via Ollama, and TOON format (Token-Oriented Object Notation) that cuts response tokens by 79%. Works with Claude, ChatGPT, Grok, Llama, Cursor, Windsurf, Continue.dev, and any MCP-compatible platform.

The MCP server provides recall-first prompts that teach AI clients to use memory proactively — recalling at session start, storing corrections as permanent knowledge, and using TOON compact format automatically. 158 tests across 14/14 modules with 95%+ coverage.

79% Token Savings (TOON)
17 MCP Tools
158 Tests (95%+ Coverage)
4 Feature Tiers

Why Zero Token Cost Matters

Every AI platform that offers built-in memory charges you tokens for it — whether you use it or not. ai-memory eliminates that tax entirely.

$0

Zero Cost Until Recall

Built-in systems inject 200+ lines of memory into every conversation's system prompt. ai-memory uses zero context tokens until the AI explicitly calls memory_recall. You only pay for what you use.

79%

TOON Compact Format

When memories are recalled, TOON (Token-Oriented Object Notation) declares field names once as a header, then lists values as pipe-delimited rows. For 3 memories: JSON = 1,600 bytes vs. TOON compact = 336 bytes, a 79% reduction.
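The header-once, pipe-delimited idea can be sketched in a few lines of Python. This is an illustration of the encoding strategy, not ai-memory's actual serializer; the field names and sample records are invented for the demo.

```python
import json

def to_toon(records):
    """Encode a list of dicts TOON-style: field names once as a header
    row, then one pipe-delimited row of values per record.
    (Illustrative sketch; the real TOON format may differ in details.)"""
    fields = list(records[0].keys())
    lines = ["|".join(fields)]
    for r in records:
        lines.append("|".join(str(r[f]) for f in fields))
    return "\n".join(lines)

# Hypothetical sample memories for the size comparison
memories = [
    {"id": 1, "tier": "core", "text": "User prefers Rust", "score": 0.92},
    {"id": 2, "tier": "task", "text": "Project uses Neo4j", "score": 0.87},
    {"id": 3, "tier": "task", "text": "Deploy target is arm64", "score": 0.81},
]

as_json = json.dumps(memories, indent=2)
as_toon = to_toon(memories)
print(len(as_toon), "<", len(as_json))  # TOON is a fraction of the JSON size
```

Because every record repeats its keys in JSON but not in TOON, the savings grow with the number of records returned per recall.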

6

6-Factor Ranked Recall

FTS relevance, priority weight, access frequency, confidence, tier boost, and recency decay. Only the most relevant memories surface — no wasted tokens on irrelevant context.
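A minimal sketch of how six such factors might blend into one rank score. The factor names come from the article; the weights, the tier values, and the 30-day recency half-life are illustrative assumptions, not ai-memory's actual tuning.

```python
import math, time

def rank_score(m, now, half_life_days=30.0):
    """Combine the six factors into one score.
    Weights and half-life are assumptions for illustration."""
    age_days = (now - m["last_accessed"]) / 86400
    recency = 0.5 ** (age_days / half_life_days)      # exponential decay
    frequency = math.log1p(m["access_count"])         # diminishing returns
    tier_boost = {"core": 1.5, "task": 1.0, "scratch": 0.5}[m["tier"]]
    return (2.0 * m["fts_relevance"]   # lexical match dominates
            + 1.0 * m["priority"]
            + 0.5 * frequency
            + 1.0 * m["confidence"]
            + tier_boost
            + 1.0 * recency)

now = time.time()
fresh = {"fts_relevance": 0.9, "priority": 0.5, "access_count": 3,
         "confidence": 0.8, "tier": "core", "last_accessed": now}
stale = dict(fresh, last_accessed=now - 120 * 86400)  # 4 half-lives old
print(rank_score(fresh, now) > rank_score(stale, now))  # recency wins ties
```

With everything else equal, the 120-day-old memory scores lower; only the strongest combined signals surface at recall time.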

MCP Prompts

Two MCP prompts (recall-first + memory-workflow) teach AI clients to use memory proactively at connection time. No manual configuration needed — the behavior is built in.

4 Feature Tiers

Scale from zero-dependency keyword search to autonomous memory management with local LLMs. Each tier builds on the one below it.

🔍

Keyword

SQLite FTS5 full-text search. Zero ML dependencies, zero memory overhead. 13 MCP tools. The binary is entirely self-contained.

🧠

Semantic

Adds dense vector embeddings (all-MiniLM-L6-v2, 384-dim) with HNSW index. Hybrid recall blends FTS5 + cosine similarity. 14 MCP tools. ~256 MB RAM.

⚡

Smart

Upgrades to nomic-embed-text (768-dim) via Ollama. Adds a Gemma 3n E2B LLM for query expansion, auto-tagging, and contradiction detection. 17 MCP tools. ~1 GB RAM.

🎯

Autonomous

Gemma 3n E4B + neural cross-encoder reranker (ms-marco-MiniLM). Full autonomous memory reflection and contradiction resolution. 17 MCP tools. ~4 GB RAM.
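The Semantic tier's hybrid recall, which "blends FTS5 + cosine similarity," can be sketched as a weighted sum of a normalized lexical rank and a vector similarity. The `alpha` blend weight and the rank normalization are assumptions for illustration; the vectors here are tiny stand-ins for 384-dim embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def hybrid_score(fts_rank, query_vec, doc_vec, alpha=0.5):
    """Blend lexical and semantic signals.
    alpha and the FTS normalization are illustrative assumptions."""
    fts_norm = 1.0 / (1.0 + fts_rank)   # rank 0 (best match) -> 1.0
    return alpha * fts_norm + (1 - alpha) * cosine(query_vec, doc_vec)

q = [1.0, 0.0, 1.0]
close = [0.9, 0.1, 0.8]   # semantically near the query
far = [0.0, 1.0, 0.0]     # orthogonal to the query
print(hybrid_score(0, q, close) > hybrid_score(0, q, far))
```

The blend lets an exact keyword hit and a near-synonym paraphrase both surface, which neither FTS5 nor the vector index achieves alone.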

Works With Any AI Platform

Claude Code: MCP native — replaces auto-memory with zero-token recall. Disable autoMemoryEnabled in settings.json.
OpenAI Codex CLI: TOML-based MCP config. Works with the codex mcp add CLI shortcut.
Google Gemini CLI: JSON-based MCP config. Supports the gemini mcp add CLI shortcut.
Cursor, Windsurf, Continue.dev: MCP native via platform-specific config files. Same command and args.
xAI Grok, Meta Llama: HTTP API on localhost:9077 — works with any platform that can make HTTP requests.
Quality: 158 tests (115 unit + 43 integration), 14/14 modules, 95%+ coverage. Red-team validated at the smart and autonomous tiers.
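The HTTP interface is what makes the last rows platform-agnostic: any client that can POST JSON to localhost:9077 can recall memories. A minimal Python sketch, assuming a hypothetical /recall route and payload shape (consult the project's docs for the real endpoint names):

```python
import json
import urllib.request

def build_recall_request(query, limit=5,
                         base_url="http://localhost:9077"):
    """Build an HTTP recall request against the local ai-memory server.
    The /recall path and the payload field names are hypothetical."""
    body = json.dumps({"query": query, "limit": limit}).encode()
    return urllib.request.Request(
        f"{base_url}/recall", data=body,
        headers={"Content-Type": "application/json"}, method="POST")

req = build_recall_request("neo4j deployment notes")
print(req.full_url, req.method)
```

With the server running, `urllib.request.urlopen(req)` would return the ranked memories; here we only construct the request so the sketch stands alone.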
Open Source

OpenClaw Graph

Graph-native AI skill database for the OpenClaw community — powered by Neo4j

View Project →

OpenClaw Graph is AlphaOne's contribution to the open source OpenClaw project — a graph database layer that fundamentally transforms how AI agents discover, relate, and invoke skills. By replacing flat skill registries with a graph-native architecture, agents gain multi-hop reasoning across skill relationships, cluster-aware discovery, and dramatically reduced token consumption.

v1.5 introduces a Rust event-driven sync daemon that materializes Neo4j graph data into OpenClaw workspace files with FSEvents file watching for sub-second write-back. The Rust binary replaces a Python polling script — eliminating interpreter startup overhead for a 111x per-cycle speedup and 12x memory reduction. Persistent Bolt connection, 3.8 MB binary, ~4 MB RSS.

v1.5 also ships a self-contained installer: seed.py reads all 315 skills directly from the repo's SKILL.md files, requiring zero external data files. One command (./install.sh) seeds Neo4j, deploys workspace stubs, and optionally builds the Rust daemon.

The entire skill graph runs on Neo4j with proper graph modeling — SkillCluster nodes, typed relationships, namespaced labels for multi-graph coexistence, and workspace-scoped isolation. ~10 MB footprint with sub-millisecond query times.

111x Faster Sync (Rust)
315 Skills Mapped
27 Skill Clusters
~4MB Daemon RSS

How Graph Database Capabilities Enhance AI

Traditional AI skill registries are flat lists — the agent must scan everything to find what it needs, consuming tokens proportional to the registry size. Graph databases change the fundamental economics. With Neo4j integration, skills become first-class graph citizens alongside other knowledge domains.

1

Relationship-Aware Discovery

Skills are connected through typed edges — RELATED_TO for dependencies, IN_CLUSTER for semantic grouping. Neo4j's native graph storage means traversals are index-free adjacency lookups — constant time regardless of graph size.

2

Cluster-Based Context Assembly

27 SkillCluster nodes group related skills into semantic domains. Each cluster is a proper graph node with IN_CLUSTER edges — agents retrieve coherent, pre-organized context windows instead of arbitrary skill fragments.

3

Multi-Graph Coexistence

Namespaced labels (OCAgent, OCMemory, OCTool) let the skill graph coexist with other graph domains in a shared Neo4j instance. One database, multiple knowledge domains, workspace-level isolation.

4

Scoped Retrieval

Cypher queries naturally enforce scope boundaries via workspace properties. Agents retrieve only the subgraph relevant to their identity — no over-fetching, no token waste. Context assembly, not context accumulation.
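The scoped retrieval in point 4 can be sketched as a parameterized Cypher query built in Python (the language of seed.py). The SkillCluster label and IN_CLUSTER edge come from the article; the OCSkill label and property names are illustrative assumptions, and running the query would require a live Neo4j plus the Python neo4j driver, so here we only construct it.

```python
def scoped_skill_query(workspace, cluster=None):
    """Build a Cypher query that retrieves only one workspace's subgraph.
    Labels and property names are assumptions for illustration."""
    match = "MATCH (s:OCSkill {workspace: $workspace})"
    if cluster:
        # Narrow further to one semantic cluster via its IN_CLUSTER edge
        match += "-[:IN_CLUSTER]->(c:SkillCluster {name: $cluster})"
    return (match + " RETURN s.name AS name, s.summary AS summary",
            {"workspace": workspace, "cluster": cluster})

query, params = scoped_skill_query("myagent", cluster="graph-ops")
print(query)
```

With the driver, this would run as `session.run(query, **params)`; because the workspace property is baked into the MATCH, the agent can never over-fetch another workspace's nodes.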

TOON Optimization & Token Cost Reduction

The token cost reduction (a flat registry consumes 229% more tokens, roughly 3.3x, than graph-scoped retrieval) comes from multiple optimization layers working together, an approach we call TOON (Token-Optimized Operation Network).

🎯

Targeted Retrieval

Graph queries return precisely scoped subgraphs instead of bulk skill dumps. Agents receive only the context they need for the current task — eliminating the token overhead of transmitting entire registries.

🗜️

Structural Compression

Graph edges encode relationships that would otherwise require verbose natural language descriptions. A single edge replaces paragraphs of context about how skills relate — compressing the context window without losing information.

🔄

Cached Traversals

Frequently accessed subgraphs are pre-materialized. Common agent workflows hit warm caches instead of executing fresh graph queries — reducing both latency and the token cost of re-computing context.

📉

Progressive Disclosure

Instead of loading all skill details upfront, the graph serves summaries first and full definitions on demand. Agents decide what to expand based on initial traversal — paying token costs only for what they actually use.
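Progressive disclosure reduces to a two-pass lookup: cheap summaries first, full definitions only for what the agent opens. A minimal sketch with an in-memory store and invented skill names:

```python
# Hypothetical skill store: each entry has a cheap summary and an
# expensive full definition (which would dominate token cost if
# shipped upfront).
skills = {
    "graph-sync": {"summary": "Sync Neo4j to workspace files",
                   "full": "Long definition with flags, examples, caveats..."},
    "seed-db":    {"summary": "Seed skills and clusters into Neo4j",
                   "full": "Long definition with schema details..."},
}

def list_summaries():
    """First pass: summaries only, a fraction of the token cost."""
    return {name: s["summary"] for name, s in skills.items()}

def expand(name):
    """Second pass: pay the full token cost only for opened skills."""
    return skills[name]["full"]

catalog = list_summaries()          # agent scans this cheaply
detail = expand("graph-sync")       # then expands just one entry
```

The agent's initial traversal sees every skill's one-line summary; the long definitions never enter the context window unless explicitly expanded.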

Rust Event-Driven Sync Daemon

The workspace materializer is a long-running Rust binary (neo4j-sync) that bridges Neo4j and OpenClaw flat files. It maintains a persistent Bolt connection and uses macOS FSEvents for instant write-back — replacing a Python polling script that spent 97% of each invocation on interpreter startup.

Persistent Connection

A single Neo4j Bolt connection stays open for the lifetime of the daemon. No per-cycle startup cost, no connection churn — 60 new connections/hour reduced to 1.

Event-Driven Write-Back

FSEvents file watching with 500ms debounce detects IDENTITY.md changes and syncs to Neo4j in under a second. Agents see identity updates reflected instantly instead of waiting up to 60 seconds.

📦

3.8 MB Binary

Compiled with LTO and size optimization. The entire daemon — Neo4j driver, file watcher, async runtime, regex engine — fits in a 3.8 MB arm64 binary with ~4 MB RSS at runtime.

🛠

Zero-Downtime Migration

Drop-in replacement for the Python script. Same launchd service name, same log paths, same workspace files. Atomic swap: unload Python, load Rust — under 1 second of downtime.
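The 500ms debounce in the event-driven write-back coalesces a burst of rapid saves into a single sync. The real implementation is Rust on FSEvents; this Python sketch shows only the debounce logic, with a fire-on-quiet policy that is an assumption about the daemon's behavior.

```python
class Debouncer:
    """Coalesce bursts of file events: fire only after `window`
    seconds of quiet, so three rapid saves trigger one sync."""
    def __init__(self, window=0.5):
        self.window = window
        self.last_event = None

    def event(self, now):
        """Record a file-change event at timestamp `now` (seconds)."""
        self.last_event = now

    def should_fire(self, now):
        """True once the quiet window has elapsed since the last event."""
        if self.last_event is None:
            return False
        if now - self.last_event >= self.window:
            self.last_event = None   # reset so we fire once per burst
            return True
        return False

d = Debouncer(window=0.5)
d.event(0.0); d.event(0.1); d.event(0.2)       # burst of saves
print(d.should_fire(0.4), d.should_fire(0.8))  # False True
```

Without the debounce, each keystroke-level save of IDENTITY.md would trigger its own Neo4j write; with it, the daemon still syncs in under a second but does so exactly once per burst.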

Self-Contained v1.5 Installer

One command installs everything. seed.py reads skills directly from the repo's 315 SKILL.md files and 217 relationship pairs — no external data files, no migration scripts, no manual steps.

📦

Zero External Dependencies

Skills are parsed from YAML frontmatter in the repo's skills/*/SKILL.md files. Relationships loaded from seed-data/skill_rels.json. Workspace defaults embedded in Python. Only Neo4j + Python neo4j driver required.

Sub-Second Seeding

315 skills, 27 clusters, 217 relationships, and 50 workspace nodes seeded in under 1 second. UNWIND batches of 50, MERGE for full idempotency — safe to re-run anytime.
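The batching-plus-MERGE pattern above can be sketched directly: chunk the rows, then send each chunk through one UNWIND statement whose MERGE makes re-runs idempotent. The OCSkill label and property names are illustrative; executing the statement needs a live Neo4j, so only the batching runs here.

```python
def batches(items, size=50):
    """Yield UNWIND-sized chunks, mirroring seed.py's batches of 50."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# MERGE (not CREATE) keeps re-runs idempotent: existing nodes are
# matched, missing ones created. Label/property names are assumptions.
MERGE_SKILLS = """
UNWIND $rows AS row
MERGE (s:OCSkill {name: row.name})
SET s.summary = row.summary
"""

skills = [{"name": f"skill-{i}", "summary": "..."} for i in range(315)]
chunks = list(batches(skills))
print(len(chunks), len(chunks[-1]))  # 7 batches; the last holds 15 rows
```

With the neo4j driver, each chunk would run as `session.run(MERGE_SKILLS, rows=chunk)`; batching keeps transactions small while MERGE makes the whole seed safe to re-run anytime.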

🛠

Multi-Workspace Support

./install.sh --workspace myagent creates isolated workspace nodes with auto-deployed stubs. Run multiple agents on a single Neo4j instance with workspace-scoped isolation.

Built-In Verification

seed.py --verify validates all node counts, relationship edges, and workspace stub queries. --dry-run previews every operation before touching Neo4j.

Performance Impact

Rust Sync Speed: 111x faster per-cycle — 5ms (Rust) vs 555ms (Python). 97% of Python's wall time was interpreter startup, now eliminated entirely.
Write-Back Latency: 120x faster — sub-500ms event-driven write-back (FSEvents) replaces 60-second polling. Agent identity changes hit Neo4j near-instantly.
Memory Footprint: 12x smaller — ~4 MB RSS (Rust daemon) vs ~50 MB (Python interpreter + neo4j driver). 3.8 MB arm64 binary with LTO optimization.
Token Cost: ~3.3x reduction (a flat registry consumes 229% more tokens) through TOON optimization — targeted retrieval, structural compression, cached traversals.
Neo4j Performance: sub-millisecond skill lookups, ~2ms full scans across 315 nodes, ~10 MB total footprint — installs on macOS, Ubuntu, and Fedora.
Scalability: sub-millisecond query times across 315 skills, 27 clusters, 217 typed relationships — cost stays flat as the graph grows.

Interested in Our Open Source Work?

We're always looking for contributors and partners who share our commitment to building better AI infrastructure in the open.

sales@alpha-one.mobi