Ingestion
Moneypenny ingests documents from local files and URLs, as well as runtime event logs from external agent runtimes: OpenClaw, Cortex Code CLI, and Claude Code. Ingested content feeds the knowledge base and enriches agent context.
Because Moneypenny exposes its full capability set over MCP, you can perform most ingestion tasks by asking your MCP-connected agent in plain English — no CLI required.
Local Files
Ingest a single file or an entire directory:
    mp ingest path/to/document.md
    mp ingest path/to/docs/
Ask your MCP-connected agent:
Ingest the document at path/to/document.md into the knowledge base.
Or for a directory:
Ingest all files in the path/to/docs/ directory.
Markdown files are chunked section-aware (splitting on headings). Plain text and other formats are chunked by size.
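The two chunking strategies can be illustrated with a minimal Python sketch. This is not Moneypenny's actual chunker: the function names `chunk_markdown` and `chunk_by_size`, the heading pattern, and the 1000-character default are all assumptions made for illustration.

```python
import re

# Assumption: only ATX-style headings ("# Title" through "###### Title") start a section.
HEADING = re.compile(r"^#{1,6}\s+\S")


def chunk_markdown(text):
    """Split markdown into chunks, starting a new chunk at each heading line."""
    chunks, current = [], []
    for line in text.splitlines():
        # A heading closes the chunk collected so far and opens a new one.
        if HEADING.match(line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    # Drop chunks that ended up empty (e.g. leading blank lines).
    return [c for c in chunks if c]


def chunk_by_size(text, max_chars=1000):
    """Split text into consecutive pieces of at most max_chars characters."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
```

The sketch treats any line that looks like a heading as a boundary, so a `#` comment inside a fenced code block would also split a chunk; a real chunker would likely track fence state as well.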
URL Ingestion
Fetch and ingest from a web page:
    mp ingest --url "https://example.com/docs/architecture"
Ask your MCP-connected agent:
Ingest the page at https://example.com/docs/architecture
The page content is extracted, chunked, and indexed just like a local file.
What Happens Under the Hood
- Content is read (local) or fetched (URL)
- Parsed and split into chunks (section-aware for markdown)
- Each chunk is inserted into the chunks table with FTS5 indexing
- A documents record tracks metadata (title, source, chunk count)
- If embeddings are configured, chunks are vectorized for KNN search
All of this runs through the canonical operation pipeline — policy-checked and audited.
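To make the storage steps above concrete, here is a rough sketch using Python's built-in `sqlite3` module. Only the table names (chunks, documents), the use of FTS5, and the tracked metadata come from the text above; the column names and the helpers `open_db`, `ingest`, and `search` are hypothetical, and the sketch omits embeddings, policy checks, and auditing. It requires an SQLite build with FTS5 enabled.

```python
import sqlite3


def open_db(path=":memory:"):
    """Create a connection with an assumed, simplified schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "id INTEGER PRIMARY KEY, title TEXT, source TEXT, chunk_count INTEGER)"
    )
    # FTS5 virtual table: content is full-text indexed, doc_id is stored but not indexed.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS chunks USING fts5(content, doc_id UNINDEXED)"
    )
    return conn


def ingest(conn, title, source, chunks):
    """Record the document metadata, then insert each chunk into the FTS5 table."""
    cur = conn.execute(
        "INSERT INTO documents (title, source, chunk_count) VALUES (?, ?, ?)",
        (title, source, len(chunks)),
    )
    doc_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO chunks (content, doc_id) VALUES (?, ?)",
        [(chunk, doc_id) for chunk in chunks],
    )
    conn.commit()
    return doc_id


def search(conn, query):
    """Full-text search over chunk content, best matches first."""
    rows = conn.execute(
        "SELECT doc_id, content FROM chunks WHERE chunks MATCH ? ORDER BY rank",
        (query,),
    )
    return rows.fetchall()
```

Keeping the document row separate from the chunk rows means listing documents never has to scan chunk text, while the FTS5 table handles keyword lookup.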
External Event Ingestion
Moneypenny can ingest runtime event logs from external agent runtimes. Events are stored raw for replay and projected into native tables (messages, tool calls, policy audit). Content-hash deduplication makes re-imports safe.
OpenClaw
    mp ingest --openclaw-file events.jsonl
Cortex Code CLI
Auto-discover and ingest all Cortex Code conversation history:
    mp ingest --cortex
Discovers sessions from ~/.snowflake/cortex/conversations/, converts the native JSON format to Moneypenny events, and ingests them. Thinking blocks and system reminders are filtered out. Re-run anytime; duplicates are skipped.
Claude Code
Auto-discover and ingest all Claude Code conversation history:
    mp ingest --claude-code
Or scope to a specific project:
    mp ingest --claude-code=my-project-slug
Discovers sessions from ~/.claude/projects/, extracts messages, tool call results (as run.attempt events), and usage statistics. System reminders and thinking blocks are filtered out.
Replay and Forensics
Every ingest run is tracked. You can replay previous runs:
    # View recent ingest runs
    mp ingest --status
    # Replay a specific run
    mp ingest --replay-run <run-id>
    # Replay the latest run matching filters
    mp ingest --replay-latest --source openclaw
    # Dry-run replay (preview without writing)
    mp ingest --replay-latest --dry-run
    # Apply replay writes
    mp ingest --replay-latest --apply
Ask your MCP-connected agent:
Show me recent ingest runs
Replay the latest ingest run
Replay the latest OpenClaw ingest run as a dry run
Filtering Ingest History
    # Filter by status
    mp ingest --status --status-filter completed
    # Filter by file name
    mp ingest --status --file-filter "events"
    # Control output limit
    mp ingest --status --limit 50
Via Canonical Operations
Ingestion is also available through the sidecar operation interface:
    echo '{"op":"knowledge.ingest","args":{"path":"docs/architecture.md"}}' | mp sidecar
For external events:
    echo '{"op":"ingest.events","args":{"source":"openclaw","file":"events.jsonl"}}' | mp sidecar
    echo '{"op":"ingest.status","args":{"source":"openclaw","limit":10}}' | mp sidecar
    echo '{"op":"ingest.status","args":{"source":"cortex","limit":10}}' | mp sidecar
    echo '{"op":"ingest.status","args":{"source":"claude-code","limit":10}}' | mp sidecar
    echo '{"op":"ingest.replay","args":{"run_id":"...","dry_run":true}}' | mp sidecar
Idempotent Ingestion
Events are deduplicated by event ID or content hash. Re-ingesting the same file or replaying the same run will not create duplicate records.
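One way to picture deduplication by event ID or content hash is the Python sketch below. It is an illustration under stated assumptions, not Moneypenny's implementation: the `id` field name, the SHA-256 hash over canonical JSON, and the helpers `dedup_key` and `ingest_events` are all hypothetical.

```python
import hashlib
import json


def dedup_key(event):
    """Prefer an explicit event ID; otherwise hash a canonical JSON form."""
    # Assumption: events are JSON-serializable dicts with an optional "id" field.
    if event.get("id"):
        return "id:" + str(event["id"])
    # Sorted keys and fixed separators make the hash independent of key order.
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def ingest_events(events, seen):
    """Return only events not seen before, recording their keys in `seen`."""
    fresh = []
    for event in events:
        key = dedup_key(event)
        if key in seen:
            continue  # duplicate: skip without writing
        seen.add(key)
        fresh.append(event)
    return fresh
```

With a persistent `seen` set (for example, a unique-keyed table), running the same batch twice yields no new records on the second pass, which is the property that makes re-imports and replays safe.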
Listing Ingested Content
    mp knowledge list
Ask your MCP-connected agent:
What documents are in my knowledge base?
Shows each document with its title, source path, and chunk count.