Ingestion

Moneypenny supports ingesting documents from local files, URLs, and external runtime event logs from OpenClaw, Cortex Code CLI, and Claude Code. Ingested content feeds the knowledge base and enriches agent context.

Because Moneypenny exposes its full capability set over MCP, you can perform most ingestion tasks by asking your MCP-connected agent in plain English, without touching the CLI.

Local Files

Ingest a single file or an entire directory:

mp ingest path/to/document.md
mp ingest path/to/docs/

Markdown files are chunked section by section, splitting on headings. Plain text and other formats are split into chunks by size.
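The two chunking strategies can be sketched in Python as follows. This is an illustrative approximation, not Moneypenny's implementation. The function names chunk_markdown and chunk_by_size and the 800-character limit are hypothetical. The sketch also assumes that any ATX heading line (# through ######) outside a fenced code block starts a new section.

```python
import re

# Hypothetical chunk size for non-markdown text; not a documented Moneypenny value.
MAX_CHARS = 800

# An ATX heading: one to six '#' characters followed by whitespace.
HEADING = re.compile(r"^#{1,6}\s")
FENCE = "`" * 3  # a code-fence marker, built this way to keep the sketch fence-safe


def chunk_markdown(text: str) -> list[str]:
    """Start a new chunk at every heading that is not inside a fenced code block."""
    chunks: list[str] = []
    current: list[str] = []
    in_fence = False
    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        if not in_fence and HEADING.match(line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [chunk for chunk in chunks if chunk]


def chunk_by_size(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into consecutive slices of at most max_chars characters."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


if __name__ == "__main__":
    doc = "# Intro\nHello.\n## Setup\nRun it.\n"
    print(chunk_markdown(doc))
```

This sketch keeps each markdown section whole. A real chunker would probably also split oversized sections by size.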

URL Ingestion

Fetch a web page and ingest its content:

mp ingest --url "https://example.com/docs/architecture"

The page content is extracted, chunked, and indexed just like a local file.
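The sketch below gives a rough picture of the extraction step. It fetches a page with Python's standard library and keeps only the visible text, skipping script and style contents. Moneypenny's actual extractor is not described here, so fetch_text, extract_text, and TextExtractor are hypothetical names and the filtering rules are assumptions.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collect non-empty text nodes, ignoring <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.parts.append(data.strip())


def extract_text(html: str) -> str:
    """Return the visible text of an HTML document, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def fetch_text(url: str, timeout: float = 30.0) -> str:
    """Download a page and return its visible text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return extract_text(resp.read().decode(charset, errors="replace"))
```

The extracted text would then go through the same chunking as a local file.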

What Happens Under the Hood

1. Content is read (local files) or fetched (URLs).
2. It is parsed and split into chunks (section-aware for markdown).
3. Each chunk is inserted into the chunks table with FTS5 indexing.
4. A record in the documents table tracks metadata (title, source, chunk count).
5. If embeddings are configured, the chunks are vectorized for KNN search.

All of these steps run through the canonical operation pipeline, so each ingestion is policy-checked and audited.
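The following minimal SQLite sketch makes the chunks-table and documents-record steps above concrete. Only the table names chunks and documents and the use of FTS5 come from the description. The column names, the helpers open_db, ingest_chunks, and search, and the in-memory database are assumptions. The embedding step is omitted, and the sketch needs a SQLite build with FTS5 enabled.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create hypothetical documents/chunks tables; chunks is an FTS5 index."""
    db = sqlite3.connect(path)
    db.executescript(
        """
        CREATE TABLE IF NOT EXISTS documents (
            id INTEGER PRIMARY KEY,
            title TEXT,
            source TEXT,
            chunk_count INTEGER
        );
        -- doc_id is stored but not tokenized; only content is full-text indexed.
        CREATE VIRTUAL TABLE IF NOT EXISTS chunks USING fts5(doc_id UNINDEXED, content);
        """
    )
    return db


def ingest_chunks(db, title, source, chunks):
    """Record the document and insert its chunks in one transaction."""
    with db:  # commits on success, rolls back on error
        cur = db.execute(
            "INSERT INTO documents (title, source, chunk_count) VALUES (?, ?, ?)",
            (title, source, len(chunks)),
        )
        doc_id = cur.lastrowid
        db.executemany(
            "INSERT INTO chunks (doc_id, content) VALUES (?, ?)",
            [(doc_id, chunk) for chunk in chunks],
        )
    return doc_id


def search(db, query):
    """Full-text search over chunks, best matches first (FTS5 rank)."""
    rows = db.execute(
        "SELECT content FROM chunks WHERE chunks MATCH ? ORDER BY rank", (query,)
    )
    return [row[0] for row in rows]
```

Wrapping both inserts in one transaction keeps the chunk_count in documents consistent with the rows actually written to chunks.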

External Event Ingestion

Moneypenny can ingest runtime event logs from external agent runtimes. Events are stored raw for replay and projected into native tables (messages, tool calls, policy audit). Content-hash deduplication makes re-imports safe.
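The following is a minimal sketch of this raw-then-project design. It assumes a hypothetical JSONL event shape with a type field ("message" or "tool_call"). The table names events_raw, messages, and tool_calls and the helper names are illustrative and are not Moneypenny's schema. The policy-audit projection is omitted. Each event is canonicalized (sorted keys, compact separators) before hashing, so re-importing it is skipped even when the keys appear in a different order.

```python
import hashlib
import json
import sqlite3


def open_event_db() -> sqlite3.Connection:
    """Hypothetical schema: raw events plus two projected tables."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE events_raw (content_hash TEXT PRIMARY KEY, source TEXT, body TEXT);
        CREATE TABLE messages (content_hash TEXT PRIMARY KEY, role TEXT, content TEXT);
        CREATE TABLE tool_calls (content_hash TEXT PRIMARY KEY, tool TEXT, body TEXT);
        """
    )
    return db


def ingest_jsonl(db, source, lines):
    """Store each new event raw, then project it; returns how many were new."""
    inserted = 0
    with db:
        for line in lines:
            if not line.strip():
                continue
            event = json.loads(line)
            # Canonical form: key order and whitespace no longer affect the hash.
            body = json.dumps(event, sort_keys=True, separators=(",", ":"))
            digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
            cur = db.execute(
                "INSERT OR IGNORE INTO events_raw VALUES (?, ?, ?)", (digest, source, body)
            )
            if cur.rowcount == 0:
                continue  # already imported: skip the projection as well
            inserted += 1
            if event.get("type") == "message":
                db.execute(
                    "INSERT INTO messages VALUES (?, ?, ?)",
                    (digest, event.get("role"), event.get("content")),
                )
            elif event.get("type") == "tool_call":
                db.execute(
                    "INSERT INTO tool_calls VALUES (?, ?, ?)", (digest, event.get("tool"), body)
                )
    return inserted
```

Because the raw body is kept, the projections could in principle be rebuilt later from events_raw alone.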

OpenClaw

mp ingest --openclaw-file events.jsonl

Cortex Code CLI

Auto-discover and ingest all Cortex Code conversation history:

mp ingest --cortex

Discovers sessions in ~/.snowflake/cortex/conversations/, converts their native JSON format into Moneypenny events, and ingests them. Thinking blocks and system reminders are filtered out. You can re-run it at any time; duplicates are skipped.
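Discovery amounts to walking the conversation directory. This sketch assumes one *.json file per session somewhere under ~/.snowflake/cortex/conversations/. The real on-disk layout is not documented here, and discover_sessions is a hypothetical helper.

```python
from pathlib import Path

# Directory named in the docs; the file layout inside it is an assumption.
CORTEX_ROOT = Path.home() / ".snowflake" / "cortex" / "conversations"


def discover_sessions(root: Path = CORTEX_ROOT) -> list[Path]:
    """Return every *.json file under root, in a stable path order."""
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*.json") if p.is_file())
```

Returning an empty list for a missing directory lets a re-run on a machine without Cortex Code succeed quietly.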

Claude Code

Auto-discover and ingest all Claude Code conversation history:

mp ingest --claude-code

Or scope to a specific project:

mp ingest --claude-code=my-project-slug

Discovers sessions from ~/.claude/projects/, extracts messages, tool call results (as run.attempt events), and usage statistics. System reminders and thinking blocks are filtered out.
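The filtering step can be sketched as below. The sketch assumes the following record layout. Each line is one JSON object whose message holds a role and content. Content is either a string or a list of typed blocks such as thinking, text, and tool_result. System reminders are wrapped in <system-reminder> tags. This layout matches commonly observed Claude Code transcripts, but treat it as an assumption. The output event shapes, including the run.attempt fields, are hypothetical. Usage extraction is omitted, and tool_use blocks are dropped.

```python
import json
import re

# Assumption: reminders are embedded in text as <system-reminder>...</system-reminder>.
REMINDER = re.compile(r"<system-reminder>.*?</system-reminder>", re.DOTALL)
THINKING = {"thinking", "redacted_thinking"}


def clean_blocks(content):
    """Drop thinking blocks and strip system reminders from text blocks."""
    if isinstance(content, str):
        content = [{"type": "text", "text": content}]
    kept = []
    for block in content:
        kind = block.get("type")
        if kind in THINKING:
            continue
        if kind == "text":
            text = REMINDER.sub("", block.get("text", "")).strip()
            if not text:
                continue  # the block held nothing but a reminder
            block = {**block, "text": text}
        kept.append(block)
    return kept


def to_events(line):
    """Map one transcript line to hypothetical message / run.attempt events."""
    message = json.loads(line).get("message") or {}
    events, texts = [], []
    for block in clean_blocks(message.get("content", [])):
        if block.get("type") == "tool_result":
            events.append({
                "type": "run.attempt",
                "tool_use_id": block.get("tool_use_id"),
                "content": block.get("content"),
            })
        elif block.get("type") == "text":
            texts.append(block["text"])
    if texts:
        events.insert(0, {"type": "message", "role": message.get("role"), "content": "\n".join(texts)})
    return events
```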

Replay and Forensics

Every ingest run is tracked, so you can inspect past runs and replay them:

# View recent ingest runs
mp ingest --status

# Replay a specific run
mp ingest --replay-run <run-id>

# Replay the latest run matching filters
mp ingest --replay-latest --source openclaw

# Dry-run replay (preview without writing)
mp ingest --replay-latest --dry-run

# Apply replay writes
mp ingest --replay-latest --apply
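If you script replays, a cautious pattern is to preview with --dry-run and apply only after confirmation. The sketch below shells out to mp using only the flags shown above. Combining --source with --dry-run or --apply in one command is an assumption, and replay_latest_argv and preview_then_apply are hypothetical helper names.

```python
import subprocess


def replay_latest_argv(source=None, apply=False):
    """Build an `mp ingest --replay-latest` command from the documented flags."""
    argv = ["mp", "ingest", "--replay-latest"]
    if source:
        argv += ["--source", source]  # assumption: combinable with --dry-run/--apply
    argv.append("--apply" if apply else "--dry-run")
    return argv


def preview_then_apply(source=None, confirm=input):
    """Show the dry-run output, then apply only if the user answers 'y'."""
    preview = subprocess.run(
        replay_latest_argv(source), capture_output=True, text=True, check=True
    )
    print(preview.stdout)
    if confirm("Apply these writes? [y/N] ").strip().lower() == "y":
        subprocess.run(replay_latest_argv(source, apply=True), check=True)
        return True
    return False
```

Passing confirm as a parameter lets automation supply its own approval policy instead of an interactive prompt.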

Filtering Ingest History

# Filter by status
mp ingest --status --status-filter completed

# Filter by file name
mp ingest --status --file-filter "events"

# Control output limit
mp ingest --status --limit 50

Via Canonical Operations

Ingestion is also available through the sidecar operation interface:

echo '{"op":"knowledge.ingest","args":{"path":"docs/architecture.md"}}' | mp sidecar

For external events:

echo '{"op":"ingest.events","args":{"source":"openclaw","file":"events.jsonl"}}' | mp sidecar
echo '{"op":"ingest.status","args":{"source":"openclaw","limit":10}}' | mp sidecar
echo '{"op":"ingest.status","args":{"source":"cortex","limit":10}}' | mp sidecar
echo '{"op":"ingest.status","args":{"source":"claude-code","limit":10}}' | mp sidecar
echo '{"op":"ingest.replay","args":{"run_id":"...","dry_run":true}}' | mp sidecar
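When you drive the sidecar from a program rather than a shell, building each request with a JSON serializer avoids quoting mistakes. This is a minimal Python sketch. sidecar_request and run_op are hypothetical helpers. The one-request-per-line framing mirrors the echo examples above. The response format is not documented here, so run_op returns raw stdout.

```python
import json
import subprocess


def sidecar_request(op, **args):
    """Serialize one canonical operation in the compact form used above."""
    return json.dumps({"op": op, "args": args}, separators=(",", ":"))


def run_op(op, **args):
    """Send a single request to `mp sidecar` on stdin and return raw stdout."""
    result = subprocess.run(
        ["mp", "sidecar"],
        input=sidecar_request(op, **args) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, run_op("ingest.status", source="claude-code", limit=10) sends the same request as the last ingest.status line above.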

Idempotent Ingestion

Events are deduplicated by event ID or content hash. Re-ingesting the same file or replaying the same run will not create duplicate records.
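The key choice can be sketched as follows: use the event's own ID when one is present, and otherwise a SHA-256 hash of its canonical JSON. The id field name, the key prefixes, and the in-memory seen set are assumptions for illustration. A real importer would persist the keys rather than hold them in memory.

```python
import hashlib
import json


def dedup_key(event: dict) -> str:
    """Prefer an explicit event ID; otherwise hash the canonical JSON form."""
    event_id = event.get("id")  # hypothetical field name
    if event_id:
        return f"id:{event_id}"
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def new_events(events, seen: set) -> list:
    """Return only events whose key has not been seen, recording their keys."""
    fresh = []
    for event in events:
        key = dedup_key(event)
        if key not in seen:
            seen.add(key)
            fresh.append(event)
    return fresh
```

Running the same batch twice yields no new events on the second pass, which is what makes re-ingestion and replay safe.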

To list the documents in the knowledge base:

mp knowledge list

Shows each document with its title, source path, and chunk count.