Arkive
An enterprise-ready RAG knowledge base that lets you upload documents and ask natural language questions — getting accurate, cited answers grounded in your actual files.

Overview
What is Arkive?
Arkive is an enterprise-ready knowledge base powered by Retrieval-Augmented Generation (RAG). Upload any PDF, DOCX, or TXT file and ask questions in plain English — Arkive retrieves the most relevant passages and uses Claude to synthesize a clear, cited answer.
Every answer includes source cards showing exactly which document and passage the information came from. A document preview opens the original file with the relevant passages highlighted, so enterprise clients can verify AI output against the source. When multiple files are loaded, a filter dropdown scopes a query to a specific document.
Arkive validates every upload — catching password-protected PDFs, corrupted files, empty documents, and oversized files with clear error messages before they hit the pipeline.
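The kind of pre-pipeline check described above can be sketched with the standard library alone. This is a minimal illustration, not Arkive's actual validator: the function name validate_upload, the 20 MB limit, and the error wording are all assumptions, and the password check is a byte-level heuristic (a real implementation would more likely ask a PDF parsing library whether the file is encrypted).

```python
from pathlib import Path

# Hypothetical limit; the write-up does not state Arkive's real size cap.
MAX_BYTES = 20 * 1024 * 1024
ALLOWED_SUFFIXES = {".pdf", ".docx", ".txt"}


def validate_upload(filename: str, data: bytes, max_bytes: int = MAX_BYTES) -> list[str]:
    """Return a list of problems; an empty list means the file passed."""
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        return [f"Unsupported file type: {suffix or 'none'}"]
    if len(data) == 0:
        return ["The file is empty."]
    if len(data) > max_bytes:
        return [f"The file exceeds the {max_bytes} byte limit."]

    if suffix == ".pdf":
        # Simplification: treat a missing leading %PDF- header as corruption.
        if not data.startswith(b"%PDF-"):
            return ["The file does not start with a %PDF- header and may be corrupted."]
        # Heuristic only: encrypted PDFs reference an /Encrypt dictionary.
        if b"/Encrypt" in data:
            return ["The PDF appears to be password-protected."]
    elif suffix == ".docx":
        # DOCX files are ZIP archives, which begin with the bytes "PK".
        if not data.startswith(b"PK"):
            return ["The file is not a valid DOCX archive and may be corrupted."]
    elif not data.strip():
        # Plain text that contains only whitespace has nothing to index.
        return ["The document contains no text."]
    return []
```

Returning messages instead of raising lets the upload form show the reason for a rejection directly, which matches the "clear error messages" goal.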
Every query is logged to Langfuse with full observability — latency, token usage, retrieved sources, and Claude's response — giving enterprise clients the audit trail they need to trust the system.
The Problem
Companies spend hours manually searching through documents for specific information. A 50-page policy handbook, a contract, a technical spec — finding one answer means reading the whole thing. Existing AI tools either hallucinate or can't point you to where the answer came from, which makes them unusable in professional settings where accuracy is non-negotiable.
The Solution
Upload the document once, ask anything in plain English. Arkive retrieves the exact relevant passages using semantic vector search, sends them to Claude as grounded context, and returns a cited answer — with source cards, a document preview, highlighted passages, and a full query log so teams can verify every claim and audit every interaction.
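The retrieve-then-ground flow described above can be sketched roughly as follows. It assumes each chunk already has an embedding vector (how Arkive computes and stores embeddings is not covered here), and the names Chunk, top_k, and build_prompt are hypothetical. The sketch stops at assembling the prompt; the call to Claude is left out.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    doc: str             # source document name, shown on the source card
    text: str            # passage text
    vector: list[float]  # embedding, assumed to be precomputed


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vector: list[float], chunks: list[Chunk], k: int = 3,
          doc_filter: Optional[str] = None) -> list[Chunk]:
    """Rank chunks by similarity, optionally scoped to a single document."""
    candidates = [c for c in chunks if doc_filter is None or c.doc == doc_filter]
    ranked = sorted(candidates, key=lambda c: cosine(query_vector, c.vector), reverse=True)
    return ranked[:k]


def build_prompt(question: str, retrieved: list[Chunk]) -> str:
    """Number each passage so the model can cite it as [1], [2], ..."""
    sources = "\n".join(
        f"[{i}] ({c.doc}) {c.text}" for i, c in enumerate(retrieved, start=1)
    )
    return (
        "Answer using only the numbered sources and cite them like [1].\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )
```

Numbering the passages in the prompt is one simple way to make citations traceable: a [2] in the answer maps back to the second retrieved chunk, which can then be rendered as a source card.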
What I Learned
Chunking strategy has an outsized effect on RAG answer quality. Character-based splitting destroys table structure: a grade breakdown becomes meaningless fragments. Real documents need extraction that distinguishes paragraphs from tables and keeps each table intact.
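To illustrate the difference, here is a small sketch comparing fixed-width splitting with a splitter that keeps tables whole. It assumes the extracted text separates blocks with blank lines and marks table cells with "|", which real PDF extraction will not always provide; the function names are hypothetical, and this is not Arkive's actual extractor.

```python
def naive_chunks(text: str, size: int) -> list[str]:
    """Fixed-width character splitting; cuts through rows and cells."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def is_table(block: str) -> bool:
    """Assumed convention: every line of a table block contains a '|'."""
    lines = block.splitlines()
    return len(lines) >= 2 and all("|" in line for line in lines)


def structure_aware_chunks(text: str, max_chars: int = 500) -> list[str]:
    """Pack paragraphs up to max_chars, but emit each table as its own chunk."""
    chunks: list[str] = []
    buffer = ""
    for block in (b.strip() for b in text.split("\n\n")):
        if not block:
            continue
        if is_table(block):
            if buffer:
                chunks.append(buffer)
                buffer = ""
            chunks.append(block)  # kept whole, even if longer than max_chars
        elif buffer and len(buffer) + len(block) + 2 > max_chars:
            chunks.append(buffer)
            buffer = block
        else:
            buffer = f"{buffer}\n\n{block}" if buffer else block
    if buffer:
        chunks.append(buffer)
    return chunks
```

With a grade breakdown such as "Homework | 40%" and "Final | 60%", the naive splitter can separate a label from its weight, while the structure-aware version returns the whole table as one retrievable unit.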
Semantic search alone is not enough. For small documents, retrieving all chunks and letting Claude reason over the full context consistently outperformed retrieval tuning. Knowing when to search vs. when to just send everything is a real design decision.
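One way to encode that decision is a simple size check before retrieval. The sketch below is an assumption-laden illustration: the 8,000-token budget, the function name select_context, and the rough four-characters-per-token estimate are placeholders, not measured values from Arkive.

```python
def select_context(chunks: list[str], ranked: list[str],
                   token_budget: int = 8000, k: int = 5) -> list[str]:
    """Send the whole document when it fits the budget, else the top-k chunks.

    chunks: every chunk of the document, in document order.
    ranked: the same chunks ordered by relevance to the query.
    """
    # Rough heuristic (an assumption): about 4 characters per token in English.
    estimated_tokens = sum(len(c) for c in chunks) // 4
    if estimated_tokens <= token_budget:
        return chunks      # small document: give the model the full context
    return ranked[:k]      # large document: fall back to retrieval
```

Keeping document order in the full-context branch preserves the flow of the original file, which tends to matter for questions that span sections.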
Observability is not optional in enterprise AI. Adding Langfuse logging transformed the system from a black box into an auditable pipeline — every query, every retrieved chunk, every token. That is what makes clients trust it.
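The shape of a per-query record can be sketched without any vendor SDK. The example below only shows which fields get captured and where timing wraps the pipeline; it deliberately does not show Langfuse's API. The names QueryTrace, traced_query, and to_log_line are hypothetical, and retrieve and generate stand in for the real retrieval and Claude calls.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Callable


@dataclass
class QueryTrace:
    question: str
    sources: list[str]   # identifiers of the retrieved chunks
    answer: str
    latency_ms: float
    input_tokens: int
    output_tokens: int


def traced_query(
    question: str,
    retrieve: Callable[[str], list[str]],
    generate: Callable[[str, list[str]], tuple[str, int, int]],
) -> QueryTrace:
    """Run retrieval and generation, timing the whole round trip."""
    start = time.perf_counter()
    sources = retrieve(question)
    # generate is assumed to return (answer, input_tokens, output_tokens).
    answer, input_tokens, output_tokens = generate(question, sources)
    latency_ms = (time.perf_counter() - start) * 1000
    return QueryTrace(question, sources, answer, latency_ms, input_tokens, output_tokens)


def to_log_line(trace: QueryTrace) -> str:
    """Serialize a trace as one JSON line for an audit log."""
    return json.dumps(asdict(trace))
```

Capturing the retrieved sources alongside the answer is what makes a record auditable: a reviewer can check whether a wrong answer came from bad retrieval or from bad generation.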