Arkive
An enterprise-ready RAG knowledge base that lets you upload documents, ask natural-language questions, and get accurate, cited answers grounded in your actual files.

Overview
What is Arkive?
Arkive is an enterprise-ready knowledge base powered by Retrieval-Augmented Generation (RAG). Upload any PDF, DOCX, or TXT file and ask questions about it in plain English. Arkive retrieves the most relevant passages and uses Claude to synthesize a clear, cited answer.
Every answer includes source cards showing exactly which document and passage the information came from. A document preview lets you open the original file with the relevant passages highlighted in context. The feature was built specifically for enterprise clients who need to trust and verify AI output.
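One way a highlighted preview could work, sketched under assumptions: if a retrieved chunk's text still appears verbatim in the extracted document text, the preview can locate it and wrap it in a `<mark>` tag. The function name `highlight_passage` is hypothetical, and real extractions often alter whitespace, so a production version would more likely store character offsets at chunking time than search for the passage afterward.

```python
import html


def highlight_passage(document: str, passage: str) -> str:
    """Return the document as escaped HTML with the first occurrence of
    `passage` wrapped in <mark>. Hypothetical helper for illustration."""
    start = document.find(passage)
    if start == -1:
        # Passage not found verbatim (e.g. whitespace changed during
        # extraction): fall back to the unhighlighted, escaped document.
        return html.escape(document)
    end = start + len(passage)
    # Escape each segment separately so the <mark> tags stay real HTML
    # while any markup inside the document text is neutralized.
    return (
        html.escape(document[:start])
        + "<mark>"
        + html.escape(document[start:end])
        + "</mark>"
        + html.escape(document[end:])
    )
```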
The Problem
Companies spend hours manually searching through documents for specific information. In a 50-page policy handbook, a contract, or a technical spec, finding one answer means reading the whole thing. Existing AI tools either hallucinate or can't show where an answer came from, which makes them unusable in professional settings.
The Solution
Upload the document once, then ask anything in plain English. Arkive retrieves the most relevant passages using semantic vector search, sends them to Claude as grounded context, and returns a cited answer. Source cards and a document preview let users verify every claim directly in the original file.
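A minimal sketch of that retrieve-then-ground flow, assuming chunk embeddings have already been computed by some embedding model (the write-up does not say which). `Chunk`, `top_k`, and `build_prompt` are hypothetical names. The sketch ranks chunks by cosine similarity to the query embedding and numbers the winners so the model can cite `[n]`, which can then map back to a source card. Sending the prompt to Claude is omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_name: str
    text: str
    embedding: list[float]  # assumed: precomputed by an embedding model


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_emb: list[float], chunks: list[Chunk], k: int = 3) -> list[Chunk]:
    """Return the k chunks most similar to the query embedding."""
    return sorted(chunks, key=lambda c: cosine(query_emb, c.embedding), reverse=True)[:k]


def build_prompt(question: str, retrieved: list[Chunk]) -> str:
    """Number each source so the answer can cite [n] and link to a source card."""
    sources = "\n\n".join(
        f"[{i}] ({c.doc_name}) {c.text}" for i, c in enumerate(retrieved, start=1)
    )
    return (
        "Answer using only the sources below. Cite sources as [n]. "
        "If the answer is not in the sources, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )
```

Instructing the model to answer only from numbered sources, and to say so when the sources are silent, is one common way to keep answers grounded and traceable.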
What I Learned
Chunking strategy is everything in RAG. Character-based splitting destroys table structure: a grade breakdown gets cut into meaningless fragments. Real documents need smarter extraction that understands the difference between paragraphs and tables.
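To illustrate the difference, here is a sketch contrasting fixed-size character windows with a structure-aware chunker. It assumes an upstream extractor has already labeled each block as a paragraph or a table; `Block`, `naive_split`, and `structure_aware_chunks` are hypothetical names, and the 1,000-character limit is an arbitrary example. Paragraphs are packed whole up to the limit, and tables are never split.

```python
from dataclasses import dataclass


@dataclass
class Block:
    kind: str  # "paragraph" or "table" (assumed output of an upstream extractor)
    text: str


def naive_split(text: str, size: int) -> list[str]:
    """Fixed-size character windows: ignores structure and can cut a table row in half."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def structure_aware_chunks(blocks: list[Block], max_chars: int = 1000) -> list[str]:
    """Pack whole paragraphs up to roughly max_chars; emit each table as its own chunk."""
    chunks: list[str] = []
    current: list[str] = []
    length = 0  # approximate: separator characters are not counted

    for block in blocks:
        if block.kind == "table":
            # Flush any pending paragraphs, then keep the entire table together.
            if current:
                chunks.append("\n\n".join(current))
                current, length = [], 0
            chunks.append(block.text)
            continue
        # Start a new chunk if adding this paragraph would exceed the limit.
        if current and length + len(block.text) > max_chars:
            chunks.append("\n\n".join(current))
            current, length = [], 0
        current.append(block.text)
        length += len(block.text)

    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

For a grade breakdown such as `Component | Weight` / `Exams | 60%` / `Homework | 40%`, 10-character windows split the `Exams` row mid-word, while the structure-aware version keeps the whole table in a single chunk.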
Semantic search alone is not enough. For small documents, sending every chunk and letting Claude reason over the full context consistently gave better answers than tuning retrieval. Knowing when to search and when to just send everything is a real design decision.
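The search-versus-send-everything decision can be expressed as a simple token-budget check. This is a sketch under assumptions: the four-characters-per-token estimate is a rough heuristic for English text, the 20,000-token budget is an arbitrary placeholder, and `retrieve` stands in for whatever top-k semantic search the app actually uses. `estimate_tokens` and `select_context` are hypothetical names.

```python
from typing import Callable


def estimate_tokens(text: str) -> int:
    """Rough heuristic (~4 characters per token); a real tokenizer is more accurate."""
    return max(1, len(text) // 4)


def select_context(
    question: str,
    chunks: list[str],
    retrieve: Callable[[str, int], list[str]],
    budget_tokens: int = 20_000,
    k: int = 5,
) -> list[str]:
    """Send every chunk when the document fits the budget; otherwise use top-k search."""
    total = sum(estimate_tokens(c) for c in chunks)
    if total <= budget_tokens:
        # Small document: skip retrieval and pass all chunks in original order.
        return list(chunks)
    # Large document: fall back to semantic retrieval.
    return retrieve(question, k)
```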
Moving from local to production exposed things I wouldn't have caught otherwise: Vite environment variables are baked in at build time rather than read at runtime, and CORS behaves differently across services. Deploying early is part of building correctly.