AI · Enterprise Tools · Live

Arkive

An enterprise-ready RAG knowledge base that lets you upload documents and ask natural language questions — getting accurate, cited answers grounded in your actual files.

Year: 2026
Role: Solo (design & engineering)
Type: Portfolio Project
RAG
Retrieval-Augmented Generation — answers grounded in your documents, not hallucinated
3
File types supported: PDF, DOCX, and TXT, including tables and structured data
Every answer cited with the exact source document and passage it came from
Full observability via Langfuse — every query logged with latency, tokens, and sources

Overview

What is Arkive?

Arkive is an enterprise-ready knowledge base powered by Retrieval-Augmented Generation (RAG). Upload any PDF, DOCX, or TXT file and ask questions in plain English — Arkive retrieves the most relevant passages and uses Claude to synthesize a clear, cited answer.

Every answer includes source cards showing exactly which document and passage the information came from. A document preview lets you click into the original file with relevant passages highlighted — built for enterprise clients who need to trust and verify AI output. A filter dropdown lets users scope queries to a specific document when multiple files are loaded.

Arkive validates every upload — catching password-protected PDFs, corrupted files, empty documents, and oversized files with clear error messages before they hit the pipeline.
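The checks described above can be sketched with the standard library alone. This is a minimal illustration, not Arkive's actual validator: the `validate_upload` function, the `ValidationResult` type, the 20 MB limit, and the error wording are all assumptions, and the byte-level PDF and DOCX checks are heuristics rather than full parsers.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical limits and names, chosen for illustration only.
MAX_BYTES = 20 * 1024 * 1024
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt"}


@dataclass
class ValidationResult:
    ok: bool
    error: Optional[str] = None


def validate_upload(filename: str, data: bytes, max_bytes: int = MAX_BYTES) -> ValidationResult:
    """Run cheap checks on an upload before it reaches the ingestion pipeline."""
    name = filename.lower()
    dot = name.rfind(".")
    ext = name[dot:] if dot != -1 else ""

    if ext not in ALLOWED_EXTENSIONS:
        return ValidationResult(False, "Unsupported file type. Upload a PDF, DOCX, or TXT file.")
    if len(data) == 0 or (ext == ".txt" and not data.strip()):
        return ValidationResult(False, "This file is empty.")
    if len(data) > max_bytes:
        return ValidationResult(False, "This file is too large.")

    if ext == ".pdf":
        # A well-formed PDF starts with the "%PDF-" header.
        if not data.startswith(b"%PDF-"):
            return ValidationResult(False, "This PDF appears to be corrupted.")
        # Heuristic: encrypted PDFs reference an /Encrypt dictionary.
        if b"/Encrypt" in data:
            return ValidationResult(False, "This PDF is password-protected. Remove the password and try again.")

    if ext == ".docx" and not data.startswith(b"PK"):
        # DOCX files are ZIP archives, which begin with the bytes "PK".
        return ValidationResult(False, "This DOCX file appears to be corrupted.")

    return ValidationResult(True)
```

Returning a result object instead of raising lets the upload handler show the message directly to the user. A production validator would likely open the file with a real PDF or DOCX parser as a second step.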

Every query is logged to Langfuse with full observability — latency, token usage, retrieved sources, and Claude's response — giving enterprise clients the audit trail they need to trust the system.
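To make the logged fields concrete, here is a sketch of the record a single query might produce. It deliberately avoids the Langfuse SDK: `QueryLog`, `answer_with_logging`, `generate`, and `sink` are hypothetical names, where `generate` stands in for the Claude call and `sink` stands in for whatever forwards the record to Langfuse.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Callable, List, Tuple


@dataclass
class QueryLog:
    question: str
    answer: str
    sources: List[str]
    latency_ms: float
    input_tokens: int
    output_tokens: int


def answer_with_logging(
    question: str,
    sources: List[str],
    generate: Callable[[str, List[str]], Tuple[str, int, int]],
    sink: Callable[[QueryLog], None],
) -> QueryLog:
    """Call the model, time the call, and hand one structured record to the sink."""
    start = time.perf_counter()
    # `generate` is assumed to return (answer, input_tokens, output_tokens).
    answer, input_tokens, output_tokens = generate(question, sources)
    latency_ms = (time.perf_counter() - start) * 1000.0

    record = QueryLog(question, answer, list(sources), latency_ms, input_tokens, output_tokens)
    sink(record)
    return record


def to_json(record: QueryLog) -> str:
    """Serialize a record for storage or for shipping to an observability backend."""
    return json.dumps(asdict(record))
```

Keeping the record as a plain dataclass means the same object can be printed during development and forwarded to an observability backend in production.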

The Problem

Companies spend hours manually searching documents for specific information. With a 50-page policy handbook, a contract, or a technical spec, finding one answer can mean reading the whole thing. Many existing AI tools either hallucinate or cannot show where an answer came from, which makes them hard to rely on in professional settings where accuracy is non-negotiable.

The Solution

Upload the document once, then ask anything in plain English. Arkive retrieves the most relevant passages using semantic vector search, sends them to Claude as grounded context, and returns a cited answer. Source cards, a document preview with highlighted passages, and a full query log let teams verify every claim and audit every interaction.
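The retrieve-then-ground flow can be sketched in a few lines. This is an illustration under stated assumptions, not Arkive's code: the `Chunk` type, `top_k`, `build_prompt`, and the prompt wording are hypothetical, and the vectors are tiny hand-written stand-ins for real embeddings.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Chunk:
    document: str
    passage: int
    text: str
    vector: Sequence[float]  # embedding of `text`; toy values in this sketch


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, returning 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector: Sequence[float], chunks: List[Chunk], k: int = 3) -> List[Chunk]:
    """Return the k chunks most similar to the query, best first."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vector, c.vector), reverse=True)
    return ranked[:k]


def build_prompt(question: str, hits: List[Chunk]) -> str:
    """Number each retrieved passage so the model can cite it as [1], [2], ..."""
    sources = "\n".join(
        f"[{i}] {c.document}, passage {c.passage}: {c.text}"
        for i, c in enumerate(hits, start=1)
    )
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )
```

Numbering the passages in the prompt is what makes source cards possible: a citation like `[2]` in the answer maps straight back to a document and passage.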

What I Learned

01

Chunking strategy matters more than almost anything else in RAG. Character-based splitting destroys table structure: a grade breakdown table becomes meaningless fragments. Real documents need smarter extraction that distinguishes paragraphs from tables.
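One way to act on this lesson is to keep table blocks whole and split only prose by size. The sketch below assumes tables have already been extracted as pipe- or tab-delimited rows separated from prose by blank lines. That input format, along with the names `chunk_document` and `looks_like_table_row`, is an assumption for illustration.

```python
import re
from typing import List


def looks_like_table_row(line: str) -> bool:
    """Heuristic: treat pipe- or tab-delimited lines as table rows."""
    return "|" in line or "\t" in line


def split_paragraph(paragraph: str, max_chars: int) -> List[str]:
    """Greedily pack whole words into pieces of at most max_chars characters."""
    pieces: List[str] = []
    current = ""
    for word in paragraph.split():
        candidate = word if not current else current + " " + word
        if len(candidate) > max_chars and current:
            pieces.append(current)
            current = word
        else:
            current = candidate
    if current:
        pieces.append(current)
    return pieces


def chunk_document(text: str, max_chars: int = 500) -> List[str]:
    """Split prose by size, but emit each table block as a single chunk."""
    chunks: List[str] = []
    for block in re.split(r"\n\s*\n", text.strip()):
        if not block.strip():
            continue
        lines = [line for line in block.splitlines() if line.strip()]
        if all(looks_like_table_row(line) for line in lines):
            chunks.append("\n".join(lines))  # never split a table mid-row
        else:
            chunks.extend(split_paragraph(block, max_chars))
    return chunks
```

A table larger than the chunk size still comes through as one chunk here. Splitting large tables by row while repeating the header row would be a natural extension.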

02

Semantic search alone is not enough. For small documents, retrieving all chunks and letting Claude reason over the full context consistently outperformed retrieval tuning. Knowing when to search vs. when to just send everything is a real design decision.
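The search-versus-send-everything decision can be written as a small routing function. This is a sketch under assumptions: the four-characters-per-token estimate is a rough rule of thumb rather than a real tokenizer, and `select_context`, `budget_tokens`, and `ranked_indices` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def estimate_tokens(text: str) -> int:
    """Rough estimate, assuming about four characters per token."""
    return max(1, len(text) // 4)


def select_context(
    chunks: Sequence[str],
    ranked_indices: Sequence[int],
    budget_tokens: int,
    k: int = 3,
) -> Tuple[List[str], str]:
    """Send the whole document when it fits the budget, else the top-k retrieved chunks.

    `ranked_indices` lists chunk positions from most to least relevant,
    as produced by a separate retrieval step.
    """
    total = sum(estimate_tokens(chunk) for chunk in chunks)
    if total <= budget_tokens:
        return list(chunks), "full"
    return [chunks[i] for i in ranked_indices[:k]], "retrieved"
```

Returning the chosen mode alongside the context makes it easy to record in the query log which path each answer took.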

03

Observability is not optional in enterprise AI. Adding Langfuse logging transformed the system from a black box into an auditable pipeline — every query, every retrieved chunk, every token. That is what makes clients trust it.

