Project Updates
Progress log for the Epstein Files Archive.
Feb 10, 2026
Full Corpus Ingested & Knowledge Graph In Progress
All 11 DOJ data sets are now fully ingested—51,652 documents producing roughly 290,000 chunks. Chat search on the homepage is live and pulling results from across the corpus, though we're still verifying full index coverage.
Quality: We ran a quality audit across all chunks. 86.6% were rated good, and cross-document deduplication removed ~37,000 chunks that were duplicated across multiple data sets.
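For readers curious how cross-document deduplication can work, here is a minimal sketch, not the archive's actual pipeline. It assumes each chunk is a dict with hypothetical keys `data_set`, `doc_id`, and `text`, and it treats two chunks as duplicates when their whitespace- and case-normalized text hashes to the same value.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies hash alike."""
    return re.sub(r"\s+", " ", text).strip().lower()


def dedupe_chunks(chunks):
    """Keep the first chunk seen for each normalized-text hash.

    `chunks` is an iterable of dicts with the (hypothetical) keys
    "data_set", "doc_id", and "text".
    Returns (unique_chunks, removed_count).
    """
    seen = set()
    unique = []
    removed = 0
    for chunk in chunks:
        digest = hashlib.sha256(normalize(chunk["text"]).encode("utf-8")).hexdigest()
        if digest in seen:
            removed += 1  # same text already kept from an earlier document or data set
            continue
        seen.add(digest)
        unique.append(chunk)
    return unique, removed
```

Exact hashing only catches copies that match after normalization. Near-duplicates, such as two OCR passes of the same page, would need a fuzzier technique.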
Entity extraction is running: We're now building a knowledge graph that maps people, organizations, locations, and dates across the entire corpus and links them to the documents where they appear. This will power graph-based search—find a person and see every document they're connected to, who they're linked with, and how. Not yet live, but in progress.
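As a rough illustration of the idea, the sketch below builds two lookups from extraction output: entity to documents, and entity pair to the documents where both appear. The input shape (`doc_id` mapped to a set of entity names) and the function names are assumptions for this example, not the archive's schema.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(doc_entities):
    """Build entity->documents and pair->documents lookups.

    `doc_entities` maps a doc_id to the set of entity names found in it
    (a hypothetical shape for the extraction output).
    """
    entity_docs = defaultdict(set)
    co_occurrence = defaultdict(set)  # key: alphabetically sorted (a, b) pair
    for doc_id, entities in doc_entities.items():
        for entity in entities:
            entity_docs[entity].add(doc_id)
        for a, b in combinations(sorted(entities), 2):
            co_occurrence[(a, b)].add(doc_id)
    return entity_docs, co_occurrence


def connections(entity, entity_docs, co_occurrence):
    """Return the entity's documents and, per linked entity, the shared documents."""
    linked = {}
    for (a, b), docs in co_occurrence.items():
        if entity == a:
            linked[b] = sorted(docs)
        elif entity == b:
            linked[a] = sorted(docs)
    return sorted(entity_docs.get(entity, set())), linked
```

Co-occurrence in a document only says two names appear together. It says nothing about the nature of the link, which is why the "how" part needs typed relationships from the extraction step.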
What people are searching for: We've had 1,895 queries so far. The most common themes:
- Flight logs & passenger lists—who flew where and when (109 queries)
- Trump connections—all mentions and context across the archive (96 queries)
- Indonesia connections—government officials, Deutsche Bank reports (83 queries)
- Politicians worldwide—Modi, Macron, Merkel, Erdogan, Clinton, Netanyahu, and dozens more (77 queries)
- Bank & financial records—Deutsche Bank, FirstBank Puerto Rico, Colonial Bank documents (51 queries)
- Country-specific searches—users from 30+ countries looking for mentions of their own nations (47 queries)
All of these topics have real document coverage in the archive. Flight logs return specific dates, passengers, and routes. Bank records include subpoena returns from multiple institutions. Country references span diplomatic cables, financial reports, and correspondence.
What's next: Entity resolution (merging duplicate references to the same person/org), graph-powered search, and country-specific filtering for international researchers.
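To make "merging duplicate references" concrete, here is a deliberately simple sketch under our own assumptions: mentions are grouped when they share the same name tokens after lowercasing, stripping punctuation, and dropping a small hypothetical list of titles. It is an illustration, not the resolution logic the archive will ship.

```python
import re
from collections import defaultdict

# Hypothetical title list for illustration; a real one would be longer.
TITLES = {"mr", "mrs", "ms", "dr", "hon"}


def canonical_key(name: str) -> str:
    """Reduce a mention to a sorted, title-free, lowercase token string."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    tokens = [t for t in tokens if t not in TITLES]
    return " ".join(sorted(tokens))  # sorting makes "Doe, Jane" equal "Jane Doe"


def resolve(mentions):
    """Group raw mentions that share a canonical key."""
    groups = defaultdict(list)
    for mention in mentions:
        groups[canonical_key(mention)].append(mention)
    return dict(groups)
```

For example, `resolve(["Jane Doe", "Doe, Jane", "Dr. Jane Doe"])` puts all three mentions in one group. Initials, nicknames, and misspellings would still end up in separate groups, and two different people with the same name would be merged, so a production approach needs fuzzy matching and context.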
Feb 7, 2026
Community Research Requests
We received a number of research requests from journalists and independent researchers. We have been working to make the following topics first-class searches in the archive:
- Jeffrey Epstein — core document connections and timeline
- Donald Trump — all mentions, context, and associated documents
- Mossad — intelligence references across the archive
- Ehud Barak — former Israeli Prime Minister, documented connections
- Angela Merkel — references in estate and correspondence documents
- Indonesian government connections — officials, Deutsche Bank reports, and related records
We are actively ingesting all 12 DOJ data sets (82,000+ files) and building dedicated search profiles for each of these topics. If you have a research request, reach out via the newsletter form on the homepage.
Feb 7, 2026
AI-powered search is live
You can now ask the archive anything and get citation-backed answers drawn directly from indexed documents. The chat widget on the homepage is wired to a live API—no waitlist, no signup required.
What's working today:
We've built a knowledge graph of 6,100+ entities—people, organizations, locations, and dates—extracted and linked across every document in the archive. You can search entities by name and see exactly where they appear.
Document browsing is live too. 20 documents from DOJ Data Set 4 are indexed and searchable, with metadata and presigned S3 download links for the originals.
What's next: Ingesting the remaining 11 DOJ data sets, improving entity resolution for edge cases (date and location disambiguation), and adding country-specific filtering for international researchers.
Feb 2026
DOJ dump + country search
We're processing the full DOJ release—3.5 million pages, 12 data sets. FBI records, estate documents, travel logs. Search will go live once ingestion wraps up.
Noticed we're getting a lot of traffic from Indonesian news outlets. Journalists there are trying to figure out why Indonesian officials show up in the files. Short answer: most mentions are in Deutsche Bank reports and political analysis docs, not direct Epstein correspondence. But it's hard to filter for that right now.
So we're adding country-specific search. You'll be able to pull all docs mentioning a country and see context for each hit—whether it's a bank report, a flight log, or actual correspondence. Should help researchers cut through the noise faster.
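Here is a small sketch of what that could look like, assuming each document carries a `doc_type` label (for example `bank_report` or `flight_log`) alongside its text. The field names and the function are hypothetical and only illustrate grouping hits by source type with a snippet of surrounding context.

```python
import re
from collections import defaultdict


def country_hits(docs, country, window=40):
    """Group mentions of `country` by document type, with surrounding context.

    `docs` is a list of dicts with the (hypothetical) keys
    "doc_id", "doc_type", and "text".
    """
    # Whole-word, case-insensitive match; demonyms such as "Indonesian"
    # are not matched and would need their own pattern.
    pattern = re.compile(r"\b" + re.escape(country) + r"\b", re.IGNORECASE)
    hits = defaultdict(list)
    for doc in docs:
        text = doc["text"]
        for match in pattern.finditer(text):
            start = max(0, match.start() - window)
            end = min(len(text), match.end() + window)
            hits[doc["doc_type"]].append(
                {"doc_id": doc["doc_id"], "context": text[start:end]}
            )
    return dict(hits)
```

Grouping by document type first is what lets a researcher separate, say, bank-report mentions from direct correspondence at a glance.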
Dec 2025
Graph and search coming online
We ran GraphRAG on the first House Oversight drop and are loading entities and relationships into Mongo. This lets us enrich every document with who/what/where links, so we can search by people, orgs, and places and expose the relationships the knowledge base reveals (defense teams, correspondents, travel legs, filings). Atlas Search is live on the initial chunks with highlights, so you can already run full-text queries while we grow the entity graph behind the scenes.
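For anyone wondering what a full-text query with highlights looks like in Atlas Search, the sketch below builds an aggregation pipeline using the `$search` stage with its `highlight` option. The index name, the `text` and `doc_id` field names, and the helper itself are assumptions for illustration; our actual schema may differ.

```python
def search_pipeline(query, index="default", path="text", limit=10):
    """Build an Atlas Search aggregation pipeline that returns highlights.

    `index`, `path`, and the projected "doc_id" field are hypothetical names.
    """
    return [
        {
            "$search": {
                "index": index,
                "text": {"query": query, "path": path},
                "highlight": {"path": path},  # request highlighted snippets
            }
        },
        {"$limit": limit},
        {
            "$project": {
                "_id": 0,
                "doc_id": 1,
                "score": {"$meta": "searchScore"},
                "highlights": {"$meta": "searchHighlights"},
            }
        },
    ]
```

With a driver such as PyMongo, the pipeline would be passed to `collection.aggregate(search_pipeline("flight log"))` on a collection that has a search index defined.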
Nov 2025
Infra + subscriptions
We stood up the static stack (CloudFront + S3 + TLS via CDK) and wired newsletter capture with PostHog analytics. The goal is to keep the site fast, secure, and transparent, sharing progress as we ingest estate releases, oversight docs, and CBP/FBI records, run OCR on scans and handwriting, and add the planned location/face inference for images and video.