Documentation
THEIA OS is a compound AI system designed to decouple scientific discovery from human constraints. It operates as an operating system for autonomous research, pausing only at explicit human review gates before cost-intensive stages.
System Overview
THEIA OS (Transformative Hypothesis Exploration & Intelligence Architecture) fundamentally reimagines the scientific method as a computational loop. Unlike traditional assistants that aid humans, THEIA acts as the primary agent, taking high-level research goals and executing the entire discovery lifecycle.
Core Capabilities
- Autonomous Ingestion: Concurrently reads from OpenAlex, arXiv, bioRxiv, and Semantic Scholar.
- Deep Screening: Evaluates papers for reproducibility and quantitative claims using cascaded LLMs.
- Agentic Tree Search: Explores hypothesis spaces using UCB algorithms instead of linear iteration.
- Self-Correction: Detects execution failures and autonomously debugs experimental code.
Stateful by Default
Time-travel debugging via PostgreSQL checkpointing of the entire graph state.
Tree Search
Non-linear hypothesis exploration that prunes less promising branches early.
Cost-Aware
Real-time token tracking and budget enforcement per research pipeline.
Theia (Θεία)
In Greek mythology, Theia is the Titaness of sight (thea) and the shining ether of the bright blue sky. Mother of Helios (the Sun), Selene (the Moon), and Eos (the Dawn), she endowed gold and silver with their brilliance.
"THEIA OS embodies this ancient vision—the capacity to see beyond the known horizon of scientific literature and illuminate high-value hypotheses in the vast darkness of the search space."
Quick Start
1. Installation
```bash
git clone https://github.com/to-be-added
cd theia
# Install the package
pip install -e .
```
2. Launch & Onboard
```bash
theia
```
On first launch, THEIA runs a wizard to configure API keys (Anthropic, OpenAI, etc.) and model preferences.
Architecture
THEIA OS follows a Compound AI System architecture, orchestrating specialized agents through a LangGraph state machine with durable PostgreSQL checkpointing.
| Component | Technology | Purpose |
|---|---|---|
| Orchestration | LangGraph | Stateful graph execution & human-in-the-loop gates |
| Knowledge Graph | Neo4j | Semantic reasoning (Paper → Cites → Method) |
| Vector Store | ChromaDB | Episodic memory & semantic search |
| Runtime | Docker | Sandboxed code reproduction & experiments |
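A minimal sketch of how such an orchestration layer can be wired together, assuming LangGraph with its PostgreSQL checkpointer; the state schema and node bodies are illustrative placeholders, not THEIA's actual internals:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.postgres import PostgresSaver  # pip install langgraph-checkpoint-postgres

class PipelineState(TypedDict):
    query: str
    papers: list

def ingest(state: PipelineState) -> dict:
    # Placeholder: fan out to OpenAlex/arXiv/bioRxiv/Semantic Scholar
    return {"papers": []}

def screen(state: PipelineState) -> dict:
    # Placeholder: LLM relevance scoring of state["papers"]
    return {"papers": state["papers"]}

builder = StateGraph(PipelineState)
builder.add_node("ingest", ingest)
builder.add_node("screen", screen)
builder.add_edge(START, "ingest")
builder.add_edge("ingest", "screen")
builder.add_edge("screen", END)

# Durable checkpointing: graph state is persisted to Postgres after every node
with PostgresSaver.from_conn_string("postgresql://user:pass@localhost:5432/theia") as saver:
    saver.setup()  # create checkpoint tables on first run
    graph = builder.compile(checkpointer=saver)
    config = {"configurable": {"thread_id": "pipeline-1"}}
    graph.invoke({"query": "example query", "papers": []}, config)
```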
Interactive Research Planner
Before executing a pipeline, THEIA guides users through an interactive planning session that builds a structured research plan. The planner uses phased questioning, a deterministic readiness score, and a refinement loop to ensure plan quality.
Phase 1: Core
Two mandatory seed questions (query, domain) plus LLM-generated follow-up questions for the hypothesis.
Phase 2: Methodology
LLM decides whether to ask about methodology, metrics, or baselines — or skip if already inferred.
Phase 3: Resources
Optional questions about novelty claims, datasets, and key papers to strengthen the plan.
Readiness Score
A deterministic score (0–100) computed from field weights. Fields confirmed by direct user answers receive full weight; fields auto-inferred from other answers receive half weight. The pipeline requires a minimum score of 40 to run.
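The score itself is a weighted sum. A sketch under stated assumptions: the field names and weights below are invented for illustration; only the full-weight/half-weight rule and the minimum of 40 come from the description above.

```python
# Hypothetical field weights; THEIA's actual fields and weights may differ.
FIELD_WEIGHTS = {
    "query": 25, "domain": 15, "hypothesis": 20,
    "methodology": 15, "metrics": 10, "baselines": 10, "datasets": 5,
}
MIN_SCORE = 40  # pipeline refuses to run below this threshold

def readiness_score(confirmed: set[str], inferred: set[str]) -> float:
    """Fields confirmed by the user earn full weight; auto-inferred fields earn half."""
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        if field in confirmed:
            score += weight
        elif field in inferred:
            score += weight / 2
    return score

score = readiness_score({"query", "domain", "hypothesis"}, {"methodology"})
print(score, score >= MIN_SCORE)  # 67.5 True
```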
Refinement Loop
After all phases complete, users enter a refinement loop with the following options:
- Run pipeline: Save and execute (requires score ≥ 40)
- Edit a field: Pick a field by number and type a new value directly
- Refine with AI: Describe changes in plain English; the LLM updates the fields
- Ask more questions: Re-enter the phased question flow
Core Pipeline
1. Ingest
Searches OpenAlex, arXiv, bioRxiv, and Semantic Scholar using enriched multi-query search. The primary query is augmented with domain, methodology, baseline, and dataset terms from the research plan. Additional focused queries (methodology-angle, baseline-angle, dataset-angle) run in parallel for broader recall. Results are deduplicated by paper ID.
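A sketch of the enriched fan-out and ID-based deduplication; the plan field names and the provider call are assumptions for illustration:

```python
import asyncio

async def search_source(query: str) -> list[dict]:
    # Placeholder for one provider call (OpenAlex, arXiv, bioRxiv, Semantic Scholar)
    return []

async def enriched_ingest(plan: dict) -> list[dict]:
    queries = [
        f"{plan['query']} {plan['domain']}",       # primary, domain-augmented
        f"{plan['query']} {plan['methodology']}",  # methodology-angle
        f"{plan['query']} {plan['baseline']}",     # baseline-angle
        f"{plan['query']} {plan['dataset']}",      # dataset-angle
    ]
    batches = await asyncio.gather(*(search_source(q) for q in queries))  # run in parallel
    seen: dict[str, dict] = {}
    for batch in batches:
        for paper in batch:
            seen.setdefault(paper["id"], paper)  # deduplicate by paper ID
    return list(seen.values())
```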
2. Screen
Each paper is scored 0.0–1.0 by an LLM using a plan-aware relevance prompt that includes the full research context: query, domain, hypothesis, methodology, metrics, baselines, datasets, and novelty. Papers below 0.3 are filtered out; the top N by score are selected. Quantitative claims are extracted from passing papers.
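A sketch of the screening filter; the scoring stub stands in for the plan-aware LLM prompt, and the top-N default is an assumption:

```python
def llm_relevance_score(paper: dict, plan: dict) -> float:
    """Stub for the plan-aware relevance prompt; returns a score in [0.0, 1.0]."""
    return 0.5

def screen_papers(papers: list[dict], plan: dict, top_n: int = 10) -> list[dict]:
    scored = [(llm_relevance_score(p, plan), p) for p in papers]
    passing = [(s, p) for s, p in scored if s >= 0.3]   # drop papers below 0.3
    passing.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in passing[:top_n]]              # keep the top N by score
```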
3. Reproduce
Sandboxed Docker execution to replicate baseline results. Uses python-on-whales for container management and GitHub API for code retrieval.
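A minimal python-on-whales sketch of sandboxed execution; the image, paths, and script name are illustrative:

```python
from python_on_whales import docker

# Run a baseline script inside an isolated container with no network access
output = docker.run(
    "python:3.11-slim",
    ["python", "/workspace/reproduce_baseline.py"],
    volumes=[("/tmp/cloned-repo", "/workspace")],  # repo fetched via the GitHub API
    networks=["none"],
    remove=True,
)
print(output)  # captured stdout from the container
```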
4. Explore
AIDE-based tree search generates hypotheses, forks code, and executes experiments to find SOTA improvements using UCB pruning.
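The selection rule behind UCB-style search, as a sketch; the exploration constant c is a conventional default, not confirmed for THEIA:

```python
import math

def ucb_score(total_value: float, visits: int, parent_visits: int, c: float = 1.41) -> float:
    """Upper Confidence Bound: mean value (exploitation) plus an exploration bonus."""
    if visits == 0:
        return float("inf")  # always expand unvisited hypotheses first
    return total_value / visits + c * math.sqrt(math.log(parent_visits) / visits)

def select_child(children: list[dict], parent_visits: int) -> dict:
    # Pick the child with the highest UCB score; consistently low scorers are pruned
    return max(children, key=lambda n: ucb_score(n["value"], n["visits"], parent_visits))
```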
5. Write
Generates full LaTeX manuscripts via pylatex, assembling figures, the bibliography (pybtex), and a discussion of results.
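A minimal pylatex sketch of programmatic manuscript assembly; the title and section names are placeholders:

```python
from pylatex import Command, Document, Section
from pylatex.utils import NoEscape

doc = Document("manuscript")
doc.preamble.append(Command("title", "Autogenerated Results"))
doc.preamble.append(Command("author", "THEIA OS"))
doc.append(NoEscape(r"\maketitle"))

with doc.create(Section("Results")):
    doc.append("Discussion of baseline vs. improved metrics goes here.")

doc.generate_pdf("manuscript", clean_tex=False)  # requires a local LaTeX toolchain
```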
Human-in-the-Loop Gates
Autonomous execution is governed by mandatory checkpoints where human review is required before proceeding to cost-intensive stages. Gates use LangGraph's interrupt() mechanism with Command(resume=...) to pause the pipeline, collect interactive input, and resume from the exact checkpoint.
| Gate ID | Stage | User Action | Context Shown |
|---|---|---|---|
| GATE_HYPOTHESIS | Post-Screening | Select a paper from interactive list, approve/reject | Ranked paper list with titles, sources, relevance scores |
| GATE_EXPERIMENT | Post-Reproduction | Validate baseline results, approve exploration | Baseline metrics |
| GATE_RESULTS | Post-Exploration | Verify improvements, approve writing | Exploration tree with baseline vs improved metrics |
| GATE_MANUSCRIPT | Post-Writing | Final paper review before completion | PDF path, manuscript sections |
Gate Mechanism
Gates are registered as interrupt_before nodes in the LangGraph state machine. When the pipeline reaches a gate:
- The event stream pauses and the live display shows an "AWAITING REVIEW" indicator
- The gate handler displays context (papers, metrics, tree) and collects user input interactively
- For GATE_HYPOTHESIS, paper selection uses an arrow-key menu instead of index typing
- The response is sent back via Command(resume=response), which delivers it to the interrupt() call inside the gate node
- The pipeline resumes from the checkpoint; approved gates advance to the next phase, rejected gates end the pipeline
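A sketch of the interrupt/resume pattern in LangGraph; the gate payload and node name are illustrative:

```python
from typing import TypedDict
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph, START, END
from langgraph.types import Command, interrupt

class State(TypedDict):
    approved: bool

def gate_hypothesis(state: State) -> dict:
    # Pause here; the payload is what the gate UI shows the reviewer
    answer = interrupt({"question": "Approve the selected paper?"})
    return {"approved": answer == "approve"}

builder = StateGraph(State)
builder.add_node("gate_hypothesis", gate_hypothesis)
builder.add_edge(START, "gate_hypothesis")
builder.add_edge("gate_hypothesis", END)
graph = builder.compile(checkpointer=MemorySaver())

config = {"configurable": {"thread_id": "demo"}}
graph.invoke({"approved": False}, config)        # runs until interrupt() pauses the node
graph.invoke(Command(resume="approve"), config)  # delivers the response and resumes
```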
Pipeline Resume
Pipelines are durably checkpointed after every node via PostgreSQL (or in-memory fallback). If a pipeline is cancelled or interrupted at any point — including mid-screening or at a human gate — it can be resumed from exactly where it left off without re-running completed stages.
Resume from Gate
If paused at a human gate, resuming detects the pending gate via snapshot.next, shows the gate UI, and sends the response. The pipeline continues from the checkpoint.
Resume from Crash
On resume, the pipeline streams from the last checkpoint by passing None as input to astream(). Completed nodes (ingest, screen, etc.) are not re-executed.
```bash
theia resume-pipeline <pipeline-id>
# Or from the home menu, select a pipeline marked "awaiting_human"
theia
```
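A sketch of the resume logic; `graph` is a compiled LangGraph with a checkpointer, and the print calls stand in for the gate UI:

```python
async def resume_pipeline(graph, thread_id: str) -> None:
    config = {"configurable": {"thread_id": thread_id}}
    snapshot = graph.get_state(config)
    if snapshot.next:
        # Nodes still pending at the checkpoint, e.g. a human gate awaiting input
        print("Pending:", snapshot.next)
    # Passing None as input streams from the last checkpoint;
    # completed nodes (ingest, screen, ...) are not re-executed
    async for event in graph.astream(None, config):
        print(event)
```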
Memory System
THEIA implements a three-tier cognitive memory architecture to maintain context over long research horizons.
ChromaDB Vector Store
Stores timestamped observations and events with hybrid retrieval.
score = recency^α * relevance^β * importance^γ
Neo4j Knowledge Graph
Maps relationships between papers, methods, and results.
(Paper)-[:CITES]->(Paper), (Method)-[:USED_BY]->(Experiment)
Case Bank (JSON)
Stores successful experiment patterns (code templates, hyperparams) for few-shot retrieval in future tasks.
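A sketch of the hybrid retrieval score used by the vector store above; α, β, γ are tunable weights, and the defaults here are placeholders:

```python
def hybrid_score(recency: float, relevance: float, importance: float,
                 alpha: float = 1.0, beta: float = 1.0, gamma: float = 0.5) -> float:
    """score = recency^α * relevance^β * importance^γ, with all inputs in [0, 1]."""
    return (recency ** alpha) * (relevance ** beta) * (importance ** gamma)
```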
Infrastructure
PostgreSQL 16
Persistent graph state & checkpoints
Redis 7
Celery async job queue management
Neo4j 5
Cypher queries & relationship mapping
ChromaDB
High-dimensional vector storage
Configuration
THEIA is configured via environment variables. Create a .env file in the root directory.
| Variable | Description | Default |
|---|---|---|
| DATABASE_URL | PostgreSQL connection string | postgresql://... |
| OPENROUTER_API_KEY | Primary LLM aggregator key | - |
| ANTHROPIC_API_KEY | Direct Anthropic access (optional) | - |
| NEO4J_URI | Knowledge graph connection | bolt://localhost:7687 |
| MAX_COST_PER_PIPELINE | Hard limit for run budget ($) | 100.0 |
| GPU_ENABLED | Use local CUDA if available | True |
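A sample .env illustrating the variables above; all values are placeholders:

```bash
DATABASE_URL=postgresql://theia:theia@localhost:5432/theia
OPENROUTER_API_KEY=sk-or-xxxx
ANTHROPIC_API_KEY=sk-ant-xxxx
NEO4J_URI=bolt://localhost:7687
MAX_COST_PER_PIPELINE=100.0
GPU_ENABLED=True
```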