Documentation

THEIA OS is a compound AI system designed to decouple scientific discovery from human constraints. It operates as an unsupervised operating system for autonomous research.

System Overview

THEIA OS (Transformative Hypothesis Exploration & Intelligence Architecture) fundamentally reimagines the scientific method as a computational loop. Unlike traditional assistants that aid humans, THEIA acts as the primary agent, taking high-level research goals and executing the entire discovery lifecycle.

Core Capabilities

  • Autonomous Ingestion: Concurrently reads from OpenAlex, arXiv, bioRxiv, and Semantic Scholar.
  • Deep Screening: Evaluates papers for reproducibility and quantitative claims using cascaded LLMs.
  • Agentic Tree Search: Explores hypothesis spaces using UCB algorithms instead of linear iteration.
  • Self-Correction: Detects execution failures and autonomously debugs experimental code.

Stateful by Default

Time-travel debugging via PostgreSQL checkpointing of the entire graph state.

Tree Search

Non-linear hypothesis exploration that prunes less promising branches early.

Cost-Aware

Real-time token tracking and budget enforcement per research pipeline.
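The budget enforcement above can be sketched in a few lines. This is a minimal illustration, not THEIA's actual API: the `CostTracker` and `BudgetExceeded` names and the per-token pricing are hypothetical.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a pipeline's cumulative spend passes its hard limit."""

class CostTracker:
    """Illustrative per-pipeline cost tracker with a hard budget cap."""

    def __init__(self, max_cost_usd: float):
        self.max_cost_usd = max_cost_usd
        self.spent_usd = 0.0

    def record(self, tokens: int, usd_per_1k_tokens: float) -> float:
        """Add one LLM call's cost and enforce the hard budget."""
        self.spent_usd += tokens / 1000 * usd_per_1k_tokens
        if self.spent_usd > self.max_cost_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.2f} > budget ${self.max_cost_usd:.2f}"
            )
        return self.spent_usd

tracker = CostTracker(max_cost_usd=100.0)
tracker.record(tokens=250_000, usd_per_1k_tokens=0.003)  # adds $0.75
```

Every LLM call reports its token usage to the tracker, so a runaway pipeline fails fast instead of silently exceeding `MAX_COST_PER_PIPELINE`.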

Titaness Theia
Origins & Etymology

Theia (Θεία)

In Greek mythology, Theia is the Titaness of sight (thea) and the shining ether of the bright blue sky. Mother of Helios (the Sun), Selene (the Moon), and Eos (the Dawn), she endowed gold and silver with their brilliance.

"THEIA OS embodies this ancient vision—the capacity to see beyond the known horizon of scientific literature and illuminate high-value hypotheses in the vast darkness of the search space."

Quick Start

1. Installation

# Clone the repository
git clone https://github.com/to-be-added
cd theia

# Install the package
pip install -e .

2. Launch & Onboard

# Start the interactive setup wizard
theia

On first launch, THEIA runs a wizard to configure API keys (Anthropic, OpenAI, etc.) and model preferences.

Architecture

THEIA OS follows a Compound AI System architecture, orchestrating specialized agents through a LangGraph state machine with durable PostgreSQL checkpointing.

| Component | Technology | Purpose |
|---|---|---|
| Orchestration | LangGraph | Stateful graph execution & human-in-the-loop gates |
| Knowledge Graph | Neo4j | Semantic reasoning (Paper → Cites → Method) |
| Vector Store | ChromaDB | Episodic memory & semantic search |
| Runtime | Docker | Sandboxed code reproduction & experiments |
```mermaid
graph TD
    subgraph Control_Plane [THEIA Control Plane]
        CLI[CLI/UI]
        Scheduler
        CostMgr[Cost Manager]
        API[REST API]
    end
    subgraph Core
        LangGraph[LangGraph Core <br/> State Machine]
        State[PostgreSQL State Manager]
    end
    subgraph Memory [Memory System]
        Episodic[Episodic <br/> ChromaDB]
        Semantic[Semantic <br/> Neo4j KG]
        Procedural[Procedural <br/> Case Bank]
    end
    subgraph Agents [Agent Pipeline]
        Ingest --> Screen
        Screen --> Reproduce
        Reproduce --> Explore
        Explore --> Write
    end
    CLI --> LangGraph
    LangGraph --> State
    LangGraph --> Agents
    Agents <--> Memory
    style Control_Plane fill:#09090b,stroke:#333,color:#fff
    style Core fill:#0f172a,stroke:#3b82f6,color:#fff
    style Memory fill:#1c1917,stroke:#ef4444,color:#fff
    style Agents fill:#0f172a,stroke:#10b981,color:#fff
```

Interactive Research Planner

Before executing a pipeline, THEIA guides users through an interactive planning session that builds a structured research plan. The planner uses phased questioning, a deterministic readiness score, and a refinement loop to ensure plan quality.

Phase 1: Core

Two mandatory seed questions (query, domain), plus LLM-generated follow-ups for the hypothesis.

Fields: query (20), domain (10), hypothesis (20)

Phase 2: Methodology

LLM decides whether to ask about methodology, metrics, or baselines — or skip if already inferred.

Fields: methodology (15), metrics (8), baselines (7)

Phase 3: Resources

Optional questions about novelty claims, datasets, and key papers to strengthen the plan.

Fields: novelty (8), datasets (5), key_papers (7)

Readiness Score

A deterministic score (0–100) computed from field weights. Fields confirmed by direct user answers receive full weight; fields auto-inferred from other answers receive half weight. The pipeline requires a minimum score of 40 to run.

  • 0–39 = blocked
  • 40–64 = ready
  • 65–84 = good
  • 85–100 = excellent
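The scoring rule above can be sketched directly from the field weights listed in the three phases (which sum to 100). This is an illustrative reimplementation; the function names and exact rounding are assumptions, not THEIA's source.

```python
# Field weights from the three planner phases (sum to 100).
WEIGHTS = {
    "query": 20, "domain": 10, "hypothesis": 20,      # Phase 1: Core
    "methodology": 15, "metrics": 8, "baselines": 7,  # Phase 2: Methodology
    "novelty": 8, "datasets": 5, "key_papers": 7,     # Phase 3: Resources
}

def readiness(confirmed: set[str], inferred: set[str]) -> int:
    """Confirmed fields count at full weight, auto-inferred at half."""
    score = sum(WEIGHTS[f] for f in confirmed)
    score += sum(WEIGHTS[f] / 2 for f in inferred if f not in confirmed)
    return round(score)

def band(score: int) -> str:
    if score < 40:
        return "blocked"
    if score < 65:
        return "ready"
    if score < 85:
        return "good"
    return "excellent"

# query 20 + domain 10 + hypothesis 20, methodology inferred at 7.5 -> 58
s = readiness(confirmed={"query", "domain", "hypothesis"},
              inferred={"methodology"})
```

With only the core fields confirmed and one inferred field, the plan just clears the 40-point gate, which is exactly the behavior the minimum-score check is meant to enforce.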

Refinement Loop

After all phases complete, users enter a refinement loop with options including:

| Option | Description |
|---|---|
| Run pipeline | Save and execute (requires score ≥ 40) |
| Edit a field | Pick a field by number and type a new value directly |
| Refine with AI | Describe changes in plain English; the LLM updates fields |
| Ask more questions | Re-enter the phased question flow |

Core Pipeline

1. Ingest

Searches OpenAlex, arXiv, bioRxiv, and Semantic Scholar using enriched multi-query search. The primary query is augmented with domain, methodology, baseline, and dataset terms from the research plan. Additional focused queries (methodology-angle, baseline-angle, dataset-angle) run in parallel for broader recall. Results are deduplicated by paper ID.
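The fan-out and deduplication described above can be sketched as follows. The `build_queries` and `dedup_by_id` helpers are hypothetical stand-ins for THEIA's source clients; the plan fields mirror the research planner.

```python
def build_queries(plan: dict) -> list[str]:
    """Primary query plus focused angle queries built from the research plan."""
    primary = plan["query"]
    queries = [primary]
    for angle in ("methodology", "baselines", "datasets"):
        if plan.get(angle):
            queries.append(f"{primary} {plan[angle]}")  # angle-enriched query
    return queries

def dedup_by_id(results: list[dict]) -> list[dict]:
    """Merge results from all sources, keeping the first hit per paper ID."""
    seen, unique = set(), []
    for paper in results:
        if paper["id"] not in seen:
            seen.add(paper["id"])
            unique.append(paper)
    return unique

plan = {"query": "protein folding", "methodology": "diffusion models",
        "baselines": "AlphaFold2", "datasets": None}
queries = build_queries(plan)  # primary + methodology + baseline angles
```

In the real pipeline each query would fan out to OpenAlex, arXiv, bioRxiv, and Semantic Scholar concurrently before the merged results are deduplicated.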

2. Screen

Each paper is scored 0.0–1.0 by an LLM using a plan-aware relevance prompt that includes the full research context: query, domain, hypothesis, methodology, metrics, baselines, datasets, and novelty. Papers below 0.3 are filtered out; the top N by score are selected. Quantitative claims are extracted from passing papers.
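The filter-then-rank step can be sketched as below. The LLM relevance scoring itself is stubbed out; the paper scores here are hypothetical, and only the 0.3 threshold and top-N selection come from the text above.

```python
def screen(papers: list[dict], threshold: float = 0.3, top_n: int = 2) -> list[dict]:
    """Drop papers below the relevance threshold, keep the top N by score."""
    passing = [p for p in papers if p["score"] >= threshold]
    return sorted(passing, key=lambda p: p["score"], reverse=True)[:top_n]

papers = [
    {"title": "A", "score": 0.91},
    {"title": "B", "score": 0.12},  # filtered out: below 0.3
    {"title": "C", "score": 0.55},
    {"title": "D", "score": 0.47},  # passes threshold but misses top N
]
selected = screen(papers)  # -> A (0.91), C (0.55)
```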

3. Reproduce

Sandboxed Docker execution to replicate baseline results. Uses python-on-whales for container management and GitHub API for code retrieval.

4. Explore

AIDE-based tree search generates hypotheses, forks code, and executes experiments to find SOTA improvements using UCB pruning.
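The UCB pruning behind the tree search can be illustrated with classic UCB1. THEIA's AIDE-based search is more involved; this sketch only shows the exploration/exploitation trade-off that decides which hypothesis branch to expand next. The branch data and exploration constant are hypothetical.

```python
import math

def ucb1(mean_reward: float, visits: int, total_visits: int, c: float = 1.4) -> float:
    """UCB1 value: exploitation term plus exploration bonus."""
    if visits == 0:
        return float("inf")  # unvisited branches are expanded first
    return mean_reward + c * math.sqrt(math.log(total_visits) / visits)

def select_branch(branches: list[dict]) -> dict:
    """Expand the branch with the highest UCB1 value."""
    total = sum(b["visits"] for b in branches)
    return max(branches, key=lambda b: ucb1(b["mean"], b["visits"], total))

branches = [
    {"name": "lr-schedule", "mean": 0.62, "visits": 10},
    {"name": "new-arch",    "mean": 0.55, "visits": 2},   # uncertain: big bonus
    {"name": "data-aug",    "mean": 0.40, "visits": 8},   # weak: pruned in practice
]
best = select_branch(branches)
```

Note that the under-visited `new-arch` branch wins despite a lower mean reward: its exploration bonus dominates until it has been sampled enough to be confidently kept or pruned.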

5. Write

Generates full LaTeX manuscripts via pylatex, assembling figures, a bibliography (pybtex), and a discussion of results.

Human-in-the-Loop Gates

Autonomous execution is governed by mandatory checkpoints where human review is required before proceeding to cost-intensive stages. Gates use LangGraph's interrupt() mechanism with Command(resume=...) to pause the pipeline, collect interactive input, and resume from the exact checkpoint.

| Gate ID | Stage | User Action | Context Shown |
|---|---|---|---|
| GATE_HYPOTHESIS | Post-Screening | Select a paper from interactive list, approve/reject | Ranked paper list with titles, sources, relevance scores |
| GATE_EXPERIMENT | Post-Reproduction | Validate baseline results, approve exploration | Baseline metrics |
| GATE_RESULTS | Post-Exploration | Verify improvements, approve writing | Exploration tree with baseline vs improved metrics |
| GATE_MANUSCRIPT | Post-Writing | Final paper review before completion | PDF path, manuscript sections |

Gate Mechanism

Gates are registered as interrupt_before nodes in the LangGraph state machine. When the pipeline reaches a gate:

  1. The event stream pauses and the live display shows an "AWAITING REVIEW" indicator
  2. The gate handler displays context (papers, metrics, tree) and collects user input interactively
  3. For GATE_HYPOTHESIS, paper selection uses an arrow-key menu instead of index typing
  4. The response is sent back via Command(resume=response), which delivers it to the interrupt() call inside the gate node
  5. The pipeline resumes from the checkpoint; approved gates advance to the next phase, rejected gates end the pipeline

Pipeline Resume

Pipelines are durably checkpointed after every node via PostgreSQL (or in-memory fallback). If a pipeline is cancelled or interrupted at any point — including mid-screening or at a human gate — it can be resumed from exactly where it left off without re-running completed stages.

Resume from Gate

If paused at a human gate, resuming detects the pending gate via snapshot.next, shows the gate UI, and sends the response. The pipeline continues from the checkpoint.

Resume from Crash

On resume, the pipeline streams from the last checkpoint by passing None as input to astream(). Completed nodes (ingest, screen, etc.) are not re-executed.

# Resume a paused pipeline
theia resume-pipeline <pipeline-id>

# Or from the home menu, select a pipeline marked "awaiting_human"
theia

Memory System

THEIA implements a three-tier cognitive memory architecture to maintain context over long research horizons.

Episodic

ChromaDB Vector Store

Stores timestamped observations and events with hybrid retrieval.

score = recency^α * relevance^β * importance^γ
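The hybrid retrieval formula above can be implemented directly. In this sketch the exponent defaults, the exponential recency decay, and the one-day half-life are illustrative assumptions, not THEIA's tuned parameters; only the `recency^α · relevance^β · importance^γ` form comes from the formula.

```python
def hybrid_score(age_seconds: float, relevance: float, importance: float,
                 alpha: float = 1.0, beta: float = 1.0, gamma: float = 0.5,
                 half_life: float = 86_400.0) -> float:
    """Combine recency, relevance, and importance into one retrieval score."""
    recency = 0.5 ** (age_seconds / half_life)  # exponential decay, 1-day half-life
    return (recency ** alpha) * (relevance ** beta) * (importance ** gamma)

# A fresh memory outranks a day-old one of equal relevance and importance:
fresh = hybrid_score(age_seconds=0,      relevance=0.9, importance=0.64)
old   = hybrid_score(age_seconds=86_400, relevance=0.9, importance=0.64)
```

Raising importance to γ < 1 compresses its influence, so a very important but stale memory can still lose to a fresh, moderately important one.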
Semantic

Neo4j Knowledge Graph

Maps relationships between papers, methods, and results.

(Paper)-[:CITES]->(Paper), (Method)-[:USED_BY]->(Experiment)
Procedural

Case Bank (JSON)

Stores successful experiment patterns (code templates, hyperparams) for few-shot retrieval in future tasks.
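A JSON case bank of this kind can be sketched as store-and-retrieve over a flat file. The schema and the tag-overlap matching below are illustrative assumptions; THEIA's actual case format and retrieval may differ.

```python
import json
import tempfile
from pathlib import Path

def save_case(bank: Path, case: dict) -> None:
    """Append one successful experiment pattern to the JSON case bank."""
    cases = json.loads(bank.read_text()) if bank.exists() else []
    cases.append(case)
    bank.write_text(json.dumps(cases, indent=2))

def retrieve(bank: Path, tags: set[str], k: int = 2) -> list[dict]:
    """Return the k cases whose tags best overlap the new task's tags."""
    cases = json.loads(bank.read_text()) if bank.exists() else []
    ranked = sorted(cases, key=lambda c: len(tags & set(c["tags"])), reverse=True)
    return ranked[:k]

bank = Path(tempfile.mkdtemp()) / "case_bank.json"
save_case(bank, {"tags": ["vision", "augmentation"], "template": "train.py ..."})
save_case(bank, {"tags": ["nlp", "lora"], "template": "finetune.py ..."})
best = retrieve(bank, tags={"vision"})  # vision case ranks first
```

Retrieved cases would then be injected as few-shot examples when the Explore agent drafts code for a similar task.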

Infrastructure

State Manager

PostgreSQL 16

Persistent graph state & checkpoints

Task Broker

Redis 7

Celery async job queue management

Knowledge Graph

Neo4j 5

Cypher queries & relationship mapping

Embeddings

ChromaDB

High-dimensional vector storage

Configuration

THEIA is configured via environment variables. Create a .env file in the root directory.

| Variable | Description | Default |
|---|---|---|
| DATABASE_URL | PostgreSQL connection string | postgresql://... |
| OPENROUTER_API_KEY | Primary LLM aggregator key | - |
| ANTHROPIC_API_KEY | Direct Anthropic access (optional) | - |
| NEO4J_URI | Knowledge graph connection | bolt://localhost:7687 |
| MAX_COST_PER_PIPELINE | Hard limit for run budget ($) | 100.0 |
| GPU_ENABLED | Use local CUDA if available | True |
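A minimal .env might look like the following. All values are placeholders; the connection strings and key prefix are illustrative, not defaults shipped with THEIA.

```
# .env (placeholder values)
DATABASE_URL=postgresql://theia:changeme@localhost:5432/theia
OPENROUTER_API_KEY=sk-or-...
NEO4J_URI=bolt://localhost:7687
MAX_COST_PER_PIPELINE=100.0
GPU_ENABLED=True
```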