Documentation

THEIA OS is a compound AI system designed to decouple scientific discovery from human constraints. It operates as an unsupervised operating system for autonomous research.

System Overview

THEIA OS (Transformative Hypothesis Exploration & Intelligence Architecture) fundamentally reimagines the scientific method as a computational loop. Unlike traditional assistants that aid humans, THEIA acts as the primary agent, taking high-level research goals and executing the entire discovery lifecycle.

Core Capabilities

  • Autonomous Ingestion: Concurrently reads from OpenAlex, arXiv, bioRxiv, and Semantic Scholar.
  • Deep Screening: Evaluates papers for reproducibility and quantitative claims using cascaded LLMs.
  • Agentic Tree Search: Explores hypothesis spaces using UCB algorithms instead of linear iteration.
  • Self-Correction: Detects execution failures and autonomously debugs experimental code.

Stateful by Default

Time-travel debugging via PostgreSQL checkpointing of the entire graph state.

Tree Search

Non-linear hypothesis exploration that prunes less promising branches early.

Cost-Aware

Real-time token tracking and budget enforcement per research pipeline.
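The budget enforcement above can be sketched in a few lines. This is a minimal illustration, not THEIA's actual API: the `CostTracker` and `BudgetExceeded` names and the per-token pricing are hypothetical.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a pipeline's cumulative spend passes its hard limit."""

class CostTracker:
    """Illustrative per-pipeline cost tracker with a hard budget cap."""

    def __init__(self, max_cost_usd: float):
        self.max_cost_usd = max_cost_usd
        self.spent_usd = 0.0

    def record(self, tokens: int, usd_per_1k_tokens: float) -> float:
        """Add one LLM call's cost and enforce the hard budget."""
        self.spent_usd += tokens / 1000 * usd_per_1k_tokens
        if self.spent_usd > self.max_cost_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.2f} > budget ${self.max_cost_usd:.2f}"
            )
        return self.spent_usd

tracker = CostTracker(max_cost_usd=100.0)
tracker.record(tokens=250_000, usd_per_1k_tokens=0.003)  # adds $0.75
```

Every LLM call reports its token usage to the tracker, so a runaway pipeline fails fast instead of silently exceeding `MAX_COST_PER_PIPELINE`.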

Titaness Theia
Origins & Etymology

Theia (Θεία)

In Greek mythology, Theia is the Titaness of sight (thea) and the shining ether of the bright blue sky. Mother of Helios (the Sun), Selene (the Moon), and Eos (the Dawn), she endowed gold and silver with their brilliance.

"THEIA OS embodies this ancient vision—the capacity to see beyond the known horizon of scientific literature and illuminate high-value hypotheses in the vast darkness of the search space."

Quick Start

1. Installation

# Clone the repository
git clone https://github.com/to-be-added
cd theia

# Install the package
pip install -e .

2. Launch & Onboard

# Start the interactive setup wizard
theia

On first launch, THEIA runs a wizard to configure API keys (Anthropic, OpenAI, etc.) and model preferences.

Architecture

THEIA OS follows a Compound AI System architecture, orchestrating specialized agents through a LangGraph state machine with durable PostgreSQL checkpointing.

| Component | Technology | Purpose |
|---|---|---|
| Orchestration | LangGraph | Stateful graph execution & human-in-the-loop gates |
| Knowledge Graph | Neo4j | Semantic reasoning (Paper → Cites → Method) |
| Vector Store | ChromaDB | Episodic memory & semantic search |
| Runtime | Docker | Sandboxed code reproduction & experiments |
```mermaid
graph TD
    subgraph Control_Plane [THEIA Control Plane]
        CLI[CLI/UI]
        Scheduler
        CostMgr[Cost Manager]
        API[REST API]
    end
    subgraph Core
        LangGraph[LangGraph Core <br/> State Machine]
        State[PostgreSQL State Manager]
    end
    subgraph Memory [Memory System]
        Episodic[Episodic <br/> ChromaDB]
        Semantic[Semantic <br/> Neo4j KG]
        Procedural[Procedural <br/> Case Bank]
    end
    subgraph Agents [Agent Pipeline]
        Ingest --> Screen
        Screen --> Reproduce
        Reproduce --> Explore
        Explore --> Write
    end
    CLI --> LangGraph
    LangGraph --> State
    LangGraph --> Agents
    Agents <--> Memory
    style Control_Plane fill:#09090b,stroke:#333,color:#fff
    style Core fill:#0f172a,stroke:#3b82f6,color:#fff
    style Memory fill:#1c1917,stroke:#ef4444,color:#fff
    style Agents fill:#0f172a,stroke:#10b981,color:#fff
```

Interactive Research Planner

Before executing a pipeline, THEIA guides users through an interactive planning session that builds a structured research plan. The planner uses phased questioning, a deterministic readiness score, and a refinement loop to ensure plan quality.

Phase 1: Core

Two mandatory seed questions (query, domain), plus LLM-generated follow-ups for the hypothesis.

Fields: query (20), domain (10), hypothesis (20)

Phase 2: Methodology

LLM decides whether to ask about methodology, metrics, or baselines — or skip if already inferred.

Fields: methodology (15), metrics (8), baselines (7)

Phase 3: Resources

Optional questions about novelty claims, datasets, and key papers to strengthen the plan.

Fields: novelty (8), datasets (5), key_papers (7)

Readiness Score

A deterministic score (0–100) computed from field weights. Fields confirmed by direct user answers receive full weight; fields auto-inferred from other answers receive half weight. The pipeline requires a minimum score of 40 to run.

  • 0–39 = blocked
  • 40–64 = ready
  • 65–84 = good
  • 85–100 = excellent
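The scoring rule above can be sketched directly from the field weights listed in the three phases (which sum to 100). This is an illustrative reimplementation; the function names and exact rounding are assumptions, not THEIA's source.

```python
# Field weights from the three planner phases (sum to 100).
WEIGHTS = {
    "query": 20, "domain": 10, "hypothesis": 20,      # Phase 1: Core
    "methodology": 15, "metrics": 8, "baselines": 7,  # Phase 2: Methodology
    "novelty": 8, "datasets": 5, "key_papers": 7,     # Phase 3: Resources
}

def readiness(confirmed: set[str], inferred: set[str]) -> int:
    """Confirmed fields count at full weight, auto-inferred at half."""
    score = sum(WEIGHTS[f] for f in confirmed)
    score += sum(WEIGHTS[f] / 2 for f in inferred if f not in confirmed)
    return round(score)

def band(score: int) -> str:
    if score < 40:
        return "blocked"
    if score < 65:
        return "ready"
    if score < 85:
        return "good"
    return "excellent"

# query 20 + domain 10 + hypothesis 20, methodology inferred at 7.5 -> 58
s = readiness(confirmed={"query", "domain", "hypothesis"},
              inferred={"methodology"})
```

With only the core fields confirmed and one inferred field, the plan just clears the 40-point gate, which is exactly the behavior the minimum-score check is meant to enforce.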

Refinement Loop

After all phases complete, users enter a refinement loop with options including:

| Option | Description |
|---|---|
| Run pipeline | Save and execute (requires score ≥ 40) |
| Edit a field | Pick a field by number and type a new value directly |
| Refine with AI | Describe changes in plain English; the LLM updates fields |
| Ask more questions | Re-enter the phased question flow |

Core Pipeline

1. Ingest

Searches OpenAlex, arXiv, bioRxiv, and Semantic Scholar using enriched multi-query search. The primary query is augmented with domain, methodology, baseline, and dataset terms from the research plan. Additional focused queries (methodology-angle, baseline-angle, dataset-angle) run in parallel for broader recall. Results are deduplicated by paper ID.
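The fan-out and deduplication described above can be sketched as follows. The `build_queries` and `dedup_by_id` helpers are hypothetical stand-ins for THEIA's source clients; the plan fields mirror the research planner.

```python
def build_queries(plan: dict) -> list[str]:
    """Primary query plus focused angle queries built from the research plan."""
    primary = plan["query"]
    queries = [primary]
    for angle in ("methodology", "baselines", "datasets"):
        if plan.get(angle):
            queries.append(f"{primary} {plan[angle]}")  # angle-enriched query
    return queries

def dedup_by_id(results: list[dict]) -> list[dict]:
    """Merge results from all sources, keeping the first hit per paper ID."""
    seen, unique = set(), []
    for paper in results:
        if paper["id"] not in seen:
            seen.add(paper["id"])
            unique.append(paper)
    return unique

plan = {"query": "protein folding", "methodology": "diffusion models",
        "baselines": "AlphaFold2", "datasets": None}
queries = build_queries(plan)  # primary + methodology + baseline angles
```

In the real pipeline each query would fan out to OpenAlex, arXiv, bioRxiv, and Semantic Scholar concurrently before the merged results are deduplicated.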

2. Screen

Each paper is scored 0.0–1.0 by an LLM using a plan-aware relevance prompt that includes the full research context: query, domain, hypothesis, methodology, metrics, baselines, datasets, and novelty. Papers below 0.3 are filtered out; the top N by score are selected. Quantitative claims are extracted from passing papers.
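The filter-then-rank step can be sketched as below. The LLM relevance scoring itself is stubbed out; the paper scores here are hypothetical, and only the 0.3 threshold and top-N selection come from the text above.

```python
def screen(papers: list[dict], threshold: float = 0.3, top_n: int = 2) -> list[dict]:
    """Drop papers below the relevance threshold, keep the top N by score."""
    passing = [p for p in papers if p["score"] >= threshold]
    return sorted(passing, key=lambda p: p["score"], reverse=True)[:top_n]

papers = [
    {"title": "A", "score": 0.91},
    {"title": "B", "score": 0.12},  # filtered out: below 0.3
    {"title": "C", "score": 0.55},
    {"title": "D", "score": 0.47},  # passes threshold but misses top N
]
selected = screen(papers)  # -> A (0.91), C (0.55)
```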

3. Reproduce

Sandboxed Docker execution to replicate baseline results. Uses python-on-whales for container management and GitHub API for code retrieval.

4. Explore

AIDE-based tree search generates hypotheses, forks code, and executes experiments to find SOTA improvements using UCB pruning.
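The UCB pruning behind the tree search can be illustrated with classic UCB1. THEIA's AIDE-based search is more involved; this sketch only shows the exploration/exploitation trade-off that decides which hypothesis branch to expand next. The branch data and exploration constant are hypothetical.

```python
import math

def ucb1(mean_reward: float, visits: int, total_visits: int, c: float = 1.4) -> float:
    """UCB1 value: exploitation term plus exploration bonus."""
    if visits == 0:
        return float("inf")  # unvisited branches are expanded first
    return mean_reward + c * math.sqrt(math.log(total_visits) / visits)

def select_branch(branches: list[dict]) -> dict:
    """Expand the branch with the highest UCB1 value."""
    total = sum(b["visits"] for b in branches)
    return max(branches, key=lambda b: ucb1(b["mean"], b["visits"], total))

branches = [
    {"name": "lr-schedule", "mean": 0.62, "visits": 10},
    {"name": "new-arch",    "mean": 0.55, "visits": 2},   # uncertain: big bonus
    {"name": "data-aug",    "mean": 0.40, "visits": 8},   # weak: pruned in practice
]
best = select_branch(branches)
```

Note that the under-visited `new-arch` branch wins despite a lower mean reward: its exploration bonus dominates until it has been sampled enough to be confidently kept or pruned.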

5. Write

Generates full LaTeX manuscripts via pylatex, assembling figures, a bibliography (pybtex), and a discussion of results.

Human-in-the-Loop Gates

Autonomous execution is governed by mandatory checkpoints where human review is required before proceeding to cost-intensive stages. Gates use LangGraph's interrupt() mechanism with Command(resume=...) to pause the pipeline, collect interactive input, and resume from the exact checkpoint.

| Gate ID | Stage | User Action | Context Shown |
|---|---|---|---|
| GATE_HYPOTHESIS | Post-Screening | Select a paper from interactive list, approve/reject | Ranked paper list with titles, sources, relevance scores |
| GATE_EXPERIMENT | Post-Reproduction | Validate baseline results, approve exploration | Baseline metrics |
| GATE_RESULTS | Post-Exploration | Verify improvements, approve writing | Exploration tree with baseline vs improved metrics |
| GATE_MANUSCRIPT | Post-Writing | Final paper review before completion | PDF path, manuscript sections |

Gate Mechanism

Gates are registered as interrupt_before nodes in the LangGraph state machine. When the pipeline reaches a gate:

  1. The event stream pauses and the live display shows an "AWAITING REVIEW" indicator
  2. The gate handler displays context (papers, metrics, tree) and collects user input interactively
  3. For GATE_HYPOTHESIS, paper selection uses an arrow-key menu instead of index typing
  4. The response is sent back via Command(resume=response), which delivers it to the interrupt() call inside the gate node
  5. The pipeline resumes from the checkpoint; approved gates advance to the next phase, rejected gates end the pipeline

Pipeline Resume

Pipelines are durably checkpointed after every node via PostgreSQL (or in-memory fallback). If a pipeline is cancelled or interrupted at any point — including mid-screening or at a human gate — it can be resumed from exactly where it left off without re-running completed stages.

Resume from Gate

If paused at a human gate, resuming detects the pending gate via snapshot.next, shows the gate UI, and sends the response. The pipeline continues from the checkpoint.

Resume from Crash

On resume, the pipeline streams from the last checkpoint by passing None as input to astream(). Completed nodes (ingest, screen, etc.) are not re-executed.

# Resume a paused pipeline
theia resume-pipeline <pipeline-id>

# Or from the home menu, select a pipeline marked "awaiting_human"
theia

Memory System

THEIA implements a three-tier cognitive memory architecture to maintain context over long research horizons.

Episodic

ChromaDB Vector Store

Stores timestamped observations and events with hybrid retrieval.

score = recency^α * relevance^β * importance^γ
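The hybrid retrieval formula above can be implemented directly. In this sketch the exponent defaults, the exponential recency decay, and the one-day half-life are illustrative assumptions, not THEIA's tuned parameters; only the `recency^α · relevance^β · importance^γ` form comes from the formula.

```python
def hybrid_score(age_seconds: float, relevance: float, importance: float,
                 alpha: float = 1.0, beta: float = 1.0, gamma: float = 0.5,
                 half_life: float = 86_400.0) -> float:
    """Combine recency, relevance, and importance into one retrieval score."""
    recency = 0.5 ** (age_seconds / half_life)  # exponential decay, 1-day half-life
    return (recency ** alpha) * (relevance ** beta) * (importance ** gamma)

# A fresh memory outranks a day-old one of equal relevance and importance:
fresh = hybrid_score(age_seconds=0,      relevance=0.9, importance=0.64)
old   = hybrid_score(age_seconds=86_400, relevance=0.9, importance=0.64)
```

Raising importance to γ < 1 compresses its influence, so a very important but stale memory can still lose to a fresh, moderately important one.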
Semantic

Neo4j Knowledge Graph

Maps relationships between papers, methods, and results.

(Paper)-[:CITES]->(Paper), (Method)-[:USED_BY]->(Experiment)
Procedural

Case Bank (JSON)

Stores successful experiment patterns (code templates, hyperparams) for few-shot retrieval in future tasks.
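A JSON case bank of this kind can be sketched as store-and-retrieve over a flat file. The schema and the tag-overlap matching below are illustrative assumptions; THEIA's actual case format and retrieval may differ.

```python
import json
import tempfile
from pathlib import Path

def save_case(bank: Path, case: dict) -> None:
    """Append one successful experiment pattern to the JSON case bank."""
    cases = json.loads(bank.read_text()) if bank.exists() else []
    cases.append(case)
    bank.write_text(json.dumps(cases, indent=2))

def retrieve(bank: Path, tags: set[str], k: int = 2) -> list[dict]:
    """Return the k cases whose tags best overlap the new task's tags."""
    cases = json.loads(bank.read_text()) if bank.exists() else []
    ranked = sorted(cases, key=lambda c: len(tags & set(c["tags"])), reverse=True)
    return ranked[:k]

bank = Path(tempfile.mkdtemp()) / "case_bank.json"
save_case(bank, {"tags": ["vision", "augmentation"], "template": "train.py ..."})
save_case(bank, {"tags": ["nlp", "lora"], "template": "finetune.py ..."})
best = retrieve(bank, tags={"vision"})  # vision case ranks first
```

Retrieved cases would then be injected as few-shot examples when the Explore agent drafts code for a similar task.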

Infrastructure

State Manager

PostgreSQL 16

Persistent graph state & checkpoints

Task Broker

Redis 7

Celery async job queue management

Knowledge Graph

Neo4j 5

Cypher queries & relationship mapping

Embeddings

ChromaDB

High-dimensional vector storage

Configuration

THEIA is configured via environment variables. Create a .env file in the root directory.

| Variable | Description | Default |
|---|---|---|
| DATABASE_URL | PostgreSQL connection string | postgresql://... |
| OPENROUTER_API_KEY | Primary LLM aggregator key | - |
| ANTHROPIC_API_KEY | Direct Anthropic access (optional) | - |
| NEO4J_URI | Knowledge graph connection | bolt://localhost:7687 |
| MAX_COST_PER_PIPELINE | Hard limit for run budget ($) | 100.0 |
| GPU_ENABLED | Use local CUDA if available | True |
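A minimal .env might look like the following. All values are placeholders; the connection strings and key prefix are illustrative, not defaults shipped with THEIA.

```
# .env (placeholder values)
DATABASE_URL=postgresql://theia:changeme@localhost:5432/theia
OPENROUTER_API_KEY=sk-or-...
NEO4J_URI=bolt://localhost:7687
MAX_COST_PER_PIPELINE=100.0
GPU_ENABLED=True
```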