# 01. Architecture Overview
Refery is built as a hybrid AI system: deterministic rule engines and vector retrieval handle the high-volume work, while LLM-powered adversarial panels handle the small set of genuinely ambiguous evaluations that remain. Every component is observable, auditable, and idempotent.
## High-level system diagram
```mermaid
graph TB
    subgraph Sources["Data sources"]
        SCOUTS[300+ Operator Scouts<br/>HITL labelers]
        GMAIL[Gmail<br/>candidate + client comms]
        LINKEDIN[LinkedIn signals]
        JDS[Public + private<br/>job descriptions]
    end
    subgraph Ingest["Ingest + normalization"]
        JOB_INGEST[Job ingestion<br/>verbatim JD preservation]
        CAND_INGEST[Candidate intake<br/>resume + signal parse]
        AUTO_DRAFT[Auto-draft rule engine<br/>off-ICP filtering]
    end
    subgraph Core["Core intelligence"]
        SIGNAL[Signal engine<br/>logo tier, trajectory,<br/>pedigree, AI bonus]
        EMBED[Multi-vector embedder<br/>pgvector]
        RETRIEVE[Top-K retriever<br/>vector + filter]
        PANEL[5-persona panel<br/>adversarial evaluation]
        BRACKET[Bracketing +<br/>stage fit matrix]
    end
    subgraph State["Pipeline + state"]
        SM[State machine<br/>9-stage forward-only]
        HISTORY[Append-only history<br/>pipeline_stage_history]
        RECON[Gmail reconciliation<br/>evidence-bound transitions]
    end
    subgraph Outreach["Automated outreach"]
        WATERFALL[Tier-based waterfall<br/>Round 1 → Round 5]
        VOICE[Refery voice engine<br/>style-consistent drafts]
        GMAIL_OUT[Gmail draft writer]
    end
    subgraph Storage["Persistence"]
        SUPA[(Supabase Postgres<br/>candidates, jobs, pipeline,<br/>history, notes, embeddings)]
    end
    SCOUTS --> CAND_INGEST
    GMAIL --> CAND_INGEST
    LINKEDIN --> CAND_INGEST
    JDS --> JOB_INGEST
    JOB_INGEST --> AUTO_DRAFT
    AUTO_DRAFT --> SUPA
    CAND_INGEST --> SUPA
    SUPA --> SIGNAL
    SUPA --> EMBED
    EMBED --> RETRIEVE
    SIGNAL --> RETRIEVE
    RETRIEVE --> PANEL
    PANEL --> BRACKET
    BRACKET --> SM
    SM --> HISTORY
    GMAIL --> RECON
    RECON --> SM
    BRACKET --> WATERFALL
    WATERFALL --> VOICE
    VOICE --> GMAIL_OUT
    GMAIL_OUT --> GMAIL
```
## Tech stack
Refery runs on a deliberately minimal stack chosen for cost efficiency, low operational overhead, and strong primitives for AI workloads.
| Layer | Technology | Why |
|---|---|---|
| Frontend | Next.js 14 (App Router), React, TypeScript | Server components reduce client bundle; React Server Actions remove API plumbing |
| Hosting | Vercel | Edge functions for low-latency embeddings, automatic preview environments |
| Database | Supabase Postgres | Strong relational + JSON for hybrid records, RLS for multi-tenant safety |
| Vector store | pgvector (in Supabase) | Co-located with relational data, eliminates a separate vector DB and an extra network hop |
| Auth | Supabase Auth | Row-level security policies enforce data boundaries at the SQL layer |
| Workflow orchestration | Skill-based runners (deterministic) | Bounded execution cost vs unbounded agent loops |
| LLM layer | Anthropic Claude API + OpenAI embeddings | Anthropic for the persona panel; OpenAI for cheap, well-benchmarked embeddings |
| Email integration | Gmail API | First-party data ingestion for candidate/client comms |
| Search | Postgres full-text + pgvector hybrid | One database for both keyword and semantic search |
| Observability | Vercel logs + Supabase logs + custom audit trails | All state transitions logged to `pipeline_stage_history` |
The choice to colocate vector embeddings with relational data in pgvector (instead of using a dedicated vector database like Pinecone or Weaviate) is one of Refery's most consequential engineering decisions. It eliminates an entire class of synchronization bugs, reduces operational complexity, removes a network hop from the critical retrieval path, and is materially cheaper at Refery's data volume. The same SELECT can join candidate metadata with vector similarity in a single query.
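As a sketch of what that single-query join looks like, the retrieval step can be expressed as one Postgres statement. The table and column names below are illustrative, not Refery's actual schema; `<=>` is pgvector's cosine-distance operator.

```typescript
// Hypothetical co-located retrieval query: relational metadata and vector
// similarity come back from a single Postgres round trip, with no separate
// vector-DB hop. Table/column names are invented for the sketch.
export const topKOpenJobsSql = `
  SELECT j.id,
         j.title,
         j.stage,
         1 - (e.embedding <=> $1::vector) AS cosine_similarity
  FROM jobs j
  JOIN embeddings e ON e.job_id = j.id
  WHERE j.status = 'open'            -- relational filter...
  ORDER BY e.embedding <=> $1::vector -- ...and semantic ranking, one query
  LIMIT $2;
`;

// Thin helper pairing the statement with its parameters.
export function buildTopKQuery(
  candidateVector: number[],
  limit: number,
): { text: string; values: [string, number] } {
  if (limit <= 0) throw new Error("limit must be positive");
  // pgvector accepts the '[x,y,...]' literal form for vector parameters.
  return { text: topKOpenJobsSql, values: [`[${candidateVector.join(",")}]`, limit] };
}
```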
## Data flow: candidate intake to pipeline
A new candidate enters the system through one of three paths: a scout submission, a referral via Gmail, or a direct intake. The flow that follows is the same in all cases.
```mermaid
sequenceDiagram
    participant Source as Scout / Gmail / Intake
    participant Ingest as Candidate Intake
    participant Signal as Signal Engine
    participant Embed as Embedder
    participant Retrieve as Retriever
    participant Panel as 5-Persona Panel
    participant DB as Supabase
    participant Lily as Operator (Lily)
    Source->>Ingest: Resume + context
    Ingest->>DB: INSERT candidates row
    Ingest->>Signal: Compute deterministic signals
    Signal->>Signal: Logo tier (raw → modified)<br/>Trajectory<br/>Non-tech flag<br/>Sales profile
    Signal->>DB: UPDATE candidates with signals
    Ingest->>Embed: Build candidate vector
    Embed->>DB: INSERT into embeddings (pgvector)
    Retrieve->>DB: Top-K open jobs by vector + filters
    DB-->>Retrieve: Top 30 candidate-role pairs
    Retrieve->>Panel: Top 30 → panel evaluation
    Panel->>Panel: 5 personas × adversarial scoring<br/>Hard veto check<br/>Aggregate + bracket
    Panel->>DB: UPDATE candidates.ai_analysis<br/>INSERT pipeline rows<br/>INSERT history rows
    Panel-->>Lily: Brief + match table + screening questions
```
The architecture is intentionally split so that the expensive step (the panel) only ever runs on a pre-filtered, retrieval-ranked set of 20-30 candidate-role pairs, never on the full cross product of candidates × jobs. This is the core efficiency lever and is described in detail in the scalability and efficiency chapter.
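The leverage of that pre-filter can be made concrete with a back-of-envelope cost model. The per-call cost and the K value below are illustrative assumptions, not Refery's actual numbers:

```typescript
// Illustrative cost comparison for evaluating one new candidate: running the
// LLM panel on every open job vs. only the retrieval-ranked top-K slice.
const COST_PER_PANEL_CALL_CENTS = 5; // hypothetical per-pair panel cost

export function panelCost(
  openJobs: number,
  topK = 30, // "Top 30" from the sequence diagram above
): { naive: number; filtered: number } {
  const naivePairs = openJobs;                    // panel on every candidate-job pair
  const filteredPairs = Math.min(openJobs, topK); // panel only on retrieved pairs
  return {
    naive: naivePairs * COST_PER_PANEL_CALL_CENTS,
    filtered: filteredPairs * COST_PER_PANEL_CALL_CENTS,
  };
}
```

With 500 open jobs, the naive approach costs 500 panel calls per candidate while the filtered path caps at 30 regardless of job count; the gap widens linearly as the job book grows.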
## Component boundaries
Each component has a single, well-defined responsibility and a stable interface. This is what makes the system extensible without breaking existing flows.
- Signal engine is pure: same input always produces the same output. No external API calls. Runs in milliseconds.
- Embedder is idempotent and cached. A candidate's vector is recomputed only when the underlying signals change.
- Retriever is read-only. It never mutates state.
- Panel is the only LLM-heavy component. It is invoked sparingly and produces structured output that downstream components consume.
- State machine is the only component that writes to `job_candidate_pipeline.stage`. Every transition emits a `pipeline_stage_history` row. No other code path is allowed to mutate stage directly.
- Reconciliation engine is idempotent. Re-running it with no new evidence produces zero writes.
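The state-machine contract can be sketched in a few lines. The stage names and the in-memory history below are illustrative stand-ins for the real `job_candidate_pipeline.stage` column and `pipeline_stage_history` table:

```typescript
// Minimal sketch of the 9-stage, forward-only state machine: every
// transition appends a history row, and backward moves are rejected.
// Stage names are hypothetical, not Refery's actual stage list.
const STAGES = [
  "sourced", "screened", "submitted", "intro", "interview",
  "final", "offer", "accepted", "placed",
] as const;
type Stage = (typeof STAGES)[number];

interface HistoryRow {
  from: Stage;
  to: Stage;
  at: Date;
}

export class Pipeline {
  readonly history: HistoryRow[] = []; // append-only; rows are never rewritten
  constructor(public stage: Stage = "sourced") {}

  advance(to: Stage): void {
    const fromIdx = STAGES.indexOf(this.stage);
    const toIdx = STAGES.indexOf(to);
    // Forward-only invariant: a transition may never revisit an earlier stage.
    if (toIdx <= fromIdx) {
      throw new Error(`illegal transition ${this.stage} -> ${to}`);
    }
    this.history.push({ from: this.stage, to, at: new Date() });
    this.stage = to;
  }
}
```

In production the equivalent invariant lives in the database (a trigger or constraint), so pipeline integrity holds even if application code misbehaves.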
This discipline matters because the system runs on a small operator team. Strict component boundaries mean a junior engineer can extend any one piece without understanding the entire system, and bugs are localized rather than cascading.
## What is novel here
Several pieces of this architecture are unusual in the recruiting tech space and constitute proprietary technical know-how:
- Hybrid deterministic + retrieval + adversarial-LLM architecture. Most "AI recruiting" platforms are either pure rules (no semantic understanding) or pure LLM (expensive, non-deterministic, biased). Refery's three-tier architecture inherits the strengths of each.
- Co-located vector and relational data via pgvector. Eliminates the class of synchronization and consistency bugs that arise when vectors live in a separate store from the relational rows they describe.
- Append-only state machine with evidence-bound transitions. Pipeline integrity is guaranteed by the database, not by application code.
- Skill-based execution boundaries. Each automation runs as a deterministic, bounded skill rather than an open-ended agent. This is the difference between predictable cost and runaway cost.
- Production-grade structured signals (logo tier, trajectory, pedigree, AI bonus) replacing free-text resume parsing. These are encoded as explicit data structures, not extracted ad-hoc per query.
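As an illustration of what "explicit data structures" means in practice, a typed signal record and a deterministic scorer over it might look like the following. All field names, tiers, and weights here are invented for the sketch:

```typescript
// Hypothetical shape of the structured signal record: signals are explicit,
// typed fields computed once at intake, not free text re-parsed per query.
export type LogoTier = 1 | 2 | 3 | 4; // 1 = strongest employer logos

export interface CandidateSignals {
  logoTierRaw: LogoTier;      // tier of strongest employer logo, as computed
  logoTierModified: LogoTier; // tier after rule-based adjustments
  trajectory: "steep" | "steady" | "flat";
  pedigree: boolean;          // e.g. notable school or program flag
  aiBonus: number;            // additive bonus for hands-on AI experience
}

// Deterministic, side-effect-free scoring: same input, same output,
// mirroring the signal-engine purity guarantee described earlier.
export function signalScore(s: CandidateSignals): number {
  const tierScore = (5 - s.logoTierModified) * 10; // tier 1 scores highest
  const trajScore = { steep: 15, steady: 8, flat: 0 }[s.trajectory];
  return tierScore + trajScore + (s.pedigree ? 5 : 0) + s.aiBonus;
}
```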
The next chapters describe each of these in detail.