# 01. Architecture Overview
Refery is built as a hybrid AI system: deterministic rule engines and vector retrieval handle the high-volume work, while LLM-powered adversarial panels handle the small set of genuinely ambiguous evaluations that remain. Every component is observable, auditable, and idempotent.
## High-level system diagram
```mermaid
graph TB
    subgraph Sources["Data sources"]
        SCOUTS[300+ Operator Scouts<br/>HITL labelers]
        GMAIL[Gmail<br/>candidate + client comms]
        LINKEDIN[LinkedIn signals]
        JDS[Public + private<br/>job descriptions]
    end
    subgraph Ingest["Ingest + normalization"]
        JOB_INGEST[Job ingestion<br/>verbatim JD preservation]
        CAND_INGEST[Candidate intake<br/>resume + signal parse]
        AUTO_DRAFT[Auto-draft rule engine<br/>off-ICP filtering]
    end
    subgraph Core["Core intelligence"]
        SIGNAL[Signal engine<br/>logo tier, trajectory,<br/>pedigree, AI bonus]
        EMBED[Multi-vector embedder<br/>pgvector]
        RETRIEVE[Top-K retriever<br/>vector + filter]
        PANEL[5-persona panel<br/>adversarial evaluation]
        BRACKET[Bracketing +<br/>stage fit matrix]
    end
    subgraph State["Pipeline + state"]
        SM[State machine<br/>9-stage forward-only]
        HISTORY[Append-only history<br/>pipeline_stage_history]
        RECON[Gmail reconciliation<br/>evidence-bound transitions]
    end
    subgraph Outreach["Automated outreach"]
        WATERFALL[Tier-based waterfall<br/>Round 1 → Round 5]
        VOICE[Refery voice engine<br/>style-consistent drafts]
        GMAIL_OUT[Gmail draft writer]
    end
    subgraph Storage["Persistence"]
        SUPA[(Supabase Postgres<br/>candidates, jobs, pipeline,<br/>history, notes, embeddings)]
    end
    SCOUTS --> CAND_INGEST
    GMAIL --> CAND_INGEST
    LINKEDIN --> CAND_INGEST
    JDS --> JOB_INGEST
    JOB_INGEST --> AUTO_DRAFT
    AUTO_DRAFT --> SUPA
    CAND_INGEST --> SUPA
    SUPA --> SIGNAL
    SUPA --> EMBED
    EMBED --> RETRIEVE
    SIGNAL --> RETRIEVE
    RETRIEVE --> PANEL
    PANEL --> BRACKET
    BRACKET --> SM
    SM --> HISTORY
    GMAIL --> RECON
    RECON --> SM
    BRACKET --> WATERFALL
    WATERFALL --> VOICE
    VOICE --> GMAIL_OUT
    GMAIL_OUT --> GMAIL
```
## Tech stack
Refery runs on a deliberately minimal stack chosen for cost efficiency, low operational overhead, and strong primitives for AI workloads.
| Layer | Technology | Why |
|---|---|---|
| Frontend | Next.js 14 (App Router), React, TypeScript | Server components reduce client bundle; React Server Actions remove API plumbing |
| Hosting | Vercel | Edge functions for low-latency embeddings, automatic preview environments |
| Database | Supabase Postgres | Strong relational + JSON for hybrid records, RLS for multi-tenant safety |
| Vector store | pgvector (in Supabase) | Co-located with relational data, eliminates a separate vector DB and an extra network hop |
| Auth | Supabase Auth | Row-level security policies enforce data boundaries at the SQL layer |
| Workflow orchestration | Skill-based runners (deterministic) | Bounded execution cost vs unbounded agent loops |
| LLM layer | Anthropic Claude API + OpenAI embeddings | Anthropic for the persona panel; OpenAI for cheap, well-benchmarked embeddings |
| Email integration | Gmail API | First-party data ingestion for candidate/client comms |
| Search | Postgres full-text + pgvector hybrid | One database for both keyword and semantic search |
| Observability | Vercel logs + Supabase logs + custom audit trails | All state transitions logged to `pipeline_stage_history` |
The choice to colocate vector embeddings with relational data in pgvector (instead of using a dedicated vector database like Pinecone or Weaviate) is one of Refery's most consequential engineering decisions. It eliminates an entire class of synchronization bugs, reduces operational complexity, removes a network hop from the critical retrieval path, and is materially cheaper at Refery's data volume. The same SELECT can join candidate metadata with vector similarity in a single query.
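As a sketch of what that single-query join looks like, the retrieval step can be expressed as one Postgres statement. The table and column names below are illustrative, not Refery's actual schema; `<=>` is pgvector's cosine-distance operator.

```typescript
// Hypothetical co-located retrieval query: relational metadata and vector
// similarity come back from a single Postgres round trip, with no separate
// vector-DB hop. Table/column names are invented for the sketch.
export const topKOpenJobsSql = `
  SELECT j.id,
         j.title,
         j.stage,
         1 - (e.embedding <=> $1::vector) AS cosine_similarity
  FROM jobs j
  JOIN embeddings e ON e.job_id = j.id
  WHERE j.status = 'open'            -- relational filter...
  ORDER BY e.embedding <=> $1::vector -- ...and semantic ranking, one query
  LIMIT $2;
`;

// Thin helper pairing the statement with its parameters.
export function buildTopKQuery(
  candidateVector: number[],
  limit: number,
): { text: string; values: [string, number] } {
  if (limit <= 0) throw new Error("limit must be positive");
  // pgvector accepts the '[x,y,...]' literal form for vector parameters.
  return { text: topKOpenJobsSql, values: [`[${candidateVector.join(",")}]`, limit] };
}
```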
## Data flow: candidate intake to pipeline
A new candidate enters the system through one of three paths: a scout submission, a referral via Gmail, or a direct intake. The flow that follows is the same in all cases.
```mermaid
sequenceDiagram
    participant Source as Scout / Gmail / Intake
    participant Ingest as Candidate Intake
    participant Signal as Signal Engine
    participant Embed as Embedder
    participant Retrieve as Retriever
    participant Panel as 5-Persona Panel
    participant DB as Supabase
    participant Lily as Operator (Lily)
    Source->>Ingest: Resume + context
    Ingest->>DB: INSERT candidates row
    Ingest->>Signal: Compute deterministic signals
    Signal->>Signal: Logo tier (raw → modified)<br/>Trajectory<br/>Non-tech flag<br/>Sales profile
    Signal->>DB: UPDATE candidates with signals
    Ingest->>Embed: Build candidate vector
    Embed->>DB: INSERT into embeddings (pgvector)
    Retrieve->>DB: Top-K open jobs by vector + filters
    DB-->>Retrieve: Top 30 candidate-role pairs
    Retrieve->>Panel: Top 30 → panel evaluation
    Panel->>Panel: 5 personas × adversarial scoring<br/>Hard veto check<br/>Aggregate + bracket
    Panel->>DB: UPDATE candidates.ai_analysis<br/>INSERT pipeline rows<br/>INSERT history rows
    Panel-->>Lily: Brief + match table + screening questions
```
The architecture is intentionally split so that the expensive step (the panel) only ever runs on a pre-filtered, retrieval-ranked set of 20-30 candidate-role pairs, never on the full cross product of candidates × jobs. This is the core efficiency lever and is described in detail in the scalability and efficiency chapter.
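The leverage of that pre-filter can be made concrete with a back-of-envelope cost model. The per-call cost and the K value below are illustrative assumptions, not Refery's actual numbers:

```typescript
// Illustrative cost comparison for evaluating one new candidate: running the
// LLM panel on every open job vs. only the retrieval-ranked top-K slice.
const COST_PER_PANEL_CALL_CENTS = 5; // hypothetical per-pair panel cost

export function panelCost(
  openJobs: number,
  topK = 30, // "Top 30" from the sequence diagram above
): { naive: number; filtered: number } {
  const naivePairs = openJobs;                    // panel on every candidate-job pair
  const filteredPairs = Math.min(openJobs, topK); // panel only on retrieved pairs
  return {
    naive: naivePairs * COST_PER_PANEL_CALL_CENTS,
    filtered: filteredPairs * COST_PER_PANEL_CALL_CENTS,
  };
}
```

With 500 open jobs, the naive approach costs 500 panel calls per candidate while the filtered path caps at 30 regardless of job count; the gap widens linearly as the job book grows.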
## Component boundaries
Each component has a single, well-defined responsibility and a stable interface. This is what makes the system extensible without breaking existing flows.
- Signal engine is pure: same input always produces the same output. No external API calls. Runs in milliseconds.
- Embedder is idempotent and cached. A candidate's vector is recomputed only when the underlying signals change.
- Retriever is read-only. It never mutates state.
- Panel is the only LLM-heavy component. It is invoked sparingly and produces structured output that downstream components consume.
- State machine is the only component that writes to `job_candidate_pipeline.stage`. Every transition emits a `pipeline_stage_history` row. No other code path is allowed to mutate stage directly.
- Reconciliation engine is idempotent. Re-running it with no new evidence produces zero writes.
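The state-machine contract can be sketched in a few lines. The stage names and the in-memory history below are illustrative stand-ins for the real `job_candidate_pipeline.stage` column and `pipeline_stage_history` table:

```typescript
// Minimal sketch of the 9-stage, forward-only state machine: every
// transition appends a history row, and backward moves are rejected.
// Stage names are hypothetical, not Refery's actual stage list.
const STAGES = [
  "sourced", "screened", "submitted", "intro", "interview",
  "final", "offer", "accepted", "placed",
] as const;
type Stage = (typeof STAGES)[number];

interface HistoryRow {
  from: Stage;
  to: Stage;
  at: Date;
}

export class Pipeline {
  readonly history: HistoryRow[] = []; // append-only; rows are never rewritten
  constructor(public stage: Stage = "sourced") {}

  advance(to: Stage): void {
    const fromIdx = STAGES.indexOf(this.stage);
    const toIdx = STAGES.indexOf(to);
    // Forward-only invariant: a transition may never revisit an earlier stage.
    if (toIdx <= fromIdx) {
      throw new Error(`illegal transition ${this.stage} -> ${to}`);
    }
    this.history.push({ from: this.stage, to, at: new Date() });
    this.stage = to;
  }
}
```

In production the equivalent invariant lives in the database (a trigger or constraint), so pipeline integrity holds even if application code misbehaves.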
This discipline matters because the system runs on a small operator team. Strict component boundaries mean a junior engineer can extend any one piece without understanding the entire system, and bugs are localized rather than cascading.
## What is novel here
Several pieces of this architecture are unusual in the recruiting tech space and constitute proprietary technical know-how:
- Hybrid deterministic + retrieval + adversarial-LLM architecture. Most "AI recruiting" platforms are either pure rules (no semantic understanding) or pure LLM (expensive, non-deterministic, biased). Refery's three-tier architecture inherits the strengths of each.
- Co-located vector and relational data via pgvector. Eliminates the class of synchronization and consistency bugs that arise when vectors live in a separate store from the relational rows they describe.
- Append-only state machine with evidence-bound transitions. Pipeline integrity is guaranteed by the database, not by application code.
- Skill-based execution boundaries. Each automation runs as a deterministic, bounded skill rather than an open-ended agent. This is the difference between predictable cost and runaway cost.
- Production-grade structured signals (logo tier, trajectory, pedigree, AI bonus) replacing free-text resume parsing. These are encoded as explicit data structures, not extracted ad-hoc per query.
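As an illustration of what "explicit data structures" means in practice, a typed signal record and a deterministic scorer over it might look like the following. All field names, tiers, and weights here are invented for the sketch:

```typescript
// Hypothetical shape of the structured signal record: signals are explicit,
// typed fields computed once at intake, not free text re-parsed per query.
export type LogoTier = 1 | 2 | 3 | 4; // 1 = strongest employer logos

export interface CandidateSignals {
  logoTierRaw: LogoTier;      // tier of strongest employer logo, as computed
  logoTierModified: LogoTier; // tier after rule-based adjustments
  trajectory: "steep" | "steady" | "flat";
  pedigree: boolean;          // e.g. notable school or program flag
  aiBonus: number;            // additive bonus for hands-on AI experience
}

// Deterministic, side-effect-free scoring: same input, same output,
// mirroring the signal-engine purity guarantee described earlier.
export function signalScore(s: CandidateSignals): number {
  const tierScore = (5 - s.logoTierModified) * 10; // tier 1 scores highest
  const trajScore = { steep: 15, steady: 8, flat: 0 }[s.trajectory];
  return tierScore + trajScore + (s.pedigree ? 5 : 0) + s.aiBonus;
}
```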
The next chapters describe each of these in detail.