05. Rule Engines

Most decisions in a recruiting platform are not genuinely ambiguous. A senior engineering role at a US Series A company that lists "$50K base salary" is a typo. A "Director of Operations" at a 4-person consulting shop does not fit Refery's ICP (ideal customer profile). A candidate who explicitly asked not to be contacted should never appear in any outreach query.

Refery encodes these decisions in rule engines: small, deterministic, declarative codepaths that run at near-zero compute cost and produce auditable outputs. The principle is simple: deterministic decisions belong in rules, ambiguous decisions belong in the panel. Rules are free; LLM calls are not.

This chapter describes three of the production rule engines:

Auto-draft job filtering, which moves off-ICP roles to draft state before any matching happens.
Tier-based outreach waterfall, which selects the right contact at each company based on a five-tier hierarchy.
Hard-block compliance layer, which enforces blacklists at the SQL layer.

Auto-draft job filtering

When new jobs are ingested into Refery, they are passed through a sequence of filters that classify each job as either "ready for matching" or "moved to draft." Drafts are not matched, not surfaced to candidates, and not included in the live retrieval index.

The filters are ordered by cost: cheapest first. A job that fails on the location filter never reaches the keyword filter, never reaches the salary filter, and certainly never reaches the LLM-based ICP classifier (which is reserved for genuinely ambiguous edge cases).

Filter 1: Geography

US-based startups are Refery's primary ICP. Jobs whose location is exclusively outside the US are moved to draft; a job that names both a non-US city and a US location passes.

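// rules/job-filters/geography.ts

const NON_US_INDICATORS = [
  'london', 'berlin', 'paris', 'tokyo', 'singapore', 'sydney',
  'mumbai', 'bangalore', 'hyderabad', 'mexico city', 'são paulo',
  'amsterdam', 'munich', 'zurich', 'stockholm', 'copenhagen',
];

const US_OVERRIDE_INDICATORS = [
  'us only', 'united states', 'us-based', 'us residents only',
  'sf', 'san francisco', 'nyc', 'new york', 'remote (us)',
];

// Match a term only as a whole word or phrase, so that 'sf' does not
// match inside "transfer" and 'us only' does not match "focus only".
function containsTerm(text: string, term: string): boolean {
  const escaped = term.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp(`(?<![a-z0-9])${escaped}(?![a-z0-9])`).test(text);
}

export function passesGeographyFilter(job: Job): FilterResult {
  const location = job.location.toLowerCase();
  // Only the opening of the description is scanned for a US override
  const description = job.description.toLowerCase().slice(0, 500);

  // An explicit US signal wins over any non-US city mentioned alongside it
  const hasUSOverride = US_OVERRIDE_INDICATORS.some(s =>
    containsTerm(location, s) || containsTerm(description, s)
  );

  if (hasUSOverride) {
    return { pass: true };
  }

  const hasNonUSIndicator = NON_US_INDICATORS.some(s =>
    containsTerm(location, s)
  );

  if (hasNonUSIndicator) {
    return {
      pass: false,
      reason: `Non-US location: "${job.location}"`,
      suggestedStage: 'draft',
    };
  }

  return { pass: true };
}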

Filter 2: Seniority and ICP

Junior roles are out of scope. Refery places senior engineers and senior GTM (go-to-market) hires. The seniority filter is a keyword check against the job title.

// rules/job-filters/seniority.ts

const JUNIOR_TITLE_INDICATORS = [
  'intern', 'internship', 'apprentice', 'trainee',
  'junior', 'jr', 'entry level', 'entry-level',
  'graduate', 'new grad', 'associate', 'assistant',
];

const SENIOR_OVERRIDE_INDICATORS = [
  'senior', 'sr', 'staff', 'principal', 'lead',
  'director', 'vp', 'head of', 'chief',
];

// Match whole words or phrases only, so 'intern' does not match
// "Internal Tools Engineer" and 'jr' still matches "Engineer, Jr."
function containsTerm(text: string, term: string): boolean {
  const escaped = term.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp(`(?<![a-z0-9])${escaped}(?![a-z0-9])`).test(text);
}

export function passesSeniorityFilter(job: Job): FilterResult {
  const title = job.title.toLowerCase();

  // If title explicitly says senior+, pass immediately
  // (this is what lets "Associate Director" through)
  if (SENIOR_OVERRIDE_INDICATORS.some(s => containsTerm(title, s))) {
    return { pass: true };
  }

  // If title contains junior indicators, fail
  if (JUNIOR_TITLE_INDICATORS.some(s => containsTerm(title, s))) {
    return {
      pass: false,
      reason: `Junior title: "${job.title}"`,
      suggestedStage: 'draft',
    };
  }

  return { pass: true };
}

Filter 3: Salary floor

If a salary is explicitly listed and it falls below the senior-tech market floor for the role's location, the job is moved to draft. This catches both data-entry errors and roles that are systematically off-ICP.

// rules/job-filters/salary.ts

const SALARY_FLOORS_USD: Record<string, number> = {
  'eng_us':       150_000,
  'eng_us_sf':    180_000,
  'eng_us_nyc':   170_000,
  'gtm_us':       120_000,
  'gtm_us_sf':    140_000,
  'gtm_us_nyc':   130_000,
};

export function passesSalaryFilter(job: Job): FilterResult {
  if (job.salary_max == null || job.salary_max === 0) {
    return { pass: true };  // unknown salary, defer to other filters
  }

  const floorKey = computeFloorKey(job.function, job.location_metro);
  const floor = SALARY_FLOORS_USD[floorKey] ?? 100_000;

  if (job.salary_max < floor) {
    return {
      pass: false,
      reason: `Below floor: $${job.salary_max} < $${floor} (${floorKey})`,
      suggestedStage: 'draft',
    };
  }

  return { pass: true };
}
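
The computeFloorKey helper is referenced above but not shown in this chapter. The following is only a sketch of what it might look like, assuming job.function is either 'eng' or 'gtm' and job.location_metro is a free-form metro label. The metro labels in the lookup table are illustrative assumptions, not Refery's actual values.

```typescript
// Hypothetical sketch of computeFloorKey; the production helper is not shown.
// Assumes `fn` is 'eng' or 'gtm' and `metro` is a free-form metro label.
type JobFunction = 'eng' | 'gtm';

// Assumed mapping from metro labels to the suffixes used in SALARY_FLOORS_USD.
const METRO_SUFFIX = new Map<string, string>([
  ['san francisco', 'sf'],
  ['sf bay area', 'sf'],
  ['new york', 'nyc'],
  ['nyc', 'nyc'],
]);

function computeFloorKey(fn: JobFunction, metro: string | null): string {
  const base = `${fn}_us`;
  if (!metro) return base; // no metro known: use the national floor
  const suffix = METRO_SUFFIX.get(metro.trim().toLowerCase());
  // Unrecognized metros also fall back to the national floor
  return suffix ? `${base}_${suffix}` : base;
}
```

Under these assumptions, an engineering role in San Francisco resolves to 'eng_us_sf', while a role in an unlisted metro resolves to the national key for its function.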

Filter 4: Category

Some jobs arrive tagged in categories Refery does not place: recruiting, marketing-only roles, customer support, finance/accounting, office administration, and HR generalists. The category filter is a deny-list.

const OFF_CATEGORY_INDICATORS = [
  'recruiter', 'talent acquisition partner', 'sourcer',
  'social media manager', 'content marketing manager',
  'customer support specialist', 'cs rep',
  'accountant', 'bookkeeper',
  'office manager', 'executive assistant',
  'hr generalist', 'people ops coordinator',
];
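
Only the deny-list is shown above; the passesCategoryFilter function that the pipeline calls is not. A minimal sketch, assuming the check runs against the job title and reuses the result shape of the other filters, could look like the following. The Job and FilterResult types here are simplified stand-ins, and containsTerm is a hypothetical helper.

```typescript
// Hypothetical sketch: the production passesCategoryFilter is not shown.
// Simplified stand-ins for the Job and FilterResult types in this chapter.
interface Job { title: string }
type FilterResult =
  | { pass: true }
  | { pass: false; reason: string; suggestedStage: 'draft' };

const OFF_CATEGORY_INDICATORS = [
  'recruiter', 'talent acquisition partner', 'sourcer',
  'social media manager', 'content marketing manager',
  'customer support specialist', 'cs rep',
  'accountant', 'bookkeeper',
  'office manager', 'executive assistant',
  'hr generalist', 'people ops coordinator',
];

// Whole-phrase match, so 'cs rep' does not fire inside
// "Physics Representative".
function containsTerm(text: string, term: string): boolean {
  const escaped = term.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp(`(?<![a-z0-9])${escaped}(?![a-z0-9])`).test(text);
}

function passesCategoryFilter(job: Job): FilterResult {
  const title = job.title.toLowerCase();
  const hit = OFF_CATEGORY_INDICATORS.find(term => containsTerm(title, term));
  if (hit) {
    return {
      pass: false,
      reason: `Off-category title: "${job.title}" (matched "${hit}")`,
      suggestedStage: 'draft',
    };
  }
  return { pass: true };
}
```

Recording the matched term in the reason keeps the draft decision auditable without re-running the filter.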

Composition

The filters are composed sequentially. The first failure short-circuits the chain and assigns the suggested stage.

// rules/job-filters/index.ts

const FILTER_PIPELINE = [
  passesGeographyFilter,
  passesSeniorityFilter,
  passesSalaryFilter,
  passesCategoryFilter,
] as const;

export function classifyJob(job: Job): JobClassification {
  for (const filter of FILTER_PIPELINE) {
    const result = filter(job);
    if (!result.pass) {
      return {
        stage: result.suggestedStage,
        reason: result.reason,
        filterFailed: filter.name,
      };
    }
  }
  return { stage: 'open', reason: null, filterFailed: null };
}

This entire pipeline runs in microseconds per job. There is no LLM call. There is no API call. Approximately 60-70% of newly ingested jobs are correctly classified by these rules alone, which reserves LLM compute for the remaining, genuinely ambiguous cases.
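
The FilterResult and JobClassification types used by the pipeline are not defined in this chapter. One plausible shape, offered only as a sketch, is a discriminated union on pass, so that reason and suggestedStage exist only on failures. The sketch also gives each filter an explicit name rather than relying on the function's name property, which bundlers may rewrite when minifying. The two toy filters and all type names below are assumptions for illustration.

```typescript
// Hypothetical type shapes; the production definitions are not shown.
type Stage = 'open' | 'draft';

type FilterResult =
  | { pass: true }
  | { pass: false; reason: string; suggestedStage: Stage };

interface JobClassification {
  stage: Stage;
  reason: string | null;
  filterFailed: string | null;
}

interface Job { title: string; location: string }
type JobFilter = (job: Job) => FilterResult;

// Toy stand-ins for the real filters, kept tiny on purpose.
function toyGeography(job: Job): FilterResult {
  return /\blondon\b/i.test(job.location)
    ? { pass: false, reason: 'Non-US location', suggestedStage: 'draft' }
    : { pass: true };
}

function toySeniority(job: Job): FilterResult {
  return /\bintern\b/i.test(job.title)
    ? { pass: false, reason: 'Junior title', suggestedStage: 'draft' }
    : { pass: true };
}

// Explicit names survive minification, unlike Function.prototype.name.
const PIPELINE: ReadonlyArray<{ name: string; run: JobFilter }> = [
  { name: 'geography', run: toyGeography },
  { name: 'seniority', run: toySeniority },
];

function classifyJob(job: Job): JobClassification {
  for (const { name, run } of PIPELINE) {
    const result = run(job);
    // Narrowing on `pass` makes `reason` and `suggestedStage` available.
    if (result.pass === false) {
      return {
        stage: result.suggestedStage,
        reason: result.reason,
        filterFailed: name,
      };
    }
  }
  return { stage: 'open', reason: null, filterFailed: null };
}
```

With this shape, a job that would fail two filters reports only the first, which is the short-circuit behavior described above.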

Tier-based outreach waterfall

When Refery wants to reach a prospective client company about a candidate, the question is not "should we reach out" but "who at this company is the right contact."

The answer depends on the company's stage, the role's seniority, and the relationship Refery already has with that company. Refery encodes this as a five-tier waterfall (1차 through 5차, Korean ordinals for first through fifth) that auto-shifts to the next available tier when the target tier has no reachable contact.

The five tiers

Tier	Role pattern	When to use
1차	CEO / Founder	Default for early-stage (pre-seed, seed). Founders make hiring decisions.
2차	Other co-founder	Fallback if CEO not reachable; specifically when role is in the co-founder's domain
3차	CTO / Head of Engineering / VP Eng	Default for engineering roles at Series A+
4차	Head of Talent / People / Recruiting	Use only when explicitly recruiting-focused or when company has dedicated hiring leadership
5차	Other senior contact (operator, BD, GM)	Last resort; only if all above tiers are blocked or unresponsive

Multi-round campaign logic

The waterfall is the basis for multi-round outreach campaigns. Round 1 hits the 1차 contact across all relevant companies. Companies that do not reply within the response window enter Round 2, which targets the 2차 contact, and so on.

// outreach/waterfall.ts

type Tier = '1차' | '2차' | '3차' | '4차' | '5차';

const TIER_ORDER: Tier[] = ['1차', '2차', '3차', '4차', '5차'];

interface CompanyContact {
  email: string;
  name: string;
  role: string;
  tier: Tier;
}

export function selectContactForRound(
  companyContacts: CompanyContact[],
  round: number,
  alreadyContacted: Set<string>  // emails contacted in prior rounds
): CompanyContact | null {
  const targetTier = TIER_ORDER[round - 1];
  if (!targetTier) return null;

  // Find a contact at the target tier who has not yet been contacted
  const candidate = companyContacts
    .filter(c => c.tier === targetTier)
    .find(c => !alreadyContacted.has(c.email));

  if (candidate) return candidate;

  // Auto-shift: if no contact at target tier, walk down to next tier
  for (let i = round; i < TIER_ORDER.length; i++) {
    const fallbackTier = TIER_ORDER[i];
    const fallback = companyContacts
      .filter(c => c.tier === fallbackTier)
      .find(c => !alreadyContacted.has(c.email));
    if (fallback) return fallback;
  }

  // Auto-shift: walk up if necessary (rare)
  for (let i = round - 2; i >= 0; i--) {
    const fallbackTier = TIER_ORDER[i];
    const fallback = companyContacts
      .filter(c => c.tier === fallbackTier)
      .find(c => !alreadyContacted.has(c.email));
    if (fallback) return fallback;
  }

  return null;
}

This logic looks simple, but the operational consequence is significant: a candidate who has 30 well-matched companies on the live board can be moved through a 3-round outreach campaign in two weeks, with each round automatically targeting the right person at the right level. The alternative ("manually pick a contact at each company") is what a human recruiter spends hours doing.
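
The step that decides which companies advance to the next round is described above but not shown. A minimal sketch follows, assuming the rows have already been filtered to a single candidate and that the response window is passed in as a number of days (the actual window length is not stated in this chapter). The function name and row type are hypothetical; the field names mirror the outreach_log columns defined in the next section.

```typescript
// Hypothetical sketch; field names mirror the outreach_log columns.
interface OutreachLogRow {
  company_id: string;
  round_number: number;
  sent_at: Date;
  reply_received: boolean;
}

// Returns company_id -> next round number for companies that have never
// replied and whose most recent outreach is older than the response window.
function companiesForNextRound(
  rows: OutreachLogRow[],
  now: Date,
  responseWindowDays: number,
  maxRounds = 5,
): Map<string, number> {
  const windowMs = responseWindowDays * 24 * 60 * 60 * 1000;
  const latest = new Map<string, OutreachLogRow>();
  const replied = new Set<string>();

  for (const row of rows) {
    if (row.reply_received) replied.add(row.company_id);
    const seen = latest.get(row.company_id);
    if (!seen || row.round_number > seen.round_number) {
      latest.set(row.company_id, row);
    }
  }

  const next = new Map<string, number>();
  latest.forEach((row, companyId) => {
    const windowElapsed = now.getTime() - row.sent_at.getTime() >= windowMs;
    if (!replied.has(companyId) && windowElapsed && row.round_number < maxRounds) {
      next.set(companyId, row.round_number + 1);
    }
  });
  return next;
}
```

Each entry in the returned map would then be passed to selectContactForRound as the round argument for that company.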

Logging

Every outreach action is logged to a dedicated table for audit and for retraining the contact-quality scoring model.

CREATE TABLE outreach_log (
  id                uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  candidate_id      uuid NOT NULL REFERENCES candidates(id),
  company_id        uuid REFERENCES companies(id),
  contact_email     text NOT NULL,
  contact_tier      text NOT NULL,
  round_number      int NOT NULL,
  channel           text NOT NULL CHECK (channel IN ('email', 'linkedin')),
  gmail_thread_id   text,
  subject           text,
  sent_at           timestamptz NOT NULL DEFAULT now(),
  reply_received    boolean DEFAULT false,
  reply_received_at timestamptz,
  outcome           text  -- 'positive_reply', 'negative_reply', 'no_reply'
);

CREATE INDEX idx_outreach_candidate ON outreach_log(candidate_id);
CREATE INDEX idx_outreach_company ON outreach_log(company_id);
CREATE INDEX idx_outreach_sent_at ON outreach_log(sent_at DESC);

This log becomes a training signal for which contacts at which companies actually reply, which is fed back into the contact-quality scoring used by future waterfalls.
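
How the scoring model consumes this log is not described here. As an illustration of the simplest signal the table supports, the sketch below computes the reply rate per contact tier from rows already loaded into memory. The function name and row type are hypothetical, and a production version would more likely aggregate in SQL.

```typescript
// Hypothetical sketch: reply rate per tier from outreach_log rows.
interface OutreachOutcomeRow {
  contact_tier: string;
  reply_received: boolean;
}

function replyRateByTier(rows: OutreachOutcomeRow[]): Map<string, number> {
  const counts = new Map<string, { sent: number; replied: number }>();

  for (const row of rows) {
    const c = counts.get(row.contact_tier) ?? { sent: 0, replied: 0 };
    c.sent += 1;
    if (row.reply_received) c.replied += 1;
    counts.set(row.contact_tier, c);
  }

  // Tiers with no outreach are simply absent from the result.
  const rates = new Map<string, number>();
  counts.forEach((c, tier) => rates.set(tier, c.replied / c.sent));
  return rates;
}
```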

The hard-block compliance layer

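The blacklist is not a recommendation. It is enforced at the SQL layer, in every query that touches candidates or companies. No application code path that goes through the standard authenticated client can accidentally bypass it; the only exception is a connection that bypasses row-level security altogether, such as one using Supabase's service-role key.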

Schema

-- candidates table has a do_not_contact column
ALTER TABLE candidates
  ADD COLUMN do_not_contact boolean NOT NULL DEFAULT false;

-- companies table has the same column
ALTER TABLE companies
  ADD COLUMN do_not_contact boolean NOT NULL DEFAULT false;

-- Partial indexes for fast lookup of blacklisted rows
CREATE INDEX idx_candidates_dnc ON candidates(do_not_contact) WHERE do_not_contact = true;
CREATE INDEX idx_companies_dnc ON companies(do_not_contact) WHERE do_not_contact = true;

Row-level security policy

The strictest enforcement happens at the RLS layer. Application code that uses the standard authenticated Supabase client cannot read do_not_contact = true rows by default; the policy excludes them.

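-- RLS must be enabled on the table, or the policy below has no effect
ALTER TABLE candidates ENABLE ROW LEVEL SECURITY;

-- Example RLS policy on candidates
CREATE POLICY "exclude_blacklisted_candidates" ON candidates
  FOR SELECT
  TO authenticated
  USING (do_not_contact = false OR auth.uid() = '<admin-uuid>');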

This means a developer writing a new feature cannot accidentally surface a blacklisted candidate. The database itself refuses to return them.

Why this matters

Blacklists in most platforms are advisory: a flag in the application that gets checked in some queries and not others. Eventually, a feature ships that forgets the check, and a blacklisted contact gets surfaced. Refery's architecture makes this structurally impossible, because the rule is enforced at the database, not in application code.

Why these rule engines matter

Each of these engines handles a class of decision that, if punted to an LLM, would consume material compute and introduce non-determinism. Encoded as rules, they are:

Free. Rule evaluation takes microseconds, not seconds.
Deterministic. The same input always produces the same output. Test once, trust forever.
Auditable. A rule's behavior is fully described by its source code.
Composable. Rules can be added, removed, or reordered without touching the rest of the system.
Cheap to extend. A new filter is dozens of lines of code, not a fine-tuning run.

This is the unfashionable engineering choice. The fashionable choice in 2026 is to throw an LLM at every problem. Refery's bet is the opposite: throw an LLM at problems that genuinely require LLM reasoning, and use rules for everything else. The numbers in chapter 07 describe exactly what that bet costs and what it earns.

Why this is novel

Database-enforced compliance. Most platforms enforce blacklists in application code. RLS-enforced exclusions make accidental bypass structurally impossible.
Cheapest-first filter pipeline. Geography (microseconds) before salary (microseconds) before LLM ICP classifier (seconds). The architecture is built around cost gradient.
Auto-shifting tier waterfall with multi-round logic. This encodes recruiting know-how directly. Most outreach tools assume a single contact per company; the waterfall is a structured policy.
Outreach log feeding a training signal. Every send and every reply is recorded, so the system gets smarter with every campaign.

These rule engines are unglamorous. They are also the largest single contributor to Refery's cost-per-decision advantage over comparable AI recruiting platforms.