arf.io / ARF / Agent Steering / ARF — Autonomous Request Filter · Agent Router & Filter
ARF · Autonomous Request Filter · Agent Router & Filter

What the agent sees
is what you allow
it to see.

The Autonomous Request Filter intercepts every message inbound from the runner, outbound from the model and gives you precise control over what goes through, what gets modified, and what gets blocked. No code changes. No agent awareness. Just the proxy, doing its job.

The Interception Model

Four interception points.
Full message control.

RUNNER
any AI coding CLI
① INBOUND
prompt intercept
ARF
FILTER
② OUTBOUND REQ
modified request
ENGINE
any model API
③ INBOUND RESP
completion intercept
ARF
FILTER
Inbound prompt rewrite, augment, block
Outbound request credential inject, header modify
Inbound completion filter, mid-stream trip
Outbound response rewrite before runner sees it
Inbound Steering

Add context.
Add constraints.

Inbound steering modifies the prompt before it reaches the model. You can inject additional context, prepend constraints, append project-specific coding standards, or strip out content you don't want the model to act on.

This is particularly powerful for teams: define a system prompt injection in your ARF policy that automatically adds your team's coding conventions to every request without every developer having to remember to include it themselves. The agent sees it; the developer doesn't have to type it.

Inbound steering rules are evaluated in order. Each rule can match on message content, session context, user identity, or time-based conditions. Rules can modify, augment, or reject messages.

# Inbound steering rules [[steering.inbound]] # Always inject team conventions name = "inject-conventions" action = "prepend_system" content = """ You are working in a TypeScript codebase. Always use strict null checks. Prefer functional patterns over mutation. All async functions must handle errors. """ [[steering.inbound]] # Block requests mentioning prod credentials name = "block-cred-exposure" match = "(?i)(production|prod).{0,20}(key|secret|pass)" action = "block" message = "Production credential references are not allowed." [[steering.inbound]] # Augment security-related queries name = "security-context" match_tags = ["security"] action = "append" content = "Consider OWASP Top 10. Flag any potential injection risks."
Outbound Filtering

Catch it before
the agent acts on it.

Outbound filtering evaluates model completions before they're returned to the runner. ARF reads the completion stream in real time. If a completion violates policy contains a disallowed code pattern, references a forbidden path, includes suspicious tool call arguments the stream is interrupted.

Interrupted completions are logged, the session health grade is decremented, and (depending on policy) a human approval prompt is surfaced. The runner sees a clean error response; it can retry with a modified approach.

This is your last line of defense before the agent takes action. The Autonomous Request Filter sees every tool call the model proposes before it's executed.

arf · steering · live filter
── Completion stream from engine ──────────────

chunk[1]: I'll update the config file...
chunk[2]: tool_use: bash
          args:
            cmd: "rm -rf /etc/nginx/conf.d/*"

── Filter evaluation ──────────────────────────
 MATCH: outbound.deny_pattern
  rule: block-destructive-ops
  pattern: rm -rf.*/(etc|var|usr|bin)
  action: BLOCK + INTERRUPT

── Response to runner ─────────────────────────
HTTP 451 Unavailable For Policy Reasons
{
  "error": "completion_blocked",
  "rule": "block-destructive-ops"
}

 Session grade: B → C  (policy violation logged)
Prompt Injection Defense

The proxy is not
the agent's friend.

"Prompt injection is the sleight of hand of the AI era: adversarial content in the environment that hijacks the agent's actions. ARF doesn't trust the content the agent is processing any more than it trusts the agent itself."
Detection

ARF monitors inbound content files the agent reads, tool call results, web page contents returned to the agent for injection signatures. Common patterns: instruction overrides ("Ignore previous instructions"), role jailbreaks, and credential exfiltration attempts hidden in data.

  • Pattern matching against known injection signatures
  • Semantic similarity detection for novel injection variants
  • Alert on context-anomalous tool call sequences
Response

When injection is detected, ARF's options range from logging-only to full session halt. Configure the response per rule: sanitize the injected content before it reaches the model, flag for human review, or block the request and alert immediately.

  • Sanitize: strip injection content, allow request to proceed
  • Flag: log and continue, but alert the operator
  • Block: reject the request, force human approval
  • Halt: trip the circuit breaker immediately