Agent Steering ARF

The Interception Model

Four interception points.
Full message control.

RUNNER
any AI coding CLI

① INBOUND

prompt intercept

ARF
FILTER

② OUTBOUND REQ

modified request

ENGINE
any model API

③ INBOUND RESP

completion intercept

ARF
FILTER

① Inbound prompt rewrite, augment, block
② Outbound request credential inject, header modify
③ Inbound completion filter, mid-stream trip
④ Outbound response rewrite before runner sees it

Inbound Steering

Add context.
Add constraints.

Inbound steering modifies the prompt before it reaches the model. Inject context, prepend constraints, append project-specific coding standards, or strip out content you don't want the model to act on.

This is useful for teams. Define a system prompt injection in your ARF policy that adds your team's coding conventions to every request, so no developer has to remember to include them. The agent sees it; the developer doesn't type it.

Inbound steering rules are evaluated in order. Each rule can match on message content, session context, user identity, or time-based conditions. Rules can modify, augment, or reject messages.

# Inbound steering rules

[[steering.inbound]]
# Always inject team conventions
name = "inject-conventions"
action = "prepend_system"
content = """
You are working in a TypeScript codebase.
Always use strict null checks.
Prefer functional patterns over mutation.
All async functions must handle errors.
"""

[[steering.inbound]]
# Block requests mentioning prod credentials
name = "block-cred-exposure"
match = "(?i)(production|prod).{0,20}(key|secret|pass)"
action = "block"
message = "Production credential references are not allowed."

[[steering.inbound]]
# Augment security-related queries
name = "security-context"
match_tags = ["security"]
action = "append"
content = "Consider OWASP Top 10. Flag any potential injection risks."
        

Outbound Filtering

Catch it before
the agent acts on it.

Outbound filtering evaluates model completions before they reach the runner. ARF reads the completion stream in real time. If a completion violates policy (a disallowed code pattern, a reference to a forbidden path, suspicious tool call arguments), the stream is interrupted.

Interrupted completions are logged, the session health grade is decremented, and, depending on policy, a human approval prompt is surfaced. The runner sees a clean error response and can retry with a different approach.

This is your last line of defense before the agent acts. The Autonomous Request Filter sees every tool call the model proposes before it executes.

arf · steering · live filter

── Completion stream from engine ──────────────

chunk[1]: I'll update the config file...
chunk[2]: tool_use: bash
          args:
            cmd: "rm -rf /etc/nginx/conf.d/*"

── Filter evaluation ──────────────────────────
✗ MATCH: outbound.deny_pattern
  rule: block-destructive-ops
  pattern: rm -rf.*/(etc|var|usr|bin)
  action: BLOCK + INTERRUPT

── Response to runner ─────────────────────────
HTTP 451 Unavailable For Policy Reasons
{
  "error": "completion_blocked",
  "rule": "block-destructive-ops"
}

● Session grade: B → C  (policy violation logged)

Prompt Injection Defense

The proxy is not
the agent's friend.

"Prompt injection is the sleight of hand of the AI era: adversarial content in the environment that hijacks the agent's actions. ARF doesn't trust the content the agent is processing any more than it trusts the agent itself."

Detection

ARF monitors inbound content (files the agent reads, tool call results, web page contents returned to the agent) for injection signatures. Common patterns: instruction overrides ("Ignore previous instructions"), role jailbreaks, and credential exfiltration attempts hidden in data.

Pattern matching against known injection signatures
Semantic similarity detection for novel injection variants
Alert on context-anomalous tool call sequences

Response

When injection is detected, ARF's options range from logging-only to full session halt. Configure the response per rule: sanitize the injected content before it reaches the model, flag for human review, or block the request and alert immediately.

Sanitize: strip injection content, allow request to proceed
Flag: log and continue, but alert the operator
Block: reject the request, force human approval
Halt: trip the circuit breaker immediately

What the agent seesis what you allowit to see.

Four interception points.Full message control.

Add context.Add constraints.

Catch it beforethe agent acts on it.

The proxy is notthe agent's friend.

What the agent sees
is what you allow
it to see.

Four interception points.
Full message control.

Add context.
Add constraints.

Catch it before
the agent acts on it.

The proxy is not
the agent's friend.