CI/CD Integration — Agent Mastered

Your GitHub Action with Claude works perfectly — until a rate limit hits mid-review, the step fails, and the entire pipeline blocks for 20 minutes. Running Claude in CI means planning for failures you don’t see in interactive mode.

Claude Code runs non-interactively in CI/CD pipelines for automated code review, test generation, documentation, and more. The official GitHub Action provides turnkey integration, while the CLI’s --print mode enables custom pipeline workflows with explicit cost caps and retry logic.

This is a comprehensive reference chapter. If you just want to get CI running, start with the GitHub Action section below — you can have code review in CI in under 5 minutes. Come back to retry patterns and resilience pipelines when you need them.

The patterns here come from dealing with three failure modes that don’t show up in local development: rate limits, model overload, and budget exhaustion hitting mid-pipeline.

GitHub Action

The anthropics/claude-code-action@v1 action wraps the CLI with sensible CI defaults. It auto-detects whether it is running on a PR or issue event, injects the diff context, and posts Claude’s response as a comment.

GitHub Action — PR Review Workflow

$ # .github/workflows/claude-review.yml

$ cat .github/workflows/claude-review.yml

name: Claude PR Review on: pull_request: types: [opened, synchronize] jobs: review: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 with: fetch-depth: 0 - uses: anthropics/claude-code-action@v1 with: prompt: | Review this PR for: 1. Security vulnerabilities 2. Performance issues 3. Code style violations anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }} max_budget_usd: 1.00

The action handles checkout, context injection, and comment posting automatically. You supply a prompt, your API key, and an optional budget cap. For more control, you can install the CLI directly and run it yourself.

Custom CLI Pipeline in Actions

$ # Custom CLI pipeline in GitHub Actions

$ npm install -g @anthropic-ai/claude-code

added 1 package in 4s

$ RESULT=$(claude -p "Review changes for bugs" \ --output-format json \ --max-budget-usd 0.50 \ --no-session-persistence \ --permission-mode bypassPermissions)

$ echo "Cost: $(echo "$RESULT" | jq -r '.total_cost_usd')"

Cost: 0.016079

CI Flags

Every CI invocation needs a specific set of flags to run safely in a non-interactive environment. Here is the breakdown.

Essential CI Flags

Flag	Purpose	Why It Matters in CI
`-p / —print`	Headless mode	Required — no interactive UI in CI
`—output-format json`	Machine-parseable output	Lets you extract cost, session ID, and result with `jq`
`—max-budget-usd N`	Cost cap per invocation	Prevents runaway spend — costs can vary 10x for the same prompt
`—no-session-persistence`	Don’t save session to disk	Keeps ephemeral CI runners clean — no leftover session files
`—permission-mode bypassPermissions`	No interactive prompts	Without this, the CLI hangs waiting for TTY input and eventually times out
`—strict-mcp-config`	Fail if any MCP server can’t connect	Important for CI: Without it, MCP configuration failures are silently ignored — your pipeline thinks everything is connected when it isn’t
`—fallback-model MODEL`	Use alternate model on overload	Only triggers on HTTP 429 (rate limiting), not other failures. Don’t rely on it as general error recovery
`—from-pr URL`	Resume session linked to a PR	Requires `GITHUB_TOKEN` in the environment

The minimum viable CI invocation combines four of these flags:

claude -p "Review this code for bugs" \
  --output-format json \
  --max-budget-usd 0.50 \
  --no-session-persistence \
  --permission-mode bypassPermissions

▸ Try This

Add budget protection to a CI command and see what happens:

claude -p “Review this PR for security issues” —max-budget-usd 0.50 —output-format json | jq ‘{subtype, cost: .total_cost_usd}’

Did the review complete within budget? What would your script do if subtype came back as “error_max_budget_usd”? Plan for that case now — before it happens in production.

Custom Pipelines

Beyond GitHub Actions, you can embed Claude in any CI system — GitLab CI, Jenkins, CircleCI, or a plain bash script. The pattern is the same everywhere: capture JSON output, parse with jq, and act on the result.

Cost-Capped PR Review:

claude -p "Review the changes in this PR" \
  --from-pr "$PR_URL" \
  --output-format json \
  --max-budget-usd 0.50 \
  --no-session-persistence \
  --permission-mode bypassPermissions

Batch File Processing:

for file in src/**/*.ts; do
  claude -p "Add JSDoc comments to $file" \
    --output-format json \
    --max-budget-usd 0.10 \
    --no-session-persistence \
    --permission-mode bypassPermissions
done

Each invocation is fully isolated — no session state leaks between files.

Plan-Review-Execute in CI:

A three-step workflow: generate a plan, post it for human review, then execute on approval.

# Step 1: Generate plan
PLAN_RESULT=$(claude -p "Fix all lint errors in src/" \
  --permission-mode plan \
  --output-format json \
  --max-budget-usd 0.50)

# Step 2: Post plan as PR comment
PLAN=$(echo "$PLAN_RESULT" | jq -r \
  '.permission_denials[] | select(.tool_name == "ExitPlanMode") | .tool_input.plan')
gh pr comment "$PR_NUMBER" --body "$PLAN"

# Step 3: On approval, execute
SESSION=$(echo "$PLAN_RESULT" | jq -r '.session_id')
claude -p "yes, proceed" \
  --resume "$SESSION" \
  --permission-mode bypassPermissions

In plan mode, Claude’s proposed changes appear as permission_denials with tool_name: "ExitPlanMode". The plan text lives in the tool_input.plan field.

CI Response Payload

Here is a real response from a CI run with --no-session-persistence, --max-budget-usd 0.50, and --output-format json.

CI Run — Budget Cap + No Persistenceartifacts/14/ci_flags_test.json

2 "type": "result",

3 "subtype": "success",← A

4 "is_error": false,

5 "duration_ms": 1513,

6 "duration_api_ms": 1497,

7 "num_turns": 1,

8 "result": "4",← B

9 "stop_reason": "end_turn",← E

10 "session_id": "94d02fdc-9f01-4b5c-8bc9-8f08e684d8a3",← C

11 "total_cost_usd": 0.016079,← D

12 "usage": {

13 "input_tokens": 3,

14 "cache_creation_input_tokens": 1410,

15 "cache_read_input_tokens": 14253,

16 "output_tokens": 5,

17 "server_tool_use": {

18 "web_search_requests": 0,

19 "web_fetch_requests": 0

20 },

21 "service_tier": "standard"

22 },

23 "modelUsage": {

24 "claude-opus-4-6": {

25 "inputTokens": 3,

26 "outputTokens": 5,

27 "cacheReadInputTokens": 14253,

28 "cacheCreationInputTokens": 1410,

29 "costUSD": 0.016079,

30 "contextWindow": 200000,

31 "maxOutputTokens": 32000

32 }

33 },

34 "permission_denials": [],

35 "fast_mode_state": "off"

36}

AAlways check this first in CI — false means the call succeeded

BThe actual output text — extract with jq -r '.result'

CPresent even with --no-session-persistence, but not resumable after exit

DAggregate this across pipeline steps for total CI run cost

Eend_turn means Claude finished naturally — watch for max_tokens if truncated

Key fields to parse in your CI scripts:

is_error / subtype — check for "error_max_budget_usd" to detect budget exceeded
total_cost_usd — aggregate across pipeline steps for total CI run cost
result — the actual output text
session_id — present even with --no-session-persistence (valid during process lifetime only)
stop_reason — "end_turn" means normal completion; other values indicate interruption

Parsing the Result in Bash

#!/usr/bin/env bash
set -euo pipefail

RESULT=$(claude -p "Review this diff for bugs" \
  --output-format json \
  --max-budget-usd 0.50 \
  --no-session-persistence \
  --permission-mode bypassPermissions)

# Check for budget exceeded
SUBTYPE=$(echo "$RESULT" | jq -r '.subtype // ""')
if [ "$SUBTYPE" = "error_max_budget_usd" ]; then
  echo "::error::Claude exceeded budget cap"
  exit 1
fi

# Extract result and cost
REVIEW=$(echo "$RESULT" | jq -r '.result')
COST=$(echo "$RESULT" | jq -r '.total_cost_usd')

echo "Review cost: $COST"
echo "$REVIEW"

The ::error:: prefix is GitHub Actions syntax for surfacing errors in the workflow summary. Replace it with your CI system’s equivalent.

`--from-pr` PR Review Workflow

The --from-pr flag injects PR context into a session, enabling automated code review:

# Review a PR by number (must be in the correct git repo)
claude -p "Review this PR for security issues" \
  --from-pr 42 \
  --output-format json \
  --max-budget-usd 0.25 \
  --permission-mode bypassPermissions

# Review by full URL
claude -p "Summarize what this PR changes" \
  --from-pr "https://github.com/org/repo/pull/42" \
  --output-format json

Key behaviors:

Both PR number and URL formats are accepted syntactically
Not a data pre-loader — --from-pr signals to the model that a PR review is requested. The model then uses tools (gh pr view, gh pr diff, git log) to investigate. This means it requires tool permissions and sufficient budget for multiple turns
Budget consideration — PR reviews are multi-turn operations. A $0.10 budget may not be sufficient for large PRs. Use $0.25+ for thorough reviews
Graceful degradation — invalid PR numbers and non-git directories produce helpful messages, not crashes
GitHub auth — for private repos, the gh CLI must be authenticated via GITHUB_TOKEN

Tip

For CI/CD PR review pipelines, combine —from-pr with —output-format json and parse the result field: claude -p “Review for security” —from-pr $PR_NUMBER —output-format json | jq -r ‘.result’

→ Now Do This

Add —max-budget-usd 1.00 —output-format json to your CI Claude command. Parse subtype in the response — if it’s “error_max_budget_usd”, log a warning instead of failing the pipeline. Budget protection in CI prevents surprise bills without breaking your workflow.

CI/CD Resilience Patterns

Your CI pipeline with Claude works 95% of the time. The other 5%, the API returns a 529, your step fails, and the entire PR workflow blocks for 20 minutes while developers wait. Production pipelines need retry logic, fallback models, and graceful degradation — not just a Claude command.

CI pipelines fail. APIs return 500s, rate limits kick in, networks drop. The --fallback-model flag handles one failure mode (429 rate limiting), but production pipelines need explicit retry logic, error classification, and graceful degradation to stay reliable.

Retry Patterns

The --fallback-model flag only handles 429 overload errors. For other transient failures — network errors, timeouts, API 500s — you need explicit retry logic.

Bash Retry Function

retry_claude() {
    local max_retries=3 prompt="$1"; shift
    for attempt in $(seq 1 "$max_retries"); do
        if RESULT=$(claude -p "$prompt" "$@" 2>/dev/null); then
            if echo "$RESULT" | jq -e '.is_error != true' >/dev/null 2>&1; then
                echo "$RESULT"
                return 0
            fi
        fi
        echo "Attempt $attempt/$max_retries failed, retrying in $((attempt * 5))s..." >&2
        sleep "$((attempt * 5))"
    done
    echo "All $max_retries attempts failed" >&2
    return 1
}

The function validates two things on each attempt: that the CLI process exited successfully, and that the JSON response does not have is_error: true. Backoff increases linearly — 5 seconds, 10 seconds, 15 seconds.

Node.js Retry Function

For Node.js pipelines, the same pattern translates directly:

async function retryClaudeSync(args, maxRetries = 3) {
    for (let attempt = 1; attempt <= maxRetries; attempt++) {
        try {
            const output = execFileSync('claude', args, {
                encoding: 'utf-8', timeout: 120_000,
                env: { ...process.env, CLAUDECODE: '' }
            });
            const data = JSON.parse(output);
            if (!data.is_error) return data;
        } catch (err) { /* retry */ }
        if (attempt < maxRetries) {
            await new Promise(r => setTimeout(r, attempt * 5000));
        }
    }
    throw new Error(`All ${maxRetries} attempts failed`);
}

Advanced Retry with Error Classification

Not all errors deserve retries. Auth failures (401/403) will never succeed on retry. Budget errors mean the work is done but exceeded limits. Only transient failures benefit from retrying:

retry_claude_smart() {
    local max_retries=3 prompt="$1"; shift
    for attempt in $(seq 1 "$max_retries"); do
        RESULT=$(claude -p "$prompt" "$@" 2>&1) && {
            SUBTYPE=$(echo "$RESULT" | jq -r '.subtype // ""')
            case "$SUBTYPE" in
                error_max_budget_usd)
                    echo "Budget exceeded — not retrying" >&2
                    echo "$RESULT"; return 1 ;;
                success)
                    echo "$RESULT"; return 0 ;;
                *)
                    # Unknown subtype — check is_error
                    if echo "$RESULT" | jq -e '.is_error != true' >/dev/null 2>&1; then
                        echo "$RESULT"; return 0
                    fi ;;
            esac
        }
        # Check for auth failure (non-JSON exit)
        if echo "$RESULT" | grep -qi "unauthorized\|forbidden\|invalid.*key"; then
            echo "Auth failure — not retrying" >&2
            return 1
        fi
        echo "Attempt $attempt/$max_retries failed, retrying in $((attempt * 5))s..." >&2
        sleep "$((attempt * 5))"
    done
    echo "All $max_retries attempts failed" >&2
    return 1
}

This version distinguishes between retryable failures (network errors, 500s, timeouts) and permanent failures (auth errors, budget exceeded) to avoid wasting time and money on doomed retries.

Retry with Exponential Backoff

▸ Try This

Test the —fallback-model flag by deliberately triggering a rate limit scenario:

claude -p “Hello” —output-format json —fallback-model claude-sonnet-4-6 | jq ‘{model: .modelUsage | keys[0], cost: .total_cost_usd}’

Which model was used? When the primary model is rate-limited (429), Claude automatically falls back to the cheaper model. Does your pipeline handle both cost tiers?

Fallback Models

The --fallback-model flag provides a backup when the primary model is rate-limited:

claude -p "Review this PR" \
  --fallback-model claude-haiku-4-5-20251001 \
  --max-budget-usd 0.50

There is a critical limitation here. Fallback only triggers on HTTP 429 (rate limiting). It does not trigger on other failures:

Fallback Trigger Matrix

Failure Type	Triggers Fallback?	Your Mitigation
Rate limiting (429)	Yes	`—fallback-model` handles this automatically
API error (500)	No	Use retry logic with backoff
Timeout	No	Use retry logic with backoff
Auth failure (401/403)	No	Fix credentials — retries will not help

This means --fallback-model is not a general resilience mechanism. Combine it with the retry function above for full coverage.

Invalid Fallback Names Are Silently Accepted

The CLI does not validate fallback model names at startup. A typo like —fallback-model claude-haku-4-5 is silently accepted — the error only surfaces when a 429 triggers fallback and the model name is unreachable. Check modelUsage keys in the response to confirm which model actually served the request: Object.keys(data.modelUsage).

Full Resilience Pipeline

Combining retry logic, fallback models, error classification, and budget controls gives you a production-ready pipeline function:

#!/usr/bin/env bash
set -euo pipefail

# Production-grade CI wrapper
ci_claude() {
    local prompt="$1"
    local budget="${2:-0.50}"
    local max_retries="${3:-3}"

    for attempt in $(seq 1 "$max_retries"); do
        RESULT=$(claude -p "$prompt" \
            --output-format json \
            --max-budget-usd "$budget" \
            --fallback-model claude-haiku-4-5-20251001 \
            --no-session-persistence \
            --permission-mode bypassPermissions 2>/dev/null) && {

            SUBTYPE=$(echo "$RESULT" | jq -r '.subtype // ""')

            if [ "$SUBTYPE" = "error_max_budget_usd" ]; then
                echo "::warning::Budget exceeded on attempt $attempt" >&2
                echo "$RESULT"
                return 1
            fi

            if echo "$RESULT" | jq -e '.is_error != true' >/dev/null 2>&1; then
                # Log cost for aggregation
                COST=$(echo "$RESULT" | jq -r '.total_cost_usd // 0')
                echo "::notice::Claude call cost: \$$COST (attempt $attempt)" >&2
                echo "$RESULT"
                return 0
            fi
        }

        if [ "$attempt" -lt "$max_retries" ]; then
            DELAY=$((attempt * 5))
            echo "::warning::Attempt $attempt/$max_retries failed, retrying in ${DELAY}s..." >&2
            sleep "$DELAY"
        fi
    done

    echo "::error::All $max_retries attempts failed" >&2
    return 1
}

# Usage
REVIEW=$(ci_claude "Review this PR for security issues" 1.00 3)
echo "$REVIEW" | jq -r '.result'

This wrapper combines all the resilience patterns: retry with backoff, fallback model for rate limiting, budget cap enforcement, error classification, and GitHub Actions annotation output (::warning::, ::error::, ::notice::).

Error Handling in CI

Budget Exceeded Detection

SUBTYPE=$(echo "$RESULT" | jq -r '.subtype // ""')
if [ "$SUBTYPE" = "error_max_budget_usd" ]; then
  echo "::error::Claude exceeded budget cap"
  exit 1
fi

The ::error:: prefix is GitHub Actions syntax for surfacing errors in the workflow summary. Replace it with your CI system’s equivalent.

Gotcha

—max-budget-usd is checked between turns, not mid-generation. The first turn can exceed the budget by any amount. For Opus, system prompt cache creation alone costs ~$0.016, so setting the budget below that guarantees an error_max_budget_usd response with no useful output.

Missing API Key

Gotcha

If ANTHROPIC_API_KEY is missing or invalid, the CLI exits with a non-zero code but may not produce JSON output. Your pipeline should check the exit code before attempting to parse JSON, or the jq step will fail silently and the CI job may report success when it actually did nothing.

Hooks in Headless Mode

Tip

SessionStart and SessionStop hooks do not fire in —print mode. Only PreToolUse, PostToolUse, and Stop hooks fire in —print mode. If your audit logging depends on SessionStart, it will silently skip in CI.

Resilience Next Steps

With resilience patterns in place, your CI pipelines can handle transient failures without human intervention. For managing the cost of running these pipelines at scale, see Budget Controls and Cost at Scale.

→ Now Do This

Add —fallback-model claude-sonnet-4-6 to your CI Claude commands. This handles 429 rate limits automatically — Claude falls back to Sonnet instead of failing. Wrap the call in a retry loop for other error types: for i in 1 2 3; do RESULT=$(claude -p ”…” —output-format json —fallback-model claude-sonnet-4-6) && break; sleep 5; done.

GitHub Action

CI Flags

Essential CI Flags

Custom Pipelines

CI Response Payload

Parsing the Result in Bash

--from-pr PR Review Workflow

CI/CD Resilience Patterns

Retry Patterns

Bash Retry Function

Node.js Retry Function

Advanced Retry with Error Classification

Fallback Models

Fallback Trigger Matrix

Full Resilience Pipeline

Error Handling in CI

Budget Exceeded Detection

Missing API Key

Hooks in Headless Mode

Resilience Next Steps

`--from-pr` PR Review Workflow