
CI/CD Pipeline

A unified GitHub Actions pipeline built for a solo developer. One workflow file handles all quality checks, image builds, and deployments for three service stacks (Python, Java, Go) and a Next.js frontend. Designed to automate everything from code push to production deploy, with a QA environment for visual inspection before shipping.

Pipeline Flow

Three triggers, one workflow. Every code change follows the same path through quality gates before reaching production.

Why a Unified Workflow

I started with separate workflow files for each language stack — Python, Java, and Go each had their own CI pipeline. That was helpful early on for refining the specific checks I wanted per stack. But as a solo developer, I found that I very rarely stopped between stages. The separate workflows added maintenance overhead without adding real decision points.

Consolidating into a single workflow made the pipeline easier to reason about — both for me and for the Claude Code agents that drive most of the development. One file to maintain, one set of status checks to watch, and a single place to debug when something fails. All quality gates run unconditionally on every trigger, which is slower but catches cross-stack issues that path filtering would miss.

Since I'm the only one working on this project, there's no need to rigorously defend the QA branch. I push minor tweaks directly to qa without feature branches because there's no one else to disrupt. The E2E staging checks still run on those direct pushes, so regressions get caught before deploy.

Trigger Matrix

Job | PR to qa | Push to qa | Push to main
Quality checks
E2E staging checks
Build images
Deploy | | QA | Prod
Smoke tests

Quality Gates

22 parallel jobs run on every trigger. All must pass before images are built.

Python

Ruff lint + format, pytest with coverage (ingestion, chat, debug), Bandit SAST, pip-audit

Java

Checkstyle, unit tests (4 services), integration tests with Testcontainers

Go

golangci-lint, go test -race (3 services), migration pipeline test

Frontend

ESLint, TypeScript type check, Next.js build, npm audit

Security

Gitleaks (secrets), Hadolint (Dockerfiles), CORS guardrail (no wildcard origins)

Infrastructure

K8s manifest validation (kubeconform + kind dry-run), Grafana dashboard sync, Compose smoke test

QA Environment

QA runs in the same Minikube cluster as production using separate Kubernetes namespaces. Kustomize overlays patch the base manifests to set QA-specific CORS origins, database names, and ingress hosts — without duplicating the manifests themselves.

Visit the QA environment → See the latest pre-prod build before it ships.

Production | QA
ai-services | ai-services-qa
java-tasks | java-tasks-qa
go-ecommerce | go-ecommerce-qa

What's currently staged on QA

QA is caught up with production — latest work is live.

Image Tagging

All 10 service images are built in a single matrix job and pushed to GitHub Container Registry. QA images use a commit-pinned tag for traceability; production uses :latest.

# QA (push to qa branch)
ghcr.io/kabradshaw1/portfolio/ingestion:qa-abc1234

# Production (push to main branch)
ghcr.io/kabradshaw1/portfolio/ingestion:latest
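
The tagging rule can be sketched as a small shell function. This is an illustration rather than the workflow's actual step: the function name image_tag and the 7-character short SHA are assumptions inferred from the qa-abc1234 example above.

```shell
# Hypothetical helper: derive the image tag from the branch name and commit SHA.
# Assumes QA tags are "qa-" plus the first 7 characters of the SHA.
image_tag() {
  # $1 = branch name, $2 = full commit SHA
  case "$1" in
    qa)   printf 'qa-%s\n' "$(printf '%s' "$2" | cut -c1-7)" ;;
    main) printf 'latest\n' ;;
    *)    echo "no image tag defined for branch: $1" >&2; return 1 ;;
  esac
}

# Usage (repository path taken from the examples above):
# echo "ghcr.io/kabradshaw1/portfolio/ingestion:$(image_tag qa "$GITHUB_SHA")"
```

Failing loudly on any other branch keeps an unexpected trigger from pushing an untraceable tag.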

Deploy Mechanism

GitHub Actions joins a Tailscale VPN to reach the home server, then deploys via SSH. Kustomize overlays are built on the runner and piped over SSH into kubectl apply on the server.

# CI runner joins Tailscale VPN
- uses: tailscale/github-action@v3

# Build overlay locally, apply remotely
kubectl kustomize k8s/overlays/qa/ | \
  ssh PC@100.79.113.84 "kubectl apply -f -"

# Restart deployments to pull new images
ssh PC@100.79.113.84 \
  "kubectl rollout restart deployment -n ai-services-qa"

Why No Branch Protection

This is a solo developer project. By the time code reaches main, it has passed all quality checks on the PR, been deployed to QA, and been visually inspected. Branch protection requiring PR approval would mean one person approving their own PR — ceremony with no value.

The real protection is the CI pipeline itself: 22 quality gates that run on every push. If any fail, the deploy doesn't happen.

Agent Automation

Since I'm the only developer on this project, there's no risk to letting Claude Code agents drive the workflow from spec to production. No one else's deployment gets disrupted, and I review every spec thoroughly before any code gets written.

The agents use a superpowers plugin that adds built-in quality gates throughout the workflow — automated spec self-review, code review agents, and verification-before-completion checks that require evidence before claiming work is done.

  1. Spec: Kyle and Claude brainstorm the design together. Claude writes a spec, then self-reviews it for placeholders, contradictions, and ambiguity before presenting it. Kyle reviews the spec thoroughly; this is the main human checkpoint.
  2. Plan: Once the spec is approved, Claude writes a detailed implementation plan to keep track of what it needs to do during execution.
  3. Implement → PR: Agent creates a feature branch, implements the plan, and pushes a PR to qa. A code-review agent examines the work against the plan before the PR is created.
  4. CI Watch: Agent monitors CI and fixes lint/format/config failures autonomously. Verification checks confirm the fix before claiming it's resolved.
  5. QA Deploy: Kyle reviews the PR and merges. QA deploys automatically.
  6. Ship It: Kyle inspects QA and tells the agent to ship. Agent merges to main, watches the prod deploy, and cleans up.

Pipeline Optimization

Adding a RAG evaluation service exposed several performance bottlenecks. The original eval service depended on a third-party evaluation framework, which pulled in 200+ transitive packages including LangChain. That single addition pushed the pipeline from a manageable ~10 minutes to 30+ minutes per run, with most of the time spent on redundant work. Here's how each bottleneck was diagnosed and fixed.

1. Virtualenv Caching

Problem: pip install ran from scratch on every CI run. For the eval service with its 200+ transitive dependencies, this took ~20 minutes — longer than all other checks combined.

Investigation: The GitHub Actions runner starts fresh each time, so there was no pip cache to reuse. The eval service's former third-party evaluation dependency tree made cold installs exceptionally slow.

Fix: Cache the entire .venv directory using actions/cache@v4, keyed on the hash of requirements.txt and shared/pyproject.toml. On cache hit, the install step is skipped entirely.

# Cache key
venv-{service}-{hash(requirements.txt, shared/pyproject.toml)}

# On cache hit → skip pip install entirely

Result: Eval tests went from 20 minutes → 20 seconds. pip-audit dropped from 20 minutes → 9 seconds.
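
For intuition, the cache key amounts to hashing the dependency manifests, so the key changes only when the declared dependencies change. The sketch below does this in plain shell and is an illustration only: in a workflow, actions/cache is normally keyed with the built-in hashFiles() expression, and the function name venv_cache_key and the 16-character truncation are assumptions.

```shell
# Hypothetical helper: build a cache key from a service name and its manifests.
# Any edit to one of the files changes the digest, so the old cache is missed.
venv_cache_key() {
  # $1 = service name, remaining args = files whose contents define the key
  svc="$1"; shift
  digest="$(cat "$@" | sha256sum | cut -c1-16)"
  printf 'venv-%s-%s\n' "$svc" "$digest"
}

# Usage:
# venv_cache_key eval requirements.txt shared/pyproject.toml
```

Keying on file contents rather than on the commit SHA is what lets unrelated commits reuse the same virtualenv.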

2. Conditional Image Builds

Problem: All 11 service images were rebuilt on every push, even when only one service changed. A one-line fix to the chat service triggered builds for all Go, Java, and Python images.

Investigation: The build matrix had no path awareness — every matrix entry ran unconditionally. Most builds were wasted compute producing identical images.

Fix: Each matrix entry declares a paths field listing the directories that affect its image. A git diff HEAD~1 check at the start of each build job skips the build when none of those paths changed.

- service: chat
  paths: services/chat services/shared
- service: go-auth-service
  paths: go/auth-service go/pkg go/go.work

Result: A typical single-service change rebuilds 1 image instead of 11. Unchanged services are skipped in ~20 seconds.
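
The path check can be sketched as a prefix match over the list of changed files. This is a minimal illustration under stated assumptions, not the job's actual script: the function name paths_changed is hypothetical, and it reads file names on stdin so the matching logic is separate from git.

```shell
# Hypothetical helper: succeed if any changed file falls under a watched path.
# Reads changed file names on stdin, one per line; $1 is a space-separated
# list of paths, mirroring the matrix entry's paths field.
paths_changed() {
  found=1
  # Read all of stdin (no early return) so the producer never sees a closed pipe.
  while IFS= read -r file; do
    for prefix in $1; do
      case "$file" in
        "$prefix"|"$prefix"/*) found=0 ;;
      esac
    done
  done
  return "$found"
}

# Usage inside a build job (needs enough git history to resolve HEAD~1):
# if git diff --name-only HEAD~1 HEAD | paths_changed "services/chat services/shared"; then
#   echo "rebuild"
# else
#   echo "skip"
# fi
```

Matching on "prefix/" rather than a bare prefix keeps a path such as services/chatbot from triggering the services/chat build.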

3. Compose Smoke: Pull Instead of Build

Problem: docker compose up --build rebuilt all Python images from source in CI, spending ~10 minutes per service on pip install with no layer cache (fresh runner each time).

Investigation: The compose-smoke job existed to verify service configuration: env vars, nginx routing, health checks, and inter-service connectivity. It didn't need freshly built images to test those things. Code correctness was already covered by unit tests.

Fix: Pull pre-built :latest images from GHCR instead of building from source. The smoke tests verify configuration, not code.

# Before: build from source (~15 min)
docker compose up --build

# After: pull pre-built images (~95 sec)
for svc in ingestion chat debug; do
  docker pull "ghcr.io/.../${svc}:latest"
done
docker compose up -d

Result: Compose smoke went from ~15 minutes → 95 seconds.

4. QA Deploy: Job Immutability Fix

Problem: QA deploys were failing entirely. The Go kustomize overlay includes migration Jobs, and Kubernetes Jobs are immutable — once created, their spec.template cannot be patched.

Investigation: When kubectl apply tried to update the existing Jobs with a new image tag, Kubernetes rejected the change with a "field is immutable" error. The Jobs had completed successfully on the previous deploy but were still present in the namespace.

Fix: Filter Jobs out of the kustomize output using awk, then handle them separately: delete the old Job, create the new one, wait for completion.

# Apply overlay without Jobs
kubectl kustomize k8s/overlays/qa-go/ \
  | awk '...filter out kind: Job...' \
  | kubectl apply -f -

# Run migrations sequentially
kubectl delete job go-auth-migrate --ignore-not-found
kubectl apply -f auth-service-migrate.yml
kubectl wait --for=condition=complete job/go-auth-migrate

Result: QA deploy went from failing → succeeding in 85 seconds.
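
The awk filter is elided in the snippet above. One possible way to write such a filter is sketched below; it is an assumption about the approach, not the pipeline's actual script, and the function name drop_jobs is hypothetical. It buffers each YAML document in the stream and drops any whose top-level kind is exactly Job.

```shell
# Hypothetical filter: remove "kind: Job" documents from a multi-document
# YAML stream on stdin and print the remaining documents to stdout.
drop_jobs() {
  awk '
    # Print the buffered document unless it was marked as a Job, then reset.
    function emit() {
      if (doc != "" && !is_job) printf "---\n%s", doc
      doc = ""; is_job = 0
    }
    /^---[ \t]*$/            { emit(); next }   # document separator
    /^kind:[ \t]*Job[ \t]*$/ { is_job = 1 }     # top-level kind only
    { doc = doc $0 "\n" }
    END { emit() }
  '
}

# Usage:
# kubectl kustomize k8s/overlays/qa-go/ | drop_jobs | kubectl apply -f -
```

Anchoring the match at the start of the line means indented kind fields nested inside other resources are ignored, and the exact match leaves CronJob documents in place.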

5. Precise Change Detection

Problem: The path filter from #2 evolved with the codebase. The first version compared against HEAD~1, which silently missed rebuilds when a fix was pushed in a multi-commit batch (the diff only saw the final commit). A post-incident hardening widened it to HEAD~5. That fixed the missed-rebuild bug but introduced the opposite failure mode: once Go work landed in the last 5 commits, every subsequent push — including docs-only or frontend-only ones — re-ran the full Go test, lint, and image-build matrices.

Investigation: GitHub gives the exact range for every event: pushes carry github.event.before (the previous tip of the branch); PRs carry github.event.pull_request.base.sha (the commit at the tip of the base branch). Both are precise, whereas HEAD~N was always a heuristic dressed up as a window. The same overshoot also hit the test and lint matrices, which never had path filtering at all and re-ran the full Python, Java, and Go test suites on every push regardless of what changed.

Fix: A composite action at .github/actions/check-changes picks the compare base based on the event type — push.before for pushes, PR base SHA for PRs, with HEAD~5 kept only as a fallback for first pushes and workflow_dispatch. Wired into 14 gated jobs: the original three (go-tests, go-lint, build-and-push-images) plus python-tests, java-unit-tests, java-integration-tests, frontend-checks, k8s-manifest-validation, go-migration-test, all three compose-smoke jobs, security-pip-audit, and security-hadolint. Every gated entry's paths: value includes ci.yml and the action's own action.yml, so a workflow refactor triggers every matrix entry — a safeguard against silent pipeline regressions.

- name: Check for changes
  id: changes
  uses: ./.github/actions/check-changes
  with:
    paths: services/chat services/shared
           .github/workflows/ci.yml
           .github/actions/check-changes/action.yml

# Subsequent steps gated on:
#   if: steps.changes.outputs.changed == 'true'

Result: A docs-only push now skips ~14 matrix entries in seconds. The pipeline failure mode shifted with each iteration: HEAD~1 silently missed rebuilds, HEAD~5 silently over-rebuilt, and the precise push-range / PR-base approach rebuilds exactly what changed.
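
The base-selection logic can be sketched as a single function. This is a minimal sketch under stated assumptions, not the contents of the real action.yml: the function name compare_base and its argument order are hypothetical. It treats an empty or all-zero "before" SHA, which GitHub sends when a branch is first pushed, as a signal to fall back.

```shell
# Hypothetical helper: choose the commit to diff against for a given event.
# $1 = event name, $2 = github.event.before, $3 = PR base SHA
compare_base() {
  case "$1" in
    push)
      # Use "before" only if it contains a non-zero character,
      # which rules out both an empty value and the all-zero SHA.
      case "$2" in
        *[!0]*) printf '%s\n' "$2"; return 0 ;;
      esac ;;
    pull_request)
      if [ -n "$3" ]; then printf '%s\n' "$3"; return 0; fi ;;
  esac
  # Fallback for first pushes and workflow_dispatch.
  printf 'HEAD~5\n'
}

# Usage:
# base="$(compare_base "$EVENT_NAME" "$BEFORE_SHA" "$PR_BASE_SHA")"
# git diff --name-only "$base" HEAD
```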

Lesson: The first merge after extending the gate broke CI entirely. Every run showed “This run likely failed because of a workflow file issue” with an empty workflow graph: no jobs were visible because nothing ever started. Locally, the pre-flight check used python -c "yaml.safe_load(...)", which passed because PyYAML silently accepts duplicate mapping keys (last value wins). actionlint caught it instantly: a step had both the new gate if: steps.changes.outputs.changed == 'true' and an existing if: steps.venv-cache.outputs.cache-hit != 'true', two if: keys on the same map, which GitHub's parser rejects. Validate workflow files with actionlint, not just a YAML parser. A permissive YAML loader is the wrong tool for checking CI configs.

Combined Impact

The five optimizations together reduced the pipeline from 30+ minutes to ~5 minutes on a typical push, and a docs-only or single-stack push now skips most of it entirely.

Stage | Before | After
Python Tests (eval) | 20 min | 20 sec
pip-audit (eval) | 20 min | 9 sec
Compose Smoke | ~15 min | 95 sec
Image Builds (no change) | ~3 min each | ~20 sec (skipped)
Deploy QA | failing | 85 sec
Test/lint matrices (no change) | always 1-3 min each | ~30 sec (skipped)
Total pipeline | 30+ min | ~5 min

Smoke Tests

After every deployment, automated smoke tests verify the services are healthy. QA runs health endpoint checks against qa-api.kylebradshaw.dev covering the Python AI, Java, and Go stacks — auth, products, cart, orders, payments, and the saga happy-path. Production runs Playwright tests against the live site — including an end-to-end RAG flow that uploads a PDF, asks a question, and verifies a streamed response.
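
A health endpoint check of this kind usually needs a retry loop, because pods may still be restarting right after a rollout. The sketch below shows one way to write it. The function name retry, the attempt count, and the /health path in the usage line are all assumptions, not details taken from the real smoke tests.

```shell
# Hypothetical helper: run a command until it succeeds or attempts run out.
# $1 = max attempts, $2 = delay in seconds, remaining args = command to run
retry() {
  attempts="$1"; delay="$2"; shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "failed after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  return 0
}

# Usage (the /health path is a placeholder):
# retry 5 3 curl -fsS https://qa-api.kylebradshaw.dev/health
```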