main·5006ce4·1m ago

AI / Gen AI Engineer

Building intelligent systems with retrieval-augmented generation, agentic architectures, and Model Context Protocol (MCP) servers that any AI client can call. This section demonstrates an MCP server fronting twelve tools, RAG pipelines with evaluation, vector search, and tool-using agents — built with FastAPI, Qdrant, Ollama, and Go, deployed on Kubernetes.

Prometheus scrapes every AI service and streams metrics to a live Grafana dashboard.

MCP Server

The Go ai-service exposes twelve tools to any Model Context Protocol client over HTTPS. Built on the official modelcontextprotocol/go-sdk, it fronts both the ecommerce backend (REST + gRPC) and a Python RAG pipeline (HTTP, circuit breaker, OTel trace propagation). Authentication is optional: catalog and knowledge-base tools work anonymously; cart, order, and return tools require a Bearer JWT.

Architecture

External MCP clients connect over the HTTPS Streamable transport. The Go server enforces optional JWT auth, then routes tool calls to either the ecommerce backend or the Python RAG bridge. The bridge uses a circuit breaker with a 30-second timeout and propagates OTel trace context across the language boundary.

Tool Catalog

The MCP server exposes twelve tools across four domains. Catalog and knowledge-base tools are public; order, cart, and return tools require a Bearer JWT. Knowledge-base tools call the Python RAG pipeline through a circuit-breaker HTTP bridge with a 30-second timeout. Checkout (place_order) is deliberately excluded — the agent can advise but not transact.

Internal MCPs for Engineering Operations

These MCPs are Codex-facing control surfaces over larger engineering systems. The route stays focused on what the MCP layer adds: bounded tool access, evidence gathering, repeatable workflows, and links into the deeper platforms behind each service.

Read-only production evidence

Observability MCP

A local MCP endpoint over the metrics, logs, traces, dashboard, and runbook layers. It turns operational questions into bounded evidence bundles for system health, checkout incidents, AI pipeline failures, eval runs, streaming analytics, service logs, and trace lookup without mutating the cluster.

PrometheusLokiJaegerGrafana5 dashboards16 alert rules
Representative tools
get_system_healthinvestigate_checkoutinvestigate_ai_pipelineinvestigate_eval_runinvestigate_streaming_analyticsget_service_evidencesearch_logsget_trace
See the full observability platform →
RAG experiment control plane

Eval MCP service

An operator MCP for repeatable RAG evaluation work. It coordinates evaluator datasets, run history, study records, conclusions, collection metadata from ingestion, retrieval settings, ranking comparisons, worst-case review, and the eval-run evidence handoff into the observability control plane.

Eval APIRAG collectionsdataset fixturesevaluation runsexperimentsreranktop_k
Representative tools
start_eval_runwait_for_eval_runcompare_eval_runsget_worst_eval_casesget_rag_collection_configrecord_eval_experiment_conclusionsummarize_eval_experiment
Open the RAG evaluation workflow →
Structured practice feedback

QA MCP

A compact internal MCP for structured practice sessions, expected answer feedback, weak-topic tracking, review attempts, and scoring. It keeps interview preparation in the same tool-call workflow without competing with the production observability and RAG eval control planes.

Example tool-call trace
Why did checkout stall in QA?
observability.investigate_checkout
observability.get_trace
Payment was created, but the order saga never observed completion. The trace points to a RabbitMQ reply timeout.
Why this matters
  • Turns production questions into bounded evidence requests.
  • Keeps practice feedback and weak-topic tracking in the same workflow.
  • Makes RAG changes measurable before they are treated as improvements.

Try it interactively

The same tool registry powers an in-browser agent loop on the Go section. The agent runs Qwen 2.5 14B locally and streams tool calls and results live.

Connect your own client

The MCP server is publicly reachable at https://api.kylebradshaw.dev/ai-api/mcp. Public tools (catalog search, RAG search, list_collections) work without auth. Auth-scoped tools require a Bearer JWT — register at /go/register, log in, and copy the access token from the Authorization header in DevTools.

Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "kyle-portfolio": {
      "transport": "http",
      "url": "https://api.kylebradshaw.dev/ai-api/mcp"
    }
  }
}

Codex CLI

Add to ~/.codex/config.toml:

[mcp_servers.kyle-portfolio]
transport = "http"
url = "https://api.kylebradshaw.dev/ai-api/mcp"

MCP Inspector

Browse and invoke tools directly:

npx @modelcontextprotocol/inspector https://api.kylebradshaw.dev/ai-api/mcp

View source on GitHub →

RAG Evaluation

A measurement tool for systematically tracking RAG pipeline quality. Create golden datasets with expected answers, run first-party RAG quality evaluations against the live pipeline, and view scorecards with per-query breakdowns — faithfulness, answer relevancy, context precision, and context recall.

What It Demonstrates

  • First-party RAG quality evaluator with LLM-judged metrics
  • Cross-service JWT authentication (Go auth → Python eval)
  • Async evaluation with polling for long-running LLM judge calls
  • Golden dataset management for repeatable quality tracking
Try RAG Evaluation →

Document Q&A Assistant

A full-stack Retrieval-Augmented Generation (RAG) application that lets users upload PDF documents and ask questions about their content. The system parses, chunks, and embeds documents into a vector database, then retrieves relevant context to generate accurate, grounded answers using a local LLM.

Tech Stack

  • FastAPI microservices (ingestion + chat)
  • Qdrant vector database
  • Ollama with Mistral 7B (chat) and nomic-embed-text (embeddings)
  • Next.js + TypeScript + shadcn/ui frontend
  • Minikube Kubernetes deployment (production), Docker Compose (local dev)
  • CI/CD with GitHub Actions, security scanning, E2E tests

How It Works

Try the Demo →

Debugging Assistant

An agentic debugging tool that indexes a Python codebase into a vector store and uses a ReAct-style agent loop to search the code, retrieve relevant context, and stream a grounded diagnosis of the described bug.

Tech Stack

  • FastAPI debug service (index + agent endpoints)
  • Qdrant vector database (per-project collections)
  • Ollama with Qwen 2.5 14B (agent reasoning) and nomic-embed-text (embeddings)
  • LangChain Python-aware text splitter
  • SSE streaming for real-time agent event output
  • Minikube Kubernetes deployment (production)

What It Demonstrates

  • Agentic tool-use loop (ReAct pattern) with a local LLM
  • Language-aware code chunking for higher-quality retrieval
  • Named SSE events for streaming structured agent state
  • Per-request Qdrant collections for isolated debug sessions

How It Works

Try the Debug Demo →