Blog

  • How I Used WordPress as a Frontend for a Python FastAPI Backend

    WordPress has a reputation as a blogging platform. FastAPI has a reputation as a Python microservice framework. Neither reputation tells the full story — and together, they make a surprisingly capable stack for building AI-powered web applications.

    This post covers how I built a production-style integration between a WordPress frontend and a Python FastAPI backend, including how requests are secured, how the two systems communicate, and why I’d choose this combination again.

    You can see the pattern running in my AI Integration Demo (WordPress with a Symfony backend) and the Healthcare Plan RAG Demo (WordPress with a FastAPI backend).


    Why This Combination?

    Most AI integrations I see go one of two routes: a bespoke React/Next.js frontend talking directly to an API, or a WordPress plugin that bundles everything together. Both have tradeoffs.

    The React approach gives you flexibility but means building and maintaining a full frontend stack. The plugin approach is fast but opaque — you’re at the mercy of someone else’s architecture and update cycle.

    WordPress as a frontend with a decoupled Python backend hits a different target:

    • WordPress handles content management, authentication, forms, caching, and the UI — things it’s genuinely good at
    • FastAPI handles AI logic, data processing, and anything that benefits from Python’s ecosystem (vector search, ML libraries, async I/O)
    • The two systems are loosely coupled — either can change independently

    The Architecture

    [ WordPress Frontend ]
      - Renders the UI (shortcode or block)
      - Handles form submission via JavaScript
      - Proxies requests to FastAPI through PHP
            |
            ▼
    [ WordPress PHP Proxy ]
      - Validates nonce
      - Enforces rate limiting
      - Injects private credentials
      - Forwards request to FastAPI
            |
            ▼
    [ FastAPI Backend (Python) ]
      - Validates auth key
      - Runs AI / RAG logic
      - Returns structured JSON
            |
            ▼
    [ WordPress returns response to browser ]

    The browser never communicates with FastAPI directly. WordPress acts as a secure intermediary — more on why that matters in The Right Way to Integrate AI into WordPress.


    The FastAPI Backend

    FastAPI is a natural fit for this role. It’s fast (built on Starlette and the ASGI standard, with first-class async support), self-documenting (automatic OpenAPI docs), and works well with the Python libraries you’d use in an AI backend: Pydantic, SQLAlchemy, pgvector, LangChain, and so on.

    A minimal endpoint for the WordPress integration looks like this:

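    import os
    import secrets
    
    from fastapi import FastAPI, Header, HTTPException
    from pydantic import BaseModel
    
    # Shared secret injected by the WordPress proxy; read from the environment
    # (fails fast at startup if unset) rather than hardcoded
    DEMO_KEY = os.environ["DEMO_KEY"]
    
    app = FastAPI()
    
    class QuestionRequest(BaseModel):
        question: str
    
    async def run_rag_pipeline(question: str) -> tuple[str, list[dict]]:
        # Stub: the real retrieval + LLM pipeline lives elsewhere in the app
        raise NotImplementedError
    
    @app.post("/ask")
    async def ask(
        request: QuestionRequest,
        x_demo_key: str = Header(...),  # FastAPI maps x_demo_key to the X-Demo-Key header
    ):
        # Constant-time comparison so response timing doesn't reveal how much of the key matched
        if not secrets.compare_digest(x_demo_key.encode(), DEMO_KEY.encode()):
            raise HTTPException(status_code=401, detail="Unauthorized")
    
        # Run RAG pipeline
        answer, citations = await run_rag_pipeline(request.question)
    
        return {
            "answer": answer,
            "citations": citations,
        }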

    The X-Demo-Key header is injected by the WordPress PHP proxy — it’s never in the browser. FastAPI validates it on every request before running any logic.
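    To exercise the backend on its own (for example from the WordPress server while setting up the firewall rule), you can send the same request the PHP proxy sends. A minimal sketch using only Python’s standard library; `build_ask_request`, the URL, and the key are placeholders rather than part of the project.

```python
import json
import urllib.request


def build_ask_request(api_url: str, demo_key: str, question: str) -> urllib.request.Request:
    """Build the POST /ask request that the WordPress proxy sends (hypothetical helper)."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        api_url.rstrip("/") + "/ask",
        data=body,
        headers={
            "X-Demo-Key": demo_key,  # server-side secret, the same header the proxy injects
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

    With the FastAPI app running, `urllib.request.urlopen(build_ask_request(url, key, "What does Plan A cover?"), timeout=30)` sends it. A wrong key comes back as HTTP 401, which `urlopen` raises as `urllib.error.HTTPError`.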


    The WordPress Side

    On the WordPress side, a custom shortcode renders the UI and a PHP AJAX handler manages the proxy:

    // Register the shortcode
    add_shortcode('mrobbieb_ai_demo', 'render_ai_demo');
    
    function render_ai_demo() {
        wp_enqueue_script('mrobbieb-demo', get_stylesheet_directory_uri() . '/demo.js', ['jquery'], null, true);
        wp_localize_script('mrobbieb-demo', 'mrobbiebDemo', [
            'ajaxUrl' => admin_url('admin-ajax.php'),
            'nonce'   => wp_create_nonce('mrobbieb_ai_nonce'),
        ]);
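        // Container div that demo.js renders the UI into (the id here is illustrative)
        return '<div id="mrobbieb-ai-demo"></div>';
    }
    
    // AJAX handler: this is the proxy
    add_action('wp_ajax_nopriv_mrobbieb_ai_query', 'handle_ai_query'); // logged-out visitors
    add_action('wp_ajax_mrobbieb_ai_query', 'handle_ai_query');        // logged-in users
    
    function handle_ai_query() {
        check_ajax_referer('mrobbieb_ai_nonce', 'nonce');
        // rate limiting, validation, wp_remote_post to FastAPI...
        // (see full implementation in the proxy pattern post)
    }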

    The shortcode outputs a container div. The JavaScript sends requests to admin-ajax.php, not to FastAPI. WordPress handles everything in between.
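    admin-ajax.php dispatches on the `action` field: a POST with `action=mrobbieb_ai_query` fires `wp_ajax_nopriv_mrobbieb_ai_query` for logged-out visitors (or `wp_ajax_mrobbieb_ai_query` for logged-in users), and `check_ajax_referer()` reads the token from the `nonce` field. The post doesn’t show demo.js, so this Python sketch only illustrates the form-encoded body it needs to send; `build_ajax_body` is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_ajax_body(nonce: str, question: str) -> bytes:
    """Form-encode the fields admin-ajax.php and the handler expect (hypothetical helper)."""
    fields = {
        "action": "mrobbieb_ai_query",  # selects the wp_ajax_(nopriv_)mrobbieb_ai_query hook
        "nonce": nonce,                 # value from mrobbiebDemo.nonce, checked by check_ajax_referer()
        "question": question,           # read by the handler from $_POST['question']
    }
    return urlencode(fields).encode("ascii")
```

    In the browser the same body would go to `mrobbiebDemo.ajaxUrl` as `application/x-www-form-urlencoded` (jQuery’s `$.post` does this by default when given an object). Posting JSON instead leaves `$_POST` empty, and admin-ajax.php rejects the request because it finds no `action`.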


    Deploying FastAPI Alongside WordPress

    The two services run independently. In my setup:

    • WordPress runs on a standard managed WordPress host
    • FastAPI runs on a separate VPS or cloud instance (a small DigitalOcean droplet or Railway deployment works fine for demos)
    • FastAPI is not publicly exposed — it’s behind a private URL or firewall rule, accessible only to the WordPress server’s IP

    For production, you’d run FastAPI behind a reverse proxy (nginx or Caddy), use HTTPS end-to-end, and lock down the FastAPI port to only accept traffic from the WordPress server.
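    The firewall rule is the real control. If you also want a check inside the app (an optional second layer the post doesn’t describe), Python’s `ipaddress` module makes an allowlist test short. The addresses below are documentation placeholders and `is_allowed_client` is a hypothetical helper. Behind nginx or Caddy the app sees the reverse proxy’s address, so apply the check wherever the WordPress server’s real IP is visible.

```python
import ipaddress

# Placeholder allowlist: replace with the WordPress server's public IP (or subnet)
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.10/32"),
    ipaddress.ip_network("127.0.0.0/8"),  # local health checks
]


def is_allowed_client(client_ip: str) -> bool:
    """Return True if client_ip falls inside one of the allowed networks."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:  # malformed or missing address: reject
        return False
    # An IPv6 address is never "in" an IPv4 network, so mixed versions simply fail the check
    return any(addr in net for net in ALLOWED_NETWORKS)
```

    In FastAPI this could run in an HTTP middleware against `request.client.host`, returning 403 when the check fails.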


    What This Stack Gets You

    • Full Python ecosystem for AI work — vector DBs, async I/O, ML libraries, structured outputs
    • WordPress content management — pages, users, forms, media, caching — without rebuilding any of it
    • Clean separation of concerns — UI and content in WordPress, logic and data in FastAPI
    • Independent deployability — update the AI backend without touching the WordPress site and vice versa
    • Automatic API docs — FastAPI generates OpenAPI documentation at /docs, useful for debugging and collaboration

    When to Use This Pattern

    This combination is a good fit when:

    • You want to add AI to an existing WordPress site without rebuilding the frontend
    • Your AI logic is complex enough to justify a dedicated Python service (RAG pipelines, multi-step agents, vector search)
    • You need the backend to evolve independently of the UI
    • Security matters — you want credentials and logic fully server-side

    It’s less appropriate if your AI integration is simple enough for a WordPress plugin, or if you’re starting from scratch and would rather use a unified stack.


    The full source for the energy-exchange project that uses this pattern is on GitHub. Questions or want to talk through your own setup? Get in touch.


    Building something like this?

    Get a free 30-minute architecture review or a written AI readiness audit — no commitment.


    Martin Baker

    Martin Baker — Solutions Architect specializing in AI, RAG systems, and WordPress engineering. 15+ years building systems that hold up under real business pressure.

    LinkedIn · GitHub · Get in touch

  • Building a RAG Pipeline That Actually Cites Its Sources

    Most RAG demos return an answer and call it done. Production systems can’t afford that. If an AI tells a user their insurance covers something it doesn’t, you need to know why it said that — and so does the user.

    Citation-grounded RAG is the difference between an AI that sounds confident and one that can be audited. This post walks through how I built a pipeline where every answer is traceable to an exact document, page number, and excerpt — based on my Healthcare Plan RAG Demo using FastAPI, PostgreSQL, and an LLM.


    Why Most RAG Pipelines Skip Citations

    The standard RAG pattern is:

    1. Embed the user’s query
    2. Retrieve the top-k similar chunks from a vector store
    3. Stuff those chunks into a prompt
    4. Return the LLM’s response

    That works for demos. It fails in production because you lose the connection between what the model said and where it came from. Chunks go in, an answer comes out, and there’s no way to verify it or explain it to a user.

    Adding citations requires a bit more structure at every stage — but it’s not complicated once you design for it from the start.


    Step 1: Store Metadata Alongside Embeddings

    Citations start at ingestion time. When you chunk your documents, each chunk needs to carry its provenance with it — not just its text and embedding vector.

    In my healthcare demo, each chunk stored in PostgreSQL includes:

    CREATE EXTENSION IF NOT EXISTS vector;  -- pgvector: provides the VECTOR type and the <=> operator
    
    CREATE TABLE plan_chunks (
        id          SERIAL PRIMARY KEY,
        plan_id     TEXT NOT NULL,
        source_doc  TEXT NOT NULL,   -- e.g. "SBC_BlueCross_Gold.pdf"
        page_number INT,
        section     TEXT,            -- e.g. "Out-of-Network Emergency Care"
        chunk_text  TEXT NOT NULL,
        embedding   VECTOR(1536)
    );

    Every chunk knows which plan it belongs to, which document it came from, and which page. That metadata is retrieved alongside the chunk text — so you always know where your context came from before you even call the LLM.
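    A minimal ingestion-side sketch, assuming each PDF has already been extracted to one string per page. `Chunk` and `chunk_pages` are hypothetical names whose fields mirror the `plan_chunks` columns; embedding and section detection are left out. The fixed-size window with overlap is one common splitting choice, not necessarily the one the demo uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    # Mirrors the plan_chunks columns; the embedding is computed in a later step
    plan_id: str
    source_doc: str
    page_number: int
    chunk_text: str
    section: Optional[str] = None  # section detection is omitted in this sketch


def chunk_pages(pages: list[str], plan_id: str, source_doc: str,
                size: int = 800, overlap: int = 100) -> list[Chunk]:
    """Split each page into overlapping windows; every chunk keeps its provenance."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    chunks = []
    for page_number, text in enumerate(pages, start=1):  # 1-based, like printed pages
        start = 0
        while start < len(text):
            end = min(start + size, len(text))
            chunks.append(Chunk(plan_id, source_doc, page_number, text[start:end]))
            if end == len(text):  # last window on this page
                break
            start = end - overlap
    return chunks
```

    Because the page number is attached at split time, a chunk never has to be traced back to its page later.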


    Step 2: Return Chunks with Their Metadata

    When you retrieve chunks at query time, don’t just pass the text to the LLM. Keep the full structured result:

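    # Assumes `db` is an asyncpg connection or pool with pgvector's codec registered
    # (pgvector.asyncpg.register_vector), and `embed` returns the query's embedding.
    async def retrieve_chunks(query: str, plan_ids: list[str], top_k: int = 8) -> list[dict]:
        query_embedding = await embed(query)
        # <=> is pgvector's cosine distance, so 1 - distance is cosine similarity
        results = await db.fetch("""
            SELECT chunk_text, plan_id, source_doc, page_number, section,
                   1 - (embedding <=> $1) AS similarity
            FROM plan_chunks
            WHERE plan_id = ANY($2)
            ORDER BY embedding <=> $1
            LIMIT $3
        """, query_embedding, plan_ids, top_k)
        # Keep every column: the metadata is needed again when building citations
        return [dict(r) for r in results]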

    Each item in the returned list has both the text the LLM will read and the metadata you’ll include in the citation. Don’t discard the metadata before the LLM call — you need it after.
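    For unit tests it can help to have an in-memory stand-in that mirrors the query: filter by `plan_id`, rank by cosine distance (the quantity pgvector’s `<=>` returns), and return every row with its metadata plus `similarity = 1 - distance`. A sketch with plain Python lists as vectors; `retrieve_in_memory` is a hypothetical helper, not part of the demo.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity, which is what pgvector's <=> operator computes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def retrieve_in_memory(query_embedding: list[float], rows: list[dict],
                       plan_ids: list[str], top_k: int = 8) -> list[dict]:
    """Mirror the SQL: WHERE plan_id = ANY(...) ORDER BY distance LIMIT top_k."""
    scored = [
        (cosine_distance(r["embedding"], query_embedding), r)
        for r in rows
        if r["plan_id"] in plan_ids
    ]
    scored.sort(key=lambda pair: pair[0])
    results = []
    for distance, row in scored[:top_k]:
        # Keep all metadata columns, drop the raw vector, add similarity like the SQL does
        result = {k: v for k, v in row.items() if k != "embedding"}
        result["similarity"] = 1.0 - distance
        results.append(result)
    return results
```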


    Step 3: Tell the LLM to Cite Its Sources

    The prompt is where you enforce citation behavior. The key is to number the chunks and instruct the model to reference them explicitly — and to say so when the answer isn’t in the provided context.

    def build_prompt(question: str, chunks: list[dict]) -> str:
        context_blocks = "\n\n".join([
            f"[{i+1}] Plan: {c['plan_id']} | Doc: {c['source_doc']} | Page: {c['page_number']}\n{c['chunk_text']}"
            for i, c in enumerate(chunks)
        ])
        return f"""You are answering questions about health insurance plans.
    Use ONLY the excerpts below. For each claim, cite the excerpt number like [1], [2].
    If the answer varies by plan, say so explicitly.
    If the information is not present in the excerpts, say: "This information is not available in the provided plan documents."
    
    Excerpts:
    {context_blocks}
    
    Question: {question}
    Answer:"""

    This prompt does three things: constrains the model to the provided context, forces explicit citation markers, and gives it an honest escape hatch when the answer isn’t there.
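    Prompt instructions are requests, not guarantees, so it can be worth checking the output before returning it. One cheap guard, offered here as an assumption rather than something the demo is documented to do, flags sentences that contain no `[n]` marker and aren’t the prompt’s fallback line. The sentence split is a naive regex, and `uncited_sentences` is a hypothetical helper.

```python
import re

FALLBACK = "This information is not available in the provided plan documents."
MARKER = re.compile(r"\[\d+\]")


def uncited_sentences(answer: str) -> list[str]:
    """Return sentences with no [n] citation marker (naive split on . ! ? plus whitespace)."""
    sentences = re.split(r"(?<=[.!?])\s+", answer.strip())
    return [
        s for s in sentences
        if s and not MARKER.search(s) and FALLBACK not in s
    ]
```

    What to do with flagged sentences (retry the call, drop them, or show a warning) is a product decision.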


    Step 4: Parse Citations from the Response

    After the LLM responds, extract the citation numbers and map them back to your chunk metadata:

    import re
    
    def extract_citations(answer: str, chunks: list[dict]) -> list[dict]:
        # Markers like [1] are 1-based; convert them to 0-based chunk indices
        cited = {int(n) - 1 for n in re.findall(r'\[(\d+)\]', answer)}
        # Sort for a stable order and skip markers with no matching excerpt (e.g. [0] or [99])
        cited_indices = sorted(i for i in cited if 0 <= i < len(chunks))
        return [
            {
                "plan_id":    chunks[i]["plan_id"],
                "source_doc": chunks[i]["source_doc"],
                "page":       chunks[i]["page_number"],
                "section":    chunks[i]["section"],
                "excerpt":    chunks[i]["chunk_text"][:300]
            }
            for i in cited_indices
        ]

    The final API response then looks like this:

    {
      "answer": "Out-of-network emergency care is covered under all three plans [1][2], though cost-sharing varies. Plan A covers it at 80% after deductible [1], while Plan B requires a $500 copay [2].",
      "citations": [
        {
          "plan_id": "PLAN_A",
          "source_doc": "SBC_PlanA_2026.pdf",
          "page": 4,
          "section": "Emergency Care",
          "excerpt": "Emergency services provided by out-of-network providers are covered at 80% of allowed amount after the annual deductible is met..."
        },
        {
          "plan_id": "PLAN_B",
          "source_doc": "SBC_PlanB_2026.pdf",
          "page": 6,
          "section": "Out-of-Network Benefits",
          "excerpt": "Out-of-network emergency room visits require a $500 copayment, which is waived if admitted..."
        }
      ]
    }
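    In this example the markers [1] and [2] line up with positions in the `citations` array. If the model cites non-consecutive excerpts, say [3] and [6], a citation can no longer be found by position. One option, sketched here as an assumption rather than the demo’s documented behavior, is to renumber the markers in order of first appearance and build the citations list in that same order. `renumber_markers` is a hypothetical helper.

```python
import re

MARKER = re.compile(r"\[(\d+)\]")


def renumber_markers(answer: str, n_chunks: int) -> tuple[str, list[int]]:
    """Renumber [n] markers as [1], [2], ... in order of first appearance.

    Returns the rewritten answer and the 0-based chunk indices, in the new order,
    from which the citations list should be built.
    """
    order: list[int] = []
    for n in MARKER.findall(answer):
        i = int(n) - 1
        if 0 <= i < n_chunks and i not in order:
            order.append(i)
    new_number = {i + 1: pos + 1 for pos, i in enumerate(order)}

    def replace(match: re.Match) -> str:
        old = int(match.group(1))
        # Markers with no matching excerpt are dropped rather than left dangling
        return f"[{new_number[old]}]" if old in new_number else ""

    return MARKER.sub(replace, answer), order
```

    The citations array would then be built from `[chunks[i] for i in order]` (mapped to the fields shown above), so marker [k] always refers to `citations[k - 1]`.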

    Step 5: Render Citations in the UI

    On the frontend, render the citations as an expandable accordion — visible but not overwhelming. The user sees the answer first. If they want to verify it, the source is one click away with the exact document, page, and excerpt.

    This is what I've built in the HCGov Demo — try asking "Is out-of-network covered for emergency care?" and expanding the citations panel to see exactly which plan documents the answer came from.
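    The post doesn’t include the front-end code, so this is only a sketch of one way to structure the markup: a native `<details>`/`<summary>` element gives a collapsed accordion without any JavaScript, and escaping matters because excerpts come from source documents. It is written in Python to match the other examples; in a WordPress front end the same markup would be built in JavaScript or PHP. `render_citations` is a hypothetical helper that takes the `citations` objects shown above.

```python
from html import escape


def render_citations(citations: list[dict]) -> str:
    """Render citations as a collapsed <details> accordion, one entry per source."""
    items = []
    for n, c in enumerate(citations, start=1):
        label = f"[{n}] {c['source_doc']}, page {c['page']}"
        if c.get("section"):
            label += f" ({c['section']})"
        # Escape everything that came from documents or the model
        items.append(
            f"<li><strong>{escape(label)}</strong>"
            f"<blockquote>{escape(c['excerpt'])}</blockquote></li>"
        )
    return f"<details><summary>Sources ({len(citations)})</summary><ul>{''.join(items)}</ul></details>"
```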


    Why This Matters for Production

    In 2026, retrieval rather than generation is increasingly recognized as the critical bottleneck in RAG pipelines. The RAGAS evaluation framework treats faithfulness (is the answer supported by the retrieved context?) and context precision and recall (did retrieval surface the right chunks?) as first-class metrics. Systems that can't be audited can't be trusted in high-stakes domains.

    Citation-grounded RAG is also the foundation for everything more advanced: user-defined weighting, personalized recommendations, explainable AI decisions. You can't build on top of a black box.


    Summary

    1. Store metadata at ingestion — every chunk should know its document, page, and section.
    2. Retrieve metadata alongside text — don't strip it before the LLM call.
    3. Prompt for explicit citation markers — number your chunks and tell the model to reference them.
    4. Parse and map citations back — extract citation numbers and attach the full source metadata.
    5. Surface citations in the UI — give users a way to verify what the AI told them.

    If you're building a RAG system that needs to hold up under real scrutiny, this pattern is the foundation. Happy to dig into any part of it — reach out via the contact page or explore the live demo.



  • The Right Way to Integrate AI into WordPress: Server-Side Proxy Pattern

    Most tutorials on adding AI to WordPress get one thing critically wrong: they show you how to call an AI API from the browser. That means your API key — and your bill — are one browser inspection away from being someone else’s problem.

    With WordPress 7.0 shipping native AI infrastructure in May 2026 and security researchers immediately flagging API key exposure issues, this topic has never been more relevant. The correct pattern isn’t new, but it’s still widely misunderstood.

    This post explains the server-side proxy pattern: what it is, why it matters, and how I implemented it in a production-style WordPress + Symfony demo you can explore right now.


    The Problem: Browser-Side API Calls

    The naive approach to adding AI to a WordPress site looks something like this:

    // ❌ Don't do this
    fetch('https://api.openai.com/v1/chat/completions', {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer sk-YOUR-SECRET-KEY', // exposed in browser
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ ... })
    })

    Anyone who opens DevTools on your site can see that key. They can copy it, use it, and run up charges on your account. This isn’t theoretical — it happens constantly, and it’s exactly the kind of exposure that WordPress 7.0’s new AI Connectors API has already been criticized for enabling if not configured carefully.


    The Solution: WordPress as a Secure Server-Side Proxy

    The correct pattern keeps the browser out of the conversation with the AI entirely. Instead:

    Browser → WordPress (PHP, server-side) → AI Backend → WordPress → Browser

    The browser never talks to the AI directly. WordPress acts as the gatekeeper — validating requests, enforcing rate limits, injecting credentials, and forwarding only what’s appropriate.

    What the proxy layer handles

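    • Nonce validation — checks that the request carries a token issued by a page your WordPress site rendered, so a script can’t call the endpoint without loading the page first; for logged-out visitors anyone who loads the page gets a valid nonce, so this raises the bar rather than stopping bots on its own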
    • Rate limiting — prevents abuse (e.g. 5 requests per 5 minutes per IP)
    • Endpoint allowlisting — only specific backend routes can be reached through the proxy
    • Credential injection — the API key or demo key is added server-side, never sent to the browser
    • Response sanitization — you control what comes back to the client
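    The handler below forwards to a single hard-coded route, which is the simplest possible allowlist. If one proxy ever serves several backend routes, look the requested route up in a fixed map instead of concatenating client input onto the backend URL. A Python sketch of that check; the route names and base URL are hypothetical, and in PHP it is the same lookup over an array.

```python
from typing import Optional

BACKEND_URL = "https://api.internal.example"  # placeholder; the PHP code uses MROBBIEB_API_URL

# Client-facing route names mapped to fixed backend paths (hypothetical routes)
ALLOWED_ROUTES = {
    "ask": "/ask",
    "compare": "/compare",
}


def backend_url_for(route: str) -> Optional[str]:
    """Return the backend URL for an allowlisted route, or None to reject the request."""
    path = ALLOWED_ROUTES.get(route)
    return BACKEND_URL + path if path is not None else None
```

    The client only ever sends a route name, never a path or host, so there is no input that can point the proxy at an arbitrary URL.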

    How I Built It: WordPress + Symfony

    In my demo implementation, the full request chain looks like this:

    [ Browser UI ]
          |
          ▼
    [ WordPress PHP (proxy layer) ]
      - Verifies nonce
      - Enforces rate limit (5 req / 5 min / IP)
      - Validates allowed endpoint path
      - Injects private demo key
          |
          ▼
    [ Symfony API Backend ]
      - Validates demo key
      - Runs AI / RAG logic
      - Returns structured JSON with citations
          |
          ▼
    [ WordPress returns response to browser ]

    The WordPress proxy in PHP

    The proxy is a WordPress AJAX handler. It uses wp_remote_post() — WordPress’s built-in HTTP client — rather than raw cURL, which keeps it portable and respects WordPress’s SSL and timeout configuration.

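    add_action('wp_ajax_nopriv_mrobbieb_ai_query', 'mrobbieb_handle_ai_query'); // logged-out visitors
    add_action('wp_ajax_mrobbieb_ai_query', 'mrobbieb_handle_ai_query');        // logged-in users
    
    function mrobbieb_handle_ai_query() {
        // 1. Verify nonce (dies with a 403 if it is missing or invalid)
        check_ajax_referer('mrobbieb_ai_nonce', 'nonce');
    
        // 2. Rate limiting (transient-based per IP)
        // REMOTE_ADDR is the proxy's address if the site sits behind a CDN or load balancer
        $ip   = $_SERVER['REMOTE_ADDR'];
        $key  = 'mrobbieb_rate_' . md5($ip);
        $hits = (int) get_transient($key);
        if ($hits >= 5) {
            wp_send_json_error(['message' => 'Rate limit exceeded. Try again shortly.'], 429);
        }
        set_transient($key, $hits + 1, 5 * MINUTE_IN_SECONDS);
    
        // 3. Validate input and forward to the single allowlisted backend route
        $question = sanitize_text_field(wp_unslash($_POST['question'] ?? ''));
        if ($question === '') {
            wp_send_json_error(['message' => 'Please enter a question.'], 400);
        }
        $response = wp_remote_post(MROBBIEB_API_URL . '/ask', [
            'headers' => [
                'X-Demo-Key'   => MROBBIEB_DEMO_KEY, // never sent to browser
                'Content-Type' => 'application/json',
            ],
            'body'    => wp_json_encode(['question' => $question]),
            'timeout' => 30,
        ]);
        if (is_wp_error($response) || wp_remote_retrieve_response_code($response) !== 200) {
            wp_send_json_error(['message' => 'The AI service is unavailable.'], 502);
        }
    
        // 4. Return only the fields the UI needs
        $body = json_decode(wp_remote_retrieve_body($response), true);
        wp_send_json_success([
            'answer'    => $body['answer'] ?? '',
            'citations' => $body['citations'] ?? [],
        ]);
    }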

    Notice that MROBBIEB_API_URL and MROBBIEB_DEMO_KEY are PHP constants defined in wp-config.php or a separate config file — never hardcoded in JavaScript, never visible to the browser.


    What This Prevents

    • API key leakage — credentials live in PHP constants, never in JavaScript or the DOM
    • Direct backend abuse — the AI backend is never reachable from the browser directly
    • Open-proxy vulnerabilities — the endpoint allowlist means you can’t use the proxy to hit arbitrary URLs
    • Runaway costs — rate limiting ensures no single user can drain your API quota
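    One subtlety in the transient-based limiter: `set_transient()` restarts the five-minute expiry on every counted request, so the counter only clears after five minutes without one, which is stricter than a true "5 requests per 5 minutes." A fixed window records when the window started instead. The sketch below is in-memory and single-process (a real deployment would keep counts in shared storage), and `FixedWindowRateLimiter` is a hypothetical name.

```python
import time
from typing import Optional


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per `window` seconds per key (e.g. a client IP)."""

    def __init__(self, limit: int = 5, window: float = 300.0) -> None:
        self.limit = limit
        self.window = window
        self._buckets: dict[str, tuple[float, int]] = {}  # key -> (window start, count)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        start, count = self._buckets.get(key, (now, 0))
        if now - start >= self.window:  # window elapsed: start a fresh one
            start, count = now, 0
        if count >= self.limit:
            return False  # rejected requests don't extend the window
        self._buckets[key] = (start, count + 1)
        return True
```

    In PHP the same idea means storing the window start alongside the count (for example as an array in the transient) and passing the remaining seconds, not a fresh five minutes, as the expiry.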

    Why This Matters More Now: WordPress 7.0

    WordPress 7.0 shipped a native AI Connectors API that stores provider credentials in the database and provides a unified interface for AI calls. In theory, this is exactly the right direction — server-side credential management, standardized interfaces, plugin-level abstraction.

    In practice, the launch-day security disclosure was a reminder that the pattern only works if every layer of the implementation is careful. Storing keys in the database is better than hardcoding them in JavaScript — but it introduces new attack surfaces (database dumps, plugin vulnerabilities, admin form exposure) that need to be understood and mitigated.

    The server-side proxy pattern described here complements WordPress 7.0’s approach: let WordPress manage credentials server-side, and ensure the browser never has a direct line to anything sensitive.


    See It Running

    The full implementation — WordPress proxy, Symfony backend, rate limiting, nonce validation, and structured JSON responses with citations — is live in my AI Integration Demo. You can interact with it and inspect exactly what the browser sends and receives.

    For a more advanced example of the same pattern applied to a real domain (healthcare plan comparison), see the Healthcare Plan RAG Demo — a FastAPI backend with PostgreSQL vector search, where the AI answers strictly from cited plan documents.


    Key Takeaways

    1. Never make AI API calls from the browser. Always proxy through server-side PHP.
    2. Store credentials in PHP constants or environment variables, not JavaScript.
    3. Use WordPress nonces to validate that requests come from your own frontend.
    4. Enforce rate limiting server-side using WordPress transients.
    5. Allowlist specific backend endpoints — never build an open proxy.

    This pattern scales from a simple OpenAI call to a full RAG pipeline. The browser doesn’t need to know anything about your backend — and it shouldn’t.

