Artem Melnyk

7 min read

Building a RAG-Powered Blog Search with Mistral AI

How I replaced keyword search with a Retrieval-Augmented Generation pipeline using Mistral AI, a client-side search index, and a secure PHP proxy — all on static shared hosting.

AI · RAG · Mistral · Next.js · DevOps

This blog runs on a fully static Next.js export deployed to OVH shared hosting — no Node.js server, no database, no serverless functions. Despite that constraint, I added a working Retrieval-Augmented Generation (RAG) chat widget to the articles page, powered by Mistral AI.

This post explains how it works under the hood, why it is meaningfully better than classic keyword search, and what comes next.


1. The Problem with Classic Search

The existing search was powered by js-search, a client-side full-text search library. It works by loading a pre-built JSON index and doing fuzzy keyword matching in the browser.

It does its job well for exact queries, but has several hard limits:

| Limitation | Example |
| --- | --- |
| Exact token matching only | "NMEA 2000 CAN" finds nothing for "CAN bus temperature telemetry" |
| No understanding of intent | Asking "how do I secure SSH?" returns posts containing the word SSH, regardless of relevance |
| No synthesised answer | The user gets a list of links — they still have to read everything |
| No cross-article reasoning | Can't answer "which of your projects use both Docker and LLMs?" |

The result: a search that works for people who already know what they are looking for, but fails for exploratory or conversational queries — which is exactly what a personal technical blog attracts.
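
The exact-token ceiling is easy to demonstrate with a toy matcher. This is a deliberate simplification (js-search does fuzzier matching than this), but it shares the same fundamental limit: no token overlap, no result.

```typescript
// Toy exact-token matcher: a document "matches" only if it shares at
// least one token with the query. Illustrative only.
const tokenize = (s: string): Set<string> =>
  new Set(s.toLowerCase().split(/\W+/).filter(Boolean));

const matches = (doc: string, query: string): boolean => {
  const docTokens = tokenize(doc);
  return Array.from(tokenize(query)).some((t) => docTokens.has(t));
};

const doc = "NMEA 2000 CAN frame layout on STM32";
matches(doc, "NMEA 2000 CAN");             // true: shared tokens
matches(doc, "bus temperature telemetry"); // false: same topic, zero overlap
```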


2. What RAG Adds

Retrieval-Augmented Generation is a pattern where a language model answers a question using a specific curated context, rather than relying solely on its training data.

The pipeline has three stages:

Question
  → Retrieval: find the most relevant documents from my corpus
  → Augmentation: inject those documents into the LLM prompt
  → Generation: the model synthesises a grounded, cited answer

Applied to this blog, it means:

  • A user asks "How do you tunnel SSH through a VPS?"
  • The system retrieves the 3 most relevant articles from my index
  • Mistral generates a precise answer based on exactly what I wrote, with links back to the source posts

The model doesn't hallucinate from general training data — it is constrained to my actual content.
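
Reduced to code, the three stages look roughly like this. Here retrieve scores posts by naive token overlap as a stand-in for js-search, and the string built in the augmentation stage is what would be sent to the model; the Post shape and scoring details are illustrative assumptions.

```typescript
interface Post { title: string; slug: string; snippet: string; }

// Stage 1 (Retrieval): score each post by how many query tokens appear
// in its snippet, keep the top k. Stand-in for js-search.
const retrieve = (question: string, index: Post[], k = 3): Post[] =>
  index
    .map((post) => ({
      post,
      score: question
        .toLowerCase()
        .split(/\W+/)
        .filter(Boolean)
        .filter((t) => post.snippet.toLowerCase().includes(t)).length,
    }))
    .filter((e) => e.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((e) => e.post);

// Stage 2 (Augmentation): inject the retrieved excerpts into the prompt.
const buildPrompt = (question: string, context: Post[]): string =>
  "Blog post excerpts:\n\n" +
  context.map((p) => `${p.title}\n${p.snippet}`).join("\n\n") +
  `\n\nQuestion: ${question}`;

// Stage 3 (Generation): buildPrompt's output goes to the model, which
// answers from the excerpts alone.
```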


3. Architecture

The constraint of static hosting on OVH shapes the entire design. There is no Next.js API route available at runtime — only static files and a PHP runtime.

┌─────────────────────────────────────────────────┐
│  Browser                                        │
│                                                 │
│  1. User types question                         │
│  2. js-search retrieves top-3 blog post chunks  │  ← client-side retrieval
│     from /content/search/index.json             │
│  3. POST /chat.php  { question, context[0..2] } │
└──────────────────────────┬──────────────────────┘
                           │  HTTPS
┌──────────────────────────▼──────────────────────┐
│  OVH PHP runtime  (chat.php)                    │
│                                                 │
│  4. Validates + rate-limits (5 req/min/session) │
│  5. Builds system prompt + article snippets     │
│  6. Calls api.mistral.ai  ← API key stays here  │
│  7. Returns { answer }                          │
└──────────────────────────┬──────────────────────┘
                           │
┌──────────────────────────▼──────────────────────┐
│  Browser                                        │
│  8. Renders answer + clickable source links     │
└─────────────────────────────────────────────────┘

Key design decisions:

  • API key lives in PHP only — never in a NEXT_PUBLIC_ variable, never in the browser bundle.
  • Retrieval is client-side — js-search runs locally on the pre-built index with no round-trip.
  • Context is capped at 3 articles — keeps token usage predictable and cost near zero on the free tier.
  • Model: mistral-small-latest — fast, cheap, accurate enough for technical Q&A over short contexts.
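
On the browser side, steps 1–3 of the diagram reduce to building a capped payload and POSTing it to the proxy. A sketch: the { answer } response shape follows the diagram, while the headers and error handling here are assumptions.

```typescript
interface ChatContext { title: string; slug: string; snippet: string; }

// The 3-article cap is enforced before the payload leaves the browser,
// keeping token usage predictable regardless of what retrieval returns.
const buildPayload = (question: string, context: ChatContext[]): string =>
  JSON.stringify({ question, context: context.slice(0, 3) });

async function askBlog(question: string, context: ChatContext[]): Promise<string> {
  const res = await fetch("/chat.php", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: buildPayload(question, context),
  });
  if (!res.ok) throw new Error(`chat.php returned ${res.status}`);
  const { answer } = (await res.json()) as { answer: string };
  return answer;
}
```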

4. Enriching the Search Index

The original index.json only contained slug, title, and description — too little context to ground an LLM answer.

At build time (npm run build), saveSearchData() now also extracts a 600-character plain-text snippet from the raw Markdown of each post, stripping code blocks, headings, and link syntax:

const extractSnippet = (markdown: string, maxLength = 600): string =>
    markdown
        .replace(/```[\s\S]*?```/g, '')       // remove code blocks
        .replace(/`[^`]*`/g, '')              // remove inline code
        .replace(/^#{1,6}\s+/gm, '')          // remove headings
        .replace(/!\[.*?\]\(.*?\)/g, '')      // remove images
        .replace(/\[([^\]]+)\]\([^)]+\)/g, '$1') // keep link text
        .replace(/[*_~]/g, '')                // remove emphasis
        .replace(/\n+/g, ' ')
        .trim()
        .substring(0, maxLength);

Each index entry now looks like:

{
  "slug": "2026-01-18-STM32F334C8T6-CAN-NMEA2000-Implementation",
  "title": "Implementing NMEA 2000 Temperature Telemetry on STM32F334",
  "description": "Building a dual-message CAN bus system...",
  "category": "blogs",
  "snippet": "This article documents the implementation of a dual PGN NMEA 2000..."
}

The snippet is sent as part of the context to Mistral — it is what makes the answer grounded rather than generic.


5. The PHP Proxy

chat.php is the only server-side component. Its responsibilities:

Security

  • API key is hardcoded server-side — never exposed to the browser
  • CORS restricted to https://melnyk.ovh and localhost:3000
  • Input length capped at 500 characters
  • Context array stripped of HTML via strip_tags

Rate limiting

  • PHP sessions track request timestamps per visitor
  • Maximum 5 requests per 60-second window per session
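
The sliding-window check amounts to a few lines. Here is the same logic sketched in TypeScript for clarity; the real version lives in chat.php and keeps the timestamp array in $_SESSION.

```typescript
const WINDOW_MS = 60_000; // 60-second window
const MAX_REQUESTS = 5;   // per session

// Mutates `timestamps` in place, mirroring how the PHP version would
// update the session array on each request.
const allowRequest = (timestamps: number[], now: number): boolean => {
  // Evict timestamps that have aged out of the window.
  while (timestamps.length > 0 && now - timestamps[0] >= WINDOW_MS) {
    timestamps.shift();
  }
  if (timestamps.length >= MAX_REQUESTS) return false;
  timestamps.push(now);
  return true;
};
```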

Prompt construction

$system_prompt =
    "You are a helpful assistant for Artem Melnyk's personal technical blog. " .
    "Answer questions using ONLY the blog post excerpts provided below. " .
    "If the answer cannot be found in the context, say so honestly. " .
    "Be concise and technical. Always reference the relevant post title.";

$user_message = "Blog post excerpts:\n\n{$ctx_str}\n\nQuestion: {$question}";

The system prompt constrains the model to the provided context and explicitly asks it to admit when it doesn't know — reducing hallucination risk.


6. The React Component

BlogChat.tsx is dynamically imported (ssr: false) on the blog list page so it doesn't affect static generation or server bundle size.

The flow on each user query:

  1. ContentIndexer.search(question) — runs js-search on the local index, filters to category: 'blogs', takes top 3
  2. If no results — short-circuits with a UI message, no API call made
  3. POST /chat.php with { question, context } — waits for Mistral response
  4. Renders the answer as pre-wrapped text, followed by source links

A client-side 3-second throttle prevents accidental double-submissions without relying on server state.
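
Put together, one query runs through a handler shaped roughly like this. ContentIndexer's exact signature and the response shape are assumptions reconstructed from the steps above; selectContext isolates the filtering step so it is easy to test.

```typescript
interface SearchHit { title: string; slug: string; snippet: string; category: string; }

// Step 1: keep only blog posts, cap at the top 3 hits.
const selectContext = (hits: SearchHit[]): SearchHit[] =>
  hits.filter((h) => h.category === "blogs").slice(0, 3);

async function handleQuery(
  question: string,
  search: (q: string) => SearchHit[], // e.g. ContentIndexer.search
): Promise<{ answer: string; sources: SearchHit[] }> {
  const context = selectContext(search(question));

  // Step 2: no relevant posts means no API call at all.
  if (context.length === 0) {
    return { answer: "No matching posts found.", sources: [] };
  }

  // Step 3: send question + context to the PHP proxy.
  const res = await fetch("/chat.php", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ question, context }),
  });
  const { answer } = (await res.json()) as { answer: string };

  // Step 4: the caller renders `answer` plus links built from `sources`.
  return { answer, sources: context };
}
```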


7. Classic Search vs. RAG — Side by Side

| Feature | js-search (before) | RAG with Mistral (now) |
| --- | --- | --- |
| Query type | Keywords | Natural language |
| Answer format | List of links | Synthesised prose |
| Cross-article reasoning | No | Yes |
| Handles typos / synonyms | Partial | Yes |
| Cited sources | No | Yes (with links) |
| Works offline | Yes | No |
| API cost | Free | ~$0.002 / query |
| Latency | < 50 ms | 1–3 s |
| Hallucination risk | None | Low (grounded) |

The two approaches are complementary. The existing keyword search remains for fast exact lookups; RAG handles exploratory and conversational queries.


8. Next Steps

Short term

  • Stream the response — Mistral supports SSE streaming; showing tokens as they arrive would improve perceived latency
  • Expand the index — include full article content in chunks (512 tokens each) rather than a 600-char flat snippet; improves recall on long posts
  • Add the RAG widget to individual blog pages — ask follow-up questions while reading a post

Medium term

  • Vector embeddings — replace js-search retrieval with cosine similarity over mistral-embed vectors stored in a static JSON file; significantly better semantic recall
  • Multi-turn conversation — maintain a message history so users can ask follow-up questions in context
  • Feedback loop — a thumbs up/down button on answers to learn which queries the current retrieval handles poorly
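
Cosine similarity is all the client would need at query time; the heavy part (embedding each post with mistral-embed) happens once at build time. A sketch, assuming each index entry gains a hypothetical embedding field:

```typescript
// Cosine similarity between two equal-length embedding vectors.
const cosine = (a: number[], b: number[]): number => {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
};

// Rank index entries against a query embedding, most similar first.
const rank = (query: number[], entries: { slug: string; embedding: number[] }[]) =>
  entries
    .map((e) => ({ slug: e.slug, score: cosine(query, e.embedding) }))
    .sort((a, b) => b.score - a.score);
```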

Long term

  • Hybrid retrieval — BM25 (keyword) + embedding similarity, re-ranked by a cross-encoder; the gold standard for production RAG
  • Move to an edge runtime — if the site ever migrates off shared hosting, a Cloudflare Worker or Vercel Edge Function would drop latency to under 200 ms globally

Conclusion

A personal blog on shared static hosting is not the obvious place to deploy RAG. But the combination of a pre-built JSON index for retrieval, a PHP proxy to guard the API key, and Mistral's free tier for generation makes it entirely practical — with zero infrastructure changes and about two hours of implementation.

The result is a search experience that understands questions rather than matching tokens, and answers them with direct citations to the actual content. For a technical blog covering diverse topics from STM32 firmware to LLM platforms, that is a meaningful improvement.