Building a RAG-Powered Blog Search with Mistral AI
This blog runs on a fully static Next.js export deployed to OVH shared hosting — no Node.js server, no database, no serverless functions. Despite that constraint, I added a working Retrieval-Augmented Generation (RAG) chat widget to the articles page, powered by Mistral AI.
This post explains how it works under the hood, why it is meaningfully better than classic keyword search, and what comes next.
1. The Problem with Classic Search
The existing search was powered by js-search, a client-side full-text search library. It works by loading a pre-built JSON index and doing fuzzy keyword matching in the browser.
It does its job well for exact queries, but has several hard limits:
| Limitation | Example |
|---|---|
| Exact token matching only | "NMEA 2000 CAN" finds nothing for "CAN bus temperature telemetry" |
| No understanding of intent | Asking "how do I secure SSH?" returns posts containing the word SSH, regardless of relevance |
| No synthesised answer | The user gets a list of links — they still have to read everything |
| No cross-article reasoning | Can't answer "which of your projects use both Docker and LLMs?" |
The result: a search that works for people who already know what they are looking for, but fails for exploratory or conversational queries — which is exactly what a personal technical blog attracts.
2. What RAG Adds
Retrieval-Augmented Generation is a pattern where a language model answers a question using a specific curated context, rather than relying solely on its training data.
The pipeline has three stages:
Question
→ Retrieval: find the most relevant documents from my corpus
→ Augmentation: inject those documents into the LLM prompt
→ Generation: the model synthesises a grounded, cited answer
Applied to this blog, it means:
- A user asks "How do you tunnel SSH through a VPS?"
- The system retrieves the 3 most relevant articles from my index
- Mistral generates a precise answer based on exactly what I wrote, with links back to the source posts
The model doesn't hallucinate from general training data — it is constrained to my actual content.
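The retrieval and augmentation stages can be sketched in a few lines of TypeScript. This is only a toy keyword-overlap scorer standing in for js-search, and `retrieve`/`augment` are illustrative names, not the actual codebase:

```typescript
// Toy retrieval + augmentation sketch. A naive keyword-overlap score
// stands in for js-search; the prompt shape mirrors the one built in chat.php.
interface Doc {
  title: string;
  snippet: string;
}

function retrieve(question: string, corpus: Doc[], k = 3): Doc[] {
  const terms = question.toLowerCase().split(/\W+/).filter(Boolean);
  const score = (d: Doc) => {
    const text = (d.title + " " + d.snippet).toLowerCase();
    return terms.filter((t) => text.includes(t)).length;
  };
  return corpus
    .map((d) => ({ d, s: score(d) }))
    .filter((x) => x.s > 0)          // drop documents with no overlap at all
    .sort((a, b) => b.s - a.s)       // best match first
    .slice(0, k)                     // cap the context at k articles
    .map((x) => x.d);
}

function augment(question: string, docs: Doc[]): string {
  const ctx = docs.map((d) => `### ${d.title}\n${d.snippet}`).join("\n\n");
  return `Blog post excerpts:\n\n${ctx}\n\nQuestion: ${question}`;
}
```

The generation stage then simply sends the augmented string to the model; everything that makes the answer "grounded" happens before the API call.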
3. Architecture
The constraint of static hosting on OVH shapes the entire design. There is no Next.js API route available at runtime — only static files and a PHP runtime.
┌─────────────────────────────────────────────────┐
│ Browser │
│ │
│ 1. User types question │
│ 2. js-search retrieves top-3 blog post chunks │ ← client-side retrieval
│ from /content/search/index.json │
│ 3. POST /chat.php { question, context[0..2] }│
└──────────────────────────┬──────────────────────┘
│ HTTPS
┌──────────────────────────▼──────────────────────┐
│ OVH PHP runtime (chat.php) │
│ │
│ 4. Validates + rate-limits (5 req/min/session) │
│ 5. Builds system prompt + article snippets │
│ 6. Calls api.mistral.ai ← API key stays here │
│ 7. Returns { answer } │
└──────────────────────────┬──────────────────────┘
│
┌──────────────────────────▼──────────────────────┐
│ Browser │
│ 8. Renders answer + clickable source links │
└─────────────────────────────────────────────────┘
Key design decisions:
- API key lives in PHP only — never in a `NEXT_PUBLIC_` variable, never in the browser bundle.
- Retrieval is client-side — js-search runs locally on the pre-built index with no round-trip.
- Context is capped at 3 articles — keeps token usage predictable and cost near zero on the free tier.
- Model: `mistral-small-latest` — fast, cheap, accurate enough for technical Q&A over short contexts.
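From the browser's side, the whole exchange is one POST. A sketch of that call, where `askBlog` and `FetchLike` are illustrative names and the transport is injected so the example runs without a network:

```typescript
// Hypothetical client-side call to the PHP proxy. The endpoint and payload
// shape follow the architecture above; names here are illustrative.
interface ChatRequest {
  question: string;
  context: { slug: string; title: string; snippet: string }[];
}

// Minimal fetch-shaped type so the sketch does not depend on DOM typings.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number; json(): Promise<{ answer: string }> }>;

async function askBlog(req: ChatRequest, fetchImpl: FetchLike): Promise<string> {
  // Mirror the server-side cap client-side: at most 3 context articles.
  if (req.context.length > 3) throw new Error("context capped at 3 articles");
  const res = await fetchImpl("/chat.php", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });
  if (!res.ok) throw new Error(`proxy error: ${res.status}`);
  return (await res.json()).answer;
}
```

Because the API key never leaves `chat.php`, nothing in this client code is secret; anyone reading the bundle sees only the proxy URL.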
4. Enriching the Search Index
The original `index.json` only contained `slug`, `title`, and `description` — too little context to ground an LLM answer.
At build time (`npm run build`), `saveSearchData()` now also extracts a 600-character plain-text snippet from the raw Markdown of each post, stripping code blocks, headings, and link syntax:
```ts
const extractSnippet = (markdown: string, maxLength = 600): string =>
  markdown
    .replace(/```[\s\S]*?```/g, '')          // remove code blocks
    .replace(/`[^`]*`/g, '')                 // remove inline code
    .replace(/^#{1,6}\s+/gm, '')             // remove headings
    .replace(/!\[.*?\]\(.*?\)/g, '')         // remove images
    .replace(/\[([^\]]+)\]\([^)]+\)/g, '$1') // keep link text
    .replace(/[*_~]/g, '')                   // remove emphasis
    .replace(/\n+/g, ' ')
    .trim()
    .substring(0, maxLength);
```
Each index entry now looks like:
```json
{
  "slug": "2026-01-18-STM32F334C8T6-CAN-NMEA2000-Implementation",
  "title": "Implementing NMEA 2000 Temperature Telemetry on STM32F334",
  "description": "Building a dual-message CAN bus system...",
  "category": "blogs",
  "snippet": "This article documents the implementation of a dual PGN NMEA 2000..."
}
```
The snippet is sent as part of the context to Mistral — it is what makes the answer grounded rather than generic.
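To see what the extractor produces, here it is run on a small Markdown sample (the function body is repeated so the example is self-contained):

```typescript
// Same extractor as in the build step, reproduced so this example runs standalone.
const extractSnippet = (markdown: string, maxLength = 600): string =>
  markdown
    .replace(/```[\s\S]*?```/g, '')          // remove code blocks
    .replace(/`[^`]*`/g, '')                 // remove inline code
    .replace(/^#{1,6}\s+/gm, '')             // remove headings
    .replace(/!\[.*?\]\(.*?\)/g, '')         // remove images
    .replace(/\[([^\]]+)\]\([^)]+\)/g, '$1') // keep link text
    .replace(/[*_~]/g, '')                   // remove emphasis
    .replace(/\n+/g, ' ')
    .trim()
    .substring(0, maxLength);

const md =
  "# Title\n\nSome *intro* with a [link](https://example.com).\n\n" +
  "```ts\nconsole.log('hi');\n```\n\nMore text.";

const snippet = extractSnippet(md);
// → "Title Some intro with a link. More text."
```

The heading, emphasis markers, code fence, and link URL are all stripped; only the prose that helps ground an answer survives.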
5. The PHP Proxy
`chat.php` is the only server-side component. Its responsibilities:
Security
- API key is hardcoded server-side — never exposed to the browser
- CORS restricted to `https://melnyk.ovh` and `localhost:3000`
- Input length capped at 500 characters
- Context array stripped of HTML via `strip_tags`
Rate limiting
- PHP sessions track request timestamps per visitor
- Maximum 5 requests per 60-second window per session
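The session logic amounts to a sliding-window check. A TypeScript sketch of the algorithm only (the real implementation lives in `chat.php` on top of PHP sessions; `rateLimit` is an illustrative name):

```typescript
// Sliding-window rate limiter: allow at most MAX_REQUESTS timestamps
// within the last WINDOW_MS milliseconds. Pure function for easy testing.
const WINDOW_MS = 60_000;
const MAX_REQUESTS = 5;

function rateLimit(
  timestamps: number[],
  now: number
): { allowed: boolean; timestamps: number[] } {
  // Keep only requests still inside the window (this is also what keeps
  // the per-session list from growing without bound).
  const recent = timestamps.filter((t) => now - t < WINDOW_MS);
  if (recent.length >= MAX_REQUESTS) {
    return { allowed: false, timestamps: recent };
  }
  return { allowed: true, timestamps: [...recent, now] };
}
```

The caller stores the returned `timestamps` back into the session, so old entries expire naturally as the window slides forward.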
Prompt construction
```php
$system_prompt =
    "You are a helpful assistant for Artem Melnyk's personal technical blog. " .
    "Answer questions using ONLY the blog post excerpts provided below. " .
    "If the answer cannot be found in the context, say so honestly. " .
    "Be concise and technical. Always reference the relevant post title.";

$user_message = "Blog post excerpts:\n\n{$ctx_str}\n\nQuestion: {$question}";
```
The system prompt constrains the model to the provided context and explicitly asks it to admit when it doesn't know — reducing hallucination risk.
6. The React Component
`BlogChat.tsx` is dynamically imported (`ssr: false`) on the blog list page so it doesn't affect static generation or server bundle size.
The flow on each user query:
- `ContentIndexer.search(question)` — runs js-search on the local index, filters to `category: 'blogs'`, takes the top 3
- If no results — short-circuits with a UI message, no API call made
- `POST /chat.php` with `{ question, context }` — waits for the Mistral response
- Renders the answer as pre-wrapped text, followed by source links
A client-side 3-second throttle prevents accidental double-submissions without relying on server state.
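A condensed sketch of that flow, with retrieval and the proxy call injected as functions so the short-circuit path is visible without a network (`handleQuery` and the parameter names are illustrative, not the actual component code):

```typescript
// Per-query flow sketch: search stands in for ContentIndexer.search,
// post stands in for the fetch to /chat.php.
type SearchResult = { slug: string; title: string; snippet: string };

async function handleQuery(
  question: string,
  search: (q: string) => SearchResult[],
  post: (q: string, ctx: SearchResult[]) => Promise<string>
): Promise<string> {
  const context = search(question).slice(0, 3); // client-side retrieval, top 3
  if (context.length === 0) {
    // Short-circuit: no relevant posts means no API call at all.
    return "No matching posts found. Try different keywords.";
  }
  return post(question, context);
}
```

Keeping the no-results branch before the network call is what makes empty queries free: Mistral is only invoked when there is real context to ground an answer.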
7. Classic Search vs. RAG — Side by Side
| Feature | js-search (before) | RAG with Mistral (now) |
|---|---|---|
| Query type | Keywords | Natural language |
| Answer format | List of links | Synthesised prose |
| Cross-article reasoning | No | Yes |
| Handles typos / synonyms | Partial | Yes |
| Cited sources | No | Yes (with links) |
| Works offline | Yes | No |
| API cost | Free | ~$0.002 / query |
| Latency | < 50 ms | 1–3 s |
| Hallucination risk | None | Low (grounded) |
The two approaches are complementary. The existing keyword search remains for fast exact lookups; RAG handles exploratory and conversational queries.
8. Next Steps
Short term
- Stream the response — Mistral supports SSE streaming; showing tokens as they arrive would improve perceived latency
- Expand the index — include full article content in chunks (512 tokens each) rather than a 600-char flat snippet; improves recall on long posts
- Add the RAG widget to individual blog pages — ask follow-up questions while reading a post
Medium term
- Vector embeddings — replace js-search retrieval with cosine similarity over `mistral-embed` vectors stored in a static JSON file; significantly better semantic recall
- Multi-turn conversation — maintain a message history so users can ask follow-up questions in context
- Feedback loop — a thumbs up/down button on answers to learn which queries the current retrieval handles poorly
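The embedding idea is small at query time: with vectors precomputed at build time and shipped as static JSON, retrieval reduces to cosine similarity plus a sort. A sketch under those assumptions (`topK` and the toy vectors are illustrative):

```typescript
// Cosine similarity over precomputed embedding vectors. In the envisioned
// setup the vectors would come from mistral-embed at build time; the
// three-dimensional vectors below are toy data for illustration.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function topK(
  query: number[],
  index: { slug: string; vec: number[] }[],
  k = 3
): { slug: string; score: number }[] {
  return index
    .map((e) => ({ slug: e.slug, score: cosineSimilarity(query, e.vec) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

The trade-off versus js-search is index size: one float vector per chunk instead of a token index, but the lookup itself stays client-side and static-hosting friendly.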
Long term
- Hybrid retrieval — BM25 (keyword) + embedding similarity, re-ranked by a cross-encoder; the gold standard for production RAG
- Move to an edge runtime — if the site ever migrates off shared hosting, a Cloudflare Worker or Vercel Edge Function would drop latency to under 200 ms globally
Conclusion
A personal blog on shared static hosting is not the obvious place to deploy RAG. But the combination of a pre-built JSON index for retrieval, a PHP proxy to guard the API key, and Mistral's free tier for generation makes it entirely practical — with zero infrastructure changes and about two hours of implementation.
The result is a search experience that understands questions rather than matching tokens, and answers them with direct citations to the actual content. For a technical blog covering diverse topics from STM32 firmware to LLM platforms, that is a meaningful improvement.