Artem Melnyk


Building a Structured LLM Wiki from Scientific Sources

A practical workflow for building a structured LLM knowledge base using Obsidian, Claude Code, and curated scientific sources.


This article documents the practical implementation of a structured LLM knowledge ingestion workflow.

The objective was to transform scattered web pages and peer-reviewed articles into a queryable knowledge system capable of supporting structured reasoning.

Inspired by the LLM Wiki architecture, the workflow focuses on building a persistent knowledge layer before querying any model.


1. Creating a Dedicated Knowledge Workspace

The first step was to initialize a clean knowledge environment.

Create a New Obsidian Vault

This vault acts as the central repository for all knowledge artifacts.

Structure:

Vault/
 ├── Clipping/
 ├── PDFs/
 ├── Wiki/
 └── Logs/

This separation ensures:

  • traceability
  • modular structure
  • predictable ingestion
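The folder layout above can be bootstrapped with a short script. A minimal Python sketch, assuming the vault root is a local `Vault/` directory (the path is a placeholder for your own vault location):

```python
from pathlib import Path

# Placeholder vault root; point this at your actual Obsidian vault.
VAULT = Path("Vault")

# The four top-level folders used throughout this workflow.
for folder in ["Clipping", "PDFs", "Wiki", "Logs"]:
    (VAULT / folder).mkdir(parents=True, exist_ok=True)
```

The script is idempotent: `exist_ok=True` makes it safe to re-run against an existing vault.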

2. Setting Up the Agentic Environment

The ingestion workflow is executed inside an agent-assisted development environment.

Environment used:

  • Antigravity
  • Claude Code
  • Obsidian Vault (mounted workspace)

This enables:

  • repeatable prompts
  • automated indexing
  • structured content generation

3. Initializing the LLM Wiki Schema

The wiki structure is initialized using the Karpathy LLM Wiki prompt.

Reference:

https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f

This schema defines:

  • page organization
  • indexing rules
  • linking strategy
  • maintenance logic

Example:

index.md
topics/
concepts/
references/
log.md

This creates the foundation of the knowledge graph.
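The schema files can be seeded the same way. A hedged sketch, assuming the wiki lives at `Vault/Wiki/` (both the path and the seed text are my assumptions, not part of the Karpathy prompt):

```python
from pathlib import Path

WIKI = Path("Vault/Wiki")  # assumed location inside the vault

# Subfolders defined by the schema.
for sub in ["topics", "concepts", "references"]:
    (WIKI / sub).mkdir(parents=True, exist_ok=True)

# Seed files: index.md is the entry point, log.md records maintenance.
(WIKI / "index.md").write_text("# Index\n\nEntry point for the wiki.\n", encoding="utf-8")
(WIKI / "log.md").write_text("# Log\n\nMaintenance history.\n", encoding="utf-8")
```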


4. Capturing Source Material

Two categories of sources were collected:

Web Sources

Captured using:

Obsidian Web Clipper

Each page was:

  • reviewed
  • saved as Markdown
  • placed into:
Clipping/

Scientific PDFs

Peer-reviewed, open-access papers were manually collected and stored in:

PDFs/

Source selection was disciplined:


  • publication validation
  • relevance filtering
  • topic alignment

This ensured a high signal-to-noise ratio.


5. Indexing the Clipping Folder

The first transformation step begins here.

Raw Markdown files are converted into structured entries.

Prompt Used

Index the Markdown files in the Clipping folder
and create initial wiki entries with topics,
concepts, and references.

This produces:

  • topic pages
  • concept definitions
  • source references

At this stage:

the knowledge graph begins to form.
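In practice the indexing is done by the agent, but the mechanical part — scanning the Clipping folder and turning each note into an index entry — can be sketched in Python. The file locations and the wiki-link format here are assumptions:

```python
from pathlib import Path
import re

CLIPPING = Path("Vault/Clipping")  # assumed clipping location
WIKI = Path("Vault/Wiki")          # assumed wiki location
CLIPPING.mkdir(parents=True, exist_ok=True)
WIKI.mkdir(parents=True, exist_ok=True)

entries = []
for md in sorted(CLIPPING.glob("*.md")):
    text = md.read_text(encoding="utf-8")
    # Use the first Markdown heading as the topic title, else the file name.
    match = re.search(r"^#\s+(.+)$", text, flags=re.MULTILINE)
    title = match.group(1).strip() if match else md.stem
    entries.append(f"- [[{title}]] (source: {md.name})")

# Write a flat index; the agent's pass adds topics, concepts, and links.
(WIKI / "index.md").write_text("# Index\n\n" + "\n".join(entries) + "\n", encoding="utf-8")
```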


6. Ingesting Scientific PDF Articles

Scientific articles are processed using a structured ingestion prompt.

Reference:

https://gist.github.com/artemmelnyk-extern/e9e54d962284838d6c246a99caf04125

Each article is:

  1. Renamed consistently
  2. Processed using the prompt
  3. Converted into structured wiki pages

Expected outputs:

  • structured summaries
  • extracted concepts
  • cross-links
  • references

This step significantly expands the knowledge network.
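Step 1, consistent renaming, can also be automated. A sketch, assuming the `PDFs/` folder and a simple lowercase-hyphen naming convention (both are assumptions):

```python
from pathlib import Path
import re

PDFS = Path("Vault/PDFs")  # assumed PDF location
PDFS.mkdir(parents=True, exist_ok=True)

def slugify(name: str) -> str:
    """Lower-case the name and collapse non-alphanumerics into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")

# Rename every PDF in place to its slugified form.
for pdf in sorted(PDFS.glob("*.pdf")):
    target = pdf.with_name(slugify(pdf.stem) + ".pdf")
    if target != pdf:
        pdf.rename(target)

print(slugify("Attention Is All You Need (2017)"))
# → attention-is-all-you-need-2017
```

Predictable file names make the later ingestion prompt deterministic: the same paper always maps to the same wiki page.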


7. Querying the Knowledge Base

After ingestion, the system becomes queryable.

Not searchable.

Queryable.

That distinction is fundamental.

Example interaction:

Query:
Explain the relationship between Topic A and Topic B
based on the ingested sources.

The answer now reflects:

  • structured context
  • linked knowledge
  • curated sources

Not raw text fragments.


8. Visualizing the Knowledge Graph

The Obsidian graph provides a visual confirmation of structure.

Each:

  • node = structured knowledge
  • edge = contextual relationship

This visualization confirms:

the transformation from documents to knowledge.


Architecture Overview

Raw Sources
      ↓
Clipping
      ↓
Indexing
      ↓
PDF Ingestion
      ↓
Linked Wiki
      ↓
Queryable Knowledge

This architecture transforms:

Information → Knowledge → Reasoning


Result

After full ingestion:

  • knowledge became modular
  • relationships became visible
  • querying became meaningful

The system evolved from:

file storage

to:

knowledge infrastructure


Key Takeaways

  1. Knowledge must be structured before reasoning
  2. Source quality determines reasoning quality
  3. Linking is as important as content
  4. Indexing creates the first real structure
  5. Querying is the final step — not the first

Conclusion

This workflow demonstrates how modern LLM systems benefit from structured knowledge ingestion rather than raw document retrieval.

The shift is not about smarter models.

It is about better knowledge architecture.