Building a Structured LLM Wiki from Scientific Sources
This article documents the practical implementation of a structured LLM knowledge ingestion workflow.
The objective was to transform scattered web pages and peer-reviewed articles into a queryable knowledge system capable of supporting structured reasoning.
Inspired by the LLM Wiki architecture, the workflow focuses on building a persistent knowledge layer before querying any model.
1. Creating a Dedicated Knowledge Workspace
The first step was to initialize a clean knowledge environment.
Create a New Obsidian Vault
This vault acts as the central repository for all knowledge artifacts.
Structure:
Vault/
├── Clipping/
├── PDFs/
├── Wiki/
└── Logs/
This separation ensures:
- traceability
- modular structure
- predictable ingestion
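The folder layout above can be scaffolded with a small script. This is an illustrative sketch, not part of the original workflow: the `init_vault` helper and its return value are assumptions, while the folder names come directly from the structure shown.

```python
from pathlib import Path

def init_vault(root: str) -> list[str]:
    """Scaffold the vault layout described above (hypothetical helper)."""
    base = Path(root)
    # Folder names taken from the article's vault structure.
    for name in ("Clipping", "PDFs", "Wiki", "Logs"):
        (base / name).mkdir(parents=True, exist_ok=True)
    # Return the created folder names, sorted, as a quick sanity check.
    return sorted(p.name for p in base.iterdir() if p.is_dir())
```

Keeping this as a script makes the workspace reproducible: a fresh vault can be re-created identically before each ingestion run.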
2. Setting Up the Agentic Environment
The ingestion workflow is executed inside an agent-assisted development environment.
Environment used:
- Antigravity
- Claude Code
- Obsidian Vault (mounted workspace)
This enables:
- repeatable prompts
- automated indexing
- structured content generation
3. Initializing the LLM Wiki Schema
The wiki structure is initialized using the Karpathy LLM Wiki prompt.
Reference:
https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
This schema defines:
- page organization
- indexing rules
- linking strategy
- maintenance logic
Example:
index.md
topics/
concepts/
references/
log.md
This creates the foundation of the knowledge graph.
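The schema skeleton above can likewise be laid down programmatically. A minimal sketch, assuming placeholder contents for the two Markdown files (the actual contents are produced by the LLM Wiki prompt, not by this script):

```python
from pathlib import Path

def init_wiki(wiki_root: str) -> list[str]:
    """Create the wiki skeleton shown above: folders plus two seed files."""
    base = Path(wiki_root)
    # Folder names from the schema example.
    for folder in ("topics", "concepts", "references"):
        (base / folder).mkdir(parents=True, exist_ok=True)
    # Placeholder headings; real content is generated by the wiki prompt.
    (base / "index.md").write_text("# Index\n", encoding="utf-8")
    (base / "log.md").write_text("# Maintenance Log\n", encoding="utf-8")
    return sorted(p.name for p in base.iterdir())
```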
4. Capturing Source Material
Two categories of sources were collected:
Web Sources
Captured using:
Obsidian Web Clipper
Each page was:
- reviewed
- saved as Markdown
- placed into:
Clipping/
Scientific PDFs
Peer-reviewed, open-access papers were manually collected and stored in:
PDFs/
Source curation included:
- publication validation
- relevance filtering
- topic alignment
This ensured a high signal-to-noise ratio.
5. Indexing the Clipping Folder
The first transformation step begins here.
Raw Markdown files are converted into structured entries.
Prompt Used
Index the Markdown files in the Clipping folder
and create initial wiki entries with topics,
concepts, and references.
This produces:
- topic pages
- concept definitions
- source references
At this stage:
the knowledge graph begins to form.
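The indexing step can be approximated in code. The sketch below scans the Clipping folder, takes each note's first heading as its title, and collects its `[[wikilinks]]` as candidate concept references. The entry shape is an assumption for illustration; the actual wiki entries are produced by the prompt above.

```python
import re
from pathlib import Path

# Matches the target of an Obsidian-style [[wikilink]] (stops at |, # or ]]).
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def index_clippings(folder: str) -> list[dict]:
    """Build a simple index entry per Markdown file in the given folder."""
    entries = []
    for path in sorted(Path(folder).glob("*.md")):
        text = path.read_text(encoding="utf-8")
        # Title: first level-1 heading, falling back to the filename.
        heading = next((line[2:].strip() for line in text.splitlines()
                        if line.startswith("# ")), path.stem)
        links = sorted({m.strip() for m in WIKILINK.findall(text)})
        entries.append({"title": heading, "file": path.name, "links": links})
    return entries
```

Even this crude index surfaces the link structure that the wiki prompt later formalizes into topic and concept pages.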
6. Ingesting Scientific PDF Articles
Scientific articles are processed using a structured ingestion prompt.
Reference:
https://gist.github.com/artemmelnyk-extern/e9e54d962284838d6c246a99caf04125
Each article is:
- renamed consistently
- processed using the prompt
- converted into structured wiki pages
Expected outputs:
- structured summaries
- extracted concepts
- cross-links
- references
This step significantly expands the knowledge network.
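The "renamed consistently" step benefits from a fixed convention. The article does not specify one, so the pattern below (`<year>-<first-author>-<slugified-title>.pdf`) is an assumption offered as one workable scheme:

```python
import re

def pdf_name(year: int, author: str, title: str) -> str:
    """Illustrative naming convention (an assumption, not the article's)."""
    # Slugify the title: lowercase, non-alphanumerics collapsed to hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{year}-{author.lower()}-{slug}.pdf"
```

A deterministic name makes cross-links from wiki pages back to the source PDF stable across re-ingestion runs.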
7. Querying the Knowledge Base
After ingestion, the system becomes queryable.
Not searchable.
Queryable.
That distinction is fundamental.
Example interaction:
Query:
Explain the relationship between Topic A and Topic B
based on the ingested sources.
The answer now reflects:
- structured context
- linked knowledge
- curated sources
Not raw text fragments.
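The searchable/queryable distinction can be made concrete: search matches raw text, while a query walks the link graph to assemble structured context before the model answers. The graph representation and helper below are illustrative assumptions, not the system's actual retrieval code.

```python
def gather_context(graph: dict[str, list[str]],
                   topic: str, depth: int = 2) -> set[str]:
    """Collect pages reachable from a topic by following wiki links."""
    seen, frontier = {topic}, [topic]
    for _ in range(depth):
        # Expand one hop along outgoing links, skipping visited pages.
        frontier = [n for page in frontier for n in graph.get(page, [])
                    if n not in seen]
        seen.update(frontier)
    return seen
```

Asking about Topic A and Topic B then means feeding the model both pages plus the notes that connect them, rather than isolated text fragments.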
8. Visualizing the Knowledge Graph
The Obsidian graph provides a visual confirmation of structure.
In this graph:
- each node = a structured knowledge entry
- each edge = a contextual relationship
This visualization confirms:
the transformation from documents to knowledge.
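The graph Obsidian draws is derived entirely from the notes themselves: each note is a node and each `[[wikilink]]` an edge. A minimal sketch of that derivation, reading files directly (an assumption about layout, not Obsidian's internal API):

```python
import re
from pathlib import Path

# Matches the target of an Obsidian-style [[wikilink]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def build_graph(wiki_root: str) -> list[tuple[str, str]]:
    """Return (source note, linked page) edges found under wiki_root."""
    edges = []
    for path in sorted(Path(wiki_root).rglob("*.md")):
        for target in WIKILINK.findall(path.read_text(encoding="utf-8")):
            edges.append((path.stem, target.strip()))
    return edges
```

An empty edge list after ingestion is a useful health check: it means the pages were created but never linked, i.e. documents, not yet knowledge.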
Architecture Overview
Raw Sources
↓
Clipping
↓
Indexing
↓
PDF Ingestion
↓
Linked Wiki
↓
Queryable Knowledge
This architecture transforms:
Information → Knowledge → Reasoning
Result
After full ingestion:
- knowledge became modular
- relationships became visible
- querying became meaningful
The system evolved from:
file storage
to:
knowledge infrastructure
Key Takeaways
- Knowledge must be structured before reasoning
- Source quality determines reasoning quality
- Linking is as important as content
- Indexing creates the first real structure
- Querying is the final step — not the first
Conclusion
This workflow demonstrates how modern LLM systems benefit from structured knowledge ingestion rather than raw document retrieval.
The shift is not about smarter models.
It is about better knowledge architecture.