Is Knowledge Just Data?
Series: A Programmer’s Philosophical Reflections #02/11 | Reading time: 25-30 min | Concept: Epistemology — Data, Information, Knowledge, and Wisdom
Author: Wina @ Code & Cogito
The Day the Database Humbled Me
One afternoon, I picked up what looked like a straightforward task: pull last month’s sales trends from the database.
The data was all there. Millions of transaction records, timestamps precise to the millisecond, each one tagged with a product ID, amount, and user segment. I wrote the query, ran the aggregation, generated the chart. The line went up. Looked good.
I tossed the report to the product manager. She glanced at it for three seconds and asked a question I couldn’t answer: “Is this because we did something right, or is it just a seasonal effect?”
I froze.
I had data. I had numbers. I even had a trend line. But I didn’t have understanding. I knew what happened, but not why. I could tell you the details of every transaction, but couldn’t explain what those details meant when assembled together.
That afternoon, I started thinking about a question that sounds naive: what’s the actual difference between data and knowledge?
I later discovered that this question has been haunting philosophers for twenty-five centuries. And their answers turn out to be remarkably useful for people who build things for a living.
Background: The Distance from Bits to Wisdom
The DIKW Pyramid: An Underrated Architecture
In information science, there’s a classic hierarchy — the DIKW pyramid: Data, Information, Knowledge, Wisdom. Most people treat it as a slogan and move on. But if you look at it with an engineer’s eyes, it actually describes a transformation pipeline:
# Data: raw bits with no interpretation attached
raw = bytes([0b01001000, 0b01100101, 0b01101100, 0b01101100, 0b01101111])
info = raw.decode("utf-8")  # "Hello" -- now it has format
knowledge = "English greeting, used in initial encounters"  # now it has context
wisdom = "Is saying Hello to this person appropriate right now?"  # now it has judgment
None of these transformations happens automatically. From data to information, you need encoding and format. From information to knowledge, you need context and connections. From knowledge to wisdom, you need experience and judgment.
Here’s a point that engineers particularly appreciate: each transformation introduces uncertainty. The data itself is deterministic (those bits are those bits), but the moment you start interpreting, you’ve entered philosophical territory.
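Here is a small illustration of where that uncertainty enters. The same two bytes yield four different "facts" depending on which interpretation you assume, and nothing in the bytes says which reading is correct. The choice of bytes and encodings below is just an example.

```python
raw = b"\xc3\xa9"  # two bytes; nothing in them says how they should be read

as_utf8 = raw.decode("utf-8")                     # one character: 'é'
as_latin1 = raw.decode("latin-1")                 # two characters: 'Ã©'
as_big_endian = int.from_bytes(raw, "big")        # 50089
as_little_endian = int.from_bytes(raw, "little")  # 43459

for label, value in [("UTF-8 text", as_utf8), ("Latin-1 text", as_latin1),
                     ("big-endian int", as_big_endian),
                     ("little-endian int", as_little_endian)]:
    print(f"{label}: {value!r}")
```

The bits are deterministic. Every line that prints a result also carries an assumption about encoding or byte order.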
The Philosopher’s Version: JTB and Its Collapse
Philosophy’s most classic definition of “knowledge” is JTB — Justified True Belief. To count as “knowledge,” you need to satisfy three conditions simultaneously:
- You believe it (Belief) — there’s a record in your system
- It’s true (True) — it actually corresponds to reality
- You have good reasons to believe it (Justified) — your belief isn’t a lucky guess
Translated into engineering:
def is_knowledge(claim, evidence, reality, beliefs):
    believed = claim in beliefs            # Do I believe it? (is it recorded in my system?)
    true = reality.get(claim, False)       # Is it true? (reality: claim -> bool)
    justified = bool(evidence.get(claim))  # Do I have good reasons? (evidence: claim -> reasons)
    return believed and true and justified
Clean, right? But in 1963, a philosopher named Edmund Gettier blew this definition apart with a three-page paper. The essence of his counterexamples: you can have excellent reasons to believe something that happens to be true, yet your reasons have nothing to do with why it’s true. You got the right answer, but your reasoning was wrong.
Engineers see this all the time. Your tests are all green, and the code is indeed running correctly — but not because your logic was right. Two bugs happened to cancel each other out. Do you “know” the program is correct?
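The green-tests scenario above fits in a few lines. The page-conversion helpers in this sketch are invented for illustration. Each function has an off-by-one error, the two errors cancel, and the round-trip test passes for every input. The test is real evidence and the behavior it checks is real, but the test says nothing about why that behavior holds.

```python
def page_to_index(page: int) -> int:
    """Intended: convert a 1-based page number to a 0-based list index."""
    return page  # Bug 1: forgot to subtract 1


def index_to_page(index: int) -> int:
    """Intended: convert a 0-based list index back to a 1-based page number."""
    return index  # Bug 2: forgot to add 1


def test_round_trip():
    # Green: the two off-by-one errors cancel, so every page survives the round trip.
    for page in range(1, 100):
        assert index_to_page(page_to_index(page)) == page


test_round_trip()
print(page_to_index(1))  # 1, though a correct implementation would return 0
```

The bugs only show up in a test that checks one function on its own, such as expecting `page_to_index(1)` to return 0.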
Schema Is Worldview
How You Store Is How You Think
This is one of the most important insights in this article, so let me develop it carefully.
When you design a database schema, you’re making what appears to be a purely technical decision. But think about it: you’re actually doing something deeply philosophical — you’re deciding what the world is made of.
You decompose “person” into fields: name, age, gender, job title, department. These fields represent what you consider the most important attributes of a “person.” You didn’t include “dreams,” “fears,” or “childhood memories” — not because these don’t exist, but because they’re outside the scope of your model.
You decompose “relationships” into foreign keys and join tables: friend, colleague, manager. But the subtleties of human connection — the frenemy, the once-close-now-drifting-apart, the polite-on-the-surface-but-competing-underneath — these don’t exist in your schema.
A schema is an ontological declaration: it defines what your system can “see.”
And what you can’t see, you can never ask questions about.
from dataclasses import dataclass

# Your schema determines what questions you can ask
@dataclass
class Employee:
    name: str
    department: str
    salary: float
# You can ask: Who has the highest salary? Which department is biggest?
# You can't ask: Who is the most creative? Who quietly holds the team together?
In philosophy, this is called “ontological commitment” — your language and frameworks determine what you acknowledge as existing. A SQL schema is an ontological commitment. An API’s response format is an ontological commitment. The metrics you choose to track are, too.
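Here is a concrete version of "what you can’t see, you can’t ask." This minimal sketch uses a few hypothetical `Employee` records; the names, departments, and salaries are invented. Questions the schema supports become one-liners. A question about an attribute the schema never modeled cannot even be expressed, and it fails outright.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Employee:
    name: str
    department: str
    salary: float


# Hypothetical sample records, for illustration only
staff = [
    Employee("Ana", "Platform", 98000.0),
    Employee("Ben", "Platform", 91000.0),
    Employee("Chi", "Design", 87000.0),
]

# Questions the schema can answer
top_earner = max(staff, key=lambda e: e.salary).name
biggest_dept, headcount = Counter(e.department for e in staff).most_common(1)[0]
print(top_earner, biggest_dept, headcount)  # Ana Platform 2

# A question the schema has no vocabulary for
try:
    max(staff, key=lambda e: e.creativity)
except AttributeError as err:
    print("Not in the schema:", err)
```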
The Three Hidden Dimensions of Knowledge
Beyond the DIKW pyramid, knowledge has three dimensions that engineers tend to overlook.
First, knowledge has context. “Water boils at 100 °C” is true at sea-level pressure, not on a mountaintop. Your feature flag works perfectly in the test environment but fails in production, because the context changed. Knowledge removed from the context in which it holds degrades from “knowledge” to “information that might mislead you.”
Second, knowledge has a shelf life. “This API endpoint returns JSON” — until the next version switches to Protocol Buffers. Your understanding of framework best practices — until community consensus shifts. Knowledge expires, but we rarely assign a TTL to our beliefs.
Third, knowledge has provenance. Where did you learn this? Did you verify it yourself, hear it from a colleague, read it in documentation, or get it from an AI? Each source carries a different degree of reliability. But in daily work, we rarely trace the provenance chain of our beliefs — until some “everybody knows that” assumption turns out to be false.
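All three dimensions can be attached to a claim as explicit metadata. The sketch below shows one hypothetical way to do it; the `Claim` class, its fields, and the `/v1/orders` endpoint are all invented for illustration. A claim counts as usable only when its context matches the current environment and its expiry date hasn’t passed. Its source travels with it either way.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Claim:
    text: str
    source: str                                  # provenance: where this came from
    context: dict = field(default_factory=dict)  # conditions under which it holds
    expires: Optional[date] = None               # shelf life; None means no known expiry


def usable(claim: Claim, env: dict, today: date):
    """Return (ok, reason) for relying on `claim` in environment `env` on `today`."""
    for key, required in claim.context.items():
        if env.get(key) != required:
            return False, f"context mismatch on {key!r}"
    if claim.expires is not None and today > claim.expires:
        return False, f"expired on {claim.expires}"
    return True, f"ok (source: {claim.source})"


boiling = Claim("Water boils at 100 °C", source="physics textbook",
                context={"pressure_atm": 1.0})
api_json = Claim("/v1/orders returns JSON", source="colleague, unverified",
                 expires=date(2025, 6, 30))

print(usable(boiling, {"pressure_atm": 1.0}, date(2025, 1, 1)))
print(usable(boiling, {"pressure_atm": 0.7}, date(2025, 1, 1)))
print(usable(api_json, {}, date(2025, 9, 1)))
```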
Knowledge Version Control: Your Beliefs Need Git Too
Why Your Knowledge System Needs retract
Your code has Git. Your documents have version control. But what about your beliefs?
Think about it — three years ago, you read an article and learned that “microservices is the right architectural direction.” That belief entered your knowledge system. From then on, you carried it into every architectural decision. But you never went back to update it — even though you’ve since experienced the pain of microservices, read dissenting voices, and seen different success patterns.
A mature knowledge system must be able to do three things:
class KnowledgeRepo:
    def __init__(self):
        self.log = []  # append-only history; entries are never edited or removed

    def commit(self, claim, evidence, confidence):
        """Add a piece of knowledge with evidence and confidence level"""
        self.log.append({"claim": claim, "evidence": evidence,
                         "confidence": confidence, "status": "active"})

    def retract(self, claim, reason):
        """Retract a piece of knowledge -- not delete, but mark as retracted"""
        self.log.append({"claim": claim, "reason": reason, "status": "retracted"})

    def trace(self, claim):
        """Trace the complete history of a piece of knowledge, retractions included"""
        return [entry for entry in self.log if entry["claim"] == claim]
- Commit (update): when new evidence arrives, record it together with how confident you are.
- Retract: when an old claim turns out to be wrong, mark it explicitly. Don’t silently delete it; leave a record explaining why you changed your mind.
- Trace: every conclusion must be traceable to its source and reasoning.
This is exactly the logic of an incident postmortem. You don’t delete the logs after a system failure and pretend it didn’t happen — you trace, document, and update preventive measures. Your belief system deserves the same rigor.
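Keeping retractions in the log instead of deleting entries costs nothing in usability. You can always rebuild the current set of beliefs by replaying the log from the start, which is the same idea event-sourced systems use to rebuild state. This minimal sketch assumes each log entry carries a `claim` and a `status` of `"active"` or `"retracted"`. The claims, confidences, and reason below are hypothetical.

```python
def current_beliefs(log):
    """Replay an append-only log and return the claims currently held."""
    active = {}
    for entry in log:
        if entry["status"] == "active":
            active[entry["claim"]] = entry["confidence"]
        elif entry["status"] == "retracted":
            active.pop(entry["claim"], None)
    return active


log = [
    {"claim": "microservices are the right default", "confidence": 0.9, "status": "active"},
    {"claim": "monorepos simplify refactoring", "confidence": 0.6, "status": "active"},
    {"claim": "microservices are the right default", "status": "retracted",
     "reason": "operational cost outweighed the benefits for a small team"},
]

print(current_beliefs(log))  # {'monorepos simplify refactoring': 0.6}

# The history of the retracted belief is still there, including why it changed
history = [e for e in log if e["claim"] == "microservices are the right default"]
print(len(history))  # 2
```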
Knowledge Graphs: Not About Size, But Queryability
Knowledge graphs are hot in engineering, but most people focus on scale. Philosophy offers a different lens: the value of a knowledge graph lies not in how many nodes it has, but in its inferential power.
A good knowledge structure lets you derive the unknown from the known:
# Minimal viable knowledge inference
facts = {
    ("seawater_density", "affected_by", "salinity"),
    ("seawater_density", "affected_by", "temperature"),
    ("salinity", "estimated_from", "conductivity"),
}
def what_affects(subject):
    return {obj for (s, p, obj) in facts if s == subject and p == "affected_by"}
# what_affects("seawater_density") -> {"salinity", "temperature"}
The point isn’t the code’s complexity. It’s that you’ve structured knowledge to the degree that a machine can query and reason over it. That’s the critical leap from “I know it in my head” to “the system can use it.”
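A direct lookup like `what_affects` only returns facts that were stored explicitly. Inference starts when you chain facts together. Density depends on salinity, and salinity is estimated from conductivity, so conductivity bears on density even though no triple says so. Here is a minimal sketch of that chaining as a breadth-first traversal. The function name `upstream_factors` and the choice of which predicates to follow are assumptions made for illustration.

```python
from collections import deque

facts = {
    ("seawater_density", "affected_by", "salinity"),
    ("seawater_density", "affected_by", "temperature"),
    ("salinity", "estimated_from", "conductivity"),
}

# Predicates treated as "this depends on that" (an illustrative choice)
FOLLOW = {"affected_by", "estimated_from"}


def upstream_factors(subject, facts):
    """Collect everything `subject` transitively depends on, breadth-first."""
    seen, queue = set(), deque([subject])
    while queue:
        node = queue.popleft()
        for s, p, o in facts:
            if s == node and p in FOLLOW and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen


print(sorted(upstream_factors("seawater_density", facts)))
# ['conductivity', 'salinity', 'temperature']
```

The `seen` set also guards against cycles, which real knowledge graphs almost always contain.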
Modern Connections: Does an LLM Have Knowledge?
Fluency Is Not Understanding
This is the most urgent epistemological question of the AI era.
You ask ChatGPT a question. It gives you a fluent, organized, plausible-sounding answer. Does it know these things?
Check it against the JTB framework:
- Belief: Strictly speaking, an LLM doesn’t “believe” anything. It computes probability distributions over next tokens. But behaviorally, it acts as if it has beliefs.
- Truth: Its answers are sometimes correct, sometimes elaborately packaged errors. It doesn’t distinguish between the two.
- Justification: This is the fatal weakness. LLMs typically can’t point to a verifiable chain of sources. Their “reasons” are statistical patterns, not reasoning processes.
So an LLM’s output tends to pass the coherence test of truth (it hangs together internally) without necessarily passing the correspondence test (it need not match the facts). It resembles knowledge but lacks knowledge’s most essential ingredient: traceable justification.
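Here is a toy bigram model that shows what "computing probability distributions over next tokens" means and why fluency can come without justification. It is a drastic simplification: a real LLM uses a neural network conditioned on long contexts, not a count table. The training text is invented for illustration. The model outputs "the api returns json" only because that continuation was the most frequent one. Nothing in it checks whether any API actually does.

```python
from collections import Counter, defaultdict

# Tiny hypothetical training text, already split into tokens
corpus = ("the api returns json . the api returns xml . "
          "the service returns json . the api is fast .").split()

# Count which token follows which: the crudest possible next-token predictor
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1


def next_token_distribution(token):
    """Probability of each next token, estimated from bigram counts."""
    counts = following[token]
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


def generate(start, length=4):
    """Greedy decoding: always emit the most probable next token."""
    out = [start]
    for _ in range(length):
        dist = next_token_distribution(out[-1])
        if not dist:
            break
        out.append(max(dist, key=dist.get))
    return " ".join(out)


print(next_token_distribution("the"))  # {'api': 0.75, 'service': 0.25}
print(generate("the"))                 # the api returns json .
```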
The Epistemological Significance of RAG
This is why RAG (Retrieval-Augmented Generation) is so important in engineering practice — it isn’t just a technical optimization. It’s an epistemological patch.
What RAG does: before generating an answer, it retrieves relevant passages from an external knowledge base and conditions the generation on them. Most deployments also cite the retrieved sources alongside the response. In philosophical terms, it upgrades “belief without justification” into “justified belief with source tracing.”
Imperfect, but the direction is right. What engineers are doing here is essentially repairing an epistemological vulnerability.
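You can sketch what "retrieve, then answer with sources" means mechanically without any model at all. In this toy version, the corpus and source IDs are hypothetical. Simple word overlap stands in for the embedding search a real RAG system would use, and the language-model call is left out entirely. What matters is that the evidence arrives with its provenance attached.

```python
import re

# Hypothetical mini knowledge base: (source_id, passage)
CORPUS = [
    ("ops-handbook#3", "Deploys are frozen during the last week of each quarter."),
    ("api-guide#12", "The orders endpoint returns JSON with a total field in cents."),
    ("hr-policy#7", "Remote work requests go through your direct manager."),
]


def tokens(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question, k=1):
    """Rank passages by word overlap with the question (a stand-in for vector search)."""
    q = tokens(question)
    ranked = sorted(CORPUS, key=lambda doc: len(q & tokens(doc[1])), reverse=True)
    return ranked[:k]


def answer_with_sources(question):
    """Return retrieved evidence plus its provenance instead of an unsourced claim."""
    hits = retrieve(question)
    # A real RAG system would pass `hits` to a language model as context here.
    return {"evidence": [text for _, text in hits],
            "sources": [sid for sid, _ in hits]}


print(answer_with_sources("What format does the orders endpoint return?"))
```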
Knowledge Supply Chain Management
In the AI era, knowledge is no longer a one-time judgment. It needs supply chain management:
- Provenance — where did this claim come from?
- Confidence scoring — how certain am I?
- Cross-validation (multi-source validation) — is there an independent second source?
- TTL management — when does this need re-verification?
The CI/CD pipeline you run every day is essentially a knowledge maintenance system — it continuously verifies that your code (beliefs) still holds in new environments (reality).
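Here is a minimal sketch of the four checks above run against a single claim. The `Source` class, its `origin` field, the 0.7 confidence threshold, the 180-day TTL, and the example claim are all hypothetical. The key design choice is that independence is counted by where a source got its information, not by how many people repeated it. Two colleagues who read the same vendor blog post count as one source.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Source:
    name: str
    origin: str  # where this source itself got the claim (hypothetical field)


def independent_count(sources):
    """Sources that trace back to the same origin count only once."""
    return len({s.origin for s in sources})


def audit(sources, confidence, verified_on, ttl_days, today, min_confidence=0.7):
    """List supply-chain problems with one claim (thresholds are illustrative)."""
    issues = []
    if independent_count(sources) < 2:
        issues.append("no independent second source")
    if confidence < min_confidence:
        issues.append("low confidence")
    if today > verified_on + timedelta(days=ttl_days):
        issues.append("TTL expired, re-verify")
    return issues


# Hypothetical claim: "the payment service handles 10k requests per second"
sources = [
    Source("colleague on Slack", origin="vendor blog post"),
    Source("internal wiki page", origin="vendor blog post"),
]
issues = audit(sources, confidence=0.8, verified_on=date(2024, 1, 10),
               ttl_days=180, today=date(2024, 9, 1))
print(issues)  # ['no independent second source', 'TTL expired, re-verify']
```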
Reflections & Takeaways: Are You Collecting Data, or Building Understanding?
The Antidote to Data Anxiety
Many people hoard data as a form of security blanket: “As long as I have enough data, I won’t make bad judgments.”
But the reality is the opposite: the more data you have, the more clearly you see what you don’t know. More data brings more dimensions, more dimensions bring more contradictions, more contradictions bring more uncertainty.
A mature epistemological stance isn’t “I know” — it’s “I know where I’m uncertain.”
Four Mental Models to Take With You
- Distinguish the layers: next time you say “I know,” ask yourself whether you have data, information, knowledge, or understanding. Different layers support decisions of different quality.
- Audit your schema: your thinking frameworks (the metrics you track, the questions you ask, the models you build) determine what you can see. Periodically ask: what important things aren’t in my schema?
- Tag your sources: for every important belief, try to trace its provenance. Did you verify it yourself, or is it just “what everyone says”? Source reliability directly determines belief reliability.
- Set a TTL: give your core beliefs an expiration date. Periodically ask: does this belief still hold? Has the world changed? Is there new evidence?
Conclusion
Back to that afternoon in front of the database.
I had millions of records. But I learned something important: data doesn’t automatically become knowledge, just as ingredients don’t automatically become a meal. In between, you need structure, context, judgment, experience — and a person willing to admit “I don’t understand enough yet.”
Next time you pull a bunch of numbers from a database and prepare to write them into a report, you might pause for a beat:
Is this data, or knowledge? Do you really “know” anything yet?
Next Article Preview
cogito_ergo_sum.py
Your program can do reflection, monitor itself, and automatically adjust its strategy based on state. Does it have “self-awareness”?
- What’s left after Descartes doubts everything? Why does it look like a fallback mechanism?
- Reflection and recursion — can self-referential engineering structures explain consciousness?
- AI agents can plan, self-correct, and say “I think” — how should we handle this kind of projection?
- What exactly makes the “hard problem” of consciousness so hard? Why might engineering methods never solve it?
Next time, we start from Descartes’ methodological doubt and use inspect, self, and monitoring systems to disassemble the mystery of consciousness.
References
- Plato. Theaetetus. (Knowledge as “true belief with an account,” the ancestor of the JTB analysis)
- Gettier, Edmund. “Is Justified True Belief Knowledge?” Analysis, 1963. (JTB counterexamples)
- Ackoff, Russell. “From Data to Wisdom.” Journal of Applied Systems Analysis, 1989. (The DIKW pyramid)
- Quine, W.V.O. “On What There Is.” Review of Metaphysics, 1948. (Ontological commitment)
- Floridi, Luciano. The Philosophy of Information. Oxford University Press, 2011. (Philosophy of information)
- Lewis, Patrick et al. “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.” NeurIPS, 2020. (RAG)
