How Source’s DefraDB Uses Content-Addressable Data to Make Edge-First Development More Efficient, Secure, and Dynamic

Addo Smajic

Oct 28, 2025

If you've built software with MongoDB, Postgres, or Firebase, you know the pattern: your application talks to a database server. That server is the source of truth. When users need data, they request it from that server. When they update data, they send it to that server.

This works fine when you control the infrastructure. But at the edge, this pattern breaks down completely.

Edge-first software runs on devices you don't control. Phones. Laptops. IoT sensors. Autonomous vehicles. Users go offline. Networks partition. Devices sync peer-to-peer. There's no central server to be the source of truth.

Traditional databases weren't built for this. DefraDB was.

The foundation is content-addressable data. It's an elegant inversion of how databases typically work, and once you understand it, you'll see why edge-first development becomes not just possible, but surprisingly straightforward.

What Is Content-Addressable Data?

Traditional databases use location-based addressing. You request data by specifying where it lives. A table name. A collection. A document ID. The database server at that location returns the data.

Here's the problem: at the edge, location is meaningless.

Content-addressable data solves this with a clever trick: instead of asking "where is this data?" you ask "what is this data?"

Data gets identified by its content. A cryptographic hash of the data becomes its unique identifier. Change even one bit of the data and you get a completely different identifier.

These identifiers are called Content Identifiers (CIDs). A CID is generated by running a cryptographic hash function over the input data, and it also records which hash function was used. The result is a collision-resistant identifier for any piece of data.

To retrieve data, you provide the CID. Any node on the network that has the data can return it. You verify what you received by hashing it and comparing it to the CID you requested. Perfect match means you got exactly what you asked for.
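The hash-and-compare step can be sketched in a few lines of Python. This is a minimal illustration, not DefraDB's implementation: it uses a bare SHA-256 hex digest as a stand-in for a CID, and the function names content_id and verify are made up for the example.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Return a SHA-256 hex digest, used here as a stand-in for a real CID."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, expected_id: str) -> bool:
    """Re-hash the received bytes and compare against the identifier requested."""
    return content_id(data) == expected_id


original = b'{"name": "Ada", "age": 36}'
cid = content_id(original)

# A single changed character produces a different digest, so the check fails.
tampered = b'{"name": "Ada", "age": 37}'

print(verify(original, cid))   # True
print(verify(tampered, cid))   # False
```

The receiver needs nothing from the sender except the bytes. The only thing it has to trust is the identifier it asked for.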

This is fundamentally different from how REST APIs or database connections work. With HTTP, you connect to a specific server at a specific URL. With DefraDB, Source's distributed database, you connect to a network that finds the data for you. Location doesn't matter. A node running on AWS is treated the same as a node running on a user's phone.

The beauty of this approach: the identifier itself proves correctness. You don't need to trust whoever handed you the data, only the CID you asked for. The math guarantees the rest.

If you've ever dealt with cache invalidation, eventual consistency, or conflict resolution in distributed systems, you know how gnarly these problems get. Content-addressable data eliminates entire classes of these problems because verification is built into the addressing scheme itself.

What Are the Benefits of Using Content-Addressable Data?

Content-addressable data solves two fundamental problems that traditional databases can't handle at the edge.

Self-verification. The identifier is a hash of the content itself. When you receive data, hash it. Compare the result to the identifier you requested. If they match, you know the data is exactly what you asked for.

At the edge, you can't trust the source. Data might come from another user's device. It might pass through multiple nodes. It might sit in local storage for weeks before syncing. Self-verification means you never have to trust any intermediary. The cryptographic hash proves the data is correct.

If you've built mobile apps with offline sync, you know the pain of resolving conflicts when multiple devices edit the same data. Content-addressable data makes conflicts detectable and resolvable because you can verify exactly what each device changed.

Storage separates from infrastructure. The owner of a server no longer has to be the only hosting provider for data. Any node on the network can provide the data someone requests.

Think about how Firebase works. Your app talks to Firebase servers. If Firebase goes down, your app breaks. If you want to move off Firebase, you're rebuilding everything.

With content-addressable data, the data can live anywhere. On the user's phone. On a local edge device. On a peer's machine. On a cloud server. Your software doesn't care. It asks for data by CID. The network finds it. This is how Source enables true data portability.
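To make "the network finds it" concrete, here is a rough sketch of fetching a block from whichever peer happens to have it. Everything in it is hypothetical: the Peer class is an in-memory dictionary, fetch is an invented helper, and a SHA-256 hex digest stands in for a CID. Real peer discovery and transport are left out entirely.

```python
import hashlib
from typing import Dict, Iterable, Optional


def content_id(data: bytes) -> str:
    """SHA-256 hex digest, standing in for a real CID."""
    return hashlib.sha256(data).hexdigest()


class Peer:
    """Hypothetical in-memory peer that maps identifiers to raw bytes."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        cid = content_id(data)
        self.blocks[cid] = data
        return cid

    def get(self, cid: str) -> Optional[bytes]:
        return self.blocks.get(cid)


def fetch(cid: str, peers: Iterable[Peer]) -> Optional[bytes]:
    """Ask each peer in turn and accept the first response whose hash matches."""
    for peer in peers:
        data = peer.get(cid)
        if data is not None and content_id(data) == cid:
            return data
    return None


phone, laptop, cloud = Peer("phone"), Peer("laptop"), Peer("cloud")
cid = laptop.put(b"sensor reading: 21.5C")

# A faulty or malicious peer holds the wrong bytes under the same identifier.
phone.blocks[cid] = b"sensor reading: 99.9C"

result = fetch(cid, [phone, cloud, laptop])
print(result)  # b'sensor reading: 21.5C'
```

The caller never names a server. The phone's bad copy is rejected by the hash check, the cloud node simply doesn't have the block, and the laptop's copy is accepted.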

These benefits eliminate major limitations of traditional architectures. Links no longer break when data moves, because the address is derived from the content, not from its location. Centralized servers become optional. Data integrity comes built-in through cryptographic verification.

How DefraDB Leverages Content-Addressable Data

DefraDB uses content-addressable data to solve the hardest problems in edge-first development: syncing across distributed devices, maintaining data integrity without centralized verification, and enabling peer-to-peer collaboration.

If you've used MongoDB, you're familiar with ObjectIDs. They're unique identifiers for documents. But they don't tell you anything about the document's content. An ObjectID is built from a timestamp, a random value, and a counter, none of which depend on what the document contains. They work fine when you have one authoritative database server. But at the edge? They're useless for verification.

DefraDB, Source's distributed database, uses CIDs instead. The identifier is derived from the content. This seemingly small change unlocks powerful capabilities:

You don't have to build syncing logic. DefraDB handles it. Nodes exchange CIDs. The network finds the data. Verification happens automatically. If you've ever built custom sync logic for offline-first mobile apps, you know this is thousands of lines of code you no longer have to write or maintain.

You don't have to trust any single node. Data can come from anywhere. The CID proves it's correct. This is fundamentally different from database replication in Postgres or MongoDB, where you have primary and replica servers with complex failover logic. With DefraDB, every node is equal.

You don't have to choose between offline functionality and data consistency. Nodes work with local data. When they reconnect, they sync using CIDs. Conflicts resolve through CRDTs. Your users get both offline capability and eventual consistency without you building custom conflict resolution logic.

Here's how it works in practice:

You query data using a CID. The DefraDB network returns the requested data. Verification happens automatically behind the familiar NoSQL query interface. The system becomes globally distributed. Data lives on any node that has it. Your edge-first software runs anywhere.

If you're used to thinking about databases as centralized services, this is the mental shift: DefraDB is a database that runs everywhere your software runs. Each instance can operate independently. They sync when connected. The CID-based addressing makes this work.

It's like Git for your application data, but real-time and automatic.

Merkle DAGs

Content-addressable data starts with individual pieces of data. Real software needs complex data structures. Nested objects. Relationships between entities. Version history.

If you've worked with Git, you already understand Merkle DAGs conceptually. Git uses content-addressable storage. Each commit has a hash derived from its content plus the hash of its parent commit. Change anything in history, and all subsequent hashes change. This makes Git's history tamper-evident.

Merkle DAGs apply the same concept to any data structure.

Merkle DAG stands for Merkle Directed Acyclic Graph. Each node in the graph has an identifier generated by hashing the node's contents. Merkle DAG nodes are immutable. Change anything in a node and you get a new identifier, creating an entirely different DAG.
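A small sketch shows why a change anywhere in the graph surfaces at the top. It assumes a simplified encoding (sorted-key JSON hashed with SHA-256) and an invented node_id helper. IPLD defines its own codecs, so treat the encoding here as illustrative only.

```python
import hashlib
import json
from typing import Dict, List


def node_id(fields: Dict[str, str], links: List[str]) -> str:
    """Hash a node's fields together with the identifiers of the nodes it links to."""
    payload = json.dumps({"fields": fields, "links": links}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Leaf nodes have no links.
address_v1 = node_id({"city": "Lisbon"}, [])
address_v2 = node_id({"city": "Porto"}, [])

# The parent links to a leaf by identifier, so the leaf's hash feeds the parent's hash.
user_v1 = node_id({"name": "Ada"}, [address_v1])
user_v2 = node_id({"name": "Ada"}, [address_v2])

print(address_v1 != address_v2)  # True: the leaf changed
print(user_v1 != user_v2)        # True: so the parent's identifier changed too
```

The parent's own fields are identical in both versions. Its identifier still differs, because the link to the child is part of what gets hashed.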

This solves a critical edge-first problem: version control across distributed nodes.

In MongoDB or Postgres, when multiple clients try to update the same document, the database server decides which write wins. That doesn't work at the edge. Nodes work offline. They can't ask a server for permission to write.

Merkle DAGs solve this by making every version of your data a distinct, addressable object. Node A edits a document, creating a new DAG with a new CID. Node B edits independently, creating a different DAG with a different CID. When the nodes sync, they exchange CIDs. DefraDB's CRDT layer merges the changes. No data loss. No conflicts that require manual resolution.

The elegance here is that concurrent edits don't create ambiguity. They create branches. Just like Git. The CRDT layer knows how to merge those branches deterministically.
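The branching behavior can be sketched with a toy version store. This is an assumption-laden illustration, not DefraDB's actual data layout: the commit helper, the in-memory store, and the JSON-plus-SHA-256 encoding are all invented for the example, and the merge node only records that two branches were joined. Combining the field values is the CRDT layer's job and is not shown.

```python
import hashlib
import json
from typing import Dict, List

# Hypothetical in-memory block store: identifier -> version node.
store: Dict[str, dict] = {}


def commit(change: Dict[str, str], parents: List[str]) -> str:
    """Store an immutable version node and return its content-derived identifier."""
    node = {"change": change, "parents": sorted(parents)}
    encoded = json.dumps(node, sort_keys=True).encode("utf-8")
    cid = hashlib.sha256(encoded).hexdigest()
    store[cid] = node
    return cid


base = commit({"title": "Draft"}, [])

# Two devices edit offline from the same base: two branches, two identifiers.
head_a = commit({"title": "Draft v2"}, [base])
head_b = commit({"tags": "edge"}, [base])

# After syncing, either device can record a merge node that links both heads.
# Sorting the parents makes the merge identifier independent of who merges.
merge_on_a = commit({}, [head_a, head_b])
merge_on_b = commit({}, [head_b, head_a])

print(head_a != head_b)          # True: concurrent edits are distinct objects
print(merge_on_a == merge_on_b)  # True: both devices arrive at the same merge node
```

Neither edit overwrites the other. Both remain reachable from the merge node through its parent links.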

Think of Merkle DAGs as JSON objects with superpowers. They have keys and values. They can reference other Merkle DAG objects. The only rule: no circular references.

When you update data within a Merkle DAG, you create a new Merkle DAG with a new CID. This is what makes the entire structure content-addressable by nature.

If you've struggled with database migrations in production systems, Merkle DAGs offer a radically different approach. Instead of altering tables in place and hoping nothing breaks, you create new versions of your data structures. Old versions remain accessible. You can query any point in history. Rollbacks are instant because you're just changing which CID you reference.

No more "we need a maintenance window to run migrations." Just deploy the new schema. Old and new coexist.
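Here is a minimal sketch of the "rollback is just changing which CID you reference" idea. The save and history helpers and the in-memory versions dictionary are hypothetical, and a SHA-256 digest over sorted-key JSON stands in for a real CID.

```python
import hashlib
import json
from typing import Dict, List, Optional

# Hypothetical in-memory store: identifier -> immutable version node.
versions: Dict[str, dict] = {}


def save(doc: Dict[str, str], parent: Optional[str]) -> str:
    """Store a new version that points at its predecessor and return its identifier."""
    node = {"doc": doc, "parent": parent}
    encoded = json.dumps(node, sort_keys=True).encode("utf-8")
    cid = hashlib.sha256(encoded).hexdigest()
    versions[cid] = node
    return cid


def history(head: str) -> List[Dict[str, str]]:
    """Walk parent links from a head back to the first version."""
    out = []
    current: Optional[str] = head
    while current is not None:
        out.append(versions[current]["doc"])
        current = versions[current]["parent"]
    return out


v1 = save({"status": "draft"}, None)
v2 = save({"status": "review"}, v1)
v3 = save({"status": "published"}, v2)

head = v3
head = v2  # rollback: point at an older identifier, nothing is rewritten

print(versions[head]["doc"])  # {'status': 'review'}
print(len(history(v3)))       # 3: every version is still reachable
```

Rolling back does not delete v3. It stays in the store under its own identifier and can be made the head again later.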

IPLD

Inter-Planetary Linked Data (IPLD) brings content-addressable data and Merkle DAGs together into a unified format that works across different systems and protocols.

If you've worked with JSON Schema or Protocol Buffers, IPLD serves a similar purpose: it defines how data structures are represented and linked. But unlike those formats, IPLD is built specifically for content-addressable systems.

This matters because your software might need to work with data from multiple sources. IoT sensors. User devices. Third-party APIs. Legacy systems. IPLD provides a consistent way to address and verify data regardless of where it comes from.

DefraDB leverages IPLD to create a NoSQL document storage model using semantically-linked, content-addressable data in the form of Merkle DAGs.

What does this mean in practice?

You get a peer-to-peer, distributed NoSQL database where data can be queried like any other NoSQL document store. You write GraphQL queries. You define schemas. You get familiar results. Behind the scenes, DefraDB handles distribution, verification, syncing, offline operation, and peer-to-peer networking.

Compare this to Firebase or MongoDB Atlas. Those services give you a nice developer experience, but you're locked into their infrastructure. With DefraDB and IPLD, the developer experience is just as good, but the architecture is fundamentally different. Any node can host any data. You're not locked into any provider.

The main challenge with IPLD is mutability. Since everything relies on CIDs, any change to underlying data creates a new CID. In a traditional database, you update a record in place. With IPLD, you create a new version.

At first, this seems like a limitation. But it's actually a feature. Immutability gives you time travel for free. Every version of every document is addressable and verifiable.

DefraDB handles the mutability challenge with Conflict-Free Replicated Data Types (CRDTs). CRDTs ensure that when multiple nodes edit the same data independently, the changes merge correctly. No data loss. No conflicts that require manual resolution.
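To give a feel for how a CRDT merges without coordination, here is a sketch of one of the simplest types, a last-writer-wins register. It is a generic textbook example, not a description of DefraDB's internals: the LWWRegister class, the integer logical timestamp, and the node-name tie-breaker are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LWWRegister:
    """Last-writer-wins register: the write with the highest (timestamp, node) wins."""

    value: str
    timestamp: int  # logical clock value assigned by the writer
    node: str       # writer's name, used only to break timestamp ties

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        mine = (self.timestamp, self.node)
        theirs = (other.timestamp, other.node)
        return self if mine >= theirs else other


a = LWWRegister("Draft v2", timestamp=5, node="phone")
b = LWWRegister("Final", timestamp=7, node="laptop")

# Merge order does not matter, and merging a value with itself changes nothing.
print(a.merge(b) == b.merge(a))  # True
print(a.merge(b).value)          # Final
print(a.merge(a) == a)           # True
```

Because the merge is commutative, associative, and idempotent, two nodes that have seen the same set of writes end up with the same value, whatever order the writes arrived in.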

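If you've dealt with distributed systems before, you know about the CAP theorem: during a network partition, a system has to give up either strong consistency or availability. IPLD plus CRDTs choose availability and partition tolerance, with eventual consistency, which is exactly what you need for edge-first software.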

Strong consistency requires coordination. Coordination requires communication. Communication fails at the edge. Eventual consistency embraces this reality and makes it work.

What Content-Addressable Data Enables For Edge-First Software

Content-addressable data eliminates architectural constraints that make edge-first development difficult with traditional databases.

Traditional stacks assume centralized coordination, reliable networks, and the ability to ask a server for the current state. These assumptions break at the edge.

Content-addressable data removes these assumptions. Data is identified by content, not location. Verification is cryptographic, not trust-based. Distribution happens peer-to-peer.

This enables software that was difficult or impossible before:

Mobile apps and collaborative tools work offline by default. Data lives locally. Users sync directly with each other. When connectivity returns, nodes sync using CIDs. If you've built mobile apps with CoreData or Room, you know the complexity DefraDB handles automatically. Compare this to Firebase, where every new user adds load to the same central servers. With DefraDB, each new user's device is also a node that stores and serves data, so adding users adds capacity.

Autonomous systems operate independently. Vehicles, drones, and robotics platforms store and process data locally. They sync peer-to-peer whether devices are in proximity or across the internet. Your autonomous vehicle doesn't stop working when it drives through a tunnel.

Edge AI runs inference locally. Devices run models on-device. If you're building AI features, you know the cost of API calls to OpenAI or Anthropic. Running inference at the edge removes those per-request API costs. Plus your users get instant responses and their data stays private.

Version control comes naturally. Each change creates a new Merkle DAG. This is like Git for your application data, built into the database.

Costs drop and reliability increases. Infrastructure costs drop because DefraDB uses edge device resources instead of expensive cloud operations. Data lives on multiple nodes, so individual node failures don't break your software.

Getting Started

Through Source's edge-first data management stack, content-addressable data becomes accessible. You don't sacrifice usability or developer experience. The familiar NoSQL interface remains. GraphQL queries work. Schemas define your data model.

You write code like you would for any NoSQL database. DefraDB handles the hard parts: content addressing, Merkle DAGs, IPLD formatting, peer-to-peer networking, CRDT conflict resolution, cryptographic verification.

Whether you're building mobile apps, autonomous systems, collaborative tools, or edge AI software, content-addressable data gives you the foundation to build beyond cloud dependency.

Ready to dive deeper into how content-addressable data works in DefraDB? Check out our CID documentation for implementation details, code examples, and advanced patterns.

Explore our GitHub and developer portal for more documentation and examples.

Start building.

