To a layman, a database is an unglamorous part of modern systems: a record of occurrence or activity that can be looked up and referenced to check something. To a developer, it’s so much more. The database is the fundamental pillar of any online service, the informational conduit through which all systems work.
Every time you query an API, update a user setting, or synchronize data between devices or service platforms, you interact with a database. Authentication, content delivery, real-time analytics: all this activity is fundamentally underpinned by databases. Click on a web link: database. Place an e-commerce order: database. Upload a job application - you guessed it, a database is involved.
Data is akin to Archimedes - give it a place to stand and it will move the world. How we store, manage, access, and protect that data is one of the most important ongoing technical missions in computer science - decentralized or otherwise. Faster and more efficient databases with fluid schemas and easy cross-language transposition and translation lead to a faster, better, and more interoperable internet. Much of the work of major centralized corporations is dedicated to healthy database upkeep and appropriate access controls to the information held within.
The Importance of Data Decentralization
And therein lies the problem: appropriate access control. If databases are the batons that APIs pass to one another in a relay race, then how the race is run, and who gets to be part of it, is key. If an online service, say PSN, stores your data inappropriately and ineffectively, then disaster is not too far behind. As wider consciousness about corporate overreach and data harvesting comes to the fore, we all have a right to worry not only about how our personal data is stored, but also about how the databases that fuel our modern online activity are maintained.
In short, centralized databases pose a huge risk not only to the functional operation of global systems, but also to our personal privacy, our rights to access, and our digital autonomy and self-sovereignty. All it takes is one weak runner in the race and you, the online you, is no longer yours to control. For corporations, businesses, developers, projects and online services where the data they collect and use has financial value, the problem is more acute. If an app can’t control and own its data, its very value is significantly undermined. Accidentally doxxing all your users is never wise.
Of course, developers know this. Many great initiatives have seized upon the potential of blockchain and its decentralized immutable consensus and privacy protections as a way to fight back against the sin of centralized data: to restore digital autonomy and financial self-sovereignty to the individual, and to give corporations and users the chance to own and profit from their data.
But there is more than one way to wave a wand. And not all decentralized databases are as decentralized as they seem. Yes, using blockchain data insertion to secure public data on the blockchain is a step in the right direction, but it does not solve the problem of centralized runners in the relay race undermining the process in the first place. Generally, efforts towards decentralized databases fall into one of three camps.
Different Shades of Decentralized Data Storage
Centralized Databases Hashing Transactions On-Chain: These DINOs (Decentralized in Name Only) should already be extinct, but alas, stubborn reptiles only die with meteoric pushback. These are centralized databases that record hashes of transactions or specific data entries on-chain to allow for transparency and auditability, but there is no direct blockchain interaction with the data itself, and the protocol that uses it can often be permissioned. A good example would be BigChainDB: the actual data remains entirely centralized and entirely permissioned, the blockchain stores only the hash for verification purposes, and the validator set is a core centralized cartel who can, if they choose to, edit you out of the chain.
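To make the limitation concrete, here is a minimal sketch of hash anchoring. It assumes a plain dictionary standing in for the centralized database and a plain list standing in for the chain; `central_db`, `chain_log`, `write` and `audit` are hypothetical names for illustration, not any real product's API.

```python
import hashlib
import json


def record_digest(record: dict) -> str:
    """Return the SHA-256 hex digest of a record, using canonical JSON."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical stand-ins: the data lives in one place, only hashes are anchored.
central_db = {}   # the centralized store that holds the actual data
chain_log = []    # append-only list of (key, digest) pairs, playing the chain


def write(key: str, record: dict) -> None:
    """Store the record centrally and anchor only its hash."""
    central_db[key] = record
    chain_log.append((key, record_digest(record)))


def audit(key: str) -> bool:
    """True if the stored record still matches the latest anchored hash."""
    latest = next(d for k, d in reversed(chain_log) if k == key)
    return record_digest(central_db[key]) == latest


write("user:1", {"name": "Ada", "plan": "pro"})
ok_before = audit("user:1")

# The operator edits the centralized copy without anchoring a new hash.
central_db["user:1"]["plan"] = "free"
ok_after = audit("user:1")
```

The audit detects the silent edit, but nothing in the anchored hash can prevent the edit or recover the original record. The data itself is still wherever the operator keeps it.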
Centralized Databases Deployed on Incentivized Infrastructure: Better but flawed. These structures still perform all standard CRUD operations in a centralized way, but are integrated with a blockchain to ensure transactions are hashed on-chain, making them slow. This lack of speed is often sidestepped through centralized sequencers who batch the transactions on-chain, but in doing so they undermine some of the system’s decentralization.
The data is maintained by a decentralized set of consensus nodes who continually check, verify and in some cases manage access to data using a public blockchain. Of course, who owns those nodes can be problematic, and in many cases the developers building these apps also provide the node infrastructure, depending on the chain used.
Individual nodes can be incentivized to maintain the validity of the database through rewards to ensure data availability but, since the data is usually centrally stored (although some use IPFS or Arweave to back up data), the whole system is still vulnerable to the standard attack vectors, and the scalability of the database is hampered by the scalability of the blockchain on which it’s deployed. Chainlink is a good example of a service that uses a decentralized set of nodes to ensure data integrity, but where the actual data comes from centralized sources, while Ceramic uses a traditional event-driven architecture but relies on the blockchain to guarantee trust.
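One common way to batch is to commit a single Merkle root for many transactions instead of one hash per transaction. The sketch below is an assumption about how such a sequencer might work, not a description of any specific project; `merkle_root` and the sample transactions are illustrative.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Fold a batch of transactions into one 32-byte commitment."""
    if not leaves:
        raise ValueError("cannot build a Merkle root from an empty batch")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


batch = [b"tx-1", b"tx-2", b"tx-3"]
root = merkle_root(batch)                      # one on-chain write per batch
tampered_root = merkle_root([b"tx-1", b"tx-2", b"tx-X"])
```

Only `root` needs to be written on-chain, which is where the speed-up comes from. The trade-off is the one described above: whoever assembles the batch decides which transactions go in and in what order.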
Smart-Contract Based Database Emulation: Now we approach a decentralized structure worthy of the name, but one with significant flaws. Using smart contracts to emulate traditional database operations like inserts, updates and queries means that the interactivity with the database is governed in a decentralized manner. Self-executing smart contracts can be checked by anyone and can ensure that the database itself is not being compromised or modified without consensus.
Data is stored on platforms like Arweave and distributed across many nodes, then compiled upon request, reducing the threat of user or corporate data being undermined. This creates high integrity and transparency of data, but there are flaws. The performance and scalability of these systems are poor since smart contracts are required for all operations. Emulating complex database queries like full-text searches, real-time analytics or joins is both challenging and inefficient, and comes at a significant cost.
Worse, development or maintenance of the database is extremely difficult, and one buggy contract can undermine the data-availability of the database as a whole.
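A toy model shows why joins in particular are expensive when every read and write is a metered contract operation. The gas figures and the `ContractTable` class below are hypothetical, chosen only to illustrate the shape of the cost, and do not reflect any particular chain.

```python
from dataclasses import dataclass, field

# Hypothetical per-operation costs; real chains price storage differently.
WRITE_COST = 20_000
READ_COST = 2_000


@dataclass
class ContractTable:
    """A table emulated in contract storage, metering every access."""
    rows: dict = field(default_factory=dict)
    gas_used: int = 0

    def insert(self, key: str, row: dict) -> None:
        self.gas_used += WRITE_COST
        self.rows[key] = row

    def get(self, key: str) -> dict:
        self.gas_used += READ_COST
        return self.rows[key]

    def keys(self) -> list:
        # Assumes key enumeration is free, which flatters the contract.
        return list(self.rows)


def join_orders_to_users(orders: ContractTable, users: ContractTable) -> list:
    """Nested lookup join: two metered reads for every order."""
    result = []
    for order_id in orders.keys():
        order = orders.get(order_id)
        user = users.get(order["user_id"])
        result.append({"order": order_id, "user": user["name"]})
    return result


users = ContractTable()
orders = ContractTable()
users.insert("u1", {"name": "Ada"})
users.insert("u2", {"name": "Lin"})
for i in range(5):
    orders.insert(f"o{i}", {"user_id": "u1" if i % 2 == 0 else "u2"})

users.gas_used = orders.gas_used = 0   # measure only the query itself
joined = join_orders_to_users(orders, users)
query_gas = users.gas_used + orders.gas_used
```

With no index or query planner, the cost of the join grows with the number of rows touched, and every row touched is billed. A conventional database would answer the same query from an index at a fraction of the work.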
Source Network: Decentralized From the Ground Up
Source Network’s DefraDB and decentralized stack of tools are built from the ground up to uphold the principles of decentralization while maintaining a clear route to the scalability required for distributed data to become the norm. Source Network uses blockchain for operational logic, governance, and trust mechanisms - including auditability, access control, secrets management, and authentication.
DefraDB is interoperable with existing databases, with schema translation handled through LensVM, a data translation tool that allows bi-directional schemas to exist. The beauty of this is that DefraDB can be deployed alongside existing database architectures to ensure the best end-user product experience. For example, an organization can use DefraDB for PII data or mobile app data to take advantage of DefraDB’s edge deployment capabilities, whilst retaining its traditional architecture (e.g. MongoDB) for its current database.
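The idea of a bi-directional schema translation can be sketched as a pair of functions that convert a document one way and back again. This is not LensVM's API; `Lens`, `rename_field` and `compose` are hypothetical names used purely to illustrate the concept under that assumption.

```python
from dataclasses import dataclass
from typing import Callable

Doc = dict


@dataclass(frozen=True)
class Lens:
    """A translation that can be run in either direction."""
    forward: Callable[[Doc], Doc]
    backward: Callable[[Doc], Doc]


def rename_field(old: str, new: str) -> Lens:
    """Build a lens that renames one field, and un-renames it on the way back."""
    def fwd(doc: Doc) -> Doc:
        out = dict(doc)              # copy so the input is never mutated
        if old in out:
            out[new] = out.pop(old)
        return out

    def bwd(doc: Doc) -> Doc:
        out = dict(doc)
        if new in out:
            out[old] = out.pop(new)
        return out

    return Lens(fwd, bwd)


def compose(*lenses: Lens) -> Lens:
    """Chain lenses; the backward pass applies them in reverse order."""
    def fwd(doc: Doc) -> Doc:
        for lens in lenses:
            doc = lens.forward(doc)
        return doc

    def bwd(doc: Doc) -> Doc:
        for lens in reversed(lenses):
            doc = lens.backward(doc)
        return doc

    return Lens(fwd, bwd)


v1_to_v2 = compose(rename_field("fullName", "name"),
                   rename_field("mail", "email"))

legacy = {"fullName": "Ada Lovelace", "mail": "ada@example.com"}
migrated = v1_to_v2.forward(legacy)
restored = v1_to_v2.backward(migrated)
```

Because each step has an inverse, a document written under the new schema can still be read by a system that only understands the old one, which is what makes a gradual, side-by-side migration plausible.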
This allows organizations of all shapes and sizes to transition their stack to a decentralized architecture through a gradual process and use DefraDB to augment their current tech stack, not replace it. Data on Source Network is encrypted at the core, through every operation, to ensure privacy and, even more crucially, complete data ownership and control for both apps and individuals.
Source’s P2P architecture provides developers with flexibility in managing application and user data across numerous deployment environments, and maintains operational viability even if parts of the network are compromised. With encrypted data distributed widely, it also achieves the fast retrieval times that the throughput of modern online systems demands. Moreover, DefraDB is modular, not monolithic, and open-source, allowing developers to independently develop, deploy and manage their data across multiple devices and infrastructures and adapt Source Network’s tools to the requirements or constraints of a particular organization. Modular architecture means there are multiple ways of deploying DefraDB depending on the infrastructure the database is intended for, with data replicable to new DefraDB nodes without affecting the underlying infrastructure, including deployment on edge devices with specific requirements.
Source Network empowers developers to approach full-stack decentralization without relying on a pre-built architecture with non-scalable market forces and without resorting to centralization. Although we applaud partial decentralization - it’s better than fully centralized activity - Source Network is committed to creating an open web of self-sovereign citizens who interact with databases without ever surrendering their data, and where organizations can build use cases requiring high throughput by taking our modular tools and embarking on the path to true data autonomy.