Decentralized Cloud Storage Utilizing Blockchain
Paper Type: Free Essay | Subject: Information Technology
Wordcount: 4847 words | Published: 8th Feb 2020
Abstract—This paper introduces a novel approach that allows users to store data securely across a decentralized cloud made up of a network of computers with storage space to spare. Files that the user needs to offload are broken up into a variable number of shards, and each shard is encrypted on the client side. The client computer then sends a request containing the file's metadata to the network. After an acknowledgment is received, the shards are distributed over the network and replicated for higher availability. Storage providers are compensated in proportion to what the client is charged, on the same pay-per-use basis. A penalty system is also put forth to punish malicious nodes by seizing their stake in the system. Instead of a middleman, a mining network provides this orchestration service, which increases security and reduces the probability of collusion. Metadata about where each offloaded file is stored is entered into a distributed public ledger that is available to all nodes, allowing anyone to verify a transaction. Offloaded files can be retrieved and decrypted using the client’s private key. Furthermore, a model for checking the integrity of the data is put forth.
The rapid evolution of the internet has spurred an explosion in the volume of data created every year. It is estimated that over 92% of the data available today was produced in the last two years alone. This has paved the way for an eruption in the number of companies vying to provide storage as a service, a trend complemented by the falling cost of high-capacity Solid State Drives (SSDs). This centralized model of data storage solves many issues while presenting many more, with information ownership and data breaches being two of the major concerns. As characterized by GNU Project founder Richard Stallman, rising costs and information ownership have lately become primary concerns for anyone planning to migrate their data to the cloud. Most cloud storage providers’ privacy policies allow them, under certain circumstances, to share customer information with third parties, giving the provider unwarranted authority over the user’s data. In addition, service uptime is uncertain, which could result in data loss and, at times, irrevocably impact the client.
According to Wikipedia, the ten worst data leaks of all time have all happened in the last five years. Several factors may, of course, have contributed to these breaches; however, one major factor is the centralized server storage of such information. This indicates that the current centralized storage model struggles to keep users’ ever-growing data secure. Decentralized storage utilizing blockchain is a potential solution. The internet comprises billions of interconnected devices with massive amounts of unused storage space. This spare space can be pooled together and offered as a storage solution with virtually unlimited capacity. Because this model uses a blockchain to log all transactions, it offers a level of accountability that its centralized counterpart cannot match. It substitutes a network of computers for the middleman to process each user request, thereby considerably reducing costs and, most importantly, allowing the user to truly own his/her data.
Blockchain is a relatively new technology in the world of science and engineering. Originally devised by a person or group of people known by the pseudonym Satoshi Nakamoto, blockchain serves as the backbone of the digital cryptocurrency Bitcoin [1]. Blockchain is an incorruptible digital ledger of economic transactions that can be programmed to record not just financial transactions but virtually everything of value (Don & Alex Tapscott, Blockchain Revolution, 2016). The ingenuity behind blockchain is that it is immutable: a transaction entered onto the blockchain ledger stays on the ledger forever.
In this paper, we propose a system that allows participating nodes to securely rent out their spare disk space to nodes that need it, in exchange for digital tokens of value native to the system.
The remainder of this paper is organized into six additional chapters. Chapter II reviews early work in the field of blockchain and the material that informed this research. Chapter III gives a deep dive into the design of our decentralized storage model. Chapter IV addresses the limitations of the proposed model. Chapter V contains the acknowledgments. Chapter VI summarizes the work and concludes the research. Finally, Chapter VII lists the references used in compiling this paper.
- LITERATURE REVIEW
Early research indicates that the concept of blockchain was born out of the subprime mortgage crisis that contributed to the recession of December 2007 to June 2009. Blockchain was developed by Satoshi Nakamoto to support the cryptocurrency Bitcoin [1]; it was the first blockchain ever deployed, and it still functions today. It uses what is called a “Proof of Work” consensus protocol1, which relies on one-way mathematical functions to make the nodes in the system reach agreement on the transaction history.
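To make the asymmetry behind Proof of Work concrete (hard to produce, easy to verify), the following Python sketch searches for a nonce whose SHA-256 digest falls below a target. It is a simplified illustration, not Bitcoin's actual procedure: it assumes single SHA-256 over arbitrary bytes and a hypothetical `difficulty_bits` parameter, whereas Bitcoin applies double SHA-256 to an 80-byte block header and encodes its target differently.

```python
import hashlib


def pow_digest(block_data: bytes, nonce: int) -> int:
    """Hash the block data with a candidate nonce and return the digest as an integer.

    Simplified: single SHA-256 over raw bytes (Bitcoin uses double SHA-256 over a header).
    """
    payload = block_data + nonce.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(payload).digest(), "big")


def mine(block_data: bytes, difficulty_bits: int) -> int:
    """Try nonces until the digest is below the target.

    Each attempt succeeds with probability 2**-difficulty_bits, so this takes
    about 2**difficulty_bits hashes on average.
    """
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while pow_digest(block_data, nonce) >= target:
        nonce += 1
    return nonce


def verify(block_data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Checking a claimed solution costs a single hash."""
    return pow_digest(block_data, nonce) < (1 << (256 - difficulty_bits))


nonce = mine(b"example block", difficulty_bits=16)
print(nonce, verify(b"example block", nonce, 16))  # the nonce found, then True
```

Raising `difficulty_bits` by one doubles the expected mining work while verification stays at one hash, which is what lets every node cheaply check the work of others.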
Since then, numerous other blockchain-based companies have emerged in the cryptocurrency market. The most prominent is Ethereum [2], which uses what is called a “Proof of Stake” consensus protocol2, because Proof of Work does not deliver the desired transaction throughput. The primary advantage of Ethereum over Bitcoin is that it combines the power of a traditional blockchain with the isolation that can be achieved with Virtual Machines. This allows for a programmable blockchain that could, in theory, run completely autonomously and allow the transfer of anything of tangible value (currency and even assets).
Storj [3], by Storj Labs, Inc., details the construction of a decentralized object storage model without the use of a blockchain.
Filecoin [4], by Protocol Labs, Inc., has implemented the proposed model, albeit using a custom-designed consensus algorithm called “Proof of Storage”.
The first design component of any distributed system is the consensus algorithm. In a distributed environment where no node knows all the other nodes, there must be a mechanism by which all the nodes in the network can agree on what the truth is. For instance, all the nodes in the Bitcoin blockchain must agree on the transaction history.
In a centralized system, all participants trust the central authority to behave honestly and share the truth with the rest of the members. Since only the trusted party can modify data, achieving consensus is straightforward: every node accepts and believes what the central authority states. In a decentralized network, however, there is no central authority and no trust among the nodes. The challenge is to get all the nodes in such a trustless system to reach consensus. In computer science, this is known as the Byzantine Generals’ Problem, originally presented in 1982 [5].
1 It is called Proof of Work because nodes in the trustless distributed system must perform computationally heavy work whose result is nevertheless easy for other nodes to verify.
The problem is described abstractly as a group of generals of the Byzantine army camped with their troops around an enemy city. The generals must agree upon a common battle plan, and they can communicate with each other only by messenger. However, one or more of the generals may be traitors who will try to confuse the others. The problem is to find an algorithm that ensures the loyal generals all reach agreement on the battle plan regardless of what the traitors do.
In the case of blockchain, each general can be thought of as a node in the network, and all the honest nodes must agree on the true history of transactions. A malicious node can send conflicting transactions to different parts of the network, yet the network should continue to function correctly and reject the conflicting transactions.
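For intuition about how loyal generals can agree despite a traitor, here is a small Python simulation of the oral-messages algorithm OM(1) from Lamport, Shostak and Pease, for one commander and three lieutenants (the minimum of 3m + 1 generals needed to tolerate m = 1 traitor). The lieutenant names `L1` to `L3` and the traitor's strategy of flipping every order it relays are illustrative assumptions; the algorithm guarantees agreement against any traitor strategy, not just this one.

```python
from collections import Counter

ATTACK, RETREAT = "attack", "retreat"


def flip(order: str) -> str:
    # Illustrative traitor strategy: relay the opposite of what was received.
    return RETREAT if order == ATTACK else ATTACK


def om1(commander_orders: dict[str, str], traitors: set[str]) -> dict[str, str]:
    """Run OM(1) and return the decision of every loyal lieutenant.

    commander_orders: the order the commander sends to each lieutenant
      (a loyal commander sends everyone the same order; a traitor may not).
    traitors: lieutenants that lie when relaying orders.
    """
    decisions = {}
    for me in commander_orders:
        if me in traitors:
            continue  # traitors' own decisions do not matter
        votes = [commander_orders[me]]  # round 1: order received from the commander
        for other, received in commander_orders.items():
            if other != me:
                # Round 2: each other lieutenant relays what it claims to have received.
                votes.append(flip(received) if other in traitors else received)
        decisions[me] = Counter(votes).most_common(1)[0][0]  # majority of three votes
    return decisions


# Loyal commander orders "attack"; lieutenant L3 is a traitor.
print(om1({"L1": ATTACK, "L2": ATTACK, "L3": ATTACK}, traitors={"L3"}))
# -> {'L1': 'attack', 'L2': 'attack'}

# Traitorous commander sends conflicting orders; all lieutenants are loyal.
print(om1({"L1": ATTACK, "L2": RETREAT, "L3": ATTACK}, traitors=set()))
# -> {'L1': 'attack', 'L2': 'attack', 'L3': 'attack'}
```

In both runs the loyal lieutenants reach the same decision, and when the commander is loyal that decision matches its order; these are the two conditions the original problem requires.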
Several blockchain consensus protocols exist at present, but most of them are bound by the FLP impossibility proof [6], which states that an asynchronous distributed consensus mechanism can have at most two of three properties: fault tolerance, safety, and liveness. Fault tolerance means that the system can survive the failure of one or more nodes at any point in time. Safety is a guarantee that nothing bad happens; in the context of blockchain, the bad outcome is a hard fork, so even if the network cannot agree upon the ledger, it will not fork3. Liveness means that the network will always be able to close a ledger, remaining live and accepting future transactions.
Most consensus protocols choose fault tolerance and liveness while sacrificing safety, which leads to the possibility of forks.
For the design of a blockchain that can facilitate decentralized storage, we have chosen Federated Byzantine Agreement (FBA), developed by the Stellar Development Foundation [7]. It is a variation of the Proof of Stake consensus algorithm that favors safety and fault tolerance over liveness. This ensures that there is never more than one chain in operation and, in effect, brings the block confirmation time4 down from several minutes to a mere 3-5 seconds.
Unlike Proof of Work, FBA involves no mathematical puzzles. Instead, every node communicates by message passing, and a voting process elects a node to propose the new block (detailed in the coming sections). In addition, Federated Byzantine Agreement provides asymptotic security5, which means that no amount of computing power can overtake the network.
2 It is called Proof of Stake because nodes must stake tokens of considerable value in the system.
3 A fork means that more than one instance of the chain exists, each differing from the rest.
4 The time taken for a transaction to be entered into the main blockchain.
5 Proof of Work is not asymptotically secure: if participants controlling a majority of the network’s computing power collude (the so-called 51% attack), they can take over the network.
Federated Byzantine Agreement defines a class of nodes called validators (detailed later in the document), which are selected by the stakeholders. Each validator forms quorum slices with other validators that it trusts. The quorum slices of the validators overlap to form what is called a quorum, and these quorums achieve consensus within an open-membership network.
Any node can act as a validator and participate in consensus if at least one existing validator adds it to its quorum slice. This creates a complex web of overlapping quorum slices and makes it almost impossible for even a supermajority of nodes to collude to control the network.
Fig 1: Quorum slices overlapping to form secure quorums that prevent an external attacker from controlling consensus
As there is no one master authority deciding which nodes get to participate in consensus, the network’s construction inherently allows for growing decentralization.
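The quorum concepts above can be checked mechanically. The Python sketch below follows the definitions in Mazières' Stellar Consensus Protocol paper: a quorum is a non-empty set of nodes that contains at least one quorum slice of each of its members, and safety depends on every pair of quorums sharing at least one node (quorum intersection). The node names and slice configurations are hypothetical, and the brute-force search is exponential in the number of nodes, so it only suits toy networks.

```python
from itertools import combinations

# Each node maps to its quorum slices; in SCP a node belongs to each of its own slices.
Slices = dict[str, list[frozenset[str]]]


def is_quorum(candidate: frozenset[str], slices: Slices) -> bool:
    """A non-empty set is a quorum if it contains a slice of every member."""
    return bool(candidate) and all(
        any(s <= candidate for s in slices[node]) for node in candidate
    )


def all_quorums(slices: Slices) -> list[frozenset[str]]:
    """Brute force: test every subset of nodes (fine for a handful of nodes)."""
    nodes = sorted(slices)
    return [
        frozenset(combo)
        for size in range(1, len(nodes) + 1)
        for combo in combinations(nodes, size)
        if is_quorum(frozenset(combo), slices)
    ]


def has_quorum_intersection(slices: Slices) -> bool:
    """Safety needs every pair of quorums to share at least one node."""
    return all(a & b for a, b in combinations(all_quorums(slices), 2))


def fs(*names: str) -> frozenset[str]:
    return frozenset(names)


# Hypothetical network with overlapping slices: A, B, C trust each other; D relies on B and C.
overlapping = {
    "A": [fs("A", "B", "C")],
    "B": [fs("A", "B", "C")],
    "C": [fs("A", "B", "C")],
    "D": [fs("B", "C", "D")],
}
# Hypothetical network of two cliques that never reference each other.
split = {
    "A": [fs("A", "B")],
    "B": [fs("A", "B")],
    "C": [fs("C", "D")],
    "D": [fs("C", "D")],
}
print(all_quorums(overlapping))              # quorums {A,B,C} and {A,B,C,D}
print(has_quorum_intersection(overlapping))  # True
print(has_quorum_intersection(split))        # False: {A,B} and {C,D} are disjoint
```

The `split` configuration shows what overlapping slices protect against: two disjoint quorums could each confirm a different ledger, which is exactly the fork that FBA's safety preference is meant to rule out.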
When a transaction is processed between two parties, it must be verified by at least one whole quorum to be deemed valid. All valid transactions enter a “Transaction Pool”, and every 3-5 seconds a validator node is chosen via a voting process to assemble these transactions into a block and add it to the chain.
Fig 2: Proposed Decentralized Storage Model utilizing Blockchain
Our proposed architecture has 3 significant components among others.
- Validation Quorum:
As stated above, the Federated Byzantine Agreement consensus protocol creates a web of validator quorums. In this architecture, the validation quorum component can be any one of these quorums. It is responsible for producing blocks and adding them to the chain. Each node in the validation quorum runs a custom-designed Virtual Machine (VM) to execute the smart contract (detailed later in the document) that matches client requests with storage provider nodes. The most logical way to implement a smart contract in a blockchain environment is to have every node in the validation quorum execute the contract’s code. This ensures that the state of the VM remains identical across all nodes.
When a client needs to offload a file from its system, it creates a request object encapsulating the details of the file, such as its size and the number of tokens the client is willing to spend upfront. The client then selects a validator node and sends it the request object. This request triggers the smart contract to execute on the validator node’s VM. The job of this smart contract is to traverse the blockchain and find storage provider nodes that match the details of the request.
After the smart contract completes its execution, it returns a list of providers. The validator node then sends an ACK (acknowledgment) request with a time to live to each node in that list in turn. As soon as a provider node receives this ACK request, it replies with a message stating that it can comply with the client’s requirements. It also sends an SFTP address, signed with the client’s public key, that the client can use to upload the chunk.
The validator node then returns this signed SFTP address to the client and records the exchange as a transaction. Other validators then verify this transaction; once it has been deemed valid, it is forged into a block and entered into the blockchain.
When a client wants to download a file it has previously uploaded, it contacts one of the nodes in the validation quorum with a request object encapsulating the details of the file, such as the filename, the file size, and the client’s public key. The validator node traverses the blockchain, retrieves the file metadata (the number of chunks available to download and the SFTP addresses of the provider nodes storing each chunk), and returns it to the client. The client can then request each chunk from its provider node in turn, decrypt the chunks using its own private key, and stitch them together.
- Blockchain based on FBA consensus protocol:
A blockchain is a data structure consisting of records called “blocks” (detailed later in the document) that are linked cryptographically. Each block contains the cryptographic hash of the previous block, logically binding each block to its predecessor.
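A minimal Python sketch of this hash linking, assuming each block is a dictionary hashed with SHA-256 over a canonical JSON encoding (the block layout and the transaction strings are hypothetical). Altering any earlier block changes its hash, so the `prev_hash` stored in the following block no longer matches; this is the mechanism behind the immutability described in the introduction.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """SHA-256 over a canonical (key-sorted) JSON encoding of the block."""
    encoded = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def append_block(chain: list[dict], transactions: list[str]) -> None:
    # Hypothetical block layout: the previous block's hash plus a list of transactions.
    prev = block_hash(chain[-1]) if chain else "0" * 64  # genesis has no predecessor
    chain.append({"prev_hash": prev, "transactions": transactions})


def chain_is_valid(chain: list[dict]) -> bool:
    """Every block must store the current hash of the block before it."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


chain: list[dict] = []
append_block(chain, ["genesis"])
append_block(chain, ["alice uploads file-hash-1"])
append_block(chain, ["bob uploads file-hash-2"])
print(chain_is_valid(chain))  # True

# Tampering with an earlier block breaks the link stored in its successor.
chain[1]["transactions"][0] = "alice uploads file-hash-X"
print(chain_is_valid(chain))  # False
```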
The basic unit of functionality in a blockchain is a transaction. A transaction between two parties can be almost anything; in the context of decentralized storage, it is a signed message indicating the name and digital hash of the file that needs to be uploaded or downloaded.
As stated previously, the blockchain consists of blocks. Each block consists of two parts:
- Header: the SHA-256 hash of the previous block and the Merkle root [8] of all the transactions in that block.
- Body: all the valid transactions included in the block and the state of the VM that executes the smart contract. There is no hard and fast rule as to how many transactions go into the body when a block is forged.
The block is divided into these two parts to make faster transaction validation possible. When a transaction between two parties is initiated, at least 66% of the validator nodes must agree that it is valid for it to go through. To verify a transaction’s validity, a validator need not traverse the entire blockchain; instead, it can check the transaction against the Merkle root stored in each block’s header.
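The following Python sketch illustrates why the Merkle root in the header is enough: a validator holding only the root can confirm that a transaction belongs to a block using a proof of about log2(n) sibling hashes, instead of re-reading every transaction. It assumes single SHA-256 and pairs an odd last node with itself, the convention Bitcoin uses (Bitcoin additionally applies SHA-256 twice). The transaction byte strings are hypothetical, and `has_supermajority` encodes the text's 66% rule literally.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def pad(level: list[bytes]) -> list[bytes]:
    """On a level with an odd number of nodes, pair the last hash with itself."""
    return level + [level[-1]] if len(level) % 2 else level


def merkle_root(leaves: list[bytes]) -> bytes:
    """Hash leaves pairwise, level by level, until one hash remains."""
    if not leaves:
        raise ValueError("a block needs at least one transaction")
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        level = pad(level)
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def merkle_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the flag is True when the sibling sits on the right."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        level = pad(level)
        sibling = index ^ 1  # the other member of this node's pair
        proof.append((level[sibling], sibling > index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof


def verify_proof(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the root from a single transaction and its proof."""
    acc = h(leaf)
    for sibling, sibling_on_right in proof:
        acc = h(acc + sibling) if sibling_on_right else h(sibling + acc)
    return acc == root


def has_supermajority(approvals: int, validators: int) -> bool:
    """At least 66% of validators approve (integer arithmetic avoids float rounding)."""
    return 100 * approvals >= 66 * validators


txs = [b"tx-a", b"tx-b", b"tx-c", b"tx-d", b"tx-e"]
root = merkle_root(txs)
proof = merkle_proof(txs, 2)
print(len(proof))                          # 3 sibling hashes for 5 transactions
print(verify_proof(b"tx-c", proof, root))  # True
print(verify_proof(b"tx-x", proof, root))  # False
print(has_supermajority(7, 10), has_supermajority(6, 10))  # True False
```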
- Smart contract:
A smart contract is a piece of code running on the blockchain. It is a computer protocol intended to digitally facilitate, verify, or enforce the negotiation or performance of a contract. It runs automatically when certain criteria are met, or in response to an incoming transaction or message.
The smart contract in our system runs on the validator node’s VM whenever the node receives a request from a client node. Upon receipt of the request, the smart contract traverses the blockchain to find providers that have registered themselves with the system to rent out their spare disk space. It selects the providers with enough space to satisfy the client’s request and returns them as a list sorted in ascending order of available space, starting with the provider whose free space is closest to the required amount. It does so to use the space available within the system as efficiently as possible.
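A Python sketch of this ordering, under the assumption that each registered provider is summarized by an address and its free space in bytes (the `Provider` record and the example SFTP addresses are hypothetical). Sorting the eligible providers by free space puts the tightest fit first, a best-fit strategy that keeps large providers available for large requests.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    """Hypothetical summary of a registered provider, as read from the ledger."""
    address: str     # e.g. an SFTP endpoint
    free_bytes: int  # spare space the provider has committed to the network


def match_providers(providers: list[Provider], required_bytes: int) -> list[Provider]:
    """Keep providers that can hold the request; order them from tightest fit to loosest."""
    eligible = [p for p in providers if p.free_bytes >= required_bytes]
    return sorted(eligible, key=lambda p: p.free_bytes)


providers = [
    Provider("sftp://p1.example", 50_000),
    Provider("sftp://p2.example", 8_000),
    Provider("sftp://p3.example", 12_000),
    Provider("sftp://p4.example", 4_000),
]
for p in match_providers(providers, required_bytes=6_000):
    print(p.address, p.free_bytes)
# sftp://p2.example 8000
# sftp://p3.example 12000
# sftp://p1.example 50000
```

Provider `p4` is filtered out because it cannot hold the request at all; the validator would send its ACK requests down the returned list in order.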
In this section, we detail the different entities present in our system.
- Client node:
The client is a program running on the user’s system. It provides the user with a point of access to the network and enables the user to utilize the collective storage space available within it. The user can offload a file or a set of files from his/her system onto the network of nodes.
To offload a file, the client first creates a hash of the file and then divides the file into a variable number of chunks. The client program encrypts each chunk, chooses a validator node at random, and contacts it with a request object that encapsulates all the metadata regarding the file and its chunks. Upon receiving a reply from the validator node, the client parses it, acquires the SFTP address of the provider node needed to upload the file, and starts the upload. Upon completion of the transfer, the validator node enters the transaction into the blockchain. To confirm that the file was uploaded successfully, the client node sends a request to the checker node with the hash of the file. The checker node then traverses the chain, polls the respective provider node for randomly chosen chunks, and returns an appropriate response to the client.
To download a file, the client program sends a request to the validation quorum with the filename and hash value of the file to be retrieved. Upon receiving the response, the client node parses it, asks each provider node in turn for its respective chunk, and downloads it. All the chunks are then decrypted using the client’s private key and stitched together.
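The chunking and reassembly steps can be sketched in Python as follows. This is a simplified illustration, assuming the caller chooses the chunk count and that integrity is checked against SHA-256 hashes recorded at upload time; the function names are hypothetical. Encryption is deliberately left out because the Python standard library has no authenticated cipher: in a real client each chunk would be encrypted (for example with AES-GCM from a vetted cryptography library) before being hashed and uploaded, so that checker nodes could verify the stored ciphertext without holding the key.

```python
import hashlib

# Hypothetical client helpers; encryption is omitted in this sketch.


def split_file(data: bytes, num_chunks: int) -> list[bytes]:
    """Split data into at most num_chunks pieces of near-equal size."""
    size = max(1, -(-len(data) // num_chunks))  # ceiling division, at least one byte
    return [data[i:i + size] for i in range(0, len(data), size)]


def prepare_upload(data: bytes, num_chunks: int) -> tuple[dict, list[bytes]]:
    """Hash the whole file, split it, and record a hash for every chunk."""
    chunks = split_file(data, num_chunks)
    metadata = {
        "file_hash": hashlib.sha256(data).hexdigest(),
        "chunk_hashes": [hashlib.sha256(c).hexdigest() for c in chunks],
    }
    return metadata, chunks


def reassemble(metadata: dict, downloaded: list[bytes]) -> bytes:
    """Verify each downloaded chunk, stitch them in order, then verify the whole file."""
    if len(downloaded) != len(metadata["chunk_hashes"]):
        raise ValueError("wrong number of chunks")
    for i, (chunk, expected) in enumerate(zip(downloaded, metadata["chunk_hashes"])):
        if hashlib.sha256(chunk).hexdigest() != expected:
            raise ValueError(f"chunk {i} failed its integrity check")
    data = b"".join(downloaded)
    if hashlib.sha256(data).hexdigest() != metadata["file_hash"]:
        raise ValueError("reassembled file does not match the recorded hash")
    return data


payload = b"example file contents " * 100  # 2,200 bytes
metadata, chunks = prepare_upload(payload, num_chunks=4)
print(len(chunks), [len(c) for c in chunks])    # 4 [550, 550, 550, 550]
print(reassemble(metadata, chunks) == payload)  # True
```

Per-chunk hashes let the client pinpoint which provider returned bad data, while the whole-file hash catches chunks that are individually valid but stitched in the wrong order.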
- Validator node:
It is the class of nodes that participates in the consensus protocol to process user requests and transactions and to advance the blockchain. Each validator node runs the smart contract code for every incoming request to find matching provider nodes. In effect, it acts as an orchestration service that replaces a middleman. Each validator node earns tokens for processing transactions and forging blocks.
- Provider node:
The provider node is a class of participating nodes that choose to rent out their spare disk space for tokens of value. Nodes that wish to provide storage as a service register themselves with the validation quorum. When requesting registration, provider nodes are required to “buy into” the system by staking their buy-in, which is locked for a period decided by the validation quorum based on the amount of space they are willing to offer. Once this step is completed successfully, they are registered with the system, and the details of the registration are recorded as a transaction and entered into the blockchain. Providers receive tokens as compensation for their storage space, based on how much data they have stored and for how long.
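As a worked example of paying providers according to how much data they stored and for how long, the Python sketch below charges per GiB-hour and sums over periods in which the stored amount was constant. The GiB-hour unit, the integer token rate, and the `StoragePeriod` record are hypothetical choices; the paper does not fix a pricing formula.

```python
from dataclasses import dataclass


@dataclass
class StoragePeriod:
    # Hypothetical record: a stretch of time with a constant amount of stored data.
    gib: int
    hours: int


def storage_payout(periods: list[StoragePeriod], rate_per_gib_hour: int) -> int:
    """Tokens owed = rate x total GiB-hours stored (integer token units avoid rounding)."""
    return rate_per_gib_hour * sum(p.gib * p.hours for p in periods)


# 10 GiB for 24 h, then 25 GiB for 48 h, at a hypothetical 3 token units per GiB-hour:
# 3 x (10*24 + 25*48) = 3 x 1440
print(storage_payout([StoragePeriod(10, 24), StoragePeriod(25, 48)], rate_per_gib_hour=3))  # 4320
```

Because the client is charged on the same pay-per-use basis, the same GiB-hour total could drive both the client's bill and the provider's payout.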
- Checker node:
The checker node is a class of participating nodes that act as an auxiliary to the validation quorum. The major problem with a decentralized data storage model is that there is no inherent way to know whether a storage provider node has held up its end of the contract to store the client’s data for the specified period. Introducing a checker node to complement the duties of the validation quorum can solve this. The checker node can periodically receive instructions from the validation quorum to verify that a randomly chosen piece of data is indeed stored with the respective provider node. Such a secondary verification system allows dishonest provider nodes to be penalized by seizing their stake in the system. Half of a dishonest provider’s seized stake goes to the checker as a reward.
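A Python sketch of one audit round and the penalty rule, assuming the ledger records a SHA-256 hash for every stored chunk and that the checker may request any chunk by index. `ProviderAccount`, `audit` and `penalize` are hypothetical names, and returning whole chunks is a simplification: it costs bandwidth and, on its own, says nothing about chunks that were not sampled. The text does not say where the other half of the seized stake goes, so the sketch simply returns it to the caller.

```python
import hashlib
import random
from dataclasses import dataclass, field


@dataclass
class ProviderAccount:
    # Hypothetical provider state: locked stake plus the chunks it actually keeps.
    stake: int
    chunks: dict[int, bytes] = field(default_factory=dict)


def audit(provider: ProviderAccount, ledger_hashes: dict[int, str],
          rng: random.Random, samples: int) -> bool:
    """Request randomly chosen chunks and compare them with the hashes on the ledger."""
    indices = rng.sample(sorted(ledger_hashes), k=min(samples, len(ledger_hashes)))
    for index in indices:
        data = provider.chunks.get(index)
        if data is None or hashlib.sha256(data).hexdigest() != ledger_hashes[index]:
            return False
    return True


def penalize(provider: ProviderAccount) -> tuple[int, int]:
    """Seize the whole stake; half goes to the checker, the remainder is left to the caller."""
    seized, provider.stake = provider.stake, 0
    checker_reward = seized // 2
    return checker_reward, seized - checker_reward


rng = random.Random(7)
stored = {i: f"chunk-{i}".encode() for i in range(10)}
ledger_hashes = {i: hashlib.sha256(c).hexdigest() for i, c in stored.items()}

honest = ProviderAccount(stake=1_000, chunks=dict(stored))
dishonest = ProviderAccount(stake=1_000, chunks={0: stored[0]})  # quietly dropped the rest

print(audit(honest, ledger_hashes, rng, samples=3))  # True
if not audit(dishonest, ledger_hashes, rng, samples=3):
    print(penalize(dishonest))  # (500, 500)
```

Random sampling means a provider that discards only a small fraction of its chunks may pass a single audit; repeated audits over time make it increasingly likely to be caught.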
- Token Economy
The storage providers and validator nodes need to be compensated for expending their computing resources to provide their respective services. This compensation cannot take the form of fiat currency, as that would require linking bank accounts, which would make the system ultimately depend on a central authority, defeating the whole point of decentralization.
This is the primary reason why every decentralized platform that needs to exchange something of value prefers to introduce its own currency or tokens native to the system. The value of these tokens depends on how well the system performs; i.e., the token’s value is directly proportional to the value of the underlying blockchain network.
Blockchain value can be estimated using a variation of Metcalfe’s law [9] proposed by Odlyzko and Tilly [10]. Their estimate rests on conservative assumptions and suggests that for large values of n, where n is the number of users of the system, the value of the network grows more closely in proportion to n log(n).
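To see how much the two estimates differ, compare their growth rates while ignoring the unknown proportionality constants. Metcalfe's law values a network in proportion to n²; the natural logarithm is used below, since the base of the logarithm only changes the constant.

```latex
V_{\text{Metcalfe}}(n) \propto n^{2}, \qquad
V_{\text{Odlyzko}}(n) \propto n \ln n, \qquad
\frac{V_{\text{Metcalfe}}(n)}{V_{\text{Odlyzko}}(n)} \propto \frac{n}{\ln n}

% Worked example, n = 10^6 users (\ln 10^6 \approx 13.82):
n^{2} = 10^{12}, \qquad n \ln n \approx 1.38 \times 10^{7}

% Doubling the user base:
\frac{(2n)^{2}}{n^{2}} = 4, \qquad
\frac{2n \ln(2n)}{n \ln n} = 2\left(1 + \frac{\ln 2}{\ln n}\right) \approx 2.10 \ \text{for } n = 10^{6}
```

Under the n log(n) estimate, doubling the user base roughly doubles the network's value rather than quadrupling it, which is why it is the more conservative basis for valuing the token.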
One important aspect to consider when designing a token economy is supply. A token’s supply and distribution are very important to its inherent value and to the system’s overall success. The supply of a token can be capped, like Bitcoin’s, or produced indefinitely at a steady inflation rate, like traditional fiat currency.
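The difference between a capped and an inflationary supply can be made concrete with a short Python sketch. The capped case reproduces Bitcoin's issuance schedule: a block subsidy of 50 BTC (5,000,000,000 satoshis) halved, with integer truncation, every 210,000 blocks, which sums to just under 21 million BTC. This is the upper bound implied by the schedule, not the circulating supply. The inflationary case, with its 2% rate and starting supply, is a hypothetical illustration.

```python
def bitcoin_supply_cap() -> int:
    """Total satoshis implied by Bitcoin's block-subsidy schedule.

    The subsidy starts at 50 BTC (1 BTC = 100,000,000 satoshis) and is halved,
    with integer truncation, every 210,000 blocks until it reaches zero.
    """
    subsidy = 50 * 100_000_000
    total = 0
    while subsidy > 0:
        total += 210_000 * subsidy
        subsidy //= 2
    return total


def inflationary_supply(initial: float, rate: float, years: int) -> float:
    """Hypothetical uncapped supply that grows by a constant fraction every year."""
    return initial * (1 + rate) ** years


print(bitcoin_supply_cap())  # 2099999997690000 satoshis, just under 21 million BTC
print(round(inflationary_supply(1_000_000, 0.02, 10)))  # 1218994
```

The capped supply converges to a fixed ceiling, so new issuance eventually stops, while the inflationary supply keeps growing without bound and dilutes existing holders at a predictable rate.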
In this section, we detail the limitations that the proposed model has.
- Prone to generation attacks:
A generation attack occurs when a logically separate storage provider and checker reside on the same physical node. In this way, providers could claim to be storing massive amounts of data and collect the associated token rewards when in fact they might not be storing anything at all. This can be mitigated by introducing Verifiable Random Functions [11] to refine the checker selection process.
- Sybil Attacks:
The Federated Byzantine Agreement consensus protocol allows open registration of new users, provided they have been vouched for by at least one other validator. If a malicious user were to enter the validation quorum, he/she could potentially register more malicious users and launch a DDoS [12] attack, bringing the entire network to a halt until the stakeholders resolve the issue. The FBA protocol is designed to increase decentralization while also increasing scalability.
- The provider node can hold the client’s data for ransom:
As there is no Service Level Agreement in place in a decentralized system, a provider node could potentially hold the client’s data for ransom. Although this would cost the provider its stake in the system, the client would be irrevocably impacted, which would drive down the value of the token.
- Token Regulation:
As a decentralized system has no governing or regulatory body in place to oversee the flow of value between entities, it is impossible to account for an entity’s worth in that system. Although this provides a certain level of anonymity, the legality of the system remains questionable.
A blockchain-based model will always be slower than its centralized counterpart, and storage is no exception. Although the FBA protocol drastically improves overall speed compared to other blockchain technologies, it will still be slower than a centralized provider such as Amazon S3 or Google Drive. This is the tradeoff made in exchange for security.
As the research has demonstrated, storage decentralization can eliminate most of the issues plaguing the current centralized model, the most prominent being service downtime, rising costs, and information ownership. Client-side encryption ensures data confidentiality and security. Using a validation quorum instead of a middleman reduces transaction fees, and the blockchain ensures that all transactions are accounted for and that all parties involved are fairly compensated or charged. There are a few limitations to this approach, but the advantages are significant enough to outweigh them.
[1] S. Nakamoto, “Bitcoin: A Peer-to-Peer Electronic Cash System”, 2008.
[2] V. Buterin, “Ethereum: A Next-Generation Smart Contract and Decentralized Application Platform”, 2013.
[3] Storj Labs Inc., “Storj: A Decentralized Cloud Storage Network Framework”, 2018.
[4] Protocol Labs Inc., “Filecoin: A Decentralized Storage Network”, 2017.
[5] L. Lamport, R. Shostak, M. Pease, “The Byzantine Generals Problem”, in ACM Transactions on Programming Languages and Systems, 1982.
[6] M. J. Fischer, N. A. Lynch, M. S. Paterson, “Impossibility of Distributed Consensus with One Faulty Process”, in Journal of the ACM, 1985.
[7] D. Mazières, “The Stellar Consensus Protocol: A Federated Model for Internet-level Consensus”, 2017.
[8] R. C. Merkle, “A Digital Signature Based on a Conventional Encryption Function”, in CRYPTO ’87: A Conference on the Theory and Applications of Cryptographic Techniques on Advances in Cryptology, 1987.
[9] T. Peterson, “Metcalfe’s Law as a Model for Bitcoin’s Value”, in Alternative Investment Analyst Review, 2018.
[10] A. Odlyzko, B. Tilly, “A Refutation of Metcalfe’s Law and a Better Estimate for the Value of Networks and Network Interconnections”, 2005.
[11] D. Hofheinz, T. Jager, “Verifiable Random Functions from Standard Assumptions”, 2015.
[12] M. Malik, Y. Singh, “A Review: DoS and DDoS Attacks”, in International Journal of Computer Science and Mobile Computing, 2015.