Mentors
- Inbasekaran.P
- Mohan Gunakara Nayak
- R S Muthukumar
Members
- Fahim Ahmed
- Vignaraj Pai
- Vedant Tarale
Aim
BitTorrent is a peer-to-peer (P2P) file sharing protocol that allows users to share large files with each other. BitTorrent clients are applications that implement the protocol and allow users to download and upload files using the BitTorrent network. In this project, we have implemented a BitTorrent client in Rust programming language with a GUI built using React and tauri framework.
Introduction
BitTorrent is built on the principle of not downloading files from just one machine, instead, it encourages us to download pieces of the file concurrently from multiple machines. Therefore, it allows faster download, upload load gets distributed across, better utilization of download capacity of a machine and even when a large number of nodes join the network, the load is increased only by a small margin. We used Rust Programming Language because it is a modern, safe, and high-performance programming language that is suitable for building systems-level applications. It provides memory safety and thread-safety by enforcing strict rules at compile time, which makes it an ideal choice for building network applications like BitTorrent clients.
Methodology
- Research and planning: The initial step in developing the torrent client was to conduct thorough research and plan the project’s scope, objectives, and requirements. We researched various torrent clients, their features, and their technical aspects to identify the best practices and approaches to implement in our client. We also identified the algorithms necessary to build the client, such as bencode, choke algorithm, piece selection algorithm, distributed hash table implementation (Kademlia), and multithreading.
- Requirements gathering: Once the research and planning phase were completed, we gathered the requirements for the torrent client. We identified the key features and functionalities that the client must have, such as downloading and uploading files, peer-to-peer communication, and error handling. We also determined the hardware and software requirements for running the client.
- Design and architecture: After gathering the requirements, we designed the client’s architecture and created a detailed design document. We identified the components and modules necessary to build the client and determined how they would interact with each other. We also identified the data structures required for storing and managing the torrent metadata.
- Implementation: Once the design was finalized, we began implementing the client using the Rust programming language. We used the algorithms identified in the research phase, such as bencode, choke algorithm, piece selection algorithm, distributed hash table implementation (Kademlia), and multithreading, to implement the client’s features and functionalities. We also used Rust’s features, such as ownership and borrowing, to ensure memory safety and prevent common programming errors.
Bencoding
- Bencode is a binary serialization format used to encode and decode data in a structured way. It is a simple and compact encoding scheme that is typically used in BitTorrent to represent metadata files, such as .torrent files.
- The bencode format consists of four data types: integers, byte strings, lists, and dictionaries. Integers are represented by the letter ‘i’ followed by the integer value and then the letter ‘e’. Byte strings are represented by their length followed by a colon and then the actual string data. Lists are represented by the letter ‘l’ followed by the encoded elements of the list, and then the letter ‘e’. Dictionaries are represented by the letter ‘d’ followed by the encoded key-value pairs, and then the letter ‘e’.
- The bencode format is a self-describing format, meaning that the structure of the data is encoded along with the data itself. This makes it easy to parse and decode bencoded data without needing to know anything about the specific data being encoded.
- In a torrent client, bencode is used extensively for decoding and encoding torrent files, which contain all the necessary information about the file being downloaded, such as the file name, size, hash, and tracker URLs.
- When a torrent file is downloaded, it is first parsed using the bencode format to extract the metadata. The metadata contains the necessary information about the file to be downloaded, which includes the hash of the files, the number of pieces, and the size of each piece. Once the metadata is extracted, the actual file download can begin.
- Similarly, when a file is being shared as a torrent, its information is first encoded using bencode format and then included in the torrent file. When other peers download the torrent file, they can extract the encoded information and use it to download the file.
- In our project, bencode is crucial for decoding and encoding torrent files, which is an essential part of the torrent client.
Choke Algorithm
BitTorrent works on a P2P network , hence there is no central resource allocation Unit. Thus, peers would naturally try to download from whoever it can. The choke algorithm was introduced to guarantee a reasonable upload and download reciprocation. Choking is a temporary refusal to upload. We say that peer A has choked peeps because CHOKED peer B is unable to download files from A, but peer A can download from peer B.
Choking is necessary because
- TCP congestion when we send across many connections at once.
- prevent network abuse and starvation
Any peer will upload to peers ,who give the best download rate. Every 30 seconds, one additional interested peer is unchoked at random to promote fairness to new peers. Seeder is not unchoked based on upload rate instead it is based on time they were last unchoked.
Peers in the active peer set are changed regularly at random.
Piece Selection Algorithm
When a file is downloaded through a torrent, the file is split into many small pieces, and each piece can be downloaded from multiple users(seeders).A piece is a unit of transmission and when a user has all the pieces it concatenates them and re-creates the file.
Having a piece selection strategy is important as it is highly inefficient if all the peers start with the same piece. This would increase the load on the peers/seeders that have the piece, ultimately resulting in reduced distribution speed.There are different algorithms that can be used for piece selection, but the most common one is the “rarest first”.
The core idea of the “Rarest First” Algorithm is to prioritize the download of the piece that is rarest in the network. This strategy ensures that only ‘new’ pieces are downloaded from the seeder. Doing so results in faster distribution of the piece in the network.
Peers having a rare piece interest other peers and thus they are frequently unchocked by the other peers therefore establishing a better download speed.
To implement the ‘Rarest First’ Algorithm, the client maintains a list of all the pieces of the file and the number of peers that have each piece. The rarest piece is requested first and as more peers download those pieces, they become less rare and thus have a lower priority now. The algorithm continues to request and download the rarest pieces until the download is complete.
Distributed Hash Table implementation (Kademlia)
To get information about peers, a node in the BitTorrent network talks to Tracker. Having a central entity is still prone to attacks and failures. Say, we have a gigantic set of KV pairs that one node cannot store or handle. Hence we use a Distributed Hash Table.
Every node (machine) participating gets a unique 160bit ID. The node that is closest to the key, owns it. In order to find the “closest” node we need a distance metric that quantifies the closeness. For two nodes/keys in our Kademlia distribution, the distance metric is d(x,y) = X ⊕ y
Every node in the network would need to keep track of a few nodes, and hope they keep track of others, and so on. Every node knows at least one node in each subtree that it is not part of. Thus each node only has to keep track of ip port, node id a small subset of nodes and the routing takes care of converging to the target node.
Every node for each subtree, holds k entries and each K-bucket is sorted by time last seen when a node receives any message from other node in the network, it updates its appropriate k-bucket with the node id.
There are some network remote procedure calls (RPCs) available in Kademlia that help us effectively use it. A GET(key) operation is a computer lookup at key followed by a FIND_VALUE() call for the k closest computers to key by id. A PUT(key, value) operation is a computer lookup at key followed by a STORE(key) operation at the k closest computers
Multithreading
A process is divided into several different subtasks called threads, which has its own path of execution. This concept is called multithreading. Multithreading is required in a BitTorrent client to enable parallel downloading and uploading of multiple pieces of a file from and to different peers. BitTorrent is a peer-to-peer (P2P) file sharing protocol, where users share pieces of a file with each other rather than downloading the whole file from a single server. A file shared on a BitTorrent network is split into multiple small pieces, and each piece is downloaded from multiple peers simultaneously.
Multithreading allows the client to connect to multiple peers simultaneously and download different pieces of the file from different peers at the same time, making the download faster and more efficient. Additionally, multithreading also allows the client to upload different pieces of the file to different peers simultaneously, thus increasing the client’s contribution to the network and improving the overall download speed for all peers.
Without multithreading, the client would have to download each piece of the file sequentially, one piece at a time, and would only be able to upload one piece to one peer at a time, which would significantly slow down the download process.
Rust provides a lightweight green threading implementation called “async/await” which enables lightweight task management without the overhead of full threads. Overall, Rust’s focus on memory safety, ownership, and safe concurrency makes it easier to write multithreaded programs that are fast, efficient, and reliable.
Conclusion
In this project, we have implemented a BitTorrent client in Rust programming language. We have demonstrated the advantages of Rust for building network applications by implementing a modular, high-performance, and robust client. We believe that our implementation can serve as a useful reference for building BitTorrent clients in Rust.