Project Jehoover: file sharing for mobbed wifi networks.
Summary:
During the
SoS #1 in London, we discovered that even a few peers uploading and downloading files accross a wireless segment completely hosed the bandwidth for everyone on the network.
Using wires and a switch is a great temporary solution, but it isn't exactly discreet, and someone has to bring a battery to power the switch. So, we're talking about building a scheduling system for UDP broadcast file sharing in order to optimise file sharing over a multi-peer wireless segment such as you'd find on a wi-fi hotspot.
Plan:
This is a peer-to-peer file sharing application which will run a UDP client/server in a fast event loop using Twisted Python. A GUI that works more or less equally on Windows, OS X, Linux, BSD, etc. will be built using Glade-2 and PyGTk 2.
Peers:
Each peer would have a list of directories that it's sharing. Every so often, each peer would broadcast UDP announcements that contain file metadata. Peers listen for other peer's metadata broadcasts, and present this information to the user, who can then select which ones they want to download. Based on the user's selections, peers announce their preferences for file broadcasts (ie. which files they want, in order of preference), ranked from 1-N.
Protocol:
The Jehoover protocol currently supports three message types: ANNOUNCE, BROADCAST, and REQUEST. Data is transferred in 1k blocks. Files are identified by their MD5 sum.
- All Jehoover packets begin with a 32-bit message type and a 32-bit peer ID.
- ANNOUNCE packets consist of a variable-length list of (file ID, 32-bit file size in blocks, relative file path).
- BROADCAST packets consist of the MD5 sum of the file being sent, the 32-bit block number, the MD5 sum of the block, and the block itself.
- REQUEST packets consist of a variable-length list of (file ID, 32-bit block number).
Interaction:
Peers have two states, LISTEN and SEND. On joining the network, peers select a random 32-bit peer ID and enter the LISTEN state.
LISTEN
SEND
- Send one or more ANNOUNCE messages for some of the files we're sharing.
- Wait 500 ms for other ANNOUNCE messages. If an ANNOUNCE message is heard, check the sender ID. If the sender's ID is less than our ID but greater than the last sender ID, then LISTEN.
- If we have blocks queued to broadcast, BROADCAST n of them.
- Send a single REQUEST message for some of the blocks we want.
- LISTEN.
User Interface:
The user interface has 5 displays:
- An ‘outbox' which displays folders you want to share.
- An ‘inbox' which shows folders announced by other peers along with the contents of each folder, and the size of each file.
- A ‘wanted' window to which you can add files you want, and which allows you to shuffle the order in terms of you preference.
- A ‘what's currently being broadcast' process bar that shows you how much of the currently broadcast file you've downloaded, and a ‘kill' button allowing to stop that download
- A display showing files you've downloaded. Downloaded files are automatically added to the list of shared files once their checksum is verified.
Cute Things
- Because one file is being scheduled at a time, you could even have a little bit of music – a party tune perhaps, that plays on the laptop of the person sharing the current file. Then everyone can turn to that person and admire them for bringing such a wonderful bit of data.
- The Jehoover protocol would easily support an IRC-like chatroom environment, for discussing what the heck the files actually are.
Technical Limitations
- Jehoover will have to rely on the OS to provide a local IP address, e.g. via DHCP, zeroconf, whatever.
- Word has it that performance on 802.11b is optimized when the packet size matches the MTU size, which defaults to 1500. Accounting for protocol overhead (e.g. SHA-1 sums , which take up 20 bytes), this makes 1400 - 1450 bytes the apparently optimal UDP block size.
- SHA-1 was recently broken but better hashing algorithms aren't shipped with Python. It'll have to do. MD5 would probably work just as well (and maybe we should use it instead to save CPU time).
- We could use ?BitTorrent-like accounting to decide which blocks to send rather than simply queuing all requests.