Saturday, November 12, 2011

Distributing Large Files

One of the requirements of my new project is to distribute large files to 10-15 clients as fast as possible. Fortunately, all NICs (server and clients) support Gigabit connectivity.
I first tested it by simply copying a large file from the server to many clients simultaneously. Of course, it was a disaster. In theory, the copy speed per client should drop in proportion to the number of clients; in reality, the connections between the clients and the server, and also between the clients themselves, were disrupted, and the network was down for some time. I did not investigate further what was happening. All I know is that even if it had gone well, this would be far from the desired speed.
What's next? I gathered some information on how to distribute large files over a LAN. Suggestions include BitTorrent, a parallel filesystem, multicast copy, (win) mesh, and a few other approaches. My concern is that the solution has to be as simple as possible, since there will not be an expert IT person around in the future.
While BitTorrent sounds pretty promising, I had to leave this option out. Not only because of the skill level of the people who will maintain the system, but also because it would take some time to prepare each large file for distribution via BitTorrent. Another concern is that the many small torrent packets could saturate the network.

As for a parallel filesystem, I have to ditch this one too, because it would be more complicated and confusing for a basic admin. I should also mention that all clients run Windows 7, and I am not sure there is stable parallel filesystem support for it.

Multicast copy would be interesting, but I could not find any apps for Windows 7.

So I figured I need to build my own simple solution. The goal is to distribute large files, around 1-5 gigabytes each, without overloading the connection between the source (server) and the targets (PCs).
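To put that goal in numbers, here is a rough back-of-envelope sketch in Python. It assumes decimal gigabytes, an ideal 1 Gbit/s link with no protocol or disk overhead, and a server link shared evenly among all clients copying at once; the function name `ideal_seconds` is just something made up for this illustration. Real transfers will be slower.

```python
def ideal_seconds(size_gb, clients_sharing_link=1, link_gbit=1.0):
    """Best-case time for every client to receive one file of size_gb.

    Assumes the server's single link is split evenly among the clients
    and ignores all overhead, so this is a lower bound, not a prediction.
    """
    gigabits = size_gb * 8
    return gigabits * clients_sharing_link / link_gbit


if __name__ == "__main__":
    print(ideal_seconds(5))      # one client, 5 GB file
    print(ideal_seconds(5, 15))  # 15 clients pulling from the server at once
```

Under these assumptions a 5 GB file needs at least 40 seconds to reach a single client, and at least 600 seconds when 15 clients share the server's one link.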

I came up with a kind of chain distribution: transfer the file from the server to PC1, then from PC1 to PC2, and so on. This way, each PC can use its full incoming gigabit bandwidth as well as its outgoing one. I tested it and it worked quite well. Of course, there is a problem if one of the PCs is down: the distribution cannot continue to the PCs after it in the chain. So rerouting must be done, either manually or automatically.
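The chain and its automatic rerouting could be planned roughly like this. It is only a sketch: the host names and the `is_up` callback are hypothetical, and how a PC is detected as down (ping, share check, etc.) is deliberately left out.

```python
def plan_chain(server, clients, is_up):
    """Return the list of (source, target) hops for a chain distribution.

    Clients reported as down are skipped, so the next live PC receives
    the file from the last live machine before it.
    """
    hops = []
    source = server
    for client in clients:
        if not is_up(client):
            continue  # reroute around the dead PC
        hops.append((source, client))
        source = client  # this PC feeds the next one in the chain
    return hops


if __name__ == "__main__":
    down = {"PC2"}  # hypothetical: pretend PC2 is switched off
    chain = plan_chain("SERVER", ["PC1", "PC2", "PC3", "PC4"],
                       lambda host: host not in down)
    for src, dst in chain:
        print(src, "->", dst)
```

With PC2 down, PC3 receives the file from PC1 instead, and the rest of the chain continues as before.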

As for the tool to copy the data, one can use a plain copy, robocopy, SyncToy, or other mechanisms. I am planning to use command-line robocopy.
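A single transfer from one machine to the next could be wrapped as below. This is a sketch under assumptions: the share name `dist` and the local folder `C:\dist` are made up, and the command is meant to run on the receiving PC so that it pulls from the machine before it. If it ran on a third machine, the data would pass through that machine. The switches used are standard robocopy options: `/Z` (restartable mode), `/R:n` (retry count) and `/W:n` (seconds to wait between retries).

```python
import subprocess


def robocopy_pull_command(source_host, filename, share="dist", local_dir="C:\\dist"):
    """Build the robocopy command that pulls one file from source_host.

    The share name and local_dir are assumptions for this sketch.
    """
    source_dir = "\\\\" + source_host + "\\" + share  # UNC path, e.g. \\PC1\dist
    return ["robocopy", source_dir, local_dir, filename, "/Z", "/R:3", "/W:5"]


def robocopy_succeeded(exit_code):
    """Robocopy exit codes below 8 mean no copy failure occurred."""
    return exit_code < 8


def pull_file(source_host, filename):
    """Run the copy on this (receiving) PC. Windows only."""
    result = subprocess.run(robocopy_pull_command(source_host, filename))
    return robocopy_succeeded(result.returncode)
```

Note that robocopy does not return 0 on every success, so checking for a plain zero exit code would wrongly report a good copy as failed.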

I also plan to use a load-balancing technique with both NICs on the server, so I can create two chains within the network and run the transfers on both chains simultaneously.
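Splitting the clients into one chain per server NIC could look like the sketch below. The round-robin split is just one simple choice that keeps both chains about the same length, and the host names are hypothetical. Making each chain actually use a different NIC (for example by giving each NIC its own address) is a network setup detail not covered here.

```python
def split_into_chains(clients, n_chains=2):
    """Deal the clients out round-robin into n_chains ordered chains."""
    return [clients[i::n_chains] for i in range(n_chains)]


if __name__ == "__main__":
    pcs = ["PC1", "PC2", "PC3", "PC4", "PC5"]
    for number, chain in enumerate(split_into_chains(pcs), start=1):
        # Each chain starts at the server and passes the file PC to PC.
        print("chain", number, ":", " -> ".join(["SERVER"] + chain))
```

Each chain is then half as long as a single chain through all the PCs, so the last PC in each chain gets its copy sooner.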
