When transferring files between systems, rsync or scp is usually all you need. Sometimes, though, there are difficult edge cases, like using rsync to quickly transfer many files spread across thousands of directories. Recently, I needed to move several TB of files of varying sizes spread across thousands of directories, and I found that rsync spent most of its time traversing directories rather than copying data. I turned to the web for a better solution, and the excellent one I found had an old-school feel: tar and netcat (nc).

The solution seemed too simple to work efficiently, especially considering how slow rsync had been at transferring the files. I was quite wrong, really wrong. During my first attempt, rsync had been using about 20%-40% of a 1 Gb network link. With tar | nc I was now seeing 80%-95% usage on the same link.

The only issue I had with tar | nc was incremental updates. There is probably a way to do incremental backups with tar, but for my purposes rsync was acceptable once the initial copy was complete.
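
For what it's worth, the follow-up incremental passes were just plain rsync runs. A minimal sketch, with illustrative paths and assuming SSH access between the hosts:

# Incremental pass after the initial tar | nc copy; paths are illustrative
rsync -av /some/dir/to/copy/ dest_server:/some/dir/to/store/files/copy/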

How to Use This Awesomeness

# Source server: tar up the directory, show progress, compress, and listen on port 9999
tar -c /some/dir/to/copy | pv --size "$(du -sb /some/dir/to/copy | cut -f1)" | pigz -5 | nc -l 9999
# Destination server: connect to the source, decompress, and extract
cd /some/dir/to/store/files
nc source_server 9999 | gunzip | tar xvf -
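
A couple of notes on the commands above. The nc -l 9999 syntax works with the OpenBSD netcat and ncat; traditional netcat wants nc -l -p 9999 instead. pv and pigz are both optional: pv only adds a progress bar, and pigz writes standard gzip output, which is why plain gunzip can decompress it on the other end. If the data is already compressed or the CPU is the bottleneck, the pipeline works fine without compression. A sketch of that stripped-down variant, using the same illustrative paths and port:

# Source server: stream the tarball straight over the wire, no compression
tar -c /some/dir/to/copy | nc -l 9999
# Destination server: connect and extract
cd /some/dir/to/store/files
nc source_server 9999 | tar xvf -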

What is happening

Essentially, netcat extends a pipe across the network: it connects stdout on the source system to stdin on the receiving system. The above is basically the same thing as stringing commands together with | on a single machine, except the pipe runs between two servers.
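
You can see the same idea with nothing but netcat itself. A minimal sketch with hypothetical hostnames, using the same port as above:

# Host A (listener): whatever comes in on stdin goes out over the socket
echo "hello from A" | nc -l 9999
# Host B: connect to host A; what was piped into the listener prints here, just like a local pipe
nc host_a 9999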

Credit

Thanks to Andrew’s post for the inspiration and for saving me a few days.
