When transferring files between systems, rsync or scp is usually all that is needed. Sometimes, though, there are difficult edge cases, such as using rsync to quickly transfer many files spread across thousands of directories. Recently, I needed to move several TB of files of varying sizes spread across thousands of directories. I found that rsync spent most of its time traversing directories and not copying the data. I turned to the webs to find a better solution. The excellent solution I found had an old-school feel, using tar and netcat (nc).
The solution seemed too simple to work efficiently, especially considering how slow rsync had been at transferring the files. I was quite wrong, really wrong. During my first attempt, rsync had been using about 20% - 40% of a 1 Gb/s network link. With tar | nc, I was now seeing 80% - 95% usage on the same link.
The only issue I had with tar | nc was doing incremental updates. There is probably a way to do an incremental backup with tar, but for my usage rsync was acceptable once the initial copy was completed.
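As it turns out, GNU tar can do incremental archives by tracking state in a snapshot file (the -g / --listed-incremental option). The following is only a minimal sketch, assuming GNU tar; the scratch directory and file names are made up for illustration, and this is not what was used for the transfer described here.

```shell
# Hypothetical scratch area; nothing here touches real data.
work=$(mktemp -d)
mkdir -p "$work/src"
echo "first" > "$work/src/a.txt"

# Level 0: a full archive. tar records what it saw in the snapshot file.
tar -C "$work" -c -g "$work/state.snar" -f "$work/full.tar" src

# Something changes after the initial copy.
echo "second" > "$work/src/b.txt"

# Level 1: reusing the same snapshot file makes tar archive only
# entries that are new or changed since the previous run.
tar -C "$work" -c -g "$work/state.snar" -f "$work/incr.tar" src

# List the incremental archive; a.txt should not appear in it.
tar -tf "$work/incr.tar"
```

In a pipeline like the one below, the archive file would be replaced with -f - so the incremental stream can be piped to nc the same way as the full copy.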
How to Use This Awesomeness
On the source server you will be running tar | pv | pigz | nc.
tar: archives the directory.
pv: shows transfer progress; given the total size, it can estimate how much is left to transfer.
pigz: a parallel gzip compression utility.
nc: listens on TCP port 9999 and sends the data to the receiving system when it connects.
nc does not have any authentication or encryption. I do not recommend using this over the Internet without stunnel or a VPN.
- You will also need to ensure that TCP port 9999 is open between the source and destination.
Open the port and set the directory on the source server:
tar -c /some/dir/to/copy | pv --size $(du -sb /some/dir/to/copy | cut -f1) | pigz -5 | nc -l 9999
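The --size argument only gives pv a total to measure progress against. To sanity-check that number before kicking off a long transfer, something like the following works. This is a sketch assuming GNU du (for the -b flag); the scratch directory and the total_bytes variable are hypothetical stand-ins.

```shell
# Hypothetical scratch directory standing in for /some/dir/to/copy.
src=$(mktemp -d)
head -c 1048576 /dev/zero > "$src/one_mib.bin"

# du -sb prints "<bytes><TAB><path>"; cut keeps only the byte count.
total_bytes=$(du -sb "$src" | cut -f1)
echo "pv would be told to expect $total_bytes bytes"
```

The tar stream is slightly larger than what du reports because of archive headers and padding, so the percentage shown by pv is an estimate rather than an exact figure.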
- On the destination server, simply change to the directory you want the files in. Then, connect to the source server. This is basically reversing what is happening in step 1.
nc: connects to the source_server on TCP port 9999.
gunzip: decompresses the data that pigz compressed on the source server.
tar: unarchives the data to disk.
Connect to the source server and start the transfer
cd /some/dir/to/store/files
nc source_server 9999 | gunzip | tar xvf -
What is happening
Essentially, netcat extends the pipe across the network: it connects stdout on the source system to stdin on the receiving system. The above is basically the same thing as stringing commands together with | on a single machine.
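To make that concrete, here is the same pipeline with the nc hop removed, so both halves run on one machine joined by an ordinary pipe. It is a minimal sketch under a few assumptions: the scratch directories and file names are hypothetical, and plain gzip stands in for pigz (pigz writes standard gzip output, so the receiving side is unchanged).

```shell
# Hypothetical scratch directories for the "source" and "destination".
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir -p "$src/data/sub"
echo "hello" > "$src/data/sub/file.txt"

# Sender half (tar | gzip) piped straight into the receiver half
# (gunzip | tar); in the real transfer, nc sits in the middle.
tar -c -C "$src" -f - data | gzip -5 | gunzip | tar -x -C "$dst" -f -

# diff -r is silent when the two trees are identical.
diff -r "$src/data" "$dst/data" && echo "trees match"
```

Running the halves locally like this is also a cheap way to test the tar options on a small directory before involving the network.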
Thanks to Andrew’s post for the inspiration and saving me a few days.