When transferring files between systems, rsync or scp is usually all that is needed. Sometimes, though, there are difficult edge cases, such as using rsync to quickly transfer many files spread across thousands of directories. Recently, I needed to move several TB of files of varying sizes spread across thousands of directories. I found that rsync spent most of its time traversing directories rather than copying data. I turned to the web to find a better solution. The excellent solution I found had an old-school feel, using tar and netcat (nc).
The solution seemed too simple to work efficiently, especially considering how slow rsync had been at transferring the files. I was quite wrong. During my first attempt, rsync had been using about 20% to 40% of a 1 Gb network link. With tar | nc, I was now seeing 80% to 95% usage on the same link.
The only issue I have with tar | nc is incremental updates. There is probably a way to do an incremental backup with tar, but for my usage rsync was acceptable once the initial copy was completed.
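For what it is worth, GNU tar does have an incremental mode. The following is only a minimal local sketch, assuming GNU tar (the --listed-incremental option is not available in BSD tar); the scratch paths and the state.snar snapshot file name are made up for illustration, and this is not what I used for the transfer described here.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch tree standing in for the real data (hypothetical paths).
work=$(mktemp -d)
mkdir -p "$work/src"
echo "one" > "$work/src/a.txt"

# Full (level 0) archive. GNU tar records what it saw in the snapshot file.
tar -C "$work/src" --listed-incremental="$work/state.snar" -cf "$work/full.tar" .

# A file that appears after the initial copy.
echo "two" > "$work/src/b.txt"

# Second run against the same snapshot file: only changes are archived.
tar -C "$work/src" --listed-incremental="$work/state.snar" -cf "$work/incr.tar" .

# List what ended up in the incremental archive.
listing=$(tar -tf "$work/incr.tar")
echo "$listing"
```

To stream instead of writing archive files, -f - should let the same commands feed a compression and nc pipeline. On the receiving side, the GNU tar manual suggests extracting incremental archives with --listed-incremental=/dev/null.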
How to Use This Awesomeness
-
On the source server you will be running tar | pv | pigz | nc.
- tar: archives the directory.
- pv: shows progress, using the total size to estimate how much is left to transfer.
- pigz: is a parallel gzip compression utility.
- nc: listens on TCP port 9999 and sends the data to the receiving system when it connects.
- nc does not have any authentication or encryption. I do not recommend using this over the Internet without stunnel or a VPN.
- You will also need to ensure that TCP port 9999 is open between the source and destination.
-
Open the port and set the directory on the source server
tar -c /some/dir/to/copy | pv --size "$(du -sb /some/dir/to/copy | cut -f1)" | pigz -5 | nc -l 9999
-
On the destination server, simply change to the directory you want the files in. Then, connect to the source server. This is basically the reverse of step 1.
- nc: connects to the source_server on TCP port 9999.
- gunzip: decompresses the data that pigz compressed on the source server.
- tar: unarchives the data to disk.
-
Connect to the source server and start the transfer
cd /some/dir/to/store/files
nc source_server 9999 | gunzip | tar xvf -
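The --size value that pv receives in step 1 has to be a plain number of bytes, which is what the du | cut part computes. Here is a small sketch of that calculation on its own, assuming GNU du (where -b reports apparent size in bytes). The helper name dir_bytes and the sample file are hypothetical and only there for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# dir_bytes is a hypothetical helper: it prints the total size of a
# directory in bytes. Assumes GNU du (-s summarize, -b bytes).
dir_bytes() {
  du -sb "$1" | cut -f1
}

# Throwaway directory with a 1000-byte file to measure.
demo=$(mktemp -d)
head -c 1000 /dev/zero > "$demo/sample.bin"

# $(...) runs the command and substitutes its output, so pv would
# receive a number rather than the literal text of the command.
size=$(dir_bytes "$demo")
echo "pv would be told to expect $size bytes"
```

The number is only an estimate of the stream size: tar adds headers and padding on top of the file contents, so the progress bar may finish slightly off from 100%.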
What is happening
Essentially, netcat extends the pipeline across the network: it connects stdout on the source server to stdin on the destination server. The result is basically the same as stringing commands together with | on a single machine.
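To make that idea concrete, here is a local sketch in which a named pipe stands in for the TCP connection that nc provides. It assumes only tar, gzip, and mkfifo are available (gzip takes the place of pigz), and the scratch directory names are made up for illustration. Nothing here touches the network.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch directories standing in for the two servers (hypothetical paths).
work=$(mktemp -d)
mkdir -p "$work/source/data/sub" "$work/dest"
echo "hello" > "$work/source/data/sub/file.txt"

# A named pipe plays the role of the connection between the two nc processes.
mkfifo "$work/link"

# "Source server": archive and compress, writing into the link.
( tar -C "$work/source" -cf - data | gzip -5 > "$work/link" ) &

# "Destination server": read from the link, decompress, and unpack.
( cd "$work/dest" && gunzip < "$work/link" | tar -xf - )

# Wait for the background sender to finish, then show the copied file.
wait
cat "$work/dest/data/sub/file.txt"
```

Swapping the named pipe for nc -l on one side and nc source_server on the other gives the two-machine version from the steps above.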
Credit
Thanks to Andrew’s post for the inspiration and saving me a few days.