08 April 2008 @ 04:17 pm
dllargefile: Resumable rsync of large files  
Before today the timestamp for dllargefile was 2006-01-13, but while preparing this article I modified it to remove some hard-coding. This script is a wrapper around rsync for downloading large files. It is specifically designed to resume efficiently after the download is interrupted. I used it to download some 1-2 GB movies from a friend's computer over RoadRunner cable internet service.

BEGIN dllargefile
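#!/bin/bash

# rsync's --bwlimit is in KBytes per second
BWLIMIT=20

function usage {
        echo "usage: $0 hostname:filename [hostname:filename ...]" >&2
        exit 1
}

function dllargefile {
        local SPEC HOST FILE QFILE LOCAL
        SPEC=$1
        # Split on the first colon only, so filenames may contain colons.
        HOST=${SPEC%%:*}
        FILE=${SPEC#*:}
        if [ "$HOST" = "$SPEC" ] || [ -z "$HOST" ] || [ -z "$FILE" ]
        then
                usage
        fi

        # Backslash-escape spaces and apostrophes for the remote shell.
        QFILE=$(printf '%s' "$FILE" | sed -e "s/[ ']/\\\\&/g")

        # rsync saves the file as ./<basename>; if a partial copy is already
        # there, make it writable so the download can resume into it.
        LOCAL=${FILE##*/}
        if [ -e "$LOCAL" ]
        then
                chmod u+rw -- "$LOCAL"
        fi

        rsync -v --progress --partial --bwlimit="$BWLIMIT" --inplace "$HOST:$QFILE" .
}

# With no arguments, print usage instead of silently doing nothing.
[ $# -gt 0 ] || usage

STATUS=0
while [ $# -gt 0 ]; do
        dllargefile "$1" || STATUS=1
        shift
done
exit $STATUS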


END dllargefile 

EXAMPLE: dllargefile "impson.tzo.com:Star Wars: The Legacy Revealed.avi"

Let's see... It limits the bandwidth used to 20 KB/s (rsync's --bwlimit is in kilobytes per second, so that's roughly 160 kbit/s). The assumption is that a file this big will take so long to download that you've resigned yourself to letting it run overnight (picking up where it left off if it's interrupted and re-run), and in the meantime you'd rather not use up all your bandwidth.
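
To put that limit in perspective, here is a rough transfer-time estimate. It's a sketch with a hypothetical helper, transfer_hours, that takes a size in MB and a rate in KB/s (the unit rsync's --bwlimit uses), treats 1 MB as 1024 KB, and ignores protocol overhead:

```shell
#!/bin/sh
# Hypothetical helper: estimated hours to move SIZE_MB megabytes at RATE KB/s.
# Assumes 1 MB = 1024 KB and ignores rsync/ssh protocol overhead.
transfer_hours() {
        awk -v mb="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", mb * 1024 / rate / 3600 }'
}

transfer_hours 1024 20   # 1 GB at 20 KB/s, prints 14.6
transfer_hours 2048 20   # 2 GB at 20 KB/s, prints 29.1
```

So at this setting even a 1 GB movie needs more than a single night, which is why being able to resume matters.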

You invoke it much as you'd invoke rsync or scp, using the 'host:file' style to specify which file to get from which host, except that you don't specify a destination: it always downloads into the current directory.
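
Splitting a 'host:file' argument correctly matters when the filename itself contains colons, as in the example above. A minimal sketch, assuming a hypothetical helper split_spec that sets the globals host and file, using POSIX parameter expansion to split on the first colon only:

```shell
#!/bin/sh
# Hypothetical helper: split "host:file" on the FIRST colon.
# Sets the globals $host and $file; any later colons stay in $file.
split_spec() {
        spec=$1
        host=${spec%%:*}   # drop the longest ":..." suffix, leaving the host
        file=${spec#*:}    # drop the shortest "...:" prefix, leaving the file
}

split_spec "impson.tzo.com:Star Wars: The Legacy Revealed.avi"
printf 'host=%s\nfile=%s\n' "$host" "$file"
```

A greedy pattern such as sed's `s/.*://` would instead strip everything up to the last colon, leaving only " The Legacy Revealed.avi".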

The call to sed that builds $QFILE escapes spaces and apostrophes, which show up a lot in my multimedia filenames and are hard to escape correctly for rsync and scp (the remote shell would otherwise split the name at each space).
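
The escaping can be tried on its own. This is a sketch with a hypothetical helper, escape_remote_path, that applies the same substitution as one inline sed expression: every space or apostrophe is prefixed with a backslash.

```shell
#!/bin/sh
# Hypothetical helper: backslash-escape spaces and apostrophes so a remote
# shell treats the whole filename as a single argument.
# In sed's replacement, "\\" is a literal backslash and "&" is the matched character.
escape_remote_path() {
        printf '%s\n' "$1" | sed -e "s/[ ']/\\\\&/g"
}

escape_remote_path "Bob's Movie Night.avi"   # prints Bob\'s\ Movie\ Night.avi
```

Only those two characters are handled; other shell metacharacters in a filename would still need their own escaping.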

The partially downloaded file is made writable if it already exists, because rsync usually marks a file read-only until it's fully downloaded (after which it sets the original file permissions). That matters if the download gets interrupted and restarted. The --partial flag tells rsync to keep the incomplete file if the download is interrupted. Finally, the --inplace flag tells rsync to write directly into the destination file instead of using a temporary file. Together, these three things (the chmod and the two rsync flags) let dllargefile resume after an interruption.
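
The chmod step can be exercised without rsync. A minimal sketch, assuming a hypothetical helper prepare_resume that makes an existing partial file owner-writable and prints how many bytes are already on disk (0 if nothing has been downloaded yet):

```shell
#!/bin/sh
# Hypothetical helper: get a partial download ready to be resumed into.
# If the file exists, make it owner-readable/writable and print its size in
# bytes; otherwise print 0.
prepare_resume() {
        f=$1
        if [ -e "$f" ]; then
                chmod u+rw -- "$f"
                wc -c < "$f" | tr -d ' '   # tr strips padding some wc versions add
        else
                echo 0
        fi
}
```

Knowing the size already present is only informational here; rsync itself decides what to resend by comparing the existing file with the remote one.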

One thing it doesn't do, but could, is detect interruptions and retry (perhaps after a delay) until the download succeeds.
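
Such a retry could look roughly like this. It's a sketch with a hypothetical retry function; the attempt cap and delay are arbitrary knobs, not part of dllargefile. It re-runs a command until it exits with status 0, sleeping between attempts:

```shell
#!/bin/sh
# Hypothetical helper: retry MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it exits 0, sleeping DELAY_SECONDS between attempts.
# Returns 1 after MAX_TRIES failed attempts.
retry() {
        max_tries=$1
        delay=$2
        n=1
        shift 2
        until "$@"; do
                if [ "$n" -ge "$max_tries" ]; then
                        echo "giving up after $n attempts" >&2
                        return 1
                fi
                echo "attempt $n failed; retrying in ${delay}s" >&2
                sleep "$delay"
                n=$((n + 1))
        done
}
```

In the script, the per-file call in the bottom loop could be wrapped as `retry 20 300 dllargefile "host:file"`; thanks to --partial and --inplace, each attempt would continue from where the previous one stopped. Note that it retries on any nonzero exit status, including permanent errors such as a missing remote file, so the attempt cap matters.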

The while loop at the bottom handles multiple host:file arguments on the command line.
 
 
 
 
