Monday, April 30, 2012

Bypass proxy server's file size download limit restriction


   Many organizations and colleges prevent their employees and students from downloading files from the Internet that are larger than a prescribed limit. At my workplace it is way too low: 14MB. Fret not! There are ways to bypass this, and here is a simple bash script I wrote to download much larger files at my workplace.

Note: This script works only with direct links, and only with servers that support resuming downloads (HTTP byte ranges).
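You can check the second requirement yourself before pointing the script at a URL: a server that supports resuming advertises it with an "Accept-Ranges: bytes" response header. A minimal sketch (the `supports_resume` helper and the sample headers are mine, not part of the script; with a live server you would capture the headers via `curl -sI "$url"`):

```shell
# Decide from a server's response headers whether it advertises
# resume (byte-range) support; real headers come from: curl -sI "$url"
supports_resume() {
    printf '%s\n' "$1" | grep -qi '^accept-ranges: *bytes'
}

sample_headers='HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 1024'

supports_resume "$sample_headers" && echo "resumable"
```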

  I'm continually working on it, so the latest version will be available on my GitHub account.

How to run it?
  1. Download the code below to a text file named curldownload.sh
  2. Make it executable: chmod +x curldownload.sh
  3. The file size limit (the fsize_limit variable) is set to 14MB. You may change it to your liking.
  4. The script accepts any number of URLs as arguments. Two optional flags come first: -d sets the output directory (defaults to "./") and -u sets the User-Agent HTTP header (defaults to "Firefox/10.0").
  5. For example: ./curldownload.sh -d "$HOME/Downloads" http://ftp.jaist.ac.jp/pub/mozilla.org/firefox/releases/12.0/linux-i686/en-US/firefox-12.0.tar.bz2
  6. A slightly more complex example with multiple URLs and both flags: ./curldownload.sh -d ~/downloads/ -u "Chromium/18.0" http://ftp.jaist.ac.jp/pub/mozilla.org/firefox/releases/11.0/linux-x86_64/en-US/firefox-11.0.tar.bz2 http://ftp.jaist.ac.jp/pub/mozilla.org/firefox/releases/12.0/linux-i686/en-US/firefox-12.0.tar.bz2
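Under the hood, the script sidesteps the limit by requesting the file in fsize_limit-sized byte ranges and appending each piece to the output file. Here is that chunking arithmetic pulled out as a standalone sketch (`chunk_range` is a name of my own; the script below inlines this logic). Note that HTTP ranges are inclusive on both ends:

```shell
# Reproduce the script's chunking arithmetic: the i-th request (1-based)
# asks for bytes start..stop, with consecutive chunks not overlapping.
fsize_limit=$((14*1024*1024))   # 14MB, same default as the script

chunk_range() {
    local i=$1 start stop
    [ "$i" -eq 1 ] && start=0 || start=$(( fsize_limit * (i - 1) + 1 ))
    stop=$(( fsize_limit * i ))
    echo "${start}-${stop}"
}

chunk_range 1   # 0-14680064
chunk_range 2   # 14680065-29360128
```

Each range string is what the script passes to curl's --range option, which is why the server must support byte ranges.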
#!/bin/bash
#
# Vikas Reddy @
#   http://vikas-reddy.blogspot.in/2012/04/bypass-proxy-servers-file-size-download.html
#
# 
# Usage:
#     ./curldownload.sh -d OUTPUT_DIRECTORY -u USER_AGENT http://url-1/ http://url-2/
#     Arguments -d and -u are optional
#
#

# Defaults
fsize_limit=$((14*1024*1024))
user_agent="Firefox/10.0"
output_dir="."


# Command-line options
while getopts 'd:u:' opt; do
    case "$opt" in
        d) output_dir="$OPTARG";;
        u) user_agent="$OPTARG";;
        *) echo "Usage: $0 [-d output_dir] [-u user_agent] url..."; exit 1;;
    esac
done
shift $((OPTIND - 1))


# output directory check
if [ -d "$output_dir" ]; then
    echo "Downloading all files to '$output_dir'"
else
    echo "Target directory '$output_dir' doesn't exist. Aborting..."
    exit 1
fi


for url in "$@"; do
    filename="$(basename "$url")"
    filepath="$output_dir/$filename"

    # Avoid accidentally overwriting an existing file
    if [[ -f "$filepath" ]]; then
        echo -n "'$filepath' already exists. Do you want to overwrite it? [y/n] "
        read response
        [ -z "$(echo "$response" | grep -i "^y")" ] && continue
    fi

    # Start from an empty file, since curl appends each chunk below
    : > "$filepath"

    echo -e "\nDownload of $url started..."
    i=1
    while true; do   # infinite loop, until the file is fully downloaded

        # setting the range
        [ $i -eq 1 ] && start=0 || start=$(( $fsize_limit * ($i - 1) + 1))
        stop=$(( $fsize_limit * i ))

        # downloading
        curl --fail --location --user-agent "$user_agent" --range "$start"-"$stop" "$url" >> "$filepath"

        exit_status="$?"

        # Download finished: once the requested range starts past the end of
        # the file, the server replies "416 Requested Range Not Satisfiable",
        # which curl --fail reports as exit status 22
        [ $exit_status -eq 22 ] && echo -e "Saved $filepath\n" && break

        # other exceptions
        [ $exit_status -gt 0 ] && echo -e "Unknown exit status: $exit_status. Aborting...\n" && break

        i=$(($i + 1))
    done
done
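Once a download completes, you can sanity-check the assembled file by comparing its size against the server's Content-Length header. A small sketch (the `content_length` helper and sample headers are mine; with a live server you would feed it the output of `curl -sI "$url"` and compare against `stat -c%s "$filepath"` on GNU systems):

```shell
# Extract Content-Length from raw response headers (carriage returns
# stripped first, since HTTP header lines end in \r\n).
content_length() {
    printf '%s\n' "$1" | tr -d '\r' \
        | awk 'tolower($1) == "content-length:" { print $2; exit }'
}

sample_headers='HTTP/1.1 200 OK
Content-Length: 14680064
Accept-Ranges: bytes'

content_length "$sample_headers"   # 14680064
```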