Monday, April 30, 2012

Bypass proxy server's file size download limit restriction


   Many organizations and colleges prevent their employees and students from downloading files from the Internet that are larger than a prescribed limit. At my workplace it is way too low: 14MB. Fret not! There are ways to bypass this, and here is a simple bash script I wrote to download much larger files at my workplace.

Note: This script works only with direct links, and only with servers that support resuming downloads (HTTP byte ranges).
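You can check the second requirement yourself before pointing the script at a URL: a server that supports resuming advertises it with an "Accept-Ranges: bytes" response header. A minimal sketch (the `supports_resume` helper and the sample headers are mine, not part of the script; with a live server you would capture the headers via `curl -sI "$url"`):

```shell
# Decide from a server's response headers whether it advertises
# resume (byte-range) support; real headers come from: curl -sI "$url"
supports_resume() {
    printf '%s\n' "$1" | grep -qi '^accept-ranges: *bytes'
}

sample_headers='HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 1024'

supports_resume "$sample_headers" && echo "resumable"
```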

  I'm continually working on it, so the latest version will be available on my GitHub account.

How to run it?
  1. Download the code below to a text file named curldownload.sh
  2. Make it executable: chmod +x curldownload.sh
  3. The file size limit (the fsize_limit variable) is set to 14MB. You may change it to your liking.
  4. The script accepts any number of URLs as arguments. Two optional flags come first: -d sets the output directory (defaults to "./") and -u sets the User-Agent HTTP header (defaults to "Firefox/10.0").
  5. For example: ./curldownload.sh -d "$HOME/Downloads" http://ftp.jaist.ac.jp/pub/mozilla.org/firefox/releases/12.0/linux-i686/en-US/firefox-12.0.tar.bz2
  6. A slightly more complex example with multiple URLs and both flags: ./curldownload.sh -d ~/downloads/ -u "Chromium/18.0" http://ftp.jaist.ac.jp/pub/mozilla.org/firefox/releases/11.0/linux-x86_64/en-US/firefox-11.0.tar.bz2 http://ftp.jaist.ac.jp/pub/mozilla.org/firefox/releases/12.0/linux-i686/en-US/firefox-12.0.tar.bz2
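Under the hood, the script sidesteps the limit by requesting the file in fsize_limit-sized byte ranges and appending each piece to the output file. Here is that chunking arithmetic pulled out as a standalone sketch (`chunk_range` is a name of my own; the script below inlines this logic). Note that HTTP ranges are inclusive on both ends:

```shell
# Reproduce the script's chunking arithmetic: the i-th request (1-based)
# asks for bytes start..stop, with consecutive chunks not overlapping.
fsize_limit=$((14*1024*1024))   # 14MB, same default as the script

chunk_range() {
    local i=$1 start stop
    [ "$i" -eq 1 ] && start=0 || start=$(( fsize_limit * (i - 1) + 1 ))
    stop=$(( fsize_limit * i ))
    echo "${start}-${stop}"
}

chunk_range 1   # 0-14680064
chunk_range 2   # 14680065-29360128
```

Each range string is what the script passes to curl's --range option, which is why the server must support byte ranges.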
#!/bin/bash
#
# Vikas Reddy @
#   http://vikas-reddy.blogspot.in/2012/04/bypass-proxy-servers-file-size-download.html
#
# 
# Usage:
#     ./curldownload.sh -d OUTPUT_DIRECTORY -u USER_AGENT http://url-1/ http://url-2/
#     Arguments -d and -u are optional
#
#

# Defaults
fsize_limit=$((14*1024*1024))
user_agent="Firefox/10.0"
output_dir="."


# Command-line options
while getopts 'd:u:' opt; do
    case "$opt" in
        d) output_dir="$OPTARG";;
        u) user_agent="$OPTARG";;
        *) echo "Usage: $0 [-d output_dir] [-u user_agent] url..."; exit 1;;
    esac
done
shift $((OPTIND - 1))


# output directory check
if [ -d "$output_dir" ]; then
    echo "Downloading all files to '$output_dir'"
else
    echo "Target directory '$output_dir' doesn't exist. Aborting..."
    exit 1
fi


for url in "$@"; do
    filename="$(basename "$url")"
    filepath="$output_dir/$filename"

    # Avoid accidentally overwriting an existing file
    if [[ -f "$filepath" ]]; then
        echo -n "'$filepath' already exists. Do you want to overwrite it? [y/n] "
        read response
        [ -z "$(echo "$response" | grep -i "^y")" ] && continue
    fi

    # Start from an empty file, since curl appends each chunk below
    : > "$filepath"

    echo -e "\nDownload of $url started..."
    i=1
    while true; do   # infinite loop, until the file is fully downloaded

        # setting the range
        [ $i -eq 1 ] && start=0 || start=$(( $fsize_limit * ($i - 1) + 1))
        stop=$(( $fsize_limit * i ))

        # downloading
        curl --fail --location --user-agent "$user_agent" --range "$start"-"$stop" "$url" >> "$filepath"

        exit_status="$?"

        # Download finished: once the requested range starts past the end of
        # the file, the server replies "416 Requested Range Not Satisfiable",
        # which curl --fail reports as exit status 22
        [ $exit_status -eq 22 ] && echo -e "Saved $filepath\n" && break

        # other exceptions
        [ $exit_status -gt 0 ] && echo -e "Unknown exit status: $exit_status. Aborting...\n" && break

        i=$(($i + 1))
    done
done
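Once a download completes, you can sanity-check the assembled file by comparing its size against the server's Content-Length header. A small sketch (the `content_length` helper and sample headers are mine; with a live server you would feed it the output of `curl -sI "$url"` and compare against `stat -c%s "$filepath"` on GNU systems):

```shell
# Extract Content-Length from raw response headers (carriage returns
# stripped first, since HTTP header lines end in \r\n).
content_length() {
    printf '%s\n' "$1" | tr -d '\r' \
        | awk 'tolower($1) == "content-length:" { print $2; exit }'
}

sample_headers='HTTP/1.1 200 OK
Content-Length: 14680064
Accept-Ranges: bytes'

content_length "$sample_headers"   # 14680064
```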