I’ve figured out how to script the Google Documents Viewer into reading any office document (doc, docx, xls, xlsx, odt, ods, and probably a bunch of others) and converting it to PDF. There are plenty of other tools for this, such as unoconv, but Google’s service is well sandboxed, which makes it a nice choice when you need to convert untrusted documents, for example in a web service. So without further ado, here you go:

convert-url-to-pdf.sh:

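#!/bin/sh

# by Jason A. Donenfeld
# www.zx2c4.com

if [ $# -ne 2 ]; then
        echo "Usage: $0 url output-pdf-file" >&2
        exit 1
fi

set -e

# Percent-encode every byte of the document URL so it survives as a query parameter.
documenturl="$(printf '%s' "$1" | xxd -plain | tr -d '\n' | sed 's/\(..\)/%\1/g')"
viewerurl="http://docs.google.com/viewer?url=$documenturl"

# Pull the gpUrl value out of the viewer page; printf expands its backslash
# escapes, and doubling each % keeps printf from reading it as a format specifier.
pdfurl="$(printf "$(curl -s "$viewerurl" | sed -n "s/.*gpUrl:'\\([^']*\\)'.*/\\1/p" | sed 's/%/%%/g')")"
if [ -z "$pdfurl" ]; then
        echo "Could not find the PDF URL in the viewer page" >&2
        exit 1
fi

# Follow redirects with a temporary cookie jar, removed on exit even if curl fails.
cookiejar="$(mktemp)"
trap 'rm -f "$cookiejar"' EXIT
curl -s -L -c "$cookiejar" -o "$2" "$pdfurl"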
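
The two least obvious steps in the script are the encoding of the input URL and the decoding of the extracted one, so here is a rough sketch of each in isolation. The function names encode_all and unescape are made up for illustration, od stands in for xxd (an assumption, for systems where xxd is not installed), and the sample strings are invented rather than real viewer output.

```sh
#!/bin/sh

# Hypothetical helper: percent-encode every byte of the argument,
# including letters and digits, the same way the script treats its URL.
# od -v avoids collapsing repeated output lines into "*".
encode_all() {
        printf '%s' "$1" | od -An -v -tx1 | tr -d ' \n' | sed 's/\(..\)/%\1/g'
}

# Hypothetical helper: expand backslash escapes in a string by using it
# as a printf format. Each % is doubled first so printf prints it
# literally instead of treating it as a conversion specifier.
unescape() {
        printf "$(printf '%s' "$1" | sed 's/%/%%/g')"
}

encode_all 'a b'; echo          # prints %61%20%62
unescape 'a\075b%20c'; echo     # prints a=b%20c
```

Encoding every byte is wasteful but avoids having to decide which characters are safe in a query string. The sketch uses an octal escape in the sample because POSIX printf is only required to understand octal escapes in its format string.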

November 18, 2011