I know you can have too much of a good thing, but I do find Google Print pretty interesting :-) Here’s a quick run-down of the URL parameters they are using, and what they do. (Note that the service currently appears to be Slashdotted, with 502 Server Errors popping up everywhere. Surprising to get that from a company of their size, but there you go.)
Sample URL:
http://print.google.com/print?id=ULQSG0Zs7vcC&lpg=3&pg=3&sig=QD6xDOsosnwh8uXQuXRJL5old88
- id: specifies the book.
- pg: specifies the page number. If you remove it, you get a list of all the pages in the book. But you can’t just increment it, because…
- sig: some sort of hash which also uniquely identifies the page. This is presumably to stop robots spidering the site and just monotonically increasing the page number.
- q: query – the words in this parameter will get highlighted in yellow. This indicates that they are generating the JPEG graphics on-the-fly from a computer-readable source on the back end. I suspect they’ve implemented display engines for several common book print formats.
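To make the breakdown above concrete, here is a minimal sketch that splits the sample URL into its parameters using Python's standard library. The helper name `split_print_url` is made up for illustration, and the code only parses the URL string; it makes no request and assumes nothing about how the server interprets the values.

```python
from urllib.parse import urlsplit, parse_qs

# The sample URL quoted in the post.
SAMPLE_URL = (
    "http://print.google.com/print"
    "?id=ULQSG0Zs7vcC&lpg=3&pg=3&sig=QD6xDOsosnwh8uXQuXRJL5old88"
)


def split_print_url(url):
    """Return the query parameters of a URL as a flat dict.

    Hypothetical helper: if a parameter is repeated, the first value wins.
    """
    query = urlsplit(url).query
    return {name: values[0] for name, values in parse_qs(query).items()}


params = split_print_url(SAMPLE_URL)
for name, value in params.items():
    print(f"{name} = {value}")
```

Note that the values come back as strings, so `pg` and `lpg` would need an `int()` call before doing any arithmetic on them.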
Earlier today, URLs required “img=<something>”, but they don’t seem to now. Instead, an lpg parameter has appeared. I’m not sure exactly what that does (why does it need two page numbers?), and I can’t investigate until they fix the server…
It appears lpg means “left page” or something similar. You can view two pages either side of lpg: if lpg is 3, you can view pages 1, 2, 3, 4 and 5; if it’s 10, you can view pages 8, 9, 10, 11 and 12. On pages 8 and 12 one of the buttons is grayed out, a measure to stop you from reading the whole book, I suppose.
With lpg and sig, that makes it pretty hard to automatically download a whole book.
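The viewing window described above can be written down as a small sketch. This only models the behaviour observed in the two examples (lpg of 3 and 10); the function name `viewable_pages`, the `radius` parameter, and the assumption that pages start at 1 are mine, not anything confirmed about the service.

```python
def viewable_pages(lpg, radius=2, first_page=1):
    """Pages reachable from a given lpg, per the behaviour observed above.

    Assumptions: the window extends `radius` pages either side of lpg,
    and page numbers below `first_page` do not exist.
    """
    return [
        page
        for page in range(lpg - radius, lpg + radius + 1)
        if page >= first_page
    ]


print(viewable_pages(3))   # [1, 2, 3, 4, 5]
print(viewable_pages(10))  # [8, 9, 10, 11, 12]
```

If that model is right, covering a whole book means obtaining a valid lpg (and matching sig) roughly every five pages, which is why the two parameters together are such an effective brake.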
Cow: Good call. Indeed, that’s what it’s for. I suspect the “l” is for “link”. Of course, if you can get link pages for most pages in the book, you are OK. One way to do that is to search within the book for a common word like “the” (example). I’m sure they’ll make common words into “stop words” soon to prevent this from happening, though.