Sat, 03 Apr 2010
After "Scrape the Web", 2010
As you might remember, I gave my Scrape the Web tutorial again at PyCon this year, 2010.
I finally got around to publishing the 2010 tidbits on the web. So without further ado:
- A video of me on blip.tv
- A tar.gz file of code samples
- A brief four-page cheat sheet
- My full slides, photos, jokes and all
I also referred to some useful tools in the talk. You might want to check these out:
- Selenium IDE, for WYSIWYG scraping code generation
- Selenium RC, for reaching right into a web browser and having that do your page loading
- the everyblock templatemaker (see the cheat sheet)
- Firequark, for finding CSS selectors of elements on the page
- FireBug, for the magical "Inspect Element"
And the old standbys:
- mechanize
- lxml.html
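For a rough idea of what the most basic scraping step looks like, here is a minimal sketch that pulls the links out of a page. It is not taken from the tutorial's code samples. It uses only Python's standard-library `html.parser` so that it runs without installing anything; lxml.html would make the same job shorter. The `LinkCollector` class, the `extract_links` helper, and the sample HTML with its file names are all made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> we are currently inside, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text while we are inside a link that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, link text) tuples found in the HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page fragment; the paths are placeholders, not real URLs.
SAMPLE = """
<ul>
  <li><a href="/slides.pdf">My full slides</a></li>
  <li><a href="/cheatsheet.pdf">Cheat sheet</a></li>
  <li><a name="top">no href here</a></li>
</ul>
"""

if __name__ == "__main__":
    for href, text in extract_links(SAMPLE):
        print(href, text)
```

In real use you would fetch the page first (mechanize handles forms, cookies and logins) and feed the response body to the parser instead of a hard-coded string.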
My mini feedback for myself:
My WP-Hashcash demo didn't go as planned this year, but the attack is still possible in theory, and the version in the code samples still works against last year's release of WP-Hashcash. Kids, don't upgrade your demo site the night before your presentation!
Speaking of "the night before," again I didn't sleep very much before the talk. I think that worked better for me last year. In the future, if I basically stay up all night, I should give a talk before noon.
So I actually think last year's video was probably better, though I haven't watched either of them in full.
Take freely from the code samples and "cheat sheet"!