Sat, 03 Apr 2010

After "Scrape the Web", 2010

As you might remember, I gave my Scrape the Web tutorial again at PyCon this year, 2010.

I finally got around to publishing the 2010 tidbits on the web. So without further ado:

I also referred to some useful tools in the talk. You might want to check these out:

Selenium IDE, for WYSIWYG scraping code generation
Selenium RC, for reaching right into a web browser and having that do your page loading
The EveryBlock templatemaker (see the cheat sheet)
Firequark, for finding CSS selectors of elements on the page
Firebug, for the magical "Inspect Element"

And the old standbys:

mechanize
lxml.html
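
If you haven't used lxml.html before, here is a minimal sketch of the kind of thing it is good for. This is not code from the tutorial: the sample HTML, the "talks" id, and the extract_links helper are all made up for illustration, and it assumes lxml is installed.

```python
import lxml.html

# Hypothetical sample page; in real scraping this text would come from
# a fetched page rather than a string literal.
SAMPLE_HTML = """
<html><body>
  <ul id="talks">
    <li><a href="/talks/1">Scrape the Web</a></li>
    <li><a href="/talks/2">Another talk</a></li>
  </ul>
</body></html>
"""


def extract_links(html_text):
    """Return (link text, href) pairs for each link in the 'talks' list."""
    doc = lxml.html.fromstring(html_text)
    # XPath is used here so the sketch does not need the separate
    # cssselect package; a CSS selector would do the same job.
    anchors = doc.xpath('//ul[@id="talks"]//a')
    return [(a.text_content().strip(), a.get("href")) for a in anchors]


links = extract_links(SAMPLE_HTML)
for text, href in links:
    print(text, href)
```

With a tool like Firequark or Firebug you would find the selector in the browser first, then paste it into a script like this one.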

My mini feedback for myself:

My WP-Hashcash demo didn't go as planned this year, but it's still possible in theory. The attack in the code still works against last year's version of WP-Hashcash. Kids, don't upgrade your demo site the night before your presentation!

Speaking of "the night before," again I didn't sleep very much before the talk. I think that worked better for me last year. In the future, if I basically stay up all night, I should give a talk before noon.

So I actually think last year's video was probably better, though I haven't watched either one in full.

Take freely from the code samples and "cheat sheet"!


Asheeshworld Notes you will like
