May 20, 2008

Given a piece of XML: 403276Trivet00

One might assume that REXML is the way to parse it, but REXML is notoriously slow on anything but small files. Enter _why's HTML parser, Hpricot. Its parser is written in C, and since XHTML is an application of XML, there's no reason it shouldn't be able to parse my file. Turns out it can, it's really fast, and the code is dead simple.

    require 'hpricot'

    FIELDS = %w[SKU ItemName CollectionNo Pages]

    doc = Hpricot.parse(File.read("my.xml"))

    # Build one Product per <product> element, copying each field's text.
    (doc/:product).each do |xml_product|
      product = Product.new
      for field in FIELDS
        product[field] = (xml_product/field.intern).first.innerHTML
      end
      product.save
    end

Update: Slight refactoring of the code above. Chris figured out last night that you can use innerHTML, which eliminated the only ugly part of the code.
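For anyone who wants to try the same field-by-field loop without installing a gem, here is a rough sketch using REXML from Ruby's standard distribution instead of Hpricot. The sample document is hypothetical: the element names come from the FIELDS list above, but the nesting and the values are assumptions made for illustration, and extract_products is a made-up helper name that returns plain hashes rather than saving Product records.

```ruby
require "rexml/document"

# Hypothetical sample input: element names follow FIELDS, but the
# structure and values are assumptions, not the original file.
SAMPLE_XML = <<~XML
  <products>
    <product>
      <SKU>403276</SKU>
      <ItemName>Trivet</ItemName>
      <CollectionNo>0</CollectionNo>
      <Pages>0</Pages>
    </product>
  </products>
XML

FIELDS = %w[SKU ItemName CollectionNo Pages]

# Hypothetical helper: returns one hash per <product> element,
# keyed by field name, with nil for any field that is missing.
def extract_products(xml)
  doc = REXML::Document.new(xml)
  doc.get_elements("//product").map do |xml_product|
    FIELDS.each_with_object({}) do |field, row|
      element = xml_product.elements[field]
      row[field] = element && element.text
    end
  end
end

products = extract_products(SAMPLE_XML)
```

The loop has the same shape as the Hpricot version: find every product element, then pull the text of each named child. Only the lookup calls differ, so swapping parsers later should be a small change.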
