Given a piece of XML:
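  <product>
    <SKU>403276</SKU>
    <ItemName>Trivet</ItemName>
    <CollectionNo>0</CollectionNo>
    <Pages>0</Pages>
  </product>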
One might assume that REXML is the way to parse it, but we all know how slow it is.
Enter _why’s HTML parser, Hpricot. It’s written in C, and since XHTML is an application of XML, there’s no reason it shouldn’t be able to parse my file.
Turns out it can: it’s really fast, and the code is dead simple.
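require 'hpricot'

# Element names to copy from each <product> node onto the model.
FIELDS = %w[SKU ItemName CollectionNo Pages]

doc = Hpricot.parse(File.read("my.xml"))
(doc/:product).each do |xml_product|
  # Product is assumed to be an ActiveRecord-style model that accepts []=.
  product = Product.new
  FIELDS.each do |field|
    product[field] = (xml_product/field.intern).first.innerHTML
  end
  product.save
end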
Update: Slight refactoring of the code above. Chris figured out last night that you can use innerHTML, which eliminated the only ugly part of the code.
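For comparison, here is a rough sketch of the same field-by-field extraction done with REXML, the slower route mentioned at the top. It is not from the original code: the sample document, the `<products>` wrapper element, and the names `SAMPLE_XML` and `extract_products` are assumptions made for illustration, and a plain Hash stands in for the `Product` model so the snippet runs on its own.

```ruby
require 'rexml/document'

FIELDS = %w[SKU ItemName CollectionNo Pages]

# Hypothetical sample input; the <products> wrapper is an assumption.
SAMPLE_XML = <<~XML_DOC
  <products>
    <product>
      <SKU>403276</SKU>
      <ItemName>Trivet</ItemName>
      <CollectionNo>0</CollectionNo>
      <Pages>0</Pages>
    </product>
  </products>
XML_DOC

# Returns one Hash per <product> element, keyed by field name.
# A Hash is used here in place of the Product model.
def extract_products(xml)
  doc = REXML::Document.new(xml)
  products = []
  doc.elements.each('//product') do |xml_product|
    product = {}
    FIELDS.each do |field|
      node = xml_product.elements[field]
      # Leave the value nil when the element is missing.
      product[field] = node && node.text
    end
    products << product
  end
  products
end

products = extract_products(SAMPLE_XML)
```

The shape of the loop is the same as the Hpricot version; only the lookup calls differ, so swapping one parser for the other is mostly a matter of changing the two lines that find elements and read their text.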