
scriptures.lds.org/nl - HTML download and cleanup

Posted: Mon Sep 27, 2010 7:19 am
by mark235-p40
I'm trying to create HTML files of the Dutch scriptures so I can convert them to a format that my e-reader will accept. The obvious source for this is scriptures.lds.org/nl, since the new Dutch translation is available there. Nice job, guys :)

I've tried to download parts of the Book of Mormon using website grabbers like HTTrack, but this leaves me with heavily formatted HTML that is difficult to clean up and prepare for conversion.

So my question is: is there a way to extract the complete Dutch scripture text from the website and leave out the rest? I've added an attachment to show which part of the website I'm referring to.

If possible, I would like to end up with a single HTML file looking something like this:
http://www.battleforce.com/ldspalm/html/bom.html
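
To make the goal concrete, here is a rough Python sketch of the kind of cleanup I have in mind, using only the standard library. It is a guess at an approach, not something I know to work on the real pages: it assumes the scripture text sits inside a single container element, and the id "content" is a placeholder I made up, not the site's actual markup. It also assumes every non-void tag is properly closed. It keeps only headings and paragraphs inside that container and drops all attributes, scripts, styles, and everything outside the container.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: "content" is a hypothetical id; inspect the real pages to find the right one.
CONTAINER_ID = "content"
KEEP_TAGS = {"h1", "h2", "h3", "p"}   # copied to the output, attributes removed
SKIP_TAGS = {"script", "style"}       # their text is dropped entirely
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ScriptureExtractor(HTMLParser):
    """Collects bare headings and paragraphs found inside one container element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0   # nesting depth inside the container (0 = outside it)
        self.skip = 0    # nesting depth inside script/style
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void tags never get an end tag, so they must not affect depth
        if self.depth == 0:
            if dict(attrs).get("id") == CONTAINER_ID:
                self.depth = 1
            return
        self.depth += 1
        if tag in SKIP_TAGS:
            self.skip += 1
        elif tag in KEEP_TAGS and not self.skip:
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            return  # this was the container's own closing tag
        if tag in SKIP_TAGS:
            self.skip = max(0, self.skip - 1)
        elif tag in KEEP_TAGS and not self.skip:
            self.out.append(f"</{tag}>\n")

    def handle_data(self, data):
        if self.depth > 0 and not self.skip:
            # Entities were decoded by the parser, so re-escape for valid HTML output.
            self.out.append(escape(data, quote=False))


def extract(html_text):
    """Return the cleaned-up markup of the container as one string."""
    parser = ScriptureExtractor()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


sample = (
    '<html><body><div id="nav"><a href="/">Home</a></div>'
    '<div id="content"><h1 class="t">1 Nephi 1</h1>'
    '<p class="verse"><span class="num">1</span> Ik, Nephi &amp; co<br></p>'
    '<script>var x = 1;</script></div>'
    '<div id="footer">Footer</div></body></html>'
)
cleaned = extract(sample)
print(cleaned)
```

Running something like this over each downloaded chapter and concatenating the results between a single html/body wrapper would give the one-file layout I'm after, provided the container guess holds up.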