"Not Found" returned on some Come Follow Me lesson pages (wget)
Posted: Wed Jan 02, 2019 6:26 pm
I am creating calendar entries for the Come Follow Me Individual and Family lessons. To avoid typing in each week's lesson details by hand, I want to extract them programmatically from the Church web page for that week.
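Based on clicking around the web site, I see that the URL for each week follows a uniform format, and it looks like this:
https://www.lds.org/study/manual/come-follow-me-for-individuals-and-families-new-testament-2019/XX?lang=eng
where "XX" goes from 01, 02, etc. up to 50.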
Easy enough: I wrote a simple script that uses "wget" to download each week's page to my hard drive, where I can extract the information.
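For reference, here is a minimal sketch of that kind of script in Python. The original script uses wget and isn't shown, so this is an assumption about its shape, not a copy of it. It zero-pads the week number into the URL template and writes each page to a file; the `week-XX.html` naming and the `fetch` hook are hypothetical.

```python
import os
from typing import Callable, List, Tuple

# URL template taken from the post; {week:02d} zero-pads 1 -> "01".
URL_TEMPLATE = ("https://www.lds.org/study/manual/"
                "come-follow-me-for-individuals-and-families-new-testament-2019/"
                "{week:02d}?lang=eng")


def week_urls(first: int = 1, last: int = 50) -> List[Tuple[int, str]]:
    """Return (week, url) pairs for weeks first..last inclusive."""
    return [(w, URL_TEMPLATE.format(week=w)) for w in range(first, last + 1)]


def download_all(fetch: Callable[[str], bytes], outdir: str,
                 first: int = 1, last: int = 50) -> List[str]:
    """Fetch each week's page and save it as week-XX.html in outdir.

    `fetch` is a hypothetical hook: any function that takes a URL and
    returns the response body as bytes. Returns the paths written.
    """
    os.makedirs(outdir, exist_ok=True)
    paths = []
    for week, url in week_urls(first, last):
        path = os.path.join(outdir, f"week-{week:02d}.html")
        with open(path, "wb") as f:
            f.write(fetch(url))
        paths.append(path)
    return paths
```

For a real run, `fetch` could be as simple as `lambda u: urllib.request.urlopen(u).read()`.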
But I am not able to retrieve all of the weeks. In fact, I only get weeks 01 through 05, and then week 10. Every other week number returns a small web page that says "This page is unavailable. Error code: 2-1919".
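To list exactly which weeks came back as the error page without opening every file, a small check for the error text can help. This sketch just searches the downloaded HTML for the marker strings from that error page; the function names are hypothetical.

```python
from typing import Dict, List

# Marker text copied from the error page described in the post.
ERROR_MARKERS = ("This page is unavailable", "Error code: 2-1919")


def is_error_page(html: str) -> bool:
    """Return True if a downloaded page looks like the 'unavailable' stub."""
    return any(marker in html for marker in ERROR_MARKERS)


def failed_weeks(pages: Dict[int, str]) -> List[int]:
    """Given {week: html}, return the sorted week numbers that got the error page."""
    return sorted(week for week, html in pages.items() if is_error_page(html))
```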
I know the URLs for all those weeks are correct, since pasting any of them into my Chrome browser brings up the correct web page. But retrieving the same URLs with "wget" doesn't work.
If I were receiving the error page for every request, I would suspect something wrong with my script. But since I get some (but not all) of the weekly pages, I suspect something on the lds.org side.
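Since some requests succeed and others don't, one possibility (an assumption, not something established here) is that the server is throttling or intermittently failing requests. If so, pausing and retrying might recover the missing weeks. A minimal sketch with exponential backoff, where `fetch` and `is_bad` are hypothetical hooks supplied by the caller:

```python
import time
from typing import Callable


def fetch_with_retry(fetch: Callable[[str], str], url: str,
                     is_bad: Callable[[str], bool],
                     attempts: int = 3, delay: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Call fetch(url) until is_bad(body) is False or attempts run out.

    Waits delay, 2*delay, 4*delay, ... seconds between attempts
    (exponential backoff). Returns the last body fetched, good or bad.
    """
    body = fetch(url)
    for attempt in range(1, attempts):
        if not is_bad(body):
            break
        sleep(delay * 2 ** (attempt - 1))
        body = fetch(url)
    return body
```

The `sleep` parameter is only there so the waits can be replaced in testing; in normal use the default `time.sleep` applies.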
I tried adding a "Referer:" header to the request (with an lds.org web page as the referer), but that didn't change anything.
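Besides Referer, a browser sends other headers that wget does not, most notably a browser User-Agent (wget identifies itself as "Wget/..." by default). Whether lds.org treats requests differently based on these headers is an untested assumption. A sketch using Python's standard `urllib.request` with browser-like headers; the header values are illustrative only.

```python
import urllib.request

# Illustrative browser-like header values (assumption: the server may
# check more than Referer; the post only confirms Referer alone didn't help).
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Referer": "https://www.lds.org/",
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request that carries the browser-like headers above."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download url with those headers and return the raw response body."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read()
```

With wget itself, the rough equivalent is the `--user-agent` (`-U`) and `--header` options.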
Does anyone have any suggestions on how to get all of the weekly web pages downloaded?
Thanks,
Steven