Tech Forum

Posted: **Wed Jan 02, 2019 6:26 pm**

I am creating calendar entries for the Come Follow Me Individual and Family lessons. To avoid having to manually type in the info for the weekly lesson, I want to programmatically extract it from the Church web page for each week.

Based on clicking around the web page, I see that the format of the URL for each week is uniform, and it looks like this:

https://www.lds.org/study/manual/come-follow-me-for-individuals-and-families-new-testament-2019/XX?lang=eng

where "XX" goes from 01, 02, etc. up to 50.
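For anyone following along, the URL pattern above can be sketched in Python as follows. This is only an illustration of the numbering scheme described here; the names `BASE_URL` and `build_week_urls` are made up for this example and are not part of any script from this thread.

```python
# Hypothetical helper: builds the weekly lesson URLs from the pattern
# described in the post (two-digit week number, 01 through 50).
BASE_URL = (
    "https://www.lds.org/study/manual/"
    "come-follow-me-for-individuals-and-families-new-testament-2019/"
    "{week:02d}?lang=eng"
)


def build_week_urls(first_week=1, last_week=50):
    """Return the list of lesson URLs for the given inclusive week range."""
    return [BASE_URL.format(week=week) for week in range(first_week, last_week + 1)]
```

Calling `build_week_urls()` gives the full list of 50 URLs, which can then be handed to whatever download tool is being used.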

Easy enough: I wrote a simple script that uses "wget" to pull down each week's page to my hard drive, where I can extract the information.

But I am not succeeding in getting all of the weeks. In fact, I only get weeks 01 through 05, and then 10. Every other week number gets me a small web page that says "This page is unavailable. Error code: 2-1919".

I know the URL for all those weeks is correct, since if I put that URL in my Chrome browser window I get the correct web page. But retrieving with "wget" doesn't work.

If I were receiving the "Not found" for every attempted access, I would suspect something wrong with my script. But since I get some -- but not all -- of the week pages, I suspect the problem is on the lds.org side.

I tried adding a "Referer:" header to the request (with an lds.org web page as the referer) but that didn't change anything.
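For reference, this is roughly what attaching a "Referer:" header looks like when done with Python's standard library instead of wget. It is a sketch only: the default referer value and the function names `build_request` and `fetch_page` are assumptions for illustration, and, as noted above, the header did not change the outcome in my tests.

```python
import urllib.request


def build_request(url, referer="https://www.lds.org/"):
    """Build a request carrying a Referer header (referer value is an assumption)."""
    return urllib.request.Request(url, headers={"Referer": referer})


def fetch_page(url, referer="https://www.lds.org/", timeout=30):
    """Download one page and return its body as text.

    Performs a real network call; urlopen raises an exception on HTTP errors.
    """
    request = build_request(url, referer)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```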

Does anyone have any suggestions on how to get all of the weekly web pages downloaded?

Thanks,

Steven

Posted: **Thu Jan 03, 2019 8:39 am**

I have seen the "This page is unavailable. Error code: 2-1919" error when I try to access too many pages on LDS.org in too short a period of time. I suspect that there is some sort of security restriction to prevent the server from getting overloaded. Maybe adding a pause between each page would help.

Posted: **Thu Jan 03, 2019 9:03 am**

Good suggestion. I added a 30-second delay between each page request (should be plenty of time), and it didn't change the outcome. Still getting the errors for the same pages.
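The paced download I tried amounts to something like the sketch below. It assumes the failure can be recognized by the text "This page is unavailable" in the returned page, and it takes the actual download function as a parameter (`fetch`, a hypothetical callable that returns the page text for a URL) so the pacing logic can be tried without touching the network.

```python
import time

# Text seen on the small error page described earlier in the thread.
ERROR_MARKER = "This page is unavailable"


def is_error_page(html):
    """Return True if the downloaded text looks like the 'unavailable' page."""
    return ERROR_MARKER in html


def fetch_all(urls, fetch, delay_seconds=30, sleep=time.sleep):
    """Fetch each URL in order, pausing between requests.

    Returns (good, failed): a dict of url -> page text for pages that
    downloaded normally, and a list of URLs that returned the error page.
    """
    good = {}
    failed = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # pause before every request except the first
        html = fetch(url)
        if is_error_page(html):
            failed.append(url)
        else:
            good[url] = html
    return good, failed
```

Keeping a list of the failed URLs makes it easy to compare runs and see whether the same weeks fail each time.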

Posted: **Thu Jan 03, 2019 10:43 am**

Ordinarily, I'd suggest ctrl-F5 on those pages to make sure you're not pulling from cache. But I'm not sure how to do that with wget.

Posted: **Mon Jan 07, 2019 6:47 pm**

The program I am using, "wget", doesn't have the concept of a cache; it goes out to the web page directly.

I did some more testing, and it looks like things are not consistent. One time I get the "unavailable" page, and another time the same page is served up just fine. I've tried it from multiple computers, so I think it is something with lds.org.

Note that when I actually use a browser, I get the pages every time with no problem. It's only with this alternate method that it fails most of the time.
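Since the same URL sometimes fails and sometimes works, one possible workaround is simply to retry each page until it comes back without the error text. The sketch below is an assumption-laden illustration, not a confirmed fix: `fetch` is a hypothetical callable that returns the page text for a URL, and the attempt count and delay are arbitrary choices.

```python
import time

# Text seen on the small error page described earlier in the thread.
ERROR_MARKER = "This page is unavailable"


def fetch_with_retries(url, fetch, max_attempts=5, delay_seconds=30, sleep=time.sleep):
    """Try a URL up to max_attempts times.

    Returns the page text from the first attempt that does not contain
    the error marker, or None if every attempt returned the error page.
    """
    for attempt in range(1, max_attempts + 1):
        html = fetch(url)
        if ERROR_MARKER not in html:
            return html
        if attempt < max_attempts:
            sleep(delay_seconds)  # wait before the next attempt
    return None
```

Any page that still returns `None` after all attempts can be logged and retried later or fetched by hand in a browser.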

"Not Found" returned on some Come Follow Me lesson pages (wget)