Statistically Linked Verses / Passages

Discussions about the Notes and Journal tool on LDS.org. This includes the Study Toolbar as well as the scriptures and other content on LDS.org that is integrated with Notes and Journal.
josh-p40
New Member
Posts: 24
Joined: Fri Jan 26, 2007 10:25 am
Location: San Jose, CA


Postby josh-p40 » Thu Feb 08, 2007 11:09 am

I think it would be great to use statistics and/or machine learning to automatically create verse links with associated probabilities.

There are a lot of ways this could be done, from simple to complex. One simple way would be a naive Bayes or "bag of words" approach: I click on a word, and the program finds other verses that contain the same word and also have similar surrounding words.
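A minimal sketch of the simple "same word, similar surrounding words" idea, assuming verses are plain strings keyed by reference. The function names, the ±5-word window, and the toy verses are hypothetical illustrations, not the prototype's code. It collects the words near the clicked word in the source verse and ranks other verses containing that word by how many nearby words they share.

```python
import re

# Hypothetical helpers for illustration; not the prototype's actual code.

def tokenize(text):
    """Casefold and split into word tokens (so 'LORD' and 'Lord' match)."""
    return re.findall(r"[a-z']+", text.casefold())

def context(tokens, word, window=5):
    """Set of words within `window` positions of any occurrence of `word`."""
    ctx = set()
    for i, tok in enumerate(tokens):
        if tok == word:
            lo, hi = max(0, i - window), i + window + 1
            ctx.update(t for t in tokens[lo:hi] if t != word)
    return ctx

def related_verses(verses, source_ref, word, window=5):
    """Rank other verses containing `word` by shared context words."""
    word = word.casefold()
    src_ctx = context(tokenize(verses[source_ref]), word, window)
    scored = []
    for ref, text in verses.items():
        if ref == source_ref:
            continue
        tokens = tokenize(text)
        if word in tokens:
            overlap = len(src_ctx & context(tokens, word, window))
            scored.append((overlap, ref))
    # Highest overlap first; ties broken alphabetically by reference.
    scored.sort(key=lambda p: (-p[0], p[1]))
    return scored
```

A raw overlap count favors long verses with many nearby words; a normalized score is one common refinement.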

These aren't my ideas alone. I give credit to Harold Stuart, the author of HandyScriptures, for first suggesting linking scriptures by their original languages (Hebrew, Greek, etc.), and to another friend of mine, whom I won't name without his permission, who suggested we could detect a potentially unlimited number of scripture topics using text analytics combined with non-parametric Bayesian modeling.

In short, the latter approach lets topics grow out of the text itself: the number of topics and their sizes are inferred from the data rather than fixed in advance.
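To illustrate why a non-parametric prior lets the number of topics come from the data, here is a sketch of a Chinese restaurant process, one standard non-parametric Bayesian building block. This simulates only the prior over topic assignments, not a full topic model, and `n`, `alpha`, and the function name are illustrative assumptions. Each new item joins an existing topic with probability proportional to that topic's size, or starts a new topic with probability proportional to `alpha`.

```python
import random

def chinese_restaurant_process(n, alpha, seed=0):
    """Sample topic assignments for n items from a CRP(alpha) prior.

    Item i joins existing topic k with probability size_k / (i + alpha)
    and opens a new topic with probability alpha / (i + alpha).
    """
    rng = random.Random(seed)
    sizes = []        # sizes[k] = number of items assigned to topic k
    assignments = []  # assignments[i] = topic index of item i
    for i in range(n):
        r = rng.uniform(0, i + alpha)
        cumulative = 0.0
        for k, size in enumerate(sizes):
            cumulative += size
            if r < cumulative:
                sizes[k] += 1
                assignments.append(k)
                break
        else:
            # r fell in the final alpha-sized slice: start a new topic.
            sizes.append(1)
            assignments.append(len(sizes) - 1)
    return assignments, sizes
```

Nothing caps the number of topics; in expectation it grows roughly logarithmically with the number of items, so more text tends to support more topics.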

Anyway, I just think it would be really nice, and the simpler approaches are not hard if you have a word index of the scriptures.
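For reference, the kind of index meant here can be as simple as an inverted index from each word to the verses that contain it. A minimal sketch, assuming verses are plain strings keyed by reference; `build_index` and the toy references are hypothetical.

```python
import re
from collections import defaultdict

def build_index(verses):
    """Hypothetical helper: map each casefolded word to the set of verse refs containing it."""
    index = defaultdict(set)
    for ref, text in verses.items():
        for word in re.findall(r"[a-z']+", text.casefold()):
            index[word].add(ref)
    return index
```

Looking up `index["faith"]` then gives every verse containing "faith" without rescanning the text.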
--josh

ShellineSE
New Member
Posts: 28
Joined: Fri Nov 10, 2006 3:04 pm
Location: Utah

Postby ShellineSE » Mon Feb 12, 2007 8:55 am

I'm intrigued--can you provide some use cases for what you're suggesting? BTW, you might be interested to know that the new search on LDS.org is driven by a commercial Bayesian engine.

josh-p40
New Member
Posts: 24
Joined: Fri Jan 26, 2007 10:25 am
Location: San Jose, CA

Postby josh-p40 » Mon Feb 12, 2007 10:40 am

The high-level goal is that you click on a word in a verse and get back a list of verses sorted by relevance. Relevance in this case measures how much those other verses use that word in a similar context.
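One common way to turn "used in a similar context" into a number is cosine similarity between context-word count vectors. This is a sketch under assumptions (plain-text verses, a ±5-word window, hypothetical function names), not the prototype's scoring. Normalizing by vector length keeps long verses from winning just because they have more words.

```python
import math
import re
from collections import Counter

def context_vector(text, word, window=5):
    """Counts of words within `window` positions of each occurrence of `word`."""
    tokens = re.findall(r"[a-z']+", text.casefold())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == word:
            for t in tokens[max(0, i - window): i + window + 1]:
                if t != word:
                    counts[t] += 1
    return counts

def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def rank_by_relevance(verses, source_ref, word, window=5):
    """Hypothetical ranking: other verses sorted by context similarity to the source."""
    word = word.casefold()
    src = context_vector(verses[source_ref], word, window)
    results = []
    for ref, text in verses.items():
        if ref != source_ref:
            vec = context_vector(text, word, window)
            if vec:  # skip verses that don't use the word (empty context)
                results.append((cosine(src, vec), ref))
    return sorted(results, key=lambda p: (-p[0], p[1]))
```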

One simple example: clicking on "commandments" in 1 Nephi 3:7 would bring up a verse like 1 Nephi 17:3 first, instead of "TG Commandments of God".

It detects that "commandments" in this case refers to the Lord's promise to help you fulfill them, and returns verses that use "commandments" in that context first.


A higher-end use case would be to return an entire hierarchical set of topics automatically. As a simple example, let's say I model the scriptures using hierarchical latent Dirichlet allocation (hLDA). It first returns to me the words "faith, repentance, baptism".

I click on "faith" and it returns the sub-topics under faith. Those sub-topics have their own sub-topics, and so on.

The cool part is that the topics are all created automatically using a state-of-the-art statistical modeling procedure.
--josh

ShellineSE
New Member
Posts: 28
Joined: Fri Nov 10, 2006 3:04 pm
Location: Utah

Postby ShellineSE » Mon Feb 12, 2007 10:54 am

Do you think this would be a substantial improvement over the Topical Guide? Granted, it would be more exhaustive, but would it uncover relationships that are not well-understood already? One problem that we've found with Bayesian inference is that the vocabulary of Church content is relatively narrow, i.e., the ratio of rare (information-carrying) words is low compared to common words. Getting a meaningful taxonomy from such a limited vocabulary using statistical/probability approaches is challenging. Still, I would be interested in seeing a prototype of what you're suggesting.
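One standard way to quantify the "rare, information-carrying word" distinction is inverse document frequency (IDF), treating each verse as a document. This is a generic sketch with toy data and a hypothetical function name, not a description of the engine behind LDS.org search.

```python
import math
import re

def idf(verses):
    """Inverse document frequency per word: log(N / df), with verses as documents."""
    n = len(verses)
    df = {}
    for text in verses.values():
        for word in set(re.findall(r"[a-z']+", text.casefold())):
            df[word] = df.get(word, 0) + 1
    return {word: math.log(n / count) for word, count in df.items()}
```

A word that appears in every verse gets weight 0, so if most of a corpus's words have low IDF, there is little signal left for a statistical model to work with.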

In the end, of course, an examination of the scriptures should be more a spiritual journey than a mathematical one.

josh-p40
New Member
Posts: 24
Joined: Fri Jan 26, 2007 10:25 am
Location: San Jose, CA

Postby josh-p40 » Mon Feb 12, 2007 12:04 pm

shellinese wrote:Do you think this would be a substantial improvement over the Topical Guide? Granted, it would be more exhaustive, but would it uncover relationships that are not well-understood already?


I personally think it would uncover "additional witnesses" that would help people understand concepts a little more. Of course I don't think it would replace inspiration.

The idea is just to make it so when you click a word, you get a more relevant set of results.

shellinese wrote:One problem that we've found with Bayesian inference is that the vocabulary of Church content is relatively narrow, i.e., the ratio of rare (information-carrying) words is low compared to common words. Getting a meaningful taxonomy from such a limited vocabulary using statistical/probability approaches is challenging. Still, I would be interested in seeing a prototype of what you're suggesting.


I think we could get around this. One way is to have a predefined set of words that we "care" about and ignore the rest. This predefined vocabulary could be generated as simply as taking the words that have entries in the Topical Guide, Bible Dictionary, or Index.

Here's a simple, brute force approach:

1. Take a complete verse-based index of the standard works, cull out all but the "interesting vocabulary" and save that.

2. When someone clicks on "faith", grab all of the verses from my index that have "faith" in them. For each of those verses, count the number of "interesting words" that occur in both the original verse and the verse being examined. Then, sort descending on that number and return the list.
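The two steps above can be sketched roughly as follows, assuming plain-text verses keyed by reference and a vocabulary set standing in for the Topical Guide / Bible Dictionary / Index headings. The function names and the vocabulary are hypothetical, and the clicked word is assumed to be in the vocabulary.

```python
import re

def index_interesting(verses, vocabulary):
    """Step 1: keep only the 'interesting' words of each verse (as a set)."""
    return {
        ref: {w for w in re.findall(r"[a-z']+", text.casefold()) if w in vocabulary}
        for ref, text in verses.items()
    }

def brute_force_search(index, source_ref, word):
    """Step 2: rank verses containing `word` by interesting words shared with the source."""
    word = word.casefold()  # assumed to be one of the vocabulary words
    source_words = index[source_ref]
    hits = []
    for ref, words in index.items():
        if ref != source_ref and word in words:
            hits.append((len(source_words & words), ref))
    # Sort descending on the shared-word count (ties broken by reference).
    hits.sort(key=lambda p: (-p[0], p[1]))
    return hits
```

The count includes the clicked word itself, which adds the same 1 to every hit and so doesn't change the order.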

The hardest parts here are:
1. Obtaining the "list of interesting words" (Topical Guide / Bible Dictionary / Index)
2. Obtaining a verse-based index of the standard works.

Again, this is the simplest approach I can think of; there are far fancier ones (latent Dirichlet allocation, etc.) that we could get into down the road.
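For comparison, a compact collapsed Gibbs sampler for plain (flat, non-hierarchical) latent Dirichlet allocation. The topic count, `alpha`, `beta`, iteration count, and function names are illustrative assumptions; a real run over the standard works would likely call for an optimized library rather than this sketch.

```python
import random

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA over `docs` (a list of token lists).

    Returns (doc-topic counts, topic-word counts, vocabulary).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]       # topic counts per document
    nkw = [[0] * V for _ in range(K)]   # word counts per topic
    nk = [0] * K                        # total tokens per topic
    z = []                              # topic assignment of every token
    for d, doc in enumerate(docs):      # random initial assignments
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove this token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # p(z=t | rest) is proportional to (n_dt + alpha)(n_tw + beta) / (n_t + V*beta)
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    return ndk, nkw, vocab

def top_words(nkw, vocab, n=3):
    """The n most frequent words in each topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in nkw]
```

Unlike the brute-force approach, the number of topics is fixed up front here; the non-parametric and hierarchical variants mentioned earlier relax that.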


As for a prototype, I don't know when I'll have time to build one, especially working alone. I want to, though.

If anyone else here is willing to help, this could be fun.
--josh

mkmurray
Senior Member
Posts: 3241
Joined: Tue Jan 23, 2007 9:56 pm
Location: Utah

Let's do it

Postby mkmurray » Mon Feb 12, 2007 1:33 pm

Josh,

I only took the AI class at BYU, not Machine Learning. However, I'd love to help however I can. Perhaps I can get Logan interested in this too. This could be part of his interview process with you guys. It'll be like those little problems Microsoft gives you when you interview! :D

josh-p40
New Member
Posts: 24
Joined: Fri Jan 26, 2007 10:25 am
Location: San Jose, CA

Postby josh-p40 » Tue Feb 13, 2007 12:10 pm

OK, I wrote a prototype, but it runs from the console.

I want to make a little web app out of it eventually, but for those of you who want to try it sooner:

Give me a verse and a phrase or word from that verse, and I will paste the search results here.



Verse anyone?
--josh

mkmurray
Senior Member
Posts: 3241
Joined: Tue Jan 23, 2007 9:56 pm
Location: Utah

Postby mkmurray » Tue Feb 13, 2007 3:48 pm

josh wrote:OK, I wrote a prototype, but it runs from the console.

I want to make a little web app out of it eventually, but for those of you who want to try it sooner:

Give me a verse and a phrase or word from that verse, and I will paste the search results here.



Verse anyone?

2 Ne. 31:19
"mighty to save"

Deut. 6:13
"fear the Lord"
...or should it be "fear the LORD" because of the small caps?
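On the small-caps question: if matching casefolds both the query and the verse text, "fear the Lord" and "fear the LORD" become the same query. A minimal phrase-match sketch, assuming plain-text verses; `contains_phrase` is a hypothetical helper, not how the prototype handles it.

```python
import re

def tokens(text):
    """Casefolded word tokens, so 'LORD' and 'Lord' both become 'lord'."""
    return re.findall(r"[a-z']+", text.casefold())

def contains_phrase(text, phrase):
    """True if the words of `phrase` appear consecutively in `text`, ignoring case."""
    words, target = tokens(text), tokens(phrase)
    n = len(target)
    return n > 0 and any(words[i:i + n] == target for i in range(len(words) - n + 1))
```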

josh-p40
New Member
Posts: 24
Joined: Fri Jan 26, 2007 10:25 am
Location: San Jose, CA

Postby josh-p40 » Tue Feb 13, 2007 4:17 pm

Here's one from a friend of mine:

Alma 32:21 on "faith"

Ether 12:6 Score: 6.58823669484
And now, I, Moroni, would speak somewhat concerning these things; I would show unto the world that faith is things which are hoped for and not seen; wherefore, dispute not because ye see not, for ye receive no witness until after the trial of your faith.

Alma 7:17 Score: 6.47403262293
And now my beloved brethren, do you believe these things? Behold, I say unto you, yea, I know that ye believe them; and the way that I know that ye believe them is by the manifestation of the Spirit which is in me. And now because your faith is strong concerning that, yea, concerning the things which I have spoken, great is my joy.

Jacob 7:5 Score: 5.75066079295
And he had hope to shake me from the faith, notwithstanding the many revelations and the many things which I had seen concerning these things; for I truly had seen angels, and they had ministered unto me. And also, I had heard the voice of the Lord speaking unto me in very word, from time to time; wherefore, I could not be shaken.
--josh

josh-p40
New Member
Posts: 24
Joined: Fri Jan 26, 2007 10:25 am
Location: San Jose, CA

Postby josh-p40 » Tue Feb 13, 2007 4:17 pm

mkmurray wrote:2 Ne. 31:19

"mighty to save"


Here are the top 3 with their "scores":

Alma 7:14 Score: 10.0421240624
Now I say unto you that ye must repent, and be born again; for the Spirit saith if ye are not born again ye cannot inherit the kingdom of heaven; therefore come and be baptized unto repentance, that ye may be washed from your sins, that ye may have faith on the Lamb of God, who taketh away the sins of the world, who is mighty to save and to cleanse from all unrighteousness.

Doctrine and Covenants 133:47 Score: 3.87831884748
And he shall say: I am he who spake in righteousness, mighty to save.

Alma 34:18 Score: 3.55735206572
Yea, cry unto him for mercy; for he is mighty to save.
--josh

