
Algorithm used to show related questions

What algorithms are used in askbot to show the "related questions" (1) in the right panel and (2) below the title when you are adding a new question? Are they simply based on common tags and common strings in the body of the questions? For (2), does it keep track of which questions are being "clicked" by users?

asked by kintali on 2012-07-06 23:27:21 -0500

1 Answer


It's in askbot.models.question.Thread.get_similar_threads.

First, up to 100 questions with matching tags are selected; then similarity is calculated as the number of overlapping tags, and the 10 most similar threads are shown.
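
The three steps above can be sketched in plain Python as follows. This is a minimal illustration, not askbot's code: it assumes each thread is reduced to a set of tag names held in a hypothetical in-memory dict, whereas the real method queries the database. The function name only mirrors the askbot method; its signature and parameters are made up for the example.

```python
def get_similar_threads(target_id, threads, candidate_limit=100, result_limit=10):
    """Rank threads by how many tags they share with the target thread.

    threads: hypothetical mapping of thread id -> set of tag names.
    """
    target_tags = threads[target_id]
    # Step 1: select up to candidate_limit other threads sharing at least one tag.
    candidates = [
        tid for tid, tags in threads.items()
        if tid != target_id and tags & target_tags
    ][:candidate_limit]
    # Step 2: similarity is the number of overlapping tags.
    scores = {tid: len(threads[tid] & target_tags) for tid in candidates}
    # Step 3: keep the most similar threads; sorted() is stable, so ties
    # keep their candidate order even with reverse=True.
    ranked = sorted(candidates, key=lambda tid: scores[tid], reverse=True)
    return ranked[:result_limit]


if __name__ == "__main__":
    threads = {
        1: {"python", "django", "askbot"},
        2: {"python", "django"},
        3: {"askbot"},
        4: {"java"},
        5: {"python", "askbot", "django", "search"},
    }
    print(get_similar_threads(1, threads))  # [5, 2, 3]
```

Note that the candidate cut-off is applied before ranking, so with many matching threads the best matches can be missed if they fall outside the first 100 candidates.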

It is not a rigorous algorithm at all; maybe you could suggest something better?

The algorithm should either be fast enough to generate the list in real time, or we would need to denormalize the list and recalculate it periodically. Currently it is not too slow, and the result is stored in the cache, so the computation does not have to be repeated on every request.
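
To illustrate the caching trade-off just described, here is a minimal sketch of a time-limited cache wrapped around an expensive computation. It is an assumption-laden stand-in: askbot relies on its configured cache backend, which this in-process TTLCache class (a hypothetical name) does not reproduce, and the 60-second lifetime and key format are arbitrary.

```python
import time


class TTLCache:
    """Hypothetical in-process cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested
        self._store = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: skip the expensive computation
        value = compute()  # miss or expired: recompute and store
        self._store[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key):
        # E.g. call this when a thread's tags change, so stale lists are dropped.
        self._store.pop(key, None)


# Usage sketch: cache the related-threads list for thread 1 for 60 seconds.
# cache = TTLCache(60)
# related = cache.get_or_compute(("similar", 1), lambda: [5, 2, 3])
```

The alternative mentioned in the answer, denormalizing the list and recalculating it periodically, moves the same cost into a background job instead of the first request after expiry.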

answered by Evgeny on 2012-07-08 20:34:01 -0500

Comments

I can help you implement a better algorithm once you pick a search backend.

Joseph (2012-07-13 15:01:08 -0500)

I can see that Xapian readily provides functionality for finding a set of documents similar to a given one; I guess Lucene would have something equivalent. See: [http://trac.xapian.org/wiki/FAQ/FindSimilar], [http://trac.xapian.org/wiki/FAQ/EliteSet]. We would just need to tune the factors specific to a Q&A forum (namely the relative weights of title, tag, question, and answer terms) to optimize relevance.

Basel Shishani (2012-08-24 03:53:15 -0500)
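
The "find similar with per-field weights" idea from the comment above can be sketched in pure Python, roughly in the spirit of the linked Xapian FAQ pages: take the target question's most distinctive terms and score other questions on those terms. This is only an approximation under stated assumptions. It uses plain tf-idf rather than Xapian's own weighting, it does not use the Xapian or Lucene APIs, and the FIELD_WEIGHTS values, field names, and function names are hypothetical tuning knobs, not anything taken from askbot.

```python
import math
import re
from collections import Counter

# Hypothetical per-field weights; these are the factors one would tune.
FIELD_WEIGHTS = {"title": 3.0, "tags": 2.0, "question": 1.0, "answer": 0.5}


def weighted_terms(doc, field_weights=FIELD_WEIGHTS):
    """Term frequencies where each occurrence counts by its field's weight."""
    counts = Counter()
    for field, text in doc.items():
        weight = field_weights.get(field, 0.0)
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            counts[term] += weight
    return counts


def find_similar(target_id, docs, elite_size=10, limit=10):
    """Pick the target's highest tf-idf terms, then score other docs on them."""
    tfs = {doc_id: weighted_terms(doc) for doc_id, doc in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter()
    for tf in tfs.values():
        doc_freq.update(term for term, w in tf.items() if w > 0)
    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}

    target = tfs[target_id]
    # "Elite" terms: the target's most distinctive (highest tf-idf) terms.
    elite = sorted(target, key=lambda t: target[t] * idf.get(t, 0.0),
                   reverse=True)[:elite_size]

    scores = {}
    for doc_id, tf in tfs.items():
        if doc_id == target_id:
            continue
        score = sum(tf[t] * idf.get(t, 0.0) for t in elite)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)[:limit]
```

Each document is a dict mapping field name to text, for example {"title": "...", "tags": "askbot search", "question": "..."}. Raising a field's weight makes terms from that field count more both when choosing the elite terms and when scoring candidates.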