From yuri at sims.berkeley.edu Wed Aug 27 07:32:03 2008 From: yuri at sims.berkeley.edu (Yuri Takhteyev) Date: Wed Aug 27 07:44:57 2008 Subject: [Sputnik-list] sputnik search alpha Message-ID: fa4efbc00808270232k76ed0c0clf1ad707cddf9898e@mail.gmail.com
I implemented "dumb" search for Sputnik, which means simply scanning the nodes when a query is made, and while the performance is not stellar, it seems to be bearable for the sputnik site itself (193 nodes). This is without any caching, and it would of course be much faster with caching, when the queries are repeated. (Or, rather, the thing to do would probably be to cache matches for individual terms, and then AND them.)
You can try it at http://sputnik.freewisdom.org/
The commits are:
http://gitorious.org/projects/sputnik/repos/mainline/commits/ebb92bb (the main code, 83 lines)
http://gitorious.org/projects/sputnik/repos/mainline/commits/7f1cc73 (the search node, 10 lines)
You can enter multiple keywords which are treated as an implicit "and". There is no stemming or substring searching, but adding basic English stemming shouldn't be too complicated and shouldn't affect performance too much.
Should this come with Sputnik by default?
- yuri
-- http://sputnik.freewisdom.org/
From carregal at fabricadigital.com.br Wed Aug 27 12:49:20 2008 From: carregal at fabricadigital.com.br (Andre Carregal) Date: Wed Aug 27 13:01:35 2008 Subject: [Sputnik-list] sputnik search alpha In-Reply-To: fa4efbc00808270232k76ed0c0clf1ad707cddf9898e@mail.gmail.com References: fa4efbc00808270232k76ed0c0clf1ad707cddf9898e@mail.gmail.com Message-ID: 92ab989c0808270749q4df9fb5k82178ba1c5cbddb3@mail.gmail.com
On Wed, Aug 27, 2008 at 6:32 AM, Yuri Takhteyev yuri@sims.berkeley.edu wrote:
I implemented "dumb" search for Sputnik, which means simply scanning the nodes when a query is made, and while the performance is not stellar, it seems to be bearable for the sputnik site itself (193 nodes). This is without any caching, and it would of course be much faster with caching, when the queries are repeated. (Or, rather, the thing to do would probably be to cache matches for individual terms, and then AND them.)
You can try it at http://sputnik.freewisdom.org/
Very nice!
Should this come with Sputnik by default?
Do I hear heavenly music? :o)
BTW, why show the backlink counts? If you are assuming this could be important info, why not use them as sort order instead of the alphabetic one? Page rank at its minimal... hehe
André
From yuri at sims.berkeley.edu Wed Aug 27 14:58:42 2008 From: yuri at sims.berkeley.edu (Yuri Takhteyev) Date: Wed Aug 27 15:11:27 2008 Subject: [Sputnik-list] sputnik search alpha In-Reply-To: 92ab989c0808270749q4df9fb5k82178ba1c5cbddb3@mail.gmail.com References: fa4efbc00808270232k76ed0c0clf1ad707cddf9898e@mail.gmail.com 92ab989c0808270749q4df9fb5k82178ba1c5cbddb3@mail.gmail.com Message-ID: fa4efbc00808270958s349723efnab3e9e7f7c4232f6@mail.gmail.com
Can sputnik be patched to generate indexes that can be used in the search?
Not sure I understood what this means.
BTW, why show the backlink counts? If you are assuming this could be important info, why not use them as sort order instead of the alphabetic one? Page rank at its minimal... hehe
Yes, it is like a naive version of PageRank. (An alternative would be to count actual visits.) However, I don't expect too much from it, so I didn't make it the default sorting order. Note though, in case this wasn't clear, that you can click on the header of the table to re-sort it. So, you can view the results alphabetically (the default), sorted by the number of backlinks, or sorted by the time of the last modification. (In the latter case, I am again assuming that pages that were modified more recently are more likely to be relevant.)
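(Editorial illustration: a minimal Python sketch of the three sort orders described above. The result records, their field names, and `sort_results` are assumptions made for the example, not Sputnik's actual data model.)

```python
from datetime import date

# Hypothetical search results; field names and values are made up.
RESULTS = [
    {"title": "Installation", "backlinks": 12, "modified": date(2008, 8, 1)},
    {"title": "Cookbook", "backlinks": 3, "modified": date(2008, 8, 26)},
    {"title": "Search", "backlinks": 7, "modified": date(2008, 8, 27)},
]

def sort_results(results, order="title"):
    """Return a new list sorted by title, backlink count, or last edit."""
    if order == "title":
        # Alphabetical, ignoring case (the default order).
        return sorted(results, key=lambda r: r["title"].lower())
    if order == "backlinks":
        # Most-linked pages first: a crude stand-in for page ranking.
        return sorted(results, key=lambda r: r["backlinks"], reverse=True)
    if order == "modified":
        # Most recently modified pages first.
        return sorted(results, key=lambda r: r["modified"], reverse=True)
    raise ValueError("unknown sort order: " + order)

for order in ("title", "backlinks", "modified"):
    print(order, [r["title"] for r in sort_results(RESULTS, order)])
```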
The next time I need to procrastinate, I'll look into implementing a simple ranking algorithm.
- yuri
-- http://sputnik.freewisdom.org/
From petite.abeille at gmail.com Wed Aug 27 16:52:58 2008 From: petite.abeille at gmail.com (Petite Abeille) Date: Wed Aug 27 17:05:19 2008 Subject: [Sputnik-list] sputnik search alpha In-Reply-To: fa4efbc00808270232k76ed0c0clf1ad707cddf9898e@mail.gmail.com References: fa4efbc00808270232k76ed0c0clf1ad707cddf9898e@mail.gmail.com Message-ID: A6808BA7-C1B6-4D5C-A65C-D48499D6F518@gmail.com
On Aug 27, 2008, at 11:32 AM, Yuri Takhteyev wrote:
I implemented "dumb" search for Sputnik, which means simply scanning the nodes when a query is made, and while the performance is not stellar, it seems to be bearable for the sputnik site itself (193 nodes).
As an alternative, SQLite's full text search module (aka FTS3) is rather fast and reasonably easy to integrate with Lua.
Here is an example of FTS3 at work, searching around 4,658 pages:
http://svr225.stepx.com:3388/search?q=chicago
http://svr225.stepx.com:3388/search?q=lua
http://svr225.stepx.com:3388/search?q=sputnik
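(Editorial illustration: a minimal sketch of an FTS3-backed search, using Python's standard `sqlite3` module rather than a Lua binding. It assumes the underlying SQLite library was built with FTS3 enabled, which is common but not guaranteed. The table name `pages`, its columns, and the sample rows are hypothetical and unrelated to the site linked above.)

```python
import sqlite3

def build_index(pages):
    """Create an in-memory FTS3 table and load (title, body) pairs into it."""
    conn = sqlite3.connect(":memory:")
    # Assumes SQLite was compiled with the FTS3 extension.
    conn.execute("CREATE VIRTUAL TABLE pages USING fts3(title, body)")
    conn.executemany("INSERT INTO pages (title, body) VALUES (?, ?)", pages)
    return conn

def search(conn, query):
    """Return titles of pages matching every term in the query."""
    rows = conn.execute(
        "SELECT title FROM pages WHERE pages MATCH ? ORDER BY title",
        (query,),
    )
    return [title for (title,) in rows]

conn = build_index([
    ("Lua", "Lua is a small scripting language"),
    ("Sputnik", "Sputnik is a wiki written in Lua"),
    ("Chicago", "A city in Illinois"),
])
print(search(conn, "sputnik lua"))
```

Matching against the table name searches all columns, and space-separated terms are combined with an implicit AND, which mirrors the behaviour of the scan-based search discussed earlier in the thread.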
Cheers,
-- PA. http://alt.textdrive.com/nanoki/