From yuri at sims.berkeley.edu Wed Aug 27 07:32:03 2008
From: yuri at sims.berkeley.edu (Yuri Takhteyev)
Date: Wed Aug 27 07:44:57 2008
Subject: [Sputnik-list] sputnik search alpha
Message-ID: fa4efbc00808270232k76ed0c0clf1ad707cddf9898e@mail.gmail.com
I implemented "dumb" search for Sputnik, which means simply scanning the nodes when a query is made, and while the performance is not stellar, it seems to be bearable for the sputnik site itself (193 nodes). This is without any caching; it would of course be much faster with caching when queries are repeated. (Or, rather, the thing to do would probably be to cache matches for individual terms and then AND them.)
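A minimal sketch of the per-term caching idea in plain Lua: remember which nodes matched each individual term, then AND (intersect) the cached sets for a multi-term query. The node table and the names `term_cache`, `nodes_matching` and `search` are hypothetical and are not Sputnik's actual code; a real cache would also have to be invalidated whenever a node is edited.

```lua
-- Hypothetical in-memory nodes (id -> content); not Sputnik's storage API.
local nodes = {
   Installation = "how to install sputnik with lua",
   Search = "sputnik search scans every node",
   Lua = "lua is a small scripting language",
}

local term_cache = {}   -- term -> set of node ids (id -> true)
local scans = 0         -- number of full scans, to show the cache at work

-- Return the set of node ids whose content contains `term` as a whole word.
local function nodes_matching(term)
   if term_cache[term] then return term_cache[term] end
   scans = scans + 1
   local set = {}
   for id, content in pairs(nodes) do
      for word in content:lower():gmatch("%w+") do
         if word == term then set[id] = true; break end
      end
   end
   term_cache[term] = set
   return set
end

-- AND the per-term sets together; returns a sorted list of node ids.
local function search(q)
   local result   -- stays nil until the first term has been seen
   for term in q:lower():gmatch("%w+") do
      local set = nodes_matching(term)
      if not result then
         result = {}
         for id in pairs(set) do result[id] = true end
      else
         for id in pairs(result) do
            if not set[id] then result[id] = nil end
         end
      end
   end
   local ids = {}
   for id in pairs(result or {}) do ids[#ids + 1] = id end
   table.sort(ids)
   return ids
end
```

With this layout a repeated term costs one table lookup instead of a scan, so a query that shares terms with an earlier query only scans for the terms that have not been seen yet.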
You can try it at http://sputnik.freewisdom.org/
The commits are:
http://gitorious.org/projects/sputnik/repos/mainline/commits/ebb92bb (the main code, 83 lines)
http://gitorious.org/projects/sputnik/repos/mainline/commits/7f1cc73 (the search node, 10 lines)
You can enter multiple keywords which are treated as an implicit "and". There is no stemming or substring searching, but adding basic English stemming shouldn't be too complicated and shouldn't affect performance too much.
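A rough sketch of what implicit "and" over whole words means, assuming the same `%w+` tokenization that appears in the patch later in this thread. The function names `tokenize` and `matches_all` are made up for illustration and are not taken from the commits above.

```lua
-- Split text into a set of lowercase alphanumeric words.
local function tokenize(text)
   local words = {}
   for word in text:lower():gmatch("%w+") do
      words[word] = true
   end
   return words
end

-- True if every keyword in `query` appears in `content` as a whole word.
-- There is no stemming and no substring matching, so "table" does not
-- match "tables". An empty query matches everything.
local function matches_all(content, query)
   local words = tokenize(content)
   for term in pairs(tokenize(query)) do
      if not words[term] then return false end
   end
   return true
end
```

For example, `matches_all("Lua tables are associative arrays", "lua arrays")` holds, while the query "lua table" fails on the same text because only the exact word "tables" is present.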
Should this come with Sputnik by default?
- yuri
-- http://sputnik.freewisdom.org/
From carregal at fabricadigital.com.br Wed Aug 27 12:49:20 2008
From: carregal at fabricadigital.com.br (Andre Carregal)
Date: Wed Aug 27 13:01:35 2008
Subject: [Sputnik-list] sputnik search alpha
In-Reply-To: fa4efbc00808270232k76ed0c0clf1ad707cddf9898e@mail.gmail.com
References: fa4efbc00808270232k76ed0c0clf1ad707cddf9898e@mail.gmail.com
Message-ID: 92ab989c0808270749q4df9fb5k82178ba1c5cbddb3@mail.gmail.com
On Wed, Aug 27, 2008 at 6:32 AM, Yuri Takhteyev yuri@sims.berkeley.edu wrote:
I implemented "dumb" search for Sputnik, which means simply scanning the nodes when a query is made, and while the performance is not stellar, it seems to be bearable for the sputnik site itself (193 nodes). This is without any caching; it would of course be much faster with caching when queries are repeated. (Or, rather, the thing to do would probably be to cache matches for individual terms and then AND them.)
You can try it at http://sputnik.freewisdom.org/
Very nice!
Should this come with Sputnik by default?
Do I hear heavenly music? :o)
BTW, why show the backlink counts? If you are assuming this could be important info, why not use them as sort order instead of the alphabetic one? Page rank at its minimal... hehe
André
From yuri at sims.berkeley.edu Wed Aug 27 14:58:42 2008
From: yuri at sims.berkeley.edu (Yuri Takhteyev)
Date: Wed Aug 27 15:11:27 2008
Subject: [Sputnik-list] sputnik search alpha
In-Reply-To: 92ab989c0808270749q4df9fb5k82178ba1c5cbddb3@mail.gmail.com
References: fa4efbc00808270232k76ed0c0clf1ad707cddf9898e@mail.gmail.com
 <92ab989c0808270749q4df9fb5k82178ba1c5cbddb3@mail.gmail.com>
Message-ID: fa4efbc00808270958s349723efnab3e9e7f7c4232f6@mail.gmail.com
Can sputnik be patched to generate indexes that can be used in the search?
Not sure I understood what this means.
BTW, why show the backlink counts? If you are assuming this could be important info, why not use them as sort order instead of the alphabetic one? Page rank at its minimal... hehe
Yes, it is like a naive version of PageRank. (An alternative would be to count actual visits.) However, I don't expect too much from it, so I didn't make it the default sorting order. Note though, in case this wasn't clear, that you can click on the header of the table to re-sort it. So, you can view the results alphabetically (the default), sorted by the number of backlinks, or sorted by the time of the last modification. (In the latter case, I am again assuming that pages that were modified more recently are more likely to be relevant.)
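A small sketch of the backlink-count ordering discussed here, assuming nodes are plain tables with `id` and `content` fields and that links are written as `[[Target]]` (the link pattern is the one used in the patch later in this thread). The sample nodes and the function names are hypothetical. Note that it counts links rather than distinct linking pages, and that a link with a label, if the wiki syntax allows one, would be counted under its full bracketed text.

```lua
-- Hypothetical sample nodes; only `id` and `content` are used.
local nodes = {
   { id = "Home",    content = "See [[Install]] and [[Search]]." },
   { id = "Install", content = "Back to [[Home]]. Then try [[Search]]." },
   { id = "Search",  content = "Described on [[Home]]." },
}

-- Count how many [[wiki links]] point at each node id.
local function count_backlinks(list)
   local backlinks = {}
   for _, node in ipairs(list) do
      for id in node.content:gmatch("%[%[([^%]]*)%]%]") do
         backlinks[id] = (backlinks[id] or 0) + 1
      end
   end
   return backlinks
end

-- Sort most-linked first; fall back to alphabetical order on ties.
local function sort_by_backlinks(list, backlinks)
   table.sort(list, function(x, y)
      local bx, by = backlinks[x.id] or 0, backlinks[y.id] or 0
      if bx ~= by then return bx > by end
      return x.id < y.id
   end)
   return list
end
```

Sorting by modification time would use the same comparator shape with a timestamp field in place of the backlink count.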
The next time I need to procrastinate, I'll look into implementing a simple ranking algorithm.
- yuri
-- http://sputnik.freewisdom.org/
From petite.abeille at gmail.com Wed Aug 27 16:52:58 2008
From: petite.abeille at gmail.com (Petite Abeille)
Date: Wed Aug 27 17:05:19 2008
Subject: [Sputnik-list] sputnik search alpha
In-Reply-To: fa4efbc00808270232k76ed0c0clf1ad707cddf9898e@mail.gmail.com
References: fa4efbc00808270232k76ed0c0clf1ad707cddf9898e@mail.gmail.com
Message-ID: A6808BA7-C1B6-4D5C-A65C-D48499D6F518@gmail.com
On Aug 27, 2008, at 11:32 AM, Yuri Takhteyev wrote:
I implemented "dumb" search for Sputnik, which means simply scanning the nodes when a query is made, and while the performance is not stellar, it seems to be bearable for the sputnik site itself (193 nodes).
As an alternative, SQLite's full text search module (aka FTS3) is
rather fast and reasonably easy to integrate with Lua.
Here is an example of FTS3 at work, searching around 4,658 pages:
http://svr225.stepx.com:3388/search?q=chicago
http://svr225.stepx.com:3388/search?q=lua
http://svr225.stepx.com:3388/search?q=sputnik
Cheers,
-- PA. http://alt.textdrive.com/nanoki/
From yuri at sims.berkeley.edu Wed Aug 27 17:15:31 2008
From: yuri at sims.berkeley.edu (Yuri Takhteyev)
Date: Wed Aug 27 17:28:17 2008
Subject: [Sputnik-list] sputnik search alpha
In-Reply-To: A6808BA7-C1B6-4D5C-A65C-D48499D6F518@gmail.com
References: fa4efbc00808270232k76ed0c0clf1ad707cddf9898e@mail.gmail.com
 <A6808BA7-C1B6-4D5C-A65C-D48499D6F518@gmail.com>
Message-ID: fa4efbc00808271215m187a17eybc0bcab8bb63653f@mail.gmail.com
As an alternative, SQLite's full text search module (aka FTS3) is rather fast and reasonably easy to integrate with Lua.
I am not sure I want to make SQLite a required dependency for the default installation, but this may very well be worthwhile as a plugin. Do you have the code?
Here is an example of FTS3 at work, searching around 4,658 pages:
If someone realistically wanted to use Sputnik for a wiki with 4,658 pages, I would put some serious effort into offering better search options. Right now our top case is ~200 pages, I think. I am working on converting my photoblog to Sputnik, which would be around 1200 pages, but those mostly just need tagging. So, I don't know whether scaling search is the most important thing right now. But if it could be done easily, then surely it would be a good thing to offer.
- yuri
-- http://sputnik.freewisdom.org/
From petite.abeille at gmail.com Wed Aug 27 17:23:02 2008
From: petite.abeille at gmail.com (Petite Abeille)
Date: Wed Aug 27 17:35:23 2008
Subject: [Sputnik-list] sputnik search alpha
In-Reply-To: fa4efbc00808271215m187a17eybc0bcab8bb63653f@mail.gmail.com
References: fa4efbc00808270232k76ed0c0clf1ad707cddf9898e@mail.gmail.com
 <A6808BA7-C1B6-4D5C-A65C-D48499D6F518@gmail.com>
 <fa4efbc00808271215m187a17eybc0bcab8bb63653f@mail.gmail.com>
Message-ID: 8B7744FB-25DD-455D-A10D-8DCC10CE35B7@gmail.com
Hello,
On Aug 27, 2008, at 9:15 PM, Yuri Takhteyev wrote:
As an alternative, SQLite's full text search module (aka FTS3) is rather fast and reasonably easy to integrate with Lua.
I am not sure I want to make SQLite a required dependency for the default installation,
In the case of Nanoki, SQLite is optional. If it's there it's used,
otherwise Nanoki only provides a default title search which doesn't
require any additional libraries.
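The "use it if it's there" pattern can be sketched in plain Lua with `pcall(require, ...)`. This is only an illustration of the idea, not Nanoki's code: `try_require`, `choose_backend` and `title_search` are hypothetical names, and the choice of `luasql.sqlite3` as the module to probe is an assumption.

```lua
-- Try to load an optional module; return it, or nil if it is unavailable.
local function try_require(name)
   local ok, mod = pcall(require, name)
   if ok then return mod end
   return nil
end

-- Pick the search backend depending on whether the driver can be loaded.
local function choose_backend(driver_name)
   local driver = try_require(driver_name)
   if driver then
      return "fulltext", driver
   end
   return "title", nil
end

-- Fallback that needs no extra libraries: case-insensitive, literal
-- substring match against page titles.
local function title_search(titles, q)
   local hits = {}
   q = q:lower()
   for _, title in ipairs(titles) do
      if title:lower():find(q, 1, true) then
         hits[#hits + 1] = title
      end
   end
   return hits
end

-- Assumed module name for the LuaSQL SQLite3 driver.
local backend = choose_backend("luasql.sqlite3")
```

Here `backend` ends up as "fulltext" when the driver is installed and "title" otherwise, so the rest of the application can branch on a single value.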
but this may very well be worthwhile as a plugin. Do you have the code?
http://dev.alt.textdrive.com/browser/HTTP/Finder.ddl
http://dev.alt.textdrive.com/browser/HTTP/Finder.dml
http://dev.alt.textdrive.com/browser/HTTP/Finder.lua
Aside from SQLite itself, this requires the LuaSQL driver for SQLite.
Here is a little tutorial to enable FTS in SQLite:
"Full-Text Search on SQLite" http://blog.michaeltrier.com/tags/fts
Cheers,
-- PA. http://alt.textdrive.com/nanoki/
From carregal at fabricadigital.com.br Wed Aug 27 18:40:50 2008
From: carregal at fabricadigital.com.br (Andre Carregal)
Date: Wed Aug 27 18:53:07 2008
Subject: [Sputnik-list] sputnik search alpha
In-Reply-To: fa4efbc00808270958s349723efnab3e9e7f7c4232f6@mail.gmail.com
References: fa4efbc00808270232k76ed0c0clf1ad707cddf9898e@mail.gmail.com
 <92ab989c0808270749q4df9fb5k82178ba1c5cbddb3@mail.gmail.com>
 <fa4efbc00808270958s349723efnab3e9e7f7c4232f6@mail.gmail.com>
Message-ID: 92ab989c0808271340o71e7e30fr1a6d868fba023ced@mail.gmail.com
On Wed, Aug 27, 2008 at 1:58 PM, Yuri Takhteyev yuri@sims.berkeley.edu wrote:
Can sputnik be patched to generate indexes that can be used in the search?
Not sure I understood what this means.
Me neither; this wasn't in my message. :o)
BTW, why show the backlink counts? If you are assuming this could be important info, why not use them as sort order instead of the alphabetic one? Page rank at its minimal... hehe
Yes, it is like a naive version of PageRank. (An alternative would be to count actual visits.) However, I don't expect too much from it, so I didn't make it the default sorting order. Note though, in case this wasn't clear, that you can click on the header of the table to re-sort it. So, you can view the results alphabetically (the default), sorted by the number of backlinks, or sorted by the time of the last modification. (In the latter case, I am again assuming that pages that were modified more recently are more likely to be relevant.)
I guess then it's a matter of style. I still keep my vote for a simpler listing...
The next time I need to procrastinate, I'll look into implementing a simple ranking algorithm.
I hope I didn't sound like I was complaining about the search; I really like it. I was just mentioning a UI issue.
André
From dm.lua at math2.org Thu Aug 28 03:35:49 2008
From: dm.lua at math2.org (David Manura)
Date: Thu Aug 28 03:48:10 2008
Subject: [Sputnik-list] sputnik search alpha
In-Reply-To: 92ab989c0808271340o71e7e30fr1a6d868fba023ced@mail.gmail.com
References: fa4efbc00808270232k76ed0c0clf1ad707cddf9898e@mail.gmail.com
 <92ab989c0808270749q4df9fb5k82178ba1c5cbddb3@mail.gmail.com>
 <fa4efbc00808270958s349723efnab3e9e7f7c4232f6@mail.gmail.com>
 <92ab989c0808271340o71e7e30fr1a6d868fba023ced@mail.gmail.com>
Message-ID: bc4ed2190808272235w38f2965eg6408d509dd1830ff@mail.gmail.com
On Aug 27, 2008, at 11:32 AM, Yuri Takhteyev wrote:
I implemented "dumb" search for Sputnik, which means simply scanning the nodes when a query is made, and while the performance is not stellar, it seems to be bearable for the sputnik site itself (193 nodes).
Nice, thanks. This may simplify the code though:
diff --git a/sputnik/lua/sputnik/actions/search.lua b/sputnik/lua/sputnik/actions/search.lua
index e7c262e..a9757da 100644
--- a/sputnik/lua/sputnik/actions/search.lua
+++ b/sputnik/lua/sputnik/actions/search.lua
@@ -28,41 +28,34 @@ TEMPLATE = [[
 actions.show_results = function(node, request, sputnik)
    local query = {}
+   local nquery = 0
    for term in (request.params.q or ""):lower():gmatch("%w+") do
-      query[term] = {}
+      if not query[term] then
+         query[term] = true
+         nquery = nquery + 1
+      end
    end
    node.title = 'Search for "'..request.params.q..'"'
    local backlinks = {}
+   local nodes = {}
    for i, node in ipairs(wiki.get_visible_nodes(sputnik, nil)) do
       if node.content and type(node.content)=="string" then
+         local found = {}
+         local nfound = 0
          for word in node.content:lower():gmatch("%w+") do
-            if query[word] then
-               query[word][node.id] = node
+            if query[word] and not found[word] then
+               found[word] = true
+               nfound = nfound + 1
             end
          end
+         if nfound == nquery then
+            table.insert(nodes, node)
+         end
          for id in node.content:gmatch("%[%[([^%]]*)%]%]") do
             backlinks[id] = (backlinks[id] or 0) + 1
          end
       end
    end
-   local nodes = {}
-   for term, matches in pairs(query) do
-      for id, node in pairs(matches) do
-         nodes[id] = node
-      end
-   end
-   local ordered_nodes = {}
-   for id, node in pairs(nodes) do
-      for term, _ in pairs(query) do
-         if not query[term][node.id] then
-            nodes[id] = nil
-         end
-      end
-      if nodes[id] then
-         table.insert(ordered_nodes, node)
-      end
-   end
-   nodes = ordered_nodes
    table.sort(nodes, function(x,y) return x.id < y.id end)
    node:add_javascript_snippet(sorttable.script)
    node.inner_html = util.f(TEMPLATE){
On Wed, Aug 27, 2008 at 4:40 PM, Andre Carregal wrote:
I guess then it's a matter of style. I still keep my vote for a simpler listing...
I tend to agree that the number of backlinks is not that important for the user to see. Moreover, it's not clear that backlink counts or ranking in general is that useful in a small wiki maintained by a small group of people. I haven't missed it on the lua-users.org/wiki search.
Some things that can do much toward narrowing the search results are quoted phrase searching and negative logic. Phrase searching would ideally support phrases of non-alphanumeric characters--i.e. non-tokenized (e.g. "C++" or "obj:method()"). I may add those if no-one beats me to them.
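A possible sketch of non-tokenized phrase matching with negative logic, using the plain-text mode of `string.find` so that characters such as "+", "(" and ":" are matched literally rather than as Lua pattern syntax. The names `contains_phrase` and `matches` are illustrative only and do not come from any existing Sputnik code.

```lua
-- True if `content` contains `phrase` literally (case-insensitive).
-- The fourth argument `true` turns off pattern matching in string.find.
local function contains_phrase(content, phrase)
   return content:lower():find(phrase:lower(), 1, true) ~= nil
end

-- `required` and `excluded` are lists of literal phrases: every required
-- phrase must be present and no excluded phrase may be present.
local function matches(content, required, excluded)
   for _, phrase in ipairs(required) do
      if not contains_phrase(content, phrase) then return false end
   end
   for _, phrase in ipairs(excluded or {}) do
      if contains_phrase(content, phrase) then return false end
   end
   return true
end
```

Because the match is a literal substring test, a phrase such as "C++" also matches inside "C++11"; word-boundary handling would need extra checks.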
From yuri at sims.berkeley.edu Tue Sep 2 06:53:49 2008
From: yuri at sims.berkeley.edu (Yuri Takhteyev)
Date: Tue Sep 2 07:06:22 2008
Subject: [Sputnik-list] sputnik search alpha
In-Reply-To: bc4ed2190808272235w38f2965eg6408d509dd1830ff@mail.gmail.com
References: fa4efbc00808270232k76ed0c0clf1ad707cddf9898e@mail.gmail.com
 <92ab989c0808270749q4df9fb5k82178ba1c5cbddb3@mail.gmail.com>
 <fa4efbc00808270958s349723efnab3e9e7f7c4232f6@mail.gmail.com>
 <92ab989c0808271340o71e7e30fr1a6d868fba023ced@mail.gmail.com>
 <bc4ed2190808272235w38f2965eg6408d509dd1830ff@mail.gmail.com>
Message-ID: fa4efbc00809020153r514a53bfr3f014827fcd81af1@mail.gmail.com
Nice, thanks. This may simplify the code though:
Thanks for the patch. Unfortunately, I had already started a major refactoring, so the patch no longer applies. I am trying to figure out how much of the functionality to push into Saci and how generic to make the interface. I am now leaning toward having Saci provide a query() method that would retrieve a list of nodes matching the query (as well as another one that would allow you to search within specific fields), but without any ranking. Something like:
local nodes = saci:query("lua table -git")
or
local nodes = saci:find_nodes({"title", "tags"}, {"lua", "table"}, "Tickets")
(the third parameter is the prefix within which to search).
But this is not set in stone yet.
I tend to agree that the number of backlinks is not that important for the user to see. Moreover, it's not clear that backlink counts or ranking in general is that useful in a small wiki maintained by a small group of people. I haven't missed it on the lua-users.org/wiki search.
Agreed. I removed them, since this required breaking all sorts of encapsulation.
Some things that can do much toward narrowing the search results are quoted phrase searching and negative logic. Phrase searching would ideally support phrases of non-alphanumeric characters--i.e. non-tokenized (e.g. "C++" or "obj:method()"). I may add those if no-one beats me to them.
I implemented negative terms, as well as an option to limit search by node id prefix. So you can search for, say, "git -storage prefix:Ticket".
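To illustrate how a query such as "git -storage prefix:Ticket" might be interpreted, here is a minimal sketch in plain Lua. It is an assumption about the semantics described above, not the actual Saci implementation; `parse_query` and `node_matches` are hypothetical names, and nodes are assumed to be tables with `id` and `content` fields.

```lua
-- Split a query into positive terms, negative terms ("-term") and an
-- optional node id prefix ("prefix:Something").
local function parse_query(q)
   local parsed = { positive = {}, negative = {}, prefix = nil }
   for token in q:gmatch("%S+") do
      local prefix = token:match("^prefix:(.+)$")
      local negated = token:match("^%-(.+)$")
      if prefix then
         parsed.prefix = prefix
      elseif negated then
         table.insert(parsed.negative, negated:lower())
      else
         table.insert(parsed.positive, token:lower())
      end
   end
   return parsed
end

-- True if the node id starts with the prefix (when one is given), the
-- content has every positive term as a whole word, and no negative term.
local function node_matches(node, parsed)
   if parsed.prefix and node.id:sub(1, #parsed.prefix) ~= parsed.prefix then
      return false
   end
   local words = {}
   for word in node.content:lower():gmatch("%w+") do
      words[word] = true
   end
   for _, term in ipairs(parsed.positive) do
      if not words[term] then return false end
   end
   for _, term in ipairs(parsed.negative) do
      if words[term] then return false end
   end
   return true
end
```

In this sketch terms are compared against `%w+` words, so a term containing punctuation never matches; the prefix is compared case-sensitively against the start of the node id.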
- yuri
-- http://sputnik.freewisdom.org/