2008-08-27 07:32 |
yuri
Yuri Takhteyev <yuri at sims.berkeley.edu>
I implemented "dumb" search for Sputnik, which means simply scanning the nodes when a query is made. The performance is not stellar, but it seems to be bearable for the Sputnik site itself (193 nodes). This is without any caching, and it would of course be much faster with caching when queries are repeated. (Or, rather, the thing to do would probably be to cache matches for individual terms, and then AND them.)

You can try it at http://sputnik.freewisdom.org/

The commits are:

http://gitorious.org/projects/sputnik/repos/mainline/commits/ebb92bb (the main code, 83 lines)
http://gitorious.org/projects/sputnik/repos/mainline/commits/7f1cc73 (the search node, 10 lines)

You can enter multiple keywords, which are treated as an implicit "and". There is no stemming or substring searching, but adding basic English stemming shouldn't be too complicated and shouldn't affect performance too much.

Should this come with Sputnik by default?

- yuri

--
http://sputnik.freewisdom.org/
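The scan-and-AND approach described above, together with the idea of caching matches per term, can be sketched roughly as follows. This is a minimal illustration, not the code from the commits: the `nodes` table, `term_cache`, `matches_for` and `search` are all hypothetical names, and the wiki is modelled as a plain in-memory table rather than through Sputnik's storage API.

```lua
-- Hypothetical in-memory "wiki": node id -> content.
-- (An assumption for illustration; Sputnik's real storage API is not used.)
local nodes = {
   Home    = "Sputnik is a wiki written in Lua",
   Install = "Install Sputnik with LuaRocks",
   Git     = "The Sputnik source lives in a git repository",
}

-- Cache of per-term matches: term -> set of node ids containing that word.
local term_cache = {}

-- Scan every node once for a term and remember the result.
local function matches_for(term)
   if not term_cache[term] then
      local set = {}
      for id, content in pairs(nodes) do
         for word in content:lower():gmatch("%w+") do
            if word == term then
               set[id] = true
               break
            end
         end
      end
      term_cache[term] = set
   end
   return term_cache[term]
end

-- Implicit AND: keep only the ids present in every term's match set.
local function search(q)
   local result -- stays nil until the first term is seen
   for term in q:lower():gmatch("%w+") do
      local set = matches_for(term)
      if not result then
         result = {}
         for id in pairs(set) do result[id] = true end
      else
         for id in pairs(result) do
            if not set[id] then result[id] = nil end
         end
      end
   end
   local ids = {}
   for id in pairs(result or {}) do table.insert(ids, id) end
   table.sort(ids)
   return ids
end

print(table.concat(search("sputnik lua"), ", "))
```

Whole words are compared, so "lua" does not match "LuaRocks", in line with the "no substring searching" remark. A real cache would also have to be invalidated whenever a node is saved.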
2008-08-27 12:49 |
carregal
Andre Carregal <carregal at fabricadigital.com.br>
On Wed, Aug 27, 2008 at 6:32 AM, Yuri Takhteyev <yuri@sims.berkeley.edu> wrote:
> I implemented "dumb" search for Sputnik, which means simply scanning
> the nodes when a query is made, and while the performance is not
> stellar, it seems to be bearable for the sputnik site itself (193
> nodes). This is without any caching, and it would of course be much
> faster with caching, when the queries are repeated. (Or, rather, the
> thing to do would probably be to cache matches for individual terms,
> and then AND them.)
>
> You can try it at http://sputnik.freewisdom.org/

Very nice!

> Should this come with Sputnik by default?

Do I hear heavenly music? :o)

BTW, why show the backlink counts? If you are assuming this could be important info, why not use them as the sort order instead of the alphabetic one? Page rank at its minimal... hehe

André
2008-08-27 14:58 |
yuri
Yuri Takhteyev <yuri at sims.berkeley.edu>
> Can sputnik be patched to generate indexes that can be used in the search?

Not sure I understood what this means.

> BTW, why show the backlink counts? If you are assuming this could be
> important info, why not use them as sort order instead of the
> alphabetic one? Page rank at its minimal... hehe

Yes, it is like a naive version of PageRank. (An alternative would be to count actual visits.) However, I don't expect too much from it, so I didn't make it the default sorting order. Note though, in case this wasn't clear, that you can click on the header of the table to re-sort it. So, you can view the results alphabetically (the default), sorted by the number of backlinks, or sorted by the time of the last modification. (In the latter case, I am again assuming that pages that were modified more recently are more likely to be relevant.)

The next time I need to procrastinate, I'll look into implementing a simple ranking algorithm.

- yuri

--
http://sputnik.freewisdom.org/
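As a rough illustration of the "naive PageRank" idea, the sketch below counts `[[...]]` links across all pages and orders a result list by backlink count, falling back to alphabetical order on ties. The `pages` table and the `count_backlinks` and `rank` helpers are hypothetical names chosen for this example; the sample data is made up, and this is not Sputnik's actual result-sorting code.

```lua
-- Hypothetical sample pages: node id -> wiki text (made-up data).
local pages = {
   Home    = "See [[Install]] and [[Git]].",
   Install = "Back to [[Home]]. Source: [[Git]].",
   Git     = "Back to [[Home]].",
   Orphan  = "Nothing links here, but see [[Home]].",
}

-- Count how many times each id appears as a [[link]] target.
local function count_backlinks(all_pages)
   local backlinks = {}
   for _, content in pairs(all_pages) do
      for id in content:gmatch("%[%[([^%]]*)%]%]") do
         backlinks[id] = (backlinks[id] or 0) + 1
      end
   end
   return backlinks
end

-- Return a copy of ids sorted by backlink count (descending),
-- breaking ties alphabetically so the order is deterministic.
local function rank(ids, backlinks)
   local sorted = {}
   for i, id in ipairs(ids) do sorted[i] = id end
   table.sort(sorted, function(a, b)
      local ca, cb = backlinks[a] or 0, backlinks[b] or 0
      if ca ~= cb then return ca > cb end
      return a < b
   end)
   return sorted
end

local backlinks = count_backlinks(pages)
print(table.concat(rank({"Git", "Home", "Install", "Orphan"}, backlinks), ", "))
```

Sorting by last-modification time would use the same comparator shape with a timestamp field in place of the backlink count.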
2008-08-27 16:52 |
petite.abeille
Petite Abeille <petite.abeille at gmail.com>
On Aug 27, 2008, at 11:32 AM, Yuri Takhteyev wrote:
> I implemented "dumb" search for Sputnik, which means simply scanning
> the nodes when a query is made, and while the performance is not
> stellar, it seems to be bearable for the sputnik site itself (193
> nodes).

As an alternative, SQLite's full text search module (aka FTS3) is rather fast and reasonably easy to integrate with Lua.

Here is an example of FTS3 at work, searching around 4,658 pages:

http://svr225.stepx.com:3388/search?q=chicago
http://svr225.stepx.com:3388/search?q=lua
http://svr225.stepx.com:3388/search?q=sputnik

Cheers,

--
PA.
http://alt.textdrive.com/nanoki/
2008-08-27 17:15 |
yuri
Yuri Takhteyev <yuri at sims.berkeley.edu>
> As an alternative, SQLite's full text search module (aka FTS3) is rather
> fast and reasonably easy to integrate with Lua.

I am not sure I want to make SQLite a required dependency for the default installation, but this may very well be worthwhile as a plugin. Do you have the code?

> Here is an example of FTS3 at work, searching around 4,658 pages:

If someone realistically wanted to use Sputnik for a wiki with 4,658 pages, I would put some serious effort into offering better search options. Right now our top case is ~200 pages, I think. I am working on converting my photoblog to Sputnik, which would be around 1200 pages, but those mostly just need tagging. So, I don't know whether scaling search is the most important thing right now. But if it could be done easily, then surely it would be a good thing to offer.

- yuri

--
http://sputnik.freewisdom.org/
2008-08-27 17:23 |
petite.abeille
Petite Abeille <petite.abeille at gmail.com>
Hello,

On Aug 27, 2008, at 9:15 PM, Yuri Takhteyev wrote:
>> As an alternative, SQLite's full text search module (aka FTS3) is rather
>> fast and reasonably easy to integrate with Lua.
>
> I am not sure I want to make SQLite a required dependency for the
> default installation,

In the case of Nanoki, SQLite is optional. If it's there, it's used; otherwise Nanoki only provides a default title search, which doesn't require any additional libraries.

> but this may very well be worth while as a
> plugin. Do you have the code?

http://dev.alt.textdrive.com/browser/HTTP/Finder.ddl
http://dev.alt.textdrive.com/browser/HTTP/Finder.dml
http://dev.alt.textdrive.com/browser/HTTP/Finder.lua

Aside from SQLite itself, this requires the LuaSQL driver for SQLite.

Here is a little tutorial on enabling FTS in SQLite:

"Full-Text Search on SQLite"
http://blog.michaeltrier.com/tags/fts

>> Here is an example of FTS3 at work, searching around 4,658 pages:
>
> If someone realistically wanted to use Sputnik for a wiki with 4,658
> pages, I would put some serious effort into offering better search
> options. Right now our top case is ~200 pages, I think. I am working
> on converting my photoblog to Sputnik, which would be around 1200
> pages, but those mostly just need tagging. So, I don't know whether
> scaling search is the most important thing right now. But if it could
> be done easily, then surely it would be a good thing to offer.

Cheers,

--
PA.
http://alt.textdrive.com/nanoki/
2008-08-27 18:40 |
carregal
Andre Carregal <carregal at fabricadigital.com.br>
On Wed, Aug 27, 2008 at 1:58 PM, Yuri Takhteyev <yuri@sims.berkeley.edu> wrote:
>> Can sputnik be patched to generate indexes that can be used in the search?
>
> Not sure i understood what this means.

Me neither; this wasn't in my message. :o)

>> BTW, why show the backlink counts? If you are assuming this could be
>> important info, why not use them as sort order instead of the
>> alphabetic one? Page rank at its minimal... hehe
>
> Yes, it is like naive version of pagerank. (An alternative would be
> to count actual visits.) However, I don't expect too much from it, so
> I didn't make it the default sorting order. Note though, in case this
> wasn't clear, that you can click on the header of the table to re-sort
> it. So, you can view the results alphabetically (the default), sorted
> by the number of backlinks, or sorted by the time of the last
> modification. (In the latter case, I am again assuming that pages
> that were modified more recently are more likely to be relevant.)

I guess then it's a matter of style. I still keep my vote for a simpler listing...

> The next time I need to procrastinate, I'll look into implementing a
> simple ranking algorithm

I hope I didn't sound like I was complaining about the search; I really like it. I was just mentioning a UI issue.

André
2008-08-28 03:35 |
dm.lua
David Manura <dm.lua at math2.org>
> On Aug 27, 2008, at 11:32 AM, Yuri Takhteyev wrote:
>> I implemented "dumb" search for Sputnik, which means simply scanning
>> the nodes when a query is made, and while the performance is not
>> stellar, it seems to be bearable for the sputnik site itself (193
>> nodes).

Nice, thanks. This may simplify the code though:

diff --git a/sputnik/lua/sputnik/actions/search.lua b/sputnik/lua/sputnik/actions/search.lua
index e7c262e..a9757da 100644
--- a/sputnik/lua/sputnik/actions/search.lua
+++ b/sputnik/lua/sputnik/actions/search.lua
@@ -28,41 +28,34 @@ TEMPLATE = [[
 actions.show_results = function(node, request, sputnik)
    local query = {}
+   local nquery = 0
    for term in (request.params.q or ""):lower():gmatch("%w+") do
-      query[term] = {}
+      if not query[term] then
+         query[term] = true
+         nquery = nquery + 1
+      end
    end
    node.title = 'Search for "'..request.params.q..'"'
    local backlinks = {}
+   local nodes = {}
    for i, node in ipairs(wiki.get_visible_nodes(sputnik, nil)) do
       if node.content and type(node.content)=="string" then
+         local found = {}
+         local nfound = 0
          for word in node.content:lower():gmatch("%w+") do
-            if query[word] then
-               query[word][node.id] = node
+            if query[word] and not found[word] then
+               found[word] = true
+               nfound = nfound + 1
             end
          end
+         if nfound == nquery then
+            table.insert(nodes, node)
+         end
          for id in node.content:gmatch("%[%[([^%]]*)%]%]") do
             backlinks[id] = (backlinks[id] or 0) + 1
          end
       end
    end
-   local nodes = {}
-   for term, matches in pairs(query) do
-      for id, node in pairs(matches) do
-         nodes[id] = node
-      end
-   end
-   local ordered_nodes = {}
-   for id, node in pairs(nodes) do
-      for term, _ in pairs(query) do
-         if not query[term][node.id] then
-            nodes[id] = nil
-         end
-      end
-      if nodes[id] then
-         table.insert(ordered_nodes, node)
-      end
-   end
-   nodes = ordered_nodes
    table.sort(nodes, function(x,y) return x.id < y.id end)
    node:add_javascript_snippet(sorttable.script)
    node.inner_html = util.f(TEMPLATE){

On Wed, Aug 27, 2008 at 4:40 PM, Andre Carregal wrote:
> I guess then it's a matter of style. I still keep my vote for a
> simpler listing...

I tend to agree that the number of backlinks is not that important for the user to see. Moreover, it's not clear that backlink counts or ranking in general is that useful in a small wiki maintained by a small group of people. I haven't missed it on the lua-users.org/wiki search.

Some things that can do much toward narrowing the search results are quoted phrase searching and negative logic. Phrase searching would ideally support phrases of non-alphanumeric characters--i.e. non-tokenized (e.g. "C++" or "obj:method()"). I may add those if no-one beats me to them.
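One possible way to get non-tokenized phrase matching in Lua is a plain-text `string.find` (fourth argument `true`), which treats characters such as `+`, `(` and `)` literally instead of as pattern magic. The sketch below is only an illustration of that idea: `extract_phrases` and `contains_phrase` are hypothetical helper names, not part of Sputnik.

```lua
-- Pull the double-quoted phrases out of a query string.
local function extract_phrases(q)
   local phrases = {}
   for phrase in q:gmatch('"([^"]+)"') do
      table.insert(phrases, phrase)
   end
   return phrases
end

-- Case-insensitive literal substring test. The fourth argument (true)
-- switches string.find to plain mode, so "+", "(" and ")" are not
-- interpreted as pattern characters.
local function contains_phrase(content, phrase)
   return content:lower():find(phrase:lower(), 1, true) ~= nil
end

local phrases = extract_phrases('"C++" lua "obj:method()"')
for _, phrase in ipairs(phrases) do
   print(phrase, contains_phrase("Call obj:method() from C++ code", phrase))
end
```

Because this is a substring test, it would also find a phrase inside a longer word; whether that is desirable for a wiki search is a design decision left open here.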
2008-09-02 06:53 |
yuri
Yuri Takhteyev <yuri at sims.berkeley.edu>
> Nice, thanks. This may simplify the code though:

Thanks for the patch. Unfortunately, I had already started a major refactoring, so the patch no longer applies. I am trying to figure out how much of the functionality to push into Saci and how generic to make the interface. I am now leaning toward having Saci provide a query() method that would retrieve a list of nodes matching the query (as well as another one that would allow you to search within specific fields), but without any ranking. Something like:

    local nodes = saci:query("lua table -git")

or

    local nodes = saci:find_nodes({"title", "tags"}, {"lua", "table"}, "Tickets")

(the third parameter is the prefix within which to search). But this is not set in stone yet.

> I tend to agree that the number of backlinks is not that important for
> the user to see. Moreover, it's not clear that backlink counts or
> ranking in general is that useful in a small wiki maintained by a
> small group of people. I haven't missed it on the lua-users.org/wiki
> search.

Agreed. I removed them, since this required breaking all sorts of encapsulation.

> Some things that can do much toward narrowing the search results are
> quoted phrase searching and negative logic. Phrase searching would
> ideally support phrases of non-alphanumeric characters--i.e.
> non-tokenized (e.g. "C++" or "obj:method()"). I may add those if
> no-one beats me to them.

I implemented negative terms, as well as an option to limit search by node id prefix. So you can search for, say, "git -storage prefix:Ticket".

- yuri

--
http://sputnik.freewisdom.org/
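A query such as "git -storage prefix:Ticket" could be interpreted along the following lines. This is a guess at the behaviour for illustration only, not the Saci implementation: `parse_query` and `node_matches` are hypothetical names, and the node ids and contents in the usage lines are made up.

```lua
-- Split a query into required terms, excluded terms and an optional
-- id prefix. (Hypothetical helper; not Saci's actual parser.)
local function parse_query(q)
   local parsed = { required = {}, excluded = {}, prefix = nil }
   for token in q:gmatch("%S+") do
      local prefix = token:match("^prefix:(.+)$")
      if prefix then
         parsed.prefix = prefix
      elseif token:sub(1, 1) == "-" and #token > 1 then
         table.insert(parsed.excluded, token:sub(2):lower())
      else
         table.insert(parsed.required, token:lower())
      end
   end
   return parsed
end

-- A node matches if its id starts with the prefix (when one is given),
-- it contains every required word, and it contains no excluded word.
local function node_matches(id, content, parsed)
   if parsed.prefix and id:sub(1, #parsed.prefix) ~= parsed.prefix then
      return false
   end
   local words = {}
   for word in content:lower():gmatch("%w+") do words[word] = true end
   for _, term in ipairs(parsed.required) do
      if not words[term] then return false end
   end
   for _, term in ipairs(parsed.excluded) do
      if words[term] then return false end
   end
   return true
end

local parsed = parse_query("git -storage prefix:Ticket")
print(node_matches("Tickets/12", "Git push fails", parsed))      -- made-up node
print(node_matches("Tickets/13", "Git storage backend", parsed)) -- made-up node
```

Content is tokenized with `%w+`, so in this sketch a term containing punctuation would never match; phrase-style terms would need separate handling.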