- 000009 by yuri@... on 2008/09/02 at 11:53 GMT
000002 by carregal@... on 2008/08/27 at 17:49 GMT
From yuri at sims.berkeley.edu Wed Aug 27 07:32:03 2008
From: yuri at sims.berkeley.edu (Yuri Takhteyev)
Date: Wed Aug 27 07:44:57 2008
Subject: [Sputnik-list] sputnik search alpha
Message-ID: <fa4efbc00808270232k76ed0c0clf1ad707cddf9898e@mail.gmail.com>
I implemented "dumb" search for Sputnik, which means simply scanning
the nodes when a query is made, and while the performance is not
stellar, it seems to be bearable for the sputnik site itself (193
nodes). This is without any caching, and it would of course be much
faster with caching, when the queries are repeated. (Or, rather, the
thing to do would probably be to cache matches for individual terms,
and then "and" them.)
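A minimal sketch of the per-term caching idea, assuming a hypothetical in-memory `nodes` table (id to content) rather than Sputnik's real storage API. Each term is scanned for once, its set of matching ids is remembered, and a multi-term query intersects those sets:

```lua
-- Hypothetical node store: id -> content (not Sputnik's real API).
local nodes = {
   Installation = "How to install Sputnik with LuaRocks",
   Search       = "Sputnik search scans every node for the query terms",
   Lua          = "Lua is the language Sputnik is written in",
}

-- term -> set of node ids containing that term; filled lazily.
local term_cache = {}

-- Scan all nodes for one term (the "dumb" part) and remember the result.
local function ids_for_term(term)
   if not term_cache[term] then
      local ids = {}
      for id, content in pairs(nodes) do
         for word in content:lower():gmatch("%w+") do
            if word == term then ids[id] = true; break end
         end
      end
      term_cache[term] = ids
   end
   return term_cache[term]
end

-- Implicit "and": keep only the ids present in every term's set.
local function search(query)
   local result
   for term in query:lower():gmatch("%w+") do
      local ids = ids_for_term(term)
      if not result then
         result = {}
         for id in pairs(ids) do result[id] = true end
      else
         for id in pairs(result) do
            if not ids[id] then result[id] = nil end
         end
      end
   end
   local sorted = {}
   for id in pairs(result or {}) do sorted[#sorted + 1] = id end
   table.sort(sorted)
   return sorted
end
```

Calling `search("sputnik search")` here would return only the `Search` node. A real cache would also have to be invalidated whenever a node is saved, which this sketch leaves out.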
You can try it at http://sputnik.freewisdom.org/
The commits are:
http://gitorious.org/projects/sputnik/repos/mainline/commits/ebb92bb
(the main code, 83 lines)
http://gitorious.org/projects/sputnik/repos/mainline/commits/7f1cc73
(the search node, 10 lines)
You can enter multiple keywords which are treated as an implicit
"and". There is no stemming or substring searching, but adding basic
English stemming shouldn't be too complicated and shouldn't affect
performance too much.
Should this come with Sputnik by default?
- yuri
--
http://sputnik.freewisdom.org/
From carregal at fabricadigital.com.br Wed Aug 27 12:49:20 2008
From: carregal at fabricadigital.com.br (Andre Carregal)
Date: Wed Aug 27 13:01:35 2008
Subject: [Sputnik-list] sputnik search alpha
In-Reply-To: <fa4efbc00808270232k76ed0c0clf1ad707cddf9898e@mail.gmail.com>
References: <fa4efbc00808270232k76ed0c0clf1ad707cddf9898e@mail.gmail.com>
Message-ID: <92ab989c0808270749q4df9fb5k82178ba1c5cbddb3@mail.gmail.com>
On Wed, Aug 27, 2008 at 6:32 AM, Yuri Takhteyev <yuri@sims.berkeley.edu> wrote:
> I implemented "dumb" search for Sputnik, which means simply scanning
> the nodes when a query is made, and while the performance is not
> stellar, it seems to be bearable for the sputnik site itself (193
> nodes). This is without any caching, and it would of course be much
> faster with caching, when the queries are repeated. (Or, rather, the
> thing to do would probably be to cache matches for individual terms,
> and then "and" them.)
>
> You can try it at http://sputnik.freewisdom.org/
Very nice!
> Should this come with Sputnik by default?
Do I hear heavenly music? :o)
BTW, why show the backlink counts? If you are assuming this could be
important info, why not use them as sort order instead of the
alphabetic one? Page rank at its minimal... hehe
André
From yuri at sims.berkeley.edu Wed Aug 27 14:58:42 2008
From: yuri at sims.berkeley.edu (Yuri Takhteyev)
Date: Wed Aug 27 15:11:27 2008
Subject: [Sputnik-list] sputnik search alpha
In-Reply-To: <92ab989c0808270749q4df9fb5k82178ba1c5cbddb3@mail.gmail.com>
References: <fa4efbc00808270232k76ed0c0clf1ad707cddf9898e@mail.gmail.com>
<92ab989c0808270749q4df9fb5k82178ba1c5cbddb3@mail.gmail.com>
Message-ID: <fa4efbc00808270958s349723efnab3e9e7f7c4232f6@mail.gmail.com>
> Can sputnik be patched to generate indexes that can be used in the search?
Not sure I understood what this means.
> BTW, why show the backlink counts? If you are assuming this could be
> important info, why not use them as sort order instead of the
> alphabetic one? Page rank at its minimal... hehe
Yes, it is like a naive version of PageRank. (An alternative would be
to count actual visits.) However, I don't expect too much from it, so
I didn't make it the default sorting order. Note though, in case this
wasn't clear, that you can click on the header of the table to re-sort
it. So, you can view the results alphabetically (the default), sorted
by the number of backlinks, or sorted by the time of the last
modification. (In the latter case, I am again assuming that pages
that were edited more recently are more likely to be relevant.)
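The three orderings can be sketched on the Lua side as one comparator per column. This is only an illustration: the result records and their field names (`backlinks`, `modified`) are hypothetical, and on the live page the re-sorting happens in the browser when a table header is clicked.

```lua
-- Hypothetical search results; the field names are illustrative only.
local results = {
   { id = "Tickets",      backlinks = 2, modified = "2008-08-20" },
   { id = "Installation", backlinks = 7, modified = "2008-08-27" },
   { id = "Search",       backlinks = 4, modified = "2008-09-02" },
}

-- One comparator per sortable column.
local comparators = {
   id        = function(x, y) return x.id < y.id end,               -- A to Z
   backlinks = function(x, y) return x.backlinks > y.backlinks end, -- most linked first
   modified  = function(x, y) return x.modified > y.modified end,   -- newest first
}

-- Return a sorted copy so the caller's list keeps its order.
local function sorted_by(list, column)
   local copy = {}
   for i, item in ipairs(list) do copy[i] = item end
   table.sort(copy, comparators[column])
   return copy
end
```

The timestamps are ISO-style strings so that plain string comparison orders them chronologically; any other date format would need a real parser.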
The next time I need to procrastinate, I'll look into implementing a
simple ranking algorithm.
- yuri
--
http://sputnik.freewisdom.org/
From petite.abeille at gmail.com Wed Aug 27 16:52:58 2008
From: petite.abeille at gmail.com (Petite Abeille)
Date: Wed Aug 27 17:05:19 2008
Subject: [Sputnik-list] sputnik search alpha
In-Reply-To: <fa4efbc00808270232k76ed0c0clf1ad707cddf9898e@mail.gmail.com>
References: <fa4efbc00808270232k76ed0c0clf1ad707cddf9898e@mail.gmail.com>
Message-ID: <A6808BA7-C1B6-4D5C-A65C-D48499D6F518@gmail.com>
On Aug 27, 2008, at 11:32 AM, Yuri Takhteyev wrote:
> I implemented "dumb" search for Sputnik, which means simply scanning
> the nodes when a query is made, and while the performance is not
> stellar, it seems to be bearable for the sputnik site itself (193
> nodes).
As an alternative, SQLite's full text search module (aka FTS3) is
rather fast and reasonably easy to integrate with Lua.
Here is an example of FTS3 at work, searching around 4,658 pages:
http://svr225.stepx.com:3388/search?q=chicago
http://svr225.stepx.com:3388/search?q=lua
http://svr225.stepx.com:3388/search?q=sputnik
Cheers,
--
PA.
http://alt.textdrive.com/nanoki/
From yuri at sims.berkeley.edu Wed Aug 27 17:15:31 2008
From: yuri at sims.berkeley.edu (Yuri Takhteyev)
Date: Wed Aug 27 17:28:17 2008
Subject: [Sputnik-list] sputnik search alpha
In-Reply-To: <A6808BA7-C1B6-4D5C-A65C-D48499D6F518@gmail.com>
References: <fa4efbc00808270232k76ed0c0clf1ad707cddf9898e@mail.gmail.com>
<A6808BA7-C1B6-4D5C-A65C-D48499D6F518@gmail.com>
Message-ID: <fa4efbc00808271215m187a17eybc0bcab8bb63653f@mail.gmail.com>
> As an alternative, SQLite's full text search module (aka FTS3) is rather
> fast and reasonably easy to integrate with Lua.
I am not sure I want to make SQLite a required dependency for the
default installation, but this may very well be worthwhile as a
plugin. Do you have the code?
> Here is an example of FTS3 at work, searching around 4,658 pages:
If someone realistically wanted to use Sputnik for a wiki with 4,658
pages, I would put some serious effort into offering better search
options. Right now our top case is ~200 pages, I think. I am working
on converting my photoblog to Sputnik, which would be around 1200
pages, but those mostly just need tagging. So, I don't know whether
scaling search is the most important thing right now. But if it could
be done easily, then surely it would be a good thing to offer.
- yuri
--
http://sputnik.freewisdom.org/
From petite.abeille at gmail.com Wed Aug 27 17:23:02 2008
From: petite.abeille at gmail.com (Petite Abeille)
Date: Wed Aug 27 17:35:23 2008
Subject: [Sputnik-list] sputnik search alpha
In-Reply-To: <fa4efbc00808271215m187a17eybc0bcab8bb63653f@mail.gmail.com>
References: <fa4efbc00808270232k76ed0c0clf1ad707cddf9898e@mail.gmail.com>
<A6808BA7-C1B6-4D5C-A65C-D48499D6F518@gmail.com>
<fa4efbc00808271215m187a17eybc0bcab8bb63653f@mail.gmail.com>
Message-ID: <8B7744FB-25DD-455D-A10D-8DCC10CE35B7@gmail.com>
Hello,
On Aug 27, 2008, at 9:15 PM, Yuri Takhteyev wrote:
>> As an alternative, SQLite's full text search module (aka FTS3) is
>> rather fast and reasonably easy to integrate with Lua.
>
> I am not sure I want to make SQLite a required dependency for the
> default installation,
In the case of Nanoki, SQLite is optional. If it's there it's used,
otherwise Nanoki only provides a default title search which doesn't
require any additional libraries.
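The "use it if it's there" approach can be sketched with `pcall(require, ...)`. This is not Nanoki's actual code: `try_require`, `title_search` and `choose_backend` are hypothetical names, and the sketch assumes the LuaSQL driver would be loaded as `luasql.sqlite3`.

```lua
-- Try to load a module without raising an error if it is missing.
local function try_require(name)
   local ok, mod = pcall(require, name)
   if ok then return mod end
   return nil
end

-- Fallback that needs no extra libraries: case-insensitive substring
-- match on page titles only.
local function title_search(titles, term)
   local hits = {}
   term = term:lower()
   for _, title in ipairs(titles) do
      -- plain = true: treat the term as literal text, not a Lua pattern
      if title:lower():find(term, 1, true) then hits[#hits + 1] = title end
   end
   return hits
end

-- Pick the full-text back end when the driver loads, else the fallback.
local function choose_backend(module_name)
   local driver = try_require(module_name)
   if driver then
      return "fulltext", driver
   end
   return "title", title_search
end
```

With this shape, `choose_backend("luasql.sqlite3")` would return `"fulltext"` plus the driver on a machine where LuaSQL is installed, and `"title"` plus the fallback function everywhere else.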
> but this may very well be worthwhile as a
> plugin. Do you have the code?
http://dev.alt.textdrive.com/browser/HTTP/Finder.ddl
http://dev.alt.textdrive.com/browser/HTTP/Finder.dml
http://dev.alt.textdrive.com/browser/HTTP/Finder.lua
Aside from SQLite itself, this requires LuaSQL driver for SQLite.
Here is a little tutorial to enable FTS in SQLite:
"Full-Text Search on SQLite"
http://blog.michaeltrier.com/tags/fts
>> Here is an example of FTS3 at work, searching around 4,658 pages:
>
> If someone realistically wanted to use Sputnik for a wiki with 4,658
> pages, I would put some serious effort into offering better search
> options. Right now our top case is ~200 pages, I think. I am working
> on converting my photoblog to Sputnik, which would be around 1200
> pages, but those mostly just need tagging. So, I don't know whether
> scaling search is the most important thing right now. But if it could
> be done easily, then surely it would be a good thing to offer.
Cheers,
--
PA.
http://alt.textdrive.com/nanoki/
From carregal at fabricadigital.com.br Wed Aug 27 18:40:50 2008
From: carregal at fabricadigital.com.br (Andre Carregal)
Date: Wed Aug 27 18:53:07 2008
Subject: [Sputnik-list] sputnik search alpha
In-Reply-To: <fa4efbc00808270958s349723efnab3e9e7f7c4232f6@mail.gmail.com>
References: <fa4efbc00808270232k76ed0c0clf1ad707cddf9898e@mail.gmail.com>
<92ab989c0808270749q4df9fb5k82178ba1c5cbddb3@mail.gmail.com>
<fa4efbc00808270958s349723efnab3e9e7f7c4232f6@mail.gmail.com>
Message-ID: <92ab989c0808271340o71e7e30fr1a6d868fba023ced@mail.gmail.com>
On Wed, Aug 27, 2008 at 1:58 PM, Yuri Takhteyev <yuri@sims.berkeley.edu> wrote:
>> Can sputnik be patched to generate indexes that can be used in the search?
>
> Not sure I understood what this means.
Me neither; this wasn't in my message. :o)
>> BTW, why show the backlink counts? If you are assuming this could be
>> important info, why not use them as sort order instead of the
>> alphabetic one? Page rank at its minimal... hehe
>
> Yes, it is like a naive version of PageRank. (An alternative would be
> to count actual visits.) However, I don't expect too much from it, so
> I didn't make it the default sorting order. Note though, in case this
> wasn't clear, that you can click on the header of the table to re-sort
> it. So, you can view the results alphabetically (the default), sorted
> by the number of backlinks, or sorted by the time of the last
> modification. (In the latter case, I am again assuming that pages
> that were edited more recently are more likely to be relevant.)
I guess then it's a matter of style. I still keep my vote for a
simpler listing...
> The next time I need to procrastinate, I'll look into implementing a
> simple ranking algorithm.
I hope I didn't sound like I was complaining about the search, I
really like it. I was just mentioning a UI issue.
André
From dm.lua at math2.org Thu Aug 28 03:35:49 2008
From: dm.lua at math2.org (David Manura)
Date: Thu Aug 28 03:48:10 2008
Subject: [Sputnik-list] sputnik search alpha
In-Reply-To: <92ab989c0808271340o71e7e30fr1a6d868fba023ced@mail.gmail.com>
References: <fa4efbc00808270232k76ed0c0clf1ad707cddf9898e@mail.gmail.com>
<92ab989c0808270749q4df9fb5k82178ba1c5cbddb3@mail.gmail.com>
<fa4efbc00808270958s349723efnab3e9e7f7c4232f6@mail.gmail.com>
<92ab989c0808271340o71e7e30fr1a6d868fba023ced@mail.gmail.com>
Message-ID: <bc4ed2190808272235w38f2965eg6408d509dd1830ff@mail.gmail.com>
> On Aug 27, 2008, at 11:32 AM, Yuri Takhteyev wrote:
>> I implemented "dumb" search for Sputnik, which means simply scanning
>> the nodes when a query is made, and while the performance is not
>> stellar, it seems to be bearable for the sputnik site itself (193
>> nodes).
Nice, thanks. This may simplify the code though:
diff --git a/sputnik/lua/sputnik/actions/search.lua b/sputnik/lua/sputnik/actions/search.lua
index e7c262e..a9757da 100644
--- a/sputnik/lua/sputnik/actions/search.lua
+++ b/sputnik/lua/sputnik/actions/search.lua
@@ -28,41 +28,34 @@ TEMPLATE = [[
 actions.show_results = function(node, request, sputnik)
    local query = {}
+   local nquery = 0
    for term in (request.params.q or ""):lower():gmatch("%w+") do
-      query[term] = {}
+      if not query[term] then
+         query[term] = true
+         nquery = nquery + 1
+      end
    end
    node.title = 'Search for "'..request.params.q..'"'
    local backlinks = {}
+   local nodes = {}
    for i, node in ipairs(wiki.get_visible_nodes(sputnik, nil)) do
       if node.content and type(node.content)=="string" then
+         local found = {}
+         local nfound = 0
          for word in node.content:lower():gmatch("%w+") do
-            if query[word] then
-               query[word][node.id] = node
+            if query[word] and not found[word] then
+               found[word] = true
+               nfound = nfound + 1
             end
          end
+         if nfound == nquery then
+            table.insert(nodes, node)
+         end
          for id in node.content:gmatch("%[%[([^%]]*)%]%]") do
             backlinks[id] = (backlinks[id] or 0) + 1
          end
       end
    end
-   local nodes = {}
-   for term, matches in pairs(query) do
-      for id, node in pairs(matches) do
-         nodes[id] = node
-      end
-   end
-   local ordered_nodes = {}
-   for id, node in pairs(nodes) do
-      for term, _ in pairs(query) do
-         if not query[term][node.id] then
-            nodes[id] = nil
-         end
-      end
-      if nodes[id] then
-         table.insert(ordered_nodes, node)
-      end
-   end
-   nodes = ordered_nodes
    table.sort(nodes, function(x,y) return x.id < y.id end)
    node:add_javascript_snippet(sorttable.script)
    node.inner_html = util.f(TEMPLATE){
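The matching logic of the patch can be pulled out into standalone functions to try it without a Sputnik installation. This is a sketch under that assumption; the function names are made up for illustration, and the early exit is an addition that is not in the patch.

```lua
-- Build the set of distinct query terms and count them.
local function parse_query(q)
   local query, nquery = {}, 0
   for term in (q or ""):lower():gmatch("%w+") do
      if not query[term] then
         query[term] = true
         nquery = nquery + 1
      end
   end
   return query, nquery
end

-- True when the content contains every distinct query term.
local function matches_all(content, query, nquery)
   local found, nfound = {}, 0
   for word in content:lower():gmatch("%w+") do
      if query[word] and not found[word] then
         found[word] = true
         nfound = nfound + 1
         if nfound == nquery then return true end -- early exit, not in the patch
      end
   end
   return nfound == nquery
end

-- Count [[wiki links]] with the same pattern the patch uses.
local function count_links(content, backlinks)
   for id in content:gmatch("%[%[([^%]]*)%]%]") do
      backlinks[id] = (backlinks[id] or 0) + 1
   end
   return backlinks
end
```

Counting distinct terms is what lets a single pass over each node replace the two post-processing loops. One consequence worth noting: with an empty query both counters are zero, so every node matches.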
On Wed, Aug 27, 2008 at 4:40 PM, Andre Carregal wrote:
> I guess then it's a matter of style. I still keep my vote for a
> simpler listing...
I tend to agree that the number of backlinks is not that important for
the user to see. Moreover, it's not clear that backlink counts or
ranking in general is that useful in a small wiki maintained by a
small group of people. I haven't missed it on the lua-users.org/wiki
search.
Some things that can do much toward narrowing the search results are
quoted phrase searching and negative logic. Phrase searching would
ideally support phrases of non-alphanumeric characters--i.e.
non-tokenized (e.g. "C++" or "obj:method()"). I may add those if
no-one beats me to them.
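A rough sketch of non-tokenized phrase matching, assuming quoted phrases are written with double quotes. The function names are hypothetical. The key point is the fourth argument to `string.find`, which turns off pattern matching so that "C++" or "obj:method()" are compared as literal text:

```lua
-- Split a query into quoted phrases and bare words.
local function split_query(q)
   local phrases, words = {}, {}
   -- Pull out "quoted phrases" first, leaving a space in their place.
   local rest = q:gsub('"([^"]*)"', function(phrase)
      if phrase ~= "" then phrases[#phrases + 1] = phrase:lower() end
      return " "
   end)
   for word in rest:lower():gmatch("%w+") do
      words[#words + 1] = word
   end
   return phrases, words
end

-- True when the content contains every phrase (as a literal substring)
-- and every bare word (as a whole token).
local function matches(content, q)
   local phrases, words = split_query(q)
   local text = content:lower()
   for _, phrase in ipairs(phrases) do
      -- plain = true: no pattern magic, so "+", "(", ":" are literal
      if not text:find(phrase, 1, true) then return false end
   end
   local seen = {}
   for word in text:gmatch("%w+") do seen[word] = true end
   for _, word in ipairs(words) do
      if not seen[word] then return false end
   end
   return true
end
```

Because phrases are matched as substrings, a phrase such as "lua table" would also match text containing "lua tables"; whether that is desirable is a design choice this sketch does not settle.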
From yuri at sims.berkeley.edu Tue Sep 2 06:53:49 2008
From: yuri at sims.berkeley.edu (Yuri Takhteyev)
Date: Tue Sep 2 07:06:22 2008
Subject: [Sputnik-list] sputnik search alpha
In-Reply-To: <bc4ed2190808272235w38f2965eg6408d509dd1830ff@mail.gmail.com>
References: <fa4efbc00808270232k76ed0c0clf1ad707cddf9898e@mail.gmail.com>
<92ab989c0808270749q4df9fb5k82178ba1c5cbddb3@mail.gmail.com>
<fa4efbc00808270958s349723efnab3e9e7f7c4232f6@mail.gmail.com>
<92ab989c0808271340o71e7e30fr1a6d868fba023ced@mail.gmail.com>
<bc4ed2190808272235w38f2965eg6408d509dd1830ff@mail.gmail.com>
Message-ID: <fa4efbc00809020153r514a53bfr3f014827fcd81af1@mail.gmail.com>
> Nice, thanks. This may simplify the code though:
Thanks for the patch. Unfortunately, I had already started a major
refactoring so the patch no longer applies. I am trying to figure out
how much of the functionality to push into Saci and how generic to
make the interface. I am now leaning toward having Saci provide a query()
method that would retrieve a list of nodes matching the query (as well
as another one that would allow you to search within specific fields),
but without any ranking. Something like:
local nodes = saci:query("lua table -git")
or
local nodes = saci:find_nodes({"title", "tags"}, {"lua", "table"}, "Tickets")
(the third parameter is the prefix within which to search).
But this is not set in stone yet.
> I tend to agree that the number of backlinks is not that important for
> the user to see. Moreover, it's not clear that backlink counts or
> ranking in general is that useful in a small wiki maintained by a
> small group of people. I haven't missed it on the lua-users.org/wiki
> search.
Agreed. I removed them, since this required breaking all sorts of
encapsulation.
> Some things that can do much toward narrowing the search results are
> quoted phrase searching and negative logic. Phrase searching would
> ideally support phrases of non-alphanumeric characters--i.e.
> non-tokenized (e.g. "C++" or "obj:method()"). I may add those if
> no-one beats me to them.
I implemented negative terms, as well as an option to limit search by
node id prefix. So you can search for, say, "git -storage
prefix:Ticket".
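How such a query string might be interpreted can be sketched as follows. This is not the Saci implementation: `parse_query` and `node_matches` are hypothetical names, and the sketch assumes terms are separated by whitespace and contain no punctuation.

```lua
-- Split a query into required terms, excluded terms and an id prefix.
local function parse_query(q)
   local parsed = { required = {}, excluded = {}, prefix = nil }
   for token in q:gmatch("%S+") do
      local prefix = token:match("^prefix:(.+)$")
      if prefix then
         parsed.prefix = prefix
      elseif token:sub(1, 1) == "-" and #token > 1 then
         parsed.excluded[token:sub(2):lower()] = true
      else
         parsed.required[token:lower()] = true
      end
   end
   return parsed
end

-- Apply the parsed query to one node, given its id and text content.
local function node_matches(id, content, parsed)
   -- The prefix test is case-sensitive and compares the start of the id.
   if parsed.prefix and id:sub(1, #parsed.prefix) ~= parsed.prefix then
      return false
   end
   local words = {}
   for word in content:lower():gmatch("%w+") do words[word] = true end
   for term in pairs(parsed.excluded) do
      if words[term] then return false end
   end
   for term in pairs(parsed.required) do
      if not words[term] then return false end
   end
   return true
end
```

With the query "git -storage prefix:Ticket", a node with id `Tickets/12` and content mentioning git but not storage would match, while a node outside the `Ticket` prefix would not.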
- yuri
--
http://sputnik.freewisdom.org/