Sputnik: traceback errors, config pages, searching



<p>From dm.lua at math2.org  Sat Aug 23 14:38:33 2008
From: dm.lua at math2.org (David Manura)
Date: Sat Aug 23 14:50:39 2008
Subject: [Sputnik-list] traceback errors, config pages, searching
Message-ID: <a href="mailto:bc4ed2190808230938x1652c455yc757bf60059c6349@mail.gmail.com">bc4ed2190808230938x1652c455yc757bf60059c6349@mail.gmail.com</a></p>

<p>A few comments...</p>

<p>(1)</p>

<p>When Sputnik raises an unexpected exception, a stack traceback is
displayed on the web page:</p>

<p>&lt;snip&gt;</p>
<pre><code>There was an error in the specified application. The full error message follows:

...epler-1.1/rocks//wsapi/1.0rc1-1/lua/wsapi/common.lua:183: cannot
obtain information from file `redirect:/cgi/sputnik.cgi'
stack traceback:
   [C]: in function 'assert'
   ...epler-1.1/rocks//wsapi/1.0rc1-1/lua/wsapi/common.lua:183: in function &lt;...epler-1.1/rocks//wsapi/1.0rc1-1/lua/wsapi/common.lua:182&gt;
   (tail call): ?
   ...utnik/kepler-1.1/rocks//wsapi/1.0rc1-1/bin/wsapi.cgi:16: in function &lt;...utnik/kepler-1.1/rocks//wsapi/1.0rc1-1/bin/wsapi.cgi:14&gt;
   (tail call): ?
   [C]: in function 'xpcall'
   ...epler-1.1/rocks//wsapi/1.0rc1-1/lua/wsapi/common.lua:135: in function 'run_app'
   ...epler-1.1/rocks//wsapi/1.0rc1-1/lua/wsapi/common.lua:159: in function 'run'
   ...k/kepler-1.1/rocks//wsapi/1.0rc1-1/lua/wsapi/cgi.lua:16: in function 'run'
   ...utnik/kepler-1.1/rocks//wsapi/1.0rc1-1/bin/wsapi.cgi:26: in main chunk
   [C]: ?
</code></pre>
<p>&lt;/snip&gt;</p>

<p>It could be argued that the end user of the web site shouldn't see a
stack traceback.  First, there may be security implications in
allowing the end user to know how the web site is implemented and
installed.  Second, the stack traceback is more useful to the
administrator of the web site, so perhaps it should instead be recorded
in a log file on the server, and the end user should only see a ticket
number that the administrator can cross-reference against the log
file.  I did some searching on this concern just now:</p>

<p>[1] http://www.jankoatwarpspeed.com/post/2008/06/02/Exception-handling-best-practices-in-ASPNET-web-applications.aspx
[2] http://www.securitypark.co.uk/article.asp?articleid=26905
[3] http://www.infosecwriters.com/text_resources/pdf/Top_10_Configuration_Security_Vulnerabilities_Part_One_BSullivan.pdf</p>

<p>The stack traceback in Sputnik is triggered by error_html in
rocks/wsapi/1.0-2/lua/wsapi/common.lua, so this might instead be a
WSAPI/Kepler concern.</p>
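<p>The ticket idea in (1) could be sketched in Lua roughly as follows.  This is
only an illustration under stated assumptions: the names guarded, new_ticket
and the log callback are hypothetical and are not part of WSAPI or Sputnik.
The full traceback goes to the log, and the visitor only sees a ticket
number.</p>

```lua
-- Hypothetical sketch: none of these names come from WSAPI or Sputnik.
local ticket_counter = 0

-- Builds a ticket id from the current time plus a per-process counter.
local function new_ticket()
  ticket_counter = ticket_counter + 1
  return string.format("%s-%04d", os.date("%Y%m%d%H%M%S"), ticket_counter)
end

-- Wraps `handler` so that an error is logged with its traceback,
-- while the caller only receives a generic message with a ticket id.
local function guarded(handler, log)
  return function(request)
    local ok, result = xpcall(function() return handler(request) end,
                              debug.traceback)
    if ok then return result end
    local ticket = new_ticket()
    log(ticket, result)  -- `result` holds the error message plus traceback
    return "An internal error occurred. Please quote ticket " .. ticket ..
           " to the site administrator."
  end
end

-- Usage: the log is an in-memory table here; a real setup would write to a file.
local log_lines = {}
local page = guarded(
  function(request) error("cannot obtain information from file") end,
  function(ticket, traceback)
    log_lines[#log_lines + 1] = ticket .. ": " .. traceback
  end)
local body = page({})
```

<p>The administrator can then search the log for the ticket id quoted by the
user.</p>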

<p>(2)</p>

<p>After installing Sputnik, I had difficulty finding a complete list of
all the configuration pages.  Only some were on the start page.  I
later discovered they were listed on the "sputnik" page--e.g.
http://sputnik.freewisdom.org/en/sputnik .  (BTW, the "_navigation"
link on this page is broken.)  I think the "sputnik" configuration
page should be linked from the start page on the initial installation.</p>

<p>(3)</p>

<p>More generally, is there a way to obtain a complete list of all pages
that exist (without indexing them on Google)?  Perhaps I'm setting up
a new wiki and want to remove unnecessary pages.  On lua-users wiki, I
just enter an empty search in http://lua-users.org/wiki/FindPage .</p>

<p>(4)</p>

<p>I'm quite in favor of adding a built-in full-text search engine that
works out-of-the-box, at least as a fallback, even if that may be
inferior in some ways to Google.  A discussion about this was here:</p>

<p>  http://lua-users.org/lists/lua-l/2008-02/msg00950.html</p>

<p>A potentially common use case is to use Sputnik internally on a small
wiki by an individual or small group.  In that case, simple linear
search through the pages (much like grep) would be sufficient and
trivial to implement.  More generally, you'd want to maintain an
inverted index, possibly using an existing production-grade search
engine (e.g. http://swish-e.org and others) or Google, but if you want
something trivial to implement now, here's the code used by the usemod
wiki ( http://www.usemod.com/cgi-bin/wiki.pl ), which is the wiki upon
which lua-users.org is based:</p>

<pre><code>sub SearchTitleAndBody {
  my ($string) = @_;
  my ($name, $freeName, @found);

  foreach $name (&AllPagesList()) {
    &OpenPage($name);
    &OpenDefaultText();
    if (($Text{'text'} =~ /$string/i) || ($name =~ /$string/i)) {
      push(@found, $name);
    } elsif ($FreeLinks) {
      if ($name =~ m/_/) {
        $freeName = $name;
        $freeName =~ s/_/ /g;
        if ($freeName =~ /$string/i) {
          push(@found, $name);
        }
      } elsif ($string =~ m/ /) {
        $freeName = $string;
        $freeName =~ s/ /_/g;
        if ($Text{'text'} =~ /$freeName/i) {
          push(@found, $name);
        }
      }
    }
  }
  return @found;
}
</code></pre>

<p>Boolean AND/NOT logic and phrase searching would be a simple extension
to that (e.g. ' "hello world" -goodbye ').  You do not need word
tokenization (since there is no inverted index of words) nor stemming,
synonyms, etc., which would complicate the otherwise simple logic.</p>
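<p>To make the AND/NOT and phrase extension concrete, here is a minimal Lua
sketch, assuming the pages are available as a plain table mapping page name
to text (a stand-in for real storage).  The function names parse_query,
matches and search are hypothetical.  Matching is by case-insensitive plain
substring rather than by regular expression, which is a simplification
compared to the Perl above.</p>

```lua
-- Splits a query such as '"hello world" -goodbye' into required and
-- excluded terms. Quoted phrases stay whole; a leading '-' negates a term.
local function parse_query(query)
  local required, excluded = {}, {}
  local pos = 1
  while true do
    -- Try a (possibly negated) quoted phrase first, then a bare word.
    local s, e, neg, term = query:find('^%s*(%-?)"([^"]*)"', pos)
    if not s then
      s, e, neg, term = query:find('^%s*(%-?)(%S+)', pos)
    end
    if not s then break end
    term = term:lower()
    if term ~= "" then
      local list = (neg == "-") and excluded or required
      list[#list + 1] = term
    end
    pos = e + 1
  end
  return required, excluded
end

-- True if `text` contains every required term and no excluded term.
local function matches(text, required, excluded)
  text = text:lower()
  for _, term in ipairs(required) do
    if not text:find(term, 1, true) then return false end
  end
  for _, term in ipairs(excluded) do
    if text:find(term, 1, true) then return false end
  end
  return true
end

-- Linear scan over all pages, checking both the name and the body.
local function search(pages, query)
  local required, excluded = parse_query(query)
  local found = {}
  for name, text in pairs(pages) do
    if matches(name .. "\n" .. text, required, excluded) then
      found[#found + 1] = name
    end
  end
  table.sort(found)
  return found
end

-- Usage with three made-up pages.
local pages = {
  Home  = "Hello world and welcome",
  Bye   = "hello world, goodbye",
  Other = "world hello",
}
local hits = search(pages, '"hello world" -goodbye')
local all  = search(pages, "")
```

<p>With this approach an empty query matches every page, which gives the
same "list everything" behaviour as the empty search on FindPage.</p>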

<p>(5)</p>

<p>When previewing edits to template/config pages, it would be useful for
Sputnik to apply the templates being edited in the preview.  This is
especially true since edits to these pages can break the wiki, so it
would be desirable to preview them first.</p>


<p>From jnwhiteh at gmail.com  Sat Aug 23 14:54:31 2008
From: jnwhiteh at gmail.com (Jim Whitehead II)
Date: Sat Aug 23 15:06:34 2008
Subject: [Sputnik-list] traceback errors, config pages, searching
In-Reply-To: <a href="mailto:bc4ed2190808230938x1652c455yc757bf60059c6349@mail.gmail.com">bc4ed2190808230938x1652c455yc757bf60059c6349@mail.gmail.com</a>
References: <a href="mailto:bc4ed2190808230938x1652c455yc757bf60059c6349@mail.gmail.com">bc4ed2190808230938x1652c455yc757bf60059c6349@mail.gmail.com</a>
Message-ID: <a href="mailto:5fe281d40808230954s7a5979c6sbcdb6ee12ff2f213@mail.gmail.com">5fe281d40808230954s7a5979c6sbcdb6ee12ff2f213@mail.gmail.com</a></p>

<p>On Sat, Aug 23, 2008 at 5:38 PM, David Manura <a href="mailto:dm.lua@math2.org">dm.lua@math2.org</a> wrote:</p>
<blockquote>
    <p>A few comments...</p>

<p>(1)</p>

<p>When Sputnik raises an unexpected exception, a stack traceback is
displayed on the web page:</p>

<p>&lt;snip&gt;</p>
<pre><code>There was an error in the specified application. The full error message follows:

...epler-1.1/rocks//wsapi/1.0rc1-1/lua/wsapi/common.lua:183: cannot
obtain information from file `redirect:/cgi/sputnik.cgi'
stack traceback:
  [C]: in function 'assert'
  ...epler-1.1/rocks//wsapi/1.0rc1-1/lua/wsapi/common.lua:183: in function &lt;...epler-1.1/rocks//wsapi/1.0rc1-1/lua/wsapi/common.lua:182&gt;
  (tail call): ?
  ...utnik/kepler-1.1/rocks//wsapi/1.0rc1-1/bin/wsapi.cgi:16: in function &lt;...utnik/kepler-1.1/rocks//wsapi/1.0rc1-1/bin/wsapi.cgi:14&gt;
  (tail call): ?
  [C]: in function 'xpcall'
  ...epler-1.1/rocks//wsapi/1.0rc1-1/lua/wsapi/common.lua:135: in function 'run_app'
  ...epler-1.1/rocks//wsapi/1.0rc1-1/lua/wsapi/common.lua:159: in function 'run'
  ...k/kepler-1.1/rocks//wsapi/1.0rc1-1/lua/wsapi/cgi.lua:16: in function 'run'
  ...utnik/kepler-1.1/rocks//wsapi/1.0rc1-1/bin/wsapi.cgi:26: in main chunk
  [C]: ?
</code></pre>
<p>&lt;/snip&gt;</p>

<p>It could be argued that the end user of the web site shouldn't see a
stack traceback.  First, there may be security implications in
allowing the end user to know how the web site is implemented and
installed.  Second, the stack traceback is more useful rather to the
administrator of a web site, so perhaps it should be recorded instead
to a log file on the server, and end user should only see a ticket
number that the administrator can cross reference against the log
file.   I did some searching on this concern just now:</p>

<p>[1] http://www.jankoatwarpspeed.com/post/2008/06/02/Exception-handling-best-practices-in-ASPNET-web-applications.aspx
[2] http://www.securitypark.co.uk/article.asp?articleid=26905
[3] http://www.infosecwriters.com/text_resources/pdf/Top_10_Configuration_Security_Vulnerabilities_Part_One_BSullivan.pdf</p>

<p>The stack traceback in Sputnik is triggered by error_html in
rocks/wsapi/1.0-2/lua/wsapi/common.lua, so this might instead be a
WSAPI/Kepler concern.</p>
</blockquote>

<p>I agree 100% and my thoughts were to write the following:</p>

<ol>
    <li><p>A configuration page in the core that is displayed whenever a
    sputnik-layer error occurs.  This is somewhat difficult to manage
    considering that the sputnik error may be preventing nodes from being
    displayed in the first place.  It could be a static page, so I'm open
    to ideas on this one.</p></li>
    <li><p>A module that sends all errors that occur in the system to some
    endpoint so they can be viewed after the fact.  There are again issues
    with this (how to ensure that the filesystem/db isn't consumed with
    these errors, etc).</p></li>
</ol>

<blockquote>
    <p>After installing Sputnik, I had difficulty finding a complete list of
    all the configuration pages.  Only some were on the start page.  I
    later discovered they were listed on the "sputnik" page--e.g.
    http://sputnik.freewisdom.org/en/sputnik .  (BTW, the "_navigation"
    link on this page is broken.)  I think the "sputnik" configuration
    page should be linked from the start page on the initial installation.</p>
</blockquote>

<p>All configuration pages, or at least the major sputnik configuration
page should be linked from the main page post-installation so this
just needs to be updated in the node template.</p>

<blockquote>
    <p>More generally, is there a way to obtain a complete list of all pages
    that exist (without indexing them on Google)?  Perhaps I'm setting up
    a new wiki and want to remove unnecessary pages.  On lua-users wiki, I
    just enter an empty search in http://lua-users.org/wiki/FindPage .</p>
</blockquote>

<p>Well, there are a few issues with this.  Some pages only exist at the
point they are queried and have no transient state, while the rest can
be viewed directly on the file system or whatever backend the
repository is using.  I'm not sure if we have a way using LuaRocks to
figure out what modules are possibly provided, but that would be the
primary issue there.</p>

<blockquote>
    <p>I'm quite in favor of adding a built-in full-text search engine that
works out-of-the box, at least as a fallback, even if that may be
inferior in some ways to Google.  A discussion about this was here:</p>

<p> http://lua-users.org/lists/lua-l/2008-02/msg00950.html</p>

<p>A potentially common use case is to use Sputnik internally on a small
wiki by an individual or small group.  In that case, simple linear
search through the pages (much like grep) would be sufficient and
trivial to implement.  More generally, you'd want to maintain an
inverted index, possibly using an existing production-grade search
engine (e.g. http://swish-e.org and others) or Google, but if you want
something trivial to implement now, here's the code used by the usemod
wiki ( http://www.usemod.com/cgi-bin/wiki.pl ), which is the wiki upon
which lua-users.org is based:</p>

<pre><code>sub SearchTitleAndBody {
 my ($string) = @_;
 my ($name, $freeName, @found);

 foreach $name (&AllPagesList()) {
   &OpenPage($name);
   &OpenDefaultText();
   if (($Text{'text'} =~ /$string/i) || ($name =~ /$string/i)) {
     push(@found, $name);
   } elsif ($FreeLinks) {
     if ($name =~ m/_/) {
       $freeName = $name;
       $freeName =~ s/_/ /g;
       if ($freeName =~ /$string/i) {
         push(@found, $name);
       }
     } elsif ($string =~ m/ /) {
       $freeName = $string;
       $freeName =~ s/ /_/g;
       if ($Text{'text'} =~ /$freeName/i) {
         push(@found, $name);
       }
     }
   }
 }
 return @found;
}
</code></pre>

<p>Boolean AND/NOT logic and phrase searching would be a simple extension
to that (e.g. ' "hello world" -goodbye ').  You do not need word
tokenization (since there is no inverted index of words) nor stemming,
synonyms, etc., which would complicate the otherwise simple logic.</p>
</blockquote>

<p>I'm sure I speak for both Yuri and myself when I say we would welcome
a contributed module that provides out-of-the-box search for Sputnik.</p>

<blockquote>
    <p>When previewing edits to template/config pages, it would be useful for
    Sputnik to apply the templates being edited in the preview.  This is
    especially true since edits to these pages can break the wiki, so it
    would be desirable to preview them first.</p>
</blockquote>

<p>I'm not sure how feasible this is with the prototyped/inheritance
system that Sputnik operates under, but I agree this would be useful.
I can definitely see the use case for this.</p>

<p>- Jim</p>


<p>From petite.abeille at gmail.com  Sat Aug 23 16:18:48 2008
From: petite.abeille at gmail.com (Petite Abeille)
Date: Sat Aug 23 16:30:54 2008
Subject: [Sputnik-list] traceback errors, config pages, searching
In-Reply-To: <a href="mailto:5fe281d40808230954s7a5979c6sbcdb6ee12ff2f213@mail.gmail.com">5fe281d40808230954s7a5979c6sbcdb6ee12ff2f213@mail.gmail.com</a>
References: <a href="mailto:bc4ed2190808230938x1652c455yc757bf60059c6349@mail.gmail.com">bc4ed2190808230938x1652c455yc757bf60059c6349@mail.gmail.com</a>
 <a href="mailto:5fe281d40808230954s7a5979c6sbcdb6ee12ff2f213@mail.gmail.com">5fe281d40808230954s7a5979c6sbcdb6ee12ff2f213@mail.gmail.com</a>
Message-ID: <a href="mailto:1E382948-37D2-444D-85D9-C0B5D99C400D@gmail.com">1E382948-37D2-444D-85D9-C0B5D99C400D@gmail.com</a></p>


<p>On Aug 23, 2008, at 6:54 PM, Jim Whitehead II wrote:</p>

<blockquote>
    <p>I'm sure I speak for both Yuri and myself when I say we would welcome
    a contributed module that provides out-of-the-box search for Sputnik.</p>
</blockquote>

<p>Perhaps SQLite's FTS module would be of interest:</p>

<p>http://www.sqlite.org/cvstrac/wiki?p=FtsTwo
http://www.sqlite.org/cvstrac/wiki?p=FullTextIndex</p>

<p>FWIW, Nanoki provides its own full-text search by implementing an
inverted index in SQLite:</p>

<p>http://dev.alt.textdrive.com/browser/HTTP/Finder.ddl
http://dev.alt.textdrive.com/browser/HTTP/Finder.dml
http://dev.alt.textdrive.com/browser/HTTP/Finder.lua</p>

<p>Cheers,</p>

<p>--
PA.
http://alt.textdrive.com/nanoki/</p>




<p>From yuri at sims.berkeley.edu  Sat Aug 23 16:51:02 2008
From: yuri at sims.berkeley.edu (Yuri Takhteyev)
Date: Sat Aug 23 17:03:05 2008
Subject: [Sputnik-list] traceback errors, config pages, searching
In-Reply-To: <a href="mailto:bc4ed2190808230938x1652c455yc757bf60059c6349@mail.gmail.com">bc4ed2190808230938x1652c455yc757bf60059c6349@mail.gmail.com</a>
References: <a href="mailto:bc4ed2190808230938x1652c455yc757bf60059c6349@mail.gmail.com">bc4ed2190808230938x1652c455yc757bf60059c6349@mail.gmail.com</a>
Message-ID: <a href="mailto:fa4efbc00808231151x41ee96a8j261c93d01de5a77e@mail.gmail.com">fa4efbc00808231151x41ee96a8j261c93d01de5a77e@mail.gmail.com</a></p>

<p>This may repeat Jim's answers a bit.</p>

<blockquote>
    <p>When Sputnik raises an unexpected exception, a stack traceback is
    displayed on the web page:</p>
</blockquote>

<p>That's simply a bug.  We used to have a nicer page and then something
changed in WSAPI and we haven't updated.  That said, we should also
focus on avoiding any error messages. :)   So...</p>

<blockquote>
    <p>...epler-1.1/rocks//wsapi/1.0rc1-1/lua/wsapi/common.lua:183: cannot
    obtain information from file `redirect:/cgi/sputnik.cgi'</p>
</blockquote>

<p>...this specific one (which David reported to me separately) has been
forwarded to Fabio and fixed.</p>

<blockquote>
    <p>It could be argued that the end user of the web site shouldn't see a
    stack traceback.  First, there may be security implications in
    allowing the end user to know how the web site is implemented and
    installed.  Second, the stack traceback is more useful rather to the
    administrator of a web site, so perhaps it should be recorded instead
    to a log file on the server, and end user should only see a ticket
    number that the administrator can cross reference against the log
    file.</p>
</blockquote>

<p>Agreed.  Having the traceback displayed in the browser speeds up the
development tremendously, but perhaps the best thing to do is add a
config variable ("DISPLAY_TRACEBACK") that turns this on, keeping it
off by default.</p>

<p>Jim: for the logging module you suggest, wouldn't lualogging work?</p>

<blockquote>
    <p>The stack traceback in Sputnik is triggered by error_html in
    rocks/wsapi/1.0-2/lua/wsapi/common.lua, so this might instead be a
    WSAPI/Kepler concern.</p>
</blockquote>

<p>Again, this specific stack traceback was a WSAPI issue that has been
fixed.  However, the question of whether WSAPI should be displaying
stack traces may be worth bringing up on the Kepler list.  Though,
perhaps this is an application-level issue.</p>

<blockquote>
    <p>After installing Sputnik, I had difficulty finding a complete list of
    all the configuration pages.</p>
</blockquote>

<p>The links on the sputnik wiki didn't help?</p>

<p>My assumption was that people start with the "Installation" page,
which links to "Basic Configuration" (which is also the next item in
the nav bar).  "Basic Configuration" links to configuration (the next
tab), which tells you all of your configuration options.</p>

<p>Should something like this sentence go into the default homepage? :)</p>

<blockquote>
    <p>http://sputnik.freewisdom.org/en/sputnik .  (BTW, the "_navigation"
    link on this page is broken.)</p>
</blockquote>

<p>Oops.  The node name changed, the page did not.  This is part of the
reason why I've been thinking of just moving all the documentation to
the wiki, so it's all in one place.</p>

<blockquote>
    <p>More generally, is there a way to obtain a complete list of all pages
    that exist (without indexing them on Google)?</p>
</blockquote>

<p>Yes: http://sputnik.freewisdom.org/en/sitemap
And even in the "Sitemap" XML format:
http://sputnik.freewisdom.org/en/sitemap.xml</p>

<p>(you can give the latter URL to google via "Webmaster Tools" so that
Google will know when new pages are added.)</p>

<p>The only limitation (or feature, depending on how you look at it) is
that this only displays pages that were edited at some point, skipping
the default pages.  (That is, your "sputnik" page won't be in this
list, unless you edit it.)  This could be changed.</p>

<blockquote>
    <p>Perhaps I'm setting up
    a new wiki and want to remove unnecessary pages.  On lua-users wiki, I
    just enter an empty search in http://lua-users.org/wiki/FindPage .</p>
</blockquote>

<p>Makes sense.  Though, I've never had to do it this way: Sputnik's
default is to store the data in a very transparent way, each node
being a directory inside wiki-data.  So, when I did cleanup of that
sort in the past, I just cd'ed into wiki-data, did an "ls" and then
an "rm -rf" on the nodes I wanted to delete.</p>

<p>Though, for full transparency of data you would want the Git plugin.
With that, each node is a Lua file, revisions are git revisions to
that file, and subdirectories are subdirectories.  (That is,
"Tickets/000001" would be stored in wiki-data/Tickets/000001.lua,
and you could see the revision history by just running "git log
Tickets/000001.lua".)</p>

<blockquote>
    <p>I'm quite in favor of adding a built-in full-text search engine that
    works out-of-the box, at least as a fallback, even if that may be
    inferior in some ways to Google.  A discussion about this was here:</p>
    
    <p> http://lua-users.org/lists/lua-l/2008-02/msg00950.html</p>
</blockquote>

<p>As Jim said, we are all in favor of this, for reasons that you
mentioned and more!  If there was a good search system that was easy
to install and had decent Lua bindings, I swear I would write a plugin
for it the next day.  (I've discussed this issue a bit with Jim and
Carregal, basically thinking of it in terms of an API that would
subscribe to modifications and then be able to answer queries.)</p>

<blockquote>
    <p>something trivial to implement now, here's the code used by the usemod
    wiki ( http://www.usemod.com/cgi-bin/wiki.pl ), which is the wiki upon
    which lua-users.org is based:</p>
</blockquote>

<p>I've tried this before and it was very slow.  However, I have since
made a simple fix to Saci (commit beda1d7) which improved the
performance dramatically, making this a viable, though still somewhat
slow, approach.</p>

<p>Additionally, I've been working on a Sputnik based photoblog
application that is supposed to allow you to browse blog posts and
albums by tag, which is to say that a version of "search" is meant to
be used heavily.  I ended up adding an experimental "application
cache" option, which gives Sputnik or Sputnik-based applications an
option to cache stuff until there is a change to the main storage
system.  So, my application would then cache the list of items for
each tag from the moment the tag is first queried and until some new
photos or posts are added.  However, this feature is not supported by
all versium implementations at this point.  (In fact, by none that are
checked in!)</p>
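<p>The "cache until the main storage changes" idea can be sketched as below.
This is not the actual application cache: storage.version() and
storage.items_with_tag() are invented stand-ins for whatever a versium
implementation would need to expose, and new_cache is a hypothetical name.</p>

```lua
-- Hypothetical sketch; `storage.version()` and `storage.items_with_tag()`
-- are invented stand-ins, not Versium or Sputnik APIs.
local function new_cache(storage)
  local cache, seen_version = {}, storage.version()
  return function(tag)
    local current = storage.version()
    if current ~= seen_version then
      -- The storage was touched: drop every cached result.
      cache, seen_version = {}, current
    end
    if cache[tag] == nil then
      cache[tag] = storage.items_with_tag(tag)
    end
    return cache[tag]
  end
end

-- A fake storage that counts how often it is actually queried.
local fake = { v = 1, calls = 0, data = { lua = { "post1" } } }
function fake.version() return fake.v end
function fake.items_with_tag(tag)
  fake.calls = fake.calls + 1
  local copy = {}
  for i, item in ipairs(fake.data[tag] or {}) do copy[i] = item end
  return copy
end
function fake.add(tag, item)
  fake.data[tag] = fake.data[tag] or {}
  table.insert(fake.data[tag], item)
  fake.v = fake.v + 1  -- every write bumps the version
end

local items_for = new_cache(fake)
items_for("lua")
items_for("lua")                -- served from the cache
fake.add("lua", "post2")        -- invalidates the cache
local after = items_for("lua")  -- rebuilt from storage
```

<p>The design relies on the storage being able to report cheaply that
something changed, which is presumably the part a versium implementation
would have to support.</p>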

<blockquote>
<pre><code>sub SearchTitleAndBody {
 my ($string) = @_;
 my ($name, $freeName, @found);

 foreach $name (&AllPagesList()) {
   &OpenPage($name);
   &OpenDefaultText();
   if (($Text{'text'} =~ /$string/i) || ($name =~ /$string/i)) {
     push(@found, $name);
   } elsif ($FreeLinks) {
     if ($name =~ m/_/) {
       $freeName = $name;
       $freeName =~ s/_/ /g;
       if ($freeName =~ /$string/i) {
         push(@found, $name);
       }
</code></pre>
</blockquote>

<p>And it would be prettier in Lua. :)</p>

<p>In fact, I will try to make an alternative to "sputnik-search" that does that.</p>
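<p>For comparison, a rough Lua rendering of SearchTitleAndBody might look
like the sketch below.  It is not the planned "sputnik-search" alternative:
the pages table (page name to text) is a stand-in for real storage,
search_title_and_body and free_links are hypothetical names, and it matches
by case-insensitive plain substring where the Perl uses a regular
expression.</p>

```lua
-- Case-insensitive plain-text containment test.
local function contains(haystack, needle)
  return haystack:lower():find(needle:lower(), 1, true) ~= nil
end

-- pages: table mapping page name -> text (stand-in for the wiki's storage)
-- free_links: mirrors usemod's $FreeLinks option
local function search_title_and_body(pages, query, free_links)
  local found = {}
  for name, text in pairs(pages) do
    if contains(text, query) or contains(name, query) then
      found[#found + 1] = name
    elseif free_links then
      if name:find("_", 1, true) then
        -- Page names use "_" for spaces; compare against the spaced form.
        if contains((name:gsub("_", " ")), query) then
          found[#found + 1] = name
        end
      elseif query:find(" ", 1, true) then
        -- Links in the body use "_" for spaces; search for that form.
        if contains(text, (query:gsub(" ", "_"))) then
          found[#found + 1] = name
        end
      end
    end
  end
  table.sort(found)
  return found
end

-- Usage with three made-up pages.
local wiki = {
  Lua_Tips = "some text",
  Index    = "see Lua_Tips for more",
  Other    = "nothing",
}
local with_links    = search_title_and_body(wiki, "lua tips", true)
local without_links = search_title_and_body(wiki, "lua tips", false)
```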

<blockquote>
    <p>Boolean AND/NOT logic and phrase searching would be a simple extension
    to that (e.g. ' "hello world" -goodbye ').  You do not need word
    tokenization (since there is no inverted index of words) nor stemming,
    synonyms, etc., which would complicate the otherwise simple logic.</p>
</blockquote>

<p>Part of me wants to go this route of adding first one little feature
then another, and eventually implementing a great search engine in
Lua.  Another part of me wants to finish my dissertation. :)</p>

<blockquote>
    <p>When previewing edits to template/config pages, it would be useful for
    Sputnik to apply the templates being edited in the preview.  This is
    especially true since edits to these pages can break the wiki, so it
    would be desirable to preview them first.</p>
</blockquote>

<p>It does work this way for CGI, but not currently with Xavante.
(Because in case of Xavante, the same Sputnik instance is re-used as
for the previous call.)  I'll look into this, though.</p>

<p>Thanks for all the comments!</p>

<p>- yuri</p>

<p>-- 
http://sputnik.freewisdom.org/</p>


<p>From yuri at sims.berkeley.edu  Sat Aug 23 18:29:15 2008
From: yuri at sims.berkeley.edu (Yuri Takhteyev)
Date: Sat Aug 23 18:41:23 2008
Subject: [Sputnik-list] traceback errors, config pages, searching
In-Reply-To: <a href="mailto:5fe281d40808231212i4ddd6edavadec468c679a8eae@mail.gmail.com">5fe281d40808231212i4ddd6edavadec468c679a8eae@mail.gmail.com</a>
References: <a href="mailto:bc4ed2190808230938x1652c455yc757bf60059c6349@mail.gmail.com">bc4ed2190808230938x1652c455yc757bf60059c6349@mail.gmail.com</a>
 <a href="mailto:fa4efbc00808231151x41ee96a8j261c93d01de5a77e@mail.gmail.com">fa4efbc00808231151x41ee96a8j261c93d01de5a77e@mail.gmail.com</a>
 <a href="mailto:5fe281d40808231212i4ddd6edavadec468c679a8eae@mail.gmail.com">5fe281d40808231212i4ddd6edavadec468c679a8eae@mail.gmail.com</a>
Message-ID: <a href="mailto:fa4efbc00808231329u60f2a441vccd44a9af382aaf1@mail.gmail.com">fa4efbc00808231329u60f2a441vccd44a9af382aaf1@mail.gmail.com</a></p>

<blockquote>
    <blockquote>
        <p>Jim: for the logging module you suggest, wouldn't lualogging work?</p>
    </blockquote>
    
    <p>Yes, it shouldn't be too difficult to provide a sputnik-lualogging
    that can be configured to the various loggers and would be able to
    provide the standard DEBUG, INFO, WARNING, ERROR logging levels.
    Actually, this would be a huge win in the Sputnik core as well for
    troubleshooting things like why my user tokens are timing out so
    quickly ;-)</p>
</blockquote>

<p>We have that!</p>

<p>install lualogging with luarocks and then set:</p>

<p>LOGGER = "file"
LOGGER_PARAMS = {"/tmp/sputnik.log"}</p>

<p>The only thing missing was a way to set the logging level.
So I just added a "LOGGER_LEVEL" parameter; if you also set</p>

<p>LOGGER_LEVEL = "WARN"</p>

<p>then you will only get warn and error messages, but not the info and
debug ones.  (See lualogging website for more configurations.
LOGGER_PARAMS are just passed to the logger constructor, so they are
logger-specific.  I've never used loggers other than logging.file, but
there are many options, including an email one.)</p>

<p>The commit is http://gitorious.org/projects/sputnik/repos/mainline/commits/5e6cdcdb</p>

<p>In the same commit, I turned off stack trace display by default,
replacing it with a message saying that you can turn stack traces on
by setting SHOW_STACK_TRACE to true.</p>
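<p>Put together, the settings mentioned in this message would look roughly
like the fragment below.  It uses only the parameter names given above; where
exactly the fragment lives (presumably the Sputnik configuration) and the
explicit false value are assumptions.</p>

```lua
-- Configuration fragment using only the settings named in this message.
LOGGER = "file"                       -- selects lualogging's file logger
LOGGER_PARAMS = {"/tmp/sputnik.log"}  -- passed to the logger constructor
LOGGER_LEVEL = "WARN"                 -- keep warn and error, drop info and debug
SHOW_STACK_TRACE = false              -- assumed spelling of the default; true shows traces
```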

<p>One small issue with all this: it all works quite well if Sputnik
initializes successfully and then runs into a problem when responding
to a request.  If it fails <em>before</em> WSAPI even sends it any requests,
then we just get the default WSAPI message.  The reason is that WSAPI
works in two steps:</p>

<ol>
    <li>create an application function</li>
    <li>call it for each request</li>
</ol>

<p>In step 2 we generate a response that goes to the user.  This gives us
an option of handling errors in a smart way.  In step 1, we just
return a function that handles requests.  We can't "say" anything to
the user directly at this point.  I am guessing that the thing to do
is to catch errors happening during initialization and return a
function that just responds with a formatted error message for any
request.  I'll look into this later.</p>
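<p>The approach guessed at above could be sketched like this.  The name
make_app and the init argument are hypothetical, and the sketch assumes the
usual WSAPI convention that an application function returns a status, a
headers table and an iterator producing the body; it is not the fix that
went into Sputnik.</p>

```lua
-- `init` is a hypothetical function that builds and returns the real
-- request handler (step 1). Errors raised by it are caught here.
local function make_app(init)
  local ok, app_or_err = xpcall(init, debug.traceback)
  if ok then return app_or_err end

  -- Initialization failed: answer every request (step 2) with the same
  -- generic page, and hand the traceback back to the caller for logging.
  local function error_app(wsapi_env)
    local body = "<html><body><p>The site failed to start. " ..
                 "Please contact the administrator.</p></body></html>"
    local sent = false
    local function iterator()
      if not sent then
        sent = true
        return body
      end
    end
    return 500, { ["Content-Type"] = "text/html" }, iterator
  end
  return error_app, app_or_err
end

-- Usage: an initializer that always fails.
local app, err = make_app(function() error("bad config") end)
local status, headers, iter = app({})
local first_chunk = iter()
```

<p>The second return value lets the caller send the traceback to a log
instead of to the browser.</p>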

<blockquote>
    <blockquote>
        <p>In fact, I will try to make an alternative to "sputnik-search" that does that.</p>
    </blockquote>
    
    <p>This could also be extended by a simple script that generates the
    index once, and uses the post-action hooks I added to Sputnik in order
    to update the index file when a page is changed.  You would run into
    concurrency issues but it's interesting to think about.</p>
</blockquote>

<p>That's an option.  It depends on what kind of data you have, how much
and how often it is updated, and whether you want to update it from
outside Sputnik.  For my own use of the photoblog, I've been wanting
to edit the content via git, but expect to do so at most once a week,
so simply caching searches until the main storage is touched ends up
being easier.  We'll have to think what makes most sense as the
default.</p>
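<p>For the index option, a minimal in-memory sketch of "update the index when
a page is changed" follows.  The function on_page_saved is a hypothetical
stand-in for whatever a post-action hook would call, not the actual hook
interface, and the concurrency issues mentioned above are not addressed.</p>

```lua
-- Hypothetical sketch; `on_page_saved` stands in for a post-action hook.
local index = {}       -- word -> set of page names containing it
local page_words = {}  -- page name -> set of words currently indexed for it

-- Returns the set of lowercased alphanumeric words in `text`.
local function words_of(text)
  local set = {}
  for word in text:lower():gmatch("%w+") do set[word] = true end
  return set
end

-- Re-indexes one page: remove its old postings, then add the new ones.
local function on_page_saved(name, text)
  for word in pairs(page_words[name] or {}) do
    index[word][name] = nil
    if next(index[word]) == nil then index[word] = nil end
  end
  local words = words_of(text)
  for word in pairs(words) do
    index[word] = index[word] or {}
    index[word][name] = true
  end
  page_words[name] = words
end

-- Returns the sorted names of pages containing `word`.
local function lookup(word)
  local names = {}
  for name in pairs(index[word:lower()] or {}) do
    names[#names + 1] = name
  end
  table.sort(names)
  return names
end

-- Usage: "Home" is saved twice; the second save replaces its postings.
on_page_saved("Home", "Lua wiki")
on_page_saved("Notes", "wiki notes")
on_page_saved("Home", "Lua page")
local wiki_pages = lookup("wiki")
local lua_pages = lookup("Lua")
```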

<p>- yuri</p>

<p>-- 
http://sputnik.freewisdom.org/</p>


<p>From jnwhiteh at gmail.com  Sat Aug 23 18:34:55 2008
From: jnwhiteh at gmail.com (Jim Whitehead II)
Date: Sat Aug 23 18:46:58 2008
Subject: [Sputnik-list] traceback errors, config pages, searching
In-Reply-To: <a href="mailto:fa4efbc00808231329u60f2a441vccd44a9af382aaf1@mail.gmail.com">fa4efbc00808231329u60f2a441vccd44a9af382aaf1@mail.gmail.com</a>
References: <a href="mailto:bc4ed2190808230938x1652c455yc757bf60059c6349@mail.gmail.com">bc4ed2190808230938x1652c455yc757bf60059c6349@mail.gmail.com</a>
 <a href="mailto:fa4efbc00808231151x41ee96a8j261c93d01de5a77e@mail.gmail.com">fa4efbc00808231151x41ee96a8j261c93d01de5a77e@mail.gmail.com</a>
 <a href="mailto:5fe281d40808231212i4ddd6edavadec468c679a8eae@mail.gmail.com">5fe281d40808231212i4ddd6edavadec468c679a8eae@mail.gmail.com</a>
 <a href="mailto:fa4efbc00808231329u60f2a441vccd44a9af382aaf1@mail.gmail.com">fa4efbc00808231329u60f2a441vccd44a9af382aaf1@mail.gmail.com</a>
Message-ID: <a href="mailto:5fe281d40808231334h4201866cof8c63f7e040c4d5d@mail.gmail.com">5fe281d40808231334h4201866cof8c63f7e040c4d5d@mail.gmail.com</a></p>

<p>On Sat, Aug 23, 2008 at 9:29 PM, Yuri Takhteyev <a href="mailto:yuri@sims.berkeley.edu">yuri@sims.berkeley.edu</a> wrote:</p>
<blockquote>
    <blockquote>
        <blockquote>
            <p>Jim: for the logging module you suggest, wouldn't lualogging work?</p>
        </blockquote>
        
        <p>Yes, it shouldn't be too difficult to provide a sputnik-lualogging
        that can be configured to the various loggers and would be able to
        provide the standard DEBUG, INFO, WARNING, ERROR logging levels.
        Actually, this would be a huge win in the Sputnik core as well for
        troubleshooting things like why my user tokens are timing out so
        quickly ;-)</p>
    </blockquote>
    
    <p>We have that!</p>
    
    <p>install lualogging with luarocks and then set:</p>
    
    <p>LOGGER = "file"
    LOGGER_PARAMS = {"/tmp/sputnik.log"}</p>
    
    <p>The only thing missing is that there is no way to set logging level.
    So, I just added a "LOGGER_LEVEL" parameter, so if you also set</p>
    
    <p>LOGGER_LEVEL = "WARN"</p>
    
    <p>then you will only get warn and error messages, but not the info and
    debug ones.  (See lualogging website for more configurations.
    LOGGER_PARAMS are just passed to the logger constructor, so they are
    logger-specific.  I've never used loggers other than logging.file, but
    there are many options, including an email one.)</p>
    
    <p>The commit is http://gitorious.org/projects/sputnik/repos/mainline/commits/5e6cdcdb</p>
</blockquote>

<p>Aye, I forgot about that but in truth I was more referring to the
quality of the messages.  The current debug messages from sputnik are
pretty much useless unless you're the one who wrote them <em>nudge</em>.</p>

<blockquote>
    <p>In the same commit, I turned off stack trace display by default,
    replacing it with a message saying that you can turn stack traces on
    by setting SHOW_STACK_TRACE to true.</p>
    
    <p>One small issue with all this: it all works quite well if Sputnik
    initializes successfully and then runs into a problem when responding
    to a request.  If it fails <em>before</em> WSAPI even sends it any requests,
    then we just get the default WSAPI message.  The reason is that WSAPI
    works in two steps:</p>
    
    <ol>
        <li>create an application function</li>
        <li>call it for each request</li>
    </ol>
    
    <p>In step 2 we generate a response that goes to the user.  This gives us
    an option of handling errors in a smart way.  In step 1, we just
    return a function that handles requests.  We can't "say" anything to
    the user directly at this point.  I am guessing that the thing to do
    is to catch errors happening during initialization and return a
    function that just responds with a formatted error message for any
    request.  I'll look into this later.</p>
    
    <blockquote>
        <blockquote>
            <p>In fact, I will try to make an alternative to "sputnik-search" that does that.</p>
        </blockquote>
        
        <p>This could also be extended by a simple script that generates the
        index once, and uses the post-action hooks I added to Sputnik in order
        to update the index file when a page is changed.  You would run into
        concurrency issues but it's interesting to think about.</p>
    </blockquote>
    
    <p>That's an option.  It depends on what kind of data you have, how much
    and how often it is updated, and whether you want to update it from
    outside Sputnik.  For my own use of the photoblog, I've been wanting
    to edit the content via git, but expect to do so at most once a week,
    so simply caching searches until the main storage is touched ends up
    being easier.  We'll have to think what makes most sense as the
    default.</p>
</blockquote>
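
<p>The idea quoted above, catching errors during initialization and returning a function that answers every request with a formatted error message, might look roughly like the following.  This is only a sketch under assumptions: <code>make_app</code> and <code>init</code> are hypothetical names, and the handler's return values (status, headers, body iterator) are merely modeled on the general shape of a WSAPI response rather than taken from Sputnik's code.</p>

```lua
-- Hypothetical sketch: wrap application setup so that a failure during
-- initialization still yields a working request handler.
local function make_app(init)
  -- init is assumed to build and return the real request handler.
  local ok, result = pcall(init)
  if ok then
    return result
  end
  local message = "Initialization failed: " .. tostring(result)
  -- Fallback handler: answers every request with the same error text.
  return function(env)
    local sent = false
    local function body()
      -- Iterator that yields the message once, then nil.
      if not sent then
        sent = true
        return message
      end
    end
    return 500, { ["Content-Type"] = "text/plain" }, body
  end
end

-- Usage: an init function that fails before any request arrives.
local app = make_app(function() error("cannot open wiki-data") end)
local status, headers, body = app({})
print(status, body())
```

<p>Whether the message should include the traceback would presumably follow the same config setting as errors raised while handling a request.</p>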


<p>From dm.lua at math2.org  Sat Aug 23 20:05:03 2008
From: dm.lua at math2.org (David Manura)
Date: Sat Aug 23 20:17:07 2008
Subject: [Sputnik-list] traceback errors, config pages, searching
In-Reply-To: <a href="mailto:fa4efbc00808231151x41ee96a8j261c93d01de5a77e@mail.gmail.com">fa4efbc00808231151x41ee96a8j261c93d01de5a77e@mail.gmail.com</a>
References: <a href="mailto:bc4ed2190808230938x1652c455yc757bf60059c6349@mail.gmail.com">bc4ed2190808230938x1652c455yc757bf60059c6349@mail.gmail.com</a>
 &lt;fa4efbc00808231151x41ee96a8j261c93d01de5a77e@mail.gmail.com&gt;
Message-ID: <a href="mailto:bc4ed2190808231505p49090763o569431c6b1e45d49@mail.gmail.com">bc4ed2190808231505p49090763o569431c6b1e45d49@mail.gmail.com</a></p>

<p>On Sat, Aug 23, 2008 at 12:54 PM, Jim Whitehead II wrote:</p>
<blockquote>
    <ol>
        <li>A configuration page in the core that is displayed whenever a
        sputnik-layer error occurs.  This is somewhat difficult to manage
        considering that the sputnik error may be preventing nodes from being
        displayed in the first place.</li>
    </ol>
</blockquote>

<p>Full error info is usually reportable, though, yes, sometimes it is not.
This is like the "error in error handling message" in Lua.</p>

<blockquote>
    <p>how to ensure that the filesystem/db isn't consumed with
    these errors, etc)</p>
</blockquote>

<p>This may or may not be a concern of Lua.  For example, if a CGI writes
to stderr, Apache just appends to the Apache error log file.  It's
assumed the user is running some cron job to roll and archive the
error logs, or at least maybe the logs are on a separate partition.</p>

<blockquote>
    <p>Some pages only exist at the
    point they are queried and have no transient state, while the rest can
    be viewed directly on the file system or whatever backend the
    repository is using.  I'm not sure if we have a way using LuaRocks to
    figure out what modules are possible provided, but that would be the
    primary issue there.</p>
</blockquote>

<p>Yes.</p>

<p>On Sat, Aug 23, 2008 at 2:51 PM, Yuri Takhteyev wrote:</p>
<blockquote>
    <p>Having the traceback displayed in the browser speeds up the
    development tremendously, but perhaps the best thing to do is add a
    config variable ("DISPLAY_TRACEBACK") that turns this on, keeping it
    off by default.</p>
</blockquote>

<p>That would do.</p>

<blockquote>
    <blockquote>
        <p>After installing Sputnik, I had difficulty finding a complete list of
        all the configuration pages.
        The links on the sputnik wiki didn't help?...
        My assumption was that people start with the "Installation" page,
        which links to "Basic Configuration"</p>
    </blockquote>
</blockquote>

<p>That helped, but I was more looking for just a simple (and complete)
listing on my local install rather than a tutorial (e.g. like the
sitemap below).</p>

<blockquote>
    <blockquote>
        <p>More generally, is there a way to obtain a complete list of all pages
        that exist (without indexing them on Google)?
        Yes: http://sputnik.freewisdom.org/en/sitemap</p>
    </blockquote>
</blockquote>

<p>That will do.</p>

<blockquote>
    <p>The only limitation (or feature, depending on how you look at it) is
    that this only displays pages that were edited at some point, skipping
    the default pages...This could be changed.</p>
</blockquote>

<p>I consider it a limitation--whether a config page is edited shouldn't
affect whether it gets displayed in this list.</p>

<p>Note there are two purposes for this list, and it depends whether the
user is logged in as administrator and the permission settings on
those pages.  First, an administrator may want to see a complete list
of pages (edited or not), including configuration pages.  Perhaps the
administrator is securing the web site.  Each page, and in particular
the configuration pages, is an interface that needs to be reviewed.
The second purpose is for indexing by Google.  Google should only see
a subset of these pages (i.e. the public ones), and in particular that
set should by default not contain configuration pages.</p>

<blockquote>
    <p>...Sputnik's default is to store the data in a very transparent way, each node
    being a directory inside wiki-data.  So, when I did cleanup of that
    sort in the past, I've just cded into wiki-data, did an "ls" and then
    a "rm -rf" on the nodes I wanted to delete.</p>
    
    <p>Though, for full transparency of data you would want the Git plugin.
    With that, each node is a lua file, revisions are git revisions to
    that file, and subdirectories are subdirectories.  (That is,
    "Tickets/000001" would be stored in wiki-data/Tickets/000001.lua,
    and you could see the revision history by just running "git log
    Tickets/000001.lua")</p>
</blockquote>

<p>Yes, that file system transparency would be a nice feature, as it is
in Dokuwiki[1].  The structure you describe using Git is similar in
structure to the plain-text file storage structure used by Dokuwiki,
and perhaps the plain-text file storage used by Versium could benefit
by taking an approach more like this as well.  For example, here's
roughly what the Dokuwiki plain-text file storage structure looks
like:</p>

<pre><code>/data/pages/mypage.txt
/data/media/wiki/dokuwiki-128.png
/data/attic/mypage.1218929386.txt.gz
/data/attic/mypage.1218167994.txt.gz
/data/attic/mypage.1218308427.txt.gz
/data/attic/mypage.1218332876.txt.gz
/data/attic/mypage.1217553719.txt.gz
/data/index/page.idx
/data/index/pageword.idx
/data/index/i[0-9]+.idx
/data/index/w[0-9]+.idx
/data/cache/[0-9a-f]/&lt;md5sum&gt;.(xhtml|js|css|i)
/data/conf/*
/data/locks/*
/data/tmp/*
</code></pre>

<p>The "pages" directory contains the latest versions of all the wiki
pages.  These are human readable -and- editable markup text.  The
"media" directory contains resources used by those pages (e.g.
images).  The "attic" directory contains compressed/timestamped copies
of previous versions of the pages.  The "index" directory contains the
index files used by the search engine (more on this later).  The
"cache" directory contains cached objects to improve performance.  You
can safely delete the attic/index/cache files if you no longer want
them (or omit them from a backup process).</p>

<p>Concerning the index directory, these are all text files.  page.idx is
a newline-delimited list of page names in the index.  Each
w[0-9]+.idx file is an unsorted, newline-delimited list of words of
length [0-9]+ (note: this nicely makes all records fixed-width).
pageword.idx maps each page number (index in page.idx) to a list of
words identified in the form of (word_length, word_index) pairs.  (The
text of a word can be obtained by looking up the pair in the
w[0-9]+.idx files.)  The i[0-9]+.idx files correspond to the
w[0-9]+.idx files, and the lines in the files correspond as well.
These files represent the inverted index and likely constrain query
performance, as they map word index to a list of (page_number,
word_count) pairs.  There's some further documentation in (
http://www.dokuwiki.org/indexer ).  You can check out the source code
of the indexer.php file ( http://www.splitbrain.org/projects/dokuwiki
)--it's only about 700 lines.  fulltext.php implements the search
routine.  Obviously, less constrained approaches could give better
performance, but since this is one of the major wikis, it's interesting
to see the approach they took given those constraints (file system and
text files) and the performance/scalability they got.  However, this
is PHP, and I don't know if they keep the index around in memory
between queries--lacking that, a different approach might be used.</p>
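
<p>To make the relationship between these files concrete, here is a rough in-memory sketch of a single-word lookup in Lua.  The tables stand in for page.idx, the w[0-9]+.idx files and the i[0-9]+.idx files; the sample data, the <code>lookup</code> function and the 1-based numbering are all assumptions for illustration, and the actual on-disk formats are not reproduced here.</p>

```lua
-- Hypothetical in-memory stand-ins for the index files described above.
local pages = { "start", "install", "faq" }   -- page.idx: one name per line

-- w[0-9]+.idx: words grouped by length, one word per line.
local words = {
  [3] = { "lua", "git" },
  [4] = { "wiki" },
}

-- i[0-9]+.idx: line N here belongs to the word on line N of the matching
-- word list, and holds (page_number, word_count) pairs.
local inverted = {
  [3] = { { {1, 2}, {3, 1} }, { {2, 4} } },
  [4] = { { {1, 1}, {2, 1}, {3, 5} } },
}

-- Return a table mapping page name to occurrence count for one word.
local function lookup(word)
  word = word:lower()
  local len = #word
  local hits = {}
  -- Only the word list for this length needs to be scanned.
  for line, candidate in ipairs(words[len] or {}) do
    if candidate == word then
      for _, pair in ipairs(inverted[len][line]) do
        hits[pages[pair[1]]] = pair[2]
      end
      break
    end
  end
  return hits
end

local hits = lookup("Lua")
print(hits.start, hits.faq)
```

<p>The linear scan over the word list reflects the "unsorted" property mentioned above; a sorted list would allow a binary search over the fixed-width records.</p>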

<p>[1] http://www.dokuwiki.org/dokuwiki</p>

<blockquote>
    <blockquote>
        <p>sub SearchTitleAndBody {
        And it would be prettier in Lua. :)
        In fact, I will try to make an alternative to "sputnik-search" that does that.
        ...
        Part of me wants to go this route of adding first one little feature
        then another, and eventually implementing a great search engine in
        Lua.  Another part of me wants to finish my dissertation. :)</p>
    </blockquote>
</blockquote>

<p>Just a basic case-insensitive substring search would go a long way.
With the basics in place, others may improve upon it.  (I'm not sure
what a great search engine implemented in Lua would really offer, as
opposed to a Lua binding to a great search engine implemented in C.)</p>
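
<p>As a rough idea of how little code the basic case needs, here is a minimal sketch of a case-insensitive substring search over page names and text.  The <code>pages</code> table and the <code>search</code> function are hypothetical and do not use Sputnik's actual storage API.</p>

```lua
-- Hypothetical page store: page name -> raw text.
local pages = {
  Installation = "Install Sputnik with LuaRocks.",
  Sitemap = "A list of all pages.",
}

-- Return a sorted list of page names whose name or text contains the query.
local function search(store, query)
  local needle = query:lower()
  local found = {}
  for name, text in pairs(store) do
    -- The fourth argument (true) requests a plain find, so characters
    -- such as "." or "%" in the query are not treated as pattern syntax.
    if name:lower():find(needle, 1, true)
       or text:lower():find(needle, 1, true) then
      found[#found + 1] = name
    end
  end
  table.sort(found)
  return found
end

print(table.concat(search(pages, "LUAROCKS"), ", "))
```

<p>Lowercasing with <code>string.lower</code> only handles what the current C locale considers letters, so non-ASCII text would need more care.</p>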

<blockquote>
    <blockquote>
        <p>When previewing edits to template/config pages, it would be useful for
        Sputnik to apply the templates being edited in the preview...
        It does work this way for CGI, but not currently with Xavante.</p>
    </blockquote>
</blockquote>

<p>Didn't do so in Apache/CGI when I tested it (e.g. previewing edits to
the MAIN template) in the latest release version.</p>


<p>From yuri at sims.berkeley.edu  Wed Aug 27 05:46:42 2008
From: yuri at sims.berkeley.edu (Yuri Takhteyev)
Date: Wed Aug 27 05:58:57 2008
Subject: [Sputnik-list] traceback errors, config pages, searching
In-Reply-To: <a href="mailto:bc4ed2190808231505p49090763o569431c6b1e45d49@mail.gmail.com">bc4ed2190808231505p49090763o569431c6b1e45d49@mail.gmail.com</a>
References: <a href="mailto:bc4ed2190808230938x1652c455yc757bf60059c6349@mail.gmail.com">bc4ed2190808230938x1652c455yc757bf60059c6349@mail.gmail.com</a>
 &lt;fa4efbc00808231151x41ee96a8j261c93d01de5a77e@mail.gmail.com&gt;
 &lt;bc4ed2190808231505p49090763o569431c6b1e45d49@mail.gmail.com&gt;
Message-ID: <a href="mailto:fa4efbc00808270046ld44e48dp17db9dd60bc160b2@mail.gmail.com">fa4efbc00808270046ld44e48dp17db9dd60bc160b2@mail.gmail.com</a></p>

<blockquote>
    <blockquote>
        <p>Having the traceback displayed in the browser speeds up the
        development tremendously, but perhaps the best thing to do is add a
        config variable ("DISPLAY_TRACEBACK") that turns this on, keeping it
        off by default.</p>
    </blockquote>
    
    <p>That would do.</p>
</blockquote>

<p>This is committed now.  (See my message to the sputnik list the other day.)</p>

<blockquote>
    <blockquote>
        <p>My assumption was that people start with the "Installation" page,
        which links to "Basic Configuration"</p>
    </blockquote>
    
    <p>That helped, but I was more looking for just a simple (and complete)
    listing on my local install rather than a tutorial (e.g. like the
    sitemap below).</p>
</blockquote>

<p>Done in git.</p>

<blockquote>
    <p>Note there are two purposes for this list, and it depends whether the
    user is logged in as administrator and the permission settings on
    those pages.  First, an administrator may want to see a complete list
    of pages (edited or not), including configuration pages.  Perhaps the
    administrator is securing the web site.  Each page, and in particular
    the configuration pages, is an interface that needs to be reviewed.
    The second purpose is for indexing by Google.  Google should only see
    a subset of these pages (i.e. the public ones), and in particular that
    set should by default not contain configuration pages.</p>
</blockquote>

<p>It's not edited vs. non-edited pages.  It's "real" nodes vs defaults.
Real nodes are actual chunks of data that we have in our storage
system.  "Defaults" are things that we fall back onto, based on name
patterns and other things.  Note that the built-in pages, like the
ones now listed in the "sputnik" node, are just the simplest kind of
defaults, but there are others.  For example, if we get a request for
"foo/bar", we'll check with the node "foo" whether it wants to tell us
what to do with "foo/bar".  So, a request for "foo/bar" may produce a
proper response even if we don't have a node called "foo/bar".  This
could be used, for example, to map children of a node to some
different data source.  For instance, we could configure our "Source"
node to treat its children as git commit IDs, so that a
request for "Source/3327741" would return the information about commit
3327741.  What this all means is that there isn't a clear boundary
between nodes that "exist" and those that "do not exist".</p>

<p>That said, I'll try to think more about this and see if I come up with
a way to offer a more complete listing.</p>
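
<p>The fallback described above, where a request for "foo/bar" is referred to the node "foo", could be sketched as follows.  Everything here is hypothetical: the <code>storage</code> table, the <code>child_default</code> field and <code>get_node</code> are names made up for this illustration and are not Sputnik's real API.</p>

```lua
-- Hypothetical storage: "real" nodes keyed by name.
local storage = {
  Source = {
    -- Assumed hook: lets a node answer for children that are not stored.
    child_default = function(child_name)
      return { title = "Commit " .. child_name, virtual = true }
    end,
  },
  ["Tickets/000001"] = { title = "First ticket" },
}

-- Resolve a node name: stored nodes first, then ask the parent node.
local function get_node(name)
  local node = storage[name]
  if node then
    return node
  end
  -- Split "a/b/c" into parent "a/b" and child "c".
  local parent_name, child_name = name:match("^(.+)/([^/]+)$")
  if parent_name then
    local parent = get_node(parent_name)
    if parent and parent.child_default then
      return parent.child_default(child_name)
    end
  end
  return nil
end

print(get_node("Source/3327741").title)
print(get_node("Tickets/000002"))
```

<p>Note that the set of names for which <code>get_node</code> returns something cannot be enumerated from <code>storage</code> alone, which is the difficulty with producing a complete listing.</p>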

<blockquote>
    <p>Yes, that file system transparency would be a nice feature, as it is
    in Dokuwiki[1].  The structure you describe using Git is similar in
    structure to the plain-text file storage structure used by Dokuwiki,
    and perhaps the plain-text file storage used by Versium could benefit
    by taking an approach more like this as well.  For example, here's
    roughly what the Dokuwiki plain-text file storage structure looks
    like:</p>
</blockquote>

<p>I see the attraction of this system, but I am wondering if the
advantages it would offer justify the change.  My approach has been to
keep the default storage method as simple as I could make it (in terms
of code), leaving it to other implementations to offer additional
features.</p>

<blockquote>
    <p>The "pages" directory contains the latest versions of all the wiki
    pages.  These are human readable -and- editable markup text.</p>
</blockquote>

<p>I assume, though, that editing them by hand does not affect the
history.  The nice thing about using git is that you can actually make
changes and record history from the command line.  (Or view history of
edits made through the web interface.)</p>

<blockquote>
    <p>"media" directory contains resources used by those pages (e.g.
    images).</p>
</blockquote>

<p>We've been trying to implement this in a generic way, so images are
just nodes like any other.</p>

<blockquote>
    <p> The "attic" directory contains compressed/timestamped copies
    of previous versions of the pages.</p>
</blockquote>

<p>I've thought of adding compression, but I am not sure if that would
give much benefit.  At least in my experience, most revisions of most
nodes are under 4K.  This means that for a "typical" page such as
http://sputnik.freewisdom.org/en/Installation, gzipping each version
individually reduces the total size by only about 20%.  On the
other hand, concatenating the versions and <em>then</em> gzipping them
reduces it by 96.4% and may well be worth doing.  Of course, this is
also more complicated.  One possible compromise is to concatenate and
zip files in groups of ten, after reaching the 10th, 20th, 30th, etc.
revision.  In this case the directory ends up looking like this:</p>

<pre><code>00000.txt.gz  00003.txt.gz  00006.txt.gz  000081  000084  000087
00001.txt.gz  00004.txt.gz  00007.txt.gz  000082  000085
00002.txt.gz  00005.txt.gz  000080        000083  000086
</code></pre>

<p>("00002.txt.gz" has versions 000020 to 000029.)</p>

<p>This gives a reduction of 83%.  This would perhaps be worth doing.
But then again, if space is an issue, I am thinking that git would
offer more compact storage.</p>
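
<p>The naming scheme in that listing can be expressed as a small function.  This is a sketch under assumptions: revisions are taken to be numbered from 0 as in the listing, a group is taken to be archived as soon as its tenth revision exists, and <code>location</code> is a hypothetical name.</p>

```lua
-- Where would revision `rev` live, given the latest revision `latest`?
-- Assumes revisions are numbered from 0 and grouped ten per archive.
local function location(rev, latest)
  local group = math.floor(rev / 10)
  -- A group is archived once its last revision (ending in 9) exists.
  if group * 10 + 9 <= latest then
    return string.format("%05d.txt.gz", group)
  end
  -- Otherwise the revision is still stored as a loose file.
  return string.format("%06d", rev)
end

print(location(25, 87))
print(location(83, 87))
```

<p>With 87 as the latest revision, this reproduces the split shown above: archives 00000 through 00007 and loose files 000080 through 000087.</p>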

<blockquote>
    <p>"cache" directory contains cached objects to improve performance.  You
    can safely delete the attic/index/cache files if you no longer want
    them (or omit them from a backup process).</p>
</blockquote>

<p>Why would you want to delete attic?  I would tend to think of a wiki's
history as being as important as (if not more important than) the latest revision.</p>

<blockquote>
    <p>Concerning the index directory, these are all text files.  page.idx is
    a new-line delimited list of page names in the index.  Each
    w[0-9]+.idx file is an unsorted, new-line delimited list of words of
    length [0-9]+ (note: this nicely makes all records fixed-width).
    pageword.idx maps each page number (index in page.idx) to a list of
    words identified in the form of (word_length, word_index) pairs.  (The
    text of a word can be obtained by looking up the pair in the
    w[0-9]+.idx files.)  The i[0-9]+.idx files correspond to the
    w[0-9]+.idx files, and the lines in the files correspond as well.
    These files represent the inverted index and likely constrain query
    performance, as they map word index to a list of (page_number,
    word_count) pairs.  There's some further documentation in (
    http://www.dokuwiki.org/indexer ).  You can check out the source code
    of the indexer.php file ( http://www.splitbrain.org/projects/dokuwiki
    )--it's only about 700 lines.</p>
</blockquote>

<p>Thanks for the links.  If we try to do an indexer in Lua, though, we
should try storing the index as a Lua file! :)</p>

<p>My main issue with this approach, though, is that indexing is expensive
and happens only occasionally.  Considering that wikis are likely to
be updated more often than they are searched, I am wondering whether,
with some clever caching, indexing on demand (at the time of the query)
may actually work better.</p>
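
<p>One simple form of such caching is to keep search results until the storage is next modified.  The sketch below assumes a hypothetical store that bumps a <code>version</code> counter on every save; <code>save</code>, <code>cached_search</code> and the <code>scans</code> counter are invented for illustration only.</p>

```lua
-- Hypothetical store with a change counter bumped on every save.
local store = { version = 0, pages = {} }

local function save(name, text)
  store.pages[name] = text
  store.version = store.version + 1
end

local cache = { version = -1, results = {} }
local scans = 0  -- counts full scans, for demonstration only

local function cached_search(query)
  -- Throw away all cached results once the storage has been touched.
  if cache.version ~= store.version then
    cache = { version = store.version, results = {} }
  end
  local hit = cache.results[query]
  if hit then
    return hit
  end
  -- Cache miss: do the expensive work at query time.
  scans = scans + 1
  local needle = query:lower()
  local found = {}
  for name, text in pairs(store.pages) do
    if text:lower():find(needle, 1, true) then
      found[#found + 1] = name
    end
  end
  table.sort(found)
  cache.results[query] = found
  return found
end

save("Installation", "Install with LuaRocks")
cached_search("luarocks")
cached_search("luarocks")   -- answered from the cache, no new scan
save("FAQ", "LuaRocks questions")
print(#cached_search("luarocks"), scans)
```

<p>This favors wikis that are searched in bursts between edits; a frequently edited wiki would invalidate the cache before it pays off.</p>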

<blockquote>
    <p>(I'm not sure
    what a great search engine implemented in Lua would really offer, as
    opposed to a Lua binding to a great search engine implemented in C.)</p>
</blockquote>

<p>Flexibility and ease of experimentation.</p>

<blockquote>
    <blockquote>
        <blockquote>
            <p>When previewing edits to template/config pages, it would be useful for
            Sputnik to apply the templates being edited in the preview...
            It does work this way for CGI, but not currently with Xavante.</p>
        </blockquote>
    </blockquote>
    
    <p>Didn't do so in Apache/CGI when I tested it (e.g. previewing edits to
    the MAIN template) in the latest release version.</p>
</blockquote>

<p>Can you email me specific steps?  Otherwise I can't seem to reproduce it.</p>

<p>- yuri</p>

<p>-- 
http://sputnik.freewisdom.org/</p>


<p>From carregal at fabricadigital.com.br  Wed Aug 27 12:43:14 2008
From: carregal at fabricadigital.com.br (Andre Carregal)
Date: Wed Aug 27 12:55:30 2008
Subject: [Sputnik-list] traceback errors, config pages, searching
In-Reply-To: <a href="mailto:fa4efbc00808270046ld44e48dp17db9dd60bc160b2@mail.gmail.com">fa4efbc00808270046ld44e48dp17db9dd60bc160b2@mail.gmail.com</a>
References: <a href="mailto:bc4ed2190808230938x1652c455yc757bf60059c6349@mail.gmail.com">bc4ed2190808230938x1652c455yc757bf60059c6349@mail.gmail.com</a>
 &lt;fa4efbc00808231151x41ee96a8j261c93d01de5a77e@mail.gmail.com&gt;
 &lt;bc4ed2190808231505p49090763o569431c6b1e45d49@mail.gmail.com&gt;
 &lt;fa4efbc00808270046ld44e48dp17db9dd60bc160b2@mail.gmail.com&gt;
Message-ID: <a href="mailto:92ab989c0808270743q4b726c3ao3e8ed6bc808d1d23@mail.gmail.com">92ab989c0808270743q4b726c3ao3e8ed6bc808d1d23@mail.gmail.com</a></p>

<p>On Wed, Aug 27, 2008 at 4:46 AM, Yuri Takhteyev <a href="mailto:yuri@sims.berkeley.edu">yuri@sims.berkeley.edu</a> wrote:</p>
<blockquote>
    <p>(...)
    It's not edited vs. non-edited pages.  It's "real" nodes vs defaults.
    Real nodes are actual chunks of data that we have in our storage
    system.  "Defaults" are things that we fall back onto, based on name
    patterns and other things.  Note that the built-in pages, like the
    ones now listed in the "sputnik" node, are just the simplest kind of
    defaults, but there are others.  For example, if we get a request for
    "foo/bar", we'll check with the node "foo" whether it wants to tell us
    what to do with "foo/bar".  So, a request for "foo/bar" may produce a
    proper response even if we don't have a node called "foo/bar".  This
    could be used, for example, to map children of a node to some
    different data source.  For instance, we could configure our "Source"
    node to treat its children as git commit IDs, so that a
    request for "Source/3327741" would return the information about commit
    3327741.  What this all means is that there isn't a clear boundary
    between nodes that "exist" and those that "do not exist".</p>
    
    <p>That said, I'll try to think more about this and see if I come up with
    a way to offer a more complete listing.</p>
</blockquote>

<p>What about asking the "real" nodes about the "virtual" ones? You could
call node:getchildren() for example and then use the resulting list as
part of the listing. This would also allow you to show this listing as
a hierarchy.</p>

<blockquote>
    <p>(...)
    Thanks for the links.  If we try to do an indexer in Lua, though, we
    should try storing the index as a Lua file! :)</p>
    
    <p>My main issue with this approach, though, is that indexing is expensive
    and happens only occasionally.  Considering that wikis are likely to
    be updated more often than they are searched, I am wondering whether,
    with some clever caching, indexing on demand (at the time of the query)
    may actually work better.</p>
</blockquote>

<p>I think you are overestimating the updating frequency. I'd say the
typical wiki is searched more often. :o)</p>

<blockquote>
    <blockquote>
        <p>(I'm not sure
        what a great search engine implemented in Lua would really offer, as
        opposed to a Lua binding to a great search engine implemented in C.)</p>
    </blockquote>
    
    <p>Flexibility and ease of experimentation.</p>
</blockquote>

<p>And, depending on the API you choose, this Lua engine could be
replaced by a more powerful one when needed.</p>

<p>André</p>


<p>From dm.lua at math2.org  Thu Aug 28 03:59:27 2008
From: dm.lua at math2.org (David Manura)
Date: Thu Aug 28 04:11:45 2008
Subject: [Sputnik-list] traceback errors, config pages, searching
In-Reply-To: <a href="mailto:fa4efbc00808270046ld44e48dp17db9dd60bc160b2@mail.gmail.com">fa4efbc00808270046ld44e48dp17db9dd60bc160b2@mail.gmail.com</a>
References: <a href="mailto:bc4ed2190808230938x1652c455yc757bf60059c6349@mail.gmail.com">bc4ed2190808230938x1652c455yc757bf60059c6349@mail.gmail.com</a>
 &lt;fa4efbc00808231151x41ee96a8j261c93d01de5a77e@mail.gmail.com&gt;
 &lt;bc4ed2190808231505p49090763o569431c6b1e45d49@mail.gmail.com&gt;
 &lt;fa4efbc00808270046ld44e48dp17db9dd60bc160b2@mail.gmail.com&gt;
Message-ID: <a href="mailto:bc4ed2190808272259ob9613des9c9e280e226a12ef@mail.gmail.com">bc4ed2190808272259ob9613des9c9e280e226a12ef@mail.gmail.com</a></p>

<p>On Wed, Aug 27, 2008 at 3:46 AM, Yuri Takhteyev wrote:</p>
<blockquote>
    <blockquote>
        <p>The "pages" directory contains the latest versions of all the wiki
        pages.  These are human readable -and- editable markup text.</p>
    </blockquote>
    
    <p>I assume, though, that editing them by hand does not affect the
    history.  The nice thing about using git is that you can actually make
    changes and record history from the command line.  (Or view history of
    edits made through the web interface.)</p>
</blockquote>

<p>True.  However, there is a command-line utility that can check in
versions:</p>

<p>  http://www.dokuwiki.org/cli</p>

<blockquote>
    <blockquote>
        <p>"cache" directory contains cached objects to improve performance.  You
        can safely delete the attic/index/cache files if you no longer want
        them (or omit them from a backup process).</p>
    </blockquote>
    
    <p>Why would you want to delete attic?  I would tend to think of a wiki's
    history as being as important as (if not more important than) the latest revision.</p>
</blockquote>

<p>Maybe a more desirable property is that a set of files outside of
revision control has basically the same structure as the set of files
inside revision control but with no older versions stored.  There is
therefore little need for an "svnadmin create" / "svn import" / "svn
export" type of command.</p>

<blockquote>
    <blockquote>
        <p>Didn't do so in Apache/CGI when I tested it (e.g. previewing edits to
        the MAIN template) in the latest release version.</p>
    </blockquote>
    
    <p>Can you email me specific steps?  Otherwise I can't seem to reproduce it.</p>
</blockquote>

<p>Will try again later.</p>