Sputnik: Saci

Saci

Saci is a document-oriented hierarchical storage system that underlies Sputnik. One could say that Sputnik is a web front-end to Saci.

Web applications often store their data in a relational database and use an "object-relational mapper" (or "ORM") to convert data between its relational representation in the database and its object-oriented representation in memory. Saci takes a different approach. ("Saci" is pronounced suh-SEE; see the Wikipedia article for the meaning.)

This page provides a high-level overview of Saci. For the actual API, see Saci API.

Saci is a document-oriented storage system with built-in support for history

Saci stores data in "nodes" to make it easier to track history

While relational databases break up the data into many elementary relationships, Saci stores data in larger chunks or "documents," which are called "nodes" in Saci terminology. Each node can contain an arbitrary amount of information - there is no requirement to break the data up into the smallest possible pieces ("normalization" in RDBMS terminology). Storing data in larger nodes has an important advantage when building a collaborative application such as a wiki: the node becomes the unit for tracking the history of changes. In contrast, relational databases typically do not make it easy (or even possible) to see what changes have been made. Keeping track of history, together with the ability to diff and undo edits, is crucial for any collaborative software. Without it, letting people edit the data is too risky unless those people are fully trusted. Saci makes it possible to edit data the same way text gets edited in wikis.

At the moment Saci does not have a built-in notion of permissions, though permissions are supported by Sputnik. Permissions may become a part of Saci in the future.

Nodes are stored as Lua code and can be activated into Lua objects

Document-oriented storage systems need a format for storing the data inside a document. XML and JSON are popular choices, for example. Saci uses Lua as its format. Lua is a programming language, but it also works quite well as a data representation format. Saci nodes are unpacked by running them through the Lua interpreter (in a sandbox, for security reasons).
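The idea of running a node through the interpreter in a sandbox can be sketched as follows. This is only an illustration and not Saci's actual loader: the node contents, the field names and the load_node helper are all made up for the example. The helper runs the source with an empty environment, using setfenv on Lua 5.1 and the env argument of load on Lua 5.2 and later.

```lua
-- A hypothetical node as it might sit in storage: plain Lua assignments.
local node_source = [[
title   = "Home Page"
content = "Welcome to the wiki."
]]

-- Run Lua source with an empty environment and return the "globals" it set.
-- load_node is an illustrative helper, not part of Saci.
local function load_node(source)
  local env = {}  -- empty, so the chunk cannot reach os, io, require, ...
  local chunk, err
  if setfenv then
    -- Lua 5.1 / LuaJIT: compile first, then swap the environment.
    chunk, err = loadstring(source, "node")
    if chunk then setfenv(chunk, env) end
  else
    -- Lua 5.2+: text-only mode ("t") and an explicit environment.
    chunk, err = load(source, "node", "t", env)
  end
  if not chunk then return nil, err end
  local ok, run_err = pcall(chunk)
  if not ok then return nil, run_err end
  return env
end

local node = assert(load_node(node_source))
print(node.title)

-- A node that tries to reach outside the sandbox fails instead of running.
local bad, bad_err = load_node('os.execute("echo unsafe")')
print(bad, bad_err)
```

An empty environment only blocks access to global functions. It does not stop a node from looping forever or allocating large strings, so a production sandbox would likely need more than this.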

Saci then provides a mechanism for "activating" nodes - turning their values into Lua tables, functions, etc. However, Saci keeps the difference between the "active" version of the node and its underlying representation explicit. The client can set fields on an activated node, but those values are not automatically saved. Alternatively, the client can "update" the node by setting the underlying fields to strings and reactivating the node. "Updated" nodes can then be saved. In other words, serializing functions, etc. back into strings is the client's job.
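The distinction between the raw and the active version of a node might look roughly like this. The sketch is an assumption-laden illustration, not Saci's API: the names eval, activate, update and the activate_as_lua table are hypothetical, and the rule for deciding which fields get evaluated is invented for the example.

```lua
-- Evaluate a string of Lua source as an expression in an empty environment.
local function eval(source)
  local chunk
  if setfenv then
    chunk = assert(loadstring("return " .. source, "field"))
    setfenv(chunk, {})
  else
    chunk = assert(load("return " .. source, "field", "t", {}))
  end
  return chunk()
end

-- Raw node: every field is a string. This is the form that gets saved.
local raw = {
  title  = "Settings",
  limits = "{ max_items = 10 }",
  greet  = "function(name) return 'Hello, ' .. name end",
}

-- Hypothetical rule: these fields are evaluated as Lua on activation.
local activate_as_lua = { limits = true, greet = true }

local function activate(raw_node)
  local active = {}
  for name, value in pairs(raw_node) do
    if activate_as_lua[name] then
      active[name] = eval(value)
    else
      active[name] = value
    end
  end
  return active
end

-- Setting a field on the active node leaves the raw node untouched.
local first = activate(raw)
first.limits.max_items = 99

-- "Updating" sets the underlying string and reactivates the node.
local function update(raw_node, name, new_source)
  raw_node[name] = new_source
  return activate(raw_node)
end

local second = update(raw, "limits", "{ max_items = 20 }")
print(second.limits.max_items, second.greet("Saci"))
```

Because only the raw strings are ever saved, the change made to first is lost unless the client writes it back as Lua source through something like update.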

Saci nodes are self-describing

Saci nodes are self-describing - there is no separation between "the data" and "the model". The meaning of the node's fields and instructions for what to do with them can be stored in the node itself. This includes the information on how to save modifications to the node and how to activate it. This has an important advantage. Relational databases and other systems with a fixed schema typically require that all data be stored in a uniform way. This makes changes to the schema difficult. It also inhibits collaborative projects, since changes cannot easily be made locally.

Saci uses prototype inheritance

Saci relies on inheritance to free nodes from having to carry all of the metadata in them. However, instead of class inheritance (the most common kind in object-oriented systems today), it uses prototype inheritance. This means that there are no "node classes" that are separate from normal nodes. Each prototype is just a node, and any node can be a prototype. Inheritance is just a matter of a node X telling us: "I am just like node Y, except for the following differences..."
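A minimal sketch of the "just like node Y, except..." idea is given here. The node IDs, the field names, the use of a prototype field to name the parent, and the inherit function are all assumptions made for illustration; Saci's real resolution rules may differ.

```lua
-- Hypothetical in-memory node store, keyed by node ID.
local nodes = {
  base        = { template = "default", can_edit = "everyone" },
  locked_page = { prototype = "base", can_edit = "admins" },
  about       = { prototype = "locked_page", title = "About" },
}

-- Return a flat table holding the node's own fields layered over the fields
-- it inherits from its prototype chain. Own fields win over inherited ones.
local function inherit(id, seen)
  seen = seen or {}
  assert(not seen[id], "prototype cycle at " .. id)
  seen[id] = true
  local node = assert(nodes[id], "no such node: " .. id)
  local result = {}
  if node.prototype then
    for key, value in pairs(inherit(node.prototype, seen)) do
      result[key] = value
    end
  end
  for key, value in pairs(node) do
    result[key] = value
  end
  return result
end

local about = inherit("about")
print(about.title, about.template, about.can_edit)

-- A prototype is an ordinary node and can be resolved like any other.
local locked = inherit("locked_page")
print(locked.template, locked.can_edit)
```

Here locked_page is both a node in its own right and the prototype of about, which is the point of having no separate "node classes".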

Saci is independent of the physical storage

Saci doesn't care where and how the documents and their histories are stored. It pulls nodes in as byte strings and turns them into Lua objects, but it doesn't care where the nodes come from. To achieve this, Saci relies on Versium, which is a simple API for history-aware storage. Versium implementations can store data in a variety of ways: on disk, in memory, in a database, in a Subversion repository, in a Git repository - to mention just the options that are already available today.
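To make "history-aware storage" concrete, here is a small in-memory store that keeps every revision of every node as a byte string. It is a sketch of the general idea only: the function names (new_memory_store, save, get, history) are invented for this example and should not be read as Versium's actual API.

```lua
-- Hypothetical history-aware store kept entirely in memory.
local function new_memory_store()
  local revisions = {}  -- node ID -> array of { data = string, author = string }
  local store = {}

  -- Append a new revision and return its number.
  function store.save(id, data, author)
    revisions[id] = revisions[id] or {}
    table.insert(revisions[id], { data = data, author = author })
    return #revisions[id]
  end

  -- Return the data of a given revision, or of the latest one by default.
  function store.get(id, revision)
    local list = revisions[id]
    if not list then return nil end
    local entry = list[revision or #list]
    return entry and entry.data
  end

  -- Return metadata for every revision of a node, oldest first.
  function store.history(id)
    local out = {}
    for number, entry in ipairs(revisions[id] or {}) do
      out[number] = { revision = number, author = entry.author }
    end
    return out
  end

  return store
end

local store = new_memory_store()
store.save("home", 'title = "Home"', "alice")
store.save("home", 'title = "Home Page"', "bob")

print(store.get("home"))     -- latest revision
print(store.get("home", 1))  -- an older revision
```

Any backend that offers the same three operations, whether it writes to disk, a database or a version control repository, could be swapped in without the code above it noticing. Undoing an edit amounts to fetching an older revision and saving it again as the newest one.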

Coming soon

Hierarchical storage

Saci will soon support hierarchical storage, where each node would be responsible for its children. Among other things, the children of a node could be physically stored in a different place, and it would be up to the parent node to decide this. (This is partly implemented at this point.)

Distributed storage

Saci will at some point support distributed storage - i.e., an application would be able to fetch nodes over the network and use them in nearly the same way as the local nodes. (A proof-of-concept implementation is available, but it needs to be refactored, etc.)

Future plans

Indexing

Saci will at some point do indexing in a storage-independent way.