Data hosting and publishing is an essential feature of Kasabi. All users of Kasabi can create datasets for publishing data to the marketplace and the web.

While data hosting is free within Kasabi, currently data publishers are limited to a maximum of 5 datasets. Please contact us if you feel you have a need for further storage.

Datasets have a life-cycle, allowing them to be populated and tested in a "draft" form before being published for wider use. Datasets published into the marketplace are automatically available through a number of APIs. Users may then create additional custom APIs as necessary in order to provide new ways to access a dataset.

During the beta period, only free, open data can be published via Kasabi, using one of the predefined built-in licenses. Future releases will provide more fine-grained control over dataset access rights and terms of use, as well as reporting on usage statistics.

Data Storage in Kasabi

Each dataset in Kasabi is a self-contained graph database. Each dataset has an owner, a set of associated metadata, and a license. While Kasabi holds datasets from a number of different data publishers, each dataset is stored separately and is accessed through a separate set of APIs.

Only the dataset owner has authorization to add, update or delete data in their datasets. Facilities to carry out bulk operations on datasets, e.g. to reset and dump data, are also available via a Jobs API.

Data is submitted to Kasabi primarily as RDF or RDFa. The Update API supports a number of different formats and data may be directly submitted to the API or imported from the web. Updates are handled by submission of "changesets" that describe changes, i.e. diffs, to the currently stored graph.
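
To illustrate the idea of a changeset as a diff against the stored graph, here is a minimal Python sketch. It models a graph as a set of (subject, predicate, object) triples and is not the actual Kasabi changeset format or Update API; the names Changeset, diff_graphs and apply_changeset are hypothetical and exist only for this example.

```python
from typing import NamedTuple

# A triple is modelled as a plain (subject, predicate, object) tuple of strings.
Triple = tuple[str, str, str]


class Changeset(NamedTuple):
    """Hypothetical stand-in for a changeset: triples to remove and to add."""
    removals: frozenset[Triple]
    additions: frozenset[Triple]


def diff_graphs(stored: set[Triple], desired: set[Triple]) -> Changeset:
    """Describe the change needed to turn `stored` into `desired`."""
    return Changeset(
        removals=frozenset(stored - desired),
        additions=frozenset(desired - stored),
    )


def apply_changeset(graph: set[Triple], change: Changeset) -> set[Triple]:
    """Return a new graph with the removals taken out and the additions put in."""
    return (graph - change.removals) | change.additions


stored = {("ex:doc", "ex:title", "Old title"), ("ex:doc", "ex:type", "ex:Report")}
desired = {("ex:doc", "ex:title", "New title"), ("ex:doc", "ex:type", "ex:Report")}

change = diff_graphs(stored, desired)
updated = apply_changeset(stored, change)
```

Only the triple that differs appears in the changeset; the unchanged ex:type triple is not resubmitted.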

Bulk update facilities, including bulk loading, mirroring and web crawling, will be added in future releases. In its first releases, Kasabi is intended to support the creation of smaller, regularly updated datasets.

The Dataset Life-cycle

When a dataset is first created in Kasabi it is temporarily in a Pending status. This indicates that the storage for the dataset is being allocated. Whilst in this status the dataset is unavailable for use, although the owner may still edit its metadata, e.g. to add a title, description, etc.

Once storage has been allocated -- typically within a few minutes -- a dataset will be marked as being in a Draft status. This indicates that the storage for the dataset has been allocated and that the Update API is now available. A standard set of APIs is automatically deployed for the new dataset. These are also in a Draft status.

Datasets and APIs that are in a Draft status can only be accessed by their owner. The status is provided to support data publishers in testing out the system, and their data, before publishing it to the rest of the developer community.

Controls for publishing a dataset are available right from the dataset homepage. Once a dataset has been marked as Published, it and its associated APIs become visible to all Kasabi users. The dataset and its APIs will automatically appear in the search indexes and be available for public browsing.
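
The visibility rules described so far can be summarised in a few lines of Python. This is only a sketch of the rules as stated on this page, not Kasabi's implementation; the Dataset class, its fields and the can_access function are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical record holding just the fields the rule needs."""
    owner: str
    status: str  # one of "pending", "draft", "published"


def can_access(dataset: Dataset, user: str) -> bool:
    """Decide whether `user` may use the dataset and its APIs."""
    if dataset.status == "published":
        return True                   # visible to all Kasabi users
    if dataset.status == "draft":
        return user == dataset.owner  # drafts are restricted to their owner
    return False                      # pending: storage is not yet allocated


draft = Dataset(owner="alice", status="draft")
published = Dataset(owner="alice", status="published")
```

Metadata editing by the owner while a dataset is Pending is deliberately left out of the sketch; it covers access to the data and APIs only.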

We strongly encourage data publishers to test out their data, submit developer documentation, and carefully consider their data licensing terms before publishing a dataset.

We recognize that data publishers may sometimes need to withdraw datasets from the marketplace. Controls for doing this are available from the administrator's dashboard and the dataset homepage. This allows a dataset and its APIs to be moved back to a Draft status.
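
Taken together, the life-cycle amounts to a small state machine: Pending to Draft, Draft to Published, and Published back to Draft on withdrawal. The Python sketch below encodes only the transitions described in this section; the Status enum and the transition function are hypothetical names, and any other transition is treated as invalid by assumption.

```python
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    DRAFT = "draft"
    PUBLISHED = "published"


# The only moves described in this section; anything else is rejected.
ALLOWED_TRANSITIONS = {
    (Status.PENDING, Status.DRAFT),    # storage has been allocated
    (Status.DRAFT, Status.PUBLISHED),  # owner publishes the dataset
    (Status.PUBLISHED, Status.DRAFT),  # owner withdraws the dataset
}


def transition(current: Status, target: Status) -> Status:
    """Return the new status, or raise ValueError if the move is not allowed."""
    if (current, target) not in ALLOWED_TRANSITIONS:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


status = Status.PENDING
status = transition(status, Status.DRAFT)      # allocation finished
status = transition(status, Status.PUBLISHED)  # published to the marketplace
status = transition(status, Status.DRAFT)      # withdrawn again
```

For instance, a dataset cannot jump straight from Pending to Published in this model, because the page describes publishing only for datasets that have already reached Draft.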

Dataset Licensing

All datasets in Kasabi must have an associated set of usage terms. A set of existing open data licensing terms has been pre-configured in the system, and publishers are encouraged to choose a suitable off-the-shelf set of terms.

During the beta period only these standard open licenses and public domain waivers may be used to publish data into the marketplace. In future releases support will be made available for submission of custom terms, although our hope is that the data publishing community can converge on a standard set of common terms that are easy for developers to work with.