The Kasabi Update API provides a simple protocol for managing data within a Kasabi dataset.
Formally, a Kasabi dataset is an RDF graph store that consists of a single default graph. This graph can be manipulated via the Update API either by appending new RDF statements or by applying changesets. Changesets are patches that add and remove statements from a graph in order to achieve updates.
The API supports direct submission of RDF using HTTP POST requests, as well as import of data from a specified URL. RDF statements can be submitted in a number of different RDF serializations, including RDF/XML, Turtle, and RDFa.
Request Processing & Eventual Consistency
Updates to Kasabi are handled asynchronously. An HTTP request to append or update data will return a URL that can be monitored to identify when an update has been successfully applied. Changes are guaranteed to be applied in the order in which they were submitted. Requests containing parsing or data format errors are rejected during the original request.
In addition, Kasabi uses an Eventual Consistency model for storing data. Internally, data is replicated across a number of nodes and updates are applied to each separately. This means that a successful update may not be immediately visible on all nodes.
Basic API Reference
Endpoint URL
The base URL for the Update API associated with a given dataset is:
http://api.kasabi.com/dataset/[short-code]/store
Where short-code is the short name for the dataset. E.g. the NASA dataset available from http://beta.kasabi.com/dataset/nasa has an Update API available at:
http://api.kasabi.com/dataset/nasa/store
Authentication
Only the owner of a dataset has permission to modify it using the Update API. Accessing the API requires your API key. For more information on Kasabi authentication options, read the authentication documentation.
Parameters
Data is submitted to the Update API via HTTP POST. The submitted data is provided in the body of the request.
When importing data from the web, a data-uri parameter is used to indicate the location of the data to be imported.
For more information see the following sections.
HTTP Response Codes
Clients should be prepared to receive any valid HTTP response code. The following table lists the most frequently used codes:
| Code | Meaning |
|---|---|
| 202 Accepted | Request to update data has been accepted |
| 400 Bad Request | Invalid data, missing Content-Type header |
| 401 Unauthorized | API key is not authorized to access the data |
Please also review our additional notes on response codes and error reporting.
Response Formats
The Update API typically returns plain text messages. For a successfully queued update request, an HTTP Location header is returned in the response. This indicates a URL that can be monitored to check whether an update has been successfully applied.
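As a minimal sketch of how a client might act on the codes and the Location header described above, the helper below maps a status code and header set to either a change URL or an exception. The function name and the choice of exception types are illustrative assumptions, not part of the Kasabi API.

```python
from typing import Mapping, Optional


def change_url_from_response(status: int, headers: Mapping[str, str]) -> Optional[str]:
    """Return the change-resource URL for a queued update, or raise on failure.

    Hypothetical helper: only the status codes and the Location header
    come from the documentation; everything else is illustrative.
    """
    if status == 202:
        # Header names are case-insensitive, so normalise before the lookup.
        lowered = {name.lower(): value for name, value in headers.items()}
        return lowered.get("location")
    if status == 400:
        raise ValueError("400 Bad Request: invalid data or missing Content-Type header")
    if status == 401:
        raise PermissionError("401: API key is not authorized to access the data")
    # Clients should be prepared for any other valid HTTP response code.
    raise RuntimeError(f"Unexpected HTTP status {status}")
```

A caller would typically store the returned URL so that the update can be checked later.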
Appending Data to a Dataset
New RDF statements can be appended to the graph store associated with a dataset by sending an HTTP POST to the Update API endpoint.
The body of the request should contain the data to be appended to the graph, and the Content-Type header should indicate the format of the data being submitted. Standard RDF serialization formats including RDF/XML and Turtle are supported.
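The following sketch shows one way such a request could be assembled with the Python standard library. Two details are assumptions rather than documented behaviour: the API key is passed as an `apikey` query parameter (see the authentication documentation for the supported options), and Turtle is labelled with its registered media type `text/turtle`. The function name is hypothetical.

```python
import urllib.parse
import urllib.request

API_BASE = "http://api.kasabi.com/dataset"


def build_append_request(short_code: str, rdf: bytes, content_type: str,
                         api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST that appends RDF to a dataset."""
    # ASSUMPTION: the API key travels as an "apikey" query parameter.
    query = urllib.parse.urlencode({"apikey": api_key})
    url = f"{API_BASE}/{urllib.parse.quote(short_code)}/store?{query}"
    return urllib.request.Request(
        url,
        data=rdf,                                # RDF goes in the request body
        headers={"Content-Type": content_type},  # format of the submitted data
        method="POST",
    )


turtle = b'<http://example.org/a> <http://purl.org/dc/terms/title> "Example" .\n'
request = build_append_request("nasa", turtle, "text/turtle", "YOUR-API-KEY")
# Passing `request` to urllib.request.urlopen() would send it.
```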
The Update API operates best when data is submitted in chunks of approximately 2MB. Currently, attempts to POST larger documents will be rejected. Large datasets should be split into smaller data files for uploading through the API.
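One way to split a large dataset is sketched below, assuming the data is available as N-Triples (one statement per line), which can be cut at any line boundary and is also valid Turtle. The function name, and the reading of the limit as 2 × 1024 × 1024 bytes, are assumptions.

```python
from typing import Iterable, Iterator

MAX_CHUNK_BYTES = 2 * 1024 * 1024  # ASSUMPTION: the 2MB limit read as 2 MiB


def chunk_ntriples(lines: Iterable[str],
                   max_bytes: int = MAX_CHUNK_BYTES) -> Iterator[bytes]:
    """Group N-Triples lines into UTF-8 chunks no larger than max_bytes."""
    chunk = []
    size = 0
    for line in lines:
        encoded = line.rstrip("\n").encode("utf-8") + b"\n"
        if len(encoded) > max_bytes:
            raise ValueError("a single statement exceeds the chunk size")
        # Start a new chunk when adding this line would exceed the limit.
        if chunk and size + len(encoded) > max_bytes:
            yield b"".join(chunk)
            chunk, size = [], 0
        chunk.append(encoded)
        size += len(encoded)
    if chunk:
        yield b"".join(chunk)
```

Each yielded chunk can then be submitted as a separate POST. Because changes are applied in submission order, the chunks take effect in the order they are sent.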
Importing Data from the Web
The Update API supports fetching remote data sources for importing into a dataset. This allows existing RDF and RDFa data that has been published to the web to be appended to a dataset.
If a POST request to the Update API includes a data-uri parameter, then the API will ignore the body of the request and will instead fetch the remote data source and attempt to add it to the dataset.
The API will not process a remote data file larger than 2MB. It also does not support the fetching of compressed data files.
Note: The range of data formats supported will be improved, and restrictions on file sizes will be removed in later releases of the API.
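A hedged sketch of an import request follows. The documentation names the data-uri parameter but does not say how it is encoded, so sending it as a form-encoded POST parameter, alongside an `apikey` parameter, is an assumption here. The function name is hypothetical.

```python
import urllib.parse
import urllib.request


def build_import_request(short_code: str, data_uri: str,
                         api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST asking the API to fetch remote data."""
    # ASSUMPTION: data-uri and apikey are form-encoded POST parameters.
    form = urllib.parse.urlencode({"data-uri": data_uri, "apikey": api_key})
    url = f"http://api.kasabi.com/dataset/{urllib.parse.quote(short_code)}/store"
    return urllib.request.Request(
        url,
        data=form.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


import_request = build_import_request(
    "nasa", "http://example.org/data.rdf", "YOUR-API-KEY")
```

The remote file at the given URL must be uncompressed and no larger than 2MB, as noted above.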
Handling of Blank Nodes
"Blank Nodes" or "bnodes" are entities in an RDF dataset that have not been assigned a global identifier. While blank nodes are useful in a number of circumstances, they make it difficult to reliably refer to a resource, e.g. for the purposes of establishing links between datasets, or annotating a resource with additional information.
The Kasabi graph store does not support the storage of blank nodes. Blank nodes present in data that is appended to the graph store will be replaced by a resource with an automatically generated URI. This resource URI will be based on the following URI pattern:
http://data.kasabi.com/dataset/[short-code]/[uuid]
Where short-code is the short name for a dataset, and uuid is an automatically generated UUID. This ensures that a unique, dereferenceable URI is assigned to the resource.
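To illustrate the effect of this replacement, the sketch below rewrites blank nodes in a list of triples using the URI pattern above. It is a local model only: the replacement actually happens inside Kasabi, and the choice of a random (version 4) UUID and the reuse of one URI per blank node label within a single submission are assumptions. Terms are represented as plain strings, with blank nodes written as `_:label`.

```python
import uuid

BNODE_URI_BASE = "http://data.kasabi.com/dataset"


def replace_blank_nodes(triples, short_code):
    """Return triples with each blank node swapped for a generated URI."""
    minted = {}

    def swap(term):
        if not term.startswith("_:"):
            return term  # URIs and literals pass through unchanged
        if term not in minted:
            # ASSUMPTION: a random UUID fills the [uuid] slot of the pattern.
            minted[term] = f"{BNODE_URI_BASE}/{short_code}/{uuid.uuid4()}"
        return minted[term]

    return [(swap(s), p, swap(o)) for s, p, o in triples]


sample = [
    ("_:b0", "http://xmlns.com/foaf/0.1/name", '"Ann"'),
    ("http://example.org/a", "http://xmlns.com/foaf/0.1/knows", "_:b0"),
]
rewritten = replace_blank_nodes(sample, "nasa")
```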
Updating Data in a Dataset
Changesets
Changesets are an RDF vocabulary that describes changes to resources in an RDF graph. The Update API supports submission of changesets to apply updates to a dataset graph.
A Changeset consists of:
- A time-stamp and reason for a change
- An optional set of statements to be added to a graph
- An optional set of statements to be removed from a graph
The additions and removals all relate to a single, common "subject of change", i.e. a changeset describes changes to a single resource in the RDF graph.
This means that a changeset can be used to achieve any of the following operations on a graph:
- Append new statements -- by providing only new statements to add to the graph. This is redundant as the same effect can be achieved by submitting new data directly
- Remove existing statements -- by providing only a list of statements to be removed
- Update a graph -- removing a statement and then adding a statement with the same predicate effectively updates the graph
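The structure described above can be sketched as a small RDF/XML generator. The element names used here (cs:ChangeSet, cs:subjectOfChange, cs:createdDate, cs:changeReason, cs:removal, cs:addition, with reified rdf:Statement values) follow the published Changeset vocabulary as best understood and should be checked against it. The function names are hypothetical, and for brevity only literal-valued statements are handled.

```python
from datetime import datetime, timezone
from xml.sax.saxutils import escape, quoteattr

CS_NS = "http://purl.org/vocab/changeset/schema#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def _statement(kind, subject, predicate, literal):
    # Each addition or removal is a reified rdf:Statement with a literal object.
    return (
        f"    <cs:{kind}>\n"
        f"      <rdf:Statement>\n"
        f"        <rdf:subject rdf:resource={quoteattr(subject)}/>\n"
        f"        <rdf:predicate rdf:resource={quoteattr(predicate)}/>\n"
        f"        <rdf:object>{escape(literal)}</rdf:object>\n"
        f"      </rdf:Statement>\n"
        f"    </cs:{kind}>\n"
    )


def build_changeset(subject, reason, removals=(), additions=()):
    """Build a changeset; removals/additions are (predicate, literal) pairs."""
    created = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    body = "".join(_statement("removal", subject, p, o) for p, o in removals)
    body += "".join(_statement("addition", subject, p, o) for p, o in additions)
    return (
        f'<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:cs="{CS_NS}">\n'
        f"  <cs:ChangeSet>\n"
        f"    <cs:subjectOfChange rdf:resource={quoteattr(subject)}/>\n"
        f"    <cs:createdDate>{created}</cs:createdDate>\n"
        f"    <cs:changeReason>{escape(reason)}</cs:changeReason>\n"
        f"{body}"
        f"  </cs:ChangeSet>\n"
        f"</rdf:RDF>\n"
    )


TITLE = "http://purl.org/dc/terms/title"
changeset = build_changeset(
    "http://example.org/a", "Correct the title",
    removals=[(TITLE, "Old title")],
    additions=[(TITLE, "New & improved title")],
)
```

The usage at the end pairs a removal with an addition on the same predicate, which corresponds to the third operation in the list: an update.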
Submitting Changesets
To submit a changeset to a graph, POST the RDF changeset to the Update API endpoint with a Content-Type of application/vnd.talis.changeset+xml.
If an incorrect MIME type is specified then the changeset may be stored in the graph, rather than applied to the graph, so ensure that the correct Content-Type header is being passed in the request.
As with other updates, a successful request will return the URL of a resource that can be polled to check whether the update has been applied.
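A minimal sketch of a changeset submission is shown below. Pinning the Content-Type to a constant guards against the storage-instead-of-application problem described above. As elsewhere, passing the API key as an `apikey` query parameter is an assumption, and the function name is hypothetical.

```python
import urllib.parse
import urllib.request

CHANGESET_TYPE = "application/vnd.talis.changeset+xml"


def build_changeset_request(short_code: str, changeset_xml: str,
                            api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST that applies an RDF/XML changeset."""
    # ASSUMPTION: the API key travels as an "apikey" query parameter.
    query = urllib.parse.urlencode({"apikey": api_key})
    url = (f"http://api.kasabi.com/dataset/"
           f"{urllib.parse.quote(short_code)}/store?{query}")
    return urllib.request.Request(
        url,
        data=changeset_xml.encode("utf-8"),
        # Always the changeset media type, never a generic RDF/XML type,
        # so the changeset is applied rather than stored as data.
        headers={"Content-Type": CHANGESET_TYPE},
        method="POST",
    )
```

The Location header of the 202 response to this request identifies the resource to poll.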
Removals as Preconditions
If a Changeset indicates that a statement should be removed from a graph, and that statement is not present, then the update will fail.
Note: this behaviour may be removed in future releases, in favour of silently ignoring removal requests for statements that are not present in the graph.
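The precondition rule can be modelled locally with a set of triples, which may help when testing client code. This is an illustrative model, not Kasabi's implementation. In particular, rejecting the whole changeset (rather than applying the parts that are valid) is an assumption, as are the function and exception names.

```python
class MissingStatementError(Exception):
    """Raised when a changeset removes a statement the graph does not hold."""


def apply_changeset(graph, removals, additions):
    """Return a new graph with the changeset applied, or raise if a
    removal is not present. Triples are (subject, predicate, object) tuples."""
    removals = set(removals)
    missing = removals - graph
    if missing:
        # ASSUMPTION: the whole changeset fails; nothing is applied.
        raise MissingStatementError(f"not in graph: {sorted(missing)}")
    return (graph - removals) | set(additions)


graph = {("ex:a", "dct:title", "Old")}
updated = apply_changeset(
    graph,
    removals=[("ex:a", "dct:title", "Old")],
    additions=[("ex:a", "dct:title", "New")],
)
```

In effect, a removal acts as an assertion about the current state of the graph: a client that submits a changeset based on stale data will see the update fail instead of silently overwriting a newer value.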
Checking Whether An Update Has Been Applied
A successful request to append statements to a store, or to apply a changeset, will return a 202 Accepted status code. This indicates that the data has been successfully parsed by the API and queued for processing.
As noted above, the Kasabi Update API is both asynchronous and eventually consistent. Successful responses from the API will include a Location header. This header contains the URL of a "change" resource that can be monitored to determine the state of the change.
An HTTP GET request to a change resource will return a JSON document that indicates whether that update has been successfully applied. E.g. a request to submit data to the NASA dataset might return the following URL for a change resource:
http://api.kasabi.com/dataset/nasa/changes/12345
A GET request to that resource will return a JSON response with the following format:
{
  "status": "applied"
}
The applied status indicates that the change has been successfully applied to all replicas of the data. A value of pending will indicate that the change has not yet been applied to all replicas.
Note: this aspect of the API is likely to evolve over the coming releases. Client library implementors should not rely on the URL format for change resources, and should expect the response format to evolve.
It is up to the client sending a request to cache the URLs of these resources, for local monitoring. Currently there is no API for retrieving a list of previously applied and/or pending changes.
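A client-side polling loop might look like the following sketch. It treats the change URL as opaque (taken from the Location header and never constructed), in line with the note above. The function names, the polling interval, and the attempt limit are arbitrary illustrative choices. Whether the change resource requires an API key is not documented here, so the plain GET is an assumption, and any status other than applied is treated as still pending.

```python
import json
import time
import urllib.request


def fetch_status(change_url):
    """GET a change resource and return the value of its "status" field."""
    # ASSUMPTION: the change resource can be read with a plain GET.
    with urllib.request.urlopen(change_url) as response:
        return json.load(response)["status"]


def wait_until_applied(change_url, fetch=fetch_status, interval=2.0,
                       max_attempts=30, sleep=time.sleep):
    """Poll until the change is applied; return False if attempts run out."""
    for attempt in range(max_attempts):
        if fetch(change_url) == "applied":
            return True
        if attempt < max_attempts - 1:
            sleep(interval)  # wait before asking again
    return False
```

The `fetch` and `sleep` parameters exist so the loop can be exercised without network access, for example by passing a function that returns canned status values. Since there is no API for listing changes, a client would keep its own record of the change URLs it has received and pass each one to a loop like this.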