Larry Maccherone

Major upgrades to documentdb-utils

Wed, 09 Dec 2015 06:07:58 +0100

I've just now finished pushing a major upgrade to documentdb-utils and adding C# .NET examples to documentdb-lumenize docs.

With support for upserts, id-based links, and maxItemCount -1, a lot of the API design decisions that I made when I originally created documentdb-utils have been overtaken by events... a testament to the rapid pace of development by the Azure team. So I decided to go back to square one: rather than expose my own API, I now simply wrap DocumentClient to upgrade it. Essentially every line of code in the package is new. The WrappedClient class takes a raw DocumentClient and upgrades its methods with 429 error delay/retry support and the ability to work with higher-order async functions. It will even send the same query to multiple collections and aggregate the results, and likewise aggregate the results of executing the same sproc on multiple collections.
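To make the 429 delay/retry idea concrete, here is a minimal sketch of wrapping a Node-style callback function so that it waits and retries when the server answers 429 (request rate too large). This is not the WrappedClient implementation. The wrapper name, the option names, and the assumed error shape (err.code of 429 plus an optional err.retryAfterMs delay hint) are all hypothetical.

```javascript
// Sketch: wrap a Node-style (err, ...results) callback function so that
// 429 "request rate too large" errors are retried after a delay.
// Assumptions (hypothetical, not the WrappedClient API): throttling errors
// carry err.code === 429 and may carry a delay hint in err.retryAfterMs.
function withRetryOn429(fn, options = {}) {
  const maxRetries = options.maxRetries ?? 5;
  const defaultDelayMs = options.defaultDelayMs ?? 1000;
  // Injectable scheduler so the retry loop can be exercised without waiting.
  const schedule = options.schedule ?? ((ms, next) => setTimeout(next, ms));

  return function retrying(...args) {
    const callback = args.pop(); // the caller's (err, ...results) callback
    let retries = 0;

    const attempt = () => {
      fn(...args, (err, ...results) => {
        if (err && err.code === 429 && retries < maxRetries) {
          retries += 1;
          // Prefer the server's delay hint; fall back to a fixed delay.
          schedule(err.retryAfterMs ?? defaultDelayMs, attempt);
          return;
        }
        // Success, a non-429 error, or retries exhausted.
        callback(err, ...results);
      });
    };

    attempt();
  };
}
```

To wrap a real client method you would bind it first, for example withRetryOn429(client.readDocument.bind(client)), assuming the method takes a trailing callback, and adjust the 429 check to whatever error shape your client actually reports.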

I also realized that I had started building a bunch of other helpful utilities, so I decided to share them in documentdb-utils, including:

- The ability to require() other packages when writing server-side scripts
- The ability to load all sprocs or UDFs in a directory to all collections in a list. This is incredibly useful when working with multiple partitions.
- Mixins for async.js and underscore.js so you can use those JavaScript utility libraries in your sprocs
- Functions to generate id-based links and arrays of links (incidentally, you can pass an array of links to many of the WrappedClient methods)
- A few other niceties
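Here is a sketch of what id-based link helpers and a fan-out over an array of links might look like. The link format (dbs/<db id>/colls/<collection id>/docs/<document id>) is DocumentDB's id-based routing scheme. The function names makeLink, makeCollectionLinks, and fanOut are hypothetical and are not the documentdb-utils API. fanOut runs one callback-style operation per link in parallel and concatenates the results in link order, which is roughly the shape of sending the same query to several partition collections.

```javascript
// Hypothetical helpers illustrating DocumentDB id-based links.
// The names are illustrative only; they are not the documentdb-utils API.
function makeLink(dbId, collId, kind, itemId) {
  // kind is e.g. "docs", "sprocs", or "udfs"; omit it for a collection link.
  const base = `dbs/${dbId}/colls/${collId}`;
  return kind === undefined ? base : `${base}/${kind}/${itemId}`;
}

// One collection link per id, e.g. for a set of partition collections.
function makeCollectionLinks(dbId, collIds) {
  return collIds.map((collId) => makeLink(dbId, collId));
}

// Runs runOne(link, callback) for every link in parallel and concatenates
// the per-link result arrays in link order. The first error wins.
function fanOut(links, runOne, callback) {
  if (links.length === 0) return callback(null, []);
  const perLink = new Array(links.length);
  let pending = links.length;
  let done = false;
  links.forEach((link, i) => {
    runOne(link, (err, items) => {
      if (done) return;
      if (err) {
        done = true;
        callback(err);
        return;
      }
      perLink[i] = items;
      pending -= 1;
      if (pending === 0) {
        done = true;
        callback(null, perLink.flat());
      }
    });
  });
}
```

Collecting results by index rather than by arrival order keeps the merged output deterministic even when collections respond in a different order.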

Also, although I knew it was possible, I had never bothered to use documentdb-lumenize (aggregations) from a C# .NET project. I've now done so and included the documentation and examples.

Anyway, I thought I'd share with this group in case any of you want to check out the upgrades.

Announcing documentdb-lumenize

Fri, 10 Jul 2015 03:19:21 +0200

The bad news: DocumentDB does not include aggregation capability.

The good news: DocumentDB includes stored procedures, and documentdb-lumenize uses them to add aggregation capability that far exceeds what you are used to with SQL.

As Michael Stonebraker, genius creator of not one but three wildly successful databases (Postgres, Vertica, and VoltDB), said, "It's better to move the code to the data than the other way around." VoltDB was designed to execute Java-language ACID transactional stored procedures running in the same memory space as the data, with huge performance and consistency benefits. Well, Microsoft Azure's DocumentDB takes a similar approach, except that it's a NoSQL data store using JSON and you write your stored procedures in JavaScript (or CoffeeScript in my case).

Using this power, I've ported the aggregation engine of the Lumenize library, which I created while working on my PhD at Carnegie Mellon, to run inside DocumentDB. This instantly upgrades DocumentDB with more powerful declarative aggregation capability (including full OLAP cube support) than even the most advanced databases offer.

A simple groupBy example

Let's assume this is the only data in your collection.

[
  {id: 1, value: 10},
  {id: 1, value: 100},
  {id: 2, value: 20},
  {id: 3, value: 30}
]

Now, let's call the cube stored procedure with the following parameters:

{cubeConfig: {groupBy: 'id', field: "value", f: "sum"}}

After you call the cube stored procedure, you should expect the following in the savedCube.cellsAsCSVStyleArray property of the response. Note that the _count metric is always calculated, even when not specified.

[
  [ 'id', '_count', 'value_sum' ],
  [   1,         2,         110 ],
  [   2,         1,          20 ],
  [   3,         1,          30 ]
]
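For readers who want to check those numbers by hand, this plain JavaScript function reproduces the same count-and-sum table from the sample data. It is an illustrative re-implementation with a hypothetical name (groupBySum), not the Lumenize engine, which is far more general.

```javascript
// Illustrative groupBy with a count and a sum per group.
// Not Lumenize's code; it only reproduces the example's output shape.
function groupBySum(rows, groupBy, field) {
  const cells = new Map(); // group value -> { count, sum }
  for (const row of rows) {
    const key = row[groupBy];
    const cell = cells.get(key) ?? { count: 0, sum: 0 };
    cell.count += 1; // _count is always tracked
    cell.sum += row[field];
    cells.set(key, cell);
  }
  const header = [groupBy, "_count", `${field}_sum`];
  const body = [...cells.entries()]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([key, { count, sum }]) => [key, count, sum]);
  return [header, ...body];
}

const data = [
  { id: 1, value: 10 },
  { id: 1, value: 100 },
  { id: 2, value: 20 },
  { id: 3, value: 30 },
];
// Prints the same header and rows as the CSV-style array above.
console.log(groupBySum(data, "id", "value"));
```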

Full OLAP cube capability

A groupBy is just a one-dimensional OLAP cube, and the example above uses a bit of syntactic sugar to quickly configure it. However, the underlying engine is a fast, light, flexible, declaratively configured OLAP cube with powerful hierarchical rollup support. It can be used from Node.js projects as well as .NET projects, or from any other platform via DocumentDB's REST API.
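A small sketch of what going beyond one dimension means: every row contributes to each combination of actual dimension values and "all" (null) rollups, so with d dimensions each row updates 2^d cells. The function names and the sales data are hypothetical, and this is not the Lumenize OLAPCube API; it only shows flat grand-total rollups, not the hierarchical rollups Lumenize supports.

```javascript
// Illustrative multi-dimensional cube with "all" rollups (null = any value).
// Hypothetical names and data; not the Lumenize OLAPCube API.
function buildCube(rows, dimensions, field) {
  const cells = new Map(); // JSON-encoded coordinates -> cell
  const combos = 1 << dimensions.length; // 2^d rollup combinations
  for (const row of rows) {
    for (let mask = 0; mask < combos; mask++) {
      // A set bit i means "roll up dimension i" (use null instead of its value).
      const coords = dimensions.map((dim, i) => (mask & (1 << i) ? null : row[dim]));
      const key = JSON.stringify(coords);
      const cell = cells.get(key) ?? { coords, _count: 0, sum: 0 };
      cell._count += 1;
      cell.sum += row[field];
      cells.set(key, cell);
    }
  }
  return cells;
}

function getCell(cube, coords) {
  return cube.get(JSON.stringify(coords));
}

const rows = [
  { region: "east", product: "a", sales: 5 },
  { region: "east", product: "b", sales: 7 },
  { region: "west", product: "a", sales: 11 },
];
const cube = buildCube(rows, ["region", "product"], "sales");
console.log(getCell(cube, ["east", null])); // east, all products
console.log(getCell(cube, [null, null]));   // grand total
```

Reading a rollup is then a single lookup, which is the main payoff of precomputing the cells.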

You can read more about this capability, along with the details of how to use it, on the GitHub page for documentdb-lumenize.

This is still a work in progress, so please give me your feedback.

Announcing documentdb-mock

Mon, 29 Jun 2015 22:37:52 +0200

Microsoft Azure's DocumentDB is a great PaaS NoSQL database. The killer feature is that you can write stored procedures in JavaScript (CoffeeScript in my case), AND the operations performed in a single run of a stored procedure are contained in an ACID-compliant transaction, even if the transaction affects more than one document. This means that either all operations are completed or none are, and that none of the operations of an incomplete transaction are seen by any other database interaction. Also, to quote VoltDB (similar in design to DocumentDB except that stored procedures are written in Java), "Code is smaller than data. Move the code to the data." In DocumentDB, the JavaScript runs in the same memory space as the database, which provides performance advantages. I will publish my performance findings confirming this shortly.

I use the Node.js client on occasion for inspection or testing, but, knowing the advantages of using stored procedures, I have adopted a pattern of doing all database operations for production code inside of stored procedures.

However, the infrastructure around DocumentDB is not yet mature. A particularly painful aspect of this is that it's missing a good way to test and debug your stored procedures. Sure, you can attach execution state to the "body" that is returned from the run to see what's going on, but it would be much easier if you could use your local debugger and write automated tests to confirm the functionality before you pushed your stored procedure to the server.

Luckily, stored procedures are just JavaScript so we can use Node.js to test and debug them. This documentdb-mock package implements a thin mock to enable this testing and debugging.
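Here is a minimal sketch of the idea: a stored procedure written against the server-side getContext() API, plus a tiny hand-rolled stand-in for that context so the procedure can run under plain Node.js. The procedure name createAll and the harness runWithMock are hypothetical, and the harness is not the documentdb-mock API. It stages writes and discards them if the procedure throws, imitating the all-or-nothing transaction behavior. The log array shows the "attach execution state to the body" debugging trick.

```javascript
// --- Stored procedure (ES5 style for the server-side engine) ---
// Creates every document in `docs` or none of them. Throwing from the
// sproc aborts it, and DocumentDB rolls back the whole transaction.
function createAll(docs) {
  var collection = getContext().getCollection();
  var response = getContext().getResponse();
  var log = []; // execution state returned in the body for debugging
  var created = 0;

  function createNext() {
    if (created === docs.length) {
      response.setBody({ created: created, log: log });
      return;
    }
    var doc = docs[created];
    var accepted = collection.createDocument(collection.getSelfLink(), doc, function (err) {
      if (err) throw err; // abort: nothing is committed
      log.push("created " + doc.id);
      created += 1;
      createNext();
    });
    if (!accepted) throw new Error("Request not accepted; aborting");
  }

  createNext();
}

// --- Hypothetical local harness (NOT the documentdb-mock API) ---
// Installs a fake global getContext(), stages writes, and keeps them only
// if the procedure finishes without throwing.
function runWithMock(sproc, args, existingDocs) {
  var staged = existingDocs.slice();
  var body;
  globalThis.getContext = function () {
    return {
      getCollection: function () {
        return {
          getSelfLink: function () { return "dbs/test/colls/mock"; },
          createDocument: function (link, doc, callback) {
            // Unlike the real service, this stand-in calls back synchronously.
            var duplicate = staged.some(function (d) { return d.id === doc.id; });
            if (duplicate) {
              callback(new Error("Conflict: id " + doc.id + " already exists"));
            } else {
              staged.push(doc);
              callback(null, doc);
            }
            return true;
          },
        };
      },
      getResponse: function () {
        return { setBody: function (b) { body = b; } };
      },
    };
  };
  try {
    sproc.apply(null, args);
    return { ok: true, body: body, docs: staged };
  } catch (err) {
    // Roll back: report the documents as they were before the run.
    return { ok: false, error: err.message, docs: existingDocs.slice() };
  }
}

console.log(runWithMock(createAll, [[{ id: "a" }, { id: "b" }]], []).body);
```

Because the stand-in calls back synchronously, a failure anywhere surfaces as a single exception at the harness boundary, which keeps assertions simple; a fuller mock would also need to model the service's asynchronous callbacks and request limits.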

You can find it on npm at https://www.npmjs.com/package/documentdb-mock or on GitHub at https://github.com/lmaccherone/documentdb-mock.

As always I appreciate input and pull requests.