MongoDB Wire Protocol Part 2

Last time, we began discussing the MongoDB Wire Protocol with a look at how BSON ties in and some restrictions imposed by MongoDB. In this post, we’ll continue that discussion with a look at querying.

As mentioned previously, there are 7 different types of messages used by drivers. 4 of these relate to querying: OP_QUERY, OP_REPLY, OP_GETMORE, and OP_KILLCURSORS. Let’s go through the lifecycle of a query.

The first message sent to the server is an OP_QUERY.

struct OP_QUERY {
    MsgHeader header;                 // standard message header
    int32     flags;                  // bit vector of query options.  See below for details.
    cstring   fullCollectionName;     // "dbname.collectionname"
    int32     numberToSkip;           // number of documents to skip
    int32     numberToReturn;         // number of documents to return
                                      //  in the first OP_REPLY batch
    document  query;                  // query object.  See below for details.
  [ document  returnFieldsSelector; ] // Optional. Selector indicating the fields
                                      //  to return. 
}
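
To make the layout concrete, here is a hedged sketch of framing an OP_QUERY (opCode 2004) in C#. It assumes the query document has already been encoded to BSON bytes; BinaryWriter conveniently writes int32s in little-endian, as the protocol requires, and the batch size of 100 is just an illustrative choice.

using System.IO;
using System.Text;

static byte[] BuildOpQuery(int requestId, string fullCollectionName, byte[] queryBson)
{
    using (var ms = new MemoryStream())
    using (var writer = new BinaryWriter(ms))
    {
        writer.Write(0);            // messageLength placeholder, backpatched below
        writer.Write(requestId);    // requestID
        writer.Write(0);            // responseTo (unused in requests)
        writer.Write(2004);         // opCode for OP_QUERY

        writer.Write(0);            // flags
        writer.Write(Encoding.UTF8.GetBytes(fullCollectionName));
        writer.Write((byte)0);      // cstrings are null-terminated
        writer.Write(0);            // numberToSkip
        writer.Write(100);          // numberToReturn (first batch size)
        writer.Write(queryBson);    // the BSON query document

        writer.Flush();
        ms.Position = 0;
        writer.Write((int)ms.Length); // backpatch the total message length
        return ms.ToArray();
    }
}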

An OP_QUERY message will always have a reply of type OP_REPLY.

struct OP_REPLY {
    MsgHeader header;         // standard message header
    int32     responseFlags;  // bit vector - see details below
    int64     cursorID;       // cursor id if client needs to do get more's
    int32     startingFrom;   // where in the cursor this reply is starting
    int32     numberReturned; // number of documents in the reply
    document* documents;      // documents
}

Since OP_REPLY is also subject to the maximum message size restriction, it stands to reason that if a query returns more results than can fit inside the reply message, there needs to be a way to get the next batch. The cursorID field is present so that the driver can go back to the same server, on the same or a different TCP connection, and ask for the next batch. It does this by sending the OP_GETMORE message.

struct OP_GETMORE {
    MsgHeader header;             // standard message header
    int32     ZERO;               // 0 - reserved for future use
    cstring   fullCollectionName; // "dbname.collectionname"
    int32     numberToReturn;     // number of documents to return
    int64     cursorID;           // cursorID from the OP_REPLY
}

The OP_GETMORE will also come back with an OP_REPLY and is handled the same way a response to an OP_QUERY message would be. This continues until either the cursorID equals 0, meaning no more results, or the user decides they don’t want any more results. If the user terminates while the cursorID is non-zero, it is up to the driver to send an OP_KILLCURSORS message to tell the server it can release the resources it is using to maintain the cursor.

struct OP_KILLCURSORS {
    MsgHeader header;            // standard message header
    int32     ZERO;              // 0 - reserved for future use
    int32     numberOfCursorIDs; // number of cursorIDs in message
    int64*    cursorIDs;         // sequence of cursorIDs to close
}

In the .NET driver, this is managed by a custom class implementing the built-in IEnumerator<T> interface. The great thing about IEnumerator<T> is that it gives us all the pieces we need, from knowing when to send an OP_GETMORE to implementing IDisposable, allowing us to send OP_KILLCURSORS if necessary. The C#/VB compilers handle calling Dispose() once a foreach loop has terminated or been abandoned, and the LINQ methods also handle disposal correctly. Therefore, it is simple to expose these semantics to our users in a natural way. A rough sketch of the pattern is shown below.
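
The following is only a sketch of the pattern, not the actual driver class; IWireConnection and its members are hypothetical stand-ins for the driver’s internals. MoveNext drains the current batch and issues OP_GETMORE when it runs dry, while Dispose issues OP_KILLCURSORS if the cursor was abandoned while still open.

using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical connection abstraction for this sketch.
interface IWireConnection
{
    // Sends OP_GETMORE; refills the batch and updates the cursor id.
    void GetMore<T>(string fullCollectionName, ref long cursorId, Queue<T> batch);
    // Sends OP_KILLCURSORS for the given cursor.
    void KillCursors(long cursorId);
}

class CursorEnumerator<T> : IEnumerator<T>
{
    private readonly IWireConnection _connection;
    private readonly string _fullCollectionName;
    private readonly Queue<T> _batch;
    private long _cursorId;

    public CursorEnumerator(IWireConnection connection, string fullCollectionName,
        IEnumerable<T> firstBatch, long cursorId)
    {
        _connection = connection;
        _fullCollectionName = fullCollectionName;
        _batch = new Queue<T>(firstBatch); // documents from the initial OP_REPLY
        _cursorId = cursorId;
    }

    public T Current { get; private set; }
    object IEnumerator.Current { get { return Current; } }

    public bool MoveNext()
    {
        if (_batch.Count == 0)
        {
            if (_cursorId == 0) return false; // server has no more results
            _connection.GetMore(_fullCollectionName, ref _cursorId, _batch);
            if (_batch.Count == 0) return false;
        }
        Current = _batch.Dequeue();
        return true;
    }

    public void Dispose()
    {
        // foreach and LINQ call this even on early exit; if the cursor is
        // still open on the server, tell it to release its resources.
        if (_cursorId != 0)
        {
            _connection.KillCursors(_cursorId);
            _cursorId = 0;
        }
    }

    public void Reset() { throw new NotSupportedException(); }
}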

Wire Protocol Part 3 will continue with a look at how writes are performed with MongoDB servers up to version 2.4.

MongoDB Wire Protocol Part 1

Last time, we discussed BSON and how it relates to drivers. In this post, we’ll continue our driver building discussion with how that BSON gets transferred over the wire using the MongoDB Wire Protocol.

The MongoDB Wire Protocol is a binary protocol communicated over TCP/IP. It doesn’t strictly need to run over TCP/IP, but TCP is the reliable transport protocol that everyone already understands. Almost every language/framework has a way to use TCP and, as such, it was a rather obvious choice. There are a couple of TCP-level optimizations most drivers make, such as packing multiple messages together and disabling Nagle’s algorithm. For the most part though, using TCP is straightforward.
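
The Nagle part, at least, is trivial in .NET. This isn’t the driver’s actual connection code, just the relevant knob: NoDelay turns Nagle’s algorithm off so small messages aren’t held back waiting to be coalesced.

using System.Net.Sockets;

var client = new TcpClient();
client.NoDelay = true; // disable Nagle's algorithm
client.Connect("localhost", 27017);
var stream = client.GetStream(); // messages are written to this stream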

Before we look at the wire protocol specifically, let’s look at a few things we need to know that are related.

The Bson Overlap

As previously discussed, bson plays an important role. But because of how bson works, we are a little constrained in how we can write to a TCP stream. Specifically, bson encodes the size of a document before the document itself. It’s a chicken and egg problem. We need to write the document to the TCP stream, but until we do that, we don’t know the size of the document.

Therefore, each document must be written to a buffer first to get its size and then rewritten to the TCP stream starting with the discovered size. In .NET, this could mean converting a POCO (plain old CLR object) into bytes twice. Obviously, this could be extremely expensive, so we don’t do it this way. Instead, still using a buffer, we write a placeholder for the length, write the document, and then go back and backpatch the length (and since the length is an integer, we know it’s 4 bytes). The only real overhead is that we now have the document in memory twice, once as the POCO itself and once as its rendered byte array. In other words, sending a 10MB document to the server requires 20MB of memory.
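
Here is a minimal sketch of the backpatch technique, assuming a hypothetical WriteDocumentBody helper that serializes the document’s elements: reserve four bytes, serialize, then overwrite the placeholder with the now-known length.

using System.IO;

var buffer = new MemoryStream();
var writer = new BinaryWriter(buffer);

long lengthPosition = buffer.Position;
writer.Write(0);                      // int32 placeholder for the length
WriteDocumentBody(writer, document);  // hypothetical: serialize all the elements
writer.Write((byte)0);                // trailing null that ends a bson document

long endPosition = buffer.Position;
buffer.Position = lengthPosition;
writer.Write((int)(endPosition - lengthPosition)); // backpatch the real length
buffer.Position = endPosition;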

Unfortunately, there isn’t much we can do about this. We can alleviate some of the pressure on the GC by reusing the buffers used to temporarily serialize our documents. The .NET driver has pinned byte buffers set aside so that the GC is mostly unaffected because we aren’t constantly allocating and releasing memory. This is a common pattern called object pooling, only we are doing it with bytes instead of objects.
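
A toy version of the idea looks like the snippet below; the real driver’s buffers are also pinned so the GC won’t move them during socket writes, and its pool is more sophisticated than this.

using System.Collections.Generic;

class BufferPool
{
    private readonly Stack<byte[]> _free = new Stack<byte[]>();
    private readonly int _bufferSize;

    public BufferPool(int bufferSize)
    {
        _bufferSize = bufferSize;
    }

    public byte[] Acquire()
    {
        lock (_free)
        {
            // Reuse a buffer if one is available; allocate only when empty.
            return _free.Count > 0 ? _free.Pop() : new byte[_bufferSize];
        }
    }

    public void Release(byte[] buffer)
    {
        lock (_free)
        {
            _free.Push(buffer); // keep it around for the next message
        }
    }
}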

MongoDB Requirements

MongoDB also imposes some requirements on drivers in the form of maximum sizes. Via the isMaster command, MongoDB communicates both a maximum bson document size (maxBsonObjSize) and a maximum message size (maxMessageSizeBytes). If the server receives a message larger than the maximum message size or a document larger than the maximum document size, it will reject the message.[1]

Technically, a driver doesn’t need to care about this and could let the server handle abnormalities, but that wouldn’t be a great experience for users. How annoying to find out that the 48MB message we just sent over the wire from America to Europe was rejected by the server. It is much better if the driver keeps track of this information and rejects messages or documents before sending them, as sketched below.
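
A hedged sketch of the pre-flight check: the two limits come from the isMaster response, and the method and parameter names here are illustrative rather than the driver’s actual internals.

using System;

static void EnsureFits(int documentSize, int messageSize,
    int maxBsonObjSize, int maxMessageSizeBytes)
{
    if (documentSize > maxBsonObjSize)
    {
        throw new ArgumentException(string.Format(
            "Document is {0} bytes, but the server allows at most {1}.",
            documentSize, maxBsonObjSize));
    }
    if (messageSize > maxMessageSizeBytes)
    {
        throw new ArgumentException(string.Format(
            "Message is {0} bytes, but the server allows at most {1}.",
            messageSize, maxMessageSizeBytes));
    }
}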

Wire Protocol

The wire protocol is composed of messages, 8 in total but only 7 used by the driver (OP_MSG is the odd one out). Each begins with a common header, followed by extra data depending on what needs to be done.

struct MsgHeader {
    int32   messageLength; // total message size, including this
    int32   requestID;     // identifier for this message
    int32   responseTo;    // requestID from the original request
                           //   (used in responses from db)
    int32   opCode;        // request type - see table below
}

The message length, just as noted above about bson, must be known before a message can be written to the wire. So the .NET driver doesn’t just serialize documents to a buffer before writing them to a stream, but the entire message as well.

The only other important thing to note about the header is the request id. The .NET driver simply does an atomic increment of a static variable and uses that. It also remembers the value used to send the query so that it can correlate a possible response with the request. That response will contain the request id in the responseTo field.
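
Roughly what the atomic increment looks like; the class and field names are illustrative, not the driver’s actual internals. When a reply arrives, its responseTo field is compared against the requestID we remembered.

using System.Threading;

static class RequestIdGenerator
{
    private static int __lastRequestId;

    public static int Next()
    {
        // Atomic, so concurrent threads never hand out the same id.
        return Interlocked.Increment(ref __lastRequestId);
    }
}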

Wire Protocol Part 2 will continue looking at the query portion of the MongoDB wire protocol. Stay tuned.

  [1] There is actually some fudge room with the document size. When we come to the write portion of the query protocol, I’ll discuss why this is important given the new Write Commands server 2.6 will be supporting.

Bson (Binary Json) is the document representation for MongoDB.

In the introduction, I discussed the purpose of this series as well as the purpose and responsibility of a MongoDB Driver. We’ll continue the series with a look at one of the foundational aspects of MongoDB - BSON.

MongoDB uses bson as its document representation. It is basically a binary form of json. This makes perfect sense: json is a nice compact way of representing documents, and since MongoDB is a document-oriented database, it seems like a natural fit. However, json has some problems of its own. For one, it’s expensive to store; storing json in a binary format allows us to save a lot of space. Json is also expensive to parse. If we had to parse a document every time we needed to compare its contents to satisfy a query, that could be quite a performance hit. All these problems were solved by creating bson.

But using bson means that all our drivers need to understand bson. When the .NET driver began, there were no shared libraries for .NET that knew how to read or write bson, and so we created our own. It is represented in the namespace MongoDB.Bson and can be used separately from the driver for storage or messaging protocols. All in all, implementing bson is relatively trivial. However, there is one question that isn’t exactly trivial to answer - how to represent bson to our users?

How to Represent Bson

Some languages have types that naturally map to json structures. For instance, Node.js doesn’t really have to do much. Python uses the built-in dict type. .NET could represent a bson document as a Dictionary<string, object>. Unfortunately, this fails in .NET land because of some special behaviors that exist in bson.

First, bson allows the same key to exist multiple times. The json spec states that “names within an object should be unique,” but this doesn’t always hold, even for json documents. Since MongoDB doesn’t prohibit this (although we seriously discourage it), drivers need to be able to handle the situation. Second, bson elements are ordered, and keeping the order helps the database with storage optimization. The snippet below illustrates why a plain dictionary can’t satisfy either requirement.
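
A small demonstration of both failures, using nothing but the BCL:

using System.Collections.Generic;

var dict = new Dictionary<string, object>();
dict.Add("a", 1);
// dict.Add("a", 2); // would throw ArgumentException: duplicate key
// ...and Dictionary makes no guarantee about enumeration order.

// A list of name/value pairs preserves both duplicates and insertion
// order, which is essentially what BsonDocument does with its elements.
var elements = new List<KeyValuePair<string, object>>();
elements.Add(new KeyValuePair<string, object>("a", 1));
elements.Add(new KeyValuePair<string, object>("a", 2)); // perfectly fine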

All of this led us to create a type called a BsonDocument.

public class BsonDocument : BsonValue, 
	IComparable<BsonDocument>, 
	IEnumerable<BsonElement>, 
	IEquatable<BsonDocument>
{
	public BsonValue this[string name] { get; set; }

	public BsonValue this[int index] { get; set; }


	public bool TryGetValue(string name, out BsonValue value)
	{ 
		// ...
	}

	// etc...
}

The semantics of a dictionary are preserved. You can still do value lookups using indexers or TryGetValue. In addition, you can be order conscious and use integer indexes. But the most interesting item in the above code is the BsonValue type.
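
This is roughly how usage looks in practice; exact member names can vary slightly between driver versions.

using System;
using MongoDB.Bson;

var doc = new BsonDocument
{
    { "Name", "Jack" },
    { "Age", 30 }
};

BsonValue byName = doc["Name"];  // dictionary-style lookup by key
BsonValue byIndex = doc[1];      // order-conscious lookup: the second element

BsonValue age;
if (doc.TryGetValue("Age", out age))
{
    Console.WriteLine(age.AsInt32);
}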

Closed Type System

One of the biggest problems we had to overcome was the lack of round-tripability (word?) into native .NET types. For instance, the native DateTime type in .NET has a different resolution and range than the bson date type. If we allowed you to just put an arbitrary DateTime instance into your BsonDocument, there is a chance that data loss could occur, either when going to MongoDB or when coming from MongoDB. In addition, all dates in MongoDB are stored as UTC time. If you put in a localized DateTime, we could certainly convert it to UTC, but we wouldn’t know whether or not to convert it back to a local time on the way out. All of this is no good. So we have a type called BsonDateTime, which matches the resolution of a bson date, is always in UTC, and is user-convertible to a native DateTime type (with possible data loss). It is the user’s decision, and we won’t do it automatically unless the user has instructed us to do so.
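
A quick sketch of that explicit conversion; the exact member names may differ between driver versions, but the shape of the API is the point.

using System;
using MongoDB.Bson;

var local = new DateTime(2013, 6, 1, 9, 30, 0, DateTimeKind.Local);

// The stored value is always UTC, at bson's millisecond resolution.
var bsonDate = new BsonDateTime(local);

// Converting back to a native DateTime is an explicit, user-driven choice:
DateTime asUtc = bsonDate.ToUniversalTime();
DateTime asLocal = bsonDate.ToLocalTime(); // opt back in to local time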

An additional benefit of a closed type system is that we can highly optimize serialization and deserialization for these specific types. We don’t have to worry about how to convert the world, just the types that are part of our closed type system. And for those who need every bit of performance out of their system, using the BsonValue type system is definitely the way to go.

Static Languages

Another thing the .NET bson library needs to handle is static, user-constructed types. Users choose to work in a static language for a couple of reasons: either they like static languages (me) or they are forced to by their company. Either way, they are working in a static language, whereas bson/json/MongoDB has a dynamic schema. How can we reconcile these concepts?

It’s highly improbable that a number of documents stored together will vary wildly. You wouldn’t store a bunch of documents about your pets in the same place as a bunch of documents related to the measurement of your Higgs boson experiments. They would be stored in different places and, as such, each of those places would have what I call an application-defined schema. In other words, documents collected together look mostly the same.

Given this information, the .NET driver team felt it important to have a way to work with plain old CLR objects (POCOs). And we still need them to be fast. For instance, reading bson into a BsonDocument and then into a POCO wasn’t acceptable. Therefore, we enabled the possibility of mapping POCOs and then reading bson directly into those POCOs (and also in reverse, POCOs directly to bson). This decision also enabled a number of other benefits we weren’t fully cognizant of at the time, namely providing a foundation for supporting automatic query generation via LINQ, a feature which .NET developers rely on heavily. For example, the .NET driver will map the below class to the shown json automatically, without the user needing to do anything.

public class Person
{
	public string Name { get; set; }

	public DateTime BirthDate { get; set; }
}
// an instance of Person serializes to:
{
        "Name" : "Jack",
        "BirthDate" : ISODate("2000-01-01T00:00:00Z")
}

As far as the user is concerned, they are simply working with their POCO and never need to care that it is persisted to MongoDB or represented as bson. The LINQ query below is translated into MongoDB natively because we have implemented the IQueryable interface in the driver.

var people = from person in personCollection
             where person.Name.StartsWith("J")
             select person;
// which translates to the MongoDB query:
{ "Name": /^J/ }

However, some users will care and will want to customize this experience. Internally, a number of conventions are used and users can customize everything from what is serialized to how it is serialized either via attributes or in-code configuration. I won’t go into how this works now because our documentation is very complete on this topic. Perhaps a future blog post on conventions and mapping can delve into the internals.

Conclusion

Bson is the first step that needs to happen when building a driver. Without it, a driver can’t talk to MongoDB. At this point in time, there might already be a bson library built for your language or framework, although a simple one isn’t overly difficult to get up and running. For instance, if you are writing a driver that can utilize C libraries, MongoDB makes the tremendous libbson, which has had a number of micro-optimizations to make it extremely fast and reliable.

Next time, we’ll talk about TCP and the MongoDB wire protocol.

Series introduction on building a MongoDB Driver.

Building a driver for MongoDB encompasses a wide variety of topics and deals with quite a few different areas of programming. I’d like to walk through the basics of building a MongoDB driver over a series of posts. We’ll start with bson and go until I run out of things to talk about. In particular, I’ll focus on the aspects and decisions that were made, and are continuing to be made, while composing the .NET driver. I’ll also discuss some of the places where the .NET driver diverges from other drivers. To begin though, I’d like to give an introduction to what a driver is for MongoDB and why it’s important.

What is a Driver?

First, it’s the piece of software that sits between your application and the server. It talks to the server using a specific protocol over TCP/IP. Underneath, most drivers pool connections, handle authentication, perform automatic member discovery and failover, etc… In short, drivers manage all of the network stuff so you can focus on building your applications without needing to worry about the details of what is involved in communicating with MongoDB.

Second, a driver is responsible for making an API that is idiomatic to its target user base. The .NET driver should look like a .NET library (not a python module) and respect the conventions that typical .NET libraries use. There are most certainly concepts, algorithms, and high-level API decisions that are shared across our many drivers, but each one distinctively looks like a native library to its target user base. I believe this is one of the reasons users love MongoDB - working with a driver feels like working with everything else in their framework of choice.

Going Forward

I’ll update the below list as new articles are published. Please ask questions in the comments and I’ll do my best to answer them. The .NET driver team (Robert Stam, Sridhar Nanjundeswaran, and I) is currently refactoring large portions of our code base for our next major release, so many of the questions and decisions that need to be made are relatively fresh in my mind.

Care must be taken when putting a load balancer in front of a bunch of mongos instances.

A common desire when running mongos instances is to place a load balancer between the application and the mongos processes. There are a number of reasons for doing so, many of which are perfectly fine. One example would be acting as a router, if it can be configured with sticky behavior so that a single client is always routed to the same server. Another example would be a load balancer configured to facilitate failover between a couple of mongos instances. These two examples share a common theme: connections from a single client always go to the same mongos.

For the rest of this post, I’m going to explore the problem with a load balancer that is actually doing load balancing, i.e., every new connection from a client gets routed to the mongos with the least load, or is chosen in a round-robin fashion.

The outcome of a load balancer configured this way is that queries may work initially and then stop working even though more data is available. In the .NET driver, this would manifest itself with an exception message “Cursor not found”. Before I go into the reasons for this behavior, some background is required.

Background

First, a mongos acts like a router in front of a partitioned system. A driver, the name for MongoDB’s client, will connect to a mongos and let it handle routing a CRUD operation to the correct partition. This works extremely well. MongoDB has a number of different drivers for different languages and frameworks.

Second, for queries whose results are larger than a single batch, a cursor is created. It is basically a correlating identifier for a number of batches in a driver. A driver issues a query (OP_QUERY) and receives the first batch of results. If the response to the OP_QUERY includes a cursor id, it means there are more results, and the driver can issue an OP_GET_MORE to retrieve the next batch. The only requirement is that the OP_GET_MORE be issued to the same server that handled the initial OP_QUERY.

Reason

So, what is the problem? Well, given that a load balancer has been placed in front of a bunch of different servers, and that MongoDB requires OP_GET_MOREs to be sent to the same server as the initial OP_QUERY, the load balancer would be required to understand the MongoDB wire protocol to guarantee this behavior. The problem is, of course, that load balancers don’t understand the MongoDB wire protocol. So, simply put, load balancing mongos instances simply won’t work, right?

Well, it’s a little trickier than that because each of our many drivers was coded a little differently. They have different authors, different needs, and most started outside of MongoDB, Inc. as open-source projects. Let’s take a few examples:

PHP

The PHP driver will always send OP_GET_MOREs down the same connection it used for the initial OP_QUERY. If the connection dies, then even though the cursor could continue to be used on a different connection, it considers the query dead.

Node.js

The Node driver uses (by default) a fixed pool of 5 connections. It also pins the connection used for the initial OP_QUERY to the cursor, so each successive OP_GET_MORE always uses the same connection. This would mean all OP_GET_MOREs always go to the same server. However, there’s always a gotcha. If the connection used for the initial OP_QUERY happens to die, the node driver will replace it with a different, new connection. There is no guarantee that this connection will be to the same server. While this is a somewhat exceptional circumstance, it means the node driver can’t be guaranteed to work with a load balancer.

Java

Many of the drivers copied our Java driver’s modus operandi, which uses thread affinity. This means that under normal operating conditions, each thread is bound to a connection. As such, each OP_QUERY and OP_GET_MORE will most likely use the same connection and therefore the same mongos behind the load balancer. However, under heavy loads where the number of threads is greater than the maximum size of the connection pool, a thread without a connection may steal one from a different thread. This, of course, has a nice ripple effect where threads are all stealing from each other and there is no guarantee anywhere.

.NET

The .NET driver doesn’t do this thread affinity thing unless requested. Instead, every time a connection is needed, it pulls an available one out of the connection pool. This gives a number of benefits, the greatest being that fewer connections are needed because each is utilized more frequently. The downside, of course, is that it doesn’t work with load balancers. We do provide a way for users to perform a “batch” of operations on the same connection by opting in to the thread affinity mode. This alone solves the problem for .NET users who are forced to work in this type of environment, because we don’t steal connections from other threads. It does limit overall throughput, because connections are exclusively checked out until they are released. So, if your max connection pool size is 100, then only 100 requests can be handled at a given time.
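
Roughly how that opt-in looked in the 1.x API; check your driver version’s documentation for the exact signature, and assume server, database, collection, document, and query are already in scope.

using (server.RequestStart(database))
{
    // Everything in this block uses the same checked-out connection, and
    // therefore reaches the same mongos behind the load balancer.
    collection.Insert(document);
    var found = collection.FindOne(query);
}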

Summary

Long story short, using a load balancer to actually do load balancing between mongos instances is likely going to cause problems. Rather, I recommend running a mongos on each application server. This has a few benefits, the foremost being that there is no network between the application and the mongos. If this is not possible due to security configurations or policies, MongoDB most certainly supports connecting to one or more mongos instances across a network, and most of our drivers support automated failover between them. In fact, some drivers already support load balancing between mongos instances themselves, and the rest will likely support this in the future as well.

So, if you are planning on using a load balancer, take care during configuration. If you have any questions, please post them to mongodb-user.