From day one, we’ve built AMPS to be content aware yet message-type agnostic. Because of that, we’re often asked which message type we think is best. In most situations, the answer depends on the use case. In this article, we drill down into the factors you should consider when selecting a message type, the benefits and drawbacks of each message type, and the functionality trade-offs specific to AMPS.
Message Type Considerations
There are a gazillion message types one could use, each created to offer some distinct benefit over the others. For example, FIX is used within financial services and is simple to parse, serialize, and read when captured on the network. Protobuf is designed to be a generic binary message type whose format is enforced by a schema definition.
When selecting the best message type for your use case, it’s a good idea to consider the following: serialization, parsing, language support, and size.
Serialization and Parsing
Whatever format you select, messages will need to be serialized into that format. If your use case is performance critical, look at serialization performance for the types of messages you’ll be sending. Make sure the programming languages your team uses have support for efficient serialization from data structures into the message format, and for parsing back into data structures.
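A quick way to compare candidates is to time a serialize/parse round trip on a representative message. The sketch below is a minimal harness that uses Python's standard `json` and `struct` modules as stand-ins for a text format and a fixed binary layout; the `ORDER` record and its field layout are hypothetical, and real numbers depend heavily on the library and language you actually use.

```python
import json
import struct
import timeit

# Hypothetical sample record; replace with messages representative of your workload.
ORDER = {"id": 1, "quantity": 100, "price": 123.45, "product_id": 42}

# Fixed binary layout: little-endian int32, int32, float64, int32 with no padding (20 bytes).
LAYOUT = struct.Struct("<iidi")


def json_round_trip(order: dict) -> dict:
    """Serialize to JSON text and parse it back."""
    return json.loads(json.dumps(order))


def struct_round_trip(order: dict) -> dict:
    """Pack into the fixed binary layout and unpack it back into a dict."""
    payload = LAYOUT.pack(order["id"], order["quantity"], order["price"], order["product_id"])
    order_id, quantity, price, product_id = LAYOUT.unpack(payload)
    return {"id": order_id, "quantity": quantity, "price": price, "product_id": product_id}


def time_round_trip(fn, order: dict, number: int = 100_000) -> float:
    """Average seconds per serialize+parse round trip."""
    return timeit.timeit(lambda: fn(order), number=number) / number


if __name__ == "__main__":
    for name, fn in (("json", json_round_trip), ("struct", struct_round_trip)):
        assert fn(ORDER) == ORDER  # a round trip must not change the data
        print(f"{name}: {time_round_trip(fn, ORDER) * 1e6:.2f} us per round trip")
```

Checking that the round trip returns the original data matters as much as the timing: a fast format that silently alters values is not a fair comparison.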
For performance-critical applications, which many AMPS customers run, pay careful attention to garbage collection and wasted cycles in the parsing path. For example, if you receive a 2KB message, is there an efficient way to get a single value out of that message, or does the entire message need to be parsed?
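In a tag=value format such as FIX, a consumer that needs one field can scan for it instead of parsing the whole message. The sketch below contrasts the two approaches under simplifying assumptions: an illustrative FIX message without checksum or body-length validation, and no repeating groups or raw-data fields. `full_parse` and `get_field` are hypothetical helper names.

```python
from typing import Optional

SOH = b"\x01"  # FIX field delimiter


def full_parse(message: bytes) -> dict:
    """Parse every tag=value field into a dict (touches the whole message)."""
    fields = {}
    for field in message.rstrip(SOH).split(SOH):
        tag, _, value = field.partition(b"=")
        fields[int(tag)] = value
    return fields


def get_field(message: bytes, tag: int) -> Optional[bytes]:
    """Return the first value for `tag` by scanning, without building a dict of all fields."""
    prefix = str(tag).encode() + b"="
    if message.startswith(prefix):
        start = len(prefix)
    else:
        # Require a leading SOH so tag 5 does not match inside "55=".
        pos = message.find(SOH + prefix)
        if pos == -1:
            return None
        start = pos + 1 + len(prefix)
    end = message.find(SOH, start)
    return message[start:] if end == -1 else message[start:end]


MSG = b"8=FIX.4.2\x0135=D\x0155=IBM\x0154=1\x0138=100\x01"
print(get_field(MSG, 55))   # b'IBM'
print(full_parse(MSG)[38])  # b'100'
```

The scan stops as soon as it finds the requested tag and allocates only the returned value, while `full_parse` allocates an entry for every field, which is the kind of garbage the paragraph above warns about.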
You’ll want to consider the full path from message construction (serialization) through to the consumption of that message (parsing) when determining which message type is the best fit for your use-case.
For example, a general-purpose format such as Protobuf could be a great message type for a back-end system written in a variety of programming languages. However, if the final target of the message is a JavaScript application running in a browser, then JSON is likely a better choice to optimize for the user experience, UI response time, and even battery life (for mobile apps). JSON was derived from JavaScript’s object literal syntax, and browsers have built-in JSON parsers that are much more efficient than the parsers for any other format.
Message Size
Message types encode values and data types in different ways. Make sure your message type choice has an acceptable amount of “bloat” for the data types you plan to use. For example, a simple “C-struct” or a Protobuf message can encode an array of 1000 double-precision floating-point values in about 8000 bytes (8 bytes per value), while the same array in BSON or JSON could easily be 1.5x to 3x larger.
Encoded size can vary widely between message types, so test the message types you are considering with the data you plan to use.
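As a starting point for that kind of test, the sketch below measures (or, where the Python standard library has no encoder, computes from the published wire format) the encoded size of 1000 doubles. The Protobuf and BSON figures are hand calculations from those formats' specifications rather than output from their libraries, and the sample values are arbitrary; substitute your own data.

```python
import json
import struct

N = 1000
values = [i / 7 for i in range(N)]  # arbitrary sample doubles


def varint_len(n: int) -> int:
    """Number of bytes a non-negative Protobuf varint occupies."""
    length = 1
    while n >= 0x80:
        n >>= 7
        length += 1
    return length


# "C-struct" style: a packed array of little-endian IEEE-754 doubles.
c_struct_size = len(struct.pack(f"<{N}d", *values))

# Protobuf packed repeated double (field 1): 1 tag byte + varint length + 8 bytes per value.
protobuf_size = 1 + varint_len(8 * N) + 8 * N

# BSON array, as a standalone document with keys "0", "1", ...: each element is a
# type byte (0x01) + NUL-terminated key + 8-byte double; the document adds a
# 4-byte length prefix and a trailing NUL byte.
bson_size = 4 + sum(1 + len(str(i)) + 1 + 8 for i in range(N)) + 1

# JSON: each double written as its shortest round-trip decimal text.
json_size = len(json.dumps(values).encode("utf-8"))

for name, size in (("C-struct", c_struct_size), ("Protobuf", protobuf_size),
                   ("BSON", bson_size), ("JSON", json_size)):
    print(f"{name:9} {size:6} bytes  ({size / c_struct_size:.2f}x)")
```

With these sample values BSON comes out at roughly 1.6x the raw array and JSON larger still. Values with fewer significant digits would shrink the JSON figure considerably, which is exactly why testing with your own data matters.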
Message Type Properties
The three most important properties of a message type are whether it is “binary” (as opposed to text encoded as UTF-8, Latin-1, ASCII, etc.), whether it supports a hierarchical structure, and whether it requires formal schema definition documents. Each property has unique benefits and drawbacks that can dramatically impact system performance, developer productivity, and future flexibility.
Binary Message Types
Message types that encode their data in binary form have the distinct advantage of preserving the precision of their data. Message types encoded as text (UTF-8 or ASCII) can lose floating-point precision during serialization and parsing, for example when a serializer writes fewer digits than the value needs. If you need to transmit 64-bit floating-point numbers without any loss of precision, then a binary message type may be your only choice. On the other hand, if your data only needs a few decimal places of precision, this loss may not matter.
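A minimal illustration, using Python's `struct` module for the binary encoding and a six-decimal-place text format as an assumed serializer setting (real text serializers vary, and some write enough digits to round-trip exactly):

```python
import math
import struct

value = math.pi

# Binary: pack into 8 bytes and back; the IEEE-754 bits are preserved exactly.
binary = struct.pack("<d", value)
from_binary = struct.unpack("<d", binary)[0]

# Text with a fixed number of decimal places.
text = f"{value:.6f}"  # "3.141593"
from_text = float(text)

print(from_binary == value)    # True
print(from_text == value)      # False
print(abs(from_text - value))  # the precision lost in the text round trip
```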
Hierarchical Message Types
Some message types work best for “flat” message layouts, while others are designed for hierarchical data. In our experience, hierarchical messages are more expensive (in both time and space) to serialize and parse. Hierarchical messages also typically involve “array” data structures, which add complexity to your parsing and surface area to your application testing. Some applications need a hierarchical structure to accurately represent their data. In other cases, though, a hierarchy isn’t necessary, and using a flat (or flatter) data structure can improve performance.
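One common way to flatten a message is to collapse nested objects into dotted key paths. The sketch below assumes JSON-like data already decoded into Python dicts; the dotted-path convention and the `flatten` helper are illustrative choices, not a prescribed AMPS layout.

```python
def flatten(message: dict, prefix: str = "", sep: str = ".") -> dict:
    """Collapse nested dicts into one level, joining key paths with `sep`."""
    flat = {}
    for key, value in message.items():
        path = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path, sep))
        else:
            flat[path] = value  # lists and scalars are kept as-is
    return flat


nested = {"id": 1, "customer": {"name": "Acme", "address": {"city": "Boston"}}, "quantity": 100}
print(flatten(nested))
# {'id': 1, 'customer.name': 'Acme', 'customer.address.city': 'Boston', 'quantity': 100}
```

Arrays are deliberately left alone here; flattening them usually means choosing a domain-specific layout (for example, one message per element), which is a design decision rather than a mechanical transformation.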
Schema Definitions
When your application receives a stream of bits, it needs to understand the message framing and how to extract data from the message. Message types with schema support require messages to be serialized in a layout that matches the schema, and programs parsing the message later use that same schema to determine the layout. Messages without explicit schema definitions must carry an implicit schema in the message layout itself (for example, field names alongside values); otherwise a parser won’t know which bits correspond to which data.
For example, with schema-based Protobuf we could use the following schema definition for an Order:
```protobuf
syntax = "proto3";

message Order {
  int32 id = 1;
  int32 quantity = 2;
  float price = 3;
  int32 product_id = 4;
}
```
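For an order with the values shown in the JSON example below, the Protobuf encoding is just 11 bytes on the wire (Protobuf encodes integers as variable-length varints, so the exact size depends on the values), and the receiver of the Order message uses the Protobuf schema (the .proto file) to decipher what those bytes mean in terms of the Order properties.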
On the other hand, if we were to send a similar message in a schema-less message type like JSON, the message could look something like the following:
```json
{"id": 1, "quantity": 100, "price": 123.45, "product_id": 42}
```
It’s easy to see that the JSON message is much larger than the Protobuf encoding, because the message schema/layout (every field name) is encoded in each message itself.
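To put numbers on the comparison, the sketch below hand-encodes the example Order using the Protobuf wire format (varint integers, a 4-byte little-endian `float`, and a one-byte key per field) and measures the JSON text. It is a teaching sketch of the encoding rules, not the `protobuf` library's API; `encode_order` and its helpers are hypothetical names, and negative values are not handled.

```python
import json
import struct

VARINT, FIXED32 = 0, 5  # Protobuf wire types used by the Order fields


def encode_varint(value: int) -> bytes:
    """Base-128 varint, low 7 bits first (non-negative values only in this sketch)."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # high bit set: more bytes follow
        value >>= 7
    out.append(value)
    return bytes(out)


def key(field_number: int, wire_type: int) -> bytes:
    """Field key: field number shifted left 3 bits, OR'd with the wire type."""
    return encode_varint((field_number << 3) | wire_type)


def encode_order(order_id: int, quantity: int, price: float, product_id: int) -> bytes:
    """Encode an Order; proto3 does not write fields that hold the default value 0."""
    out = bytearray()
    if order_id:
        out += key(1, VARINT) + encode_varint(order_id)
    if quantity:
        out += key(2, VARINT) + encode_varint(quantity)
    if price:
        out += key(3, FIXED32) + struct.pack("<f", price)
    if product_id:
        out += key(4, VARINT) + encode_varint(product_id)
    return bytes(out)


wire = encode_order(1, 100, 123.45, 42)
text = json.dumps({"id": 1, "quantity": 100, "price": 123.45, "product_id": 42})
print(len(wire), len(text))  # 11 61
```

Larger ids or quantities take more varint bytes (values of 128 or more need at least two), so the Protobuf size varies with the data, but it stays well below the JSON text because field names never go on the wire.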
Schemas are a fantastic way to reduce message size, but they come at the cost of reduced system flexibility and developer agility. For example, adding a “timestamp” attribute to our Order would require rolling out a new schema to all producers and consumers of the message; otherwise a producer may publish an older Order without a “timestamp” while a consumer expects the message to have one. Compare that to JSON, where producers can add the new timestamp attribute at any time, and consumers can see it and use it whenever it is present.
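On the consumer side, that flexibility usually means treating the new attribute as optional. A minimal sketch, assuming the Order fields from the example above plus a hypothetical `timestamp` string; `describe_order` and the sample timestamp value are illustrative:

```python
import json


def describe_order(raw: str) -> str:
    """Summarize an Order, including "timestamp" only when the producer sent it."""
    order = json.loads(raw)
    text = f"order {order['id']}: {order['quantity']} @ {order['price']}"
    timestamp = order.get("timestamp")  # absent in messages from older producers
    if timestamp is not None:
        text += f" (sent at {timestamp})"
    return text


old = '{"id": 1, "quantity": 100, "price": 123.45, "product_id": 42}'
new = '{"id": 2, "quantity": 50, "price": 99.5, "product_id": 42, "timestamp": "2022-03-01T12:00:00Z"}'
print(describe_order(old))  # order 1: 100 @ 123.45
print(describe_order(new))  # order 2: 50 @ 99.5 (sent at 2022-03-01T12:00:00Z)
```

Note that nothing here enforces the contract: a producer that misspells the attribute is silently treated as an older producer, which is the trade-off discussed next.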
Some teams see the schema-based types as too rigid and inflexible, while others see the fluidity of message types like JSON as having a dangerous lack of contract between the producers and consumers of the messages. There’s no single answer that works for all applications, but it’s important to consider the tradeoffs and be aware of what constraints you are choosing.
Message Type | Schema | Binary | Hierarchical |
---|---|---|---|
JSON | ○ | ○ | ⬤ |
FIX | ◍ | ○ | ◍ |
XML | ◍ | ○ | ⬤ |
ProtoBuf | ⬤ | ⬤ | ⬤ |
MessagePack | ○ | ⬤ | ⬤ |
BSON | ○ | ⬤ | ⬤ |
BFlat | ○ | ⬤ | ○ |
C-Structs | ⬤ | ⬤ | ⬤ |
⬤: Full Capability ◍: Partial Capability ○: No Capability
End-to-end Performance
One reason we built AMPS to be message-type agnostic is that, in decades of working on high-performance systems, we’ve found message type conversions to be one of the most common places where latency is introduced unnecessarily.
For example, if you’re storing FIX data in a classic RDBMS, you need to convert the FIX data into SQL INSERT statements that map the FIX fields to columns in a database table. If you later want to read that record as JSON, you need to select the data out and then convert it to JSON.
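Each such hop is a full parse followed by a full re-serialization. The sketch below shows one hop, FIX to JSON, under simplifying assumptions: a plain tag=value message without repeating groups, values left as strings, and a hypothetical `TAG_NAMES` mapping standing in for whatever naming convention your consumers expect.

```python
import json

SOH = "\x01"  # FIX field delimiter

# Hypothetical mapping from FIX tag numbers to JSON field names.
TAG_NAMES = {"35": "msg_type", "55": "symbol", "54": "side", "38": "quantity", "44": "price"}


def fix_to_json(fix_message: str) -> str:
    """Parse a tag=value FIX message, then re-serialize it as JSON (two full passes)."""
    record = {}
    for field in fix_message.strip(SOH).split(SOH):
        tag, _, value = field.partition("=")
        # Unmapped tags keep their number; values stay strings (typing them would add more work).
        record[TAG_NAMES.get(tag, tag)] = value
    return json.dumps(record)


msg = "35=D\x0155=IBM\x0154=1\x0138=100\x0144=123.45\x01"
print(fix_to_json(msg))
# {"msg_type": "D", "symbol": "IBM", "side": "1", "quantity": "100", "price": "123.45"}
```

Every producer-to-store and store-to-consumer format mismatch adds a step like this, which is the latency the next paragraphs are concerned with.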
Even when the producer and consumer use the same message type, an intermediary data store or broker that requires a different data format adds latency to your messaging through the conversions alone.
If your datastore or broker can natively store data in the producer’s or the consumer’s format, one of the two conversions disappears, which can cut conversion latency in half. If the datastore or broker, the producer, and the consumer all natively use the same format, format conversions add no latency at all.
Therefore, an additional consideration for your message type selection should be the intermediary systems that these messages pass through. Make sure you can store, query, and retrieve data in your message type of choice.
AMPS and Message Types
We’ve worked hard to make the message type you use with AMPS a choice based on the functionality and policies you need. This is reflected in our messaging performance: with content filtering at a 10% match rate, every message type performs outstandingly, reaching nearly 1 million messages per second per CPU core (except for that darn XML, which is, unfortunately, both complex and verbose!). The graph below shows the results with relatively small messages, using content filtering with a 10% match rate, running on a relatively fast (as of March 2022) machine.
AMPS is designed to let you choose which message type works best for you. Even better, there’s no need to be restricted to the list of message types provided with AMPS, because we have APIs for extending AMPS functionality and content-awareness to other message types. This means we (or customers) can include new message types without changing the AMPS server itself.
Most functionality is supported on every message type that ships out-of-the-box with AMPS. However, there are key differences with Protobuf when it comes to delta messaging (you need to be using Protobuf version 2, or version 3.15 or later) and real-time aggregation. Real-time aggregation doesn’t work with strict-schema message types, because AMPS can’t always guarantee that the aggregated result is valid for the schema. The exception is a real-time aggregated JOIN whose result message type is one of the other message types that support real-time aggregation.
AMPS Message Type | Content Filtering | Delta Messaging | Real-time Aggregation |
---|---|---|---|
JSON | ⬤ | ⬤ | ⬤ |
FIX | ⬤ | ⬤ | ⬤ |
XML | ⬤ | ⬤ | ⬤ |
ProtoBuf | ⬤ | ⬤ | ○ |
MessagePack | ⬤ | ⬤ | ⬤ |
BSON | ⬤ | ⬤ | ⬤ |
BFlat | ⬤ | ⬤ | ⬤ |
C-Structs | ⬤ | ○ | ○ |
“Unparsed Binary” | ○ | ○ | ○ |
⬤: Full Capability ○: No Capability
Still Struggling?
Hopefully these considerations help with selection of the best message type for your application or solution space. If you’re still struggling with what to choose, then we’d suggest these tips:
- Web/mobile applications are increasing in popularity: go with JSON if you can.
- If you’re always in a high-frequency feedback loop with your own users/customers, select a flexible (non-schema) message type, such as JSON or MessagePack.
- Minimize data hierarchy within your messages (the “flatter” the better): go at most one level beyond the “root” level.
- For serialization and parsing performance, don’t ever use XML or BSON.
- If you have the choice of which fields to include in your messages, include fewer fields for better performance.