In The Hitchhiker’s Guide to the Galaxy, Douglas Adams told us of a computer built to calculate the answer to the ultimate question of life, the universe, and everything. SPOILER ALERT: the answer was 42. Unfortunately, not all our answers come out as a concise short int. Many of the simulation farms or grids that AMPS is deployed in produce vast amounts of data. This data is often an intermediate step in a workflow of calculations. In fact, we get asked about the movement of large data sets so often that we previously devoted an entire blog post to the topic. In this article, we are going to highlight how we can be flexible in our treatment of very large messages through the use of the new AMPS composite message types.
Finding The Big Answers
Many people use AMPS to efficiently route messages to their appropriate destinations based on powerful content filtering. With AMPS, it’s straightforward to navigate through complex data in a wide variety of formats including JSON, BSON, FIX, NVFIX, and XML. AMPS also offers an unparsed binary type (or BLOB) that lets us send any data we want, such as a serialized object, but trades off the ability to filter based on the content of the data. Supporting unparsed binary not only helps avoid the costs of serialization and deserialization, but also allows AMPS to work with the variety of similar formats found in customer deployments. We have also seen such binary formats used to “hide” content from AMPS, for example, large or deeply-nested XML documents that you don’t want parsed. When a message type is declared as binary, AMPS won’t attempt to parse it.
When the messages become very large, people often consider some form of optimization or otherwise transform them into a binary format (e.g. sending a series of one-byte chars as sequential bytes on the wire instead of a JSON array). This can make sense if the data can be optimized effectively and the change alleviates memory and network loads without incurring too much of a latency hit. The challenge it introduces is that we often lose the capability to do content-based filtering or routing. If we already know where the messages are supposed to go, then we can create the appropriate topic (target) and send them. Unfortunately, the use of explicit topics makes for a more brittle system and incurs the costs of topic management as more and more applications and topics are added to the system.
The answer? AMPS composite message types.
With most traditional pub-sub systems, the workaround for handling binary payloads and rigid topic hierarchies was to use custom headers as a vehicle to store metadata which could be parsed by the routing agents. Now, with AMPS composite message types, we enable developers to avoid such a traditional hack and embrace content filtering on regular message parts that are not limited to any particular size or structure. Unlike headers, composite message types can contain arbitrary data, and are fully filterable.
Fortunately, if we want optimized payloads together with critical content-based filtering, we can employ composite message types. Akin to MIME, the payload can remain untouched while content filtering is performed on the accessible parts or metadata.
In many financial services applications, the actual bulk of the payload is doubles or floating point numbers, and the data that is useful for filtering and routing is metadata and a small subset of critical information. Ideally, we would send the metadata in a format AMPS can filter, while maintaining the bulk of the payload in an efficient binary format.
No single message type meets both these needs, but the AMPS composite message type is ideal for this situation. We can store the information to be filtered on in a JSON part in a preprocessing step. We can combine that part with the BLOB payload to form a composite message type. We can treat it as a regular AMPS message type and filter/route it based on the JSON part of the message, all while maintaining the optimized payload until it is needed.
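As a rough illustration of that preprocessing step, the sketch below summarizes a block of doubles as a small JSON document that could serve as the filterable part. The function name and the field names (`scenario`, `count`, `min`, `max`) are hypothetical and chosen only for this example. It assumes the scenario name needs no JSON escaping and that the values are finite.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Summarize a block of doubles as a small JSON document for the
// filterable part of a composite message.
// The field names ("scenario", "count", "min", "max") are hypothetical.
// Assumes `scenario` contains no characters that need JSON escaping and
// that `values` holds finite numbers (NaN and infinity are not valid JSON).
std::string make_metadata_json(const std::string& scenario,
                               const std::vector<double>& values)
{
    std::ostringstream json;
    json << "{\"scenario\":\"" << scenario << "\""
         << ",\"count\":" << values.size();
    if (!values.empty()) {
        // One pass over the data finds both extremes.
        const auto range = std::minmax_element(values.begin(), values.end());
        json << ",\"min\":" << *range.first
             << ",\"max\":" << *range.second;
    }
    json << "}";
    return json.str();
}
```

Only the summary is exposed for filtering; the doubles themselves would travel untouched in the binary part.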
Composite message types can be treated like any other message type and can be used to create a SOW or conflated topics, or even with delta subscriptions. To set up your application to use one, you just need to update your configuration file to declare your composite JSON-binary message type: name the composite type and declare its parts (message types). After that, we just have to bind it to a transport.
<MessageType>
    <Name>composite-json-binary</Name>
    <Module>composite-local</Module>
    <MessageType>json</MessageType>
    <MessageType>binary</MessageType>
</MessageType>
<Transport>
    <Name>composite-json-binary-tcp</Name>
    <Type>tcp</Type>
    <InetAddr>9023</InetAddr>
    <MessageType>composite-json-binary</MessageType>
    <Protocol>amps</Protocol>
</Transport>
Searching the Whole or a Part
AMPS provides two different ways to create an AMPS composite message, depending on how you want AMPS to parse the message.
The composite-local option ensures that an XPath identifier can match any of the parts of the message, and that a filter can match a specific part of the message using the ordinal value of the message part. For example, /0/mymessage=123.4 would test the JSON message’s field mymessage.
The alternative option is composite-global, which combines all of the parts into a single set of XPaths. This lets you find values without having to know which part of the message contains the value.
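To make the difference between the two addressing styles concrete, here is a toy model in plain C++. It is not how AMPS evaluates filters. It only mimics the idea that a local identifier names a part by ordinal, while a global identifier is looked up across all parts. The `Part` alias and both function names are invented for this sketch, and "first match wins" in the global lookup is a simplification.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// A parsed part, flattened to field name -> value. An unparsed binary
// part contributes no fields, so it is modelled as an empty map.
using Part = std::map<std::string, std::string>;

// composite-local style: "/<ordinal>/<field>" addresses one specific part.
// Returns nullptr when the identifier does not resolve.
const std::string* find_local(const std::vector<Part>& parts,
                              const std::string& xpath)
{
    if (xpath.empty() || xpath[0] != '/') return nullptr;
    const std::size_t slash = xpath.find('/', 1);
    if (slash == std::string::npos || slash == 1) return nullptr;

    std::size_t ordinal = 0;
    for (std::size_t i = 1; i < slash; ++i) {
        if (xpath[i] < '0' || xpath[i] > '9') return nullptr;
        ordinal = ordinal * 10 + static_cast<std::size_t>(xpath[i] - '0');
    }
    if (ordinal >= parts.size()) return nullptr;

    const Part& part = parts[ordinal];
    const Part::const_iterator it = part.find(xpath.substr(slash + 1));
    return it == part.end() ? nullptr : &it->second;
}

// composite-global style: "/<field>" is looked up across every part.
// Simplification: the first part that has the field wins.
const std::string* find_global(const std::vector<Part>& parts,
                               const std::string& xpath)
{
    if (xpath.size() < 2 || xpath[0] != '/') return nullptr;
    const std::string field = xpath.substr(1);
    for (const Part& part : parts) {
        const Part::const_iterator it = part.find(field);
        if (it != part.end()) return &it->second;
    }
    return nullptr;
}
```

With a JSON part at ordinal 0 and a binary part at ordinal 1, `/0/mymessage` resolves in the local model and `/mymessage` resolves in the global one, while `/1/mymessage` finds nothing because the binary part is never parsed.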
Build a Message
The AMPS 4.3.1.0 clients include helper classes for building and parsing composite messages. To build a composite message, we just have to create the message and append() each part.
Say we had a large set of doubles that represented the results from a scenario calculation. That would be the binary part of the message. We would then take any essential data from that message and, along with any other enrichment information, we would create a JSON message which would be the other part.
std::ostringstream json_part;
std::vector<double> data;
// …skipping the population of variables…
// Create the payload for the composite message.
AMPS::CompositeMessageBuilder builder;
// Insert the JSON part of the message.
builder.append(json_part.str());
// Copy the array of doubles into the second message part.
builder.append(reinterpret_cast<const char*>(data.data()),
               data.size() * sizeof(double));
// Publish the payload on the topic.
std::string topic("messages");
ampsClient.publish(topic.c_str(), topic.length(),
                   builder.data(), builder.length());
And that is all we have to do to create and publish a composite message.
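For intuition about what a builder of this kind does, the sketch below frames each part with a length prefix so that a reader can split the payload apart again. This is a standalone illustration, not the AMPS client code: the 4-byte big-endian length prefix is an assumption made for the example, and `append_part` and `split_parts` are hypothetical names. In practice the client helper classes handle the real format for you.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Append one part to `buffer`, preceded by its length as a 4-byte
// big-endian integer (an assumed framing, used here for illustration).
void append_part(std::string& buffer, const char* data, std::size_t length)
{
    const std::uint32_t n = static_cast<std::uint32_t>(length);
    buffer.push_back(static_cast<char>((n >> 24) & 0xFF));
    buffer.push_back(static_cast<char>((n >> 16) & 0xFF));
    buffer.push_back(static_cast<char>((n >> 8) & 0xFF));
    buffer.push_back(static_cast<char>(n & 0xFF));
    if (length > 0) {
        buffer.append(data, length);
    }
}

// Split a framed buffer back into its parts. Stops at the first
// truncated frame and returns the parts read so far.
std::vector<std::string> split_parts(const std::string& buffer)
{
    std::vector<std::string> parts;
    std::size_t pos = 0;
    while (buffer.size() - pos >= 4) {
        // Reassemble the big-endian length one byte at a time.
        std::uint32_t n = 0;
        for (int i = 0; i < 4; ++i) {
            n = (n << 8) | static_cast<unsigned char>(buffer[pos + i]);
        }
        pos += 4;
        if (buffer.size() - pos < n) break;  // truncated part
        parts.push_back(buffer.substr(pos, n));
        pos += n;
    }
    return parts;
}
```

Because every part carries its own length, the binary part may contain any bytes, including zeros, without confusing the reader.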
On the subscriber side, we instantiate a CompositeMessageParser to access the distinct parts of the composite message.
AMPS::CompositeMessageParser parser;
We then create our subscription and, upon receipt, we can parse the message and obtain the distinct parts with getPart().
for (auto message : ampsClient.subscribe("messages"))
{
    parser.parse(message);
    std::string json_part(parser.getPart(0));
    AMPS::Field binary = parser.getPart(1);
    ...
}
Of course, we don’t have to be so explicit: we could first use the parser object to obtain a count of how many parts the message contains.
std::cout << "Received message with " << parser.size() << " parts" << std::endl;
To make the binary payload usable, we convert it back into a vector.
std::vector<double> vec;
double *array_start = (double*)binary.data();
double *array_end = array_start + (binary.len() / sizeof(double));
vec.insert(vec.end(), array_start, array_end);
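A byte buffer received off the wire is not guaranteed to be aligned for double, so a more defensive variant copies the bytes instead of casting the pointer. The sketch below is a standalone round trip using only the standard library. The function names are hypothetical, and it assumes publisher and subscriber share the same byte order and floating point representation.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Pack doubles into raw bytes in host byte order.
// Assumes publisher and subscriber agree on endianness and on the
// in-memory representation of double.
std::string doubles_to_bytes(const std::vector<double>& values)
{
    std::string bytes(values.size() * sizeof(double), '\0');
    if (!values.empty()) {
        std::memcpy(&bytes[0], values.data(), bytes.size());
    }
    return bytes;
}

// Unpack raw bytes into doubles. memcpy is used instead of a pointer
// cast, so `data` does not need to be aligned for double.
// Trailing bytes that do not form a whole double are ignored.
std::vector<double> bytes_to_doubles(const char* data, std::size_t length)
{
    std::vector<double> values(length / sizeof(double));
    if (!values.empty()) {
        std::memcpy(values.data(), data, values.size() * sizeof(double));
    }
    return values;
}
```

On the subscriber side, the pointer and length of the binary part would be passed to `bytes_to_doubles` in place of the cast-and-insert approach.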
In this post, we looked at how composite messages provide great flexibility in how we can optimize our messages by having them encapsulate different message types. The primary use cases are payloads that do not need to be parsed by AMPS (e.g. large binaries or custom formats) as well as cases that could benefit from optimized or hidden data. With AMPS, having the metadata available in a supported type such as JSON affords us the added luxury of content filtering on that particular part, without having to parse the binary payload.
In the future, we will analyze more options that could improve transport optimization and breathe new life into systems still using FIX with Base64-encoded payloads. We can also discuss best practices around Protocol Buffers or Avro, and how to leverage modern compression libraries such as Snappy. In our experience, the best choice of tools and strategies is highly dependent on the use case and on characteristics such as system load, tolerance for latency, and burden of maintenance.
Let us know how you think Composite Message types may help you and we can work through the ideas with you – just “Don’t Panic” (Douglas Adams).