Parallelism
Flink provides scalibility of a job through parallelism, and the AMPS Flink connectors utilize parallelism as a simple way to improve performance.
AMPS Sink Parallelism
The AMPS Sink can use parallelism to create multiple clients to publish to AMPS in parallel. The following example uses two clients to publish to AMPS by setting parallelism to 2:
AMPSSink<String> sink = AMPSSink.<String>builder()
.setUri(uri)
.setTopic(topic)
.setSerializationSchema(new SimpleStringSchema())
.build();
DataStream<String> ds = ...;
ds.sinkTo(sink).setParallelism(2);
Order
When parallelism is greater than 1, there is no guarantee that all the messages in a data stream will be published in that exact order to AMPS. For example, if an AMPS Sink with a parallelism of 2 receives the data stream 1 -> 2 -> 3 -> 4, then Flink might assign the first client 1 and 3, and the second client might get 2 and 4. When publishing to AMPS, it is possible that some messages from one client arrive before some from the other client despite those messages appearing later in the data stream.
If the exact order of a data stream must be maintained, use a parallelism of 1. This ensures that a single client publishes the messages in the order that they arrive.