Skip to main content

Parallelism

Kafka Connect provides scalability through connector tasks. The AMPS Kafka source can run multiple tasks when the AMPS subscription is partitioned with task-specific filters.

AMPS Source Parallelism

When tasks.max is greater than 1, the source requires taskFilter. The connector uses the configured taskFilter as a format string and generates a different AMPS content filter for each task.

The following example creates two source tasks and partitions messages by /id:

{
"name": "amps-kafka-source",
"config": {
"connector.class": "com.crankuptheamps.kafka.AMPSKafkaSource",
"tasks.max": "2",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
"header.converter": "org.apache.kafka.connect.storage.StringConverter",
"clientFactoryClass": "com.crankuptheamps.kafka.AMPSBasicClientFunction",
"clientName": "KafkaSource",
"uri": "tcp://localhost:9007/amps/json",
"ampsTopic": "SourceTest",
"taskFilter": "/id MOD %d = %d",
"maxBuffers": "10",
"maxBatch": "1000"
}
}

With this configuration, the connector creates one task using /id MOD 2 = 0 and another task using /id MOD 2 = 1. If filter is also configured, the connector combines the base filter and the task filter with AND.

info

The source also appends the task index to the configured clientName. For example, KafkaSource becomes KafkaSource_0 and KafkaSource_1.

Order

Messages remain ordered within each task's AMPS subscription, but there are no order guarantees across tasks. If exact global ordering is required, use a single source task.