Best practices

Geolocation data streams are central to vehicle tracking and routing applications. Let’s look at the geolocation data in a ride-sharing system. In this example, the messages are small, only around 500 bytes each, and typically include GPS coordinates, timestamps, and vehicle IDs.
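Since every message carries a vehicle ID, one common approach is to use that ID as the Kafka message key. Kafka only guarantees ordering within a partition, and keyed messages are hashed to a partition, so all updates from one vehicle stay in order. The sketch below assumes a hypothetical `LocationUpdate` schema and JSON encoding, and uses CRC32 as a simplified stand-in for the murmur2 hash that Kafka's default Java partitioner actually uses.

```python
import json
import zlib
from dataclasses import dataclass, asdict


@dataclass
class LocationUpdate:
    """One GPS fix from a vehicle (hypothetical schema for illustration)."""
    vehicle_id: str
    latitude: float
    longitude: float
    timestamp_ms: int  # Unix epoch time in milliseconds


def encode(update: LocationUpdate) -> tuple[bytes, bytes]:
    """Return (key, value): the vehicle ID as key, the fix as compact JSON."""
    key = update.vehicle_id.encode("utf-8")
    value = json.dumps(asdict(update), separators=(",", ":")).encode("utf-8")
    return key, value


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition index.

    CRC32 is a stand-in only; Kafka's default partitioner uses murmur2,
    so real partition numbers will differ.
    """
    return zlib.crc32(key) % num_partitions


if __name__ == "__main__":
    first = LocationUpdate("veh-042", 40.7128, -74.0060, 1_700_000_000_000)
    second = LocationUpdate("veh-042", 40.7130, -74.0055, 1_700_000_001_000)
    k1, v1 = encode(first)
    k2, _ = encode(second)
    # Same vehicle, same key, same partition: per-vehicle order is preserved.
    print(choose_partition(k1, 192) == choose_partition(k2, 192))  # True
    print(len(v1) < 500)  # True
```

Real payloads carry more fields than this minimal schema, which is why the article's messages are closer to 500 bytes.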

The message rate of geolocation data streams is high, clocking in at 2,000 messages per second (msg/s). The streaming pattern tends to be cyclical, with peak activity during commuting hours.
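The message size and rate above translate into bandwidth with simple arithmetic. This sketch computes raw payload throughput, the write rate once replicas are counted (a replication factor of 3 is an assumption, not from the article), and a one-day volume if the rate were held constant. It ignores headers, batching, and compression.

```python
SECONDS_PER_DAY = 86_400


def payload_rate_bytes(msg_size_bytes: int, msgs_per_sec: int) -> int:
    """Raw payload bandwidth in bytes/s (ignores headers, batching, compression)."""
    return msg_size_bytes * msgs_per_sec


def replicated_write_rate_bytes(payload_rate: int, replication_factor: int) -> int:
    """Bytes/s written across all brokers once every replica stores a copy."""
    return payload_rate * replication_factor


def daily_payload_bytes(payload_rate: int) -> int:
    """Payload volume for one day if the rate were held constant all day."""
    return payload_rate * SECONDS_PER_DAY


if __name__ == "__main__":
    rate = payload_rate_bytes(500, 2_000)
    print(rate)                                  # 1000000 bytes/s, about 1 MB/s
    print(replicated_write_rate_bytes(rate, 3))  # 3000000 bytes/s with 3 replicas (assumed)
    print(daily_payload_bytes(rate))             # 86400000000 bytes, about 86.4 GB
```

Because the load is cyclical, treat the daily figure as a rough estimate rather than a forecast.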

The Kafka partition calculator inputs below are good starting points for geolocation data streaming projects with small, medium, and large fleets of vehicles generating data.


Calculator inputs

  • Business size preset: Medium. The size of the company in terms of data production (not number of employees, market capitalization, etc.).
  • Number of brokers: the number of brokers in the cluster where the topic will be created (range: 1 to 140).
  • Producer processing time: the average time it takes to produce a message, in ms (range: 0.1 ms to 1,000 ms).
  • Consumer processing time: the average time it takes to consume a message, in ms (range: 0.1 ms to 1,000 ms).
  • Throughput: the number of messages the system should process per second, in msg/s (range: 1 msg/s to 10,000 msg/s).

Results

  • Number of producers: 64
  • Number of consumers: 192
  • Expected lag: 100
  • Number of partitions: 192
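The article does not say how the calculator turns these inputs into its results. As a rough, hypothetical cross-check, the sketch below applies a simple rule of thumb: if each worker handles one message at a time, the number of busy workers equals throughput × processing time; partitions must be at least the consumer count, and are rounded up to a multiple of the broker count. All names here (`SizingInputs`, `estimate`, and so on) are illustrative, and this is not the calculator's formula.

```python
import math
from dataclasses import dataclass


@dataclass
class SizingInputs:
    throughput_msgs_per_sec: float  # target message rate
    producer_ms_per_msg: float      # average time to produce one message
    consumer_ms_per_msg: float      # average time to consume one message
    num_brokers: int                # brokers in the cluster


@dataclass
class SizingEstimate:
    producers: int
    consumers: int
    partitions: int


def workers_needed(throughput: float, ms_per_msg: float) -> int:
    """Workers needed if each one handles messages serially, one at a time.

    throughput * ms_per_msg / 1000 is the seconds of work arriving per second,
    i.e. the number of workers kept busy; round up to whole workers.
    """
    return max(1, math.ceil(throughput * ms_per_msg / 1000.0))


def estimate(inputs: SizingInputs) -> SizingEstimate:
    producers = workers_needed(inputs.throughput_msgs_per_sec, inputs.producer_ms_per_msg)
    consumers = workers_needed(inputs.throughput_msgs_per_sec, inputs.consumer_ms_per_msg)
    # Within one consumer group a partition is read by at most one consumer,
    # so we need at least as many partitions as consumers.
    partitions = consumers
    # Assumption: round up to a multiple of the broker count so partitions spread evenly.
    partitions = math.ceil(partitions / inputs.num_brokers) * inputs.num_brokers
    return SizingEstimate(producers, consumers, partitions)


if __name__ == "__main__":
    # Hypothetical inputs: 2,000 msg/s, 1 ms to produce, 20 ms to consume, 3 brokers.
    print(estimate(SizingInputs(2_000, 1, 20, 3)))
    # SizingEstimate(producers=2, consumers=40, partitions=42)
```

These are illustrative inputs, not the calculator's medium preset, so the output is not expected to match the results listed above.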


Recommended configuration for a medium-sized business

  • Expected lag (L) of X1, an acceptable lag given the rapid update frequency.
  • Recommended number of producers (RP) of X2, enough to sustain the high message rate.
  • Recommended number of consumers (RC) of X3, so that messages are consumed promptly.
  • Number of partitions per topic (NP) of X4, which distributes data efficiently while staying within the prescribed partition limits.
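The consumer and partition counts in this configuration are linked: within a consumer group, Kafka assigns each partition to exactly one consumer, so any consumers beyond the partition count sit idle. The sketch below deals partitions to consumers in turn, similar in spirit to Kafka's round-robin assignment strategy but not its actual code; the function names are hypothetical.

```python
def assign_round_robin(num_partitions: int, num_consumers: int) -> dict[int, list[int]]:
    """Deal partitions to consumers in turn; each partition gets exactly one consumer."""
    assignment: dict[int, list[int]] = {c: [] for c in range(num_consumers)}
    for partition in range(num_partitions):
        assignment[partition % num_consumers].append(partition)
    return assignment


def idle_consumers(assignment: dict[int, list[int]]) -> list[int]:
    """Consumers that received no partition and therefore read nothing."""
    return [c for c, parts in assignment.items() if not parts]


if __name__ == "__main__":
    print(assign_round_robin(6, 4))                  # {0: [0, 4], 1: [1, 5], 2: [2], 3: [3]}
    print(idle_consumers(assign_round_robin(4, 6)))  # [4, 5]
```

This is why a configuration with as many partitions as consumers lets every consumer do useful work.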

Each value in this configuration affects how quickly location updates move through Kafka. Together, they aim to keep the delay between a vehicle reporting its position and drivers and passengers seeing it low, which keeps the service smooth and responsive.
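The link between lag and the latency riders notice can be made concrete. Kafka consumer lag is usually counted in messages: for each partition, the log-end offset minus the consumer group's committed offset. Assuming the calculator's expected lag uses this unit (the article doesn't say), dividing lag by the consumption rate gives a rough delay. The helper names and offsets below are hypothetical.

```python
def partition_lag(log_end_offset: int, committed_offset: int) -> int:
    """Messages written to a partition that the group has not yet committed."""
    return max(0, log_end_offset - committed_offset)


def total_lag(offsets: list[tuple[int, int]]) -> int:
    """Sum lag over (log_end_offset, committed_offset) pairs, one per partition."""
    return sum(partition_lag(end, committed) for end, committed in offsets)


def lag_to_delay_ms(lag_messages: int, consume_rate_msgs_per_sec: float) -> float:
    """Approximate wait for the newest message if the backlog drains at this rate."""
    return lag_messages * 1000.0 / consume_rate_msgs_per_sec


if __name__ == "__main__":
    # Hypothetical offsets for three partitions.
    lag = total_lag([(1_050, 1_000), (2_030, 2_000), (990, 970)])
    print(lag)                          # 100
    print(lag_to_delay_ms(lag, 2_000))  # 50.0
```

For example, a lag of 100 messages drained at 2,000 msg/s means the newest location update is roughly 50 ms behind.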

For a medium-sized company with moderate data flow, the key to efficient message processing is balancing the producer, consumer, and partition counts against the actual message rate.

In conclusion, using Kafka effectively in ride-sharing services depends on understanding the workload (small, frequent location messages with cyclical peaks) and configuring producers, consumers, and partitions to match it.