Geolocation data streams are central to vehicle tracking and routing applications. Let’s look at the geolocation data in a ride-sharing system. In this example, the geolocation messages are small, only around 500 bytes each. They typically include GPS coordinates, timestamps, and vehicle IDs.
The message rate of geolocation data streams is high, clocking in at two thousand messages per second (m/s). The streaming pattern tends to be cyclical, with peak activity during commuting hours.
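To make the sizing concrete, the message size and rate quoted above can be turned into a raw throughput figure. The following is a minimal sketch, not part of any calculator: the function name is hypothetical, and it assumes decimal megabytes (1 MB = 1,000,000 bytes) and ignores protocol overhead, compression, and replication.

```python
# Figures from the text: ~500-byte messages at 2,000 messages per second.
MESSAGE_SIZE_BYTES = 500
MESSAGES_PER_SECOND = 2_000


def throughput_mb_per_s(message_size_bytes: int, messages_per_second: int) -> float:
    """Return raw payload throughput in MB/s (assumes 1 MB = 1,000,000 bytes)."""
    return message_size_bytes * messages_per_second / 1_000_000


payload_rate = throughput_mb_per_s(MESSAGE_SIZE_BYTES, MESSAGES_PER_SECOND)
print(f"{payload_rate:.1f} MB/s")  # prints "1.0 MB/s"
```

At these figures the stream carries about 1 MB/s of payload, so the small message size keeps the byte rate modest even though the message count is high.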
The Kafka partition calculator inputs below are good starting points for geolocation data streaming projects with small, medium, and large fleets of vehicles generating data.
The resulting estimates hold under the following assumptions:
- The brokers are of similar capability.
- The load on the brokers’ machines is similar.
- The messages don't diverge too much in size.
- The messages are evenly distributed across all partitions.
- The number of brokers makes sense in this context.
- Brokers have similar latencies between producers and consumers.
- The throughput per producer is less than 10 MB/s.
- Individual brokers have fewer than 40k partitions.
- The cluster has fewer than 200k partitions in total.
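The numeric limits in the list above are easy to check before trusting an estimate. The sketch below is one possible way to do that; `ClusterPlan` and `violated_assumptions` are hypothetical names introduced for illustration, and the per-broker figure assumes partitions are spread evenly across brokers, which is a simplification of this sketch.

```python
from dataclasses import dataclass

# Thresholds taken from the list above.
MAX_PRODUCER_MB_PER_S = 10
MAX_PARTITIONS_PER_BROKER = 40_000
MAX_PARTITIONS_PER_CLUSTER = 200_000


@dataclass
class ClusterPlan:
    """Hypothetical description of a planned deployment."""
    brokers: int
    total_partitions: int
    producer_mb_per_s: float


def violated_assumptions(plan: ClusterPlan) -> list[str]:
    """Return a description of each numeric limit the plan does not meet."""
    problems = []
    if plan.producer_mb_per_s >= MAX_PRODUCER_MB_PER_S:
        problems.append("producer throughput is not below 10 MB/s")
    # Simplification: assume partitions are spread evenly across brokers.
    per_broker = plan.total_partitions / plan.brokers
    if per_broker >= MAX_PARTITIONS_PER_BROKER:
        problems.append("a broker would hold 40k partitions or more")
    if plan.total_partitions >= MAX_PARTITIONS_PER_CLUSTER:
        problems.append("the cluster would hold 200k partitions or more")
    return problems


# Illustrative numbers only; they are not taken from the article.
plan = ClusterPlan(brokers=3, total_partitions=12, producer_mb_per_s=1.0)
print(violated_assumptions(plan))  # prints "[]"
```

The qualitative assumptions (similar broker capability, similar load, even message distribution) cannot be checked this way and still need to be verified against the actual deployment.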