How to get your Apache Kafka Client code right?
Recently I have got an opportunity to work on a Kafka client implementation for an interesting use case. Till then my assumption was writing a Kafka client was as easy as we see in many examples on the net :) Although it's true for many see cases, we cannot say the same simple client work for all. Depending upon the complexity of the use case that you may deal with, the client's implementation might change and complexity may increase.
Please note that this is not an introductory article on Kafka, also expected you to have some basic understanding of Kafka's client. I found the following article very useful while learning the basics of Kafka client: https://www.confluent.io/blog/kafka-consumer-multi-threaded-messaging/ . This post will refer to various topics from this article as we move forward.
In this post, I am sharing three common scenarios that you may need to deal with while using Kafka client in an message heavy system and possible solutions or patterns followed by the community for addressing these needs.
How to handle errors while processing messages?
A typical message-heavy system that uses Kafka may involve various services consuming the incoming messages. Depending upon the use cases, you may have a generic Kafka client that passes the incoming message to the distributed services for further processing. There is no guarantee that these services are always available or successful in processing messages. The question here is how to handle this scenario. There are many solutions or patterns available to handle the errors. Please take a look at this article, it's well documented: https://www.confluent.io/blog/error-handling-patterns-in-kafka/
What is Kafka rebalancing and how to handle it in the client?
In general, Kafka rebalancing happens when a new consumer is either added into the consumer group or removed or when the partition assigned to the consumer group changes. To learn how it's handled in Kafka client, please take a look at the following section: https://www.confluent.io/blog/kafka-consumer-multi-threaded-messaging/#group-rebalancing
How to rate-limit the incoming messages?
There could be a scenario where a Kafka producer publishes the messages at a higher rate than the consumer can process. We typically refer to this scenario as back pressure on messaging queue. Now the question here is how the back pressures are handled in Kafka client? Technically the answer is simple, Kafka consumers are pull-based so they can request messages only when they are comfortable dealing with a new batch of messages. Kafka client typically implements this using pause() and resume() APIs. Please see this section in the above article: https://www.confluent.io/blog/kafka-consumer-multi-threaded-messaging/#processing-order-guarantees
What next?
Thanks for staying with me till this point. As the next step, let us build a reusable Kafka client that addresses all the above points for use in a message heavy system. Say tuned for the next post!
Here is the next post that talks about real-life Kafka client implementation: https://www.jobinesh.com/2021/10/a-real-life-kafka-client-implementation.html
Appreciate you sharing, great article.Much thanks again. Really Cool.
ReplyDeletehadoop training
hadoop online training