Frequently asked questions

  1. Why does my consumer get InvalidMessageSizeException?

    This typically means that the consumer's fetch size is too small. Each time the consumer pulls data from the broker, it reads up to a configured number of bytes. If that limit is smaller than the largest single message stored in Kafka, the consumer cannot decode the message and throws an InvalidMessageSizeException. To fix this, raise the limit by setting the "fetch.size" property in config/consumer.properties to a value at least as large as your largest message. The default fetch.size is 300,000 bytes.
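
    As a rough illustration of the rule above, the following Java sketch checks a fetch size against the largest message you expect and builds a matching property. The class and method names are hypothetical and not part of Kafka; only the "fetch.size" key and the 300,000-byte default come from the answer above, and largestMessageBytes is assumed to be the size of the message as stored in the log.

```java
import java.util.Properties;

// Hypothetical helper, not part of Kafka: shows how "fetch.size"
// has to relate to the largest single message stored in a topic.
final class FetchSizeCheck {
    // Default fetch.size stated in the FAQ answer above.
    static final int DEFAULT_FETCH_SIZE = 300000;

    // True when a single fetch can hold the largest stored message.
    static boolean isLargeEnough(int fetchSize, int largestMessageBytes) {
        return fetchSize >= largestMessageBytes;
    }

    // Builds consumer properties whose fetch.size is never below the
    // default and never below the largest expected message.
    static Properties consumerProps(int largestMessageBytes) {
        int fetchSize = Math.max(DEFAULT_FETCH_SIZE, largestMessageBytes);
        Properties props = new Properties();
        props.setProperty("fetch.size", Integer.toString(fetchSize));
        return props;
    }
}
```

    For instance, FetchSizeCheck.consumerProps(1000000) yields fetch.size=1000000, while any message size at or below the default leaves fetch.size at 300000.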
  2. On EC2, why can't my high-level consumers connect to the brokers?

    When a broker starts up, it registers its host IP address in ZooKeeper (ZK). The high-level consumer later uses the registered IP to establish the socket connection to the broker. By default, the registered IP is the one returned by InetAddress.getLocalHost.getHostAddress. Typically, this is the real IP of the host. On EC2, however, the returned IP is an internal one that cannot be reached from outside. The solution is to explicitly set the host IP to be registered in ZK by setting the "hostname" property in server.properties.
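
    The fallback described above can be sketched in Java as follows. This is not the broker's actual code: the class and method names are hypothetical, and the sketch only mirrors the stated behaviour, where an explicit "hostname" value wins and the address from InetAddress.getLocalHost() is used otherwise.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not Kafka's implementation: mirrors the
// behaviour described in the FAQ answer above.
final class AdvertisedHost {
    // Returns the explicitly configured hostname when one is given;
    // otherwise falls back to the local host address, which on EC2
    // is an internal address that outside clients cannot reach.
    static String resolve(String configuredHostname) {
        if (configuredHostname != null && !configuredHostname.trim().isEmpty()) {
            return configuredHostname.trim();
        }
        try {
            return InetAddress.getLocalHost().getHostAddress();
        } catch (UnknownHostException e) {
            throw new IllegalStateException("Cannot determine local host address", e);
        }
    }
}
```

    Calling AdvertisedHost.resolve(null) on an EC2 instance is a quick way to see which address would be registered when "hostname" is left unset.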
  3. My consumer seems to have stopped, why?

    First, determine whether the consumer has really stopped or is just slow, using the ConsumerOffsetChecker tool:
    bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --group consumer-group1 --zkconnect zkhost:zkport --topic topic1
    consumer-group1,topic1,0-0 (Group,Topic,BrokerId-PartitionId)
                Owner = consumer-group1-consumer1
      Consumer offset = 70121994703
                      = 70,121,994,703 (65.31G)
             Log size = 70122018287
                      = 70,122,018,287 (65.31G)
         Consumer lag = 23584
                      = 23,584 (0.00G)
    
    If the consumer offset does not move for some time, the consumer has likely stopped. If the consumer offset is moving but the consumer lag (the difference between the end of the log and the consumer offset) keeps increasing, the consumer is slower than the producer. If the consumer is slow, the usual solution is to increase the degree of parallelism in the consumer, which may require increasing the number of partitions of the topic. If the consumer has stopped, a typical cause is that the application code that consumes messages failed (for example, with an uncaught exception) and thereby killed the consumer thread. We recommend wrapping the consumer logic in a try/catch clause that logs every Throwable.
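
    A minimal Java sketch of the try/catch recommendation follows. It does not use the Kafka API: the Iterator stands in for a consumer stream, and the class name, the handler parameter and the failure counter are all hypothetical. Continuing after a failure is a design choice made for this sketch; an application may prefer to log and then shut down cleanly, especially for serious errors such as OutOfMemoryError.

```java
import java.util.Iterator;
import java.util.function.Consumer;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical wrapper, not a Kafka class: the Iterator stands in
// for a stream of messages handed to the consumer thread.
final class GuardedConsumerLoop implements Runnable {
    private static final Logger LOG =
            Logger.getLogger(GuardedConsumerLoop.class.getName());

    private final Iterator<byte[]> messages;
    private final Consumer<byte[]> handler;
    private int failures = 0;

    GuardedConsumerLoop(Iterator<byte[]> messages, Consumer<byte[]> handler) {
        this.messages = messages;
        this.handler = handler;
    }

    @Override
    public void run() {
        while (messages.hasNext()) {
            byte[] message = messages.next();
            try {
                // Application logic; anything it throws is caught below.
                handler.accept(message);
            } catch (Throwable t) {
                // Log every Throwable so a failure in application code
                // is visible instead of silently ending the thread.
                failures++;
                LOG.log(Level.SEVERE, "Failed to process message; continuing", t);
            }
        }
    }

    // Number of messages whose handler threw.
    int failures() {
        return failures;
    }
}
```

    With this wrapper, a handler that throws on one message is logged and counted, and the loop moves on to the next message rather than terminating the thread.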