  1. apache kafka is publish-subscribe messaging rethought as a distributed commit log

  2. use cases

    1. messaging comparable to traditional messaging systems such as ActiveMQ or RabbitMQ

    2. website activity tracking

    3. metrics

    4. log aggregation

    5. stream processing

    6. event sourcing

    7. commit log
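
To make the "commit log" idea concrete, here is a minimal single-file sketch in shell. It is a toy, not how Kafka stores data: the helper names `log_append` and `log_read_from` are hypothetical, and the offset is assumed to be the zero-based line number of a message.

```shell
#!/bin/sh
# Toy append-only log: one message per line, offset = zero-based line number.
# Hypothetical helpers for illustration only; not part of Kafka.

# log_append FILE MESSAGE: append MESSAGE and print the offset it was stored at.
log_append() {
    file=$1
    message=$2
    # Count existing records; a missing file counts as empty.
    if [ -f "$file" ]; then
        offset=$(wc -l < "$file" | tr -d ' ')
    else
        offset=0
    fi
    printf '%s\n' "$message" >> "$file"
    echo "$offset"
}

# log_read_from FILE OFFSET: print every message at or after OFFSET.
log_read_from() {
    file=$1
    offset=$2
    # tail -n +K starts at line K (1-based), so shift the offset by one.
    tail -n "+$((offset + 1))" "$file"
}
```

Records are only ever appended, never rewritten, and each reader decides which offset to start from. Reading from offset 0 is the toy equivalent of the `--from-beginning` flag of the console consumer.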


  1. dependency

         $ brew deps kafka
  2. install

         $ brew install kafka
         ==> Installing kafka dependency: gradle
         ==> Downloading https://downloads.gradle.org/distributions/gradle-2.4-bin.zip
         ######################################################################## 100.0%
         🍺  /usr/local/Cellar/gradle/2.4: 177 files, 48M, built in 2.9 minutes
         ==> Installing kafka
         ==> Downloading https://homebrew.bintray.com/bottles/kafka-
         ######################################################################## 100.0%
         ==> Pouring kafka-
         ==> Caveats
         To start Kafka, ensure that ZooKeeper is running and then execute:
           kafka-server-start.sh /usr/local/etc/kafka/server.properties
         To have launchd start kafka at login:
             ln -sfv /usr/local/opt/kafka/*.plist ~/Library/LaunchAgents
         Then to load kafka now:
             launchctl load ~/Library/LaunchAgents/homebrew.mxcl.kafka.plist
         ==> Summary
         🍺  /usr/local/Cellar/kafka/ 3817 files, 38M

quick start

  1. single node

    1. download the code

       $ tar -xzf kafka_*.tgz
       $ cd kafka_*
    2. start the server

       # single-node zookeeper instance
       $ zookeeper-server-start.sh config/zookeeper.properties
       # start the kafka server
       $ kafka-server-start.sh config/server.properties
    3. create a topic

       # create a topic named `test`
       # with a single partition
       # and only one replica
       $ kafka-topics.sh --create --zookeeper \
       localhost:2181 --replication-factor 1 \
       --partitions 1 --topic test
       # list topic command
       $ kafka-topics.sh --list --zookeeper localhost:2181
    4. send some messages

       # command line client
       # takes input from a file
       # or from standard input
       # and sends it out as messages to the kafka cluster
       $ kafka-console-producer.sh --broker-list localhost:9092 \
       --topic test
       this is a message
       this is another message
    5. start a consumer

       # command line consumer
       # dumps messages to standard output
       $ kafka-console-consumer.sh --zookeeper localhost:2181 \
       --topic test --from-beginning
  2. setting up a multi-broker cluster

    1. make a config file for each broker

       $ cp config/server.properties config/server-1.properties
       $ cp config/server.properties config/server-2.properties
       # edit config files: give each broker its own broker.id, port and log.dir
       $ pico config/server-1.properties
       $ pico config/server-2.properties
    2. just start two new nodes

       $ kafka-server-start.sh config/server-1.properties &
       $ kafka-server-start.sh config/server-2.properties &
    3. create a new topic with a replication factor of three

       $ kafka-topics.sh --create --zookeeper localhost:2181 \
       --replication-factor 3 --partitions 1 --topic my-replicated-topic
    4. find out which broker is doing what

       $ kafka-topics.sh --describe --zookeeper localhost:2181 \
       --topic my-replicated-topic
       $ kafka-topics.sh --describe --zookeeper localhost:2181 \
       --topic test
    5. publish a few messages to our new topic

       $ kafka-console-producer.sh --broker-list localhost:9092 \
       --topic my-replicated-topic
       my test message 1
       my test message 2
    6. consume these messages

       $ kafka-console-consumer.sh --zookeeper localhost:2181 \
       --from-beginning --topic my-replicated-topic
       my test message 1
       my test message 2
    7. test out fault-tolerance

       $ ps | grep server-1.properties
       # use the pid that ps reports; 7564 is only an example
       $ kill -9 7564
    8. leadership has switched to one of the followers

       $ kafka-topics.sh --describe --zookeeper localhost:2181 \
       --topic my-replicated-topic
    9. messages are still available

       $ kafka-console-consumer.sh --zookeeper localhost:2181 \
       --from-beginning --topic my-replicated-topic
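
The "edit config files" step of the multi-broker setup can also be scripted instead of done by hand in an editor. The sketch below is one possible approach, under stated assumptions: `make_broker_config` is a hypothetical helper, the template is assumed to contain `broker.id`, `port` and `log.dir` (or `log.dirs`) lines as in Kafka releases of this era, ports are assumed to count up from 9092, and the log directory naming follows the `kafka-logs1`, `kafka-logs2` layout under `/usr/local/var/lib/` seen in these notes.

```shell
#!/bin/sh
# make_broker_config TEMPLATE ID OUT
# Hypothetical helper: copy TEMPLATE to OUT, overriding the three settings
# that must differ between brokers sharing one machine.
# Assumes the template already has broker.id, port and log.dir(s) lines;
# lines that do not match are copied through unchanged.
make_broker_config() {
    template=$1
    id=$2
    out=$3
    sed \
        -e "s|^broker\.id=.*|broker.id=${id}|" \
        -e "s|^port=.*|port=$((9092 + id))|" \
        -e "s|^\(log\.dirs*\)=.*|\1=/usr/local/var/lib/kafka-logs${id}|" \
        "$template" > "$out"
}

# Example (run from the Kafka directory):
#   make_broker_config config/server.properties 1 config/server-1.properties
#   make_broker_config config/server.properties 2 config/server-2.properties
```

Writing to a new file rather than editing in place keeps the original `server.properties` untouched for the first broker. If the template uses different property names (newer releases configure the port through `listeners`), the patterns need adjusting.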


  1. running a multi-broker apache kafka cluster on a single node

    1. diagram

                           | kafka cluster |                     
                           |               |                     
                           |  +----------+ |                     
                     +------->| broker1  |-------+               
                     |     |  +----------+ |     |               
                     |     |  +---+  +---+ |     |               
                     |     |  | P0|  | P2| |     |               
                     |     |  | R1|  | R1| |     |               
                     |     |  +---+  +---+ |     |               
       +----------+  |     |               |     |   +----------+
       |          |  |     |  +----------+ |     |   |          |
       | producer ---+------->| broker2  |-------+---->consumer |
       |          |  |     |  +----------+ |     |   |          |
       +----------+  |     |  +---+  +---+ |     |   +----------+
                     |     |  | P0|  | P1| |     |               
                     |     |  | R2|  | R2| |     |               
                     |     |  +---+  +---+ |     |               
                     |     |               |     |               
                     |     |  +----------+ |     |               
                     +------->| broker3  |-------+               
                           |  +----------+ |                     
                           |  +---+  +---+ |                     
                           |  | P1|  | P2| |                     
                           |  | R3|  | R3| |                     
                           |  +---+  +---+ |                     
                           |               |                     
                           |  +----------+ |                     
                           |  | zookeeper| |                     
                           |  +----------+ |                     
    2. log.dir /usr/local/var/lib/

       $ tree kafka-logs kafka-logs1 kafka-logs2
       ├── my-replicated-topic-0
       │   ├── 00000000000000000000.index
       │   └── 00000000000000000000.log
       ├── recovery-point-offset-checkpoint
       ├── replication-offset-checkpoint
       └── test-0
           ├── 00000000000000000000.index
           └── 00000000000000000000.log
       ├── my-replicated-topic-0
       │   ├── 00000000000000000000.index
       │   └── 00000000000000000000.log
       ├── recovery-point-offset-checkpoint
       └── replication-offset-checkpoint
       ├── my-replicated-topic-0
       │   ├── 00000000000000000000.index
       │   └── 00000000000000000000.log
       ├── recovery-point-offset-checkpoint
       └── replication-offset-checkpoint
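
The tree listing above shows one directory per partition, named `<topic>-<partition>`, holding segment files. As a rough way to summarise such a directory, the sketch below prints the topic, partition number and number of `.log` segments for each partition directory. `list_partitions` is a hypothetical helper, and it assumes every subdirectory of the log dir is a partition directory.

```shell
#!/bin/sh
# list_partitions LOG_DIR
# Hypothetical helper: print "topic partition segment-count" for every
# partition directory directly under LOG_DIR.
list_partitions() {
    log_dir=$1
    for dir in "$log_dir"/*/; do
        # Skip when the glob matched nothing.
        [ -d "$dir" ] || continue
        name=$(basename "$dir")
        topic=${name%-*}        # everything before the last hyphen
        partition=${name##*-}   # everything after the last hyphen
        segments=$(find "$dir" -maxdepth 1 -name '*.log' | wc -l | tr -d ' ')
        echo "$topic $partition $segments"
    done
}

# Example:
#   list_partitions /usr/local/var/lib/kafka-logs
```

Splitting on the last hyphen matters because topic names such as `my-replicated-topic` contain hyphens themselves. Each segment file is named after the offset of the first message it holds, zero-padded to 20 digits, which is why a fresh partition starts with `00000000000000000000.log`.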

