Inserting Data Using Kafka

Apache Kafka is a distributed streaming platform. It allows you to create producers, which publish streams of records, and consumers, which subscribe to and ingest the streams that producers publish.

You can use the OmniSci Core Database StreamInsert C++ program to consume a topic created by running Kafka shell scripts from the command line. The following steps show how to use a Kafka producer to send data and a Kafka consumer to store that data in the OmniSci Core Database.

You can also use the recommended KafkaImporter utility in place of the StreamInsert utility used in this example.
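
StreamInsert simply reads delimited rows from standard input, so you can sanity-check it without Kafka by piping a sample file straight into it. This is a minimal sketch that reuses the connection options and the stream1 table from the consumer step later in this example; adjust paths and credentials for your installation.

    # Pipe comma-separated rows directly into StreamInsert, bypassing Kafka.
    # Assumes the stream1 table and the connection options used later in this example.
    cat myfile | /home/mapd2/build/bin/StreamInsert --port 9091 \
    -p HyperInteractive --database mapd --table stream1 --user mapd --batch 1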

This example assumes you have already installed and configured Apache Kafka. See the Kafka website.
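
If Kafka is installed but not yet running, you can typically start a single-node setup with the scripts bundled in the Kafka distribution. This is a minimal sketch assuming the default configuration files that ship with Kafka; adjust the paths for your environment.

    # Start ZooKeeper, then the Kafka broker, using the default property files
    # shipped with the Kafka distribution (paths are assumptions).
    bin/zookeeper-server-start.sh config/zookeeper.properties &
    bin/kafka-server-start.sh config/server.properties &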

Creating a Topic

Create a sample topic for your Kafka producer.

  1. Run the kafka-topics.sh script with the following arguments (a quick way to verify the topic is sketched after these steps):

    bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 \
    --partitions 1 --topic matstream
    
  2. Create a file named myfile that consists of comma-separated data. For example:

    michael,1
    andrew,2
    ralph,3
    sandhya,4
    
  3. Use mapdql to create a table to store the stream.

    create table stream1(name text, i1 int);
    
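To confirm the setup before streaming, you can list the topics known to ZooKeeper and describe the new table in mapdql. A minimal verification sketch, assuming the same ZooKeeper address used when creating the topic:

    # List topics; matstream should appear in the output.
    bin/kafka-topics.sh --list --zookeeper localhost:2181

In mapdql, \d stream1 describes the table so you can confirm that the column names and types match your data file.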

Using the Producer

Load your file into the Kafka producer.

  1. Create and start a producer using the following command.

    cat myfile | bin/kafka-console-producer.sh --broker-list localhost:9097 \
    --topic matstream
    
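Because kafka-console-producer.sh reads from standard input, you can also run it without the cat pipe and type records interactively; each line you enter is published as one message. Press Ctrl-D to exit.

    # Run the producer interactively and type one comma-separated row per line.
    bin/kafka-console-producer.sh --broker-list localhost:9097 --topic matstream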

Using the Consumer

Load the data into the OmniSci Core Database using the Kafka console consumer and the StreamInsert program.

  1. Pull the data from Kafka into the StreamInsert program.

    ./bin/kafka-simple-consumer-shell.sh --broker-list localhost:9097 --topic matstream \
    --from-beginning | /home/mapd2/build/bin/StreamInsert --port 9091 \
    -p HyperInteractive --database mapd --table stream1 --user mapd --batch 1
    Field Delimiter: ,
    Line Delimiter: \n
    Null String: \N
    Insert Batch Size: 1
    1 Rows Inserted, 0 rows skipped.
    2 Rows Inserted, 0 rows skipped.
    3 Rows Inserted, 0 rows skipped.
    
  2. Use mapdql to verify that the data arrived.

    mapdql> select * from stream1;
    name|i1
    michael|1
    andrew|2
    ralph|3
    sandhya|4
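
Because the consumer pipeline keeps running, any records you publish later continue to flow into the table. As a quick end-to-end check, you can publish one more row (the values here are only an illustration) and re-run the query; the new row should appear in stream1.

    # Publish one more record; the running consumer pipeline inserts it into stream1.
    echo "fred,5" | bin/kafka-console-producer.sh --broker-list localhost:9097 --topic matstream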