
6/25/2022


Apache Cassandra Lunch #68: DataStax Apache Kafka Connector - Business Platform Team


This resource is based on an article originally published here.

In Apache Cassandra Lunch #68: DataStax Apache Kafka Connector, we introduce the DataStax Apache Kafka Connector and discuss how we can use it to connect Apache Kafka and Cassandra. The live recording of Cassandra Lunch, which includes a more in-depth discussion and a demo, is embedded below in case you were not able to attend live. If you would like to attend Apache Cassandra Lunch live, it is hosted every Wednesday at 12 PM EST. Register here now!

In the recording, we cover basic information about the DataStax Apache Kafka Connector and the basic architecture of how it works, then walk through a simple Katacoda example from DataStax that shows how to use the connector to connect Apache Kafka and Cassandra. We also discuss how we use the DataStax Apache Kafka Connector in our Cassandra.Realtime repo, so be sure to check out the embedded video below!

The DataStax Apache Kafka Connector is open source software that works with the Kafka Connect framework. It synchronizes records from a Kafka topic with table rows in the following supported databases: DataStax Astra cloud databases, DataStax Enterprise (DSE) 4.7 and later, and open source Apache Cassandra® 2.1 and later. The connector is deployed on the Kafka Connect worker nodes and runs within the worker JVM. Workers running one or more instances of the connector pull messages from Kafka topics and write them to database tables on the DataStax platform using the DataStax Enterprise Java driver.
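To make the deployment model a little more concrete, here is a minimal sketch of the settings a Kafka Connect worker might be handed for one connector instance. The setting names (`connector.class`, `contactPoints`, `loadBalancing.localDc`, and the `topic.<topic>.<keyspace>.<table>.mapping` key) and the connector class name are written from our reading of the DataStax documentation and should be checked against the connector version you install. The topic, keyspace, table, and column names are hypothetical.

```python
import json


def build_sink_config(topic, keyspace, table, column_to_field,
                      contact_points="127.0.0.1", local_dc="datacenter1"):
    """Assemble connector settings as a flat dict of strings.

    Setting names are assumptions based on the DataStax docs; verify
    them against the connector release you actually deploy.
    """
    # The mapping is a comma-separated list of "column=field" pairs.
    mapping = ", ".join(
        f"{column}={field}" for column, field in column_to_field.items()
    )
    return {
        # Assumed class name for the open source connector release.
        "connector.class": "com.datastax.oss.kafka.sink.CassandraSinkConnector",
        "tasks.max": "1",
        "topics": topic,
        "contactPoints": contact_points,
        "loadBalancing.localDc": local_dc,
        f"topic.{topic}.{keyspace}.{table}.mapping": mapping,
    }


# Hypothetical topic, keyspace, table, and columns for illustration only.
config = build_sink_config(
    topic="stocks_topic",
    keyspace="demo_ks",
    table="stocks",
    column_to_field={
        "symbol": "key",
        "price": "value.price",
        "ts": "value.ts",
    },
)
print(json.dumps(config, indent=2))
```

The same key/value pairs could equally be written to a properties file for a standalone worker; the dictionary form is used here only because it is easy to inspect.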

Basic Architecture
  • Each instance of the DataStax Apache Kafka Connector creates a single session with the cluster.
    • A single connector instance can process records from multiple Kafka topics and write to several database tables.
  • Data is pulled from the Kafka topic and written to the mapped table using a CQL batch that contains multiple write statements.
  • A map specification binds a Kafka topic field to a table column. 
    • Fields that are omitted from the specification are not included in the write request. 
    • Fields with null values are written to the database as UNSET (see nullToUnset). 
    • To ensure proper ordering, all records are written using the Kafka record timestamp.
  • Use multiple connectors when different global connect settings are required for different scenarios, such as writing to different clusters or datacenters.
  • The DataStax Connector tasks store the offsets in config.offset.topic.
    • In the event of a failure, the DataStax Connector task resumes reading from the last recorded location.
  • Ingest data from Kafka topics with records in the following data structures:
    • Primitive type values, such as integer or string
    • Complex field values in record types:
      • JSON formatted string
      • Kafka Struct
      • Avro
  • Built-in SSL, LDAP/Active Directory, and Kerberos integration
  • More Features: https://docs.datastax.com/en/kafka/doc/kafka/kafkaFeatures.html
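
The mapping rules in the list above can be illustrated with a small toy model. This is not the connector's own code, only a sketch of the described behavior: fields left out of the map specification are not written, and null values become UNSET when `nullToUnset` is enabled. The `UNSET` sentinel, the `bind_columns` function, and the sample record are all hypothetical names introduced for illustration.

```python
# Stand-in for the driver's "unset" marker; a hypothetical sentinel.
UNSET = object()


def bind_columns(record_value, mapping, null_to_unset=True):
    """Return the column -> value pairs a write would carry.

    record_value: dict of field name -> value from one Kafka record.
    mapping: dict of table column -> record field name.
    """
    bound = {}
    for column, field in mapping.items():
        value = record_value.get(field)
        if value is None and null_to_unset:
            # An UNSET column is skipped by the database, so no
            # tombstone is created for it.
            bound[column] = UNSET
        else:
            bound[column] = value
    return bound


# Hypothetical record: "exchange" is present in the record but is not
# part of the mapping, so it never reaches the write request.
record = {"symbol": "ABC", "price": None, "exchange": "XYZ"}
mapping = {"symbol": "symbol", "price": "price"}

bound = bind_columns(record, mapping)
for column, value in bound.items():
    print(column, "UNSET" if value is UNSET else value)
```

With `null_to_unset=False`, the same record would carry an explicit null for `price`, which is the case the UNSET behavior is meant to avoid.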

The demo portion of Apache Cassandra Lunch #68: DataStax Apache Kafka Connector is split into two parts as mentioned above. In the first portion, we cover a DataStax Katacoda Scenario in which we create a Kafka topic, configure and start a Kafka Connect Worker, download and configure the DataStax Kafka Connector, and push data from the topic in Kafka to a Cassandra instance. In the second portion of the demo, we take a look at Cassandra.Realtime and discuss how that walkthrough uses the same basics we covered in the Katacoda scenario. If you want a more in-depth discussion and video demo, be sure to watch the embedded YouTube video below!
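
As a rough sketch of the "configure the connector" step, the snippet below prepares a request that registers a connector with a Kafka Connect worker over its REST API (`POST /connectors`). It assumes a worker running in distributed mode and listening on `http://localhost:8083`. The Katacoda scenario may configure the connector differently, for example through a properties file. The connector name, the config values, and the helper function names are hypothetical, and the setting names should be verified against the DataStax documentation.

```python
import json
import urllib.request


def build_register_request(connect_url, name, config):
    """Build (but do not send) a POST /connectors request."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{connect_url.rstrip('/')}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def register(request):
    """Send the request; needs a reachable Kafka Connect worker."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical connector name and settings for illustration only.
request = build_register_request(
    "http://localhost:8083",
    "cassandra-sink-demo",
    {
        "connector.class": "com.datastax.oss.kafka.sink.CassandraSinkConnector",
        "topics": "demo_topic",
        "contactPoints": "127.0.0.1",
        "loadBalancing.localDc": "datacenter1",
        "topic.demo_topic.demo_ks.demo_table.mapping": "id=key, payload=value",
    },
)
print(request.get_method(), request.full_url)
# Call register(request) only once a worker is actually running.
```

Building the request separately from sending it keeps the payload easy to inspect before anything touches a live cluster.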

Resources

Cassandra.Link

Cassandra.Link is a knowledge base that we created for all things Apache Cassandra. Our goal with Cassandra.Link was to not only fill the gap of Planet Cassandra but to bring the Cassandra community together. Feel free to reach out if you wish to collaborate with us on this project in any capacity.

We are a technology company that specializes in building business platforms. If you have any questions about the tools discussed in this post or about any of our services, feel free to send us an email!

© 2009-2023 The Apache Software Foundation under the terms of the Apache License 2.0. Apache, the Apache feather logo, Apache Cassandra, Cassandra, and the Cassandra logo, are either registered trademarks or trademarks of The Apache Software Foundation. Sponsored by Anant Corporation and Datastax, and Developed by Anant Corporation.
