In the world of NoSQL databases, Cassandra stands out as a powerful and highly scalable option. It is designed to handle large volumes of data across multiple nodes and data centers. In this blog, we will dive into the essential concepts of Cassandra, including clusters, keyspaces, databases, nodes, data centers, and the relationships between them. Additionally, we’ll explore critical concepts like primary keys, clustering keys, partition keys, indexing, and the differences between them. We will also cover basic queries, including create, alter, drop, and select, and conclude with a discussion on the ‘where’ clause and its rules, explained in simple terms.
As a bonus, at the end we will build a Node.js application that performs CRUD operations against a Cassandra database.
Understanding Cassandra’s Structure
Cluster :
Cassandra is built to be distributed and highly available, making it an excellent choice for handling large datasets. A Cassandra cluster comprises multiple nodes, which can be spread across various physical locations. These nodes work together to provide fault tolerance, high availability, and scalability.
Keyspaces :
In Cassandra, data is organized into keyspaces. Think of keyspaces as analogous to databases in the relational database world. Each keyspace can contain multiple tables and is defined with specific configuration settings to control factors like replication and durability.
Databases :
In Cassandra, the terms “keyspace” and “database” are often used interchangeably. When we say “create a database,” we are, in fact, creating a keyspace to store our data.
Nodes :
Nodes are individual instances of Cassandra running on physical or virtual machines. They work together to store and manage data. Some nodes are designated as seed nodes: they are the first contact points a new node uses to discover the rest of the cluster when it joins. Apart from that role, seed and non-seed nodes behave the same way.
Data Centers :
Data centers represent physical locations or geographical regions where nodes are deployed. Cassandra is designed to support multi-data center deployments, which enhance data redundancy, fault tolerance, and disaster recovery.
Data Modeling in Cassandra
Primary Key :
In Cassandra, every table must have a primary key. The primary key is used to uniquely identify rows within a table. It can consist of one or more columns.
Clustering Key:
When the primary key consists of multiple columns, one or more of them are designated as clustering keys. Clustering keys determine the order in which data is stored on disk within a partition.
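To make the ordering idea concrete, here is a minimal in-memory sketch in JavaScript. It only imitates the behaviour and does not talk to Cassandra. The helper name sortWithinPartition and the sample rows are made up for illustration, and it assumes a table whose primary key is ((user_id), order_date), which is not one of the tables defined in this post.

```javascript
// Toy model: rows that share a partition key are kept sorted by the clustering key.
// Assumed (hypothetical) primary key: ((user_id), order_date).
function sortWithinPartition(rows) {
  const partitions = new Map();
  for (const row of rows) {
    if (!partitions.has(row.user_id)) partitions.set(row.user_id, []);
    partitions.get(row.user_id).push(row);
  }
  for (const partitionRows of partitions.values()) {
    // ISO date strings sort chronologically under plain string comparison
    partitionRows.sort((a, b) => a.order_date.localeCompare(b.order_date));
  }
  return partitions;
}

const sampleRows = [
  { user_id: 'u1', order_date: '2023-09-13', product_name: 'Laptop' },
  { user_id: 'u2', order_date: '2023-09-02', product_name: 'Phone' },
  { user_id: 'u1', order_date: '2023-09-01', product_name: 'Mouse' },
];

for (const [userId, rows] of sortWithinPartition(sampleRows)) {
  console.log(userId, rows.map((r) => r.product_name));
}
```

Reading one partition then returns its rows already in clustering order, which is why queries that follow that order tend to be cheap.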
Partition Key :
The partition key is a subset of the primary key and is used to distribute data across multiple nodes. Rows with the same partition key are stored together on the same node.
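The sketch below shows the general idea of hashing a partition key to decide where a row lives. It is a deliberate simplification: MD5 plus a modulo stands in for Cassandra's real partitioner and token ring, and the function name nodeForPartitionKey and the node names are hypothetical.

```javascript
const { createHash } = require('crypto');

// Toy partitioner: hash the partition key to a number, then pick a node.
// Assumption: MD5 and a simple modulo are stand-ins for illustration only.
function nodeForPartitionKey(partitionKey, nodes) {
  const digest = createHash('md5').update(String(partitionKey)).digest();
  const token = digest.readUInt32BE(0); // first 4 bytes as an unsigned integer
  return nodes[token % nodes.length];
}

const nodes = ['node-a', 'node-b', 'node-c']; // hypothetical node names
console.log(nodeForPartitionKey('user-42', nodes));
console.log(nodeForPartitionKey('user-42', nodes)); // same key, same node
```

The point to take away is that the placement depends only on the partition key, so every row with that key ends up in the same place.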
Indexing :
In Cassandra, indexing is used to allow efficient querying of columns other than the primary key. Secondary indexes are available, but they should be used judiciously as they can impact performance and storage.
Cassandra Query
Now, let’s look at some basic Cassandra queries.
Create Query :
To create a keyspace (database) in Cassandra, you can use the following query.
CREATE KEYSPACE IF NOT EXISTS mykeyspace
WITH replication = {'class':'SimpleStrategy', 'replication_factor':3};
This query creates a keyspace named ‘mykeyspace’ with a replication factor of 3.
Let’s create a keyspace called “ecommerce” and a table named “orders”.
-- Creating the keyspace
CREATE KEYSPACE IF NOT EXISTS ecommerce
WITH replication = {'class':'SimpleStrategy', 'replication_factor':3};
-- Switching to the ecommerce keyspace
USE ecommerce;
-- Creating the 'orders' table
CREATE TABLE IF NOT EXISTS orders (
order_id UUID PRIMARY KEY,
user_id UUID,
product_name TEXT,
order_date TIMESTAMP,
total_amount DECIMAL
);
Insert Data :
To insert data into the “orders” table, you can use the INSERT query. Here's an example:
INSERT INTO orders (order_id, user_id, product_name, order_date, total_amount)
VALUES (uuid(), uuid(), 'Laptop', '2023-09-13 10:00:00', 1500.00);
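From application code, the same insert is usually written with bind markers rather than literal values. Here is a rough sketch, assuming `client` is a connected cassandra-driver Client like the one created later in this post. The function name insertOrder and the stand-in fakeClient are hypothetical, and passing total_amount as a string relies on the driver converting it for the prepared statement.

```javascript
const { randomUUID } = require('crypto');

// Sketch: insert one order using bind markers instead of string concatenation.
// `client` is assumed to expose execute(query, params, options) returning a promise,
// as the cassandra-driver Client does when no callback is passed.
async function insertOrder(client, { userId, productName, totalAmount }) {
  const orderId = randomUUID();
  const query =
    'INSERT INTO ecommerce.orders (order_id, user_id, product_name, order_date, total_amount) ' +
    'VALUES (?, ?, ?, ?, ?)';
  // Assumption: the driver converts the string to a DECIMAL for the prepared statement.
  const params = [orderId, userId, productName, new Date(), String(totalAmount)];
  await client.execute(query, params, { prepare: true });
  return orderId;
}

// Stand-in client so the sketch can run without a database.
const fakeClient = {
  calls: [],
  async execute(query, params, options) {
    this.calls.push({ query, params, options });
    return { rows: [] };
  },
};

insertOrder(fakeClient, { userId: randomUUID(), productName: 'Laptop', totalAmount: '1500.00' })
  .then((orderId) => console.log('Inserted order', orderId));
```

Generating the order_id in the application means you know the key of the row you just wrote, which is handy for the select, update, and delete queries that follow.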
Select Data :
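Selecting data from a table in Cassandra involves using the SELECT query. Here’s a basic example that looks up one order by its order_id (replace the sample UUID with a real one):
SELECT * FROM orders WHERE order_id = 123e4567-e89b-12d3-a456-426614174000;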
Update Data :
To update existing records in the “orders” table, you can use the UPDATE query. For instance, to change the product name for a specific order (again using a sample UUID):
UPDATE orders SET product_name = 'Tablet' WHERE order_id = 123e4567-e89b-12d3-a456-426614174000;
Delete Data :
To delete records from the “orders” table, you can use the DELETE query. Here, we delete an order by its order_id (a sample UUID):
DELETE FROM orders WHERE order_id = 123e4567-e89b-12d3-a456-426614174000;
Alter Query :
To alter a keyspace or table, you can use the ALTER query. Here’s an example of altering a keyspace’s replication settings:
ALTER KEYSPACE mykeyspace
WITH replication = {'class':'NetworkTopologyStrategy', 'datacenter1':2};
Drop Query :
To drop a keyspace or table, you can use the DROP query. Be cautious as this operation is irreversible and deletes all data within the keyspace or table.
//drop a keyspace
DROP KEYSPACE IF EXISTS keyspace_name;
//or drop a table
DROP TABLE keyspace_name.table_name;
The Rules for WHERE clause in Cassandra :
The ‘WHERE’ clause in Cassandra is used to filter data based on specific conditions. It’s crucial to understand some rules when using the ‘WHERE’ clause.
- Equality: The ‘WHERE’ clause is optimized for equality conditions on the partition key. It’s efficient for queries like WHERE partition_key = 'value'.
- Secondary Indexes: When querying on non-partition key columns, you may need to use secondary indexes. Keep in mind that using too many secondary indexes can lead to performance issues.
- No Support for Complex Queries: Cassandra is not designed for complex query operations involving joins and subqueries commonly found in relational databases.
- Token Function: Advanced users can use the token function to run range queries over the partition key, for example to page through all partitions.
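The rules above can be sketched as a small checker. This is a simplified model, not Cassandra's actual validation logic: it ignores secondary indexes, IN, and ALLOW FILTERING, and only answers whether a set of restrictions can be served from the primary key alone. The function name canUsePrimaryKey and the table description format are made up for this example.

```javascript
// Simplified check: can this WHERE clause be served from the primary key alone?
// restrictions: array of { column, op }, where op is '=' or a range operator such as '>='.
function canUsePrimaryKey(table, restrictions) {
  const byColumn = new Map(restrictions.map((r) => [r.column, r.op]));

  // Every partition key column needs an equality restriction.
  for (const col of table.partitionKey) {
    if (byColumn.get(col) !== '=') {
      return { ok: false, reason: `partition key column "${col}" needs an equality restriction` };
    }
  }

  // Clustering columns may only be restricted in declared order, with no gaps,
  // and nothing may follow a range restriction.
  let prefixOpen = true;
  for (const col of table.clusteringKey) {
    const op = byColumn.get(col);
    if (op === undefined) {
      prefixOpen = false;
      continue;
    }
    if (!prefixOpen) {
      return { ok: false, reason: `clustering column "${col}" is restricted after a gap or a range` };
    }
    if (op !== '=') prefixOpen = false;
  }

  // Anything else is outside the primary key.
  const keyColumns = new Set([...table.partitionKey, ...table.clusteringKey]);
  for (const col of byColumn.keys()) {
    if (!keyColumns.has(col)) {
      return { ok: false, reason: `"${col}" is not part of the primary key` };
    }
  }
  return { ok: true };
}

const ordersTable = { partitionKey: ['order_id'], clusteringKey: [] };
console.log(canUsePrimaryKey(ordersTable, [{ column: 'order_id', op: '=' }]));
console.log(canUsePrimaryKey(ordersTable, [{ column: 'user_id', op: '=' }]));
```

For the “orders” table, a lookup by order_id passes, while a lookup by user_id alone does not, which is where a secondary index comes in.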
Secondary Index :
Suppose you frequently query orders by user_id. Because user_id is not part of the primary key, you can create a secondary index in Cassandra to support this query.
CREATE INDEX IF NOT EXISTS user_id_index ON orders (user_id);
After creating the index, you can run queries like this one (with a sample UUID):
SELECT * FROM orders WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
Aggregations :
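You can also perform aggregations on data in Cassandra, like calculating the total order amount for a user. The query below relies on the user_id index created above. It has no GROUP BY, because Cassandra only allows GROUP BY on primary key columns and user_id is not one in this table.
SELECT SUM(total_amount) AS total_spent
FROM orders
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;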
Range Queries :
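Cassandra supports range queries using the >= and <= operators. They are efficient on clustering columns. Because order_date is a regular column in our “orders” table, the query below needs ALLOW FILTERING, which makes Cassandra scan the table and should be avoided on large datasets. For instance, to find orders within a date range:
SELECT * FROM orders
WHERE order_date >= '2023-09-01 00:00:00'
AND order_date <= '2023-09-30 23:59:59'
ALLOW FILTERING;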
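Range restrictions are cheapest when the column is a clustering column. As a sketch of what that could look like from application code, the helper below builds a parameterized range query against a hypothetical table named orders_by_user, which is not defined in this post and is described in the comment. The function name buildOrdersInRangeQuery is also made up.

```javascript
// Hypothetical table, shown only to illustrate the idea:
//   CREATE TABLE ecommerce.orders_by_user (
//     user_id UUID, order_date TIMESTAMP, order_id UUID, total_amount DECIMAL,
//     PRIMARY KEY ((user_id), order_date, order_id)
//   );
// Here user_id is the partition key and order_date is the first clustering column.
function buildOrdersInRangeQuery(userId, from, to) {
  const valid = (d) => d instanceof Date && !Number.isNaN(d.getTime());
  if (!valid(from) || !valid(to) || from > to) {
    throw new RangeError('from and to must be valid Date objects with from <= to');
  }
  return {
    query:
      'SELECT * FROM ecommerce.orders_by_user ' +
      'WHERE user_id = ? AND order_date >= ? AND order_date <= ?',
    params: [userId, from, to],
  };
}

const { query, params } = buildOrdersInRangeQuery(
  '123e4567-e89b-12d3-a456-426614174000',
  new Date('2023-09-01T00:00:00Z'),
  new Date('2023-09-30T23:59:59Z')
);
console.log(query, params);
// With a connected cassandra-driver client this could be run as:
// client.execute(query, params, { prepare: true })
```

Because the partition key is fixed by equality and the range applies to the first clustering column, a query shaped like this should not need ALLOW FILTERING.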
Let’s set up a project and create an API for CRUD operations using a Node.js application and a Cassandra database.
Step 1: Setup Project :
Create a new directory for your project and navigate to it in your terminal:
mkdir cassandra-express-demo
cd cassandra-express-demo
Initialize a Node.js project:
npm init -y
Install the required dependencies: express, cassandra-driver, uuid, and cors.
npm install express cassandra-driver uuid cors
Step 2: Create an Express Application :
Create a file named server.js in the project directory and add the following code:
const express = require('express');
const cassandra = require('cassandra-driver');
const { v4: uuidv4 } = require('uuid');
const cors = require('cors');
const app = express();
app.use(cors());
// Cassandra connection setup
const client = new cassandra.Client({
cloud: {
secureConnectBundle: "./demodb.zip",
},
credentials: {
username: "",
password: "",
},
});
client.connect(err => {
if (err) {
console.error('Error connecting to Cassandra:', err);
} else {
console.log('Connected to Cassandra');
}
});
app.use(express.json());
// Add other CRUD operations (read, update, delete) here
const port = process.env.PORT || 3000;
app.listen(port, () => {
console.log(`Server is running on port ${port}`);
});
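The username and password are left empty in the code above. One way to fill them in without committing secrets is to read them from environment variables. The sketch below builds the same options object; the variable names CASSANDRA_USERNAME, CASSANDRA_PASSWORD, and CASSANDRA_BUNDLE_PATH and the helper buildClientOptions are assumptions for this example, not something the driver requires.

```javascript
// Sketch: build the options object for `new cassandra.Client(...)` from
// environment variables. The variable names below are hypothetical.
function buildClientOptions(env = process.env) {
  const username = env.CASSANDRA_USERNAME;
  const password = env.CASSANDRA_PASSWORD;
  if (!username || !password) {
    throw new Error('CASSANDRA_USERNAME and CASSANDRA_PASSWORD must be set');
  }
  return {
    cloud: { secureConnectBundle: env.CASSANDRA_BUNDLE_PATH || './demodb.zip' },
    credentials: { username, password },
  };
}

// Example with explicit values instead of the real environment:
console.log(buildClientOptions({ CASSANDRA_USERNAME: 'demo', CASSANDRA_PASSWORD: 'secret' }));

// In server.js this could replace the inline object:
// const client = new cassandra.Client(buildClientOptions());
```

Failing fast when a credential is missing tends to give a clearer error than a rejected connection later on.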
Step 3: Add CRUD Operations :
Replace the contents of server.js with the full version below, which adds the CRUD routes:
const express = require('express');
const cassandra = require('cassandra-driver');
const { v4: uuidv4 } = require('uuid');
const cors = require('cors');
const app = express();
app.use(cors());
// Cassandra connection setup
const client = new cassandra.Client({
cloud: {
secureConnectBundle: "./demodb.zip",
},
credentials: {
username: "",
password: "",
},
});
client.connect(err => {
if (err) {
console.error('Error connecting to Cassandra:', err);
} else {
console.log('Connected to Cassandra');
}
});
app.use(express.json());
// Create operation
app.post('/create', (req, res) => {
const { name, email } = req.body;
const id = uuidv4();
const query = 'INSERT INTO demokeyspace.user (id, name, email) VALUES (?, ?, ?)';
client.execute(query, [id, name, email], { prepare: true }, (err, result) => {
if (err) {
console.error('Error creating user:', err);
res.status(500).json({ error: 'Error creating user' });
} else {
console.log('User created:', id);
res.status(201).json({ message: 'User created successfully' });
}
});
});
// Read all operation
app.get('/read', (req, res) => {
const query = 'SELECT * FROM demokeyspace.user';
client.execute(query, [], { prepare: true }, (err, result) => {
if (err) {
console.error('Error reading user:', err);
res.status(500).json({ error: 'Error reading user' });
} else {
let user = result.rows;
user = user ? user : [];
res.status(200).json(user);
}
});
});
// Read operation
app.get('/read/:id', (req, res) => {
const id = req.params.id;
const query = 'SELECT * FROM demokeyspace.user WHERE id = ?';
client.execute(query, [id], { prepare: true }, (err, result) => {
if (err) {
console.error('Error reading user:', err);
res.status(500).json({ error: 'Error reading user' });
} else {
const user = result.first();
res.status(200).json(user);
}
});
});
// Update operation
app.put('/update/:id', (req, res) => {
const id = req.params.id;
const { name, email } = req.body;
const query = 'UPDATE demokeyspace.user SET name = ?, email = ? WHERE id = ?';
client.execute(query, [name, email, id], { prepare: true }, (err, result) => {
if (err) {
console.error('Error updating user:', err);
res.status(500).json({ error: 'Error updating user' });
} else {
console.log('User updated:', id);
res.status(200).json({ message: 'User updated successfully' });
}
});
});
// Delete operation
app.delete('/delete/:id', (req, res) => {
const id = req.params.id;
const query = 'DELETE FROM demokeyspace.user WHERE id = ?';
client.execute(query, [id], { prepare: true }, (err, result) => {
if (err) {
console.error('Error deleting user:', err);
res.status(500).json({ error: 'Error deleting user' });
} else {
console.log('User deleted:', id);
res.status(200).json({ message: 'User deleted successfully' });
}
});
});
const port = process.env.PORT || 3000;
app.listen(port, () => {
console.log(`Server is running on port ${port}`);
});
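The routes above use callbacks. The driver's execute method also returns a promise when no callback is passed, so the same lookup can be written with async/await. Here is a minimal sketch for the read-by-id case that also separates “not found” from a real error. The helper name findUserById and the 404 response body are my own choices, not part of the original API.

```javascript
// Sketch: promise-based lookup of a single user.
// `client` is assumed to be a connected cassandra-driver Client.
async function findUserById(client, id) {
  const query = 'SELECT * FROM demokeyspace.user WHERE id = ?';
  const result = await client.execute(query, [id], { prepare: true });
  const user = result.first(); // null when no row matched
  if (!user) {
    return { status: 404, body: { error: 'User not found' } };
  }
  return { status: 200, body: user };
}

// Possible wiring into the Express app from this post:
// app.get('/read/:id', async (req, res) => {
//   try {
//     const { status, body } = await findUserById(client, req.params.id);
//     res.status(status).json(body);
//   } catch (err) {
//     console.error('Error reading user:', err);
//     res.status(500).json({ error: 'Error reading user' });
//   }
// });
```

Keeping the database logic in a plain function like this also makes it easy to test with a stand-in client.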
Step 4: Start the Express Application :
Now that you have added the CRUD operations, you can start your Express application by running the following command in your project directory:
node server.js
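Once the server is running, you can try the create endpoint from another script. The sketch below assumes Node.js 18 or newer (for the built-in fetch) and that the server listens on http://localhost:3000. The helper name createUser and the sample user are hypothetical.

```javascript
// Sketch: call the POST /create endpoint defined in Step 3.
// fetchImpl defaults to the global fetch available in Node.js 18+.
async function createUser(baseUrl, user, fetchImpl = fetch) {
  const response = await fetchImpl(`${baseUrl}/create`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(user),
  });
  if (!response.ok) {
    throw new Error(`Create failed with status ${response.status}`);
  }
  return response.json();
}

// Example call (requires the server to be running):
// createUser('http://localhost:3000', { name: 'Ada', email: 'ada@example.com' })
//   .then((body) => console.log(body))
//   .catch((err) => console.error(err));
```

The other endpoints (/read, /read/:id, /update/:id, /delete/:id) can be exercised the same way by changing the path and HTTP method.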
You can find the source code here. I have also integrated these APIs into an Angular application to perform create, read, update, and delete operations from the frontend.
Thanks for reading ❤️.