Apache Cassandra Blog

Cassandra Query Language (CQL) Tutorial

Apache Cassandra and the Cassandra Query Language (CQL) have evolved over the past couple of years. Key improvements include:

  1. Significant storage engine enhancements
  2. Introduction of SSTable Attached Secondary Indexes (SASI)
  3. Materialized views
  4. Simple role-based authentication

This post is an update to “A Practical Introduction to Cassandra Query Language”. The tutorial will focus on two things:

  1. Cassandra Query Language and its interaction with the new storage engine.
  2. Introducing the various CQL statements through a practical example.

Cassandra Query Language, or CQL, is a data management language akin to SQL. CQL is a simple API over Cassandra’s internal storage structures. Apache Cassandra introduced CQL in version 0.8. Thrift, an RPC-based API, was the preferred data management interface prior to the introduction of CQL. CQL provides a flatter learning curve compared to Thrift and is thus the preferred method of interacting with Apache Cassandra. In fact, Thrift support is slated to be removed in the Apache Cassandra 4.0 release.

CQL has many restrictions compared to SQL. These restrictions prevent inefficient querying across a distributed database. CQL queries should not have to visit numerous nodes to retrieve the required data, as this has the potential to impact cluster-wide performance. Thus CQL prevents the following:

  1. No arbitrary WHERE clause – Apache Cassandra prevents arbitrary predicates in a WHERE clause. WHERE clauses may only reference columns specified in your primary key.
  2. No JOINs – You cannot join data from two Apache Cassandra tables.
  3. No arbitrary GROUP BY – GROUP BY can only be applied to a partition or clustering column. Apache Cassandra 3.10 added GROUP BY support to SELECT statements.
  4. No arbitrary ORDER BY clauses – ORDER BY can only be applied to a clustering column.

Let’s start off by understanding some basic concepts: cluster, keyspaces, tables (aka column families) and primary keys.

  1. Apache Cassandra Cluster – A cluster is a group of computers working together that are seen as a single system. A distributed database is a database system that is spread across a cluster. Apache Cassandra is a distributed database spread across a cluster of nodes. Think of an Apache Cassandra cluster as a database server spread across a number of machines.
  2. Keyspaces – A keyspace is similar to an RDBMS schema/database. A keyspace is a logical grouping of Apache Cassandra tables. Like a database, a keyspace has attributes that define system-wide behaviour. Two key attributes are the replication factor and the replication strategy. The replication factor defines the number of replicas while the replication strategy defines the algorithm used to determine the placement of those replicas.
  3. Tables – An Apache Cassandra table is similar to an RDBMS table. Like in an RDBMS, a table is made up of a number of rows. This is where the similarity ends. Think of an Apache Cassandra table as a map of sorted maps. A table contains rows, each of which is accessible by the partition key. Each row contains column data ordered by the clustering key. Visualise an Apache Cassandra table as a map of sorted maps spread across a cluster of nodes.
  4. Primary Key – A primary key uniquely identifies an Apache Cassandra row. A primary key can be a simple key or a composite key. A composite key is made up of two parts, a partition key and a clustering key. The partition key determines data distribution within the cluster while the clustering key determines sort order within a partition.

The diagram below helps visualise the above concepts.

Apache Cassandra Cluster, Keyspace, Table and Primary Key Overview

Notice how keyspaces and tables are spread across the cluster. In summary, an Apache Cassandra cluster contains keyspaces. Keyspaces contain tables. Tables contain rows which are retrieved via their primary key. The goal is to distribute this data across a cluster of nodes.

To work through this tutorial you need Apache Cassandra installed. You can either use a Docker-based Apache Cassandra installation or use any of the installation methods described in “5 ways to install Apache Cassandra”. I will be using the Docker-based Cassandra installation.


Cqlsh is a Python-based utility that lets you execute CQL. A cqlsh prompt can be obtained by simply executing the cqlsh utility. In the Docker-based installation, navigate to the container’s command prompt. Once at the command prompt, simply type cqlsh. For local installs the cqlsh utility can be found in CASSANDRA_INSTALLATION_DIR/bin. Executing the cqlsh utility will take you to the cqlsh prompt.

Navigating an Apache Cassandra Cluster

The first thing you need to get familiar with is navigating around a cluster. The DESCRIBE command, or DESC for short, is the key to navigating around the cluster. The DESCRIBE command can be used to list cluster objects such as keyspaces, tables, types, functions and aggregates. It can also be used to output the CQL commands needed to recreate a keyspace, table, index, materialized view, schema, function, aggregate or any other CQL object. To get a full list of DESCRIBE command options please type HELP DESC at the cqlsh prompt. The help provided is self-explanatory.

Describe command help

Typically the first thing you want to do at the cqlsh prompt is list keyspaces. The DESCRIBE KEYSPACES command provides a list of all keyspaces. Go ahead and execute this command. Since we have freshly installed Apache Cassandra, all you will see is a list of system keyspaces. System keyspaces help Apache Cassandra keep track of cluster and node related metadata.

Here are a couple of gotchas. There is no command to list all indexes. To list indexes you have to query a system keyspace. To list all indexes you will have to execute the following command:
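In Cassandra 3.x the index metadata lives in the system_schema keyspace, so a query along these lines lists every index in the cluster:

  SELECT * FROM system_schema.indexes;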

Similarly, there is no command to list all materialized views. To get a list of all the views you have to execute the following command:
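Materialized view metadata is also stored in the system_schema keyspace, so a query like the following does the job:

  SELECT * FROM system_schema.views;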

CQL Create Keyspace

Let’s create our first keyspace. The keyspace is called animals. Execute the statement below to create the animals keyspace.
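The original statement was not preserved in this copy of the post; a minimal sketch, assuming SimpleStrategy, would look like this (the post uses a replication factor of 3 on a three node cluster):

  CREATE KEYSPACE animals
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};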

Two key parameters to pay attention to are the class and the replication factor. The class defines the replication strategy, i.e. the algorithm used to determine the placement of the replicas. The replication factor determines the number of replicas. Since I have a three node cluster I have chosen a replication factor of 3.

Execute the DESCRIBE KEYSPACES command once more to confirm that the animals keyspace has been created. Next, let’s connect to the created keyspace with the help of the USE command. The USE command allows switching context between keyspaces. Once a keyspace is selected, all subsequent commands are executed in the context of the chosen keyspace. Please execute the following:
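Using the keyspace name created above, the command is simply:

  USE animals;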

CQL Create Table

Let’s create a table called monkeys in the animals keyspace. Execute the following command to create the monkeys table.
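The original CREATE TABLE statement did not survive in this copy. Based on the columns referenced below (type, family and common_name), a sketch of the table might look like the following; the average_size column is purely an assumption, added so the table has a regular, non-key column:

  CREATE TABLE monkeys (
      type text,
      family text,
      common_name text,
      average_size int,
      PRIMARY KEY ((type, family), common_name)
  );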

Take special note of the primary key. The primary key defined above is a composite key with two parts, a partition key and a clustering key. The first column of the primary key is your partition key. The remaining columns are used to determine the clustering key. A composite partition key, a partition key made up of multiple columns, is defined by using an extra set of parentheses before the clustering columns. The partition key helps distribute data across the cluster while the clustering key determines the order of the data stored within a row. Thus type and family form our composite partition key and common_name is our clustering key. When designing a table, think of the partition key as a tool to spread data evenly across the cluster while the clustering key helps determine the order of that data within a partition. Your data and query patterns will influence your primary key. Please note the clustering key is optional.

CQL INSERT

Let’s insert some data into the above table. When inserting data, the primary key is mandatory.
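An insert along these lines works against the sketched table above; the values are illustrative, not necessarily the ones from the original post:

  INSERT INTO monkeys (type, family, common_name, average_size)
  VALUES ('New World', 'Cebidae', 'Capuchin', 40);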

Execute a select statement to see that the data has been inserted successfully.
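For example:

  SELECT * FROM monkeys;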

You should see the following output:


Note that the columns shown in red are your partition key. The cyan coloured columns are your clustering key. The columns in purple are the rest of your columns.

CQL Consistency Level

Cassandra allows users to configure the number of replicas in a cluster that must acknowledge a read or write operation before the operation is considered successful. The consistency level is a required parameter of every read and write operation and determines the exact number of nodes that must successfully complete the operation for it to be considered successful.

Let’s check the consistency level set in our cqlsh client. To check your consistency level simply execute:
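In cqlsh, running the CONSISTENCY command with no arguments prints the current level (CONSISTENCY followed by a level name sets a new one):

  CONSISTENCY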

You should see:
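With cqlsh’s default settings the level is ONE, so the output should read something like:

  Current consistency level is ONE.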

This implies that only one node needs to insert the data successfully for the statement to be considered successful. The statement is executed on all replicas but only needs to be successfully written to one in order to succeed. On each node, the insert statement is first written to the commit log and then into a memtable, i.e. a write-back cache. The memtable is flushed to disk either periodically or when a certain size threshold is reached. The process of flushing converts a memtable into an SSTable.

Nodetool Flush

Let’s manually flush the above insert to disk so that we can look at the created SSTable. You can force a flush by executing the following at the command prompt.
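nodetool flush accepts an optional keyspace (and table) argument, so flushing everything in the animals keyspace looks like this:

  nodetool flush animals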

The above command flushes all tables in the animals keyspace to disk. Navigate to the animals keyspace data directory to view the results. Check your cassandra.yaml file for your data directory location. In the Docker-based installation the directory is:
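The exact path was not preserved in this copy of the post. The official Cassandra Docker image defaults to /var/lib/cassandra/data, so the flushed files typically end up somewhere like the following, where the table ID suffix will differ on your machine:

  /var/lib/cassandra/data/animals/monkeys-<table_id>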

The flushed SSTable can be found in a file with the suffix -Data.db. My file is called mc-1-big-Data.db. We can output the contents of the file using a command line tool called sstabledump. The sstabledump utility outputs the contents of a -Data.db file as JSON. To get the JSON output simply execute:
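Run from the directory containing the SSTable (or pass the full path to the file):

  sstabledump mc-1-big-Data.db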

My mc-1-big-Data.db has the following contents:

Notice the partition object contains the partition keys used, while the rows array contains the row data for the partition. Each row object contains a clustering array that stores our clustering column related data. The cells array in the row object contains all the remaining column data.

Let’s insert some additional data to see how this affects the underlying storage.
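The original inserts were not preserved; illustrative statements along these lines exercise the same behaviour discussed below, i.e. a second partition with multiple rows plus a repeat of an existing row:

  INSERT INTO monkeys (type, family, common_name, average_size)
  VALUES ('Old World', 'Cercopithecidae', 'Rhesus Macaque', 53);

  INSERT INTO monkeys (type, family, common_name, average_size)
  VALUES ('Old World', 'Cercopithecidae', 'Baboon', 70);

  INSERT INTO monkeys (type, family, common_name, average_size)
  VALUES ('New World', 'Cebidae', 'Capuchin', 40);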

Flush the memtable data to disk by running the nodetool flush command again.

The first thing you should notice is a new file, mc-2-big-Data.db. Notice mc-1-big-Data.db has not been updated, as SSTables are immutable.

Please run the following command to output JSON for mc-2-big-Data.db.
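As before, pointing sstabledump at the new file:

  sstabledump mc-2-big-Data.db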

You should get the following output.

Notice we now have data in two partitions. Observe that within the second partition the row data is sorted by the clustering key. Also notice that the second insert did not error out even though that data already existed. This is because inserts in Cassandra are actually upserts. An upsert inserts the row if it does not exist, otherwise it updates the existing data.

CQL DELETE

Let’s delete some data. Please execute the following delete command:
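The original statement is missing here; a row delete against the sketched table would look like this (the values are illustrative):

  DELETE FROM monkeys
  WHERE type = 'Old World' AND family = 'Cercopithecidae' AND common_name = 'Baboon';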

Again run the nodetool flush command to write the memtable to disk.

In your data directory, you should see a new file, mc-3-big-Data.db.

Please run the following command to convert mc-3-big-Data.db into JSON.
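As before:

  sstabledump mc-3-big-Data.db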

You should see the following output:

The first thing to notice is that the deletion does not result in any data actually being physically deleted. This is because SSTables are immutable and are never modified. In fact, the delete statement caused the insertion of a special value called a tombstone. A tombstone records deletion related information. There are various kinds of tombstones. The delete statement above caused the insertion of a row tombstone.

CQL UPDATE

Like inserts, updates in Cassandra are upserts. The following is an example of an update statement.
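The original example is missing; against the sketched table an update might look like this. The primary key values are deliberately ones that do not exist yet, to demonstrate the upsert behaviour described below:

  UPDATE monkeys SET average_size = 35
  WHERE type = 'New World' AND family = 'Atelidae' AND common_name = 'Howler Monkey';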

Even though the primary key specified in the update statement does not exist, data will still be inserted. That is due to the upsert semantics of a CQL update statement. In the case where the primary key already exists, the appropriate values will be updated. Feel free to flush the animals keyspace and view the data file created.

CQL Time To Live aka TTL

A compelling feature in Cassandra is the ability to expire data. This feature is especially useful when dealing with time series data. The TTL feature lets you expire columns after a set number of seconds. Below is an insert example with a TTL.
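A sketch using the same table, with the USING TTL clause (86400 seconds, i.e. one day, is an arbitrary choice):

  INSERT INTO monkeys (type, family, common_name, average_size)
  VALUES ('New World', 'Atelidae', 'Spider Monkey', 50)
  USING TTL 86400;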

Using the TTL function you can query the number of seconds left before a column expires. Please note a TTL is only assigned to non-primary-key columns.
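For example, querying the remaining TTL of the non-key column inserted above:

  SELECT TTL(average_size) FROM monkeys
  WHERE type = 'New World' AND family = 'Atelidae' AND common_name = 'Spider Monkey';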

Use the nodetool flush command to write the data to disk and dump the new SSTable. You should see something like the following JSON.

Take a look at the liveness_info object. It has additional ttl and expires_at elements, as opposed to the previous data file outputs in this tutorial. The expires_at element specifies the exact UTC time when the columns will expire. The ttl element specifies the TTL value passed in on insert.

When a TTL is present on all columns of a row, then on expiration of the TTL the whole row is omitted from select statement results. When the TTL is only present on specific columns, then only those particular columns’ data is omitted from the results.

CQL supports an array of data types which includes character, numeric, collection, and user-defined types. The following table outlines the supported data types.

Category | Data Type | Description | Example
Numeric data types | tinyint | 8-bit signed integer. Values can range from −128 to 127. | 3
Numeric data types | smallint | 16-bit signed integer. Values can range from −32,768 to 32,767. | 20000
Numeric data types | int | 32-bit signed integer. Values can range from −2,147,483,648 to 2,147,483,647. | 3234444
Numeric data types | bigint | 64-bit signed long. Values can range from −9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. | 89734998789778979
Numeric data types | varint | Arbitrary-precision integer. | 4
Numeric data types | float | 32-bit IEEE-754 floating point. | 10.3
Numeric data types | double | 64-bit IEEE-754 floating point. | 8937489374893.2
Numeric data types | decimal | Variable-precision decimal. |
Textual data types | varchar | UTF-8 encoded string. | Example Text – नमस्ते
Textual data types | text | UTF-8 encoded string. | Example Text – नमस्ते
Textual data types | ascii | ASCII character string. | Example Ascii
Date/time data types | date | A simple date. Does not have a time component. | 2017-02-03
Date/time data types | duration | Defines a time interval. A duration can be specified in three different formats, listed after this table. | 1y5mo89h4m48s
Date/time data types | time | Time only. There is no corresponding date value. | 06:10:24.123456789
Date/time data types | timestamp | Date and time with millisecond precision. | 2017-02-03T04:05:00.000+1300
Identifier data types | timeuuid | A time-based universally unique identifier. A good way of generating conflict-free timestamps. | c038f560-3e7a-11e7-b1eb-bb490fc4450d
Identifier data types | uuid | Universally unique identifier. | 0771a973-4e23-462d-be70-57b97b1d2d39
Boolean data types | boolean | true/false | true
Binary data types | blob | Arbitrary bytes. | 5465737420426c6f6220537472696e67
Distributed counter | counter | 64-bit signed integer that can only be incremented or decremented. |
Collections | list | Ordered collection of non-unique values. | [30, 2, 65]
Collections | set | Sorted collection of unique values. | {'monkey', 'gorilla'}
Collections | map | Key-value pairs. | {1: 'monkey', 2: 'gorilla'}

A duration can be specified in three different formats:

  1. Number and unit format: [n]y[n]mo[n]w[n]d[n]h[n]m[n]s[n]ms[n]us[n]ns
    • y: years
    • mo: months
    • w: weeks
    • d: days
    • h: hours
    • m: minutes
    • s: seconds
    • ms: milliseconds
    • us or µs: microseconds
    • ns: nanoseconds
  2. ISO 8601 format: P[n]Y[n]M[n]DT[n]H[n]M[n]S or P[n]W (e.g. P2Y3M11DT6H)
    • P indicates the period (required)
    • nY is the number of years
    • nM is the number of months
    • nD is the number of days
    • T marks the start of the time part (required when specifying hours, minutes, or seconds)
    • nH is the number of hours
    • nM is the number of minutes
    • nS is the number of seconds
  3. ISO 8601 alternative format: P[YYYY]-[MM]-[DD]T[hh]:[mm]:[ss]

Let’s create a table which has all of the above data types and insert some data into the table.
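The original CREATE TABLE and INSERT statements were only partially preserved in this copy of the post, so below is a reconstructed sketch. The table and column names are assumptions (the original used numbered columns such as the col20 referenced below), but the values mirror the examples in the table above. The duration and counter types are left out: counter columns cannot be mixed with other data types, and duration is omitted for simplicity.

  CREATE TABLE datatypes_example (
      id int PRIMARY KEY,
      tinyint_col tinyint,
      smallint_col smallint,
      int_col int,
      bigint_col bigint,
      varint_col varint,
      float_col float,
      double_col double,
      decimal_col decimal,
      varchar_col varchar,
      text_col text,
      ascii_col ascii,
      date_col date,
      time_col time,
      timestamp_col timestamp,
      timeuuid_col timeuuid,
      uuid_col uuid,
      boolean_col boolean,
      blob_col blob,
      list_col list<int>,
      set_col set<text>,
      map_col map<int, text>
  );

  INSERT INTO datatypes_example (id, tinyint_col, smallint_col, int_col, bigint_col,
      varint_col, float_col, double_col, decimal_col, varchar_col, text_col, ascii_col,
      date_col, time_col, timestamp_col, timeuuid_col, uuid_col, boolean_col, blob_col,
      list_col, set_col, map_col)
  VALUES (1, 3, 20000, 3234444, 89734998789778979,
      4, 10.3, 8937489374893.2, 10787878.87, 'Example Text – नमस्ते', 'Example Text – नमस्ते', 'Example Ascii',
      '2017-02-03', '06:10:24.123456789', '2017-02-03T04:05:00.000+1300', now(), uuid(), true, textAsBlob('Test Blob String'),
      [30, 2, 65], {'monkey', 'gorilla'}, {1 : 'monkey', 2 : 'gorilla'});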

Feel free to flush the table to disk and observe the JSON data.

Please note that if you select col20 in cqlsh you will see the following error:

Failed to format value '\x00\xfe\x02GS\xfc\xa5\xc0\x00' : 'ascii' codec can't decode byte 0xfe in position 2: ordinal not in range(128)

This is a cqlsh bug in Cassandra 3.10. Please refer to CASSANDRA-13549 for further details.

CQL Counters

A counter is a 64-bit signed integer whose value can only be incremented or decremented via an UPDATE statement.
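A minimal sketch of a counter table and an increment, using assumed table and column names:

  CREATE TABLE page_views (
      page text PRIMARY KEY,
      views counter
  );

  UPDATE page_views SET views = views + 1 WHERE page = '/home';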

Keep the following in mind when using a counter:

  1. A counter column cannot be part of a primary key.
  2. A table that contains a counter can only contain counter columns. Columns with any other data type are not permitted (apart from the primary key columns).
  3. You cannot use a TTL with a counter column.
  4. Counters are not idempotent. This is particularly tricky when handling error conditions.