Simple Cassandra cluster deployment with Fabric and Azure CLI.
05 May 2016 – Karelia
In our project, which is deployed to Azure, we used a single-node Cassandra cluster during the test phase. It was deployed manually, as described in several blog posts. After some time we decided to move to a real single-datacenter cluster with 3 nodes for more realistic tests, and to use Cassandra 3 with secondary indexes and materialized views. We already have scripts that create a farm from VM images, but Cassandra configuration is a little harder, and some time ago Azure introduced the new “Resource Manager” deployment mode. After some investigation I found existing, ready-to-use templates for Cassandra; unfortunately, some of them are broken and some of them use DataStax Enterprise, while we want the open source version. From my point of view, all those templates are also too hard to write and understand (if you don’t have much time). This time we also decided not to use VM images at all, because they have to be maintained whenever something changes. So we decided to write some simple scripts that use the Azure CLI and Fabric for deployment.
First of all, you need to install the Azure CLI and Fabric.
The first thing to do is to log in to your Azure account, select a subscription and switch the CLI to Resource Manager mode.
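As a minimal sketch, assuming the 2016-era cross-platform `azure` CLI is on your PATH and using a hypothetical subscription name, the login sequence could be scripted from Python. By default it only prints the commands, so you can check them before running anything.

```python
import subprocess

# Hypothetical subscription name; replace it with your own.
SUBSCRIPTION = "My Subscription"


def login_commands(subscription):
    """Return the azure cli commands for login, subscription and ARM mode, in order."""
    return [
        ["azure", "login"],                         # interactive login
        ["azure", "account", "list"],               # list available subscriptions
        ["azure", "account", "set", subscription],  # select the subscription
        ["azure", "config", "mode", "arm"],         # switch to Resource Manager mode
    ]


def run_all(commands, dry_run=True):
    """Print every command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.check_call(cmd)
```

Calling `run_all(login_commands(SUBSCRIPTION), dry_run=False)` would actually run the commands once the printed list looks right.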
If you don’t see your subscription, you need to execute:
Now everything is ready and we can start. The first step is to create a resource group; in our case we will use the “West Europe” region.
We will put all VMs into a subnet of a virtual network.
The next step is optional: do it only if you want to access your nodes from the internet. We will create three public IPs (one per node) in our group.
Now it is time for the network interfaces (NICs). We will create them in our network and bind each one to a public IP.
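The public IPs, NICs and VMs all follow one pattern per node. A small sketch of a naming plan, with a hypothetical prefix and naming scheme (the real names are not shown in this post), that loops over the node count so each NIC is paired with the public IP of the same index:

```python
NODE_COUNT = 3  # one public IP, NIC and VM per node


def node_resources(count, prefix="cassandra"):
    """Return one dict of (hypothetical) resource names per node."""
    nodes = []
    for i in range(1, count + 1):
        nodes.append({
            "public_ip": "{}-ip-{}".format(prefix, i),
            "nic": "{}-nic-{}".format(prefix, i),
            "vm": "{}-vm-{}".format(prefix, i),
        })
    return nodes


for node in node_resources(NODE_COUNT):
    # Each NIC is bound to the public IP with the same index.
    print(node["nic"], "->", node["public_ip"])
```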
Our NICs are hidden behind Azure’s firewall (a network security group), so we need to configure it and add allow rules for the SSH and Cassandra ports.
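The allow rules boil down to a list of TCP ports. The ports here are Cassandra’s defaults (7000/7001 inter-node, 7199 JMX, 9042 CQL native protocol, 9160 Thrift) plus SSH; the rule names and the priority spacing are assumptions. A sketch that turns them into rule definitions you could feed to the CLI, keeping priorities inside the 100 to 4096 range that network security groups accept:

```python
# Cassandra's default ports plus SSH; rule names and priority spacing are assumptions.
PORTS = {
    "ssh": 22,
    "cassandra-internode": 7000,
    "cassandra-internode-ssl": 7001,
    "cassandra-jmx": 7199,
    "cassandra-cql": 9042,
    "cassandra-thrift": 9160,
}


def allow_rules(ports, first_priority=100, step=10):
    """Build inbound TCP allow rules, lowest port first, with increasing priority numbers."""
    rules = []
    priority = first_priority
    for name, port in sorted(ports.items(), key=lambda item: item[1]):
        # Network security groups accept priorities from 100 to 4096.
        if not 100 <= priority <= 4096:
            raise ValueError("priority %d is outside 100..4096" % priority)
        rules.append({
            "name": "allow-" + name,
            "protocol": "Tcp",
            "direction": "Inbound",
            "access": "Allow",
            "port": port,
            "priority": priority,
        })
        priority += step
    return rules
```

Exposing JMX and the inter-node ports to the internet is rarely what you want; since the nodes talk to each other over private IPs inside the virtual network, you may prefer to open only 22 and 9042 publicly.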
The main step is to create the VMs and attach data disks to them. Every VM will use a separate NIC, an SSH key file and an Ubuntu 15 disk image. How to find the correct image URN for a VM is described here.
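An image URN has the form publisher:offer:sku:version. A small helper sketch that validates a URN before it is passed to the VM creation command; the Ubuntu 15.10 value is an assumption, so use the URN you actually find for your region.

```python
from collections import namedtuple

ImageUrn = namedtuple("ImageUrn", "publisher offer sku version")

# Assumed URN of an Ubuntu 15.10 image; look up the one available to you.
UBUNTU_URN = "canonical:UbuntuServer:15.10:latest"


def parse_urn(urn):
    """Split publisher:offer:sku:version, rejecting malformed values early."""
    parts = urn.split(":")
    if len(parts) != 4 or not all(parts):
        raise ValueError("expected publisher:offer:sku:version, got %r" % urn)
    return ImageUrn(*parts)
```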
Our VMs are ready, but we still need to connect over SSH to install and configure all the software.
We will automate this with Fabric, and to use it we have to create a fabfile.py. The tasks in the fabfile need to know the internal and external IPs of the created VMs, which is easy to get thanks to the Azure CLI’s JSON output.
fabfile.py is a Python script, and we also need the PyYAML library to edit Cassandra’s config file.
Now we import all the required libraries and load the VM info from the JSON files.
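A minimal sketch of the loading step. The field names are an assumption: it expects each public IP document to have an `ipAddress` key and each NIC document to have `ipConfigurations[0].privateIPAddress`, so check them against what your CLI version actually writes with `--json`.

```python
import json


def read_text(path):
    """Read one JSON document saved from the azure cli."""
    with open(path) as f:
        return f.read()


def extract_ips(public_ip_docs, nic_docs):
    """Return (public_ips, private_ips) in node order; field names are assumptions."""
    public_ips = [json.loads(doc)["ipAddress"] for doc in public_ip_docs]
    private_ips = [
        json.loads(doc)["ipConfigurations"][0]["privateIPAddress"] for doc in nic_docs
    ]
    return public_ips, private_ips
```

In the fabfile this would be called with the saved files, e.g. `extract_ips([read_text(p) for p in ip_files], [read_text(p) for p in nic_files])`, where `ip_files` and `nic_files` are hypothetical lists of paths.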
Fabric uses roledefs to name the groups of hosts a task runs on. We need two: one with all cluster nodes, and a second with just one node of the cluster, used for cqlsh commands. We also need to specify the SSH user and key file.
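A sketch of the two roles as a plain dictionary, so the logic can be checked without Fabric installed. The role names, the SSH user and the key path are hypothetical; the commented lines show how a Fabric 1.x fabfile would typically wire them into `env`.

```python
def build_roledefs(public_ips):
    """Two roles: every node, and a single node used for cqlsh commands."""
    if not public_ips:
        raise ValueError("need at least one node")
    return {
        "cassandra": list(public_ips),
        "cqlsh": [public_ips[0]],
    }

# In a Fabric 1.x fabfile this would be wired up roughly as:
#   from fabric.api import env
#   env.roledefs = build_roledefs(public_ips)
#   env.user = "azureuser"                  # hypothetical admin user name
#   env.key_filename = "~/.ssh/azure_rsa"   # hypothetical private key path
```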
We are ready for the first tasks: installing Java and JNA, and checking the Java version.
The next step is installing pip and PyYAML.
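A sketch of these two steps as a list of shell commands plus a parser for the version check. The package names are assumptions for Ubuntu (Cassandra 3 needs Java 8), and in a fabfile each command would be passed to `sudo()`.

```python
import re

# Assumed Ubuntu package names; Cassandra 3 needs Java 8.
INSTALL_COMMANDS = [
    "apt-get update",
    "apt-get install -y openjdk-8-jre-headless libjna-java",
    "apt-get install -y python-pip",
    "pip install pyyaml",
]
# `java -version` prints to stderr, hence the redirect when run through Fabric.
JAVA_VERSION_COMMAND = "java -version 2>&1"


def java_major_version(version_output):
    """Return the major Java version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if not match:
        raise ValueError("unrecognised java -version output")
    first, second = int(match.group(1)), match.group(2)
    # Java 8 and older report "1.8.0_91"; Java 9 and newer report e.g. "11.0.2".
    if first == 1 and second is not None:
        return int(second)
    return first
```

The check itself could then be `java_major_version(run(JAVA_VERSION_COMMAND)) >= 8`, failing the task early if an older JVM was picked up.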
We attached a data disk to every VM, so now we have to partition, format and mount it.
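A sketch of the disk preparation as shell commands to be run with `sudo`. The device name `/dev/sdc` (usually the first data disk on an Azure Linux VM) and the mount point `/data` are assumptions; check `lsblk` on your VMs first.

```python
DEVICE = "/dev/sdc"    # assumption: the first attached data disk
MOUNT_POINT = "/data"  # hypothetical mount point for Cassandra's files


def disk_setup_commands(device=DEVICE, mount_point=MOUNT_POINT):
    """Shell commands (run with sudo) to partition, format and mount a data disk."""
    partition = device + "1"
    return [
        # One GPT partition spanning the whole disk.
        "parted -s {} mklabel gpt mkpart primary ext4 0% 100%".format(device),
        "mkfs.ext4 -F {}".format(partition),
        "mkdir -p {}".format(mount_point),
        # Persist the mount across reboots; nofail lets the VM boot if the disk is missing.
        "echo '{} {} ext4 defaults,nofail 0 2' >> /etc/fstab".format(partition, mount_point),
        "mount -a",
    ]
```

These commands are not idempotent: running them twice reformats the disk and appends a second fstab line. Referring to the partition by UUID in fstab is also more robust than using the device name.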
Next, we install Cassandra.
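A sketch of the installation from the Apache Debian repository, as it was documented around 2016. The `30x` series value is an assumption, and the repository location and series names have changed since then, so check the current Apache Cassandra download instructions.

```python
# "30x" (the 3.0.x series) is an assumption; pick the series you want to run.
CASSANDRA_SERIES = "30x"


def cassandra_install_commands(series=CASSANDRA_SERIES):
    """Shell commands (run with sudo) adding the Apache repository and installing Cassandra."""
    repo = "deb http://www.apache.org/dist/cassandra/debian {} main".format(series)
    return [
        "echo '{}' > /etc/apt/sources.list.d/cassandra.sources.list".format(repo),
        "curl -s https://www.apache.org/dist/cassandra/KEYS | apt-key add -",
        "apt-get update",
        "apt-get install -y cassandra",
    ]
```

Note that the package starts the service right after installation, with the default cluster name.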
For the Cassandra configuration we have to write a simple script that reads and writes Cassandra settings in /etc/cassandra/cassandra.yaml.
Let’s name it yaml_option.py.
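A minimal sketch of such a script, assuming a hypothetical command line of the form `python yaml_option.py get <path>` or `set <path> <value>`. The path is dotted, and numeric parts index into lists, which is needed for the nested seeds setting. PyYAML is imported only when the script actually touches the file.

```python
import sys

CONFIG_PATH = "/etc/cassandra/cassandra.yaml"


def _step(node, key):
    """Numeric keys index into lists (seed_provider.0), other keys into dicts."""
    return node[int(key)] if isinstance(node, list) else node[key]


def get_option(config, path):
    """Read a value addressed by a dotted path like 'seed_provider.0.parameters.0.seeds'."""
    node = config
    for key in path.split("."):
        node = _step(node, key)
    return node


def set_option(config, path, value):
    """Write a value addressed by a dotted path; the parent must already exist."""
    parent_path, _, last = path.rpartition(".")
    parent = get_option(config, parent_path) if parent_path else config
    if isinstance(parent, list):
        parent[int(last)] = value
    else:
        parent[last] = value
    return config


def main(argv):
    import yaml  # PyYAML, needed only when the script really edits the file

    if len(argv) < 3 or argv[1] not in ("get", "set") or (argv[1] == "set" and len(argv) < 4):
        print("usage: yaml_option.py get <path> | set <path> <yaml-value>")
        return 1
    with open(CONFIG_PATH) as f:
        config = yaml.safe_load(f)
    if argv[1] == "get":
        print(get_option(config, argv[2]))
        return 0
    # The value is parsed as YAML, so "true" becomes a bool and "9042" an int.
    set_option(config, argv[2], yaml.safe_load(argv[3]))
    with open(CONFIG_PATH, "w") as f:
        yaml.safe_dump(config, f, default_flow_style=False)
    return 0


# Only act when invoked with arguments, e.g. `python yaml_option.py get cluster_name`.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

`yaml.safe_dump` rewrites the whole file, so the comments in the stock cassandra.yaml are lost; keep a copy of the original if you want them.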
Now we add tasks for this script to fabfile.py.
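A sketch of the fabfile side. It assumes a hypothetical `yaml_option.py get|set <path> [value]` command line and a hypothetical upload location, and only builds the quoted remote command, so values with spaces (like a cluster name) survive the shell. The commented lines show how Fabric 1.x tasks could use it.

```python
try:
    from shlex import quote  # Python 3
except ImportError:
    from pipes import quote  # Python 2, which Fabric 1.x runs on

REMOTE_SCRIPT = "/tmp/yaml_option.py"  # hypothetical upload location


def yaml_option_command(action, path, value=None, script=REMOTE_SCRIPT):
    """Build the remote shell command for a hypothetical `yaml_option.py get|set` CLI."""
    args = ["python", script, action, path]
    if value is not None:
        args.append(value)
    return " ".join(quote(a) for a in args)

# In a Fabric 1.x fabfile the task could look roughly like:
#   @task
#   @roles("cassandra")
#   def set_option(path, value):
#       put("yaml_option.py", REMOTE_SCRIPT)
#       sudo(yaml_option_command("set", path, value))
```

With Fabric 1.x, such a task would be invoked as `fab set_option:cluster_name,Demo`.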
Next come some helper tasks for Cassandra.
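A sketch of typical helpers, assuming the default paths of the Debian package. Because the package starts Cassandra right after installation as “Test Cluster”, the system keyspace has to be wiped before `cluster_name` can be changed. The `nodetool status` parser is a hypothetical helper for waiting until all nodes are Up/Normal.

```python
DATA_ROOT = "/var/lib/cassandra"  # default data location of the Debian package

START = "service cassandra start"
STOP = "service cassandra stop"
STATUS = "nodetool status"


def reset_commands(data_root=DATA_ROOT):
    """Stop Cassandra and wipe the system keyspace so cluster_name can be changed."""
    return [STOP, "rm -rf {}/data/system/*".format(data_root)]


def count_up_normal(nodetool_output):
    """Count the nodes that `nodetool status` reports as Up/Normal (UN)."""
    return sum(1 for line in nodetool_output.splitlines() if line.startswith("UN "))
```

A deploy task could poll `count_up_normal(run(STATUS))` on the cqlsh node until it equals the number of nodes before creating the schema.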
And here is our Cassandra configuration.
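A sketch of the per-node settings applied to a parsed cassandra.yaml dictionary. The keys are standard cassandra.yaml options; the cluster name, data directory and snitch choice are assumptions. Nodes talk to each other on their private IPs, while clients are told to use the public IP.

```python
def apply_node_settings(config, private_ip, public_ip, seeds,
                        cluster_name="Demo Cluster", data_dir="/data/cassandra"):
    """Update a parsed cassandra.yaml dict for one node (values are assumptions)."""
    config.update({
        "cluster_name": cluster_name,              # must be identical on every node
        "listen_address": private_ip,              # inter-node traffic stays in the vnet
        "rpc_address": "0.0.0.0",                  # accept CQL clients on all interfaces
        "broadcast_rpc_address": public_ip,        # required when rpc_address is 0.0.0.0
        "endpoint_snitch": "GossipingPropertyFileSnitch",
        "data_file_directories": [data_dir + "/data"],
        "commitlog_directory": data_dir + "/commitlog",
        "authenticator": "PasswordAuthenticator",  # enables login as cassandra/cassandra
        "authorizer": "CassandraAuthorizer",
    })
    # Seeds live in a nested list: seed_provider[0].parameters[0].seeds
    config["seed_provider"][0]["parameters"][0]["seeds"] = ",".join(seeds)
    return config
```

The new data directories must be owned by the cassandra user (`chown -R cassandra:cassandra /data/cassandra`), and GossipingPropertyFileSnitch takes the datacenter and rack names from cassandra-rackdc.properties.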
After configuring Cassandra we also have to set up the keyspace, the schema and authorization.
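For authorization, a sketch of the one-off CQL statements, assuming the datacenter is named `dc1` and three replicas are wanted. `system_auth` is created with a replication factor of 1, so raising it keeps logins working when a node is down, and the default superuser password should be replaced.

```python
def auth_setup_cql(new_password, replication_factor=3, datacenter="dc1"):
    """Statements run once, via cqlsh on a single node, after the cluster is up."""
    escaped = new_password.replace("'", "''")  # CQL string literal escaping
    return [
        # Replicate authentication data to every node of a three-node cluster.
        "ALTER KEYSPACE system_auth WITH replication = "
        "{{'class': 'NetworkTopologyStrategy', '{}': {}}};".format(datacenter, replication_factor),
        # Replace the default cassandra/cassandra superuser password.
        "ALTER ROLE cassandra WITH PASSWORD = '{}';".format(escaped),
    ]
```

After changing the `system_auth` replication, running `nodetool repair system_auth` on each node copies the existing data to the new replicas.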
As you can see, for cqlsh we use the rc file cqlshrc.
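A sketch that renders a minimal cqlshrc with the `[authentication]` and `[connection]` sections; the host, credentials and port values are placeholders. cqlsh looks for this file in `~/.cassandra/cqlshrc` by default.

```python
import configparser
import io


def cqlshrc_text(hostname, username, password, port=9042):
    """Render a minimal cqlshrc with credentials and the node to connect to."""
    # interpolation=None so passwords containing '%' are written verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    parser["authentication"] = {"username": username, "password": password}
    parser["connection"] = {"hostname": hostname, "port": str(port)}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()
```

The file stores the password in plain text, so restricting it with `chmod 600` is a sensible default.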
Our schema.cql is just a CQL file that creates a keyspace and tables.
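A sketch that generates such a file. The `users` table and the `users_by_country` materialized view are hypothetical examples, not the project’s real schema; the datacenter name `dc1` and a replication factor of 3 are assumptions.

```python
def keyspace_cql(keyspace, datacenter="dc1", replication_factor=3):
    """CREATE KEYSPACE statement replicating to every node of one datacenter."""
    return ("CREATE KEYSPACE IF NOT EXISTS {} WITH replication = "
            "{{'class': 'NetworkTopologyStrategy', '{}': {}}};").format(
                keyspace, datacenter, replication_factor)


# Hypothetical example table and materialized view.
EXAMPLE_TABLES = """CREATE TABLE IF NOT EXISTS {ks}.users (
    user_id uuid PRIMARY KEY,
    email text,
    country text
);
CREATE MATERIALIZED VIEW IF NOT EXISTS {ks}.users_by_country AS
    SELECT * FROM {ks}.users
    WHERE country IS NOT NULL AND user_id IS NOT NULL
    PRIMARY KEY (country, user_id);"""


def schema_cql(keyspace):
    """Full schema.cql contents for the given keyspace."""
    return keyspace_cql(keyspace) + "\n" + EXAMPLE_TABLES.format(ks=keyspace) + "\n"
```

The result can be written to schema.cql and executed with `cqlsh -f schema.cql` on the cqlsh node.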
And the final deploy task.
Now we can deploy everything from the command line.
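A sketch of how a deploy task can chain the individual steps. The task names are hypothetical; in a Fabric 1.x fabfile `run_task` would be `fabric.api.execute`, which accepts task names and honours each task’s roles.

```python
# Hypothetical task names; the order follows the steps of this post.
DEPLOY_STEPS = [
    "install_java",
    "install_pip_pyyaml",
    "setup_disk",
    "install_cassandra",
    "configure_cassandra",
    "setup_schema",
]


def run_steps(run_task, steps=DEPLOY_STEPS):
    """Run every step in order, stopping at the first one that raises."""
    for name in steps:
        run_task(name)

# In the fabfile, roughly:
#   @task
#   def deploy():
#       run_steps(execute)
```

With such a task in fabfile.py, `fab deploy` runs the whole chain; Fabric 1.x executes each task on its hosts one after another unless parallel mode is requested.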
Our cluster is now ready to use. You can use the public IPs as the cluster address.