Posted on

MongoDB Certified DBA Associate Exam

MongoDB Inc. Most of the questions were use cases and right indexes for the use case. Sharding
Sharding is one of the features in which MongoDB has a lot to offer. Aggregation Framework
Aggregation framework itself being an advanced concept in MongoDB, the questions were quite comprehensive. It was a single use case for 3 subsequent questions. There was a lack of certification in the MongoDB ecosystem and, after having worked with MongoDB for a few years now, the news about MongoDB certification got me excited. Questions would be scenario based, where, given the output of a particular command, one would have to infer details about the health of the database.

I would be posting a few sample questions for each section soon. The questions were mainly multiple choice with single and multiple correct answers. System requirements are:

1. Philosophy & Features
This section is to test the basic understanding of NoSQL and MongoDB concepts. Questions included philosophy of sharding, when to shard a collection, configuration of shards, processes involved in a sharded cluster and role of balancer. Server Administration
This section was the most difficult section in terms of specificity. Indexing
Questions here would test the in-depth knowledge of indexes. Replication
This section had questions about availability concepts of MongoDB and replication. The MongoDB website has very sparse information, so I am hoping the information here will help fellow exam takers.

Why get MongoDB certified now?
According to MongoDB – “Certification helps you establish technical credibility and facility with MongoDB and contributes to your organization’s proficiency in running applications on the platform.”

MongoDB is growing to be one of the preferred NoSQL databases in the market. Main focus would be on updating partial documents, which would need knowledge of update operators on MongoDB. From this section, more emphasis was on comprehending the importance of correct indexes for a given scenario rather than the syntax for creating indexes. Windows or Mac OS – Currently Linux is not supported by Software Secure.

2. For now, MongoDB has released Associate level exam for both the certifications and has the other levels on the road map. Complete knowledge of profiler, collection stats, explaining a query were the main focus. this section should not be that difficult to answer.

2. Most of the questions would definitely need some hands-on experience with MongoDB.

The questions were quite comprehensive and were divided into sections with each section having around 7 to 10 questions. If you have good understanding of NoSQL concepts, difference between RDBMS and NoSQL, difference between Document Store vs. An in depth understanding of data migration between shards in a sharded cluster would be crucial to answer all the questions of the section.

7. There would be a practice exam to guide you through the login process and give you an idea about the type of questions.

There would be an initial system check for the above. MongoDB would calculate the results based on the difficulty level of the questions for each test taker. This section deals with various CRUD operations on MongoDB. has recently released certification program. Here, I would be talking in depth about the DBA certification and will discuss the developer certification in the future blog.

What does DBA Certification consist of?
According to MongoDB Inc., “A MongoDB database administrator has in-depth knowledge of run-time configuration, processes, scaling, backup and recovery, monitoring, and performance tuning for production MongoDB instances.”

My view of the exam is that it is designed to test the test takers’ hands-on knowledge of MongoDB setup, administration and monitoring. There are no prerequisites to register for the exam, and registration is free. But working with the MongoDB journal would be a must, as there were questions specific to its location and its importance.

8. CRUD Operations
CRUD stands for Create, Read, Update and Delete. Its flexibility in terms of the supported amount of data and ease of horizontal scalability and administration, both on-premise and cloud, is making corporates opt for MongoDB as the preferred next generation database.

With the growing number of MongoDB users, this certificate would definitely make you stand out from the crowd. I had many questions about the exam until the last minute. MongoDB replica sets, priorities of nodes in a replica set, primary elections, the arbiter’s role in elections and the respective configuration were the main focus.

6. The questions deal with replica set configuration techniques and best practices. Most of them were direct questions about JSON structure, collections and documents, fundamentals of replication and sharding. On successful completion, you would be receiving a badge like the one below. On successful system configuration check, you would be prompted to pay $150 for the certification and take up the exam.

Exam Results
The result for the exam would be available 2-3 weeks following the close of the exam period. There were also questions where one has to type in the answer in the simulated MongoDB prompt.

The test lasts 90 mins, and can have variable number of questions. Though the section is quite straightforward, with options having a slight difference in the syntax, it turns out to be quite difficult to get them right without hands-on experience.

3. Working Microphone

4. There is no negative marking, and each question is weighted equally.

The sections covered are as follows:

1. So, I was not surprised to see the number of questions being more than other sections. Good Internet Bandwidth

You can register for the exam at https://education.mongodb.com/. Working webcam – You will be required to take a clear picture of a photo ID and of yourself, and a quick scan of the surroundings.

3. You will have to find a quiet location where you would not be disturbed during the span of the exam. I recently appeared for the MongoDB Certified DBA Associate Exam and am going to share a few details about the exam and my experience. Understanding of the admin database would provide a kick start for this section, as most questions were about user administration. Watch out for updates to the blog.

How to register for the Exam?
The information on MongoDB website about how to register and take the exam was really confusing. is currently offering certifications

C100DBA: MongoDB Certified DBA Associate
C100DEV: MongoDB Certified Developer Associate
For both the certifications, MongoDB has 3 levels – Associate, Professional & Master. Emphasis on compound indexes can be expected. Hence, it is time to leverage the opportunity and become one of the few professionals certified for MongoDB.

Certifications offered by MongoDB
MongoDB Inc. General experience in database and query performance tuning would be helpful for the section.

4. A clear understanding of the various pipeline stages like $match, $project and $group in the aggregation framework, and of their syntax, would be a necessity to answer the questions.
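
As a rough illustration of how $match, $group and $project chain together, the sketch below writes a pipeline as plain Python dictionaries (the shape a driver such as PyMongo accepts) and runs it through a tiny in-memory stand-in. The `orders` data, the field names and the `run_pipeline` helper are made up for illustration and are not from the exam; the helper only handles equality $match, $group with $sum and inclusion $project, and it is not how MongoDB itself executes pipelines.

```python
from collections import defaultdict

# Hypothetical sample documents; not taken from the exam.
orders = [
    {"customer": "a", "status": "shipped", "amount": 10},
    {"customer": "b", "status": "shipped", "amount": 5},
    {"customer": "a", "status": "pending", "amount": 7},
    {"customer": "a", "status": "shipped", "amount": 3},
]

# Pipeline in the same shape that would be passed to collection.aggregate().
pipeline = [
    {"$match": {"status": "shipped"}},
    {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}},
    {"$project": {"total": 1}},
]


def run_pipeline(docs, stages):
    """Toy evaluator: equality $match, $group with $sum, inclusion $project."""
    for stage in stages:
        (name, spec), = stage.items()  # each stage is a single-key document
        if name == "$match":
            docs = [d for d in docs if all(d.get(k) == v for k, v in spec.items())]
        elif name == "$group":
            key_field = spec["_id"].lstrip("$")
            groups = defaultdict(lambda: defaultdict(int))
            for d in docs:
                for out, acc in spec.items():
                    if out != "_id":
                        groups[d[key_field]][out] += d[acc["$sum"].lstrip("$")]
            docs = [{"_id": k, **dict(v)} for k, v in groups.items()]
        elif name == "$project":
            keep = {"_id"} | {k for k, v in spec.items() if v}
            docs = [{k: v for k, v in d.items() if k in keep} for d in docs]
        else:
            raise ValueError(f"unsupported stage: {name}")
    return docs


result = sorted(run_pipeline(orders, pipeline), key=lambda d: d["_id"])
print(result)
```

Each stage receives the documents produced by the previous one, which is why filtering with $match first keeps the later stages cheap.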

5. Key Value Store, MongoDB etc. You will be able to take the exam anytime during that window.

The exam is a web-proctored exam conducted by Software Secure Inc. A question about framing an aggregation pipeline for a given scenario stands out. The first step to give the certification exam is to register for a one week long test window on the website. Application Administration
The questions here were about journaling, authentication and authorization on MongoDB. As the certification is available for a few weeks in a year, the number of certified DBAs would also be accordingly low. A clear understanding of outputs from various administration commands like mongostat is expected. I have been working with production MongoDB sharded and replicated clusters on AWS for a few years now, which helped me to take up the certification confidently.

Reasons why DynamoDB is better than MongoDB

Which option will you choose?

Reason 4: Have you checked out the DynamoDB features lately?
DynamoDB, in the classical AWS style, was released with just bare-bones features. Add 3 config servers to it, and you are looking at a 12 node cluster. At 4 AM the conversation between a systems engineer and me goes like the following:

Engineer: Hey, got woken up by the pager, seems like CPU utilization is spiking, but requests are running fine. Many issues and bugs are only manifested when you test on the real deal. You still need monitoring for performance issues, but the things that need to be monitored are really few in numbers compared to MongoDB.

Reason 2: People don’t like to spend money on hardware (if they don’t have to)
I was in a meeting with a customer trying to understand their requirements to design the MongoDB cluster for them. Compare that to the two big upgrades DynamoDB did in the last 12 months. They had about 1TB of data, and at peak they did about 1000 reads/second and 50 writes/second. Thus, if you looked at DynamoDB a year ago, and found it severely limiting in features, it’s time to look again. Can I just resolve this issue and look at it tomorrow?

Me: You woke me up to just ask this?

Why will you want your staff (or yourself) to have to go through this kind of conversation anytime during the day, let alone 4 in the morning? With DynamoDB, AWS engineers take care of such issues, not you. The phrase “But it worked in test cluster”, can be safely rejected.

If you are evaluating NoSQL databases, I suggest you give DynamoDB a sincere try. So either you need to fork out more money and get a test cluster that looks like the production cluster in size and number of servers, or be ready for such bugs to slip through.

DynamoDB allows you to create the same cluster as prod but with lower throughput. The cost was about $3000 per month (this was before the recent price reductions).

If the DynamoDB was used, the starting capacity would have cost around $500, which is a huge reduction in cost. And as always, no upgrade is really as simple as the documentation states. You get calls from all your clients asking for when you can upgrade their existing MongoDB clusters to this latest and greatest version. Below I give five reasons to choose DynamoDB over MongoDB.

Reason 1: People don’t like being woken up in the middle of the night
One sure-shot way to motivate someone to rethink their priorities in life, and reconsider their choice in becoming an IT professional, is to hand them pager-duty for a MongoDB cluster. Thanks!!. Also, unlike MongoDB, DynamoDB allows rapid scaling up (and down), so we would not have to preprovision a lot of extra capacity (and thus cost).

Also in case you get official MongoDB support, it costs extra and costs are relatively high, based on the number of hosts. If you have further questions about DynamoDB vs MongoDB, feel free to send me a comment below or at my email bhavesh at cloudthat.in.

If you liked the article please share it. I looked around but found nothing. In the MongoDB vs DynamoDB matchup, DynamoDB has a lot of brilliant features that help ease the pain of running NoSQL clusters. And as a seasoned professional you don’t want to make any changes to production during business hours, you know things can go wrong. Maintaining a MongoDB cluster requires keeping the servers up and running, keeping the MongoDB processes up and running, and performance monitoring for the cluster. Access controls to the tables and rows can now be controlled via IAM accounts which opens up so many usecases that are not possible in MongoDB.

The latest set of features improves querying by allowing filters to be set. Check this image for example (times there are in UTC).

Screen Shot 2014-04-29 at 12.11.09 pm

In the middle of the night, a client’s MongoDB cluster generated a few automated CloudWatch alarms. This is also true for MongoDB, where for big production clusters with many shards and replica sets, most likely the QA cluster is just one machine. Thus even with lesser costs, you can test the real deal. You could not even have arbitrary indexing, the only index was the primary key. But since then DynamoDB has evolved rapidly with new features like Local Secondary Indexes and recently Global Secondary Indexes that allow arbitrary keys to be indexed. With DynamoDB the AWS support that works for all other AWS services works for DynamoDB without having to buy additional support, and the cost in comparison usually ends up being much lower.

Reason 3: Updating MongoDB version on a production system is not a good way to spend a Sunday
MongoDB releases version 2.6, and calls it their “best version yet”. They also wanted to avoid any downtime if and when they had to scale up, so preferred we pre-provisioned the capacity and started with capacity they would need at the end of the year when they had grown 3x. Your team will thank you when they can enjoy their sleep and weekends. Thus, you sacrifice your precious weekends to the altar of MongoDB upgrades. It was all seamless, and I was just notified that new features and better performance were available without me doing any real work. It was not that much better than a glorified hosted Memcache cluster. We designed a MongoDB cluster with three shards, each shard being a three-member replica set, thus 9 machines. If you are considering MongoDB or any other NoSQL databases, it’s a must that you consider DynamoDB. The reason why you decided to not use DynamoDB then, might have been fixed now.
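
The cluster sizing in this example is simple arithmetic, sketched below. The shard, replica and config server counts and the two monthly figures are the ones quoted in this post; the helper name is only for illustration.

```python
def mongodb_node_count(shards, replicas_per_shard, config_servers):
    """Total machines in a sharded cluster: data-bearing nodes plus config servers."""
    return shards * replicas_per_shard + config_servers


nodes = mongodb_node_count(shards=3, replicas_per_shard=3, config_servers=3)

# Monthly figures quoted in the post (USD).
mongodb_monthly = 3000
dynamodb_monthly = 500
cost_ratio = mongodb_monthly / dynamodb_monthly

print(nodes)       # 12
print(cost_ratio)  # 6.0
```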

Reason 5: Finally QA can test the real deal.
You have heard this before from QA personnel, “But it worked in the Test cluster”. They were growing like crazy and expecting 3 times traffic growth in the next one year.

Data Warehouse as a Service powered by Azure

If 1000 queries are thrown, then there is a slight chance that 20 queries might fail to execute ( SQL Data Warehouse will be in preview, it will grow in the near future and so will the reliability ).

How secure is SQL Data Warehouse?
As we know, security is a two-way process: the user who is using SQL Data Warehouse should secure his/her laptop. DWaaS is the first enterprise class cloud Data Warehouse, which can grow or shrink. The job of the Compute Node is to serve as the power for the service; underneath, the data loaded into SQL Data Warehouse is distributed across the nodes of the service.

Storage: The storage medium used for SQL Data Warehouse is Azure Blob storage. For security reasons, Azure provides some measures like:

Connection Security: It’s one of the measures where we can set firewall rules and connection encryption. Before starting any business, you might think about how to equip your data, and how to maintain and manipulate your data (as it will be there in abundance). At present SQL Data Warehouse supports SQL Authentication with username and password.

Authorization: It refers to what we can do with SQL Data Warehouse database, which will allow user to do anything in database. You might go and ask some IT expert what to do, how to do and a bunch of other questions. The IT guy may suggest you to build your own Data Warehouse, to which you will ask if it is cost effective or not, what’s the upfront cost, what will be the maintenance charges, etc.

For this reason Azure has come up with a solution called SQL Data Warehouse as a Service (DWaaS). The best practice will be to limit the user’s access.

Encryption: Azure SQL Data Warehouse provides “Transparent Data Encryption” to secure our data when it is at rest or stored in database files and backups. Suppose the traffic is high in the daytime: you can add any number of machines you want, and when the traffic is low, you can remove machines.

Reason 3: The reliability of SQL Data Warehouse is estimated to be 98% i.e. It offers the full SQL Server experience in the cloud, which customers expect. Firewall rules will be applied for both server and database. The best part of this is that, when the user interacts with data, it will be fetched directly from Blobs (Blob storage is one of the best storage options in Azure when the amount of data is enormous).

Difference between on-premises Data Warehouse and Cloud Data Warehouse as a Service
Aspect | On-Premises Data Warehouse | Cloud Data Warehouse as a Service
Reliability | Reliable, but not much | More reliable
Scalability | Not scalable | More scalable
Speed | Faster, but may fail at any point of time (in case of hardware failure) | Faster and will always be up (SLA of 98% is provided)
Deployment | It will take time | It will deploy within no time
Cost effectiveness | Lots of capital required to set up an on-premises data warehouse | You pay for what you use

Why should we go for SQL Data Warehouse as a Service in Azure?
Reason 1: It can handle and scale petabytes of data and is highly reliable for all data warehouse operations.

Reason 2: You can scale up and scale down the services as per your requirement. DWaaS is one amazing solution for organizations which are just starting or are in the process of starting. If the password is key logged by a Man in the Middle attack, then the cloud provider will not be able to do anything. An IP address cannot connect to the database unless it is whitelisted. We can also set a server-level firewall using PowerShell (this can be done in Azure Classic Portal).

Authentication: It refers to how a user proves their identity when connecting to the database. Here TDE provides file level encryption.

Stay tuned for more blogs on Azure and if you have any queries or comments please feel free to post. The organization should not worry about spending the upfront cost and maintaining the hardware or software resources they buy.

Architecture of SQL Data Warehouse
For users, it’s like sending data to a database, but underneath SQL Data Warehouse runs a “Massively Parallel Processing (MPP) Engine”, which helps in dividing the query sent by the user to the Control Node.

Control Node: When a command is passed to the Control Node, it breaks down the query into a set of pieces for faster computing and passes them on to the other nodes of the service.

Compute Node: Like Control Node, Compute Node is powered by SQL Databases.
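
The Control Node / Compute Node split described above can be pictured as a scatter-gather. The sketch below is only a toy model of that idea, not how SQL Data Warehouse is implemented: the `distributions` data and both function names are hypothetical, and threads stand in for separate compute nodes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data already distributed across three compute nodes.
distributions = [
    [120, 80, 45],
    [60, 15],
    [200, 30, 10, 5],
]


def compute_node_sum(rows):
    """Work done by one compute node: aggregate only its local slice."""
    return sum(rows)


def control_node_total(slices):
    """Control node: fan the query pieces out, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        partials = list(pool.map(compute_node_sum, slices))
    return sum(partials)


print(control_node_total(distributions))
```

Each node only touches its own slice, so adding nodes shrinks the work per node while the control node's combining step stays small.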

Performing Python Data Science on Azure

The web app offers support for Python, Julia and Ruby.

jupyter
Figure 2: Jupyter is an open source web based tool for interactive Data Science and Machine Learning

Each Jupyter Notebook consists of a back-end kernel known as IPython. Additional packages can also be installed through the Jupyter front-end interface with the use of magic commands. However, over time I realized that IDEs save a lot of time and it’s more about efficiency rather than practice. Compile and run

When I started performing Data Science and Machine Learning, this process proved too cumbersome. Azure Notebooks are Jupyter Notebooks that are hosted on the Cloud using Azure virtual machines.

Azure Notebook
Figure 4: Azure Notebooks are Jupyter Notebooks that are hosted on Azure

Access and use of Azure Notebooks is completely free and you need not possess an Azure account to access these resources. As such, a custom kernel can be created based on one’s requirements.

During the initial stages of my foray into Data Science, I realized that I needed separate environments for different tasks. I could set up TensorFlow and all its dependencies in one kernel and data visualization in another.

IPython
Figure 3: IPython forms the backbone of a Python Notebook

One of the biggest drawbacks of using Jupyter Notebooks for Python development or Data Science is that it is quite resource intensive. However, this drawback can be overcome with the help of Azure Notebooks. Jupyter Notebook is an open-source web app for interactive coding. Apart from the cost benefit, Azure Notebooks come pre-installed with a variety of different packages that aid in Data Science. Microsoft has also loaded Azure Notebooks with a plethora of study and practice material in form of notebooks that enables users to approach Data Science and programming in an interactive manner. I have always been a proponent of text editors over IDEs. The availability of multiple kernels was a boon. I found myself constantly going through the sample notebooks to find new and interesting ways to use Jupyter and Azure.

Jupyter Notebooks can also be imported from fellow Azure Notebook users or from GitHub ensuring that one has the fastest possible method to pull resources. The kernel analyses and runs your code. Let us know in the comments section.

To know about our service offerings, kindly visit www.cloudthat.in (for training services) and www.cloudthat.com (for consulting services). Although this system has proved to be reliable, modern applications of programming possess alternate requirements. The Azure Notebooks service is deployed when one attempts to visualize data or make Python code specific changes in Azure Machine Learning Studio. This enables Data Scientists to work dynamically with data and numbers. One such tool for Python is Jupyter. For example, when I needed to visualize a dataset in Python, I would have to save the graph locally on my system, close the program and view the graph. The packages that come installed include the entire Anaconda stack as well as numerous Microsoft specific modules such as CNTK and Azure ARM.

Azure Notebook
Figure 5: Azure notebooks come with a wide range of packages pre-installed

Another major benefit of using Azure Notebooks is that the tool does not exclusively feature Python. Open the terminal
2. Veterans in the interactive Data Science sphere will feel right at home on Azure Notebooks.

What do you use for Data Science? We would love to hear about the new technologies or tools that you use. It also has support for R and F#. This is partly since I originally started coding with text editors and text editors ensured that I did not constantly look for auto-correct to help me. In the past, programming has followed a general cycle involving write, compile, check and execute. Soon, I realized that this did not align with my philosophy of efficiency.

Figure 1: Conventional programming techniques employed text editors such as Notepad++ or IDEs such as PyCharm

When it comes to Data Science, real-time interaction with the data is paramount. With Jupyter, multiple kernels are supported. The terminal for the machine hosting the Jupyter Notebook can also be accessed.

Azure Notebook Library
Figure 6: “Libraries” can be easily shared via GitHub or within Azure Notebooks

Shrinking An Amazon EBS Volume

Now copy your data recursively from your source volume to your newly created volume using the following command (assuming your old volume is mounted at /mnt/ebs):

sudo cp -r --preserve=all /mnt/ebs/. /mnt/ebs1 --verbose
Copying data recursively with --preserve=all preserves file attributes such as mode, ownership, timestamps and security contexts and, if possible, additional attributes such as links.

Amazon’s Elastic Block Store volumes are easy to use and expand but notoriously hard to shrink once their size has grown. Here are the steps for shrinking any mounted EBS volume on EC2 instances.

For various reasons you may need to expand or shrink the size of your EBS volume.

5. After copying the data, unmount your old volume and mount your new volume at the same mount point using the following commands:

sudo umount -l /mnt/ebs

sudo mount /dev/sdi /mnt/ebs
Automating the process by Shell Script
To automate this whole process, here is a script:

Download the zip from this page https://github.com/cloudthat/ebs-shrink

Extract the zip and navigate to the extracted folder

Step 1: Place the script on the instance with permissions to execute

scp -i /ebsreducescript @:/home/ec2-user/.
Now SSH into the instance using the following command and Update file permissions

ssh -i ec2-user@ip-address

sudo chmod 500 ebsreducescript
Step 2: Configure AWS Cli

Also, before executing the script you need to install and configure awscli. Use the following commands to install it:

sudo yum install python-pip

sudo pip install awscli
Run aws configure at the command line to set up your credentials and settings.

$ aws configure
AWS Access Key ID [None]: ANKITIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: ANKITXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-west-1
Default output format [None]: json
This script will create a new volume of a user-defined size, mount it at /mnt/ebs1, recursively copy all the data from the attached EBS volume that needs to be migrated while preserving all the user permissions, then unmount the old volume and mount the newly created EBS volume at the same mount point.

Note: The migrated volume is not deleted, it is just unmounted, so if any error occurs we can reattach it. Because you are only charged for the space currently allocated to your EBS volumes, it is also cost efficient to allocate only your approximate short term need and expand the volume as the need arises (this should be considered during the planning phase of your projects and/or during formulation of growth strategies for your service).

The steps below have been tested on an Amazon Linux instance to resize an EBS volume using the AWS console:

1. Create a new EBS volume of your desired size from the console and attach it to your instance.
2. Now log in to your instance using ssh and format your newly attached volume using the following commands (assuming that the new volume is attached at /dev/sdi):

ssh -i ec2-user@ip-address

sudo mkfs -t ext4 /dev/sdi
3. Then make a directory at /mnt/ebs1 and mount the new volume using the following commands:

sudo mkdir /mnt/ebs1

sudo mount /dev/sdi /mnt/ebs1
4. This script can only be used for attached (mounted) EBS volumes, not the root volume.
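
The manual steps in this post can be collected into a small dry-run helper, sketched below. It only builds the command strings and does not execute anything; the function name is hypothetical, and the defaults are the device and mount points assumed in the post (/dev/sdi, /mnt/ebs and /mnt/ebs1). This is not the script from the GitHub repository mentioned above.

```python
def shrink_commands(new_device="/dev/sdi", old_mount="/mnt/ebs", temp_mount="/mnt/ebs1"):
    """Return the shell commands from the manual steps, in order, without running them."""
    return [
        f"sudo mkfs -t ext4 {new_device}",                                  # format the new volume
        f"sudo mkdir {temp_mount}",                                         # temporary mount point
        f"sudo mount {new_device} {temp_mount}",                            # mount the new volume
        f"sudo cp -r --preserve=all {old_mount}/. {temp_mount} --verbose",  # copy the data
        f"sudo umount -l {old_mount}",                                      # unmount the old volume
        f"sudo mount {new_device} {old_mount}",                             # new volume takes over
    ]


for cmd in shrink_commands():
    print(cmd)
```

Printing the commands first makes it easy to review device names before running anything against a real volume.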

Cassandra Multi-AZ Data Replication

The key value which we need to define in the config file in this context is called Snitch. We shall check the status of the cluster using this command as shown below: cass The owns field above indicates the percentage of data owned by each node.

Let us perform some tests to make sure the data was replicated intact across multiple Availability Zones.

Test 1:

Node 1 was stopped.
Connection was made to the Cluster on remaining nodes and records were read from the table user.
All records were intact.
Node 1 was started.
On Node 1, ‘nodetool -h hostname_of_Node1 repair first’ was run.
Connection was made to the Cluster on Node 1 and records were read from the table user.
All records were intact.
Test 2:

Node 1 and Node 2 were stopped. We can control how nodes are configured within a cluster, including inter-node communication, data partitioning and replica placement etc., in this config file. Using CQLSH, you can execute queries using Cassandra Query Language (CQL).

Next, we shall create a table user with 5 records for tests.

CREATE TABLE user(user_id text,login text,region text,PRIMARY KEY (user_id));
Now, let us insert some records into this table:

insert into user (user_id,login,region) values ('1','test.1','IN');
insert into user (user_id,login,region) values ('2','test.2','IN');
insert into user (user_id,login,region) values ('3','test.3','IN');
insert into user (user_id,login,region) values ('4','test.4','IN');
insert into user (user_id,login,region) values ('5','test.5','IN');

cqlsh> select * from user;
query

Now that our keyspace/database consists of data, let us check for ownership & effectiveness:

12 (1) As we can see here, the owns field above is NOT nil after defining the keyspace. So, let us go ahead and create a sample keyspace. Apache Cassandra is an open source non-relational/NoSQL database. Basically, a snitch indicates which Region and Availability Zone each node in the cluster belongs to. Cassandra nodes use seeds for finding each other and learning the topology of the ring. But, in this case, we shall use Ec2Snitch as all of our nodes in the cluster are within a single region.

We shall set the snitch value as shown below: snitch

Also, since we are using multiple nodes, we need to group our nodes. NetworkTopologyStrategy places replicas on distinct racks/AZs, as nodes in the same rack/AZ may fail at the same time due to power, cooling or network issues.

Let us set the replication factor to 3 for our “first” keyspace:

CREATE KEYSPACE "first" WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'us-east' : 3};
The above CQL command creates a database/keyspace ‘first’ with class as NetworkTopologyStrategy and 3 replicas in us-east (in this case, one replica in rack/AZ 1a, one replica in rack/AZ 1b and one replica in rack/AZ 1c). The nodetool utility is a command line interface for managing a cluster. It gives information about the network topology so that requests are routed efficiently. We shall create a keyspace with data replication strategy & replication factor. We will also learn how to ensure that the data remains intact even when an entire AZ goes down.

The initial setup consists of a Cassandra cluster with 6 nodes with 2 nodes (EC2s) spread across AZ-1a , 2 in AZ-1b and 2 in AZ-1c.

Initial Setup:
Cassandra Cluster with six nodes.

AZ-1a: us-east-1a: Node 1, Node 2
AZ-1b: us-east-1b: Node 3, Node 4
AZ-1c: us-east-1c: Node 5, Node 6
Next, we have to make changes in the Cassandra configuration file. This strategy will also help in case of disaster recovery.

Stay tuned for more blogs!!. Cassandra uses a command prompt called Cassandra Query Language Shell, also known as CQLSH, which acts as an interface for users to communicate with it. We shall use the NetworkTopologyStrategy replication strategy since we have our cluster deployed across multiple availability zones. Additionally, Cassandra has replication strategies which place the replicas based on the information provided by the snitch. The cassandra.yaml file is the main configuration file for Cassandra. It is massively scalable and is designed to handle large amounts of data across multiple servers (here, we shall use Amazon EC2 instances), providing high availability. And, the total number of replicas across the cluster is known as the replication factor. It is used during startup to discover the cluster.

Cassandra nodes use this list of hosts to find each other and learn the topology of the ring. There are different types of snitches available. Hence, from the above tests, it is quite clear that it is advisable to use a 6 node Cassandra cluster spread across three availability zones, with a minimum replication factor of 3 (1 replica in each of the 3 AZs), to make Cassandra tolerant of one whole Availability Zone going down. In this blog, we shall replicate data across nodes running in multiple Availability Zones (AZs) to ensure reliability and fault tolerance. As we can see, the owns field above is nil as there are no keyspaces/databases created. Replication strategy indicates the nodes where replicas are placed. (Scenario wherein an entire AZ, i.e. us-east-1a, would go down)
Connection was made to the Cluster on remaining nodes in the other AZs (us-east-1b, us-east-1c) and records were read from the table user.
All records were intact.
Node 1 and Node 2 were started.
‘nodetool -h hostname_of_Node1 repair first’ was run on Node 1.
‘nodetool -h hostname_of_Node2 repair first’ was run on Node 2.
Connection was made to the Cluster on Node 1 and Node 2 and records were read from the table user.
All records were intact.
Similar tests were done by shutting down the nodes in the us-east-1b and us-east-1c AZs to check that the records remain intact even when an entire Availability Zone goes down. We shall do so by defining the seeds key in the configuration file (cassandra.yaml).
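The reasoning behind these tests can be sketched with a small simulation: if every row has one replica in each of the three AZs, then losing any single AZ still leaves two replicas. The code below is a simplification and not how Cassandra actually places data (real placement walks the token ring); it only picks one node per AZ with a hash. The node and AZ names follow the initial setup, and the row keys are invented.

```python
import zlib

# Cluster layout from the initial setup: two nodes per availability zone.
CLUSTER = {
    "us-east-1a": ["Node 1", "Node 2"],
    "us-east-1b": ["Node 3", "Node 4"],
    "us-east-1c": ["Node 5", "Node 6"],
}


def place_replicas(key, cluster):
    """Pick one replica node per AZ for a row key (simplified placement)."""
    digest = zlib.crc32(key.encode("utf-8"))
    return {az: nodes[digest % len(nodes)] for az, nodes in cluster.items()}


def surviving_replicas(key, cluster, failed_az):
    """Return the replica nodes still reachable when failed_az is down."""
    placement = place_replicas(key, cluster)
    return [node for az, node in placement.items() if az != failed_az]


# For each possible AZ failure, find the smallest number of replicas left
# for any of 100 invented row keys.
worst_case = {}
for failed_az in CLUSTER:
    counts = [
        len(surviving_replicas("user-%d" % i, CLUSTER, failed_az))
        for i in range(100)
    ]
    worst_case[failed_az] = min(counts)
    print(failed_az, "down: at least", worst_case[failed_az], "replicas left per row")
```

With a replication factor of 3, two surviving replicas are still a majority, which is consistent with the records remaining readable in the tests.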

Integrating AWS API Gateway, Lambda and DynamoDB

On the whole, AWS API Gateway is a beneficial package for the developer. These dedicated servers are explicitly set up to handle the API calls for an application. Install the one specific to your browser. It also has a useful testing platform which can be used to test the calls. It also provides caching and monitoring.

In my next blog, I talk about securing your APIs in API Gateway, passing query strings and headers, transforming the response type and returning custom error codes. A hands-on is included. An increase in the number of API calls increases the load on the API server, which may require auto-scaling, which is costly.

The latest approach of the best architects is to utilize a new AWS service that explicitly replaces the need for a dedicated API server. Finally, a REST client will be used to call the API.

Step 1: Create a DynamoDB table:
Create a DynamoDB table named Employee with emp_id as the primary key. Use the code from the following location: https://s3-ap-southeast-1.amazonaws.com/cloudthatcode/addEmployee.zip. Use Lambda as the integration type and select the Lambda function you created earlier.
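For readers who prefer to script the table creation instead of using the console, the sketch below builds the table definition as a plain dictionary. The function name and the throughput values are assumptions made for this example, and emp_id is typed as a string because the sample payloads in this post send it as a string. The dictionary uses the parameter names of the DynamoDB CreateTable operation, so it could be passed as keyword arguments to a boto3 DynamoDB client's create_table call.

```python
def employee_table_definition(read_units=5, write_units=5):
    """Return CreateTable parameters for the Employee table.

    read_units and write_units are placeholder capacity values,
    not taken from the original post.
    """
    return {
        "TableName": "Employee",
        # emp_id is the partition (hash) key.
        "KeySchema": [{"AttributeName": "emp_id", "KeyType": "HASH"}],
        # "S" marks emp_id as a string attribute.
        "AttributeDefinitions": [
            {"AttributeName": "emp_id", "AttributeType": "S"}
        ],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }


table_definition = employee_table_definition()
print(table_definition["TableName"], table_definition["KeySchema"])
```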

Test the API with the following values from the console.
{
  "emp_id" : "2",
  "emp_Name" : "Snow",
  "emp_Salary" : "200000"
}

You should see the entries in the DynamoDB table.

To create an API, we need to create a resource, create a POST method under the resource and use the Lambda function as the integration type; the API is then ready to be deployed.

Step 4: Call the API using REST Client

Test the API setup. The next part of the blog is a detailed tutorial on how to use AWS API Gateway along with AWS Lambda & DynamoDB.

People who are familiar with DynamoDB, API Gateway and Lambda can proceed with the high-level instructions. There are extensions for Mozilla Firefox and Chrome which can be used to make REST calls. The bottleneck of this setup is that the API server has to be maintained to handle all the API calls. AWS API Gateway can act as an interface between the application and the database, using an AWS Lambda function as the backend.

To get the essence of AWS API Gateway, we need to get hands-on with it. Invoke the API URL with the following values:

{
  "emp_id" : "3",
  "emp_Name" : "Jack",
  "emp_Salary" : "200000"
}
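If no browser extension is at hand, the same call can be sketched with Python's standard library. The URL below is a placeholder (the real invoke URL comes from your own API Gateway deployment), and build_add_employee_request is a made-up helper. The example only builds the request; the final comment shows how it would be sent.

```python
import json
import urllib.request

# Placeholder invoke URL; replace with the URL of your deployed stage.
API_URL = "https://example.execute-api.us-east-1.amazonaws.com/prod/employee"


def build_add_employee_request(url, emp_id, name, salary):
    """Build a POST request carrying the employee record as JSON."""
    payload = {"emp_id": emp_id, "emp_Name": name, "emp_Salary": salary}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_add_employee_request(API_URL, "3", "Jack", "200000")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```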

Check the DynamoDB table again for the entries.

The response from DynamoDB is processed by the Lambda function and routed via API Gateway.

With minimal effort, a REST API has been created which accepts data, which in turn gets processed by a Lambda function and finally stored in a DynamoDB table. Accept the rest of the attributes as default and review the table details.

Step 2: Create a Lambda Function:

Create a node.js Lambda function called addEmployee. Next, a Lambda function is created which inserts the data into the DynamoDB table. API Gateway provides nifty features such as creating different stages for development. Replace the IAM access and secret keys in the code with those of a user that has write access to the DynamoDB table.
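The function used in this post is written in node.js and is downloaded from the location given in the instructions; the Python sketch below is not that code. It only illustrates the general shape of such a handler under a few assumptions: the event carries the three fields from the sample payloads, and the table object exposes put_item(Item=...), as a boto3 DynamoDB Table resource does. InMemoryTable is a stand-in so the example runs without AWS.

```python
REQUIRED_FIELDS = ("emp_id", "emp_Name", "emp_Salary")


def add_employee(event, table):
    """Validate the incoming record and write it to the given table.

    table is any object with a put_item(Item=...) method; in a real
    deployment it would be a DynamoDB table handle.
    """
    missing = [field for field in REQUIRED_FIELDS if field not in event]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    item = {field: str(event[field]) for field in REQUIRED_FIELDS}
    table.put_item(Item=item)
    return {"status": "ok", "emp_id": item["emp_id"]}


class InMemoryTable:
    """Stand-in for a DynamoDB table, keyed by emp_id (for local runs only)."""

    def __init__(self):
        self.items = {}

    def put_item(self, Item):
        self.items[Item["emp_id"]] = Item


table = InMemoryTable()
result = add_employee(
    {"emp_id": "2", "emp_Name": "Snow", "emp_Salary": "200000"}, table
)
print(result, table.items["2"])
```

Passing the table in as an argument keeps the handler testable and avoids embedding credentials in the function body.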

Step 3: Create an API in API Gateway:

Next, create an API called Employee_API, a resource called employee and a POST method. If you have any queries on API Gateway or have trouble getting it set up, feel free to drop a question into the comment section below, or ask your questions at https://forum.cloudthat.com

Thank you. In the current world of APIs, every mobile application and website has to communicate using dedicated API servers. Also, for people who are new to these services, there are detailed instructions which can be followed for step-by-step guidance.

The first step will be to create a DynamoDB table which stores the data. This API is ready for use and can be used in any application.

API Gateway frees the developer from maintaining infrastructure for APIs. API servers act as an intermediary between the application and the database. An API Gateway is set up to trigger the Lambda function. For this, install a REST client.