Extend your access patterns with ease
We are all very familiar with relational database management systems (RDBMS). We know how to work with them, what the best practices are, and what we can do with them. But an RDBMS runs into problems as the load grows. With a high volume of data, queries may take a really long time to finish, especially if they use JOIN operations.
For that reason, NoSQL databases are on the rise today. They solve the scaling issue. However, as with everything in the IT world, they are not an ideal solution but rather a tradeoff. NoSQL databases generally take up more space than traditional relational databases because the data is not normalized. The learning curve might also be an issue.
If we come from the relational world, learning DynamoDB can be a real headache. Some of its features need explanation. That’s why in this tutorial I’m going to explain what a Global Secondary Index is, why it is useful, and how to create one.
Before moving straight to indexes, let’s briefly talk about data modeling in DynamoDB and single-table design. I will compare DynamoDB extensively to an RDBMS to show you the differences.
When modeling relational data, you have the rules of normalization to follow. You know what to do. Generally, you don’t care how you will query your data. Modeling is only about having an optimal, non-redundant data model. Does your application require data from a few tables? No problem, a simple JOIN operation will do the job. The number of operations you can run on your tables is virtually unlimited. With SQL, you have total flexibility. But in the world of DynamoDB, none of the above applies. Seriously, you can throw that knowledge into the dustbin.
You have to change the way you think to work with DynamoDB. First, to model the data, you have to start from the access patterns of your app. I mean, you have to know the exact queries you will run before you even have the table!
To understand that, let’s look at an example. Suppose we are developing a simple note app :). Users will save their notes. Each note will have a date, content, and a category. In an RDBMS, we would probably end up with a relation like this:
How can we save the same relation in DynamoDB? As I said, we have to know the access patterns first. Initially, we will need three operations:
- Add a new note
- Find user’s notes
- Delete user’s notes
Saving an item is not an access pattern, so we only have to consider two scenarios. To find a user’s notes, we can use the user’s email. It’s a good starting point. Then, to delete a note, we have to identify the exact one. The user’s email together with the creation timestamp should do the job. So, with these criteria, we can model the table with the user’s email as the hash key and the creation timestamp as the sort key. This way, we fulfill our requirements.
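To make the key design concrete, here is a toy in-memory model in Python. It is not DynamoDB code; it only mimics how a composite primary key behaves, assuming hypothetical attribute names `email` (hash key) and `createdAt` (a numeric sort key holding the creation timestamp).

```python
from collections import defaultdict


class NotesTable:
    """Toy model of a table keyed by (email, createdAt). Hypothetical, not an AWS API."""

    def __init__(self):
        # hash key (email) -> {sort key (createdAt) -> item}: one "item collection" per user
        self._collections = defaultdict(dict)

    def put(self, item):
        # Writing the same (email, createdAt) pair again overwrites, like PutItem
        self._collections[item["email"]][item["createdAt"]] = item

    def query(self, email):
        # Access pattern 1: all notes of one user, ordered by the sort key
        collection = self._collections.get(email, {})
        return [collection[ts] for ts in sorted(collection)]

    def delete(self, email, created_at):
        # Access pattern 2: the full composite key identifies exactly one note
        self._collections.get(email, {}).pop(created_at, None)


if __name__ == "__main__":
    table = NotesTable()
    table.put({"email": "ann@example.com", "createdAt": 200, "content": "Buy milk"})
    table.put({"email": "ann@example.com", "createdAt": 100, "content": "Call Bob"})
    print([n["content"] for n in table.query("ann@example.com")])
```

The hash key alone answers "find the user’s notes", while deleting needs both halves of the key, which is exactly why the timestamp is part of it.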
There are at least four ways to implement a one-to-many relationship in DynamoDB. We have just used one pattern: a composite primary key combined with the Query API action. Below, you can see the table with example data:
A user and all of their notes are grouped into one item collection (all items sharing the same hash key). To create the same table as mine, you can use the AWS CLI and this command:
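The original command isn’t reproduced here. As a hedged sketch, these are the table parameters written as a Python dict in the shape DynamoDB’s CreateTable API expects. The table name `notes`, the attribute names, and the capacity of 5 units are assumptions. The same structure can be saved to a file and passed to `aws dynamodb create-table --cli-input-json file://table.json`, or unpacked into boto3’s `create_table`.

```python
import json

# Hypothetical names: "notes", "email" and "createdAt" are placeholders; capacity values are assumptions.
CREATE_TABLE_PARAMS = {
    "TableName": "notes",
    "AttributeDefinitions": [
        {"AttributeName": "email", "AttributeType": "S"},      # hash key: string
        {"AttributeName": "createdAt", "AttributeType": "N"},  # sort key: epoch timestamp
    ],
    "KeySchema": [
        {"AttributeName": "email", "KeyType": "HASH"},
        {"AttributeName": "createdAt", "KeyType": "RANGE"},
    ],
    "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
}

if __name__ == "__main__":
    # With boto3 installed and credentials configured, you could run:
    #   boto3.client("dynamodb", region_name="<your-region>").create_table(**CREATE_TABLE_PARAMS)
    print(json.dumps(CREATE_TABLE_PARAMS, indent=2))
```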
To create, query, and delete notes, you can use these functions:
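The original functions aren’t shown either. The following sketch, assuming the same hypothetical `notes` table and attribute names, builds the request parameters for boto3’s low-level `put_item`, `query`, and `delete_item` calls. Values are wrapped in DynamoDB type descriptors (`S` for a string, `N` for a number sent as a string), and the key name in the query goes through `ExpressionAttributeNames` so reserved words can never clash.

```python
def put_note_params(table, email, created_at, content, category):
    # PutItem: every attribute carries its DynamoDB type descriptor
    return {
        "TableName": table,
        "Item": {
            "email": {"S": email},
            "createdAt": {"N": str(created_at)},
            "content": {"S": content},
            "category": {"S": category},
        },
    }


def query_notes_params(table, email):
    # Query: the key condition must match the hash key with equality
    return {
        "TableName": table,
        "KeyConditionExpression": "#email = :email",
        "ExpressionAttributeNames": {"#email": "email"},
        "ExpressionAttributeValues": {":email": {"S": email}},
    }


def delete_note_params(table, email, created_at):
    # DeleteItem: the full composite key (hash + sort) identifies one note
    return {
        "TableName": table,
        "Key": {"email": {"S": email}, "createdAt": {"N": str(created_at)}},
    }
```

Usage would look like `boto3.client("dynamodb").query(**query_notes_params("notes", "ann@example.com"))`.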
Great. After this rather long introduction, we are finally ready to talk about Global Secondary Indexes (GSIs for short). Now you know that data modeling in DynamoDB is all about access patterns. What happens if our access patterns change or a new one appears? Or maybe, already during the first iteration of modeling, we see that we cannot cover all access patterns with one composite key?
That’s quite a common scenario. Our example is elementary. In more complex tables, it’s almost certain we’re going to have such problems.
A GSI extends a DynamoDB table with new access patterns. It has the same structure as the primary key: you define a hash key and optionally a sort key, but you can build them from other attributes of the item. Under the hood, DynamoDB copies your items into the index, reshaped around the new key, so that you can query them just as quickly.
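To illustrate the "reshaped copy" idea, here is another toy Python model (hypothetical helpers, not an AWS API). It builds an index from the base items keyed by other attributes, for example `email` and a `category` attribute. Items missing an index key attribute are skipped, which mirrors how DynamoDB leaves such items out of a GSI.

```python
from collections import defaultdict


def build_gsi(items, hash_attr, range_attr, projected=None):
    """Toy model of a GSI: re-key copies of the base items by other attributes.

    Hypothetical helper, not an AWS API. projected=None copies every attribute
    (like ProjectionType ALL); otherwise only the index keys plus the listed
    attributes are copied (real GSIs also always keep the base table keys,
    so list them in `projected`).
    """
    index = defaultdict(list)
    for item in items:
        # Items missing an index key attribute get no index entry (sparse index)
        if hash_attr not in item or range_attr not in item:
            continue
        if projected is None:
            entry = dict(item)
        else:
            keep = {hash_attr, range_attr, *projected}
            entry = {k: v for k, v in item.items() if k in keep}
        index[item[hash_attr]].append(entry)
    # Within each index partition, entries are ordered by the index sort key
    for entries in index.values():
        entries.sort(key=lambda entry: entry[range_attr])
    return dict(index)


def query_gsi(index, hash_value, range_attr, range_value):
    # Equality on both index keys; a linear pass is fine for a toy model
    return [e for e in index.get(hash_value, []) if e[range_attr] == range_value]
```

The base items are never modified; the index is a separate, differently keyed copy, which is also why it can only be read from.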
The important thing is that a GSI can only be used for reads. You cannot write data through a GSI. By default, each DynamoDB table can have up to 20 Global Secondary Indexes. Still, if you have more access patterns, it’s not a big deal. There is a pattern called index overloading, but it’s a big topic and I won’t cover it in this tutorial.
Back to our example. Let’s imagine that in our app, users can label each note with a category. Then, a user can fetch all notes from a specific category. With the current key, we could only fetch all of the user’s notes and then filter them by category. Definitely not an ideal solution. To add searching by category, we have to define a new GSI. The access pattern will be as follows:
- Fetch the subset of a user’s notes assigned to category XYZ
The GSI will be similar to our primary key. The hash key will be the same: the user’s email as a string field. The range key will differ: this time, we set the category attribute as the range key. In fact, had we known about this access pattern during the first iteration, we could have created a Local Secondary Index (LSI) instead of a GSI, because it shares the table’s hash key (an LSI can only be created together with the table). Below, you can see the table reshaped by the new index:
You can create the GSI in the AWS Console, or you can type this command:
For better readability, I’m also placing the same JSON below, unstringified. The content is identical, but it’s much easier to read with proper formatting.
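Since the command and the JSON aren’t reproduced here, this sketch builds an equivalent UpdateTable request in Python and produces both the compact (stringified) and the indented form. The index name `category-index`, the table name `notes`, and the capacity values are assumptions. The compact string is what you would pass to the CLI’s `--global-secondary-index-updates` option; the whole dict could instead be passed to boto3’s `update_table`.

```python
import json

# Hypothetical names: table "notes", index "category-index"; capacity values are placeholders.
GSI_CREATE = {
    "Create": {
        "IndexName": "category-index",
        "KeySchema": [
            {"AttributeName": "email", "KeyType": "HASH"},
            {"AttributeName": "category", "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "ALL"},  # copy every attribute into the index
        "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
    }
}

UPDATE_TABLE_PARAMS = {
    "TableName": "notes",
    # Every attribute used in the new index key must be declared with its type
    "AttributeDefinitions": [
        {"AttributeName": "email", "AttributeType": "S"},
        {"AttributeName": "category", "AttributeType": "S"},
    ],
    "GlobalSecondaryIndexUpdates": [GSI_CREATE],
}

# Compact ("stringified") JSON, suitable as a single CLI argument
stringified = json.dumps(UPDATE_TABLE_PARAMS["GlobalSecondaryIndexUpdates"], separators=(",", ":"))

# The same content, indented for reading
pretty = json.dumps(UPDATE_TABLE_PARAMS["GlobalSecondaryIndexUpdates"], indent=2)

if __name__ == "__main__":
    print(stringified)
    print(pretty)
```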
To create the GSI, you have to use the update-table API. Updating a table covers several kinds of changes, and creating indexes is one of them. First, you set the name of the table you are updating (--table-name) and its region (--region).
The --attribute-definitions option declares the names and types of the attributes used in the index’s key. Finally, for --global-secondary-index-updates, you have to provide JSON with the index definition. The option takes a list of actions (Create, Update, or Delete), but DynamoDB only lets you create or delete one global secondary index per update-table call. In our case, we create just one index.
The index JSON has to contain a few parameters. First is IndexName, the index’s name. The next one is KeySchema, which is the index’s primary key definition. An interesting one is the Projection parameter. When creating an index for a DynamoDB table, you decide which attributes to copy into the index. You don’t have to copy all of them (besides ALL, there are the KEYS_ONLY and INCLUDE projection types). In our case, it is useful to have them all, so I used the ALL projection type.
Finally, for the GSI, we have to set its own ProvisionedThroughput. The index may receive more or less traffic than the table’s primary key, so it can be given different capacity values.
Now you can test the new index by querying for the data:
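As a sketch of such a query with boto3’s low-level client, here is a parameter builder assuming the hypothetical names `notes`, `category-index`, `email`, and `category`. `IndexName` corresponds to the CLI’s `--index-name` flag; the key condition now also matches the index’s range key.

```python
def query_by_category_params(table, index_name, email, category):
    # Query a GSI: same API as a table query, plus IndexName
    return {
        "TableName": table,
        "IndexName": index_name,
        # Equality on the index hash key (email) and range key (category)
        "KeyConditionExpression": "#email = :email AND #category = :category",
        "ExpressionAttributeNames": {"#email": "email", "#category": "category"},
        "ExpressionAttributeValues": {
            ":email": {"S": email},
            ":category": {"S": category},
        },
    }
```

It would be used as `boto3.client("dynamodb").query(**query_by_category_params("notes", "category-index", "ann@example.com", "shopping"))`.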
The key difference between this query and the previous one is the --index-name flag.
That’s the whole story about GSIs. I hope you have found this explanation easy and useful. When working with DynamoDB, GSIs are one of the most important tools in a developer’s hands. So take the time to learn them properly, and you will get great results.
Have a wonderful day, and see you next time!