How to organize information and leverage DynamoDB features for advanced ways of accessing data
In comparison to relational databases (RDBMS), DynamoDB requires a different approach to data storage and access patterns modeling.
RDBMS support ad hoc queries that are computed on demand, allowing for flexible access patterns. Usually, developers don’t have to think too much about how they will need to access the data in the future. This comes with a few disadvantages, though:
DynamoDB solves these issues by offering high scalability and fast, predictable queries at any scale. It does require developers to think in advance about how data will need to be accessed later.
Adjusting DynamoDB to support different querying models later is possible. Nonetheless, such an adjustment is usually more expensive in DynamoDB than what developers are used to in an RDBMS.
The following sections cover strategies to enable flexible and advanced querying patterns in DynamoDB.
Consider a table that contains professional profiles (think of it as a version of LinkedIn). The base table’s primary-key is the user ID. People can be based in different cities. Retrieving all users based in New York, NY, USA, for example, would require a Scan, which is inefficient.
A global secondary index[^1] can arrange users by the location attribute. In this case, the city, state, country values would become the primary-key in the index. Querying with primary-key == "New York, NY, USA" would then return the results in a fast and efficient way.
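As a minimal sketch, such a query could be expressed with the boto3 low-level client request format. The table name (users) and index name (location-index) are assumptions for illustration, not from the original article:

```python
# Sketch: building Query parameters for a hypothetical "location-index" GSI.
# Table and index names are illustrative assumptions.

def build_location_query(location):
    """Return DynamoDB Query parameters for all users in a given location."""
    return {
        'TableName': 'users',
        'IndexName': 'location-index',
        # "location" is a DynamoDB reserved word, so an expression
        # attribute name placeholder is required.
        'KeyConditionExpression': '#loc = :loc',
        'ExpressionAttributeNames': {'#loc': 'location'},
        'ExpressionAttributeValues': {':loc': {'S': location}},
    }

params = build_location_query('New York, NY, USA')
# With a real boto3 client: client.query(**params)
```

Building the request as a plain dict keeps the key logic testable without touching AWS; the actual client.query call is the only part that needs credentials.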
A combination of attributes is commonly needed when querying. Following the example above, suppose the application needs to query by location and employer. Say someone needs to retrieve all professionals based in New York, NY, USA who work for Company XYZ.
A simple secondary index as outlined above wouldn’t be enough. There are two ways to support querying combined attributes:
Concatenating location and employer values

Each item has a location_employer attribute whose value is the original attribute values concatenated. This artificial attribute is then used as the primary-key of a secondary index. The following query returns what the application needs: location_employer == "New York, NY, USA_Company XYZ".
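Under the same illustrative naming (a GSI called location-employer-index, keyed on the artificial attribute), the combined query could be sketched as:

```python
# Sketch: querying a hypothetical GSI keyed on the artificial
# location_employer attribute. All names are illustrative assumptions.

def build_combined_query(location, employer):
    """Build Query parameters matching one exact location/employer pair."""
    return {
        'TableName': 'users',
        'IndexName': 'location-employer-index',
        'KeyConditionExpression': 'location_employer = :v',
        'ExpressionAttributeValues': {
            # The key value mirrors how the attribute was written:
            # the two original values joined with an underscore.
            ':v': {'S': f'{location}_{employer}'},
        },
    }

params = build_combined_query('New York, NY, USA', 'Company XYZ')
```

Note that the query only works if the separator used here matches the one used when the attribute was written.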
DynamoDB does not build this type of attribute automatically. The logic to build and keep the location_employer attribute up-to-date must be implemented in the application backend.
Good programming practices must be followed in order to ensure data integrity. Especially DRY: there should be only one place within the application responsible for inserting and updating the user object data. Developers then only have to maintain that part of the application to keep location_employer correct and up-to-date.
Some programming languages offer features such as decorators and property objects. In Python, for example, it’s possible to create a user object that takes care of the artificial attribute by itself:
```python
class User:
    def __init__(self, user_id, name, location, employer, *args, **kwargs):
        self.user_id = user_id
        self.name = name
        self._location = location
        self._employer = employer
        self._location_employer = f'{location}_{employer}'

    @property
    def location(self):
        return self._location

    @location.setter
    def location(self, new_location):
        self._location = new_location
        # When a new location is set, it automatically updates the combined attribute
        self._location_employer = f'{new_location}_{self.employer}'

    @property
    def employer(self):
        return self._employer

    @employer.setter
    def employer(self, new_employer):
        self._employer = new_employer
        # Likewise, setting a new employer refreshes the combined attribute
        self._location_employer = f'{self.location}_{new_employer}'
```
Instead of using a secondary index based on an artificial attribute, developers may also insert additional items to support combined attribute queries.
Consider a new user is being created:
{
'primary_key': 123, // User ID
'sort-key': 1234567890, // Timestamp of user registration
'name': 'John Doe',
'location': 'New York, NY, USA',
'employer': 'Company XYZ'
}
The following additional item is inserted in the same table:
{
'primary_key': 'location_employer_New York, NY, USA_Company XYZ',
'sort-key': 123,
}
Notice the primary-key pattern: it starts with what was the attribute name in the previous approach (location_employer), then concatenates the values for that particular user (New York, NY, USA_Company XYZ). The sort-key contains the User ID, serving as a reference to the original user item.

When querying this table, the application can use: primary-key == "location_employer_New York, NY, USA_Company XYZ". Once one or more items are returned, the application extracts the User IDs from the sort-keys and issues another read request to retrieve the users’ information.
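A sketch of that second read, using the same illustrative table name as the earlier examples. Only the partition key is known from the additional item's sort-key (the base item's registration-timestamp sort key is not), so a Query on the partition key is used rather than GetItem:

```python
# Sketch: fetching the base user item referenced by an additional item.
# Table name and attribute names are illustrative assumptions.

def build_user_lookup(user_id, table_name='users'):
    """Build Query parameters to fetch a user's base item by partition key."""
    return {
        'TableName': table_name,
        'KeyConditionExpression': 'primary_key = :id',
        # A key attribute must have a single type per table; since the
        # additional items use string keys, user IDs are stored as strings.
        'ExpressionAttributeValues': {':id': {'S': str(user_id)}},
    }

# One lookup per User ID extracted from the returned sort-keys
requests = [build_user_lookup(uid) for uid in ('123', '456')]
```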
If the application is read-intensive, it might be a good idea to project (or copy) the entire user information into the additional items to spare the second read request. This increases storage usage, though, so it should be weighed carefully.
The same warning applies: the application must follow good practices - especially DRY - in order to keep the additional items consistent and up-to-date with the base user item.
When writing to tables following this pattern, it is highly recommended to wrap requests in transactions[^2]. A transactional write ensures that the user item is never inserted/updated if the additional item fails to insert/update, and vice versa.
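Such a transactional write could be sketched with the TransactWriteItems request shape. Table name and attribute layout follow the earlier examples and are illustrative; since each key attribute must have a single type per table, all key values are stored as strings here:

```python
# Sketch: a TransactWriteItems payload that inserts the user item and its
# additional lookup item atomically. Names and shapes are illustrative.

def build_user_transaction(user_id, registered_at, name, location, employer,
                           table_name='users'):
    combined = f'location_employer_{location}_{employer}'
    return {
        'TransactItems': [
            # The base user item
            {'Put': {
                'TableName': table_name,
                'Item': {
                    'primary_key': {'S': str(user_id)},
                    'sort-key': {'S': str(registered_at)},
                    'name': {'S': name},
                    'location': {'S': location},
                    'employer': {'S': employer},
                },
            }},
            # The additional item supporting the combined-attribute query
            {'Put': {
                'TableName': table_name,
                'Item': {
                    'primary_key': {'S': combined},
                    'sort-key': {'S': str(user_id)},
                },
            }},
        ],
    }

tx = build_user_transaction(123, 1234567890, 'John Doe',
                            'New York, NY, USA', 'Company XYZ')
# With a real boto3 client: client.transact_write_items(**tx)
```

Because both Puts live in one transaction, either both items land in the table or neither does, which is exactly the integrity guarantee the pattern needs.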
[^1] Refer to the Secondary Indexes page
[^2] Refer to the Operations and Data Access > Transactions and Conditional Updates page