In my last blog post, Enterprise Reference Implementation for Conversation Aggregation, I went over all the features and capabilities that a conversation-enabled application might want to include. Today, we will discuss a specific aspect of that implementation: what your storage strategy might look like.
This blog post is relevant to everyone regardless of your chosen implementation, whether you decide to use the Reference Implementation or implement your own design.
We had previously identified these key takeaways while designing your enterprise-grade conversation application from a storage perspective:
- Need extensible and scalable backend storage that doesn’t impact the availability
- Performant data access under a heavy workload, with a read/write ratio of roughly 50/50
- Ability to query for relationships between data points
However, there are also other storage-related concerns we didn’t call out in that blog that are just as important but tend to lean more toward your solution’s deployment and operational aspects.
These include:
- Backup and/or disaster recovery solution
- Data mobility
- Regulatory compliance
- Hybrid or multi-cloud architecture
We will cover those topics in this blog, as well as discuss how they apply to your application regardless of design and implementation. Let’s get started!
The Usual Suspects: High Availability, Scalability, etc.
If you have spent any time in a tech role, you know the following software solution requirements quite well:
- Need extensible and scalable backend storage that doesn’t impact the availability
- Backup and/or disaster recovery solution
- Regulatory compliance
If you haven’t yet, you’ll learn at some point, probably sooner rather than later, and hopefully without losing your job over the experience. It is a beloved rite of passage in many people’s careers in tech, often reminisced about with a “you have to get a load of this” kind of vibe. Still, we should pay these requirements their proper respects in the hope that newcomers heed all the warnings.
We all know that we want to pick both physical storage and software/database solutions that are highly available, incurring near zero downtime, and are scalable to grow with our needs. Still, one aspect that gets overlooked is having a backup/recovery story which is far different from a disaster recovery story. Let’s dive into that a little more.
If you are storing both conversation data and its metadata, you probably don’t want to lose that information. Should you belong to the club of applications that require data retention compliance due to legal, financial, medical, etc. requirements, then you must keep data around for your business to operate. Losing that information isn’t the only concern, either; you also need to be able to recall that information, or a document, as it looked at a specific point in time.
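To make the point-in-time idea concrete, here is a minimal sketch of an append-only version history with an "as of" lookup. The `VersionedDocument` class and its method names are hypothetical helpers I made up for illustration; a real system would lean on its database's or storage platform's own snapshot or versioning features.

```python
import bisect
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class VersionedDocument:
    """Append-only history of one document (hypothetical helper)."""

    timestamps: List[datetime] = field(default_factory=list)  # kept sorted
    bodies: List[str] = field(default_factory=list)

    def save(self, body: str, at: datetime) -> None:
        # Assumption: versions are saved in chronological order.
        if self.timestamps and at < self.timestamps[-1]:
            raise ValueError("versions must be saved in chronological order")
        self.timestamps.append(at)
        self.bodies.append(body)

    def as_of(self, when: datetime) -> Optional[str]:
        """Return the body as it looked at `when`, or None if it did not exist yet."""
        index = bisect.bisect_right(self.timestamps, when)
        return self.bodies[index - 1] if index else None


doc = VersionedDocument()
doc.save("Jane: I'm going on vacation next week.", datetime(2024, 1, 1, tzinfo=timezone.utc))
doc.save("Jane: I'm going on vacation next month.", datetime(2024, 2, 1, tzinfo=timezone.utc))

# What did the record look like in mid-January, before the edit?
mid_january = doc.as_of(datetime(2024, 1, 15, tzinfo=timezone.utc))
before_creation = doc.as_of(datetime(2023, 12, 1, tzinfo=timezone.utc))
```

Nothing is ever overwritten here, which is the property that retention-style requirements tend to care about: older versions stay retrievable after newer ones are saved.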
This topic has been around for a while. Thankfully, there are a TON of great articles that talk about physical storage strategies and others about achieving high availability on database platforms. So I’m setting that off to the side to move on to more exciting discussions.
For those that are curious and might not have had a past in the storage world, disaster recovery is basically what happens when an entire site is wiped off the face of the planet. Would your business continue to function or bounce back quickly? That’s disaster recovery, but this never happens, right?
Making the Data Work for You
An interesting topic on our list is having a software storage platform or datastore to store and retrieve conversation data. The breakdown of conversations is quite interesting, and the relationship between what is said might be as important as the content itself. Let’s do this with a straightforward example.
Jane: I’m going on vacation next week.
John: Where are you going?
Jane: Hawaii.
Simple enough. If we break down this conversation with the intent of storing it in a datastore, we can easily see two individuals involved: John and Jane. A total of three sentences are spoken, one of which is a question. If we strictly look at the data, meaning tangible things like the words and letters in the conversation above, we might say we have stored all the “data” (words, sentences, questions, etc.) in that conversation.
However, let’s look at non-tangible things, or things that we instinctively know based on experience. There is additional information, or metadata, in this short conversation that we need to capture as well. Metadata is used here in its literal sense: data about other data.
These are things we know but fall more in the category of metadata (with more obvious examples at the top):
- Who says what matters – switch around the names, and it wouldn’t make sense
- Order matters – mix up the order in which sentences were stated
- One sentence is unique; that sentence is actually a question
- A topic about going on vacation is mentioned
- A location (or if we get meta, an entity or thing) is given as an answer
This raises the question: How do we capture these relationships? And how do we do it efficiently so we aren’t duplicating copies of data within the system? This naturally lends itself to using Graph databases, because these databases prioritize relationships between datasets as much as the data itself. In other databases, you might see something like foreign keys used to associate one piece of data with another, even when neither piece is atomic or stored as a single instance. That just makes my insides want to cry a little!
If we were to store this in a Graph database, it could look something like the picture below, with the Nodes representing data and the arrows representing relationships between the data.
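For readers who think better in code, here is a rough sketch of the same idea using plain Python structures: a dictionary of nodes and a list of directed edges. The node ids, relationship names (`SAID`, `FOLLOWED_BY`, `MENTIONS`, `ANSWERS`), and helper functions are all assumptions I picked for illustration, not the schema of any particular Graph database.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Edge:
    source: str    # id of the node the arrow starts from
    relation: str  # label on the arrow
    target: str    # id of the node the arrow points to


# Nodes hold the data itself; each piece of data is stored exactly once.
nodes = {
    "jane": {"type": "Person", "name": "Jane"},
    "john": {"type": "Person", "name": "John"},
    "m1": {"type": "Message", "text": "I'm going on vacation next week."},
    "m2": {"type": "Message", "text": "Where are you going?", "is_question": True},
    "m3": {"type": "Message", "text": "Hawaii."},
    "vacation": {"type": "Topic", "name": "Vacation"},
    "hawaii": {"type": "Entity", "name": "Hawaii"},
}

# Edges hold the metadata: who said what, in which order, about what.
edges = [
    Edge("jane", "SAID", "m1"),
    Edge("john", "SAID", "m2"),
    Edge("jane", "SAID", "m3"),
    Edge("m1", "FOLLOWED_BY", "m2"),
    Edge("m2", "FOLLOWED_BY", "m3"),
    Edge("m1", "MENTIONS", "vacation"),
    Edge("m3", "ANSWERS", "m2"),
    Edge("m3", "MENTIONS", "hawaii"),
]


def neighbors(source: str, relation: str) -> List[str]:
    """Follow every arrow with the given label that starts at `source`."""
    return [e.target for e in edges if e.source == source and e.relation == relation]


def said_by(person: str) -> List[str]:
    """Return the text of every message a person said, in insertion order."""
    return [nodes[message]["text"] for message in neighbors(person, "SAID")]


jane_lines = said_by("jane")
answered = neighbors("m3", "ANSWERS")
```

Notice that the metadata from the list above (speaker, order, question/answer, topic, entity) lives entirely in the edges and node attributes, so no message text is ever copied.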
Then, when trying to recall specific data or a specific interaction, you can create a query on relationships to retrieve data on both ends of said relationship. Pretty cool.
Suppose you wanted to query all conversations that mentioned the Topic of “Traveling to Hawaii”. Those queries will look vastly different depending on the type of database you are using. On a traditional relational database, you might have a Topics table, a Messages table, and a Topic-to-Message “relation” table with one foreign key pointing to Topics and another pointing to Messages. In a Graph-type database, the relationship itself is queryable. That query could be as simple as: give me all Topic relationships where the value equals “Traveling to Hawaii”.
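Here is a small side-by-side sketch of that difference. The relational half uses Python's built-in `sqlite3` module with the three tables described above; the graph half is only a plain-Python stand-in for a Graph database (a list of edges filtered by relationship), not a real graph query language. The table names, column names, the `HAS_TOPIC` label, and which messages are tagged with the topic are all assumptions for illustration.

```python
import sqlite3

# --- Relational: three tables and two joins ---
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE topics (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE messages (
        id INTEGER PRIMARY KEY,
        speaker TEXT NOT NULL,
        body TEXT NOT NULL
    );
    CREATE TABLE topic_to_message (
        topic_id INTEGER NOT NULL REFERENCES topics(id),
        message_id INTEGER NOT NULL REFERENCES messages(id)
    );
    INSERT INTO topics VALUES (1, 'Traveling to Hawaii');
    INSERT INTO messages VALUES
        (1, 'Jane', 'I''m going on vacation next week.'),
        (2, 'John', 'Where are you going?'),
        (3, 'Jane', 'Hawaii.');
    INSERT INTO topic_to_message VALUES (1, 1), (1, 3);
    """
)

rows = conn.execute(
    """
    SELECT m.body
    FROM messages AS m
    JOIN topic_to_message AS tm ON tm.message_id = m.id
    JOIN topics AS t ON t.id = tm.topic_id
    WHERE t.name = ?
    ORDER BY m.id
    """,
    ("Traveling to Hawaii",),
).fetchall()
relational_result = [row[0] for row in rows]
conn.close()

# --- Graph-style stand-in: the relationship itself is what we filter on ---
topic_edges = [
    ("I'm going on vacation next week.", "HAS_TOPIC", "Traveling to Hawaii"),
    ("Where are you going?", "IS_A", "Question"),
    ("Hawaii.", "HAS_TOPIC", "Traveling to Hawaii"),
]
graph_result = [
    source
    for source, relation, target in topic_edges
    if relation == "HAS_TOPIC" and target == "Traveling to Hawaii"
]
```

Both halves return the same two messages. The difference is where the relationship lives: in the relational version it is spread across a join table and two foreign keys, while in the graph-style version it is a first-class thing you ask for directly.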
Data Mobility Means Portability and Selectivity
The last storage topic, one that I find fascinating and that brings the previous two topics together, is data mobility. What is data mobility? It’s the ability to relocate your data from one location, for example, AWS, to another location, say an on-premise data center, and also run your application from either of those locations, ideally simultaneously.
This isn’t a new concept, but it’s often dismissed as something we don’t need to worry about. The tech industry has alleviated most of these concerns due to the things mentioned above, like highly available systems with disaster recovery strategies using redundant systems across multiple regions, etc.
Like all things in life, nothing ever goes as planned. Otherwise, we wouldn’t need a plan B or C and all of the other contingency plans that come after them. Beyond just reading your cloud or database vendor’s service level agreement (SLA) and praying they can actually walk the walk, a good enterprise architecture gives you the ability to:
- Relocate your database from a cloud provider to on-premise or vice versa
- Have multiple instances of your database run concurrently in multiple cloud providers
- Give the option (with some effort) to change the underlying database technology
You’re probably saying to yourself, why do I care? I made a good choice, and my cloud provider isn’t going anywhere. You would probably be correct, and I wouldn’t dispute that. BUT, what happens if a few years down the road your cloud provider starts charging you 10x for its services? Now it’s starting to sound like data mobility is something you might be interested in. It should also go without saying that data mobility isn’t real until you have tried the migration plan at least once, and realistically it should be rehearsed periodically to make sure the plan is still viable.
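One way to picture a migration rehearsal is sketched below: keep the application talking to a small storage interface, copy every record from one backend to another, and verify the copy with a content fingerprint. Everything here is hypothetical (the `ConversationStore` interface, the `InMemoryStore` stand-in for both the cloud and on-premise backends, and the `fingerprint` and `migrate` helpers); a real rehearsal would run against your actual databases and their own export tooling.

```python
import hashlib
import json
from typing import Dict, Iterable, Protocol, Tuple


class ConversationStore(Protocol):
    """Hypothetical minimal interface the application depends on."""

    def put(self, key: str, record: dict) -> None: ...

    def items(self) -> Iterable[Tuple[str, dict]]: ...


class InMemoryStore:
    """Stand-in backend; imagine one of these per location."""

    def __init__(self) -> None:
        self._records: Dict[str, dict] = {}

    def put(self, key: str, record: dict) -> None:
        self._records[key] = dict(record)

    def items(self) -> Iterable[Tuple[str, dict]]:
        return list(self._records.items())


def fingerprint(store: ConversationStore) -> str:
    """Hash every record in key order so two stores can be compared cheaply."""
    digest = hashlib.sha256()
    for key, record in sorted(store.items(), key=lambda item: item[0]):
        digest.update(json.dumps([key, record], sort_keys=True).encode("utf-8"))
    return digest.hexdigest()


def migrate(source: ConversationStore, target: ConversationStore) -> bool:
    """Copy all records, then report whether both sides now match."""
    for key, record in source.items():
        target.put(key, record)
    return fingerprint(source) == fingerprint(target)


cloud = InMemoryStore()
on_prem = InMemoryStore()
cloud.put("m1", {"speaker": "Jane", "text": "I'm going on vacation next week."})
cloud.put("m2", {"speaker": "John", "text": "Where are you going?"})
cloud.put("m3", {"speaker": "Jane", "text": "Hawaii."})

rehearsal_passed = migrate(cloud, on_prem)
```

Because the application only knows about the interface, swapping the backend behind it (or running two at once) becomes a deployment decision rather than a rewrite, and the fingerprint check gives you a simple pass/fail signal to run on a schedule.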
Conclusions
So, what can you take away from all of this? There are three main things:
1. Storage is non-trivial even to this day. Humans are making far more interesting but far more complex systems we need to care for.
2. Enterprise applications don’t mean large for the sake of being large. It’s important to create capabilities and contingency plans from A to Z with a team to help you along the way.
3. Applications that process conversation insights have different software storage requirements, and you should find a solution that best fits your needs.
These are the storage-related reasons; there are many more diverse (non-storage) reasons that are out of scope for this post. Together, they are why we created an enterprise reference implementation for processing conversations in your business. It can help cut down on your implementation time and build a robust application with very little effort.
In our next blog in this series, we will look at specifics on how easy, effective, and light on code it can be to implement your own application using a completely reusable codebase and project.