Backend Insights #2: The need for polymorphic databases, cloud grey failures & DB design models
Hello, folks, hope you are doing fantastic. This post is a bit longer than the previous one, and longer than I intend to keep posts for this newsletter, but I promise you'll find it interesting.
With that being said, let's get on with it.
Grey failures: the Achilles heel of cloud-scale systems
Our infrastructure is facing a lot of grey failures. What would you make of this statement?
We add redundancy to our system components to ensure availability, but what if the system does not detect infrastructure problems rapidly and reliably enough to activate that redundancy?
Production systems are plagued by grey failures: partial or intermittent failures that the system as a whole finds hard to detect and diagnose, unlike other failures that are more definitive in nature.
Grey failure symptoms are not immediately obvious and may come across in the form of performance degradation, random packet loss, flaky I/O, memory thrashing, capacity pressure, non-fatal exceptions, etc.
As our systems grow in complexity, grey failures become more common and are a major cause of cloud incidents.
One instance of this is when a system component fails to serve requests, but its heartbeat module works fine. The error detection module relies on the component's heartbeats to ascertain that the component is healthy and thus detects no failure.
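The heartbeat scenario above can be sketched in a few lines of Python. This is a toy illustration, not any real detection framework: the component's heartbeat module stays healthy while its request-serving path is broken, so a detector that trusts the heartbeat alone misses the grey failure, while a probe through the real request path surfaces it.

```python
class Component:
    """Toy component whose heartbeat module is alive even though
    its request-serving path is broken (a grey failure)."""
    def __init__(self, serving_ok: bool):
        self.serving_ok = serving_ok

    def heartbeat(self) -> bool:
        # The heartbeat runs independently of the serving path,
        # so it keeps reporting healthy.
        return True

    def serve(self, request: str) -> str:
        if not self.serving_ok:
            raise TimeoutError("request path is degraded")
        return f"response to {request}"

def heartbeat_check(c: Component) -> bool:
    """Classic detector: trusts the heartbeat alone."""
    return c.heartbeat()

def end_to_end_check(c: Component) -> bool:
    """Probe the same path that real requests take."""
    try:
        c.serve("health-probe")
        return True
    except TimeoutError:
        return False

grey = Component(serving_ok=False)
print(heartbeat_check(grey))   # True  -> failure goes undetected
print(end_to_end_check(grey))  # False -> grey failure surfaced
```

The gap between the two checks is exactly the "differential observability" the Microsoft paper discusses: the system's internal observer and the clients disagree about health.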
Understanding and defining grey failure despite its variability is thus key to building highly available and reliable cloud systems. Here is a technical paper by Microsoft that tackles the problem of gray failure head-on by addressing its fundamental traits: Gray Failure: The Achilles' Heel of Cloud-Scale Systems (microsoft.com). Do check it out.
TypeDB: A polymorphic database with a conceptual data model and a strong subtyping system
TypeDB is a polymorphic database with a conceptual data model, a strong subtyping system, a symbolic reasoning engine, and a beautiful and elegant type-theoretic language: TypeQL.
(⊙_⊙') When I first read this on their website, I needed a minute to process this, maybe longer. Polymorphic database, conceptual data model, strong subtyping system, symbolic reasoning engine, type-theoretic... Subtyping? Types? in a database? That's a lot of DB jargon for my puny brain to process.
I vaguely knew a few of these terms, but I won't say I clearly understood them at first go. I delved deeper to comprehend what they meant and how they power TypeDB, in addition to the use cases that would fit this type of database.
I'll quickly go through the associated concepts. This may serve as a refresher, reinforcing concepts you may already be aware of.
Data modeling levels: Conceptual, Logical & Physical
Conceptual, logical and physical data models, referred to as different levels of data modeling, are widely used in the industry when designing databases for our services from the ground up, with each level serving a specific purpose.
The conceptual data model gives a high-level view of system data, covering the data needs of the service, the business requirements and the mapping between them. Typically, the stakeholders and data architects work in conjunction to create a conceptual data model.
If we take the example of an online retail store, the conceptual data model might include entities like Customer, Product, Order, Payment, etc. It will describe the relationships between these entities, such as "A Customer can place many Orders," "An Order contains many Products," and so on.
The logical data model further elaborates the conceptual data model by giving a detailed representation of the data structures, tables, columns, relationships, and rules derived from the conceptual model. It serves as a blueprint for implementing the physical data model. At this stage, the data model’s primary function is to visualize data elements and how they relate to one another.
Taking the above-discussed example of our online retail store, the logical data model will specify tables like Customer, Product, Order, Payment, etc., along with their attributes and relationships. For instance, the Customer table may include fields like customerID, name, email, address, etc.
The physical data model involves implementing the logical data model with a specific database technology. This entails specifying details like indexes, constraints, DB-specific features, creating schemas, optimized storage, partitioning, performance tuning and so on by DBAs and developers.
In our online retail store, the physical data model would involve decisions like what columns would be indexed for faster querying, how the data would be stored on disk, partitioning strategies for better performance and so on.
Additionally, for instance, the logical Customer table could be split into different database tables with specific data types and indexing strategies optimized for the chosen database platform, such as MySQL, PostgreSQL, Cassandra or MongoDB.
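To make the physical level concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite and the table/column names are illustrative choices, not part of the original example: we realize the logical Customer table as a physical schema, add the kind of index a DBA might choose for fast email lookups, and confirm with EXPLAIN QUERY PLAN that the index is actually used.

```python
import sqlite3

# Hypothetical physical model for the logical Customer table,
# realized on SQLite for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        email       TEXT NOT NULL,
        address     TEXT
    )
""")
# Physical-level decision: index the column we filter on most.
conn.execute("CREATE INDEX idx_customer_email ON customer(email)")

conn.execute("INSERT INTO customer (name, email) VALUES (?, ?)",
             ("Ada", "ada@example.com"))

# EXPLAIN QUERY PLAN shows whether the lookup uses the index.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM customer WHERE email = ?",
    ("ada@example.com",)
).fetchall()
print(plan[0][-1])  # plan detail mentions idx_customer_email
```

Decisions like this index live purely at the physical level: the conceptual and logical models are unchanged whether or not it exists.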
Industry approach for designing databases
In the industry, when designing databases from the ground up, in addition to the conceptual, logical and physical data modeling phases, we start with requirement gathering and analysis, where we work with the stakeholders to understand the business domain and the requirements.
This is where service companies with prior projects and experience in a certain business domain like Fintech, insurance, health, etc., have an edge in the market and it becomes relatively easy for them to get the projects.
In all three data modeling phases, a couple of common practices apply throughout: normalization, which eliminates data redundancy and thus improves integrity, and review and iteration, where every model is reviewed by the stakeholders to ensure it aligns with the business requirements and to make any necessary adjustments.
Documentation, maintenance, monitoring, and refactoring are steps that are done continually as an ongoing process after the physical model is implemented. These are done as long as the service is in commission to keep the database component scalable, reliable and available.
Next up, polymorphic database.
Polymorphic database
What is a polymorphic database?
A polymorphic database can store data in multiple forms, offering flexibility and adaptability to the application. Picture a database field that can store integers, strings or a complex data structure. This enables the application to adapt to dynamic changes as it is developed iteratively without requiring any significant schema modifications.
For instance, consider a CMS (Content Management System) where users can post multiple types of content such as articles, images, videos, etc. Each type of content may have different attributes and data requirements.
To store this in our database schema, for instance, in the Content table, we can have the following fields:
Content Table:
contentID (primary key)
title
type (denotes the type of content: article, image, video, event, etc.)
data (polymorphic field to store varied data types)
We have a 'type' field that denotes the type of the content and the data field, which is polymorphic in nature and stores the content. The content can be in the form of text (e.g., HTML, Markdown), image (e.g., JPEG, PNG), video (e.g., MP4, AVI), miscellaneous data or metadata (e.g., date, location, agenda), etc.
This is a rudimentary example of working with polymorphic data that can make our application flexible and adaptable, in addition to simplifying the implementation. The application can also query the database in a conceptual manner as it knows how to interpret the data based on the content type (more on this later).
The simplicity is due to the fact that there's no need for additional tables or complex relationships to store different types of content.
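The Content table above can be sketched on an ordinary relational store by keeping the polymorphic `data` field as JSON. This is one rough way to approximate the idea, with illustrative names; a true polymorphic database handles this natively rather than leaving interpretation to the application.

```python
import json
import sqlite3

# Sketch of the Content table: the `data` column holds JSON, so
# one field can carry article text, image metadata, video
# metadata, etc., depending on `type`.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE content (
        content_id INTEGER PRIMARY KEY,
        title      TEXT NOT NULL,
        type       TEXT NOT NULL,
        data       TEXT NOT NULL  -- JSON; shape depends on type
    )
""")

rows = [
    ("Hello world", "article", {"body": "# Hi", "format": "markdown"}),
    ("Team photo",  "image",   {"url": "/img/team.png", "format": "PNG"}),
    ("Demo clip",   "video",   {"url": "/vid/demo.mp4", "seconds": 42}),
]
conn.executemany(
    "INSERT INTO content (title, type, data) VALUES (?, ?, ?)",
    [(title, ctype, json.dumps(d)) for title, ctype, d in rows],
)

# The application interprets `data` based on `type`.
for title, data in conn.execute(
        "SELECT title, data FROM content WHERE type = 'image'"):
    print(title, json.loads(data)["format"])  # Team photo PNG
```

Note the trade-off: the schema stays flexible, but the database can no longer validate what goes into `data`; that burden shifts to application code.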
Let's go through a few use cases where these databases fit best.
Polymorphic databases use cases
Content management systems are one use case, as discussed above. Besides this, these databases come in handy when modeling complex graph relationships, for instance, when building medical systems where an entity like a disease may have diverse attributes and relationships with other entities like symptoms, treatments, related diseases, etc.
The same goes for storing e-commerce products that have diverse attributes and relationships. For instance, a book will have a different set of attributes than an electronic product. This data is also hierarchical in nature, besides being complex: different types of products are listed under different segments, and so on.
Complex relationships in social networks and recommendation systems is another use case of this.
In addition to storing multiple data type values under a single field, we have polymorphic associations as well that allow a single field to be associated with multiple tables. This is particularly advantageous when dealing with systems where new types of content or relationships are frequently introduced.
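One common way relational applications emulate a polymorphic association is with a (parent_type, parent_id) column pair instead of a fixed foreign key. The sketch below uses SQLite and illustrative names to show a single comment table attached to two different parent tables; it is a rough approximation of the idea, not how a polymorphic database implements it.

```python
import sqlite3

# A (parent_type, parent_id) pair lets one table reference rows
# in several other tables. Names here are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (article_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE video   (video_id   INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comment (
        comment_id  INTEGER PRIMARY KEY,
        parent_type TEXT NOT NULL,     -- 'article' or 'video'
        parent_id   INTEGER NOT NULL,  -- id in the parent's table
        body        TEXT NOT NULL
    );
    INSERT INTO article VALUES (1, 'Grey failures');
    INSERT INTO video   VALUES (1, 'TypeDB intro');
    INSERT INTO comment VALUES (1, 'article', 1, 'Great read!');
    INSERT INTO comment VALUES (2, 'video',   1, 'Nice demo.');
""")

# One comment table serves both parents; a new parent type needs
# no schema change. The trade-off: the database cannot enforce
# this pseudo foreign key for us.
rows = conn.execute("""
    SELECT a.title, c.body
    FROM comment c JOIN article a ON c.parent_id = a.article_id
    WHERE c.parent_type = 'article'
""").fetchall()
print(rows)  # [('Grey failures', 'Great read!')]
```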
In a conventional relational database, we would have to make separate tables with redundant data to store such complex data. I won't go into further detail because it is not required right now. We may continue this discussion in future posts when polymorphic databases resurface.
Hopping back to TypeDB
We are now aware of the conceptual data model and the polymorphic databases. Looking back at the TypeDB description:
TypeDB is a polymorphic database with a conceptual data model, a strong subtyping system, a symbolic reasoning engine, and a type-theoretic language: TypeQL.
We can now understand that TypeDB supports polymorphic DB fields and associations. In addition, it deals at the conceptual data modeling level, handling the implementation intelligently with its reasoning engine, making things more intuitive for developers to understand and work with.
Now, what is a subtyping system? We will quickly discuss this as well.
DB subtyping system
We are aware of types in the context of programming languages. For instance, in OOP, if we have a Vehicle class type, a Car class can be a subtype of Vehicle. This is a form of inheritance.
Subtyping refers to the relationship between types where one type is considered a subtype of another. A subtype inherits properties (attributes, methods, etc.) from its supertype.
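The Vehicle/Car relationship above looks like this in Python, as a quick refresher on how a subtype inherits and refines its supertype:

```python
class Vehicle:
    def __init__(self, wheels: int):
        self.wheels = wheels

    def describe(self) -> str:
        return f"vehicle with {self.wheels} wheels"

class Car(Vehicle):                   # Car is a subtype of Vehicle
    def __init__(self):
        super().__init__(wheels=4)    # inherits attributes...

    def describe(self) -> str:        # ...and can refine behavior
        return "car, " + super().describe()

car = Car()
print(isinstance(car, Vehicle))  # True: every Car is a Vehicle
print(car.describe())            # car, vehicle with 4 wheels
```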
But relational databases don't have inheritance intrinsically built into them, right? They have more like flat-table relationships.
This brings us to the problem of object-relational impedance mismatch.
Object-relational impedance mismatch
Object-relational impedance mismatch refers to the inherent disparity between the OOP paradigm and the relational database model.
OOP paradigm supports objects (with both attributes and behavior), encapsulation, inheritance (hierarchical relationships) and polymorphism. In contrast, the relational data model has data stored in flat tables consisting of rows and columns in a normalized fashion with limited or no support for hierarchical data.
It's tricky to directly map objects to DB tables without an ORM (Object Relational Mapping) framework like Hibernate or Entity Framework in .NET. This clearly shows the incompatibility between the intrinsic design of OOP and that of a relational database.
TypeDB strong subtyping system
TypeDB tries to tackle this mismatch with its subtyping system. The schema with types and subtypes is similar to the application code structure, where the subtype inherits all the attributes, roles, and relationships of its supertype.
Strong subtyping means that the data remains consistent across different types. Queries against a supertype automatically include data from its subtypes, and developers can express complex relationships through intuitive conceptual modeling.
This hierarchical subtype structure makes it fit for storing hierarchical and complex relationships that we've already discussed above.
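As a rough analogy for "queries against a supertype include its subtypes," here is the same behavior expressed with Python's own type system. This only mirrors the idea; TypeDB resolves this at the query level rather than in application code:

```python
class Vehicle: pass
class Car(Vehicle): pass      # subtypes of Vehicle
class Truck(Vehicle): pass
class Boat: pass              # unrelated type

store = [Car(), Truck(), Boat(), Car()]

# A "match every vehicle"-style query transparently picks up all
# subtype instances, the way a TypeDB supertype query would.
vehicles = [x for x in store if isinstance(x, Vehicle)]
print(len(vehicles))  # 3 (the Boat is excluded)
```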
Coming to the last part: type-theoretic language.
TypeDB is a polymorphic database with a conceptual data model, a strong subtyping system, a symbolic reasoning engine, and a type-theoretic language: TypeQL.
Type-theoretic language
The type-theoretic language refers to a query language called TypeQL, designed around TypeDB's conceptual data model, whose central design principle is that everything has a type.
TypeQL queries resemble natural language and are intuitive to express, even for complex queries, which is due to the conceptual data model. It also supports declarative polymorphic querying.
If you've implemented a GraphQL API in the past, you might remember the use of types in the schema as well, which map the schema to the application code.
I end this post here, hoping you found it helpful and can now comprehend the introductory TypeDB description well. This post will also come in handy when we come across similar products in the future and want to brush up on the concepts involved. Do check out the TypeDB website if you wish to delve into further details.
On a related note, what comes to your mind when you hear the terms Strongly typed, Weakly typed, Statically typed and Dynamically typed in the context of programming languages?
You'll find the discussion on this in my previous post here:
Furthermore, your feedback is crucial to this newsletter. I can't stress this enough. Please do reply to this email with your thoughts and feedback and if you need anything in particular to be brought up in my future posts. You can connect with me on LinkedIn & X as well.
I'll see you in the next post. If you found the content helpful, consider sharing it with your network for more reach.
See you around. Cheers!