Backend #8: How Atlassian leverages declarative metadata-driven system architecture in their data platform
Atlassian's internal first-generation data lake grows by 85 terabytes a day. It acts as a centralized repository for vast amounts of raw, unprocessed data, holding data from varied sources with varying degrees of documentation, metadata, lineage, quality checks, and observability.
To leverage data from the lake, devs/teams first have to locate the data, connect to it via a pipeline and then process it. Different data lake users apply different processing or transformation logic to their data, and each of them has to provision data pipelines while also ensuring data quality and observability.
Provisioning a data pipeline involves setting up and configuring the required infrastructure, tools and processes to move the data from the lake to the destination.
To cut down the hassle of setting up this entire process from the bare bones every time, Atlassian leveraged a declarative metadata-driven approach to build a deployment capability into their system. It deploys the transformation logic, be that in SQL, Python, Kotlin, etc., for any purpose, such as ML or analytics, along with the underlying pipelines, in one go.
Declarative metadata-driven provisioning system
Atlassian's data lake deployment capability provisions resources for executing data pipelines, ensures data quality and observability, and enables users to define their transformation logic using configuration metadata without worrying about the underlying execution details.
This is a declarative metadata-driven approach: the user expresses what they want, and the system handles how it gets deployed. Teams can provision data pipelines without manual intervention, requesting pipeline resources (compute, storage, etc.) through a metadata-driven interface instead of writing low-level implementation details.
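As a rough illustration, a spec in such a system might look something like the YAML below. The schema here (pipeline, source, transformation, quality_checks, and so on) is hypothetical, not Atlassian's actual format; the point is that the user declares the what and the platform derives the how.

```yaml
# Hypothetical pipeline spec -- field names are illustrative,
# not Atlassian's actual schema.
pipeline:
  name: checkout-events-daily
  owner: analytics-team
  source:
    lake_table: raw.checkout_events      # where the data lives in the lake
  transformation:
    type: sql                            # the execution capability to use
    query: transformations/checkout_daily.sql
  destination:
    warehouse_table: analytics.checkout_daily
  schedule: "0 3 * * *"                  # run daily at 03:00
  quality_checks:
    - not_null: [order_id, user_id]      # provisioned alongside the pipeline
  resources:
    compute: medium                      # the platform maps this to infrastructure
```

Everything below the name and owner is intent; the compute, storage, scheduling and monitoring needed to fulfil it are provisioned by the platform.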
The deployment capability acts as an abstraction layer, providing a consistent interface for different execution capabilities (be it running SQL queries, executing Python scripts, etc.). It also ensures system governance by enforcing standards, security and compliance, streamlining the deployment process and making it easier for teams to manage their data pipelines.
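To see what that consistent interface buys, compare two transformation blocks under the same illustrative schema as above: swapping SQL for Python changes only this block, while everything the platform provisions and enforces around it stays the same.

```yaml
# Same hypothetical schema as above -- only the transformation block differs.
transformation:
  type: sql
  query: transformations/checkout_daily.sql
---
transformation:
  type: python
  entrypoint: transformations/churn_features.py   # e.g. an ML feature job
  runtime: python3.11
```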
With the metadata-driven deployment capability, the entire process of accessing data from the lake and moving it through a pipeline to its destination, with the processing logic applied, is expedited, saving the time that would otherwise go into setting everything up from the bare bones every time and managing it afterwards.
Declarative metadata-driven systems in the control plane
Declarative metadata-driven systems are more prominent in the control plane than in the data plane since they focus on configuration and system management rather than moving the data itself.
I've briefly discussed the control plane and the data plane in my earlier post on TigerBeetleDB, a database built for financial transactions. You can give it a read.
Most systems that can be managed with configuration files fall under this category. One prominent instance is Kubernetes, and Atlassian's data platform takes inspiration from it.
In Kubernetes, we define our desired state using YAML or JSON files with metadata such as labels, annotations and other specifications describing pods, services, etc., and Kubernetes drives the system towards that state. Other similar tools are Helm, Terraform, Ansible, AWS CloudFormation, etc.
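For instance, a minimal Kubernetes Deployment manifest declares the desired state, here three replicas of an nginx container, and the Kubernetes control plane continuously reconciles the cluster towards it:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  labels:
    app: web         # metadata the control plane uses to manage the objects
spec:
  replicas: 3        # desired state: three pods, always
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: nginx
          image: nginx:1.25
          ports:
            - containerPort: 80
```

If a pod dies, Kubernetes detects the drift from the declared state and spins up a replacement without anyone re-running a script. Atlassian's deployment capability applies the same idea to data pipelines.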
For a detailed read on Atlassian's data platform, you can visit the blog article here.
If you found this post insightful, do share the web link with your network for more reach. You can connect with me on X & LinkedIn in addition to replying to this email.
If you wish to learn software architecture from the bare bones, check out the Zero to Software Architecture Proficiency learning path I've authored that educates you, step by step, on the domain of software architecture, cloud infrastructure and distributed system design.
I'll see you around in my next post. Until then, Cheers!