Backend#9: Elasticsearch data lake and its integration with existing system architecture
Elasticsearch Search AI lake
Elastic recently launched its Search AI Lake, a data lake built for search that lets us run real-time, low-latency searches on the unstructured data stored in it.
Elasticsearch is an open-source search and analytics engine built on top of Apache Lucene. With it, we can augment our application with features like full-text search, real-time analytics, etc.
The Elastic data lake has a serverless compute layer on top that lets us run compute operations on the data lake without worrying about infrastructure management, scalability and observability. Compute and storage are kept decoupled so that each layer can scale independently.
The complete Elasticsearch Search AI Lake product also supports AI features out of the box, such as vector search, semantic search, ML model integration and RAG (Retrieval Augmented Generation).
In my previous post, I discussed another data lake use case: how Atlassian leverages a declarative metadata-driven system architecture with its data lake for quick deployments. Do give it a read if you haven't yet.
Leveraging the ES data lake with our existing system architecture
As I was going through the Elasticsearch product documentation, I wondered how we could leverage the data lake in our system architecture, given that it is a ready-to-use, plug-and-play product. This is what I figured:
Picture an e-commerce platform where we want to provide product recommendations to users based on their present and past browsing behavior and shopping history.
We can move the historical user data, site activity, product details and the related info into the Elastic data lake. Since it enables us to run real-time searches on unstructured data, data with varied data models wouldn't be an issue.
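To make the ingestion step a bit more concrete, here is a minimal sketch of how differently shaped records could be packed into the newline-delimited JSON body that the Elasticsearch `_bulk` API accepts. The index name `ecomm-activity`, the record fields and the helper `build_bulk_payload` are all hypothetical, made up for illustration; the sketch only builds the request body and does not talk to a cluster.

```python
import json


def build_bulk_payload(index_name, docs):
    """Serialize docs into the newline-delimited JSON body used by the _bulk API."""
    lines = []
    for doc in docs:
        # Action line: asks Elasticsearch to index the document on the next line.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the document itself; fields may differ from record to record.
        lines.append(json.dumps(doc))
    # The bulk body has to end with a newline character.
    return "\n".join(lines) + "\n"


# Hypothetical records with different shapes (assumed field names).
events = [
    {"type": "page_view", "user_id": "u1", "url": "/shoes/42"},
    {"type": "purchase", "user_id": "u1", "product_id": "p42", "price": 89.99},
    {"type": "product", "product_id": "p42", "name": "Trail running shoes",
     "tags": ["running", "shoes"]},
]

payload = build_bulk_payload("ecomm-activity", events)
print(payload)
```

The resulting string could then be sent as the body of a `POST /_bulk` request. In a real setup we would probably still want to define mappings for the fields we care about, since relying entirely on dynamically inferred field types can lead to surprises.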
If you want to delve into the details of how Elasticsearch data indexing and search happens, do go through the linked docs.
With Elasticsearch AI, we can analyze our e-commerce service data to generate personalized recommendations for our users. For instance, for a user who has purchased running shoes, the system can suggest related products like fitness trackers, sportswear, etc.
How does Elasticsearch compute related products?
Elasticsearch contains a relevance engine called ESRE (Elasticsearch Relevance Engine) that combines machine learning features with text search. It offers a full suite of sophisticated retrieval algorithms, in addition to the ability to integrate different LLMs to enhance our product's capabilities.
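To get an intuition for how vector-based relevance can surface related products, here is a toy sketch that ranks a tiny catalog by cosine similarity to a query vector. This is not how ESRE is implemented; it is only an illustration of the general idea behind vector search. The three-dimensional "embeddings", the product names and the helper functions are all invented for the example, whereas real embeddings come from an ML model and have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def related_products(query_vec, catalog, k=2):
    """Return the names of the k catalog items most similar to query_vec."""
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in catalog.items()]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scored[:k]]


# Made-up embeddings; in practice an embedding model would produce these.
catalog = {
    "fitness tracker": [0.9, 0.8, 0.1],
    "sportswear": [0.8, 0.9, 0.2],
    "coffee maker": [0.1, 0.0, 0.9],
}
running_shoes = [1.0, 0.9, 0.0]

top = related_products(running_shoes, catalog)
print(top)  # ['fitness tracker', 'sportswear']
```

A search engine does not compare the query against every stored vector like this; it typically relies on approximate nearest-neighbor indexes to keep such lookups fast at scale.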
Image similarity search is one use case amongst many where AI-based reverse image search can help customers find similar products.
In our e-comm platform use case, besides tailored searches, we can leverage the Elasticsearch AI data platform to get real-time inventory insights as well.
We can stream inventory and sales data to the data lake, and with the serverless compute layer, we can run low-latency queries, getting insights into our stock availability and thus improving supply chain efficiency.
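As a rough sketch of what such an inventory insight query might look like, the snippet below builds an Elasticsearch search request body that sums the units sold per product over a recent time window, using a `terms` aggregation with a nested `sum`. The field names `sold_at`, `product_id` and `quantity`, and the helper `sales_by_product_query`, are assumptions made for the example; `product_id` is assumed to be mapped as a keyword field.

```python
import json


def sales_by_product_query(window="now-1h", top_n=10):
    """Build a search body that totals units sold per product since `window`."""
    return {
        # We only want the aggregation results, not the matching documents.
        "size": 0,
        # Restrict to sales recorded inside the time window (assumed date field).
        "query": {"range": {"sold_at": {"gte": window}}},
        "aggs": {
            "by_product": {
                # One bucket per product, limited to top_n buckets.
                "terms": {"field": "product_id", "size": top_n},
                # Inside each bucket, add up the quantities sold.
                "aggs": {"units_sold": {"sum": {"field": "quantity"}}},
            }
        },
    }


query = sales_by_product_query()
print(json.dumps(query, indent=2))
```

The body could be sent to the `_search` endpoint of the index holding the sales events, and the per-product totals could then be compared against current stock levels to flag items that are selling faster than they are being replenished.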
Regarding AI/ML
I am guessing most readers of this newsletter aren't ML engineers. Like me, they are more likely somewhere in the backend development or product management realm.
I believe we've arrived at a point where AI and ML are no longer a niche thing in the backend engineering and web services realm. They have come to the forefront, with every web service leveraging them in some way or the other. And given the current state of the backend, it's a good idea to be aware of the fundamentals of the space.
By fundamentals, I do not mean writing basic ML code but rather having an understanding of the associated concepts, frameworks, technologies, how web services are leveraging them to better their functionality, and so on.
Keeping a continual overview of the space and staying on top of things enables us to make more informed technology decisions, confidently participate in tech discussions and steer our career in the right direction.
I actively discuss AI concepts as I come across them in this newsletter. For instance, in a former post where I brought up Cloudflare Workers AI, I discussed concepts like AI inference and AI Gateway.
Please do let me know your thoughts on this and whether you find my posts insightful. I'd also like to hear if you are actively learning ML, just want an overview of how things are, are apprehensive about it, or whatever else comes to your mind.
You can reply to this email or connect with me on chat if you are on Substack. X and LinkedIn are good places for discussion as well.
There are a few AI terms that I encountered when researching the Elasticsearch Search AI data lake, such as vector search, semantic search, embeddings and RAG. I'll briefly discuss them in my next post.
If you found this post insightful, do share the web link with your network for more reach.
If you wish to learn software architecture from the bare bones, check out the Zero to Software Architecture Proficiency learning path I've authored that educates you, step by step, on the domain of software architecture, cloud infrastructure and distributed system design.
I'll see you around in my next post. Until then, Cheers!