Infusion of AI in IT Production Support

The Fluid Intelligence Podcast

Hear it on Spotify

Artificial Intelligence (AI) is rapidly transforming every industry, and production support is no exception. But how exactly is AI changing the way these critical teams operate? Are AI solutions making their jobs easier, or are they creating new challenges?

Sooryanarayanan Balasubramanian, Senior Director of Marketing at Factspan, will be guiding our conversation. Joining him as the podcast guest is Ritesh P Iyer, Senior Director of Pre-Sales at Factspan and an expert in data mesh and AI for enterprise clients.

Together, Soorya and Ritesh will be diving deep into the world of AI-powered production support. They’ll discuss the challenges faced by production support engineers, the solutions AI offers, and how Factspan is helping clients build “smart support systems” for the future.  

So, whether you’re a production support engineer or simply curious about the future of AI, this episode is packed with valuable insights you won’t want to miss. Let’s get started!

Full Transcript: 


Welcome to the first episode of the Fluid Intelligence Podcast. When we discuss tech today, it feels like pure magic. Remember when Gen AI felt like a distant dream? Today, it’s the heartbeat of every innovation happening around us, and probably of every conversation we are going to have here. But here is the kicker: do we really know what’s coming next? Are we prepared for the future?

But there is certainly one thing we can do: keep ourselves ready for change. As tech leaders, we need to embrace fluidity and evolve our intelligence to stay relevant. The Fluid Intelligence Podcast isn’t just a series of conversations. It’s a guide, a compass, and perhaps your visionary partner in your data and AI journey.

Today we will talk about the world of production support engineers and how AI has been infused into their daily operations. I’m Soorya, Senior Director of Marketing at Factspan, and with me is Ritesh Iyer, Senior Director of Pre-Sales. He is a champion of data mesh and AI for our enterprise clients. Welcome, Ritesh.


Thanks, Soorya. Thanks for inviting me.


Yeah, before we deep dive into the full topic, let’s start with a primer. What is happening in the production support world? What drives these teams?


So, Soorya, production support is about systems being always on and always right. With the advent of cloud, streaming applications, and advancements in AI, SLAs have shortened significantly.

If you look at how things were earlier, the expectation was that systems would be back to always on and always right within a couple of days, or maybe a couple of hours.

But now the expectation is that they should be back to always on and always right within a few minutes. Think of the Amazon Prime Day sale: so many people log in to buy products, and if there is a server outage at that point in time, the e-commerce company could lose its customers to the competition.

At the same time, from a user standpoint, I wouldn’t want downtime to hit me when I’m checking out or paying, because then I lose trust in the platform.

Think of another example: in the USA, there are a lot of retail pharmacies with systems where patients can interact directly with pharmacists in real time through SMS regarding their prescription orders or order status.

Now, if there is a lag, the pharmacy cannot afford that, because it pertains to somebody’s medicines, their health recovery, and so on.


Yeah, especially in healthcare, it matters a lot more.


Exactly. So now SLAs are to the tune of maybe 5 minutes, or 10 minutes at the max. In summary, production support is all about keeping pace with advancements in the digital landscape while providing a seamless experience to users.


You spoke about always on, always right. When I think about it, that feels like a daunting task for anyone, let alone the production support team, right?

It actually feels like the F1 world of IT that we exist in today.

Having said that, I wanted to understand: how has their role evolved over the years, say 10 years back, five years back, and now?


Sure. That’s a great question. Gone are the days when production support engineers were looked upon as mere troubleshooters. Now they are expected to be strategic collaborators.

Each one plays multiple roles in a team: they could be developers, testers, platform engineers, and so on.

Earlier on, we used to have large production support teams of 50 to 100 members, depending of course on the size, scale, and complexity of the systems involved. But now clients have started optimizing their teams; if you look at it, team sizes have come down by almost 50%.

That’s because they have started adopting the DevOps-based pod model. What does a pod look like? It’s a team of dedicated roles with complementary skills, complementary in the sense that a single person plays multiple roles, and the pod assumes full ownership, right from development to support.

So the expectations on production support engineers have increased. First, every production support engineer needs to keep abreast of the latest technological advancements through continuous learning and certification.

Second, they have to be at the forefront of automation, they have to prioritize customer experience and satisfaction, and finally, they have to be seamless collaborators. So at Factspan, Soorya, we have actually deployed data and AI engineers for the job, because we believe they are the people who understand the system well: they can not only solve all the intricate technical challenges but also bridge the gap between the technology and the users.


OK. Nice. Interesting. You spoke about the DevOps world of production support. While we said earlier that it’s daunting, it also feels logical given how businesses are evolving today, right? In this context, help us understand the bottlenecks a production support engineer faces in daily operations.


Yeah, sure. If you go back 10 years, when there was no DevOps, the bottlenecks and challenges were way different: it was all about manual processes, collaboration between teams, and so on. Now, with DevOps in the picture, the bottlenecks and challenges are very different. Organizations are facing a technology hammer. There are too many tools out there, be it open source, enterprise, or cloud native. There’s a problem of choice, and they don’t know which tool is right for them.

In a very big organization, there could be multiple verticals, and each of those verticals ends up choosing a different set of tools for monitoring, ticketing, problem management, and so on. So how do I get a seamless, integrated view of the various issues prevalent across the organization? That has become a challenge. Within organizations, there are also pockets of technical debt and legacy infrastructure, which hinder efficient automation and proactive solutioning.

The last issue is a no-brainer: it’s the skill gap that most organizations face, that is, how to identify, train, and retain the right talent for the job.


Right. So there is a problem of choice, there is the standardization that needs to happen within systems and between legacy and modern systems, and there is also a skill gap that is evolving with the way the industry is moving.


Exactly, that’s true.


Interesting! Let us come to the obvious elephant in the room, which is AI. We understand that things have changed a lot in the production support world over the years. But with AI featuring in a lot more of these scenarios, how has AI played a role in diffusing some of the boundaries within the hierarchical structure of production support?


Firstly, Soorya, I like the way you said hierarchical structure, and that’s true. Production support is a tiered framework with multiple levels: L1, L2, and L3. L1 handles basic support and basic fixes.

L2 handles more complex and advanced fixes, and L3 handles more specialized fixes in collaboration with the development team. So yes, AI is empowering each of these tiers, and it’s blurring the boundaries between them. It’s touching almost every aspect of production support, be it predictive maintenance of your platforms and infrastructure, proactive monitoring of issues and raising alarms to the relevant teams, an NLP solution that goes through humongous log files and suggests the likely root cause, an auto-healing solution that fixes your known errors or known issues, or virtual digital assistants that help users with their basic queries. So AI is completely transforming the production support landscape from being very reactive to being very proactive.
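To make the log-analysis idea above concrete, here is a minimal, non-ML sketch of one common first step before any NLP or LLM analysis: masking the variable parts of log lines (numbers, hex IDs) so that repeated messages collapse into templates, then ranking error templates by frequency to point an engineer at a likely root cause. The log line format, the `ERROR` level keyword, and the function names are assumptions for illustration, not Factspan’s implementation.

```python
import re
from collections import Counter

# Hypothetical masking rules: variable tokens are replaced so that messages
# differing only in IDs, counts, or durations share one template.
MASKS = [
    (re.compile(r"0x[0-9a-fA-F]+"), "<HEX>"),  # hex identifiers first
    (re.compile(r"\d+"), "<NUM>"),            # then any remaining digit run
]


def to_template(message):
    """Collapse a raw log message into a template by masking variable tokens."""
    for pattern, token in MASKS:
        message = pattern.sub(token, message)
    return message


def top_error_templates(lines, level="ERROR", k=3):
    """Return the k most frequent templates at the given level as (template, count).

    Assumes a hypothetical line format: '<timestamp> <LEVEL> <message>'.
    """
    counts = Counter()
    for line in lines:
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[1] == level:
            counts[to_template(parts[2])] += 1
    return counts.most_common(k)


if __name__ == "__main__":
    logs = [
        "2024-05-01T10:00:01 INFO job 17 started",
        "2024-05-01T10:00:05 ERROR connection to db-3 timed out after 30s",
        "2024-05-01T10:00:09 ERROR connection to db-3 timed out after 31s",
        "2024-05-01T10:00:12 ERROR disk /dev/sda1 usage 97%",
    ]
    for template, count in top_error_templates(logs):
        print(count, template)
```

Grouping first keeps whatever does the deeper analysis (a person, an NLP model, or an LLM) focused on a handful of distinct failure signatures instead of thousands of near-duplicate lines.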


And it’s also permeating every operation of production support. In the earlier part of the conversation, I remember you saying how data engineers today play a significant role in ensuring the systems are always on and always right.

Having said that, how is Factspan enabling this transformation for our clients?


Yeah. As you know, Soorya, Factspan is a data and AI company, and we are enabling clients to be truly AI native. We have an offering, AI subsistence for production support, where we help clients reimagine their production support operations and build smart support factories through a data-driven, AI-led approach that we call fact IAP.

What does this framework focus on? Three major things, the three key elements of production support: How can I make my monitoring intelligent? How can I make my system smart through automation? And how can I improve efficiency through process enhancements?

To help accelerate the client’s journey towards building a smart support factory, we have loads of assets and accelerators, apart from various frameworks, models, and best practices. The first one is the Integrated Command Center. Soorya, if you remember, I spoke about the challenge of the technology hammer and the struggle to get an integrated view of all the issues across the estate when multiple tools are in use. The Integrated Command Center solves that issue: production support engineers and leaders alike get a single view of anything and everything they need for production support, be it platform health, jobs, pipelines, incidents, problems, alarms, and so on.

And the beauty of this is that it’s not just a dashboard; it is a control center. It’s like a war room for production support, where you can actively go and remediate any of these issues.
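A single view like this depends on normalizing records from different tools into one shape. The sketch below assumes two hypothetical sources, a monitoring tool emitting alerts and a ticketing tool emitting incidents, with invented field names (`host`, `level`, `ci`, `priority`); real tools expose different payloads, so the adapters are placeholders for whatever each tool actually returns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One normalized record in the unified view, whatever tool it came from."""
    source: str    # originating tool
    kind: str      # "alert", "incident", ...
    resource: str  # affected host, job, pipeline, or service
    severity: int  # normalized scale: 1 = critical ... 4 = low
    summary: str


# Hypothetical severity vocabulary of the monitoring tool.
MONITORING_SEVERITY = {"critical": 1, "major": 2, "minor": 3, "warning": 4}


def from_monitoring(alert):
    """Adapter for a hypothetical monitoring payload."""
    severity = MONITORING_SEVERITY.get(alert["level"], 4)  # unknown levels -> low
    return Event("monitoring", "alert", alert["host"], severity, alert["text"])


def from_ticketing(ticket):
    """Adapter for a hypothetical ticketing payload whose priority is already 1-4."""
    return Event("ticketing", "incident", ticket["ci"], int(ticket["priority"]), ticket["title"])


def unified_view(alerts, tickets):
    """Merge both feeds into one list, most severe first, then by resource name."""
    events = [from_monitoring(a) for a in alerts] + [from_ticketing(t) for t in tickets]
    return sorted(events, key=lambda e: (e.severity, e.resource))


if __name__ == "__main__":
    alerts = [{"host": "etl-01", "level": "warning", "text": "CPU at 85%"}]
    tickets = [{"ci": "payments-db", "priority": "1", "title": "Checkout failures"}]
    for event in unified_view(alerts, tickets):
        print(event.severity, event.source, event.resource, event.summary)
```

Adding another tool then means writing one more adapter, while the view itself stays unchanged.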

There are other features as well. For example, there’s a failure predictor, which proactively tells you what sort of issues are likely to occur, so that you can take steps before the issue happens.


How does it predict?


It scans through your logs, your jobs, your historical job runs, and historical data, and predicts accordingly. Similarly, we have a KEDB (known error database) based auto-heal solution, which goes and fixes all the known errors logged in the system.
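The conversation does not say which model the failure predictor uses. As a deliberately simple stand-in (a frequency heuristic, not machine learning), the sketch below scores each job by its smoothed failure rate over its most recent runs and flags jobs above a threshold. The window size, threshold, and job names are hypothetical.

```python
from collections import defaultdict


def failure_risk(history, window=10):
    """Estimate each job's failure probability from its most recent runs.

    history: iterable of (job_name, succeeded) pairs in chronological order.
    Laplace (add-one) smoothing keeps jobs with few runs away from 0 or 1.
    """
    runs = defaultdict(list)
    for job, succeeded in history:
        runs[job].append(succeeded)
    risk = {}
    for job, outcomes in runs.items():
        recent = outcomes[-window:]  # only the last `window` runs count
        failures = sum(1 for ok in recent if not ok)
        risk[job] = (failures + 1) / (len(recent) + 2)
    return risk


def at_risk_jobs(history, threshold=0.3, window=10):
    """Jobs whose estimated failure risk meets the threshold, riskiest first."""
    risk = failure_risk(history, window)
    flagged = [job for job, p in risk.items() if p >= threshold]
    return sorted(flagged, key=lambda job: risk[job], reverse=True)


if __name__ == "__main__":
    # Hypothetical run history: True = success, False = failure.
    history = [("load_sales", True)] * 8 + [("load_sales", False)] * 2
    history += [("sync_inventory", False)] * 3 + [("sync_inventory", True)]
    print(at_risk_jobs(history))
```

A production predictor would likely also use signals from logs and infrastructure metrics; the point here is only the shape of the idea: learn from history, score upcoming runs, and act before they fail.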


OK, so it takes the knowledge base and then works.
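A known-error-database auto-heal loop can be sketched as a lookup table from error signatures to remediation actions: if an incoming error matches a known signature, apply the documented fix; otherwise, escalate to an engineer. In this minimal sketch, the error IDs, regular expressions, and remediation functions are all hypothetical, and the "actions" only return a description rather than touching real systems.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class KnownError:
    """One hypothetical known-error record: an ID, a signature, and a fix."""
    error_id: str
    signature: "re.Pattern[str]"
    remediate: Callable[["re.Match[str]"], str]  # returns a description of the action


def restart_service(match):
    # Placeholder: a real runbook would call an orchestration or service API here.
    return f"restarted service {match.group('svc')}"


def clear_temp_files(match):
    # Placeholder: a real runbook would run a cleanup job on the host.
    return f"cleared temp files on {match.group('host')}"


KEDB = [
    KnownError("KE-001", re.compile(r"service (?P<svc>[\w-]+) is not responding"), restart_service),
    KnownError("KE-002", re.compile(r"disk full on (?P<host>[\w-]+)"), clear_temp_files),
]


def auto_heal(message, kedb=KEDB) -> Optional[Tuple[str, str]]:
    """Apply the first matching known fix; None means escalate to an engineer."""
    for entry in kedb:
        match = entry.signature.search(message)
        if match:
            return entry.error_id, entry.remediate(match)
    return None


if __name__ == "__main__":
    for msg in ["ERROR service billing-api is not responding", "ERROR null pointer in parser"]:
        print(msg, "->", auto_heal(msg) or "escalate to on-call engineer")
```

Returning the matched error ID alongside the action keeps an audit trail, so fixes applied without human involvement are still logged.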


Exactly. And this frees up a lot of time for production support engineers, so they can focus on higher-value work such as automation, process improvement, and so on.

And no conversation can end without calling out generative AI. So we also have an LLM-powered DataOps copilot, which helps production support engineers go through those humongous logs, determine the root cause of a particular issue, triage the incidents, and then proactively fix them.

So yes, we have all these assets and accelerators in place, and we have deployed them for a few of our clients, where we have seen around a 50 to 60% improvement in mean time to resolution and around a 30 to 35% reduction in incidents.
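For reference, mean time to resolution (MTTR) is the average time from when an incident is opened to when it is resolved, and an improvement figure is the relative drop in that average. The minimal sketch below shows the arithmetic with made-up timestamps; the numbers are illustrative only and are not the client results quoted above.

```python
from datetime import datetime, timedelta


def mttr(incidents):
    """Mean time to resolution over (opened, resolved) datetime pairs."""
    durations = [resolved - opened for opened, resolved in incidents]
    return sum(durations, timedelta()) / len(durations)


def improvement_pct(before, after):
    """Relative reduction from `before` to `after`, in percent.

    Works for timedelta values (MTTR) and for plain counts (incident volume).
    """
    return (before - after) / before * 100


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 9, 0)
    # Hypothetical incidents before and after automation (not real client data).
    before = [(t0, t0 + timedelta(minutes=60)), (t0, t0 + timedelta(minutes=120))]
    after = [(t0, t0 + timedelta(minutes=30)), (t0, t0 + timedelta(minutes=50))]
    m_before, m_after = mttr(before), mttr(after)
    print(m_before, m_after, f"{improvement_pct(m_before, m_after):.1f}% lower MTTR")
```

With these made-up numbers, MTTR drops from 90 to 40 minutes, a reduction of about 55.6%.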


Awesome! So, if I understand correctly, these tools monitor issues in real time and fix them, sometimes even before a production support engineer sees them, while still logging them. They also learn over time and become more effective as the learning continues.

So overall, it sounds like a great tool to improve the productivity of production support engineers and reduce the resolution time of issues that may occur in the future.

Thanks, Ritesh. This was quite insightful. Obviously, I’m curious to know more, but I think we should keep that for another day.


My pleasure, Soorya.


Awesome. I hope you enjoyed this conversation. If you found it interesting, please like the video and subscribe to the channel. If you have any queries, comments, or even topic suggestions, please feel free to write to us at … Until then, we look forward to seeing you in the next episode. Cheers!
