Modernizing IT Delivery with AIOps

Aug 24, 2021 3:00 PM - 4:00 PM EST

Key Discussion Takeaways

Do you feel like your company is drowning in IT tickets? Is your team unable to handle the heavy load? Is your system down longer than you can stand?

AIOps can help. If you're monitoring many different technologies, you've got storage, compute, applications, and databases. Each one of them can flag an event or error condition, but how do you determine the root cause? What happened, and in what order? AIOps is there to make sense of it all and figure out the answer at wire speed. If restarting a service reliably fixes an issue before it becomes a critical outage, AIOps can restart the service, document the fix, update the ticket, and close it, all without human intervention. How can you implement AIOps in your company?

In this virtual event, Greg Irwin is joined by Lee Koepping, the Principal Solutions Architect at ScienceLogic, to discuss how AIOps can relieve your IT headaches. Lee explains what AIOps can do, how it can improve CMDB accuracy, and how long it takes to implement an AIOps system.

Here’s a glimpse of what you’ll learn:

Lee Koepping explains ScienceLogic’s areas of expertise
What is AIOps?
How AIOps has benefited the federal Department of Veterans Affairs
Is a centralized plan important?
How long does it take to integrate an AIOps system?
Using AIOps to identify real problems that only humans can fix and letting automation take care of the rest
How AIOps can improve CMDB accuracy
What kind of projects does ScienceLogic support?
How AIOps can free up capacity and enable a team to get through their backlog
Lee explains anomaly detection

Event Partners

ScienceLogic

ScienceLogic is a leader in ITOM & AIOps, providing modern IT operations with actionable insights to predict & resolve problems faster in a digital, ephemeral world.

Guest Speaker

Lee Koepping

Principal Architect / Federal Sales Engineer at ScienceLogic

Lee Koepping is the Principal Solutions Architect at ScienceLogic, the market’s leading monitoring and AIOps platform. ScienceLogic provides modern IT operations with actionable insights to predict and resolve problems faster in a digital world.

Lee has over 25 years of experience in the electronics and information systems fields and over 20 years of direct management of high-caliber technical sales engineers within corporate leadership positions. Previously, Lee was the Chief Technology Officer at Blue Door Networks LLC, the Senior Director of Technical Solutions and Chief Technology Officer at Iron Bow Technologies, and the Senior Director of Pre-Sales Engineering at Apptis Technology Solutions LLC.

Event Moderator

Greg Irwin

Co-Founder, Co-CEO at BWG Strategy LLC

BWG Strategy is a research platform that provides market intelligence through Event Services, Business Development initiatives, and Market Research services. BWG hosts over 1,800 interactive executive strategy sessions (conference calls and in-person forums) annually that allow senior industry professionals across all sectors to debate fundamental business topics with peers, build brand awareness, gather market intelligence, network with customers/suppliers/partners, and pursue business development opportunities.

Discussion Transcription

Greg Irwin 0:18

First off, big thanks to Lee Koepping and the team at ScienceLogic. We're going to be co-hosting here between BWG and ScienceLogic, and we're basically going to follow a format that we've been doing pretty consistently for the past eight years, which is really a group conversation around the way different organizations and agencies use technology. We'll talk about some real-world stories, you know, the good, the bad, the ugly, and use it really as an information exchange. So I'm going to start with Lee and take advantage of the expertise they have here in AIOps, but then I'm going to go around the group and try to get everybody actively involved to talk about where the focus is, where the opportunities are, and what's shaping the different realities of how people have deployed these technologies. I'll ask everybody to take a personal challenge: looking across our grid today, try to connect with one person. It doesn't have to be ScienceLogic or BWG. If you hear something interesting from somebody, expand your own network and reach out to those individuals. I promise you'll be better for it. We will send out the list of everybody who's joining, and we'll be happy to help connect people if you're having trouble connecting with them; it will be opt-in. Also, we use the chat window throughout, and it's incredibly helpful. So as we go, drop your own comments, questions, stories, and anecdotes into the chat window. Again, I promise it'll be a more valuable session, and it makes sure we're hitting on the points you all care about. So before we go any further, Lee, let's get you into the mix. Do us a favor: give a little bit of your intro personally and a little bit on who ScienceLogic is, please?

Lee Koepping 2:21

Sure. Yeah, my name is Lee Koepping. I'm the principal architect at ScienceLogic, primarily focused on federal, though I do get involved elsewhere. Personally, as old as I look, I've been in the industry quite a while. I have been a CTO at a billion-dollar company and at a startup, and a CIO at an operations company. I actually came across ScienceLogic having bought this platform twice in my career, and I had partnered with them as well in the industry. And now I find myself here. ScienceLogic is, I'd say at this point, according to the analysts, the market-leading monitoring and AIOps platform. AIOps is primarily focused around things like automation and achieving the art of the possible, but monitoring is very foundational to that. So it is a platform that is ubiquitous in terms of what it monitors: traditional network and storage, things inside your four walls, compute, cloud, containers, into the application space, and public cloud as well, regardless of make, model, and manufacturer. We've even gotten into many sensor technologies. We're orbiting the Earth now on the International Space Station, we support the Mars rover mission, we run on ships in the fleet, and we serve traditional commercial businesses and state and local education. So it's a pretty broad customer base. As a monitoring platform, we use many, many protocols to ingest information and then feed that back in a multi-tenant business services approach. Hopefully part of the discussion today will be the challenge we all have in the industry with things like containers and ephemeral technologies that are running one minute and gone the next. How do we maintain the CMDB? If something does go bump in the night, how do we actually capture that and try to get immediate resolution, or reduced MTTR, when everything's moving around and sometimes bouncing between your private data center and a public cloud in a hybrid mix?
That's really what our solution was built to address about 17 or 18 years ago, before cloud was fashionable. We've since grown into that, and the industry rallied around us. So yeah, happy to answer any questions about the platform. I think a lot of today is really how we address that challenge: the variety of devices, types of information, and KPIs you might get to understand current health, risk, or availability; the velocity at which that comes in; and simply the volume. Humans can't keep up with it anymore, and that's where AIOps becomes really critical: leveraging algorithms and machine learning to help us feeble old humans make sense of it all, and hopefully automate some things as well.

Greg Irwin 5:20

Perfect. Let's go into some stories. And again, I want to encourage people to take a moment and do me a favor: drop in one thing you'd like to hear about on this session, either from ScienceLogic or from other agencies. What's one thing? Maybe it's "Hey, is anyone using AIOps?" or "How hard is it to implement AIOps?" or "What's the ROI of AIOps?" I'm going to start with an even simpler question, Lee; maybe you can help us with it. Just what is AIOps?

Lee Koepping 5:57

Hmm. That's the best question. Because if there are 10 of us on this call, we could easily come up with 12 definitions.

Greg Irwin 6:08

So let's go. Let's talk, right? You're involved in the deployments.

Lee Koepping 6:13

So yeah, I'll give you my take on it. AIOps is nothing new. It has been around forever; it's a new name on a lot of, I won't call them old technologies, but old techniques. What it really is, is using artificial intelligence and algorithms to speed operations, and largely the speed comes from automation. Sometimes that's in the form of just visibility and tying things together for humans. In other cases, it is doing things for humans, whether that be maintaining a CMDB, which is a really hard thing to do in an ephemeral environment, or correlating multiple streams of data. A problem in the IT realm manifests itself 12 different ways. If you're monitoring all those different technologies, you've got storage, you've got your compute, you've got your application, you've got your database, and each one of those things can flag an event or error condition. How do I know what the root cause is? What caused what, and what happened in what order? AIOps is really making sense of that, bringing it together, and figuring out the answer at wire speed, and in many cases being able to be predictive. Then it's either just getting it sooner, which is foundational AIOps, or actually resolving it: if it's something that repeats, and restarting the service fixes it 100% of the time before it's a critical outage, then restart the service, document it, update the ticket, and close it all without human intervention. Those are real, tangible forms of AIOps.
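The auto-remediation loop Lee describes (restart the service if that fix has always worked, document it, update the ticket, close it, and otherwise hand it to a human) can be sketched as a small policy check. This is a minimal illustration under stated assumptions, not ScienceLogic's implementation. The `Ticket` and `Runbook` types, the `handle_event` function, the event-signature keys, and the default 100% success-rate threshold are all hypothetical, and the caller supplies the real remediation action.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Ticket:
    """Hypothetical minimal ticket; a real ITSM tool has its own schema and API."""
    event: str                                  # event signature, e.g. "web01:httpd_down"
    notes: list[str] = field(default_factory=list)
    status: str = "open"

@dataclass
class Runbook:
    """Hypothetical record of a known fix and how often it has worked."""
    action: str                                 # e.g. "restart_service"
    attempts: int = 0
    successes: int = 0

    @property
    def success_rate(self) -> float:
        return self.successes / self.attempts if self.attempts else 0.0

def handle_event(ticket: Ticket, runbooks: dict[str, Runbook],
                 execute: Callable[[str], bool],
                 min_success_rate: float = 1.0) -> Ticket:
    """Run the known fix only if it has always worked; otherwise escalate."""
    runbook = runbooks.get(ticket.event)
    if runbook is None or runbook.success_rate < min_success_rate:
        ticket.notes.append("No trusted automated fix; escalating to a human.")
        ticket.status = "escalated"
        return ticket
    ok = execute(runbook.action)                # e.g. restart the service
    runbook.attempts += 1                       # keep the fix's track record current
    runbook.successes += int(ok)
    ticket.notes.append(f"Ran '{runbook.action}' automatically; resolved={ok}.")
    ticket.status = "closed" if ok else "escalated"
    return ticket
```

For example, with `runbooks = {"web01:httpd_down": Runbook("restart_service", attempts=40, successes=40)}`, calling `handle_event(Ticket("web01:httpd_down"), runbooks, execute=lambda action: True)` closes the ticket with a note. Updating the counters is a deliberate choice: one failed restart drops the success rate below 100%, so the next occurrence goes to a person instead of being retried blindly.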

Greg Irwin 7:43

Let's go to an agency story. Do share the name if you can; don't share it if you can't. What's one agency that has deployed it? Maybe it's event correlation, maybe it's automatic resolution, one aspect of it. And what's the upshot?

Lee Koepping 8:06

Yeah, let me give one that should be recognizable. Again, I spend a lot of my time in the federal space, and I'm working very closely with Veterans Affairs. They are massive; if they were a company, they would be Fortune 5, I think, just in terms of sheer size, with hundreds of thousands of devices all across the country. Everybody knows their primary mission, which is hospitals and taking care of veterans, but veterans also have other benefits. There are cemeteries, there are veteran businesses, there are all these other aspects. They adopted cloud some time ago and stood up a cloud group. This was a group that had a complete greenfield. They run, I think, somewhere around eight major data centers throughout the country, spidering out to hundreds of hospitals and some localized data centers. So everything they had was distributed everywhere, and they were looking for efficiencies. They were looking to get out of the real estate business and get workloads migrated to the cloud. In doing that, they were presented with a pretty big challenge: they had every tool known to man, as many government agencies do, but none of them were exceptionally good at cloud or multi-cloud. And this is an environment that is taking advantage of their buying power across three major commercial cloud providers. So that was our initial entry: "Hey, I just need something to monitor three clouds." In getting in the door, we realized quickly that they also needed automation. COVID hit literally right at the beginning of our project, and all of a sudden what had been a slow, methodical plan to migrate workloads into the cloud changed overnight. There's a really good article by a gentleman named Dave Catanoso, who's in charge of that cloud group.
I think it was entitled "Sprint, Crawl, Walk, Run," because they were forced to convert nearly the entire agency over to telework overnight at the beginning of the pandemic, and that wasn't forecast anytime soon. So you had these massive workloads moving in, and that wasn't just general cloud infrastructure. These were very specific apps, very interdependent on a lot of things, some of it still on-prem, and all of a sudden that telemetry was overwhelming. They had to make sense of that. They had to look at the world through the lens of a business service, not device up, device down. There was just no way their newly formed team was going to be able to keep up with that one workload, much less the stuff they had originally planned. So it's really good, and it's continuing to evolve. There are success stories coming about every week. Some of them are very, very small but impactful to somebody; some of them are very large and really affecting the agency at large, getting them to contemplate changing the way they do business, looking at things differently, and introducing automation into the traditional policies and procedures of the old on-prem world, as they call it. So yeah, that was a great story and a great experience: not just supporting a noble mission, but really bringing new technologies to a customer that was eagerly awaiting them, because they had no other choice. It was just too much, too soon, too fast. So it was a perfect marriage of need and capability, and the whole esprit de corps of everybody working together to get something done.

Greg Irwin 11:43

So Lee, we have Mark and Ryan online here from the VA. And Mark, thank you for your question. If it's okay, I'm going to ask you to elaborate a little, because the language says, "You have a lot of AIOps across various systems and applications, no central plan, and it shows." And "it shows" is what I'm interested in. So Mark, if you're able to, what do you mean by "it shows"? And then the next part I'll turn to Lee on, which is: how important is the centralized plan? How long does it take to integrate AIOps systems? Mark, are you able to share a little more color?

Mark 12:27

I mean, there are a lot of different things that have been bought, right, that have AIOps integrated into them. And, to be blunt, they're siloed. So there are a lot of capabilities within different systems, but they're not integrated in any way that I could find. And I haven't seen a plan yet that shows how we're going to use the capability in each one to enhance each other's systems and enhance the overall picture of what we're trying to get.

Greg Irwin 13:05

So you might get telemetry from an APM, you get telemetry from, you know, logs, and another application on one cloud, one in Google, one in AWS. Now you have to start bringing those together.

Mark 13:22

Yeah. And then we have bots doing things, we have [inaudible] doing things, and we're starting to show the other side of what AIOps is supposed to bring. But it's not planned for; each group is just doing its own thing.

Greg Irwin 13:38

Yes. All right, Lee, let's talk about that. Once you get one app in, fine, then you start bringing in more and more data points that are important. How hard is this? Is this a point where complexity grows with the number of systems you add?

Lee Koepping 13:56

Certainly complexity grows, right? That's simple physics. I will say a centralized plan is a bit of a utopian vision for an agency the size of VA. There are a lot of fiefdoms, there are a lot of missions, there are a lot of silos; that's a fact. I don't think that's a barrier. I think you eat the elephant one bite at a time in those situations, whereas other agencies can and do have the ability, the luxury I'd call it, to do a top-down approach. And so, yeah, it's cultural. It's no different than cloud itself. That was a cultural revolution. People had to get used to not being able to walk down the hall and hug their local server. All of a sudden it's someplace they can't see it, they can't touch it, they can't follow the wire. Regardless of any potential cost benefits, real or perceived, or efficiencies or anything else, that was a cultural revolution, and it's still going on. I don't think AIOps is any different. To be honest, I think it is a cultural revolution. There is no AIOps thing: there's no SKU, there's no product, there's no license key that enables it. It's really a journey of automation. It is crawl, walk, run, and it does take time to achieve. You look for your small victories, you take things in bite-sized chunks, and that builds momentum. It's no different than any other major shift in thinking in our industry. Again, these aren't new technologies; it's really just new concepts of, you know, I have to relinquish some amount of control that humans have and let the machines do the work, because I've just got too much otherwise. Right?

Greg Irwin 15:45

Do you see that general push, year over year, toward coming into a central hub?

Lee Koepping 15:55

It varies, as you know; it varies wildly. And that all depends on how an agency or organization or company does things, as mundane as their budget. If their budget is carved up into little pieces, so is your culture, so is your operations mentality. If it isn't, then maybe it isn't. There are certainly pluses to that, but there are some minuses as well. It's very hard to be agile when you're trying to think and move as a single group. I think the world of IT is a balance of both. Yes, centralized control is great. Centralized visibility is fantastic. It's not always achievable, and if you push it too hard, top-down, you lose a lot of the efficiencies of grassroots agility. And then the real answer... someone's line is open.

Greg Irwin 16:47

Sorry, we're going to put you on mute here for a second.

Lee Koepping 16:52

Here we go. So yeah, I'd love to hear from the group whether they see those same challenges. We've got representation from big agencies and small agencies, state level and federal level. There is no one-size-fits-all. If there were one product that did it, if there were one way to do it, we'd all be working there. So it is very much figuring it out the hard way.

Greg Irwin 17:19

You know, for everyone on here: this session is not to take you through speeds and feeds and features of ScienceLogic or different AIOps tools. What we're going to do here is go through different people's journeys. Where are you? What's the problem you're trying to solve? Maybe you've taken some of those first steps; what's the feedback? Collectively we should be able to learn a thing or two across the group. So let's start that. I'm going to ask Tamra Giles. I'm sorry, I may be mispronouncing it. But Tamra, can I invite you to get us started and maybe share a little bit of what's happening in Alabama Emergency Management? Nope. Tamra, are you on with us?

Tamra

I'm sorry, I was taking another call.

Greg Irwin

Okay. Are you able to share a little story with us on what you're doing in terms of event management and event resolution within Alabama Emergency Management?

Tamra 18:32

Um, as far as basically our [inaudible], how to properly...?

Greg Irwin 18:39

It's simple. What are your priorities? Really simple. It may be that you're just dealing with tickets; it might be that you have it all automated and you already have good event correlation. What are your priorities in terms of managing event resolution?

Tamra

Just like you said, responding to our tickets, anything coming up, phone calls, whatever is needed, pretty much.

Greg Irwin

So we'll get there. What do you want to learn from this call? If we cover one thing that would be useful for you, what would you like us to be talking about?

Tamra 19:29

If I'm going to be honest, um, I'm stepping in for [inaudible], so I was kind of surprised by what I needed to know. I'm just here listening.

Greg Irwin 19:46

You're welcome. It's all about learning, and the way we do it is to share each other's stories and priorities. It doesn't need to be sophisticated. I'm happy for you to be on and to listen. We'll keep going around. Thanks, Tamra. Okay, let's go to your right on my grid, over to Billy Frederick. Billy, nice to speak with you again. Are you on the line with us? Billy, going once, going twice... All right, let's try Dalton Black. Are you on the line with us?

Dalton

Yes, I am. Good afternoon.

Greg Irwin

Good afternoon. Thanks for dialing in. What's your exposure here? This could be notification, it could be ticketing. What's your interest in this topic?

Dalton 20:38

So I'm a telecom specialist, or telecom manager, for BLM Nevada. As we start going into field comm modernization, these fields are going to start crossing over. So my interest is, as we modernize, how we move forward and implement a lot of this stuff into our telecom and field communications needs.

Greg Irwin 20:59

Got it. How do you handle tickets or issues as they come up today?

Dalton 21:06

So right now, the BLM uses Remedy for IT services. It doesn't really meet our needs for telecom, for the radio side, so we've created our own system in Nevada: it's a SharePoint site. As the department is, I guess, trying to go to one ticketing system for the whole agency, we're getting involved in making that IT solution work for radio. So we're just now getting involved in what that needs to look like to meet our needs as well.

Greg Irwin 21:43

Dalton, what do you want to hear as we go across this group?

Dalton 21:46

I'm just here to learn whatever you guys pass along. There's not one area that I want to focus on; I'm just trying to see what's coming our way that can help us modernize our field communications.

Greg Irwin 22:00

Happy to do it. Thanks so much for dialing in, Dalton; appreciate it. Let's try one more: Frida Johnson over at Raytheon. Frida, good afternoon. Nice to meet you. Give us a little intro and tell us a little bit about your focus in this area.

Frida 22:20

Well, good afternoon. So I am at Raytheon, and I manage cloud, a cloud management portal. We're trying to build a cloud enablement portal that shows the maturity of the organization, where we need to go, and what the gaps are in different areas, not just cloud management but also services. I was very curious to know where AIOps comes in with IT services, so I wanted to jump in, listen, and see what's out there. Since I'm just coming out of the weeds, I'm trying to understand what's going on out in the world.

Greg Irwin 23:05

Got it. Frida, thank you. So Lee, we've gone around three or four people, and they're all asking the same thing, which is: what is it? Between sprint, crawl, walk, run, I think we're at crawl.

Lee Koepping 23:22

And most people are. That's a fair judgment. I don't think it's different than a decade ago. Everybody has tools; they're monitoring, they see things. But candidly, most everybody's still reactive. I've heard that from folks today: we view the world through tickets. If everything's right, it's quiet. If something's wrong, there's a ticket. That's indicative that there is a part of the organization that is monitoring and has these tools. Maybe they have all technologies represented, maybe they don't, but they have some sort of eyes on the performance, availability, and capacity of IT. I would venture a guess that for most people that's still very much device-focused: you've got an event console with lots of events, lots of red, lots of green, lots of yellow, but not necessarily a priority to that. So that's one issue. The other issue is the reaction to tickets and the time it takes somebody to recognize that, start working on an issue, go to the right tool to trust but verify that it's really an issue, and then go find the right resource to come up with a resolution. Sometimes that's handed off, sometimes it's coordinated, and it's a very manual process as well. And then there are the shades of gray, as I like to call them: it's not necessarily broken, but it's degrading. Maybe it affects some people, maybe it doesn't affect anybody yet, but it's starting to topple over. That's the proactive utopia we've all wanted to get to in this industry, where, you know, hindsight is always 20/20: we saw it happening, we just didn't know what it was until it happened. I think AIOps pulls those elements together and, first and foremost, consolidates data in a data lake kind of concept.
I think that's foundational to AIOps. You've got to have your data in one place: certainly your event information, but I would argue also the telemetry behind those events, the good performing, the shady performing, the bad performing. All of that needs to be in the same pool, because everything is interrelated today. One cloud device is really the result of, well, maybe not 14, but about a dozen different subcomponents: a security group, a storage blob, a service, a region. All of these things have to be working for that one compute instance we all recognize, with the operating system running on top of it. And then you've got a database on top of that, or a web server, or something supporting the application, and then you've got checks on the application itself. Servers look up all the time even when something else is not working. So there are lots of telemetry points to bring in and correlate, and if any one of them is down, how do I know how that affects the overall health, risk, or availability of the application? So getting past this device-by-device view to a business service view is, I think, very foundational. Then you take those things which are ticket-worthy, and that's not every event. I think that's a mistake a lot of agencies make: everything's a ticket. Well, not really. First do some correlation. These three, four, or five things can be correlated, traditionally through a human, but nowadays through AI and ML, to say that this is the actionable event. This is the thing we need to open a ticket on. This is the thing that needs to get dumped into a Teams or Slack channel, or hit somebody's text pager, or whatever that is.
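One simple way to turn "these three, four, or five things" into a single actionable event is topology-based correlation: walk a dependency map and treat an alerting component as the root cause only when nothing it depends on is also alerting. This is a sketch of that one technique, not of the ML-driven correlation Lee mentions or of any product's algorithm. The component names and the `transitive_deps` and `correlate` functions are hypothetical, and the sketch assumes the dependency map is acyclic and that all alerts fall in the same time window.

```python
def transitive_deps(depends_on: dict[str, list[str]], node: str) -> set[str]:
    """Every component `node` relies on, directly or indirectly."""
    seen: set[str] = set()
    stack = list(depends_on.get(node, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(depends_on.get(dep, []))
    return seen

def correlate(depends_on: dict[str, list[str]],
              alerting: set[str]) -> dict[str, list[str]]:
    """Group simultaneous alerts as {root cause: [supporting symptoms]}."""
    deps = {c: transitive_deps(depends_on, c) for c in alerting}
    # A root cause is an alerting component with no alerting dependency beneath it.
    roots = {c for c in alerting if not (deps[c] & alerting)}
    # Every other alert is kept, not hidden, as supporting info under its root(s).
    return {r: sorted(c for c in alerting - roots if r in deps[c])
            for r in sorted(roots)}

# Hypothetical business service: an app on a web server and a database, each on a VM.
depends_on = {
    "checkout-app": ["web-server", "orders-db"],
    "web-server": ["vm-1"],
    "orders-db": ["vm-2"],
    "vm-1": ["storage-blob", "security-group"],
    "vm-2": ["storage-blob"],
}
alerting = {"checkout-app", "web-server", "orders-db", "vm-1", "vm-2", "storage-blob"}
print(correlate(depends_on, alerting))
# {'storage-blob': ['checkout-app', 'orders-db', 'vm-1', 'vm-2', 'web-server']}
```

Six alerts collapse into one candidate ticket on `storage-blob`, and the other five stay attached as supporting information. If two unrelated components fail at once, each becomes its own root.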

And all those other things that happened, you don't hide them; they don't go away. They happened, but they're merely supporting information to the root cause. So I think that's the second big pillar. And then the other is being able to visualize performance metrics and do some degree of forecasting, to really leverage ML to say, based on my last 60 days, what are my next six months going to look like? Am I going to reach capacity in a month? In two months? That's important for two reasons. One, obviously, to see the cliff coming. But in today's world, even in a cloud environment, you still have budget impact. I can't just go flip the switch on infinite bandwidth and infinite storage; I may need to plan for those things, and I may need to prioritize how I plan for them. Also, in a cloud environment, I may need to optimize: if I'm underutilizing these resources, why don't I turn them down a little bit and save those pennies and nickels for when I'm really going to need them to increase capacity somewhere else? Insight into those things, the automation of the whole ticket-to-incident-to-resolution flow, and viewing the world in the aggregate, seeing the forest rather than the trees: those are the three big pillars of AIOps, I think.

Greg Irwin 28:33

And Lee, there are a couple of ways to understand the trend. I mean, the most obvious is, "I'm drowning, I'm drowning. We don't have a team that can cover it," or "Our systems or applications are down longer than people can stand," or "We have one person who knows how to fix everything, and that one person is a major risk point for the organization." All these things make it insanely obvious that something needs to be done here. Absent that, as you said, if it's not broken, maybe I'm fine: it's quiet, so I'm good. The other way is to look and see what other organizations are doing. So let's start there. Across your customer base, what do you see for demand and deployment of these types of solutions? I understand it probably starts with the foundation of an ITSM and understanding your assets: an inventory in your CMDB of your assets, applications, and licenses, what you have and how they're correlated, before you can get to using bots to help solve problems. But what do you see in terms of the pace at which organizations and agencies are deploying these types of systems?

Lee Koepping 30:04

I mean, that's the whole CMDB revolution. ServiceNow is a big driver of that; maybe it's Remedy; any traditional ITSM tool has some form of CMDB. That has been a strong demand. Although we would consider ourselves very much a monitoring platform that has all these other benefits, we're often employed specifically to solve that problem first: the CMDB.

Greg Irwin 30:32

The problem being? I'm sorry.

Lee Koepping 30:34

The problem is the accuracy. Yeah, the problem is the accuracy of a CMDB. People have spent a lot of money on products and contractors and consulting and everything else. They recognize they need a consolidated repository: not just an asset repository, but a repository of services, the linkage of assets, and other tangential things like the finances behind those, or their history from a stability or support perspective. It's critical; everybody wants that, and everybody has touched on it in different forms or fashions. The problem we have today is, again, a volume and velocity issue. To get there, there are a lot of integration points, and that doesn't always go well. The other thing, quite candidly, is that a lot of these manufacturers have great solutions for ITSM. They'll have their own tools, but those aren't designed for the fidelity of a lot of today's workloads. They don't do well with things like containers or ephemeral environments, whether cloud or localized hypervisors, where things go up and down on purpose, and frequently. In a container world, something may only be up for five minutes, and then it's gone again. How do I get that into the CMDB so that, if it experienced an issue during the five minutes it was up and I open a ticket, the ticket actually ties to that thing? That's never going to happen if you're feeding the CMDB once a day. Those are some of the challenges our customers have brought to us: "I get what you guys do over here, and I can see it; how do I get that over there?" So that integration has become pretty paramount. It's also the hardest thing to do, because moving data is easy. We have data; we can move it over here, table to table. It's not the technology; again, it somewhat comes down to culture. "Well, I need it to be fed by these 15 other tools," or "I've already got a vision of what my CMDB should be."
"So you've got to change how you're sending me the data, so I can get it the way I want it," versus the way the operations team views the dashboard and sees it, which may be different. And so then you get into these mapping exercises. Again, it's less about the technology and a little bit more about the culture, and the politics in some cases. That's hard. That's hard. There's no SKU that fixes that. That's just a lot of meetings, a lot of whiteboard sessions, and, to be candid, a lot of arm-wrestling.
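To make the five-minute-container problem concrete, the sketch below compares a once-a-day batch snapshot with an event-driven feed that records a configuration item (CI) whenever a container starts. The container names, times, and the simple in-memory "CMDB" are hypothetical; real integrations go through each ITSM vendor's own APIs.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Container:
    name: str
    start: int  # minutes since midnight
    stop: int   # minutes since midnight (exclusive)


def batch_snapshot(containers: Iterable[Container], sync_times: Iterable[int]) -> List[str]:
    """CIs a batch feed would capture: only containers running at a sync instant."""
    times = list(sync_times)
    return [c.name for c in containers if any(c.start <= t < c.stop for t in times)]


def event_driven_feed(containers: Iterable[Container]) -> Dict[str, Container]:
    """CIs an event-driven feed would capture: every container, upserted on its start event."""
    cmdb: Dict[str, Container] = {}
    for c in sorted(containers, key=lambda item: item.start):
        cmdb[c.name] = c  # upsert keyed by container name
    return cmdb


def cis_for_incident(cmdb: Dict[str, Container], minute: int) -> List[str]:
    """Recorded CIs that were running when an incident was opened."""
    return [name for name, c in cmdb.items() if c.start <= minute < c.stop]


fleet = [Container("web-1", 0, 1440), Container("batch-job-7", 600, 605)]
print(batch_snapshot(fleet, sync_times=[120]))         # daily sync at 02:00: ['web-1']
print(cis_for_incident(event_driven_feed(fleet), 603))  # incident at 10:03: ['web-1', 'batch-job-7']
```

The batch feed never sees the five-minute job, so a ticket opened at 10:03 has nothing to tie to; the event-driven feed does.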

Greg Irwin 33:15

Is it better? I had a boss who used to tell me that the other side of the desert might be fantastic, but you still have to cross the desert. So the question is, is it worth crossing the desert?

Lee Koepping 33:29

It is. We've had organizations that have done it, and it wasn't easy for any of them, though it was easier for some than others. But when they get to the other side, yes. Now when they have to run a report, it may be on what we would consider traditional IT data: where are all the serial numbers for my contracts, and what's expiring? That's pretty common. But it may be what's depreciating this year, or it may be, "I need to put this thing in maintenance; what else could it affect if it's a shared services device?" Those are all things you would expect to get from a CMDB. If the data is there and it's accurate, those are push-button reports that any of the ITSM tools probably have out of the box or can easily create. So yes, if that fidelity and accuracy of data is there, those things become really easy. These are things organizations used to spin up whole tiger teams for: "You four people go off in a room and come up with this report that I owe to one agency or another," and that becomes a three-week effort. Now it's push button, or it's just delivered quarterly via email. That's the other side of that desert.
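The "what else could it affect?" report Lee mentions is, underneath, a walk over the CMDB's dependency relationships. A minimal sketch, assuming the relationships are available as a hypothetical mapping from each CI to the CIs that depend on it; the device and service names are invented for illustration.

```python
from collections import deque
from typing import Dict, List, Set


def impacted_by(ci: str, dependents: Dict[str, List[str]]) -> Set[str]:
    """Breadth-first walk returning every CI that directly or transitively depends on `ci`."""
    impacted: Set[str] = set()
    queue = deque([ci])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            # Skip CIs already seen (and the starting CI) so cycles cannot loop forever.
            if child not in impacted and child != ci:
                impacted.add(child)
                queue.append(child)
    return impacted


# Hypothetical "is depended on by" relationships around a shared storage array.
dependents = {
    "san-01": ["db-01", "db-02"],
    "db-01": ["payroll-app"],
    "db-02": ["hr-portal", "payroll-app"],
    "payroll-app": ["Payroll service"],
    "hr-portal": ["HR self-service"],
}

print(sorted(impacted_by("san-01", dependents)))
```

Putting san-01 into maintenance touches both databases, both applications, and both business services; that answer is only as good as the accuracy of the relationships stored in the CMDB.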

Greg Irwin 34:38

Let's pause for a minute. I want to try Chris Tongass. Chris, you and I talk about Box and ECM, but I know you do a whole lot more than that at the Office of the Attorney General for DC. First, are you on the line with us?

Chris 34:58

Yes, I am.

Greg Irwin 34:59

I hear you. Good to speak with you again, Chris; thanks for joining. Can I ask where you are and what you're doing around ITSM, and around automating some of those ITSM processes?

Chris 35:13

So here's where we are, and it may sound kind of basic, maybe even kind of ridiculous, but we're backing into this. The reason we're backing into it is that we first started by building out a SIEM platform. Once we built out that SIEM platform, we implemented Splunk and a couple of other things. At the same time, we're part of a larger organization, the DC government, and the DC government was proceeding down a different path, using ServiceNow. So we decided we were going to stick with what we know and take the very basic skills we developed from implementing our SIEM platform. It does some of the things Lee was talking about: it'll spit out messages, and certain kinds of messages will be sent to our private Slack channel so we know what to do. We're going to do the same thing with our server and cloud infrastructure. So we're backing into it, rolling our own solution and doing it ourselves, just because that's kind of the way we are here. It's very interesting to hear Lee's perspective and what other people are doing. But we're backing into it from something else that might not be considered AIOps; maybe it would be considered SecDevOps. That's how we got our start. Our interest is making sure that some of our 24/7 applications, which are used for processing arrests and doing other really critical public safety things, work the way they're supposed to work. There can be no outages, so automating this is really important, too. I hope that doesn't sound ridiculous, but that's how we're backing into it.

Greg Irwin 37:14

Doesn't sound ridiculous at all. But in terms of the monitoring, is that application-specific monitoring and alerting? Or is there third-party alerting that you're layering on top, to make sure you're not getting memory overload and that the applications are performing and the infrastructure underlying them is there?

Chris 37:36

Right now, it's application monitoring. But again, we're at the very beginning of this. We're a relatively small shop, and we have an infrastructure footprint that's not huge, but it's a blended infrastructure that encompasses a couple of cloud platforms and other things. So right now we're backing into this at the application level, and that will probably grow to include other things. We are toying with the idea of a third-party monitoring service, but we're not quite there yet. We probably will get there, if that makes sense.

Greg Irwin 38:13

Perfect. Great. Chris, what's one thing you'd like to hear in the remaining time we've got, as I go to Lee or others on the line here? Is there a particular area of focus?

Chris 38:26

I'll tell you, not knowing what we're doing kind of flies in the face of the way I normally run things here and my normal strategies. I'm really interested in whether somebody could point us toward a packaged solution that might do everything, or most things, because we haven't found that yet. But again, we're very, very early.

Frida 38:51

Quick question. Chris, when you say you're sending message notifications on Slack, are all your users on the same tenant?

Chris 39:03

Oh, no. Those are private Slack channels and such.

Frida 39:08

Yeah, we have a mix of Slack and Teams and a whole lot of other things, so we're also dealing with that sort of issue. But thank you.

Lee Koepping 39:18

That's its own world. We've found out, through the use of our platform and what we're asked to do in implementations, that it's not the traditional "I just need to send an email to this alias." That's fine for some people. But you talk to the security team, they're using Slack; you talk to the server team, they're probably using Teams if they're Windows folks. They want these things dumped in their native format. They don't want another email. And the good news is you can do all of those things. So first and foremost: I need to deliver information to you that I think is critical, or that you said you wanted. How would you like it? Don't force them to go dig it out of their inbox, because they'll see that as a chore and never find it in the first place. Give it to them in their native format, in their native language. That little, teeny-tiny thing, making sure the message is received, acted on, and convenient for them, has in and of itself a measurable effect on MTTR. You can call it AIOps; as manual as that sounds, it's part of that journey.
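Lee's point about delivering alerts in each team's native format can be sketched as a small routing table. This is a hypothetical sketch: the team names, channel names, and fallback address are invented, and the actual delivery step (a Slack or Teams webhook, or an email) is deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Alert:
    team: str
    summary: str


# Hypothetical preferences: each team names the tool it already lives in.
PREFERENCES: Dict[str, Tuple[str, str]] = {
    "security": ("slack", "#sec-alerts"),
    "server": ("teams", "Server Ops"),
}
FALLBACK: Tuple[str, str] = ("email", "it-ops@example.com")


def route(alerts: List[Alert]) -> List[Tuple[str, str, str]]:
    """Pair each alert with its team's preferred channel, falling back to email."""
    routed = []
    for alert in alerts:
        kind, destination = PREFERENCES.get(alert.team, FALLBACK)
        routed.append((kind, destination, alert.summary))
    return routed


for kind, destination, text in route([
    Alert("security", "Repeated failed logins on vpn-gw-2"),
    Alert("network", "Interface flapping on core-sw-1"),
]):
    print(f"{kind:5} -> {destination}: {text}")
```

Keeping the preferences as data, rather than hard-coding one channel, is what lets each team change how it receives alerts without touching the monitoring logic.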

Frida 40:26

But how do you also handle it when the messages aren't relevant anymore and people just get bombarded? Very soon, they just ignore all the notifications.

Lee Koepping 40:38

Well, I mean, for your power users, somebody within that group should, I think, have the rights or the capabilities to come into the sending system and adjust it. Maybe they don't know how to create the notification, or how to code whatever is behind the message, but what they can very well do is one of two things. One is to disable the notification. The other is to give feedback to the team that's actually monitoring: "You know what, I want you to turn the sensitivity down a bit. I still want the notification, but I don't want to see it at 90%; I want to see it at 95%." Again, it's about tailoring that data. The cleaner the data, the less frequent the data, the more meaningful the data, the better the chances are that it's received, understood, and acted on.
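The two options Lee describes, disabling a notification or asking for a higher threshold, amount to per-subscriber settings checked before anything is sent. A minimal sketch with hypothetical names; the 90% and 95% figures come from his example.

```python
from dataclasses import dataclass


@dataclass
class Subscription:
    enabled: bool = True
    threshold: float = 90.0  # percent utilization that triggers a notification


def should_notify(value: float, sub: Subscription) -> bool:
    """Send only if the subscriber still wants this notification and the value meets their threshold."""
    return sub.enabled and value >= sub.threshold


server_team = Subscription()
print(should_notify(92.0, server_team))  # True at the default 90%

server_team.threshold = 95.0  # "turn the sensitivity down a bit"
print(should_notify(92.0, server_team))  # False: below the new threshold
print(should_notify(96.5, server_team))  # True

server_team.enabled = False  # "just disable that"
print(should_notify(99.0, server_team))  # False
```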

Frida 41:31

Sounds like a bespoke notification system.

Lee Koepping 41:37

Yeah, I mean, there's a whole industry behind that. It's one of the things that we've, I wouldn't say struggled with, but it's a topic of conversation within the development of our own product: we have to be really open, because there are so many teams. If you really want to be at the center of an organization from a support perspective, you have to deal with security teams and server teams and network teams and management, and everybody wants to consume their information in a different way. One size fits all is not a good approach if you want them to collaborate and to receive the right information at the right time at the right level. So yes, you have to consider all of those things.

Greg Irwin 42:22

Having grown up in IT, I always thought of three things: on spec, on time, on cost. Can you give us some ballpark of what we're talking about? Obviously, if you're going to do a full ITSM implementation, with the full CMDB, telemetry, application monitoring, infrastructure monitoring, and AI on top of all of that, that's gigantic, right? But can you break it down a little into pieces, and into what these projects look like in reality, as you think of some of the agencies and departments you're supporting? What does a project look like these days?

Lee Koepping 43:10

In terms of time, and this is my biased and jaded perspective, coming from our tool, its implementation, and its effectiveness, that's something we measure. Typically, two to three weeks is a normal install to get to a baseline: I've discovered everything, I'm getting events, and I've got some simple notifications. The reason that sounds really short is that a lot of it, for us, is enabling the organization to go in and tailor the different kinds of notifications over time. We've hopefully made that pretty easy, so they can do it without our professional services. So we're really focused on getting a solid, resilient, secure architecture in place, making sure we've discovered the estate, and making sure the pieces focused on collection and presentation are there. Most organizations will do the fine-tuning on their own, at their own pace, over time. Now, there are certainly many longer instances of that in the federal government, because it takes time to get a badge, it takes time to get access, and there are a lot of fiefdoms you have to coordinate. There's an extreme version of that at the agency I brought up earlier, where there will never be an end to it. We're dealing with hundreds of thousands of devices, and we're just now starting to reach into the cloud and out to other entities. There are multiple hundreds of business services, and each business service's construct may have multiple subcomponents representing thousands of devices. What one team wants to see is very different from what another team wants to see. That may never end. It's rolled out, it's up and running, and the estate has been discovered, but massaging that and creating the services view that's unique to each consumer goes on. The whole CMDB thing is magnified to intensities I can't even talk about on this call when you get to an organization that big.

Greg Irwin 45:16

Does it matter, at least from your solution's standpoint, if your grounding is Remedy, or ServiceNow, or Atlassian?

Lee Koepping 45:27

They all have their efficiencies. We integrate with all three, and more. Each one has its own strong points. I grew up in the ITSM space; that's where I started my career, and I'm still pretty passionate about it. I love things that are request-management and workflow oriented, and those are also the hardest things to work on. Monitoring is relatively easy: you've got the data or you don't; it's kind of black and white. There are all kinds of cool things you can do with it, but the acquisition, storage, and presentation of it are pretty straightforward. In an ITSM system, when you get into processes and procedures, anybody who's used Remedy will know full well that you can develop yourself into a corner really quickly with a flexible tool. Flexible options mean I can really tailor it, and then the one guy who did all that moves on, and I'm kind of stuck in a box. Other companies have taken a different approach. ServiceNow is much more locked down as a SaaS offering: use what we have, and you can tweak around the edges. They don't emphasize flexibility or customization as much, I think. There are certain advantages to that, and certain disadvantages. The mechanisms by which we move the data to them are pretty similar; we do everything via API. It's a similar outcome: I want to take this event and make it an actionable incident, or I want to take this set of device and asset data and populate the CMDB. The mechanics are largely the same across those platforms, so we can be Switzerland in that regard; we don't have favorites. But yes, each one is always its own set of discussions and workflow. In fact, that integration often takes longer than implementing our core toolset to do all the monitoring and everything else, because, again, monitoring is a little bit more black and white when it comes to seeing results.

Greg Irwin 47:23

Lee, we set up some questions in advance. The other one, which you and I had talked about, was the incredibly tight labor market and finding the right people with the right skills. So I'll ask the question here: in deploying these AIOps solutions, whether it's ScienceLogic or an alternative, what's the practice in terms of their ability to really free up capacity in a team that couldn't get through its backlog? Actually taking some of the break-fix work off the table and freeing people up for, I don't want to say more value-added tasks, but more interesting tasks.

Lee Koepping 48:09

Yeah, it's interesting. It's a little bit of a slope effect, maybe a bit of a bell curve, on the front end. There's a lot of participation needed to do something like that, especially when you're talking about dozens of technologies. It's great that we've got a single system that does it, but somebody has to go enable that credential, make sure it's secure, and do all those things on each one of those technologies. That's a lot of work up front, and it actually makes life a lot harder coming in the door. Now, things have gotten easier in recent years, because everybody's got a tool. This isn't like 10 years ago, when I didn't have a tool that did this, I had to start from scratch, and I had to enable SNMP or configure PowerShell, or whatever the case may be. People have already done that; they've already had, and still have, lots of different tools, and they're now interested in consolidating those to some extent. One of the feathers in our cap over almost two decades now has been the ability to cross all those technologies, and generations of those technologies, because we don't rely on any one protocol. Believe it or not, there are still mainframes out there, and they need to be monitored; they're critical. But there are also satellite terminals and public cloud, and those are monitored in a totally different way. So with the implementation, you really do a lot of enablement up front and make sure security is happening and everything else. Then I start to recognize the benefits: at least I've got my data in one place, and now I don't have to do swivel-chair monitoring and management. In many cases, we find that people still want to isolate those views. That's an advantage of being multi-tenant.
"Well, we're okay putting everything in one system, but I don't want the server guys to see what the network guys have, and vice versa, because they're just not ready yet." There's a little bit of a territory war going on there, and that's fine, too; you can configure the system to do that. But yes, it's a lot of work on the front end, and then there's a point where you really quickly realize the benefits. Then comes what you might call the deer in the headlights: "Okay, now that I can see the data, wow, have I got a lot of data," and you get lost in it. You start seeing things that aren't really problems, just because you've never had visibility at that level before. Then you've got to pull yourself out of that and say, "You know what, I still have the same problems. I still get the same calls. Maybe I can automate those; maybe I can make it faster. Why don't I start instituting the resolution for those?" I think that's the final frontier. If I'm using an ITSM system to track the level of support I'm providing, and I find out I'm running the same script, or restarting the same service on this type of device, every time to solve this combination of events, why can't I just automate that? Maybe you make it push button at first and prove that it works. Then you eventually automate it to the point where humans aren't even involved. A lot of people jump to the end of that chapter. I'm not one of them. I want to build a history: yes, the last three times I've done this, it worked. Okay, I'm willing to take a shot at making that a push-button thing. So now when I get this event on this device, I give any operator the ability to push the button. I don't need to give them the credentials, and I don't need to train them on what it does; they just push that button. If it resolves the issue, fantastic. If it doesn't, go to tier three.
If, after doing that three or four times, it works, just automate it. Flip that switch and make it something that runs immediately when I get the right combination of event, device, and scenario, and go ahead and document it. I want to inflate my ticket statistics, I want to justify my staff's existence, but they didn't actually have to touch it.
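The progression Lee describes, from a proven manual fix to a push-button action to full automation, can be sketched as a small state machine that promotes a remediation after a run of consecutive successes. The promotion count of three follows his "last three times" example; the class and mode names, and the rule that any failure sends the fix back to manual handling, are assumptions for illustration.

```python
from enum import Enum
from typing import List, Tuple


class Mode(Enum):
    MANUAL = "manual"            # an engineer with credentials runs the fix
    PUSH_BUTTON = "push_button"  # any operator can trigger it for the matching event
    AUTOMATIC = "automatic"      # runs on the matching event with no human involved


class Remediation:
    def __init__(self, name: str, promote_after: int = 3) -> None:
        self.name = name
        self.promote_after = promote_after
        self.mode = Mode.MANUAL
        self.streak = 0  # consecutive successes in the current mode
        self.log: List[Tuple[str, bool]] = []  # every run is still documented

    def record(self, succeeded: bool) -> Mode:
        """Record one run's outcome and promote the fix after enough consecutive successes."""
        self.log.append((self.mode.value, succeeded))
        if not succeeded:
            # Assumption: a failure resets trust; escalate (e.g. to tier three) and return to manual.
            self.streak = 0
            self.mode = Mode.MANUAL
            return self.mode
        self.streak += 1
        if self.streak >= self.promote_after and self.mode is not Mode.AUTOMATIC:
            self.mode = Mode.PUSH_BUTTON if self.mode is Mode.MANUAL else Mode.AUTOMATIC
            self.streak = 0
        return self.mode


fix = Remediation("restart-app-service")  # hypothetical runbook step
print([fix.record(True).value for _ in range(6)])
# ['manual', 'manual', 'push_button', 'push_button', 'push_button', 'automatic']
```

Requiring a streak in each mode mirrors the cautious path Lee prefers over jumping straight to the end of the chapter.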

Greg Irwin 52:01

Right. Nobody bats a thousand. As you look across the people who've made this journey, what's the likelihood of success? Maybe not within three months, because there's process change and there's a learning curve.

Lee Koepping 52:19

It depends on what they define as success. I think a lot of people just want tomorrow to be a little easier than today, however high or low that bar is. Other people want robots running the world.

Greg Irwin 52:37

Right. Well, let's start with the lowest bar: "Hey, I have three people who are on the verge of quitting because they can't handle it. They just can't take another outage of that server." Where can we get to from there?

Lee Koepping 52:51

I think getting rid of the war room mentality, at least in the space I focus on, is the biggest benefit. Any outage would generate a bridge, and eight people jump on the bridge. Each person proves their innocence until there are two people left, and they fight it out. It's like Thunderdome, right? If you can get rid of that in a relatively short period of time, and I'm talking weeks to months to culturally back out of those things, you see the situation in black and white. You know where the problem lies, and you just have the conversation with one person instead of seven or eight. That has such a dramatic effect on general morale. It's still somebody's problem; somebody still has to fix it. Problems will happen. Somebody's weekend may have gotten shot dealing with that, but he didn't ruin eight other people's weekends. And that has a cumulative, positive effect on morale, I think.

Greg Irwin 53:49

Good stuff. Let's invite one more. Vince, Vince Egan. Vince, I don't know if you're in a spot where you can talk, but you've been on the line for the call. Do you have any comments or questions for Lee or the others on the call?

Vince 54:07

Oh, hi. Yeah, my name is Vince Egan. I work for Housing and Urban Development. I'm not really involved in this aspect of it. I'm involved in making changes to the FHA systems, and in migrating the data for our changes from development to test, to user acceptance testing, to third-party testing, and on to production. We do have a ticketing system called Service Desk. Our tier-two group tries to resolve those tickets first, and if they're not able to, they create a CSR, which usually means some code needs to change in the system. That's where I get involved. Once a quarter, we work with management on selecting the CSRs we're going to implement for our quarterly release. Another group handles all the monitoring, tier one and tier two, and I'm in the tier-three group. But I appreciate all the things everyone's been saying about the need for tools to track down problems. We have a problem now that appears in production but in none of the other systems, which is kind of weird. It's very difficult to trace down; we're not getting anything in the error logs. It's a PeopleSoft system, and we're down to the point where we may have to rebuild the application servers and web servers to get it right. All the other systems we did with the upgrade went fine except production, and it only happens once in a while, but we haven't figured it out, and we don't know the root cause. So that's one of the things where a system like the AIOps you're talking about would be great: zeroing in on these kinds of issues, and ultimately having the system fix them instead of people fixing them. But the first step would be identifying these kinds of problems.
But yes, I can appreciate all the things everyone's saying; the need for these kinds of systems is there.

Lee Koepping 56:53

There's a new frontier that's tangentially associated with AIOps, and that's called anomaly detection. It's something we've tackled as a company and in our technology. We introduced it a couple of versions ago, and it's maturing. We are, rightly or wrongly, a company built on engineers, so we tend to roll things out pretty raw and get customer feedback to shape them. Anomaly detection is one of those things, because what we've found is that black-and-white thresholds are great, and you still need them. It works or it doesn't; that's kind of binary: you crossed the threshold and went back below. But the anomalies are the weird behavior. Imagine a CPU issue: when it crosses 90%, you get an alert, right? Everybody's got a tool that does that; ours is no different. Then you start to look for trends in that exception data: how many times did it go over, and what was going on when it happened? We have done some things for many, many years with deviation from norm: this went above the threshold, but it also did it last Tuesday, and it does it every Tuesday, so that's okay; but if it happens on a Wednesday, that's an alert. Getting into true behavioral correlation means that same statistic could flatline at 65% in the middle of the day, not below a threshold and not above one. Nobody's going to catch that; nobody's looking for it. No system in the world is set up to see that. But if you're analyzing that data using ML, and the way we do it, the model is constantly expecting a value x, then when the value isn't what's expected, it's flagged as an anomaly. It's not an event; it gets treated differently. You start looking for those anomalies, and you start stringing together a dozen anomalies in the middle of the day on a Wednesday. That, in combination with what then happens on Thursday, starts to find those whack-a-mole issues.
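To illustrate the difference between a threshold event and an anomaly, the sketch below keeps a per-weekday baseline (mean and standard deviation) for a metric, flags values far from what that day normally looks like, and adds a flatline check for a metric that stops varying. This is a simple statistical stand-in, not ScienceLogic's ML model; the readings, the 3-sigma rule, and the function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple

Baseline = Dict[str, Tuple[float, float]]


def build_baseline(history: Dict[str, List[float]]) -> Baseline:
    """Mean and population standard deviation of past readings, per weekday."""
    return {day: (mean(values), pstdev(values)) for day, values in history.items()}


def crosses_threshold(value: float, threshold: float = 90.0) -> bool:
    """The classic black-and-white event: over the line or not."""
    return value >= threshold


def is_anomalous(day: str, value: float, baseline: Baseline,
                 k: float = 3.0, min_std: float = 1.0) -> bool:
    """Flag a value more than k standard deviations from that weekday's norm."""
    mu, sigma = baseline[day]
    return abs(value - mu) > k * max(sigma, min_std)


def is_flatline(window: Sequence[float], expected_std: float, ratio: float = 0.1) -> bool:
    """Flag a window whose variation has collapsed relative to what is normally seen."""
    return pstdev(window) < ratio * expected_std


# Hypothetical midday CPU readings (%) from the last four weeks.
baseline = build_baseline({
    "Tuesday": [92.0, 94.0, 93.0, 95.0],  # a known weekly peak
    "Wednesday": [38.0, 42.0, 40.0, 41.0],
})

print(crosses_threshold(93.0), is_anomalous("Tuesday", 93.0, baseline))    # True False
print(crosses_threshold(93.0), is_anomalous("Wednesday", 93.0, baseline))  # True True
print(crosses_threshold(65.0), is_anomalous("Wednesday", 65.0, baseline))  # False True
print(is_flatline([65.0] * 30, expected_std=5.0))                          # True
```

Here the Tuesday spike crosses the threshold but matches the norm, while a steady 65% on a Wednesday never trips the threshold yet is flagged. Recording anomalies separately from threshold events is what would let several of them be strung together later.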
And I hear what you're saying, Vince. I've been in the trenches supporting this stuff as well. There is nothing worse than the whack-a-mole problem. I can't see it coming. I don't know when it's going to hit; I just know it will. And I can't find the root cause, because by the time I see it, it's black and white, and I never saw the shade of gray that led to it. Even though I'm looking at hundreds of pieces of telemetry, I wasn't looking for abnormalities; I was only looking for events. So that's an area we've gotten into in the last couple of versions and are continuing to mature. It's really exciting stuff. But I guess the moral of the story is: help us, as an industry, start looking for those whack-a-mole issues that don't present themselves as cleanly as traditional uptime things.

Greg Irwin 59:41

Well, Lee, and everybody, we're at the hour, so we're going to wrap this up. A big thanks to you, Lee, and the team at ScienceLogic for co-hosting. Obviously, if this is directly relevant for you, wonderful; we'd love to connect you with Lee and his team. If it's relevant for a colleague, we'd very much appreciate those intros, to continue to get the word out about what ScienceLogic is doing. Lee, great, great session. Thank you so much for taking the time to speak, and thanks, everyone, for the opportunity to learn a little bit. All right, so with that, everyone, let's wrap it up. Thank you all, and I look forward to speaking with everybody on a future session. Thanks, everybody. Bye-bye. Thank you.

What is BWG Connect?

BWG Connect provides executive strategy & networking sessions that help brands from any industry with their overall business planning and execution. BWG has built an exclusive network of 125,000+ senior professionals and hosts over 2,000 virtual and in-person networking events on an annual basis.

E: info@bwgconnect.com T: (908) 679-8946

A: 25 Commerce Drive, Cranford, NJ 07016
