Powering the Intelligent Enterprise with AIOps

Nov 2, 2021, 3:00 PM - 4:00 PM EST

Request The Full Recording

Key Discussion Takeaways

Do you wish there was a quicker way to figure out the problem in your operational system? Or a way to predict potential issues and fix them before your operations stall?

AIOps can sift through data to find snags far faster than a human can. Thanks to advances in automation technology, AIOps can even help predict how long your equipment will last. For example, a company that manufactures COVID-19 tests is developing machine learning algorithms to predict failure. If you can predict when parts will break down, you can switch from emergency maintenance that stalls operations to scheduled maintenance that prevents breakdowns and keeps things up and running.

In this virtual event, Greg Irwin is joined by Pat Vogelsang, Vice President of Sales Engineering at ScienceLogic, to discuss how AIOps is powering the intelligent enterprise. Pat defines the benefits of AIOps, the feedback mechanisms that identify problems, and the services he recommends in conjunction with ScienceLogic.

Here’s a glimpse of what you’ll learn:

 

  • Pat Vogelsang defines AIOps and explains its benefits
  • The typical training time needed to start retrieving relevant datasets and algorithms
  • Pat explains the types of data collected by AIOps
  • How AIOps can help predict and prevent system failures
  • Pointers for moving data to the cloud
  • Benefits of using ScienceLogic in coordination with ServiceNow
  • Pat describes the feedback mechanisms that identify if a problem was resolved
  • Current trends surrounding AIOps
  • Best tools to use in conjunction with ScienceLogic

Event Partners

ScienceLogic

ScienceLogic is a leader in ITOM & AIOps, providing modern IT operations with actionable insights to predict & resolve problems faster in a digital, ephemeral world.

Guest Speaker

Pat Vogelsang

VP Sales Engineering at ScienceLogic

Pat Vogelsang is the Vice President of Sales Engineering at ScienceLogic. ScienceLogic’s team helps IT operations leaders deliver better business outcomes through data-driven automation. Pat manages the field engineering team, which in turn helps clients solve complex AIOps related problems. With the ScienceLogic team's assistance, operations teams and IT departments can smooth out their ITIL processes and reduce the time it takes to solve problems.

Pat holds a Bachelor of Science in Geology Earth Science from the University of Rochester and an MBA Certificate in Business/Commerce from the University of Pittsburgh. Previously, Pat was the SE Director of Southeast/Federal for Infoblox, the Vice President of Customer Service at Netcordia, and the Vice President of Customer Service at ECI Telecom.

Event Moderator

Greg Irwin LinkedIn

Co-Founder, Co-CEO at BWG Strategy LLC

BWG Strategy is a research platform that provides market intelligence through Event Services, Business Development initiatives, and Market Research services. BWG hosts over 1,800 interactive executive strategy sessions (conference calls and in-person forums) annually that allow senior industry professionals across all sectors to debate fundamental business topics with peers, build brand awareness, gather market intelligence, network with customers/suppliers/partners, and pursue business development opportunities.


Request the Full Recording

Please enter your information to request a copy of the post-event written summary or recording!


Discussion Transcription

Greg Irwin 0:18

The way the format works is there's no strong preset presentation. I spent a lot of years in product management and product marketing, where we did lots of different events, and the ones that I loved best were the interactive sessions, where we could share some stories and talk about the hidden techniques or the hidden challenges around different products, services, or processes. I always found those the most interesting, so that's what we're doing today. This is an open dialogue, and I want to encourage everyone: we have this chat window, which is like the best feature ever, because we can have a dialogue, and while we're doing it, anyone can drop a comment or a question into the chat. What I want to offer and ask is that everybody takes an opportunity to share your own comments or feedback in that chat. It might be that you have an answer to the question that someone has, or maybe you have your own question on top of what's being shared; it makes it a much more interesting forum. So warm up your fingers, get your keyboard ready, and drop in your questions and comments. And there we go. Thank you, Pat, for turning video on. Great to see you; I hope you're doing well. Folks, video is great, but it's not necessary. So if you're able to, if you have nothing embarrassing behind you, or you have a nice backdrop like Sumit, turn it on. If not, that's cool too; this is a fully voluntary exercise, and we appreciate everyone taking the time. Again, my name is Greg; I've been moderating these sessions for a long while. And I'm happy to introduce and invite Pat Vogelsang over at ScienceLogic. So Pat, it's time for you to give a little intro: just tell us who you are personally, and a little bit on who ScienceLogic is, please.

Pat Vogelsang 2:28

Yeah, this is Pat Vogelsang. I work at ScienceLogic, where I manage our field engineering team: our guys and gals out in the field who help customers solve hopefully fun, complex, and sometimes simple AIOps-related problems. ScienceLogic is a fault and performance management platform geared toward the AIOps market. We try really hard to work with operations teams and IT departments to help them with their ITIL processes, to speed things up, to reduce the time it takes to find problems in their environments or to find the fix, and to implement as much of the cool new technology that's out there today, making it as easy as possible for everyone to take full advantage of it. That's what we're doing at ScienceLogic, and we're helping both enterprises and service providers, so we look at both of those markets.

Greg Irwin 3:36

So I think we have to define AIOps, because it's one of these things that can be defined lots of different ways. Give us the headline, but then I'd like to ask you, maybe from a use case perspective, how is it being deployed to drive quicker resolution or better performance? But first, give us your definition.

Pat Vogelsang 4:03

It's a good one. Someone's gonna jump in and say something if I get it wrong.

Greg Irwin 4:08

No, this question is all yours. Go for it.

Pat Vogelsang 4:12

So AIOps is an interesting animal. When you get on the glorious internet, you're going to find that almost every monitoring vendor like ScienceLogic is going to have their AIOps story. When AIOps was originally coined, it started off as being automated intelligent operations. That was the intent of us vendors way back when we started down this road: how do you take all this knowledge that the operators have, the human capital, and start to automate all these tasks everybody's doing on a regular basis? How do you make operations more seamless, so you can do it more efficiently? That was the goal as we in the vendor community set out with Gartner to define AIOps. But as it turns out, AI means artificial intelligence to the world, and so automated intelligent operations didn't stick very well. There was no glue; it was like Teflon, it just slipped off the storyboard. So, let's call it artificial intelligence, and with artificial intelligence comes the whole concept of machine learning. Now it's: how do we start to apply machine learning techniques, looking at big data problems, to help solve IT operational issues? That's really where we're all heading with AIOps, and we like to look at it as a journey, a journey that we might never get to the end of. I'm not a data scientist, by the way, so keep that in mind, but data science is an interesting study, and it takes a lot of information to bring machine learning to bear. There are some angles the vendors are taking, and the angle that we chose at ScienceLogic on machine learning was to look at performance data and see if we can help identify anomalous behavior. We're all used to thresholds, static thresholds: when a server gets beyond some CPU level, or an interface beyond some utilization, go ahead and set alerts. Pretty common, and oftentimes not very useful; it really depends on what you're looking at. For example, our own product has a back-end SQL database, and SQL is a memory hog, so memory is always at 90-plus, close to 100% utilization, because it's supposed to use it. So what do you look for? We take machine learning to see if we can say something's anomalous. One of our product managers calls it the weirdness score: is the data weird, or is it normal data? So we try to highlight the data that is not normal. It doesn't mean that there's a problem; it just means that it's not normal, and that's what we're trying to surface.

Greg Irwin

Yeah, let's jump to a win. So I get it: it's not just a threshold, it's not just a trigger for an alert. How has somebody used that weirdness pattern, something that isn't just easily identifiable, in a way that solved a meaningful problem?

Pat Vogelsang

Well, we have a large integrator, a large GSI, and this was not for their GSI business; this was in their IT department. What they found is that it was difficult to identify and tie problems back to their services. So one of the things they did with the ScienceLogic solution, on the back end of the anomalies, was use the ability to identify services. A service is a deliverable: it can be an application, it can be any group of IT devices, characteristics that define something meaningful for the company. They put those into service views, and one of the things we introduced in service views is this idea of anomalies. They built over 100 of these, so instead of just looking at a flat list of events or problems coming in, they look at services: what's the health, availability, and risk of each service, so they can prioritize properly. And then one of the key factors they take advantage of is looking at the anomalies. It comes in with events, it comes in with KPIs, but then we add in this idea of whether something is anomalous. Because they get that little bit of extra information, that something is not normal anymore, it really helps to draw attention and to look deeper to see where there might be problems. Through all of that, adding anomalies plus organizing things in a different way, looking at services, they claim they've reduced the time it takes to find the root cause by about 30 minutes on average for incidents and problems.
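To make the "weirdness score" idea concrete, here is a minimal sketch of anomaly scoring over time series data. This is not ScienceLogic's algorithm (theirs are self-tuning and considerably more sophisticated); it only illustrates scoring "not normal" behavior instead of breaching a static threshold. The window size and sample data are invented.

```python
import numpy as np

def weirdness_scores(values, window=288):
    """Score each sample against a trailing window (288 five-minute
    samples = one day of history). Higher score = weirder."""
    values = np.asarray(values, dtype=float)
    scores = np.zeros(len(values))
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mu, sigma = baseline.mean(), baseline.std()
        scores[i] = abs(values[i] - mu) / sigma if sigma > 0 else 0.0
    return scores

# A CPU metric that idles around 40% and then spikes: the spike gets a
# high score, which flags "not normal", not necessarily "a problem".
cpu = np.concatenate([np.random.normal(40, 3, 600), [85.0]])
print(round(weirdness_scores(cpu)[-1], 1))
```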

Greg Irwin 10:08

What kind of applications are these? Are the anomalies at the network layer or the application layer?

Pat Vogelsang 10:14

It can be any. We can apply that to any time series data: whether that's from network interfaces, whether it's CPU, it could be application data coming from an APM tool that's measuring the performance of a page refresh or something inside an application. It could be very specific, or it could be anywhere across the infrastructure, which is one of the other nice things: we take that anomaly detection and can roll it across any time series KPI that's being collected.

Greg Irwin 10:50

I have a question I'm going to ask that I've put in the chat here; I'm good at asking a lot of questions. Folks, I'd like your questions more than mine, so help me out. Think to yourself: what's one thing you'd like to hear, not just from ScienceLogic, but from others across the group? Hopefully we can bring some others in and share some stories of projects that different organizations are working on, and hopefully some successes that can be shared across the group. You can just jump right in with it, or probably the easiest is to drop it into the chat. Okay, Pat, thank you for that. I think one of the questions I've got is about training: what kind of training does it take to get a reasonable sample set, so that you can then start realizing models that give you highly useful information? With that story, or the one that you just shared, or others: what's the typical training time needed, so that you can get relevant datasets and algorithms? Now, Pat, that's for you.

Pat Vogelsang 12:10

Oh, that's for me? Yeah, for sure, that's for me. Okay, so:

If you want laser precision, and you want machine learning algorithms that tell you that something is unequivocally a problem, then in most IT monitoring platforms there probably is not enough data for a machine learning algorithm to learn enough to tell you that something is truly bad. Which is why we've elected to go down this path of looking at anomalies. With that, we have several algorithms that we use, and they'll change and adjust based on the way the data comes in and what the dataset looks like. They auto-adjust; we don't require anyone to tune anything. And they will start producing potential anomaly detections within, say, depending on the frequency at which you collect data... most data in our system is averaged five-minute collections, and you're going to start to see anomalies after a couple of days. So once you get three, four days of raw data, assuming there is anything anomalous, you might start seeing some anomalies. But it could take weeks before you see something. And in some data patterns you might never see one, because there aren't any; the data is very predictable and does the same thing all the time, which is also very common in a lot of systems.
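For context on why "a couple of days" is the floor, a quick back-of-the-envelope check of what five-minute polling actually yields per metric:

```python
# Five-minute polling: how fast does history accumulate per metric?
SAMPLES_PER_DAY = 24 * 60 // 5   # 288 samples/day
for days in (2, 3, 7, 14):
    print(f"{days} days -> {days * SAMPLES_PER_DAY} samples")
# Two days (576 samples) supports a coarse daily baseline, which is
# roughly why anomalies start appearing "after a couple of days";
# weekday-versus-weekend seasonality needs two weeks or more.
```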

Greg Irwin 13:49

Got it. And let's take on a couple of the comments. Thanks, Sumit, and thanks, Jonathan, and others, please drop your comments in. What's the data, by the way, Pat? What are you collecting?

Pat Vogelsang 14:02

So the data we're looking at is any time series data that can be collected. For ScienceLogic, that's the angle we've taken on machine learning. Some other people in the AIOps space are looking at events or alerts: they're looking to see patterns in alerts, logs, and messages across multiple devices, to try to derive conclusions from that. That's another problem in our industry that people are trying to apply machine learning to; it's an angle we've decided, at this moment, not to pursue, in part because our platform tends to integrate with other products, where we send our events up to them. We have a focus, so primarily we look at time series data.

Unknown Speaker 14:55

Got it?

Greg Irwin 14:57

Sumit, why don't you jump in? Good to speak with you, as always. Give a little intro here.

Sumit 15:06

If I can unmute myself. So, I'm Sumit. I've worked for Broadridge for all my career, and I've recently switched from managing the DevOps tools for Broadridge to an enterprise monitoring role. What that involves, I'm still getting to understand. But yeah, one of the big initiatives is AIOps, and I'm still getting my head around that, with my kind of limited understanding of what it means: mostly anomaly detection, forecasts, etc. So that's where my questions were coming from.

Greg Irwin 15:49

What are the apps? Are we talking mainframe, client-server, microservices, everything? What's the...

Sumit 16:00

The entire... I guess mainframe doesn't come into my world; mainframe monitoring is being handled by the mainframe teams. But midrange comes into it a little bit: AIX, Power series, etc. There's on-prem x86 Linux and Windows, there's cloud, there's microservices, there's containers. And with cloud comes serverless as well. So, yeah.

Greg Irwin 16:36

That's quite a wide scope in terms of the operations. I mean, what's the problem to be solved?

Sumit 16:44

A reduction in the number of critical incidents we're getting, and also mean time to resolution.

Greg Irwin 16:58

I'm just wondering: right now, is there an issue with critical outages, where if you look at the last year, there's more downtime than you should reasonably be having? Yes? Do you know where it's starting? Maybe it's the database, maybe... do you know where the issues are?

Sumit 17:17

I think the technical issues are kind of wide. There are more people and process issues as well, in terms of engaging the right technical owners and SMEs in a timely fashion; kind of something Pat talked about in terms of reducing that mean time to root cause. It's quite interesting: how do we take that metric time series data to where, when a service has maybe interdependent services and it's distributed, we know it's the database on service three of that whole chain? Rather than: you have a problem on service one, you figure out, okay, I need to figure out what's wrong with service three, then you go down and pick out that it's the database, but really it's not the database, it's the storage or something like that. By the time you get down to that level, you're a few hours in, because every team you need to engage is a page. And it's interesting that Brad mentioned the event-related side; that's what the likes of PagerDuty are doing, I guess, event correlation and getting AI on that. Versus, because of our mainframe work, we were engaged with the Broadcom AIOps solution, which I think is doing similar stuff to what ScienceLogic is doing, in terms of taking time series data and claiming to do the same things. I haven't seen it in action enough to be able to say.

Greg Irwin 19:16

Do you have systems currently in play trying to help you with some of the correlations? You must have a dozen observability platforms out there. And why don't we go into...

Sumit 19:34

A dozen monitoring systems; I won't say a dozen observability systems, because, yeah, that's why it takes so long. If we had observability systems, I guess we would be much quicker at finding the problems and diving into where the problems lie.

Attilio 19:54

Interesting.

Greg Irwin 19:56

All right, awesome. Stay involved. Let's go around our group here a little bit more. Jonathan, also, thank you for turning your camera on; nice to meet you. Would you be able to share a little story with us, some of the successes or challenges that you're working on?

John 20:16

Yes, sure. So I'm John Pickering. I'm the VP of Product Management for systems, connectivity, and informatics at Safford, and we're a manufacturer of PCR test equipment. Recently...

Greg Irwin 20:35

A very, very popular guy. Right time...

John 20:41

We've shipped tens of millions... I have to be careful what I say here, because

Greg Irwin 20:51

What you're doing... we've been talking about having an impact on the world.

John 20:55

Yeah, well, the business ramp has been absolutely incredible. We've shipped probably over 50 million COVID test cartridges in the last year. So we manufacture the cartridges. I'm not sure how familiar you are with the industry, but the analogy I like to give is that a test cartridge is about the same size as, though a bit more than, a Hewlett-Packard print cartridge. Our typical system, the one we sell most often, can run four cartridges in parallel: parallel processing. All the magic happens in the assay cartridge; that's where all the biochemistry is. Then we have a module which has an optical detector to do the chemical analytics, and then a heater and some mechanical components. These are all modular, so you can run the system with different modules processing in parallel at the same time, with different types of tests. We've got 32 different tests, with COVID obviously being a big runner right now, but we're also a major provider of HIV and TB testing in the developing world. One of the challenges is the reliability of the module that drives that test cartridge. We have an installed base of modules which is in the hundreds of thousands, so it's a big running component. Recently, just in pilot, we've been looking at developing machine learning algorithms to predict failure, because if we can predict the failure, then we can move from an unscheduled maintenance event to scheduled maintenance. And ultimately, the model could be that we go to a self-service model, just like a Hewlett-Packard printer, where we can predict precisely when the module is going to fail.

Greg Irwin 23:14

Are there data indications, when you look at the diagnostics a module is registering, that let you reasonably forecast the likelihood of failure?

John 23:24

Well, yeah, we've been using machine learning algorithms to test this. I haven't got all the details in front of me, but there are teams working on this, crunching through hundreds of data points over thousands of cartridges. And we've gotten to a point now, working with a third party, where we can predict a module failure with over 90% accuracy, within three weeks of it failing. So it's been really nice. I mean, three weeks in the context of a six-to-12-month horizon is quite significant. The life of a cartridge? It totally depends on the utilization rate, but it's in the range of six months to two years.
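John didn't share the actual pipeline, so the following is a hypothetical sketch of the shape such a "will this module fail within three weeks?" classifier might take: synthetic telemetry features, a stock scikit-learn model, and precision/recall as the yardstick. All feature names and numbers are invented for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(0)
n = 5000
# Hypothetical per-module telemetry features: optical signal drift,
# heater cycle count, motor current variance, recent error-count trend.
X = rng.normal(size=(n, 4))
# Synthetic label: 1 = module failed within the following three weeks.
y = ((X[:, 0] + 0.8 * X[:, 2] + rng.normal(0, 0.5, size=n)) > 1.5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = GradientBoostingClassifier().fit(X_tr, y_tr)
pred = model.predict(X_te)
print("precision:", round(precision_score(y_te, pred), 2))
print("recall:   ", round(recall_score(y_te, pred), 2))
```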

Stephen 24:16

that's really cool.

Greg Irwin 24:16

So really, you'd go in proactively and say: we think your system is on the verge of failure, so we should order it now; otherwise, you're going to struggle with downtime.

John 24:28

Yeah, and ultimately we could order it for them and replace it automatically, but obviously we first have to prove the fundamental reliability. On the challenges, I had two questions. The first: I'm curious whether this is something that you've seen in analogous situations, from the team on the panel. And then the other challenge we're looking at: obviously, it's kind of a brute force approach. We've taken the telemetry off these systems, and it's pretty data intensive, and when we go to developing nations, the bandwidth isn't that high. So we connect and put the data in the cloud and crunch it, but the next level of refinement would be to be more selective about the data

Don Keninitz 25:16

parametrics,

John 25:18

which, you know, are key contributors. So those are the things that we're noodling on. Again, it's still early days; we're reviewing this and trying to assess the value it can bring. The results are pretty new, but it's very promising, very interesting.

Greg Irwin 25:39

For you, it's exciting. I'm going to start back with Pat here, but also others: I'd love to hear stories where predictive signals have been used to circumvent issues, not just physical failures, but others. Pat, what have you seen, in terms of any scenarios you can think of?

Pat Vogelsang 26:07

So we have a customer that's a data center hoster; not a big one, a boutique. They're competing against the big clouds, so staying ahead of capacity is really urgent, especially these days, when the supply chain doesn't exactly make it easy to get materials when you want them. They took a very similar approach to what John was just talking about. While what I described us doing with machine learning in our platform is looking for anomalous behavior, what they wanted was predictive analysis. We have what I would call static predictive analysis: at any given time, you can stop, go look at a dashboard, take any time series data, and run different predictive analytics to try to get a glimpse of what that measure is going to do in the future. That's interesting to look at, but it doesn't tell you in real time when you need to take action. So they did something very similar: they took the data from our system, because we're collecting all the data from their data center, took it to a third-party system, and ran their own machine learning algorithms, then brought the predictive analytics back into ScienceLogic. Using our software, they take that information and run it through the workflows that we provide, to alert and detect. That's helped them get ahead of storage capacity, and with their big blade server platforms: when do you add another blade, when do you add capacity? For them it's really important, because it's expensive, and they don't want to add capacity unless they have use for it. But you have to add the capacity at the right time, so that you can bring new customers on and grow your business. So it's been critical for making their business successful and predictive, so their customers can grow, without someone having to constantly watch the system; it tells them when they need to go out, do purchase orders, and bring more equipment into their data center. A very similar use case to what John's talking about, but applied to a data center, again using our software as the collection method to get the data.
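The workflow Pat describes (collect utilization, project the trend, alert while there is still procurement lead time) can be sketched in a few lines. The thresholds, lead time, and utilization figures below are invented for illustration; the hoster's actual models ran in a third-party ML system.

```python
import numpy as np

# Daily storage utilization samples collected by the monitoring platform
# (values invented for illustration).
usage_pct = np.array([61.0, 61.4, 62.1, 62.5, 63.2, 63.8, 64.1, 64.9])
days = np.arange(len(usage_pct))
slope, intercept = np.polyfit(days, usage_pct, 1)   # growth in % per day

CAPACITY_THRESHOLD = 85.0   # act before utilization reaches this
LEAD_TIME_DAYS = 45         # hypothetical procurement lead time

if slope > 0:
    days_until_full = (CAPACITY_THRESHOLD - usage_pct[-1]) / slope
    print(f"~{days_until_full:.0f} days until {CAPACITY_THRESHOLD}% utilization")
    if days_until_full <= LEAD_TIME_DAYS:
        print("ALERT: start the purchase order now")
```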

Greg Irwin 28:34

Right. And by the way, I want to encourage, let's see, Davis, Srikanth, Robert, Attilio, to get you guys involved, if you're able to, because we just want to share the stories. But, Pat, back to the question on data. In other words, once you have the model, once it has learned, can you really skinny down the amount of data that you have to collect, so that it's not just overwhelming the upstream?

John 29:05

No, we don't really have truckloads of data coming out of each system; it's just that you've got tens of thousands of systems, so it scales up, and if you don't have good bandwidth... And just another thing to add: this obviously only works if your instruments are connected. You can only be predictive if the instruments are connected, and in the medical device industry, connectivity of all systems to the internet is sort of a way off, especially when you get out in the middle of Africa or India. So connectivity is a requirement for this type of machine learning predictive analytics to work, and then secondly, you have the constraint on the data export.

Pat Vogelsang 29:56

You guys need the low-orbit satellite networks coming tomorrow. Yeah, that's gonna be revolutionary for what you're talking about.

Greg Irwin 30:06

You know, the one analogy I had, just from hosting these sessions, was HD mapping for autonomous cars. It was such a problem in terms of the amount of data that needed to go up that, to do local processing, they actually had to go so far as to buy and design their own local processing machines, so they could run their analysis locally to make their decisions. There's just too much data; they have to basically pre-process it.

John 30:43

Yeah, I guess that's what I'm getting at, Greg, because there are two routes: you can just export truckloads of data into the cloud and then do the analytics, or you can skinny down the data and do it on the instrument itself. But

Greg Irwin 30:56

then you need a processor; you need the compute. Yeah, so

John 30:59

it's just trying to find the right balance, and I guess it depends on the use case. For our situation, it's complex. I just wanted to highlight this, because it's really...

Greg Irwin 31:13

If a failure means, you know, widows and orphans, it's important and it justifies the cost. But if it's less important, then you can learn it over a longer period of time, or deal with some accepted failures. Pat, what have you seen in terms of data consumption?

Pat Vogelsang 31:34

Well, in our industry it's fairly similar. In most solutions similar to ours, we push the compute for collection out as close to where the data is being collected as possible, so that collection happens as locally as possible and we can do things to the data to improve the efficiency of it coming back. Sometimes it doesn't mean it's less data, but the data is more efficient in how you deliver it back, because you're using compression and other approaches. It is a challenge. What we find is people want more data; no one, no matter the cost or the hassle, ever seems to want to give up data. And I think it's a beautiful scheme by these cloud guys, because the pitch is that the data is cheap, but it's really not cheap. And

Unknown Speaker 32:30

yeah, it's a good deal. If

Pat Vogelsang 32:31

you're a storage guy. You know, we have a customer, a satellite vendor, that has to deal with similar things, because they're getting their data over sat links. And unfortunately for them, what they have to do is change the frequency: they can't get the data as frequently as they want; there's just not enough bandwidth. They would prefer to get more data, but they have to tune down and take a little bit less, just because of the round-trip delays of delivering data across the satellite network.

Unknown Speaker 33:06

Yeah, it's too intense.
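The trade-off in this exchange (full-resolution telemetry versus what the link can carry) is essentially edge-side aggregation. A minimal sketch, with invented field names, of collapsing five-minute samples into hourly min/avg/max records before shipping them upstream:

```python
from statistics import mean

def summarize(samples, bucket=12):
    """Collapse each run of 12 five-minute samples into one hourly
    record; min/avg/max keeps the shape ops teams usually care about."""
    out = []
    for i in range(0, len(samples) - bucket + 1, bucket):
        chunk = samples[i:i + bucket]
        out.append({"min": min(chunk),
                    "avg": round(mean(chunk), 2),
                    "max": max(chunk)})
    return out

raw = [40 + (i % 7) for i in range(288)]     # one day of 5-minute samples
print(len(raw), "->", len(summarize(raw)), "records shipped upstream")
```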

Greg Irwin 33:08

But let's broaden it a little bit. I want to invite some others. Srikanth, can I invite you to share a comment or question here with the group?

Greg Irwin 33:23

Srikanth, are you with us? How about Davis? Davis, are you on the line with us? Yeah, I'm here. Hey, nice to meet you. You took some time today to join and listen. I'm curious, what's your angle and interest in AIOps?

Davis 33:48

Well, I am still a student; I'm in school learning machine learning right now. It's been interesting listening to the problems that you all currently face, but unfortunately, I don't think I have much to contribute in that area.

Greg Irwin 34:10

no worries. No worries.

Pat Vogelsang 34:12

No pressure Davis.

Greg Irwin 34:14

There'll be a test after this, this afternoon. Yeah, Davis, I have your background here at Boeing. Are you working with them, or is that a mistake?

Davis 34:26

No, that is correct; I am also working with them at Boeing. My organization is usually involved with quality data that we're taking from airplane barrels that have ultrasonic scans done on them. We want to take that up to the cloud, to Google Cloud, and do some analysis on it overnight. Right now we're just keeping it for regulatory purposes, but eventually we want to get some good data, some meaningful insights, from it. Not at the moment, though.

Greg Irwin 35:15

Got it. All right, Davis, thank you. Let's try Attilio. Attilio, are you on the line with us? I am. Oh, nice. Nice to meet you. What's your interest here in the AIOps conversation?

Attilio 35:32

Yeah, currently I'm an application lead at BBVA, for an application called Net Cash, which is corporate online banking. We recently merged with PNC Bank, and we're going to converge now from Net Cash to Pinnacle, so currently I'm in the process of learning the Pinnacle application and how it works differently from Net Cash corporate online banking. Pretty much we handle all the wires and everything for corporate-based companies; we've got to make sure everything's running and working, wires transferring and everything like that. So I'm pretty much just listening in, trying to understand. I dabbled a little bit in machine learning and AIOps in previous roles at AT&T, but I'm currently just actively listening right now.

Greg Irwin 36:27

Got it. I mean, the question I've got for you is uptime. Is that a critical concern, or are things fine and you're managing okay? I mean, what's the urgency of these topics for you and your group?

Attilio 36:48

Yeah, pretty much. You know, it is a big deal for us. We have to make sure it's running at full capacity, with no issues. But yeah, it is a concern; granted, we've had some issues with the database and stuff like that.

Greg Irwin 37:08

Got it. Got it. All right. Any questions here for the group?

Attilio 37:11

No, not at this time. I'm actively listening, and I'm enjoying the conversation so far. So thank you for having me.

Greg Irwin 37:22

Thank you. Thanks for joining in.

Stephen 37:24

Hey, Pat,

Greg Irwin 37:26

A quick question I want to come back to you on here: cloud and cloud apps. Particularly as things move to the cloud, what does that do in terms of your ability to properly diagnose? I'm not even going to talk about serverless; just containerizing and moving it to somebody else's infrastructure stack. What are you finding from your clients? Maybe it's making things easier; maybe it's making things a whole lot harder.

Pat Vogelsang 38:03

Well, I think what they come to us with... and we see it ourselves: our own product has moved, or is moving, parts into the cloud as well. I think we all know the cloud is a wonderful thing, because it obfuscates a lot of things, and stuff just, in quotes, seems to work, which is great until it doesn't. And the environments can be very complex. Where we try to help is this: you lose some visibility when you go to the cloud, but you also gain a little bit more information, from the standpoint that there's a connectedness we can take out of the cloud. We try really hard to bring out the connectedness, the contextualization, of how the components in the cloud are working together, so it's pretty easy for a user to identify all the cloud configuration pieces that make the application work. When you start to look in the cloud at things like microservices, and you're spinning up some sort of elastic container service, whether it's Amazon's or some Kubernetes service, there are lots of moving parts, and you may not have the same visibility into the underlying architecture. The cloud providers give you lots of different inputs into those parts, between security groups and other related things, that can cause a lot of problems, including things like DNS, and leave your application with real trouble. So we bring all those components into our database so you can see them, so they can have added contextualization. And then, back to the same story we try to tell everybody: we look at things from a service delivery perspective, so you can see the things that are tied to the service delivery. We take advantage of maybe the greatest computer on Earth, which is the human brain, and we put the information in front of the operator so they don't have to hunt for it; they can see it, quickly deduce where the problem is coming from, and get to that root cause faster. That's how we're trying to help solve these complex problems. If things go to the cloud, you lose some visibility, but you get other information you wouldn't get as easily if you were building it yourself on-prem. So we take what we can get, and we bring it into that centralized view we call the service view, so you can see what's going on and what other parts of the environment are related.

Greg Irwin 40:58

The one thing that I think comes up in these: you go to the IT team, and they say, well, we've got ServiceNow, and it serves as a great coordination point, a great ticketing system, and they clearly have some tools to be brought to bear here. Without the sales pitch, how much incremental value does a third-party tool provide on top of ServiceNow? In terms of limitations, are you finding many of your clients can address a lot of use cases with ServiceNow? Or are you finding most people really need specially tuned observability components or application monitoring components beyond it; you know, think Datadog and New Relic and the basic players at the application layer? I'm just wondering what the normal course is, in terms of the companies here, for benchmarking.

Pat Vogelsang 42:08

So, the normal course: in our enterprise business, the percentage of our customers that use us alongside ServiceNow is astounding. It's almost every enterprise customer. ServiceNow is great at incident management, to the point that we used to have a ticketing feature in our software; it's still there, but we don't advertise it, nor do we show it, because our customers all do that primarily with ServiceNow, some with other products as well, and a little bit with Remedy still. The value is that our monitoring solutions are talking to the devices directly. There's a relationship that your monitoring tools have with the infrastructure that ServiceNow, and the products you bought to help with your ITIL processes, don't have; they don't have that intimate relationship. So what we try to do is take the intimacy we have with the infrastructure and put as much of it as we can up into the incident. When there are multiple events, multiple problems coming in, we try very hard to put that information, the holistic view of the incident, into ServiceNow, along with as much triage data as we can collect, because we have a connection to that device and we can talk to it. We try to talk to the devices at the time of a problem, so we can do the triage. We take those runbooks, those instruction sets; everybody has an ITIL process that says when this happens, do these 10 things. We try really hard to allow those things to be automated, collected automatically at the time of the incident, and then put that information up into ServiceNow. Pros and cons: the pro for us as a vendor is that it helps our customers reduce the time it takes to solve their problems. The con is that sometimes the other users don't even know ScienceLogic is providing the data, because they're doing it all inside their ServiceNow, sticking with their ITIL workflow. So that's how we help them.
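The enrichment pattern Pat describes maps naturally onto ServiceNow's standard Table API. A hedged sketch, with placeholder instance URL, credentials, and event fields, of opening an incident that already carries the automated triage output collected at the moment the monitoring event fired:

```python
import requests

INSTANCE = "https://example.service-now.com"   # placeholder instance URL
AUTH = ("api_user", "api_password")            # placeholder credentials

def create_enriched_incident(event, triage_output):
    """Open an incident that already carries the automated triage data
    collected when the monitoring event fired."""
    resp = requests.post(
        f"{INSTANCE}/api/now/table/incident",
        auth=AUTH,
        headers={"Accept": "application/json",
                 "Content-Type": "application/json"},
        json={
            "short_description": f"[{event['device']}] {event['message']}",
            "description": event["message"],
            "work_notes": "Automated triage at time of event:\n" + triage_output,
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["result"]["sys_id"]
```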

Stephen 44:23

got that.

Greg Irwin 44:25

Hey, you know, we have a couple of others who are just joining in. Stephen, right? Nice to see you; I hope you're doing well. We've been talking AIOps here for 45 minutes or so. I'm curious what your interest and questions are around AIOps; I want to make sure we're focused in the discussion.

Stephen 44:47

Yeah, well, thank you for acknowledging me. The question about AIOps is: what type of information are you collecting from the servers? And once that information is collected and you kick off certain automation scripts, are you providing the follow-up data to see if it resolved the issue or not?

Unknown Speaker 45:23

Oh, yeah,

Greg Irwin 45:26

That's perfect. Pat, I think that's a great question. How does it work?

Pat Vogelsang 45:30

That's a great question. So, the follow-up question: did that information provide the answer? We try to track that, but this is where I would say we end up leaning more toward the ITIL products like ServiceNow, which have the feedback mechanisms to identify whether something solved the problem or did not. That is currently not something we track in our software today, other than that we provide statistics showing the improvements in incident and event reductions as they relate to the actions taking place. We look at that as a maturity curve, and we help customers look at their maturity as they use our products. Generally they can see that incidents reduce as they bring in automations, because as they bring in automations, what tends to happen is they move from being reactive to being proactive, and when you become proactive, that's when you actually start to see the number of incidents go down, because your automations are going up. That's how we try to correlate it, but we don't correlate an exact action to an exact resolution, if that makes sense.

Greg Irwin 47:12

So is it manual to know? I think it's a fair question: how do you know when it's cleared? Is the signal from ServiceNow, or from your side?

Pat Vogelsang 47:22

The way our software works, it can be either: the signal from ServiceNow, that they fixed it, or our product, which as a monitoring product is constantly watching the problem, and if the problem goes away, we will auto-clear it as well. So if the action did clear the problem, then, because our software is always watching, the problem goes away, and our product will also tell ServiceNow that this incident has been resolved, and we'll resolve the incident in ServiceNow as well. There's a closed loop between the two systems: we'll tell the upstream, and the upstream can tell us, and either way it will auto-close if the remediation fixed it.
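And the closed loop in the other direction can be sketched the same way: if the monitor sees the condition clear, it resolves the incident it opened. Another hedged Table API sketch; the state and close codes below follow ServiceNow's stock incident table and may differ on a customized instance.

```python
import requests

INSTANCE = "https://example.service-now.com"   # placeholder instance URL
AUTH = ("api_user", "api_password")            # placeholder credentials

def auto_resolve(sys_id, reason):
    """Resolve the incident when the monitor observes the condition
    has cleared, closing the loop from the monitoring side."""
    resp = requests.patch(
        f"{INSTANCE}/api/now/table/incident/{sys_id}",
        auth=AUTH,
        headers={"Accept": "application/json",
                 "Content-Type": "application/json"},
        json={
            "state": "6",   # 6 = Resolved on the stock incident table
            "close_code": "Solved (Permanently)",
            "close_notes": f"Auto-resolved by monitoring: {reason}",
        },
        timeout=30,
    )
    resp.raise_for_status()
```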

Greg Irwin 48:09

Yeah, got it. That makes sense. You know, the way I always check whether something's working is to talk to customers and see what they're doing. So, your clients growing with you, managing one environment and then managing another one, that's the

Pat Vogelsang 48:29

best. Yeah. A lot of what we see with monitoring... I'm sitting at my desk and I actually have case studies in front of me, so, you know, I'm not trying to be the smartest guy; a smart guy reads what's been written. But it's interesting: all of them have one thing in common, and it's that they all talk about tool reduction. One company here was able to shut down seven of their monitoring tools. We have one of the major airlines that was able to shut down 33 monitoring tools; just by moving to our platform, they shut down 33 monitoring tools. Think about the headache of running 33 monitoring tools, and then reducing that down. You get a single platform that can monitor the majority of your infrastructure, and that's where you start to gain efficiencies. That's where you can then bring automation to bear, because you're doing automations in one platform, or a handful of platforms, not trying to write and build automations across 30-plus tools. So the answer is, our customers tend to start with a problem, because they usually identify one thing they want to replace. We all have that one thing that's the bad thing: we're going to replace the bad thing. They replace the bad thing, then they realize, oh, we can replace other things and consolidate. And then there are things that we don't replace. We're always honest with our customer base: if there are things we think they should keep, then we're all about integrating with the tools they should use, and open and honest about the tools that can be replaced.

Greg Irwin 50:14

So are you typically working alongside a Datadog or a Dynatrace?

Pat Vogelsang 50:20

Oftentimes we are. Not so much Datadog; we've not run across Datadog as much with our customer base, but AppDynamics, New Relic, and Dynatrace we see quite a bit. That's an area we don't get into, the application performance monitoring piece of the market; we don't play there. So that's where we rely on those tools, and then we take telemetry from them and bring that into our system. You still use that other tool, but we take the data through our engine and then run it up into ServiceNow, so it's all going through a single place. That's how you can correlate and build the contextualization between the infrastructure and the application that really helps the user visualize and see where the root problem is. That's how you get through a problem: you bring the data together, let the human, the smart person, take a quick look, and they can quickly see, oh, that's where the problem is, because they have all the data in front of them.

Greg Irwin 51:21

Right. So you're really more on the incident side, and incident resolution?

Pat Vogelsang 51:27

Yes, we're trying to help with incident resolution. Yep.

Greg Irwin 51:32

Well, hey, we've gone around here quite a bit, and I want to make sure we're covering the key points. So I'll turn to you, Pat, and to the others on the line: for the kinds of projects you're working on here, what's worth highlighting for the group in terms of trends on outages, trends on deployments? Any closing content? Sorry, it's a big open-ended question, but I think it's what I want to promise for the group.

Pat Vogelsang 52:08

I think I just want to convey to everybody that AIOps is not a thing; AIOps is a process that you're going to go through. It's constantly improving: using technology, using software, to start to identify the repetitive things that people are doing that you can automate, and looking for where you can take time out of problems. That's why, when we talk about AIOps, we say it's a journey. And don't get caught up when people say, gee, I need AIOps. I would tell you, you probably have AIOps, but by our definition you may not be the most advanced at it. I would say maybe none of us are yet, because technology hasn't evolved in this industry to take advantage of the data we have the way we'd like it to. But it will, because we know how technology goes. So jump in; you've got to start, and you've got to look for those opportunities to make life easier for the people. Because we all know, and I know at our company, and I'm sure yours are all the same, IT staffs aren't big. I remember back in the late 90s and early 2000s, IT staffs were gigantic, and they were hiring at a massive rate. Then in the 2000s the world changed a little bit, and we saw IT staffs start to reduce. Our customers all run lean; we don't have any customers with excess headcount and nothing to do. Looking at all these case studies, I don't see anything that says we've reduced headcount. I see where they reduced some outsourcing, where they had to outsource some things but were able to insource and save money; I see some of those kinds of things. But that's the point of the AIOps journey: it's helping to identify the problems that you have, and then, how do you automate them, make your people more efficient, and give your people the tools to see the problem and fix it, versus having to try to find it? Get to the point where, when a problem happens, the person can pick it up right away and solve it.

Stephen 54:32

I'll make my question quick and fast. For AIOps, what is the automation foundation tool that is universally used, that can be used at this point in conjunction; in better words, what is the application tool to use for automating these processes? You know, everybody likes using PowerShell for automation, but in that sense you still have to have the if-then-execute piece, which is independent from PowerShell, at least from what I know so far. But if there's another tool that would incorporate the automation factor and also the alerting factor as one solid platform, life would be kind of easy. To give a real use case: I needed a tool to read the event logs, and then, from the event logs, hey, if you see this message in sequence, or you see this, this, and this, or you see this constantly recurring morning after morning, then you want to kick off this program, or you want to send out or create a ServiceNow ticket asking for this server to be restarted, or something like that. You kind of understand what I'm saying? That base foundation, I think that's the core of AIOps. So what should we use?

Pat Vogelsang 56:39

Of course, the answer's ScienceLogic; that's easy enough, I'm just kidding. Your question is awesome, by the way, because it identifies the problem there: what really makes it hard, and what

Stephen 56:56

I'm living in it,

Pat Vogelsang 56:58

Of course you are. So, what we try to do, and I'm going to say we try, because there's nothing perfect in the world: just like you talked about PowerShell, well, there's PowerShell, and sometimes it's going to be an SSH session, sometimes a database query, sometimes a REST API call, it could be a GraphQL API call. There are all these different mechanisms, and with our platform we have to use all those methods, because the things you're talking to don't all talk the same language. Not everything talks PowerShell, not everything talks SSH, not everything has a database. You have to be able to communicate across lots of different languages. But what we try to do is give you a platform where the method is the same, whether I'm doing PowerShell to a Windows server or talking SSH to a Linux or an AIX server. We try really hard to abstract as much as we possibly can away from you, in what we would call a low-code to no-code method, so that you can just insert your knowledge and let the platform take care of all the other parts: how to get the data, how to make the data look good, how to put the data in the right place. All you need to know is, I need to run this command on that thing to get that data. We're trying really hard to make that as simple as possible, so that you can take from the examples that are provided, from the examples that others in the community have put together, and you're not reinventing the wheel; you're just taking what you know, putting it into our system, and letting our system collect the data, put it where it needs to be, put it up in the incident, and reduce the amount of time it takes you to build these things. You still have to do stuff; you still have to be smart and educated and all those good things. But we're trying to make it so you're not having to write a script.

Don Keninitz 59:10

That makes sense.

Pat Vogelsang 59:21

That's what we're trying to do now. It's not easy, and it's a constant curve. But that's why we call our software a platform: it's meant for the user to customize it to what they need, and everyone's a little bit different. We try to start you off in the same place, pretty equal, and then you can tailor it; and we're trying to make it so you don't have to be a scripting expert.
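The "same method, different transports" idea Pat outlines can be sketched as a dispatch table: the operator supplies only the command or endpoint, and the platform picks the mechanics. A toy version follows, with a local shell command standing in for the SSH/PowerShell transports; a real platform also handles auth, retries, parsing, and many more transports.

```python
import subprocess
import requests

def collect_shell(target):
    # Local shell here; in practice this slot is SSH or PowerShell.
    return subprocess.run(target["command"], shell=True,
                          capture_output=True, text=True).stdout

def collect_rest(target):
    return requests.get(target["url"], timeout=15).text

TRANSPORTS = {"shell": collect_shell, "rest": collect_rest}

def collect(target):
    """Uniform entry point: callers never touch transport details."""
    return TRANSPORTS[target["type"]](target)

print(collect({"type": "shell", "command": "echo disk ok"}))
```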

Stephen 59:47

Right? Yeah, you've got Wireshark types of things, you've got, like I said, the event log types of things, and let's talk Citrix: you've got the Citrix monitoring tools, and there was Director, and now they split Director with Citrix Cloud, so you don't have the full functionality with Director unless you pay for it. It's crazy; it's like one step forward, two steps back. And we've got all these other attributes out there that you have to monitor. To give a pure example, the Cloud Connector: with the Citrix monitoring, there's no alerting; they send you alerting after the fact. So trying to be proactive on theirs is one thing, but if you have all these other tools providing you this information, and they're not talking to each other, and there's no conductor to say, okay, I got this, this, and this, all right, I see this, kick off this to resolve that issue, and then send out an alert saying, hey, we're seeing this and trying to resolve it before it impacts the user experience... that's the basic idea. And I always tell people, and I'm just going to end on this: just because we make it look easy doesn't necessarily mean it's easy. Okay, thank you.
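Stephen's "conductor" is, at its core, a rule engine over an event stream: watch for a recurring pattern and, when the rule trips, kick off a remediation and raise a ticket. A minimal sketch; the pattern, threshold, and action are invented examples of the shape, not a product recommendation.

```python
import re
from collections import Counter

# Invented rule: if the same failure signature shows up three times,
# run the remediation and open a ticket.
RULES = [{
    "pattern": re.compile(r"Cloud Connector .* unreachable"),
    "threshold": 3,
    "action": "restart_connector_and_open_ticket",
}]
counts = Counter()

def trigger(action, evidence):
    # A real conductor would run the remediation script here and call
    # the ticketing API (see the ServiceNow sketches above).
    print(f"TRIGGER {action}: {evidence}")

def on_log_line(line):
    for rule in RULES:
        if rule["pattern"].search(line):
            counts[rule["action"]] += 1
            if counts[rule["action"]] >= rule["threshold"]:
                counts[rule["action"]] = 0
                trigger(rule["action"], line)

for i in range(3):
    on_log_line(f"15:0{i} Citrix Cloud Connector cc-01 unreachable")
```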

Greg Irwin 1:01:35

Stephen, I'll tell you, maybe we'll do some follow-up with you here, and you can see a little bit, directly, of what ScienceLogic can do in terms of running that orchestration. We're going to wrap up our session here. Pat, thanks so much; a great, great hour. I want to thank everyone. I know we're down to Attilio and Stephen, and thank you both; it was an interesting session all around. So with that, we'll do some follow-up. And Pat, thanks so much for taking the time.

Pat Vogelsang 1:02:09

Yep, thank you very much. Thanks for your time.

Greg Irwin 1:02:11

Thanks, guys. Thanks, everybody. Bye


What is BWG Connect?

BWG Connect provides executive strategy & networking sessions that help brands from any industry with their overall business planning and execution. BWG has built an exclusive network of 125,000+ senior professionals and hosts over 2,000 virtual and in-person networking events on an annual basis.