Building the Case for a Data Warehouse Modernization
Sep 28, 2021 12:00 PM - 1:00 PM EST
What are the challenges of on-premises data storage? How can you effectively migrate data to a cloud-based platform with the push for faster, accurate, and remote data?
With the recent push for DSP, it is daunting to start and implement a serverless structure. Shifting to a cloud process allows for scalability, instant access to data, and the ability to recreate that data at any point in time. How? By building data architecture within your brand's cloud server, the automated process of accessing data is secure and is capable of being done remotely.
In this virtual event, Greg Irwin sits down with Michael Tantrum, National Sales Director for Teknion Data Solutions, to discuss adopting cloud data management for modernization and scalability. Michael Tantrum discusses the opportunity to modernize cloud data tools, have encryption and governance over cloud data, and trust documentation and quality.
Resultant is a modern consulting firm, focused on technology, data analytics, and digital transformation, with a passion for problem solving.
National Sales Director at Teknion
Michael Tantrum is the National Sales Director at Teknion Data Solutions, a team of data professionals that specializes in designing, building, and implementing data and analytic solutions for global organizations. He’s also the National Sales Director at Validatar, an automated data quality and data testing platform under the Teknion brand. Michael has 30 years of data experience and has worked with some of the largest data warehouses across industries. He earned his bachelor’s degree from the University of Auckland and his MBA from The University of Manchester.
Co-Founder, Co-CEO at BWG Strategy LLC
BWG Strategy is a research platform that provides market intelligence through Event Services, Business Development initiatives, and Market Research services. BWG hosts over 1,800 interactive executive strategy sessions (conference calls and in-person forums) annually that allow senior industry professionals across all sectors to debate fundamental business topics with peers, build brand awareness, gather market intelligence, network with customers/suppliers/partners, and pursue business development opportunities.
Senior Digital Strategist at BWG Connect
BWG Connect provides executive strategy & networking sessions that help brands from any industry with their overall business planning and execution.
Senior Digital Strategist Tiffany Serbus-Gustaveson runs the group & connects with dozens of brand executives every week, always for free.
Greg Irwin 0:18
My name is Greg, and I'm with BWG. I've had the chance to meet many of you in the past. We're partnered here with Teknion to talk about some real-world stories around data warehouse modernization and all the different things that that means. My goal is to drive some real stories: the complexities of moving or standing up a data warehouse, and actually realizing some tangible business outcomes from that. So that's going to be my push. The best stories, I promise, are going to come from others around the group, by digging in, doing a little bit of interviewing, and understanding some of the challenges of going to a cloud, or handling governance, or putting together a lift and shift of all of the legacy integrations that were built over years and years, and seeing how people have navigated all of that. I'll ask everybody to make a personal goal. The best way I can think that you'll get value out of this session is, one, to walk away with a new idea, and two, to walk away with a new relationship. If we can help on either of those fronts, just ask. On those relationships, we've been very fortunate to pull together an outstanding group of professionals. So anywhere across the grid, you can go over LinkedIn, and if you need some help, you just ask us. I'm co-hosting today with Dave and Michael over at Teknion. So Dave, do us a favor: grab the mic and give a little intro on who Teknion is.
Dave Haas 1:59
Hey, everyone, my name is Dave Haas. I've got 25-plus years (guess that shows my age) of development, bizdev, sales, and marketing experience, leading in the technology space, and I'm just super excited for this discussion today.
Greg Irwin 2:14
So we're going to keep Dave and Michael involved as we go. And I'll ask everybody to take the spirit of a group conversation, which means feel free to jump in and ask questions on what somebody else is sharing. It's much richer, much better when that happens, as opposed to just looking to me or Michael to go deep in the stories. That's how we're going to uncover the good stuff. So I'll ask everybody to follow that spirit. Michael, great to be co-hosting with you. Please give your intro to the group.
Michael Tantrum 2:52
Sure. Thanks, Greg. So I started in analytics back in 1989, back when these systems were called decision support systems, before they became data warehouses. So that's 30-plus years of doing data and analytics: I worked for companies building analytic systems and teams, worked as a consultant, and now I'm at Teknion, really in an advisory capacity, just helping companies figure out how to get value from data and how to drive business decisions off reliable, trustworthy data. And so at Teknion, that's kind of what we do. We have four pillars: a data pipeline practice, a data visualization practice, a data governance practice, and a data science practice. So it's all about analytics, and it's all about how companies can derive good value out of the data that they have.
Greg Irwin 3:50
Awesome. And give me some real basics here, Michael. How many employees? Are you really tuned to the Fortune 500? Do you spend all your time in the mid-market? And is there a singular technology expertise? Maybe you're the world's experts in Snowflake or in, I don't know, Teradata. But a little bit more on the organization?
Michael Tantrum 4:16
Yeah, sure. So we've got 65 employees. All of our consultants are employees; we don't have contractors, so we're not a body shop. There's a very family feel to the firm, so when people partner with us, it does feel like you're getting people who care. As for the type of technologies, we're doing so much cloud work right now. When people are talking about data warehouse modernization, one of the big questions is, you know, what does modern look like? And so the cloud is the obvious place to go. The big elephant in the room is Snowflake; probably 90% of our projects right now are Snowflake projects. Historically, we've had a very strong technology focus around Tableau, and we also have automation tools in the data warehouse space, so things like WhereScape, Matillion, Data Vault, and things like that, Alteryx.
Greg Irwin 5:07
So this group collectively here is going to help us define the conversation. And Michael, in our prep, I love it, I have it written down right in front of me: everybody really needs to define for themselves what that modernization really means to you and your organization. I'm going to spell it out for you. These are Michael's words, not mine, but I'm stealing them. Is it governance? Is the problem around governance? Is it architecture and going to cloud? Or is it strategy, and basically how the data pipeline works, or maybe something else? So what I'm going to do is go around the group and ask for some really high-level stories, and maybe we can start understanding some of the journeys that people have. By the way, as we go here, if you can turn on your camera, wonderful. It makes it a little more personal. And also as we go, not to embarrass Michael, but let's use the chat throughout. First person in there, please give me a guess of where Michael is originally from, just to get the chat going. So let's get things going here. I'm interested to invite Vikas into the session. Vikas, are you with us? Well, John, sorry, we got a little echo there. Vikas, we had you in the past during one of our sessions, so I'm interested in speaking with you again. Would you do us a favor and give a quick intro to the group?
Vikas Ranjan 6:46
Absolutely. Let me try to turn my camera on.
Greg Irwin 6:51
Super, awesome. Great to speak with you again. By the way, Anthony, nice work. Good guess.
Vikas, yeah, please, a quick intro to the group.
Vikas Ranjan 7:04
So my name is Vikas. I'm a senior manager on the data analytics team here at T-Mobile. Our primary focus is basically building anything to do with network analytics, for improving the customer experience as well as improving our network and making it the best in the US. Data warehousing has been there for many, many years, as somebody said earlier, you know, DSS and everything else. In our space, we don't really leverage data warehouses, because we deal with a lot of network logs. But in my past experience, I've worked extensively on data warehouses. So one thing I'm definitely interested in is building strategies around how you move customers who are using traditional data warehouses, like Teradata, into the world of these cloud-based data warehouses. I think the strategy, and how you find the path, is what I want to learn more about.
Greg Irwin 7:54
Can I ask you, so, all right, let's talk about migration and architecture: how to architect, how to move from on-prem to a cloud. First question: why? Why do it?
Vikas Ranjan 8:04
Why is an excellent question. Speed to market, time to market, and agility.
Greg Irwin 8:10
So I would love to put this to the group. And I'm going to try, with all my effort, to get everybody actively involved. So can somebody tell me definitively that they've seen that acceleration? Vikas, I agree with you in terms of the story: it's about agility and speed. Has somebody made that journey who can say unequivocally that, after making that move from Teradata on-prem to Snowflake, they're actually able to run queries, stand up new queries faster, bring in datasets more quickly, and serve the business more quickly? Can somebody just say, I am proof that this thing is actually a better mousetrap?
Rameesh 8:53
I can speak to that, Greg. I'm actually from Railinc. We are basically the data stewards for all the railroads in North America. We actually did our cloud migration initiative, and our data science practice, I think, like you said, has seen a significant lead in terms of time to market, as Vikas pointed out. I think it has given us more opportunity to reach out to our third-party logistics companies and, you know, resell the data. But I know we all talk about lots of benefits of cloud migration, and I think the cost is also something that we need to be aware of. This is just my own personal opinion. When we use Snowflake, like what Microsoft and Adobe are also doing, migrating all of our warehouse or their platform to Snowflake and AWS DMS, I just wanted to highlight that the one thing we may lose sight of is the cost. With on-premise, I run my query and I don't really care about the cost as much, because we already did the lease or paid for it. But in cloud, especially Snowflake, it just keeps burning a lot of credits. So that is, I think, fairly challenging. I don't know how many people have observed that.
Greg Irwin 10:19
Let's find out. Again, in the spirit of interactivity, everybody find your mouse. If you put it over the grid, there's something called Reactions, that little smiley face with the thumbs up. We're going to use that thumbs up. The question is, has your organization, or is your organization... Oscar, you can't already raise a thumbs up before we even have a question. He's testing. All right, the question is: is your organization in the midst of, or has it completed, a move of a reasonable amount of your data warehouse to the cloud? Please raise your hand if true. One, two, three, four, five, six. Yes, the answer's yes, everybody. Okay, let's let that clear. Now the next question: was it a good idea? In other words, from the C-suite level, your CIO looks at it and says, yes, we went and spent, it could be millions, to rewrite and move. Was it worth it? Pretty hard. Thank you for the... I see cries from John. All right, let's go to a story. John, why are you crying? Or maybe you're crying tears of laughter. John?
Anthony 11:47
Oh, yes, sir. Yes. Well, we went from a single-source ERP system to a wonderfully working data warehouse, with a full dashboard and analytics and high-level reporting laid over it, to a new...
Greg Irwin 12:05
We lost you, you know, just like a second ago. Let's, uh, we'll try one more. Please, go ahead. All right, John? Sorry, are you with me now?
John 12:24
Yes. Oh, sorry, I don't know what it was.
Greg Irwin 12:32
John, wait, I'm sorry. Your mic, it's painful. I'm sorry, we want to hear you. We do. But I'm going to need you to leave, come back in, and I'll bring you back in. Let me invite in Priyank. Priyank, I saw your hand up right in there, from realtor.com. So you may actually win the award for the most data on this grid. But tell us about your modernization.
Priyank 13:08
In the past, we have been on SQL Server, Informatica, and SSIS; that was about four or five years ago. Then we made a move to the cloud, everything to AWS. But when we moved to cloud, we went with the strategy of, you know, going with data lake ingestion: real-time, near-real-time ingestion, and batch ingestion. And a lot of the data ingestion and ETL processing has been through, you know, Python frameworks that we have custom built ourselves. And, you know, was kind of our warehouse sitting on top of it. But obviously, it had given us a lot of challenges in terms of processing, maintenance, cluttering, and all of that. So right now we are shifting towards, you know, Snowflake and dbt. I don't know if this team is aware of dbt, but I'm super passionate right now after learning about it.
Greg Irwin 14:08
dbt? What is dbt?
Priyank 14:13
Sure. I'd say it's a data build tool. It's an analytics tool. Basically, in the past you would have Informatica, SSIS, whatnot, to process the data and create the pipelines. What dbt does is the same, but it's based more on software engineering principles. So it's more like the programming side. But actually, it's more to say that, hey, you have a Git repository, you have models, which are basically the SQL files in your Git repository, and they are completely templatized, right? And it's also open source. They also have a cloud-based offering; we are going with open source. But it's more powerful than any other tool that I've seen out there, and I've been in the data warehouse space for 12 years. I think it's going to be the next revolution. But anyway, so now we are moving towards Snowflake and Airflow. I mean, we had Airflow in the past; now we are moving towards the managed Airflow from AWS. And of course we have Tableau and MicroStrategy as well.
Greg Irwin 15:31
I mean, my goodness, the amount of change. Your whole data architecture is entirely new over the last couple of years. Is the change worth it?
Priyank 15:42
Oh, definitely. Yes. I mean, we have seen huge improvements in terms of usage, you know, and speed, right? I think Vikas was talking about the agility and all of that; that actually changed the complete game for us. But again, I think we are moving on to the next level. I think Snowflake is going to add even more power for us, so I'm ready for it. There are usually some problems in the technical warehouse space, due to the architecture and the data modeling that you choose, all of that, right? With some of these tools, like Athena and S3 file-based processing, there are no easy updates or deletes, right? With the CCPA coming in, you're supposed to be deleting something or updating something all the time. So those kind of gave us a lot of challenges. So we started looking into things which would enable us to focus more on the business use cases rather than on the back-end processing. I think Snowflake is very good. So far, we're just starting out, and Snowflake has been good. dbt is also much better than the tools that we have used in the past. A lot of our resources in the past have been focused more on building the frameworks, customizing those frameworks, adding more features, you know, resolving those bugs. Now my team can focus more on the actual business needs, rather than on that groundwork.
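As a rough illustration of the update-and-delete pain Priyank describes with file-based storage, here is a minimal sketch in Python. It is not realtor.com's pipeline: the `events` table, the `user_id` and `page` fields, and both function names are hypothetical, a CSV string stands in for a file in object storage, and SQLite stands in for a warehouse. The point is only the contrast: removing one user's rows from a file means rewriting the whole file, while a SQL engine takes a single `DELETE` statement.

```python
import csv
import io
import sqlite3


def delete_from_csv(csv_text, user_id):
    """File-based approach: read every row and write back all the ones we keep."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row["user_id"] != user_id:  # keep everyone except the deleted user
            writer.writerow(row)
    return out.getvalue()


def delete_from_table(conn, user_id):
    """Warehouse-style approach: one statement, the engine handles storage."""
    cursor = conn.execute("DELETE FROM events WHERE user_id = ?", (user_id,))
    return cursor.rowcount  # number of rows removed
```

With many large files, the first approach also means finding every file that might contain the user, which is part of why a deletion request is awkward on a plain data lake.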
Greg Irwin 17:16
I love it. What percentage of your data warehouse is going to ultimately sit within Snowflake when you're done with a couple of big releases here, let's say three years down the road?
Priyank 17:28
I think 100%. I would expect 100% of our data warehouse, yeah.
Greg Irwin 17:31
Are you worried about the cost? Ramesh highlighted it, and it's right. I think there's a fear of the fact that they do some form of utility-based licensing on this, so that you do have to start thinking, wow, that's a big query we're going to run; maybe we have to rethink running it.
Priyank 17:53
Yeah, cost definitely was there when we were evaluating it. But more than the cost, we were looking at the features. You know, realtor.com is part of News Corp, and I think other sister companies are already on the side of Snowflake. But yes, we do have some monitoring and alerting in place. We have teams to, you know, come up with these utilities to make sure that we are not crossing our monthly or yearly budget.
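The kind of budget utility Priyank mentions could look something like the following minimal Python sketch. Everything here is an assumption for illustration: the function name, the 80% warning threshold, and the idea that daily credit usage arrives as a plain list of numbers. It does not call any Snowflake API; in practice the usage figures would come from the platform's own usage reporting.

```python
from dataclasses import dataclass


@dataclass
class BudgetStatus:
    used_credits: float
    monthly_budget: float
    fraction_used: float
    level: str  # "ok", "warning", or "over"


def check_credit_budget(daily_credits, monthly_budget, warn_at=0.8):
    """Compare month-to-date credit usage with a budget (hypothetical helper)."""
    if monthly_budget <= 0:
        raise ValueError("monthly_budget must be positive")
    used = sum(daily_credits)
    fraction = used / monthly_budget
    if fraction >= 1.0:
        level = "over"
    elif fraction >= warn_at:
        level = "warning"
    else:
        level = "ok"
    return BudgetStatus(used, monthly_budget, fraction, level)
```

A scheduled job could run a check like this once a day and send an alert whenever the level is anything other than "ok", which gives the team time to rethink a large query before the budget is crossed.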
Greg Irwin 18:24
Very good. Priyank, last one. You're in a group of a lot of data warehouse experts. What's one thing that you're working on, one challenge that you want to overcome, maybe not from an individual feature perspective, but in terms of an operational challenge that you and your team are working on over the next 12 months?
Priyank 18:48
I'm just getting started. I mean, I've been on the data warehouse side, and I'm just trying to explore these new tools that we are planning to use. We now have an opportunity to basically drive the direction in terms of warehouse architecture, modeling, you know, all of those software engineering principles and stuff. So we are kind of laying the groundwork right now. The challenge that I've seen in the past is that a lot of data warehousing depends on the data modeling, the way we structure our data warehouse. I have many, many thoughts on this, now that I'm kind of deep diving into it. But basically, there is always a distinction between an operational data store and a data warehouse, and I think that's kind of highlighted, usually, within the warehouse space. And on the warehouse side, I don't know if you guys are familiar with Data Vault modeling. That's a new thing for me, but I feel it's pretty good. I have to implement it and see whether it truly works, and if it is as good as people say. That's something where I would like to see some evolution in that space. I mean, I think Data Vault is one good step. But I think one of the things is also that a lot of this data warehouse ingestion and processing should be automated up to the point of, you know, the business use case, right? So how do we get there, and how fast can we get there? This is probably the next challenge.
Greg Irwin 20:37
You gave us so many good things to discuss. Let me thank you. As I mentioned in the first thing I said, let's use the chat throughout. So as somebody is talking, and you have an idea or a question, don't let it sit; drop it into the chat, and we can always roll back, find it, and raise it as a topic. Priyank, thank you very much. Michael, I'm going to bring you back in here. I want you right side by side here as we're going. What guidance do you have for Priyank and others as they're putting together their strategy around Snowflake?
Michael Tantrum 21:23
Yeah, so I mean, the things you're talking about are very important. So you've done data warehouse version one. The opportunity of modernization is an opportunity to look at not just, do I go in the cloud or do I stay on premise? I see the comments in the chat, like what do we do about PII data, security, and things like that, which are all important concerns. But it's also an opportunity to look at tooling. So you've gone from a traditional ETL tool like Informatica, and there's lots of interesting cloud-native tools for populating data. So you've talked about dbt, and Matillion's one that you often hear a lot about, etc. These are SaaS-based products, and there's new ones coming out all the time. The next thing you mentioned also is modeling. And so, you know, you asked about Data Vault. You'll recall, if you remember back, the Kimball-Inmon wars that happened in the early 2000s. Bill Inmon, with his third normal form Corporate Information Factory, was the original standard. Ralph Kimball came out with star schemas, and, you know, who was right? And we're kind of seeing a little bit of that with Data Vault as well. So is Data Vault the new thing? Is it the giant killer? Data Vault has really good use cases, and in other cases it's massively over-engineered. But if you have really high volumes of data, if you have lots of changing data sources, where you're adding new sources, or if you're a company that grows by acquisition, where you buy a new company and take on whole new ERP systems and need to add that to your data warehouse, or if you need what we call ultimate time periods, meaning I need to be able to recreate data as it was at any point in time, Data Vault's really good for that. But if you just have relatively straightforward data, and your source system holds all the history, then a traditional Kimball star schema, confederated data mart model is actually more than sufficient, and not over-engineered. So there's lots of good choices there.
But you're absolutely right to the point of saying, when I'm choosing to modernize my warehouse, I get the opportunity to say, do I change my modeling style? And so, yeah,
I have lots of thoughts. I could have hours of discussion around pros and cons of models. I'm not going to take that all up here.
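To make "recreate data as it was at any point in time" a little more concrete, here is a minimal Python sketch using SQLite. It is not a full Data Vault model; it only borrows the insert-only idea, where every change is stored as a new row with a load date and nothing is overwritten. The `customer_history` table, its columns, the sample rows, and the `customer_as_of` helper are all hypothetical.

```python
import sqlite3


def customer_as_of(conn, customer_id, as_of):
    """Return the customer's city as it was known on the given ISO date."""
    row = conn.execute(
        """
        SELECT city
        FROM customer_history
        WHERE customer_id = ? AND load_date <= ?
        ORDER BY load_date DESC
        LIMIT 1
        """,
        (customer_id, as_of),
    ).fetchone()
    return row[0] if row else None  # None: no record existed yet on that date


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_history (customer_id INTEGER, city TEXT, load_date TEXT)"
)
# Insert-only history: each change is a new row, older rows are never updated.
conn.executemany(
    "INSERT INTO customer_history VALUES (?, ?, ?)",
    [
        (1, "Auckland", "2019-01-01"),
        (1, "Dallas", "2020-06-15"),
        (1, "Denver", "2021-03-01"),
    ],
)
```

Because ISO dates sort correctly as text, asking for the state as of "2020-12-31" picks the Dallas row, while a date before the first load returns nothing. If the source system already keeps this history itself, the extra structure buys little, which is the over-engineering trade-off Michael raises.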
Greg Irwin 23:43
We'll keep on bouncing around here. I want to bring in others and put more stories on the table. So I'd like to invite Anthony to share here. Anthony, thanks for your comments in the chat. Are you in a spot where you can share a little bit of your story?
Anthony 24:02
Absolutely.
I'll even turn on my camera, even though I'm not dressed for public. Hi, everyone. Pleasure to meet you all. So I'm the director of BI and analytic services for I've been with for about eight years, came over and focused on what we call health plan partnerships and products in 2016. So I've been separated from the hospital space for a little while. And what we're focused on our dictionary
Greg Irwin 24:27
Got it, so that's the payer side. That's correct. Okay. The guys who pay the bills for those clinics and hospitals.
Anthony 24:37
We're working on it. We're working on it. We're definitely, you know, seen as a growth vehicle, and we have a unique opportunity, being much smaller than proper. We can try a lot of these things at a much smaller scale, so you can kind of think of us a little bit as an incubator for the system as well. So we started off very similar to where Priyank was a few years ago; that's where we are today: SQL Server, SSIS pipelines, way more than I want to mention. And we've really, you know, hit a point where scalability is a problem for us, and that's when we started looking at the cloud. So I did a large pilot for value-based analytics last year. There's an analytic component, as well as the scalability and just kind of testing Azure infrastructure. That was very successful. And so I've recently pitched a project for 2022 to complete that migration. We're at a point where scalability is a little hard on that Microsoft SQL Server infrastructure, so that's what we're looking at for 2022. I'm fairly confident I'll get that project approved.
Greg Irwin 25:36
I also want to ask: why'd you go Azure?
Anthony 25:40
A few reasons: one, long history and relationship with Microsoft, both in my career and . And then just the skillset of my team.
Greg Irwin 25:51
Yeah, how have you found it? How feature-rich is Data Factory?
Anthony 25:59
I've watched it grow quite a bit. When we first started playing around, there was a lot to be desired, and it continues to grow at a favorable rate. One of the things I haven't made a solid decision on yet is whether we're going to lift and shift a lot of our SSIS packages and run them in Azure, or if we're going to take the time to completely restructure them and build them in Data Factory. Obviously, new pipelines we're building in Data Factory. But I still haven't made a solid decision on some of those others. Definitely a learning curve.
Greg Irwin 26:22
Got it. All right, what's a question you've got here for the group? I heard PII data.
Anthony 26:30
Yeah, that was one. I'm not a security professional, but we definitely went through a huge security review when we did the value-based analytics. So we are approved; we do have PHI sitting in Azure. So I was very curious. I believe it was Robert who raised a PII concern. So I was curious as to your use cases, and why that was a concern for you.
Greg Irwin 26:51
And Anthony, sorry, just to follow up: what aspects of security? Is it just knowing and identifying where the sensitive data is and being able to report on it? Or is it actual encryption and control of the data?
Anthony 27:06
I think it's encryption and controlling risk.
Greg Irwin 27:08
Okay. So let's take that. That's a huge point. I'd like Michael to start, but I'd love to hear from others who've gone through that journey of how much is enough, in terms of, you know, getting governance and security control built into the data architecture. Michael, would you start us on that, maybe with a story?
Michael Tantrum 27:31
Yeah. So the difficulty with this, of course, is that when you do things like data security, it not only has to be done, but it has to be seen to be done. And so how do you give comfort to your end consumers, to your audit people, to your security people, that their personal data, particularly in healthcare, but you see the same in financial services, is safe in the cloud? The cloud vendors have spent a huge amount of time answering this and trying to get all the certifications. I know Azure, for example, with Azure SQL DB and Azure Synapse, has done a lot of work getting HIPAA certified, and things like that. But I think it's more of a confidence problem, and being able to demonstrate that things are good, rather than whether the technology can handle it or not. All of the cloud vendors actually have, if you're willing to pay for it, a virtual private cloud, which means your data is ring-fenced in your own space. Not everybody needs to go to that extent, you know, because of the cost. But if you have really stringent audit requirements, that's a way to go. I'd be curious, actually, to hear other people's stories around that, people in healthcare or financial services.
Greg Irwin 28:51
Let's do it. We've got a big group. Somebody volunteer how they are managing the governance aspects of moving to a cloud. If somebody wants to jump right in... or, I'm going to jump right to Mark. Mark J. at third, I'm sure it's something that's core to what you do, so I'd like to invite you to maybe share a story with the group. Mark, are you with us? Nope. How about Themall? Themall, are you on the line with us? That's a lot of quiet. Yeah, sorry. Hi, can you hear me? Yes, crystal clear, Themall. Nice to meet you.
Themall 29:49
Nice to meet you. Sorry, I was multitasking. So can you say your question again?
Greg Irwin 29:53
Yes. First, thanks for joining. A real quick intro, and what we're talking about is data governance and security, particularly within your data architecture. I would love just to go back and forth for a minute with you in terms of what you're doing, and what works and what doesn't work in terms of data governance.
Themall 30:18
As I said, we've been recently moving towards the cloud data warehouse. In that context, governance is a part of our end-to-end game, what I call continuous governance. So we get security and compliance engaged, not only security but compliance as well. And then, for any of our existing infrastructure, which is on-prem, we just try to put together what I call data literacy and governance literacy, and get all these teams working together to get us on the right track when we move to the cloud. So we create a separate work stream in every sprint to make sure that governance is part of that particular sprint, and that it is always the focus.
Greg Irwin 31:16
if you see governance team that you have involved in this process, what they put in place, as you're designing it, in process or technology, that is is any different from what you are doing on prem?
Themall 31:33
Not really, I would say that thing different is more about getting them engaged from the beginning. Right, instead of that come later on, and have more focus and more engagement compared to, you know, the regular on time, right? Because on prem is done deal for a long time. Now, when we move to cloud, many times I've experienced in last few years is this pace, if we don't do from the beginning, it is giving you a lot of trouble later on, you know. So recently, I would say like, let me just think I'm looking at the chat and that, you know, looking at all these DBT and all stuff. So that is also one of the initiatives we have started you know. So what we are trying to do is any of such tools like five Tran DBT, or, you know, HBr V, we just want to make sure that whenever we are doing even PLC or beginning to evaluate any product, we get their security team engaged, right, and then
Greg Irwin 32:35
Have they raised any meaningful concerns that have changed either the tools or the processes that you were planning? To say it again: has the governance team raised any concerns that resulted in you changing your tools or your processes?
Themall 32:57
I think the concern, mostly, is always whether we are sending our data through HTTPS or something like Azure or AWS PrivateLink. That is the biggest thing they always want to see. I don't know the nitty-gritty as well as my team does, but whenever we try to send something over HTTPS or a regular protocol and we're not using the latest and greatest TLS version, they are the first ones to jump in with stoppers to get us on the right track. Did I answer your question? Sorry, it was a little more technical for me. But I know
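The TLS concern described above can be made concrete with a small sketch. This is only an illustration, assuming a Python client and the standard-library `ssl` module; the helper name `build_strict_tls_context` and the choice of TLS 1.2 as the floor are assumptions, not something stated in the session.

```python
import ssl


def build_strict_tls_context(minimum=ssl.TLSVersion.TLSv1_2):
    """Return a client-side context that refuses protocol versions older than `minimum`.

    The TLS 1.2 floor is an assumed policy for illustration; a security team
    may require a higher one.
    """
    # create_default_context() turns on certificate and hostname verification.
    context = ssl.create_default_context()
    context.minimum_version = minimum
    return context


# The context can be handed to any client that accepts an SSLContext,
# for example http.client.HTTPSConnection(host, context=context).
context = build_strict_tls_context()
```

Setting the floor in one shared helper gives a governance team a single place to review, rather than checking every pipeline's connection settings.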
Greg Irwin 33:37
No, you absolutely did. I'd love to flip it around, though, and ask you to tell us what question you have. Or maybe my standard question: what's the number one priority your team has that you're working on over the next 12 months?
Themall 34:10
I would say the number one priority is continuously documenting everything, like all the coding and anything we change. We want to make sure that we document that automatically, and that way reduce the time for my data engineering team end to end, as well as for what we call the analytical engineering team. So just one quick ask: can anyone share their recent experience? I heard Priyank talking about dbt, so that is one of the things I'm trying to learn from everyone.
Greg Irwin 34:37
Themall, I appreciate you jumping in. Thank you very much.
Priyank 34:41
Sorry to jump in here. Yes, I think I forgot to mention this documentation, but it has become a huge part. We want self-serve documentation with minimal ongoing oversight. In the past, we have had wiki pages, blogs, and everything supporting a lot of the data warehouse use cases, explaining what is what. And that's where we also ended up with dbt. One of the primary reasons is that it has this documentation feature, where you can publish the docs internally, like a website, and all of that documentation and the metadata related to the table structures and everything is part of the development process. So for any schemas, any models that you build within dbt, you can provide the documentation there, and it automatically gets rendered on the website, so you don't have to create these other wiki pages, which could completely go out of sync so easily. That has been our major problem, because one of the things that we wanted to do in our company was to bring that trust in the data. Some product analysts will probably trust the data, but for some, just looking at one issue, which could be completely orthogonal to what we are reading, could remove the trust in the data. So it was important for us to keep that documentation in sync, and we have seen that with dbt. So we are excited about that, but yeah.
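The idea of keeping documentation inside the development process, so that the published pages cannot drift from the models, can be sketched in a few lines. This is not dbt's API (dbt reads descriptions from files kept alongside the models and builds its own site); the `Model`, `Column`, and `render_docs` names and the sample `orders` model below are hypothetical and exist only to illustrate the principle.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Column:
    name: str
    description: str


@dataclass
class Model:
    # The description lives on the model definition itself, not in a separate wiki.
    name: str
    description: str
    columns: List[Column] = field(default_factory=list)


def render_docs(models: List[Model]) -> str:
    """Render one documentation page from the model definitions."""
    lines = []
    for model in models:
        lines.append(f"## {model.name}")
        lines.append(model.description)
        for column in model.columns:
            lines.append(f"- {column.name}: {column.description}")
        lines.append("")  # blank line between models
    return "\n".join(lines)


# Hypothetical model used only for illustration.
orders = Model(
    name="orders",
    description="One row per customer order.",
    columns=[Column("order_id", "Primary key."), Column("amount", "Order total.")],
)
docs = render_docs([orders])
```

Because the page is regenerated from the same objects that define the models, changing a model and forgetting the documentation becomes a visible gap in a code review rather than a stale wiki entry.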
Greg Irwin 36:23
I want to make sure I get it. What's different about being in the cloud and needing to document your code and your processes? How is it more important in this environment versus your legacy environment?
Priyank 36:40
It's not. I mean, documentation is always important wherever we are. Let's say there's some dimension, fact, or some sort of aggregate that you have built within the warehouse. There's always a constant inflow and outflow of users, some who are familiar with it and some who are not. So it's always tricky and hard to onboard every user unless somebody goes into the nuances. At the surface it's all, "Hey, this is just an aggregate, you do this," but under the hood there are a lot of nuances behind these data sets, and a lot of those nuances cannot be documented so easily. So there's always a subject matter expert involved, and having a session with them basically means taking away their development time or design time. That has been a problem, and as your number of data sets grows, as your warehouse grows, that problem just amplifies. We have had this issue because we were a very limited team and the number of users of the data warehouse has been growing, and we were not able to address it. A lot of these questions would come to some sort of Slack channel, with people pinging and asking, "Hey, what is this? Where is this data set? How do I do this? How do I do that?" It was a major pain point for us to solve. So we have been looking into documentation solutions, but any of the solutions out there that we have seen is similar to this wiki. I mean, a wiki is a good place. I think a lot of companies have this internal wiki where you persist a lot of the documentation. But in our experience, it easily goes out of sync.
Greg Irwin 38:14
Yeah, I get it. And I love this word, trust. Michael, let's say we're successful, and you build this very dynamic, highly valuable, expansive data warehouse that everybody in your organization can use. How do you build the trust that the data that's in there is the data that they can and want to use for their analysis?
Michael Tantrum 39:01
Yeah, and that is a theme of modernization, as people are saying, "I want to be able to trust this data." So documentation, of course, is step one, and that also falls into business glossary and data cataloging. But there's another element as well, which is data quality. One of the big problems I see when people do this big modernization event is that they build this beautiful thing, right? "Build it and they will come." Then you build it, and adoption is not high. Why are people not trusting the new thing I built? So how do we build trust? You've talked about automation around documentation. Can I encourage people to think about automation around data quality? How am I able to safely assure the user that we have all the data from the source, and that we are processing it according to all the business rules that you have defined? And also, when we discover and surface data quality issues, how do we show you, culturally, that I have heard the concern, I have put something in place to track it, and we will throw an alert if that ever occurs again? So building confidence in the solution is key. Often as engineers we focus on "I will build it," and we forget about the end consumers and what it feels like for them to adopt it. And again, if they don't have confidence, you have managers not using your data, or using it with caution to make decisions. So I'm curious to hear if other people are striking that. I see it a lot on my projects.
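The "track it and throw an alert if it ever occurs again" idea can be sketched as a small set of named rules that run against every load. This is a minimal illustration, not any vendor's product or method; the `QualityRule`, `run_checks`, and `format_alerts` names, the two rules, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class QualityRule:
    # Each rule records one known concern so it is checked on every load.
    name: str
    is_valid: Callable[[Row], bool]


def run_checks(rows: List[Row], rules: List[QualityRule]) -> Dict[str, List[Row]]:
    """Return the failing rows per rule, only for rules that have failures."""
    failures: Dict[str, List[Row]] = {}
    for rule in rules:
        bad_rows = [row for row in rows if not rule.is_valid(row)]
        if bad_rows:
            failures[rule.name] = bad_rows
    return failures


def format_alerts(failures: Dict[str, List[Row]]) -> List[str]:
    """Turn failures into alert messages; a real setup would send them somewhere."""
    return [
        f"ALERT: rule '{name}' failed for {len(rows)} row(s)"
        for name, rows in failures.items()
    ]


# Hypothetical rules and data, for illustration only.
rules = [
    QualityRule("customer_id is present", lambda r: r.get("customer_id") is not None),
    QualityRule(
        "amount is non-negative",
        lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0,
    ),
]
rows = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": None, "amount": 5.0},
    {"customer_id": 3, "amount": -4.0},
]
failures = run_checks(rows, rules)
alerts = format_alerts(failures)
```

When a user reports a new issue, adding one more rule to the list is the visible sign that the concern was heard and will be caught the next time it happens.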
Greg Irwin 40:34
I was involved in a number of ERP deployments in my career, and change management was always the hardest part. When the engineers are building it, it's fine. When the business owners or operators actually have to use it, it takes an effort to get people to learn new processes, especially if senior management isn't there driving it down into the organization.
Michael Tantrum 41:02
And we all know people that have their trusted spreadsheets, right? Their crusty old spreadsheet: "That's my safety blanket. If your numbers don't match mine, I don't know, do I like your new thing?" And so how do you navigate the whole thing? That's an interesting problem.
Greg Irwin 41:16
Oscar, let's get you in. Great to see you. Do us a favor and start with your intro, Oscar. Thank you.
Oscar 41:24
Hi, everyone. My name is Oscar. I'm a VP of business engineering, innovation, and technology at a biopharma company focused on the neuroscience space. I've been working here for almost four years. As for my background, I have been working with data for more than 25 years. Over the last seven years, I have been focusing primarily on data science. That's about myself, and again, happy to be here.
Greg Irwin 41:51
What have you done, and what is it? We've talked about a lot. We've covered a lot of things at a high level, and some things very, very well. We're talking about trust, data trust, data quality, and really adoption of these shiny new platforms. Is that a focus for you? Or maybe your head, time, and energy are spent somewhere else? Can you share a comment on how you've instilled data trust across your organization?
Oscar 42:28
First of all, I would agree with everything that people have said in the last 25 minutes. I totally agree. There are a couple of things that I would add. One is that data trust is critical, but data trust is not just about quality, and it's not just about getting the data to the right users. It is also related to adoption, right? Let's say you build a car or a boat: if no one is using the car, it doesn't really matter that you have the right data. So in my experience, while trust is important, getting the users to use the system is maybe even more important. That's number one. If you have data quality issues, as soon as you can share those issues, people recognize them, and you have users banging on the system. That will help a lot, in my opinion. That's something I learned over time, and it is super critical. Even today, in my current environment, we're working with data scientists, scientists, and academic partners who don't really care what the data is. They care about, "Just give me the results that I need." Then comes the second part. When we talk about modernization, it is like a big animal. The way I think about it is, you have a house, or you move into a new house. My opinion is that moving everything, modernizing everything, is maybe not the right approach if your company is pretty big and you have a pretty large system. A few years ago I was working with a pretty large financial institution where they had 800 Hadoop clusters, one of the largest Hadoop implementations in New York. They were trying to figure out, how do we move to something like BigQuery or Snowflake? The approach that we came up with for them is that you have to go incrementally. I think, Michael, you mentioned that you have to take it incrementally.
Something that we came up with as part of going incrementally was, how do we do federation with the data? Because for the users, while we're bringing in new technologies, the train has to keep running, right? You cannot say, "Okay, the platform is going to be down for six months." So how do you federate? How do you build a system so that the migration or modernization does not disrupt the end users? That is super important. The third part that I think is also critical, and is happening right now, is that everyone's talking about this new concept, the lakehouse. Everyone on the call has probably heard of this, right? And the same people who were driving the data warehouse 20 years ago, like Bill Inmon and others, are now driving the new term. So in my opinion, this trend will continue, but the main issues that are there, like data quality and data adoption, those issues don't go anywhere. I think my point, Greg and team, is that while technology is important, while we can talk about lakehouse modernization, what is more important is the issues that are there. We have to find a way to listen to the end users and figure out what isn't working and how we fix it. That will help them move faster. In my opinion, this is one of the reasons why Databricks and Snowflake are kicking ass, right? Because they're gaining momentum with the final end users.
Greg Irwin 46:14
Oscar, what's your top priority for Blackthorn?
Oscar 46:21
One is speed. Again, as I mentioned, we're in R&D drug development. We don't have any commercial products, so we're bringing data science in to make things faster. Scientists are asking what patients we need to have for a clinical trial. So how fast can we answer those questions? If we go through a traditional approach, where we have to analyze the data and bring the data into one place, it doesn't work. The approach that I'm taking right now, and I'm writing a blog about it as we speak, is: what is the best approach to get the data and the answers into a model in maybe a couple of hours or days? That's what I mean: to be able to ingest a new data set within hours. And it's not just the ingestion. Think about data scientists and what they need. This is just one example. Initially, we started looking at how we build a data lake, an enterprise data lake, and they would say, "No, I just need to have access from my machine." How you get there the fastest way, they won't care. No one would care. So that's, again, the speed side. In my opinion, at least from my experience, that is one of the main opportunities that we have within the company.
Greg Irwin 48:01
Excellent. Oscar, thank you so much. Really appreciate it. Alright, folks, we have 10 minutes left in our session. Let's finish strong here, which means we've got 10 minutes together. Let's take full advantage. Brandy, I absolutely want to get you involved. Robert, I want to get you involved, and Aaron, I want to get you involved, and Michael, I want you in the middle of all of that. So let's see how we do. Hey, Brandy, good to speak with you again. Please give your quick intro.
Brandy 48:34
Hi, my name is Brandy. I'm currently the Data Services Manager at . We're a subsidiary of Linda. Primarily, we're a DMV data medical services company. So sort of healthcare, but not quite. Kind of in the middle.
Greg Irwin 48:56
A lot of healthcare on today. I like it, healthcare is well represented.
Brandy 49:00
Well. Yeah. Security is a hot topic.
Greg Irwin 49:03
What's your priority? What's the one thing that, if you could get input or feedback on it from others, would be valuable to you?
Brandy 49:13
So we have not even started moving anything to the cloud. We're currently working on a customer portal. Our data warehouse is still on prem, and it's not really a data warehouse in any way, shape, or form. I call it the data puddle. They call it a lake, but it's not. In my experience,
Greg Irwin 49:35
do you need a full blown data warehouse?
Brandy 49:38
Yes. We're in dire need of something in order for people to be able to make those business decisions, whether it is a data warehouse in the classic sense, or something similar with big data, you know, doing analytics.
Greg Irwin 49:53
This is a very dumb question. How do you know? What's a question that you get that you can't get an answer to?
Brandy 50:02
We don't know how many customers we have. I mean, basic information like that. That's rudimentary stuff that you need to know. And because you don't know how many customers you have, you can't go anywhere, right?
Greg Irwin 50:20
What do you think is the obstacle, then, to putting this in place? You're on a call about enterprise data warehouses, so you care about it. What's slowing things down?
Brandy 50:33
I think adoption from upper management is really the key: having them understand the need, and know that if the business has information at their fingertips, they're able to make better decisions. Especially operations, finance, and marketing, our key business areas. If they don't have information, how do we know where we're going, or where we've been? How much are we making?
Greg Irwin 51:04
It's pretty straightforward. So what do you need? How can we help you?
Brandy 51:14
I'm here to gather information and understand what's going on in other industries and other companies, to gauge where we are: have we not even started, or am I on the cutting edge? Where do we fit in the industry, and how do we get there? What works for some people, what doesn't work for other people. I want to gather that information, bring it back, and say, "Look, this is what I'm seeing, and I want to try these things," and recommend these things to my upper management and say, "What do you think about this?" Because cloud is definitely a hot topic now. So how do we get there? And what's the easiest way to do a POC?
Greg Irwin 51:55
So I'm going to volunteer the group in this network. Whether it's another organization, or Teknion, or BWG, or anybody, let us know how we can be helpful for you. Thank you, Brandy. Hey, Aaron, can we bring you in? Do you have advice here on how to connect the dots in an organization, to show them that this stuff can result in improved business outcomes?
Aaron 52:24
I don't really have advice. We're probably different from most of the people on the call. We're a Johns Hopkins School of Medicine spin-off in mobile health, so through mobile applications and patient outreach, we help people do better at medication adherence. We were created in 2016, I think, so from the very beginning we were in the cloud on AWS. We never had anything on prem. The one experiment we did was with our data warehousing and dashboarding analytics approach. We first tried something where we hosted the server and installed the software, and then started bringing in data from our Aurora Postgres, AWS-hosted transactional database. But we're such a small company that even maintaining a Windows Server in AWS was more than we wanted to do. So then we went with a completely serverless option where we know absolutely nothing about what's happening in the background. We just have our ETL tool, we have the data warehouse, we have the dashboarding, we have the pipeline software, and we just do our stuff, and we don't know anything about the infrastructure at all. It's been great. Yeah, so for us
Greg Irwin 54:02
On serverless databases, I've actually heard, unprompted, quite a lot of interest, not just on the analytics side but across different data stacks. It's a great point.
Michael Tantrum 54:19
We're hearing a lot of people say, "Our core business is," in your case, serving patients and things like that, "it's not running IT infrastructure." And they're saying, "We don't even want to know about it. We just want to turn it on and have it work." So that's another theme that I hear a lot now when people are saying modernization.
Greg Irwin 54:38
Michael, how much more does it cost? It's one thing to run a data warehouse in the cloud, it's another thing to go serverless. How much more, incrementally, is there to go serverless?
Michael Tantrum 54:51
Yeah, it's really a different place to spend your money. With your classic on prem, I've got a big capital cost right now, and then it's depreciated over the next several years, whereas a serverless or cloud-managed structure is an operational cost: you pay an amount each month that breathes in and breathes out depending on how much you consume. So it's an opex versus capex argument. And actually, coming back to something Brandy said, sometimes going to the cloud is not actually the right thing to do just now. Maybe you've already got enough investment that you need to see the value from it. So with the costs, it's really a different timing of costs, and sometimes different budgets that the cost comes from.
Greg Irwin 55:34
Yes. But in terms of serverless versus cloud, you know, Snowflake, how much of a team savings is it to go serverless?
Michael Tantrum 55:49
It's one of the things to think about. We've been talking about things like security. If you had to employ somebody to focus on security, you're going to get, you know, an average to really good person. The best people who work in security work for Amazon or Google or Microsoft, and they spend far more on security than you ever will. So that's where you get the scaling. You end up spending money on the things you actually need to spend on, and not having to spend on all those extra services that someone else, like the big cloud vendors, has already worked out how to commoditize.
Greg Irwin 56:21
Michael and Dave, and everybody, we're going to be wrapping up here in just a minute, and I know people here need to head off to other meetings. Let's just do a wrap-up. And a reminder: I'm going to send around a list of everybody's names, and I'll encourage you all to connect across the group. I set two goals for the meeting. One is one new idea. The other is one new contact. So please take that to heart. I really get the sense of the community that comes together here. You can tell this is an open, friendly, supportive group, so engage would be my suggestion. Michael and Dave Haas, guys, give us some closing thoughts here for the group before we hang up our video call.
Michael Tantrum 57:12
Yeah, what I would say is, these projects can be daunting. Getting started or trying to prove them out can be really tricky. I would encourage people to reach out for help and guidance. We've got lots of stories of how people have been able to sell the ideas, to prove out ROI, or even just to know which pieces to focus on next. Because often you've got the ocean and you're trying to bail it out with a teaspoon, and the question is how to focus and how to get value. The reason we do this is for our end consumers. So how do we get our business users to a place where they can do their job really happily, and they're delighted in us rather than seeing us as a necessary evil to supply data? That would be my takeaway thought.
Greg Irwin 57:59
Let's do that. Dave?
Dave Haas 58:02
Yeah, I echo what Michael said. For anyone who would like some of the resources that we have, we have some great papers, and we're happy to send them to anyone here in this group. Just take advantage of it. It's great information.
Greg Irwin 58:16
You know, with use cases, you can gloss over them, but there's a person behind each one who had a project, and who often gets on the phone and says, "Not only is it written up, let me tell you out of my own mouth why I did what I did." So I'll encourage that. Dave, thank you so much. Michael, and everybody: really, really interesting session. There's a lot going on here. I look forward to it all. Thanks, everybody.