Overcoming Challenges in Data Warehouse Modernization

Dec 2, 2021, 12:00 PM to 1:00 PM EST

Request The Full Recording

Key Discussion Takeaways

Data and business needs are constantly changing. Companies are modernizing their data warehouse infrastructure to process data more quickly and improve their business intelligence. The problem is that data warehouse modernization comes with its own challenges.

There are complex considerations that come with deciding whether to modernize your data warehouse. These include which cloud solutions to adopt, the technological and business case for your preferred provider, what your expansion and growth plans look like, your particular competency needs, and a lot more.

In this virtual event, Greg Irwin is joined by Michael Tantrum from Teknion Data Solutions to discuss the challenges of data warehouse modernization and how to overcome them. They discuss the business case for data warehouse modernization, choosing a cloud solution provider, what to expect, cost-effective migration, and more.

Here’s a glimpse of what you’ll learn:

 

  • Why companies are modernizing their data warehouses
  • Comparing top cloud solution providers and how they can best serve your needs
  • The business case for data warehouse modernization
  • What to expect from a modern data warehouse
  • Choosing a provider for your company beyond the technology
  • How to prepare for the transition
  • Michael Tantrum discusses cost-effective data migration planning

Event Partners

Guest Speakers

Greg Irwin

COO at BWG Strategy LLC

BWG Strategy is a research platform that provides market intelligence through Event Services, Business Development initiatives, and Market Research services. BWG hosts over 1,800 interactive executive strategy sessions (conference calls and in-person forums) annually that allow senior industry professionals across all sectors to debate fundamental business topics with peers, build brand awareness, gather market intelligence, network with customers/suppliers/partners, and pursue business development opportunities.

Michael Tantrum

National Sales Director at Teknion

Michael Tantrum is the National Sales Director at Teknion Data Solutions, a team of data professionals that specializes in designing, building, and implementing data and analytic solutions for global organizations. He’s also the National Sales Director at Validatar, an automated data quality and data testing platform under the Teknion brand. Michael has 30 years of data experience and has worked with some of the largest data warehouses across industries. He earned his bachelor’s degree from the University of Auckland and his MBA from The University of Manchester. 

Event Moderator

Greg Irwin

COO at BWG Strategy LLC



Discussion Transcription

Greg Irwin 0:18

Nice to see everybody. I know a couple of you, we're going to meet others, and that's a big part of the purpose here. Again, my name is Greg, Greg Irwin. I'm one of the partners at BWG; we run these executive thought leadership groups. And it's a pretty basic model. The idea is to bring together some like-minded people doing some similar things, talk about the trends in the market, and learn from each other's experiences. So what I'm going to be pressing on today are the stories of what different organizations are going through. And I really believe there aren't a lot of silver bullets out there. These are real complicated, messy issues and messy environments, and the experiences of others go an awful long way. So we have a partner on this: it's Teknion. A huge thanks to Dave Brown, Michael Tantrum, and Dave Haas; we've been doing a whole series with Teknion. Truthfully, these guys are the gurus of data warehousing. And it doesn't matter whether it's Snowflake, or Azure, or Google, or, you know, pick your flavor. These guys have seen an awful lot, and they're a tremendous source of wealth and experience. So a couple of points per the agenda. First, I want to encourage you all to make one new contact through this session, not specifically Teknion and BWG, although that's wonderful. But across the group, reach out to one person who's not in your organization, expand your personal network, and I promise you, you'll be better off for it. Secondarily, let's use the chat. The chat back channel works so well once it starts happening. So just, you know, warm up your fingers, drop your questions and comments. And it's really fine and good for people to respond and have an individual back and forth. I promise it will make this an interesting, multifaceted conversation. So please use it throughout. I'm going to start off with the team over at Teknion.
But then, if you've been on my sessions, you know I'm going to start bringing everybody into the forum and really pressing towards those stories, the good, the bad, and the ugly. Michael and David, you guys are co-hosting with me. Do me a favor, one of you grab the mic and give a quick intro for who Teknion is.

David Brown 3:07

I'm happy to do that. My name is David Brown. Nice to meet y'all. I'm Chief Revenue Officer at Teknion Data Solutions. We are based out of Dallas, Texas; we're a boutique consultancy of about 65 folks. And everything is based on serving our clients: helping them organize their data, understand their data, draw their data through to the business side so that people can make better decisions from it. We do things every day in and around data pipelines, data warehousing, migration to the cloud, data governance, data strategy, data visualization, and tools like WhereScape, Snowflake, Matillion, and Fivetran. If any of these things are ringing a bell, these are the types of tools that our folks use each day to help serve our clients. Because at the end of the day, you want your data to be accessible, you want your data to be trusted, and you want the business to have the data they need to make pragmatic decisions. And that's kind of why we all exist, right? We serve our organizations to deliver data and insights so they can make better decisions. So that's a little bit on Teknion.

Greg Irwin 4:11

Excellent. And David, I'm gonna ask: I understand you're fairly agnostic, but do you have a special skill regardless? Can you say, yes, I'm agnostic, but we just happen to be, you know, double black belts in something?

David Brown 4:27

That's fair. I mean, at the end of the day, you have to be agnostic, because one tool does not fit all organizations. But we certainly have opinions; we don't show up without strong opinions on the technologies that are best in the marketplace. Probably 90% of the data warehouses we built last year were on the Snowflake platform, so we're very, very strong in cloud migration. And then, you know, we're very, very strong on the Tableau front. So those are probably two that I would point out, but at the end of the day, we have a lot of projects in Power BI, and we have a lot of projects on-prem. And so it's just what suits the client best.

Greg Irwin 5:04

David, thank you. I look forward to the conversation.

David Brown 5:07

Absolutely.

Greg Irwin 5:07

Michael, give a little bit of your own intro here, please.

Michael Tantrum 5:12

Yeah, sure. So I'm on David's team here at Teknion, but my background is very, very strong in data. I started out my career as a statistician, doing analytics for companies, and I got really sick of bad data, so I moved upstream to solve the data problems and into data warehousing. So I've been doing this for far too long; I did my first analytic database in 1989, and ever since I've just been living and breathing the world of: how do we acquire data from operational systems? How do we assemble it? How do we prepare it? How do we enable our users to be agile, to make good business decisions with data as they need it? Everything I've been doing in the last 30 years has been trying to move the needle for people running businesses with data.

Greg Irwin 6:10

All right. I'm going to get into some stories here with you guys. But while we're doing it, I'm serious, I want everybody to do me a favor: drop into the chat the one thing you want to hear about. You can be really specific. There we go. Arvind, thank you very much. Now that's exactly what we're looking for. Maybe it's, hey, I really want to hear about ETL into Snowflake; maybe it's, hey, has anybody really played with a full-blown enterprise-class EDW in BigQuery? Let's put them in here, because it'll give us kind of the mile markers for the conversation. But David and Michael, tell us about one customer and one story. Not the shining example, but kind of what you're really seeing of what one organization is doing. What are the real challenges they're seeing in standing up or modernizing their EDW? It doesn't have to be new. Tell us about one customer.

Michael Tantrum 7:15

Yeah, probably the one that springs to mind, only because I was talking to them yesterday about it: Texas Mutual is a workers' comp insurance company down in Austin, Texas. They've been around a long time, and they had done their first version of a data warehouse. It was very siloed, it was old technology, the data coming from a mainframe. And they ended up with business units arguing all the time about who had the right data; it took nine to 12 months to add a new set of data. And they said, this is just killing us: we cannot respond to new market conditions, we can't respond competitively. And the costs were escalating; it's too slow and too expensive. So the first thing they said to us was, what are our options? And so we helped them do an assessment to say, do we go to the cloud or not? And if we do, what flavor? And also, there's the opportunity to modernize the rest of the tool stack, bringing automation into the picture. So modernization is not just saying, do I go from on-prem to cloud? It's also saying, how do I use my people better? How do I change my culture? And so they settled on Snowflake for the cloud data warehouse, they selected WhereScape for their data warehouse automation, and they selected a tool called Validatar for their data quality automation. They started up a whole new data governance practice, which is another thing with data warehouse modernization: governance has suddenly become the big thing everybody's concerned about, and that's a whole thread we could go down today or another time. But yeah, the goal was: I want to be able to add new data sources and new analyses within four weeks for my users. I want my costs to be controllable; when business users consume data, there's a real cost, and I want to be able to measure and allocate that cost.
And also, I want to be able to prove to everybody, when you look at a piece of data: where did it come from? What is it? And can I rely on it? This was a huge success for Texas Mutual, and it basically took them about 18 months.

Greg Irwin 9:29

How did they choose? And you've got the question in there: you're going through an evaluation of cloud data warehouses. How did Texas Mutual compare and contrast?

Michael Tantrum 9:38

Yeah, so the first thing, we sat down with them and we said, what's important to you? All of the different cloud flavors, as well as their other alternative, which was to stay on premise, have different pros and cons. And for them, they said, we want to be able to separate the cost of storage away from the cost of processing, the cost of compute. And so when we evaluated all the different tools, Snowflake rose to the surface. We actually looked at doing a pure data lake, and they said, look, we've got too many people with relational SQL skills, people writing SQL; we want to stick with that. So the other thing they had to consider was the skill set of their existing people, because they also didn't want to choose a technology that was going to be very expensive to hire new people for, or to train new people on. And so that was the decision matrix we came down to. Then we shortlisted a group of technologies, did a competitive bake-off and a little pilot, and from there helped them make a choice.
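The shortlist-and-bake-off process Michael describes is often run as a weighted decision matrix. Here is a minimal sketch of that idea; the criteria, weights, and scores below are illustrative placeholders, not Texas Mutual's actual evaluation.

```python
# Hypothetical weighted decision matrix for a cloud data warehouse bake-off.
# Criteria, weights, and scores are made-up placeholders for illustration.
criteria = {  # weight reflects stated priorities (must sum to 1.0)
    "separate storage/compute cost": 0.35,
    "fit with existing SQL skills": 0.30,
    "hiring/training cost": 0.20,
    "governance features": 0.15,
}

# Scores 1-5 per candidate, in the same order as the criteria above.
candidates = {
    "Option A": [5, 5, 4, 4],
    "Option B": [3, 4, 3, 4],
    "Stay on-prem": [1, 5, 4, 2],
}

def weighted_score(scores, weights):
    """Sum of score x weight across criteria."""
    return sum(s * w for s, w in zip(scores, weights.values()))

ranked = sorted(candidates,
                key=lambda c: weighted_score(candidates[c], criteria),
                reverse=True)
for name in ranked:
    print(f"{name}: {weighted_score(candidates[name], criteria):.2f}")
```

The useful part is less the arithmetic than forcing the business to agree on the weights up front, before any vendor demo.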

Greg Irwin 10:44

Alright, let's jump into it. You know, I'm not sure if you mentioned it during the call or in the pre-call, but you say 70% of the projects you've done over the last year landed on Snowflake.

Michael Tantrum 10:58

Yeah. And it's curious, because there was a comment I saw earlier on, when somebody said, you know, look at the history of these data warehouses: we started with a mainframe, then we go to these on-premise things, then we move to these MPP appliances, and then to the cloud. Actually, if we go back one step, there was also this big idea of Hadoop; everybody was going to go for this on-premise data lake thing. And the danger is that you go chasing the latest shiny object, the latest car going past. So how do you pick a winner? The thing with Snowflake, as it seems to have settled out, is that it's one of those winners that lasts. And I think the reason is that the engineers who built it came from Oracle; they were the guys who built the Oracle Exadata MPP appliance. And they said, we can actually build, from the ground up, a cloud solution specifically for data warehousing, and it just seems to have worked really well. And so it seems to have caught the imagination of people. Not to say that Microsoft's Azure Synapse product is not good, and things like that, but for some reason Snowflake seems to have been proven out by people doing real-world projects.

Greg Irwin 12:24

Let me get a pick at it here, Michael, with you. All right, so you want cloud, and you're committed to cloud, and now your comparison is Synapse, maybe because you're a Microsoft shop, or BigQuery, because you like some of Google's machine learning tools. Realistically, what's the specific technical capability that Snowflake has that these others aren't matching? Or maybe it's structural. What's that difference that you see in Snowflake?

Michael Tantrum 13:06

Yeah, that's a really good question. I think some of the early ideas were quite inspired, and the others are copying them. So, yeah, that idea of separating the cost of storage away from compute, and they just pass the cost of storage straight through from the cloud providers. So whether your cloud provider is AWS, Microsoft, or Google, you know, Snowflake is on all of them. And then the idea of the compute processing being completely elastic. So, you know, all I use every day, usually, is just a small amount of compute. And then at the end of the month, when I run a big finance reconciliation or something and I need this massive processing, I just turn that on for the two hours I need it. It's not like I need to have this big compute engine available all the time. So that elastic breathe-in, breathe-out, I think, was a very inspired idea from early on. I think they also solved some of the security issues that people were really concerned about with cloud early on, because that was a huge headache. People were saying, can I trust it? I've got healthcare data, I've got, you know, banking, financial data; is it safe to put that in the cloud? That's a real concern for us. And so they spent a lot of time and a lot of effort proving that out, getting HIPAA compliance attested, and things like that. So yeah.
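The "breathe in, breathe out" economics Michael describes show up clearly in a back-of-the-envelope cost model. The credit rates and hours below are hypothetical placeholders, not Snowflake's actual pricing, but the shape of the comparison holds for any pay-per-use compute model:

```python
# Toy cost model: elastic sizing vs. an always-on warehouse sized for the peak.
# All rates and hours are assumed placeholders, not real vendor pricing.
CREDIT_PRICE = 3.00           # $ per credit (assumed)
SMALL_CREDITS_PER_HOUR = 2    # everyday small warehouse (assumed)
LARGE_CREDITS_PER_HOUR = 16   # month-end heavy warehouse (assumed)

def elastic_monthly_cost(daily_hours=8, days=30, month_end_hours=2):
    """Small warehouse during the day; big one spun up only for month-end."""
    everyday = SMALL_CREDITS_PER_HOUR * daily_hours * days
    month_end = LARGE_CREDITS_PER_HOUR * month_end_hours
    return (everyday + month_end) * CREDIT_PRICE

def always_on_cost(hours=24 * 30):
    """Sized for the month-end peak, running around the clock."""
    return LARGE_CREDITS_PER_HOUR * hours * CREDIT_PRICE

print(f"elastic:   ${elastic_monthly_cost():,.2f}")   # small + 2h burst
print(f"always-on: ${always_on_cost():,.2f}")         # peak-sized, 24/7
```

With these assumed numbers the elastic pattern costs a small fraction of running the peak-sized engine continuously, which is the argument for turning compute on only when needed.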

Greg Irwin 14:29

Much more data, yeah. Oh, yeah. I'm sorry, y'all, I'm gonna mute your line. There we go. I'm happy for people to jump in now. You can wave, and you can drop a note in, just like you did. We're going to add some others in, so you don't have to just listen to Michael and Greg speak here. Yeah.

Andy 14:49

I'd love to ask a question. Michael, thank you for answering so succinctly. The two things I heard were separate compute and storage, and the skill set of existing users. Fantastic requirements. I'm curious: in our world, we're evaluating different cloud providers, and we're getting tech requirements, which are obviously critical, from the data engineering team, right? How quickly can we port the ETL over, and everything. But also from the business. And I'm curious, were these business requirements? The business doesn't usually know specifics; they wouldn't say something like separate compute and storage, for example, right? I'm curious what that conversation sounds like, and what the other things on the list were.

Michael Tantrum 15:37

Yeah, if you think of your types of skill sets here: on one end, you've got IT, who are concerned about structure and reliability. On the other end, you've got the business users who say, I just want it now; IT says, you've got to get it right. And in the middle, you have these hybrid users, your classic data analysts, who have a foot in each camp. So you're right, the business side doesn't care too much about the infrastructure, how it's done. But what they do care about is: how quickly can you turn it around for me? And my people, my financial analysts, my business analysts, what are they going to have to do to change? Because they've got enough on their plate already without having to do these big projects. So make their life easier. That's a concern for them. The other thing is the wider considerations: what is the organization already doing around cloud? You know, if you've already said, we're investing in the Microsoft Azure world of cloud, and they're going to host all of our business applications and things like that, then that's a strong consideration. Yeah.

Arvind 16:46

Michael, can I add something to it, Greg? This is Arvind. Can you guys hear me? All right. Andy, great question. Here are a couple of examples, completely non-technical, of why people want to move to a cloud data warehouse. BNY Mellon, the largest asset management company in the world, processing $45 trillion worth of transactions, moved all in on Snowflake for one reason. Think about the ecosystem they have around them, right? BNY is an asset management company. They service very large wealth management companies like LPL Financial, and big banks like Citibank and Bank of America. Each one of these guys has a Snowflake instance, so they can natively share data. It is not about the technology. It is about the power of data and analytics that can be shared natively in the cloud, very securely, supporting PCI, PII, and other SOX compliance audits. And they can instantly turn on data sharing between entities and organizations. So this forms a complete network of data sharing, right? Replace BNY Mellon with supply chain and Walmart: they do the exact same thing. Walmart shares data, inventory data, weather data, traffic data, all of this with their partners, their suppliers, and so on and so forth, at, you know, the speed at which the data itself is created, using Snowflake's native data sharing capabilities. Now, on the other hand, the new and improved thing, which I see quite a bit, is data monetization, right? You can make money out of your data. This is what Kaiser Permanente is doing; this is what Mayo Clinic is doing on the healthcare side, again with all the compliance being supported. So people are moving on from just looking at it as a technology play and moving into the world of: how can I make money? How can I be competitive in the market? How can I even launch new products in the market as soon as possible, using data as a competitive advantage?
I see that quite often: shifting from a hardcore technology play to more of a business play.

David Brown 18:53

Really, a million percent, right? It's people beginning to understand that the greatest asset in the business is that data, and how they can leverage it. If, on the technical side, you can begin to understand and articulate what the ROI could be for that data, that's the language that gets modernization funded. That's the language that speaks to the business side on why this needs to be a key strategic initiative for the organization.

Greg Irwin 19:23

But, you know, while we're going here, let's shift it up a little bit. Andy, first of all, it's great to see you; you and I met years ago, and we've met a couple of times in person. What's your goal, from an operational perspective, beyond going to the cloud with your data warehouse?

Andy 19:45

Well, again, it kind of depends on who you ask. Beyond the obvious things, cost, etc., the goal is to expand our data warehouse and make sure it's future-proofed for the next three to five years, say. Very realistically speaking, what we have right now is fantastic, but at the same time, it's not scaling. It's not taking advantage of the entire cloud ecosystem and all the different things that are available, for example, on AWS or Google. So it's about getting into a place that'll future-proof things a little bit and make sure that we can really leverage cloud offerings. And I think the second thing, not so much a goal, but maybe it's a goal: we've just got a lot of legacy stuff in place right now, and we need to make sure we're forward compatible, or backwards compatible, if you will, with everything that we've got.

Greg Irwin 20:53

Is the plan an actual migration? Like something sitting in an old Exadata, now moving over?

Andy 21:03

Yeah. Yeah. So we've got a Vertica implementation right now, which is fantastic. It's suited our needs for the last several years, but we know we need to expand, and so we're taking the opportunity to just kind of see what else is out there.

Ani 21:22

Please, if I might jump in: who's your cloud provider? It's all on-prem right now? It's AWS. AWS, okay.

Andy 21:42

We're not necessarily tied to AWS, though. We have a GCP environment as well. So, yeah, GCP.

Ani 21:54

There's no doubt BigQuery is the best. I just did a comparison between them. Maybe Snowflake will catch up at some point on the HA/DR capability between them: in the Snowflake platform itself it is not that strong; they are using AWS or Azure for their DR capability. And some of the integration points, for example, if you have existing tooling that uses stored procedures and things like that, or SSIS, a lot of the existing tooling, and also the API features. All of that, in comparison, from a developer's point of view and an extensibility point of view, works really well. But from a pure serverless point of view, I would say BigQuery, just built into the GCP environment.

Greg Irwin 22:53

Ani, first of all, thanks for chiming in with that. But can you just define "better"?

Ani 23:02

I learned a lot, I learned a lot doing it. But I don't have a paper. Do you want me to share my screen?

Greg Irwin 23:12

No, that's step two for this conversation, right? Just give me a highlight.

Ani 23:21

Yeah, I evaluated BigQuery versus Snowflake on twenty parameters, including pricing, HA/DR, APIs, and all that stuff, totally different parameters. But this is specific to GCP, though, because if your data is in AWS, for the data coming out of AWS into BigQuery you have egress costs. But if you're purely in GCP, or if you don't mind the egress cost of nine cents a gigabyte coming out of AWS, then it will be fine.
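As a rough sketch of the arithmetic behind that egress concern, using the $0.09/GB figure mentioned above (actual egress pricing varies by provider, region, and volume tier, and the daily volume here is a hypothetical example):

```python
# Back-of-the-envelope cross-cloud egress cost, using the $0.09/GB figure
# from the discussion. Real pricing varies by region and volume tier.
EGRESS_PER_GB = 0.09  # $ per GB out of AWS (figure quoted above)

def monthly_egress_cost(gb_per_day: float, days: int = 30) -> float:
    """Monthly cost of shipping gb_per_day out of the source cloud."""
    return gb_per_day * days * EGRESS_PER_GB

# e.g. a hypothetical 500 GB/day flowing from AWS into BigQuery:
print(f"${monthly_egress_cost(500):,.2f} per month")
```

The point is that egress is a recurring tax on every byte moved, which is why "where does the data already live?" ends up dominating many of these platform decisions.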

Harish 24:22

Just to add to that, you know, sorry, I joined in late. We had a successful implementation at Sunrun; there are a couple of white papers written by Google on the success story. So I think if you're looking at a long-term perspective, right, certainly there are pros and cons, there are eight features here versus nine features there, and you can certainly go down that evaluation path. But also, from a long-term perspective, I feel that if your company is in the midst of doing some additional AI/ML initiatives, or taking advantage of the data sets you have already established for corporate reporting or traditional reporting, right, these days you have to stretch your envelope a little further, and then you take advantage of GCP, you know, without any portability issues, right? You could take advantage of TensorFlow libraries, the whole nine yards.

Greg Irwin 25:16

Stay on that for one moment; that was really interesting. I was thinking about that because, you know, we've all heard, and Michael and David, we've talked about this, the idea of bringing actual datasets to bear as part of Snowflake's strategy. And I'm not here to promote Snowflake, but I love what they've been doing. But has GCP done the same thing, saying, wait a minute, we have some proprietary information of our own. We know click data better than anybody. You know, look at Waze, look at geolocation data. So Harish, I'm really interested to hear what proprietary data you found available through Google that made it more compelling than some of the others.

Harish 26:03

Yeah, I mean, geospatial was one thing; I think it's fairly rich in that aspect. But also, if you're getting into some level of propensity models, or certain market demographics, and things of that nature, right, there are quite a lot of data sets freely available on BigQuery, like, you know, readily available for you to take advantage of. So that is one piece; we found it really rich to embed that as part of our lead generation, but also to do certain analysis before we, you know, run any campaigns in various markets, right? So our funnel got much better conversion rates and things of that nature. And so there is a humongous amount of, you know, public data sets which you can take advantage of without moving out, right, without getting out of the system. The other part is when you do the corporate reporting, right: for your analysis, you've already done, you know, 90% of your transformation and enriched the data into the right dimensions, the right facts, and so on and so forth, so corporate gets to see the numbers. Now you're taking exactly those data sets and taking them further, right? So anybody who's looking at a particular department, and then, say, cycle times or certain customer-related analysis; the corporate report is only interested in some high-level, you know, corporate metrics. Here you're taking exactly the same data and then enriching it downstream. And so that is pretty handy, you know.

Ani 27:44

While we're on the topic, Harish, this has really been a thorn in my side for the last few months. I'm trying to figure out how to serve a particular web application that needs to hit the summarized data from my BigQuery at about 1,000 requests per second, and it needs 100-millisecond latency. Even BigQuery BI Engine, I have not tried that, but I don't think it's going to scale for this. I'm trying to figure out a caching layer in between, either Redis or Memcached, or something like that. I'm still trying to figure that part out, how to serve my demanding APIs in 100 milliseconds.
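One common shape for the caching layer being weighed here (Redis or Memcached sitting in front of the warehouse) can be sketched as a simple read-through TTL cache. The query function, key, and TTL below are hypothetical stand-ins, and an in-process dict stands in for the external cache:

```python
import time

# Minimal read-through TTL cache in front of a (hypothetical) slow query.
# In production, Redis or Memcached would play the role of self._store.
class TTLCache:
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = time.monotonic()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]                     # fast path: serve from cache
        value = compute(key)                  # slow path: hit the warehouse
        self._store[key] = (now + self.ttl, value)
        return value

def slow_summary_query(key):
    """Stand-in for a BigQuery call returning pre-summarized data."""
    return {"key": key, "total": 42}

cache = TTLCache(ttl_seconds=60)
first = cache.get_or_compute("daily_totals", slow_summary_query)   # computes
second = cache.get_or_compute("daily_totals", slow_summary_query)  # cached
print(first == second)
```

The trade-off to tune is the TTL: longer windows absorb more of the 1,000 req/s load but serve staler summaries, so it works best on data that is already aggregated on a schedule.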

Harish 28:34

Okay. You don't want to go down the BI Engine path, is what you're saying?

Ani 28:39

Yeah, BI Engine, I'm nervous, not confident. I mean, it can still serve results much faster, but it's still limited to the 100 concurrent requests that BigQuery is limited to, right?

Harish 28:56

Not really. I mean, you have other options to enable in case you do need that; you can certainly run beyond 100 concurrent slots. It depends on the kind of slots you

Ani 29:09

not, not slots, 100 concurrent requests

Harish 29:12

Correct. So BigQuery has slots. With slot allocation, if you want to scale up, there are options you can enable where you can go beyond 100 queries.

Greg Irwin 29:23

Let's do this, you guys. I appreciate it, Ani and Harish; I don't want to take it elsewhere, but I know where the conversation in different environments with different requirements may go. In our limited time, I think we want to keep moving. I'll be happy to connect you and try to help you problem-solve offline.

Arvind 29:45

Notice how the conversation changed from a technical conversation to what these platform providers offer, right? So GCP does a lot of these machine learning algorithms that are going to be helpful for you to accelerate your journey. Azure is more on the productivity side, where they can integrate easily with your CRM, with Teams; all of this collective, collaborative work can be done natively in Azure. And from what I understand, they do more easy drag-and-drops and user interfaces; they focus more on that, and all the behind-the-scenes work is left to the tech guys to go figure out. On the other hand, AWS was born with open source; they always embrace it. You can customize whatever you like in AWS to the nth level of detail, including the pricing. So the three clouds each offer something different, although they all offer almost everything the same. From a technical perspective, what I would point out is that there are also political factors typically involved in an organization, right? You already have some executive at the highest level who knows somebody at the highest level at these cloud vendors, and they are best buddies, they want to drink together; there are all of those factors that you have to weigh as well. I think it gets very, very minute when you go into just the technical details. Harish nailed it at the beginning of his comments when he said you've got to look beyond just the tech side, because all of these vendors can do it one way or the other.

Harish 31:09

I second that, because most of them provide eight or nine of the features — one may have ten, but most of them have nailed it. It depends on your cost, your ability to scale, and the technical team who's going to support it. You want to minimize your tech stack complications, right? You don't want to over-engineer it. You want to keep it simple, so that you focus on the product analytics, not on the backend side of it. That's the best value you can provide to the business.

Greg Irwin 31:38

Let's keep going — Andy, I see you're with us.

Andy 31:41

I was curious — I had a real quick question, if there's time. If the cloud provider itself was not an issue at all — for example, if we had our hands in both Google and AWS and could build out either — and you're evaluating, say, Redshift and BigQuery, what does it come down to? I agree with what you said about the features and the technical stuff: they can all be worked out, and the vendors are always leapfrogging each other. But what drives a decision beyond that point?

Michael Tantrum 32:20

The thing you want to think about is that it's not just a technology decision. There's a lot of people and culture that goes with this. When people look at the opportunity — "if I'm going to take the time to modernize my data warehouse" — technology certainly is one component of it. But there's also the way we organize our people, the way we do projects. Maybe it's an opportunity to consider agile, if you're not an organization that's ever done agile very well. And what does HR look like in a data world, as opposed to the traditional software development world? The other thing is cost, because traditional data warehousing has been done on a CapEx basis: I spend a big amount of money on stuff up front, maybe have a big labor cost to build the thing, and then it drops down to peaks of maintenance. But on a modernization, maybe I want to fund everything differently. And depending on the type of organization you are — if you're a government or a nonprofit — getting CapEx money is often a lot easier than OpEx money. So those are things to think about: how do I fund my projects? It's more than just the technology. Like some of the conversations here — I love getting deep into the technology, but I'm also very conscious that a lot of people here are saying, "I'm starting to think about how do I do something different? How do I do modernization?" — and a lot of the things to consider are bigger than just technology questions.

Greg Irwin 33:56

Let's focus on two areas here. We can cover a lot, and we can go deep, but I'm going to try to cover two areas. One is the actual migration: how effectively we can stand up an environment that works — with the governance, with the ETL, with the automation that you need. So one is migration. And two is Michael's point — I picked it up and I'll pretend it's mine — Hotel California, and egress. Because I don't think anyone here will debate for a second that data will grow. I think the question is, once the data has grown and it's in that data warehouse, what's our ability to extend it to other applications, other platforms, other groups, other teams, other organizations — and do it effectively, in a way that it can actually be easily consumed outside? So I'm going to ask for hands. Actually, you know what — I'm going to be more deliberate. CC, I'm going to invite you. I got from your comment that you're in an Azure environment, a Microsoft environment.

CC 35:18

Yes. So yeah, I just put a note up about it. We're in the process of migrating a data set — marketing subscribed to a vendor to receive visit data, and it's billions of rows. We actually put that in a managed Azure database, but of course it ran out of capacity real quick, and we couldn't even query the information. So we decided — we're a Microsoft shop, and we do have the talent internally — to go ahead and use Azure Synapse as our new repository, basically. And as we're pumping data into it, we're getting a lot of problems. Just like what you said: as we grow this database, I'm kind of worried — will we be able to scale it up? I know it's cloud, we can just add more compute to it, but it makes me wonder: is that the best use of this data? Or should we leave it as Parquet and let them analyze it through some other means, querying it directly, rather than having to create a pipeline in ADF to move that data? We went through a POC and we think it's going to work, but now that we're pumping a lot of data in there, we're hitting some problems — connection errors — and I wasn't even sure why. So just migrating one data set is already kind of a problem for us. We've been running this load for about a week now. I'm not sure if that's the best way to do it, but I want to hear from others if they've done anything like this — migrating large data like that.

Greg Irwin 37:20

Michael, come talk to us about the Microsoft environment, Synapse, and any integration issues that you've seen.

Michael Tantrum 37:29

Yeah. Some of it might be configuration. Certainly Synapse is a significant player, and I have heard from some people who've done a bake-off — pitting Synapse against Snowflake, against things like Redshift, or even the Teradata cloud version — that Synapse has sometimes had issues with updates and things like that. But the bigger thing to consider with a migration project, or before you start one, is what has to get migrated. There are three big lifts. Number one is data structures: if I'm on premise — say on Teradata, or in your case maybe SQL Server — I've got to get the structures out of on-prem and into the cloud. Number two is the data itself: do I just do a full history load? How do I move large volumes of data up? And number three — and this is the big heavy one — is data pipelines: how do I change the way my on-premise, traditional data warehouse is populated? How do I re-plumb that to point at the cloud? That is the most overlooked, but it's also going to be your biggest cost. So think about those three components, because it's tempting to say, "Oh, we'll just change database platforms." It's more than that — you've actually got to migrate an awful lot of other stuff with it.
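The three lifts Michael describes can each be sketched in a few lines. Here is a minimal, hypothetical Python illustration of lift number one — exporting table structures from the source system — using SQLite as a stand-in for the on-prem database (a real migration would read SQL Server's or Teradata's catalog and translate data types for the target cloud platform):

```python
# Hypothetical sketch of migration lift #1: exporting data structures.
# SQLite stands in for the on-prem source; a real migration would read
# the system catalog of SQL Server, Teradata, etc., and translate types.
import sqlite3

def export_ddl(conn: sqlite3.Connection) -> list:
    """Return the CREATE TABLE statement for every user table."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    ).fetchall()
    return [sql for (sql,) in rows]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, ordered_at TEXT)"
)
ddl = export_ddl(conn)
# Each statement would be replayed (after type translation) in the cloud target.
print(ddl[0])
```

Lifts two and three — bulk data movement and re-plumbing the pipelines — are where the volume and the cost live, as Michael notes, but they start from this same inventory of structures.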


Greg Irwin 38:59

In this context of data pipelines and data movement — we've talked about that a couple of ways, one better than another, and I appreciate Harish's comments there on BigQuery specifically — but in terms of ETL and data pipelines, is there a best-in-class tool, or an experience you can share with the group?

Michael Tantrum 39:22

I would give you some principles to think about. Traditionally, people would have grabbed themselves an ETL tool and maybe a data modeling tool. I would suggest you think about the world in terms of data acquisition — how do I acquire data into my platform — and separate that from how you transform and how you model. The reason is that the cloud platforms cry out for an ELT approach rather than ETL. You want the database platform to be doing the transform work, not your traditional tools like SSIS or Informatica.

Greg Irwin 40:00

I'm just not following the why on that. Why do you want to do your transforms in the cloud?

Michael Tantrum 40:05

Because you've got this compute engine there. Otherwise the compute is done on a separate structure, on a separate machine — so you might as well use this beautiful massively parallel database that you've acquired. So think about those three things separately. On data acquisition: the reason I say that is because the types of data we deal with now have changed from five or ten years ago. Traditionally it was "I pull out of my ERP system." Now I've got to care about things like SaaS applications, IoT data, streaming data, clickstream data. So I've got different types of data, at different velocities, and I've got to think about how to get them to my database platform. I've also got to think about the egress side, when my users want to consume it. There are more and more use cases for real-time analytics. Traditionally, users have said, "Yeah, I need to know as it happens," and when you really drill into it, they don't — but it is becoming more common. So think about ingest: how do I acquire data. And then the other thing to think about is automation. There are a lot of really good automation tools out there for automating the build of your classic Kimball data marts and star schemas — really good automatic code generation. You've got open-source things like dbt, you've got players like WhereScape, and Attunity had a really nice tool there. There's a new one out called Coalesce, which is a Snowflake-only automation, modeling, and code-generation tool.
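Michael's ELT principle — land the raw data first, then let the warehouse engine run the transform — can be shown in a toy form. The sketch below uses SQLite as a stand-in for the cloud warehouse, and the table and column names are invented for illustration:

```python
# Toy ELT sketch: the transform runs as SQL inside the database engine,
# not in an external ETL tool. SQLite stands in for the cloud warehouse;
# table/column names are invented.
import sqlite3

conn = sqlite3.connect(":memory:")

# E + L: land the raw, untransformed rows in a staging table first.
conn.execute("CREATE TABLE stg_sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO stg_sales VALUES (?, ?)",
    [("east", 100.0), ("east", 50.0), ("west", 75.0)],
)

# T: the transform is pushed down to the engine's (massively parallel,
# in a real warehouse) SQL execution, instead of an SSIS-style tool.
conn.execute(
    """CREATE TABLE sales_by_region AS
       SELECT region, SUM(amount) AS total
       FROM stg_sales
       GROUP BY region"""
)
totals = dict(conn.execute("SELECT region, total FROM sales_by_region"))
print(totals)
```

Tools like dbt generalize exactly this pattern: the transformation logic is expressed as SQL and executed by the warehouse itself.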

Harish 41:48

Sorry — what was that? Coalesce?

Michael Tantrum 41:51

Coalesce, yeah — C-O-A-L-E-S-C-E. They've just come out of stealth mode. They've got a big chunk of venture capital funding, and they're based out of the Bay Area, San Francisco. coalescesoftware.io is the website.

Harish 42:11

Can you put that in the chat? Yeah, yeah.

Michael Tantrum 42:13

Okay, I'll put that in there. I've just had some really good meetings with their people, and it looks very exciting. But yeah, certainly consider automation, because the traditional approach, where we throw armies of people at these problems, is a bad way to use people. Automation is the modern, smart way to do it.

Harish 42:34

I think what we underestimate is the transformation logic, right? People get bent out of shape on the reporting side, on the ETL or ELT side. But the core of BI revolves around transformation — if you can nail that, you've won. On the product side, or the new-generation side — marketing ingestion, IoT — of course the data volumes are all there. But to me, the complexity of joining these data sets still revolves around the ERP side, where you're joining 50-odd tables just to generate an aging report, or an income statement, or a cash balance, and so on. The complexity of connecting those dots is still left to SME knowledge — the folks who have the knowledge on the process side.

Michael Tantrum 43:25

And Harish, to pull on that thread — that's a really good point. Because the thing that's actually the hassle with analytics is that users don't actually know what they need until they see it. So they can't give you good requirements. It's not just doing the transformation; it's the change management, knowing that they're going to ask for something different. So you've got to be a little bit —

Harish 43:45

— more flexible, yes, but also connect these dots ahead of time and make sure they have the ability to drag and drop and build on them. So the physical model, or the logical model, that you build should be really rich. Now, that complexity is much more toned down when it comes to sales, marketing, and lead-generation analytics — there are maybe 15 or 50 columns, and with less of a learning curve you can still arrive at a conclusion for the business. But if you don't have an SME in the SAP world or the Oracle world, you can't bring in a newer person, have them write a Python script, and expect him or her to figure it out. So the complexity has come down, and the technology gives you additional scalability. You don't have to over-engineer, you don't have to over-architect, because the database in the cloud can scale up. You don't have to do aggregation, partitioning, the whole nine yards — that's gone. So your dev cycle has shrunk significantly: you stage the data, you transform, and you're done — a two-step process. Whereas in the past, you had to go in and touch the fact table, touch the dimension table; if a new request came in, again you were bending over backwards just to fix these problems. So enhancements, bug fixes, and things like that would take much longer, and only somebody who had done it knew where the issues were. Now it's much simpler — the technology's scale helps you with those things. Just to add on to what you were saying earlier, yeah.

Michael Tantrum 45:24

And I think the bottleneck should be that tribal knowledge. Because if the bottleneck is a technological problem, you've got the wrong technology. The bottleneck should be on the human side: having people who know what the business is trying to solve, what source data is available, and what business rules and KPIs need to be developed. That's where the bulk of the effort should be. If your effort is on the technology side, then you're not being as efficient as you could be.

Greg Irwin 45:58

We had a couple of questions queued up as well. Andy, is it okay — I'd love to get Scott in and hear his story. Scott, please.

Scott 46:13

Sure. Can you hear me okay? I'm getting some delays. Very quickly, my experience — which might be related to what was brought up earlier: we use Synapse Analytics in two ways. There's the dedicated SQL pool versus the serverless SQL pool; the dedicated pool is a massively parallel processing engine, the old Azure SQL Data Warehouse — which sounds like what you were loading billions of records into when you had problems, right? I'm not sure if you've done it or not, but your data should be partitioned. You have to have a partition index assigned to it so you can load in partitions. Without partitioning, your MPP will be useless, because it cannot break the job up and put it back together — it will just act like a flat serverless database. So I would look into that. Now, you also mentioned Parquet files. Yes — on the ELT side, as Michael mentioned, we drop a lot of Parquet files into our Gen2 data lake, and I use the Synapse serverless SQL pool's schema-on-read on top of the Parquet files and hand that to the data scientists to do analysis — early on, I should say. Then they can start looking at certain models before we decide to bring anything into the SQL pool at all. So there are two flavors: the Synapse serverless SQL pool can do schema-on-read and analytics, and it also takes advantage of the data lake, which behind the scenes uses sort of a Hadoop-type data structure that lets you do MapReduce without you knowing it — it's sort of like that, not exactly that. And you can also throw your massively parallel processor at it. So, two flavors — but I would recommend that you look at the partitioning.
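Scott's partitioning point can be illustrated outside of Synapse. In this toy Python sketch (all names and data invented), rows are bucketed by a date-derived partition key; each bucket is then an independent unit of work that a parallel engine can load or scan on its own, which is exactly what an unpartitioned pile of billions of rows prevents:

```python
# Toy illustration of partition-aligned loading (names and data invented):
# bucket incoming rows by a partition key so an MPP engine can load and
# scan each bucket independently instead of making one serial pass.
from collections import defaultdict
from datetime import date

rows = [
    (date(2021, 10, 3), 120.0),
    (date(2021, 11, 9), 80.0),
    (date(2021, 11, 21), 40.0),
    (date(2021, 12, 1), 65.0),
]

partitions = defaultdict(list)
for row_date, amount in rows:
    # year-month is a common warehouse partition key for time-series facts
    partitions[row_date.strftime("%Y-%m")].append((row_date, amount))

for key in sorted(partitions):
    # each partition is a unit of parallel work (and of cheap swap/delete)
    print(key, len(partitions[key]), "row(s)")
```

In a real dedicated SQL pool the same idea is expressed in the table's DDL, so the engine — rather than application code — maintains the buckets.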

CC 48:28

Yeah, thanks for that — partition index. And what I was interested in is how you had data scientists going through the Parquet files first, before even creating an ELT — because sometimes you don't need to load everything; sometimes you just need aggregated information in those dedicated pools.

Scott 48:50

Right. I mean, there are many ways; I support a standardized way. People can get an RDS server in Azure, put Python on it, get access to a Parquet file, and then run and train models, do feature engineering, and things like that. But the way we do it, I think, is the right way: we build notebooks, we mount clusters, and we ask them to use the standard SDK for Python. That's the right way, because then it's shareable, and it also integrates with MLOps — which you probably already have, and if you don't, you should bring it in. For data scientists, you've got to get into that practice of machine learning operations.

CC 49:48

Yeah. And I think that's one of the gaps we have — I don't know if anyone else has it. We don't have our own data scientists. We do have analysts, but even then, I think we've got to skill up our analysts.

Scott 50:00

Sure. The only reason I recommend MLOps is that data scientists often get themselves into trouble forgetting what model they used, what data set they used, when they used it, and what the feature results were. MLOps records the history of all of those, and you can go back and retrain things as you need to. So it's a really powerful tool for getting things under control.
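Scott's argument — record which model, which exact data set, and which results went together — reduces to a tiny experiment log. The sketch below is an invented, minimal illustration of the idea (not the API of any real MLOps product): fingerprinting the data ties a run to the exact bytes it trained on.

```python
# Minimal, invented sketch of the MLOps record-keeping Scott describes:
# every run logs the model name, a fingerprint of the exact data used,
# and the resulting metrics, so "what did v1 train on?" has an answer.
import hashlib
import json

def log_run(registry, model_name, dataset_bytes, metrics):
    """Append one experiment record; the hash pins the exact data set."""
    entry = {
        "model": model_name,
        "dataset_sha256": hashlib.sha256(dataset_bytes).hexdigest(),
        "metrics": metrics,
    }
    registry.append(entry)
    return entry

registry = []
data = b"feature_a,feature_b,label\n1,2,0\n3,4,1\n"  # invented training data
log_run(registry, "churn-model-v1", data, {"auc": 0.81})

print(json.dumps(registry[0], indent=2))
```

Real platforms (MLflow, Azure ML, and the like) add artifact storage, lineage, and retraining on top of exactly this kind of record.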

Greg Irwin 50:29

And by the way, I'll be happy to connect you guys offline to go deeper. Thank you, Scott. Scott, can you touch on the point Andy raised — the second item, egress? Have you found egress to be a real issue, at least through some of the systems you've worked on, both near term — "I'm having an issue now" — versus long term? Because I think the question is, if egress — data mobility — is one of your key criteria, is there a preferred long-term strategic vendor who you think would be most friendly to data movement? Scott, what do you think?

Scott 51:25

Yeah — in my experience, the selection of a vendor to help you with a data migration, say to the cloud, has to be all-encompassing. Why? Because I've had to reject three proposals from very well-known partners: they give you an estimate of the work without taking into consideration all the security and cyber issues they'll have to bring under control before you can migrate. And then when you look at the statement of work, there's a whole exclusion page — meaning, "Hey, you do that part, and I'll do the lift and shift." So I would look for one that gives you a true, comprehensive solution.

Greg Irwin 52:31

Scott, I'm sorry — I was actually asking about the alternative: after you've moved, now that you have your data in Redshift or Snowflake, what platform is preferred if my requirement is egress — low-cost, flexible egress?

Scott 52:51

We use blob storage. ADLS Gen2 by nature is blob storage that now has folder hierarchy and all that in it — but pure blob storage with a private endpoint, which is a key thing on the security side. Sorry, I misunderstood the question. Azure allows you three different tiers, and it pushes data down to lower cost — even archive, which is something like one cent for five gigs, ridiculously cheap — so you can archive, and you can also run a hot-and-cool type environment. So I would use blob storage. Azure blobs, yeah.
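The tiering Scott describes comes down to simple per-GB arithmetic. The sketch below uses purely illustrative prices — real Azure rates vary by region, redundancy, and time, and archive reads add retrieval fees, so treat these numbers as placeholders, not quotes:

```python
# Illustrative monthly storage cost per tier. Prices are INVENTED
# placeholders, not real Azure rates; archive also carries retrieval
# fees that this toy calculation ignores.
PRICE_PER_GB_MONTH = {"hot": 0.020, "cool": 0.010, "archive": 0.002}

def monthly_cost(gb, tier):
    """At-rest storage cost for one month, ignoring access charges."""
    return gb * PRICE_PER_GB_MONTH[tier]

gb = 10 * 1024  # 10 TB of cold history
for tier in ("hot", "cool", "archive"):
    print(f"{tier:>7}: ${monthly_cost(gb, tier):,.2f}/month")
```

The point of the sketch is the shape, not the figures: data that is rarely touched belongs in the cheaper tiers, and lifecycle policies can demote it there automatically.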

Greg Irwin 53:42

And I'd love to hear from others as well. Scott, thank you. Michael, your opinion? Others, please jump in — egress, and how people are tackling it.

Michael Tantrum 53:53

Yeah. I understand this question was around cost. One of the things that scares people about cloud is: if I let my users run whatever queries they want, they're just going to run up compute costs like crazy — so how do I wrangle that? Some of the BI tools handle that by caching, so you can serve summarized datasets for the more common dashboarding and things like that. I find the worst offenders are the senior data scientists, because they think the rules don't apply to them. Sometimes those are the people you've actually got to hand a report to, saying: this is what you've consumed in the last month — did you plan around it, or was this necessary?
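The caching pattern Michael mentions — answer repeated dashboard queries from a cache instead of re-running (and re-billing) the warehouse — can be shown with a toy sketch. Here the "warehouse" is just an instrumented function with invented data, and the cache is Python's standard memoizer:

```python
# Toy sketch of BI-tool caching: ten users opening the same dashboard
# tile trigger one warehouse scan, not ten. The "warehouse" is a
# counter-instrumented function over invented data.
import functools

warehouse_scans = 0

@functools.lru_cache(maxsize=128)
def dashboard_total(region):
    global warehouse_scans
    warehouse_scans += 1  # stands in for billable warehouse compute
    detail = {"east": [100.0, 50.0], "west": [75.0]}  # invented data
    return sum(detail[region])

# Ten users open the same dashboard tile...
results = [dashboard_total("east") for _ in range(10)]
print(results[0], "computed with", warehouse_scans, "warehouse scan(s)")
```

BI platforms do the same thing with materialized summary tables or extract caches; the governance question is deciding who gets to bypass them.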

Greg Irwin 54:39

Good. Sounds like governance — sounds like your governance program.

Michael Tantrum 54:41

Yeah. And that's one thing we haven't touched on much yet, which I'm curious about given the group and given the topic — nobody's really raised it. How do I herd the cats? So, yeah —

Greg Irwin 54:59

I'd be curious if anyone has an answer — how do you herd the cats? Let's bring in one more; we've just got a couple of minutes. All right, I'm going to try one more. Adam — Adam Cain — you've been listening, I believe. Are you with us? Maybe listening quietly. How about Gary? Gary, nice to see you — are you on the line with us?

Gary 55:29

Hello, can you hear me?

Greg Irwin 55:30

We can indeed. Yes, sir. All right — what's your two cents? Actually, what's your one initiative — the one thing you're working on, Gary?

Gary 55:40

Um, well, a number of things. We're definitely looking at moving more toward Power BI — Power BI in the cloud. I think that's one of our main initiatives for the coming year.

Greg Irwin 55:55

Got it. And where's that data coming from?

Gary 55:59

Um, most of it from our internal systems. We have a variety of internal systems — an on-premise data warehouse, and a number of proprietary, one-off systems as well. So we're moving from the older world of all these legacy systems — even data coming from Excel files in SharePoint — to more of a cloud environment. And the cost of cloud, security around the cloud, protecting your data, privacy — these are all concerns and issues we're starting to work with a lot.

Greg Irwin 56:46

It's interesting — we can talk about it — but the choice of Power BI as the viewer: I'm wondering if that provides a major nod to Azure as your data stack, given you already have to buy in on their governance, you're buying in on their security and their skills. Are you just better off staying in the Microsoft stack? Curious.

Scott 57:13

That's my experience as well. Yeah.

Greg Irwin 57:18

Let's do this. Harish — thank you, I love that idea. We need to create an offline group so the dialogue can continue. I'm going to follow up on that; I think it's a brilliant idea. Because you're right — look at us, we're struggling to cover even the basic questions in this one hour. We're truly struggling.

Harish 57:38

On LinkedIn, typically, you can make a group, and within the same community you can post or chat — which is a much more professional setup, where you know the folks' profiles and things of that nature. That would be helpful. Instead of pinging each one of us individually about an interesting topic, it'd be good to have a common forum.

Greg Irwin 58:00

Well, this is good, because we get to see and hear each other and know there's a human being there — it builds a little bit of trust. Then maybe we can flip it over to LinkedIn: now that we know each other, we can keep the dialogue going. Thank you. Let's do this — I want to thank David and Michael in particular. These guys are co-hosting; Teknion has done dozens and dozens of these types of implementations, and they're a trusted source. We look to them to co-host here, and I really appreciate it. So Michael, thank you very much; David, thank you very much. I encourage people to connect here and take them up on bouncing around some of the ideas or issues you're all thinking about. Let's close with closing thoughts from David and Michael, and then we'll wrap up.

David Brown 58:57

Yeah, you know, the only closing thought I'd have — we work with so many different organizations, specifically around some of these technologies. The worst thing in the world is to make an investment without a plan; then you have a negative history of spending dollars on something that doesn't get used or doesn't create a solution. So a lot of times we'll start engagements with a strategic data assessment, just to help people understand what the next three years need to look like and what problems you're going to solve, so you can give the organization a better sense that the investments you make are actually going to have a return. "Measure twice, cut once" is a Texas phrase, and it's meaningful in the data world as well.

Michael Tantrum 59:42

I echo that. If you're having trouble making decisions, knowing where to start — you've got your toe in the water but you're just not quite sure — we're really happy to help you find your way through the weeds and see the wood for the trees, kind of thing.

Greg Irwin 59:55

Right. Gentlemen, thank you. Great to speak with everyone — I really appreciate all the participation. And Harish, I'm going to take you up on setting up a little LinkedIn group to keep the dialogue going. Again, thanks, everybody — have a great day.

Harish 1:00:10

I've also transitioned into consulting services from my previous job, so I'll be happy to stay engaged in this forum.

Greg Irwin 1:00:19

That's great. Thanks — thank you.


What is BWG Connect?

BWG Connect provides executive strategy & networking sessions that help brands from any industry with their overall business planning and execution. BWG has built an exclusive network of 125,000+ senior professionals and hosts over 2,000 virtual and in-person networking events on an annual basis.