The Rise of the Data Cloud
Jul 21, 2021 1:30 PM - 2:30 PM EST
Do you have so much data that you don’t even know how to begin organizing it? Are you having trouble sharing your data and knowing where it needs to go?
The companies Snowflake and Eide Bailly have teamed up to solve these problems. The team at Eide Bailly is made up of pros at untangling messy webs of data, and the team at Snowflake consists of experts in sharing massive amounts of data anywhere it needs to go. Together, they can help you build trust with your clients by providing them with easily accessible, quality data.
In this virtual event, Greg Irwin is joined by Kent Graziano from Snowflake and Nate Allphin from Eide Bailly to discuss the future of data sharing. They talk about the difference between a “cloud” and a “data cloud,” what quality data looks like, and how they can help with schema detection.
Snowflake provides a cloud-based data platform in the United States and internationally. The company's platform offers Data Cloud, which enables customers to consolidate data into a single source of truth to drive meaningful business insights, build data-driven applications, and share data.
Co-Founder, Co-CEO at BWG Strategy LLC
BWG Strategy is a research platform that provides market intelligence through Event Services, Business Development initiatives, and Market Research services. BWG hosts over 1,800 interactive executive strategy sessions (conference calls and in-person forums) annually that allow senior industry professionals across all sectors to debate fundamental business topics with peers, build brand awareness, gather market intelligence, network with customers/suppliers/partners, and pursue business development opportunities.
Chief Technical Evangelist at Snowflake
Kent Graziano is the Chief Technical Evangelist at Snowflake. Snowflake is a data platform that gives organizations seamless access to explore, share, and unlock data across the cloud. Kent has been with Snowflake since the first GA release of the company in 2015. He has been in the tech industry for over 40 years and primarily focuses on data warehousing and analytics.
Principal at Eide Bailly
Nate Allphin is the Principal at Eide Bailly. Eide Bailly is a top 25 CPA and consulting firm helping the middle market grow and thrive. It helps companies navigate compliance requirements, optimize operations, and invest in digital transformations. Nate has 20 years of experience in the data and analytics world. He co-founded Xerva, which helped companies that felt buried in data chaos. In 2019, Xerva was acquired by Eide Bailly to enhance its services.
Greg Irwin 0:18
I'm thrilled to have some co-hosts here with Kent Graziano and Nate Allphin. Kent's joining us from Snowflake, Nate's joining us from Eide Bailly, and we're talking about the growth of cloud databases. Many of you know how I like to run things, which is as interactive as we can, with a sense of what people are doing in their own environments. What are the real use cases you're trying to deploy? Let's clear the hurdles, let's maybe share some information on the successes, and basically try to get toward the things that you think somebody in your seat at another shop really wants to know. So, as always, a little less sales and a little more operational: operational successes and challenges. I do love the chat, so please use it throughout. I'm also going to ask everybody to make a personal goal. The group we pulled together here is awesome. It's really a group of people who deal in the details and have responsibilities to deliver platforms that work at scale for their organizations. My point is, look across the grid and try to make one new contact out of this group. It doesn't have to be Eide Bailly or BWG or Snowflake, but look for one person across here to make a new contact and expand your network. So with that, let's get it going. Nate, you helped us pull this together, so let's have you come in with an intro, please. Tell us about your team over at Eide Bailly.
Nate Allphin 2:07
Great. Thanks, Greg. Appreciate you being involved in pulling all this together, and grateful to Snowflake for their partnership with us on this. My name is Nate Allphin. I'm a principal with Eide Bailly. We are a top 25 CPA firm based out of the upper Midwest, with about 50 offices west of the Mississippi. We have a large technology consulting presence, and our data and analytics and data warehousing practice falls under that tech consulting umbrella. Within that umbrella, we have approximately 250 technology individuals, and we deliver solutions all the way from ERP with NetSuite and CRM with Salesforce.com. And then certainly, in our practice, we deliver data warehousing solutions utilizing Snowflake, Azure, AWS, and Tableau, and we also do some work with Alteryx as well. So we're a large technology consulting organization within a very large CPA firm, and we're excited for the partnership that we have with Snowflake. We are a partner of Snowflake, and we deliver those solutions on a good number of our engagements. So, excited to talk more with the group today and learn more.
Greg Irwin 3:41
Let's do it. Let's do it. Hey, thank you so much. Kent, let's get you in here. Please give a little bit of your background, and really the quick 30 seconds on what Snowflake is. I know this group knows to a certain extent, but at least check the box.
Kent Graziano 4:01
Sure. Yeah. So hi, everybody, Kent Graziano. I'm the Chief Technical Evangelist at Snowflake. I've been with the company for a little over five and a half years now; I joined in the fall of 2015 after the first GA release of Snowflake. I've been in the data industry for over 30 years, actually been in tech for 40 now, primarily focused on data warehousing and analytics for the last 25 or so years. I co-authored a book with Bill Inmon back in the early part of my career, and I implemented data warehouses and analytics platforms in a whole bunch of different industries prior to coming to Snowflake, and have experienced even more now that I've been at Snowflake, working with all of our customers all over the world. So I'm happy to be here and looking forward to a great conversation with everybody. For those who don't know about Snowflake, I'll just say, you know, we're the Data Cloud, right?
Greg Irwin 5:09
Yeah. So I love it. Actually, that's perfect. Because there's a cloud, there's the sense of a distributed cloud database. And then there's the Data Cloud, which is quite intentionally different. Can you maybe explain what that difference is?
Unknown Speaker 5:27
Oh, yeah.
Kent Graziano 5:28
I mean, Snowflake's platform allows us to make the location of the data basically invisible to everyone, regardless of the cloud platform; we're supported on AWS, Google, and Azure. And our idea of the Data Cloud is being able to eliminate all the silos that an organization has, to have all their core, important data in the one place they need it, whether it's for analytics, data science, machine learning, data engineering, dashboards, all that standard stuff, but across the whole breadth of it, well beyond data warehousing, and to have that connected globally. We now have Snowgrid underpinning our entire platform, which allows us to make the data visible in any region or cloud, regardless of where the data originated. And that's the concept of the Data Cloud. Our founder Benoit one time put it this way: he called it the worldwide web of data. And I really like that idea of being able to connect it. We've always had this concept of federated databases and federated data warehouses and things like that, but there was a lot of mechanics and a lot of work involved in trying to make that happen and connect people globally in that manner. Snowflake's taking all of that away with our platform; it's virtually seamless for people to do that. The data sharing feature makes it very easy to even share with external organizations. So not only do you have the ability to share your data within your organization, but you can share it externally with partners and customers, or even to the point of monetizing the data through our data marketplace. So it's a drastic change from where we were.
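(Editor's note: for readers who want to see what the sharing Kent describes looks like in practice, the provider side is just a few lines of Snowflake SQL. This is a minimal sketch with illustrative database, table, share, and account names, not anything demonstrated in the session.)

    -- Provider side: create a named share and grant it read access to specific objects.
    CREATE SHARE sales_share;
    GRANT USAGE ON DATABASE sales_db TO SHARE sales_share;
    GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share;
    GRANT SELECT ON TABLE sales_db.public.orders TO SHARE sales_share;

    -- Make the share visible to a specific consumer account (locator is illustrative).
    ALTER SHARE sales_share ADD ACCOUNTS = xy12345;

No data is copied or transferred; the consumer queries the provider's tables in place, read-only.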
Greg Irwin 7:42
Kent, from a technologist's perspective, or somebody who has to implement this, I think probably everybody on the Hollywood Squares here can think of 17 problems with this scheme. Like, my first is governance: where is the data really coming from, and data ownership, GDPR. I can think of lots of problems. But forget all that for now; let's put that to the side, and let's talk about how this actually opens up new opportunities. Let's take a more positive bent on it, and then we can talk about how you deal with governance and authentication and authorization. I'm sure there's a lot there, but let's forget about that. Tell us about what this means.
Kent Graziano 8:32
Well, it's enabling all kinds of use cases that people have not really thought of. Let me bring up one graphic for us, to show you that it's real. This is a picture of the interconnected data sets now in Snowflake across all of our customers. And the biggest one, top of mind for everybody, of course, has been COVID. You see the little blob there in the middle: one of our partner companies went and curated and gathered all sorts of COVID data from the CDC, the World Health Organization, and various government agencies all over the world, put it into a database inside of Snowflake, and has made it available free on our data marketplace. Those dots are Snowflake customers that are accessing that data free of charge to augment their analytics, whether it's a government agency trying to figure out how they're going to manage getting the vaccines out to people, an educational organization figuring out when they're going to bring kids back to school, or private companies figuring out their supply chain and their return to the office or not. All kinds of use cases, just with that one little data set that you see there.
Greg Irwin 10:07
And I gotta say, first off, this is a beautiful chart in terms of data visualization. Anybody who likes the art of displaying data, somebody did some work here.
Kent Graziano 10:17
Our product team did that. And this is data that's coming right out of our internal data warehouse in Snowflake, which we call Snowhouse. So this is live; this one was updated back in April. It's basically a knowledge graph of all the connections. All the lines you see in there we call active edges, and the definition is that somebody has to have been consuming data, at least 20 jobs in the last three weeks and 20 jobs in the prior three weeks. So over a six-week period, we've actually got people using it and querying the data from a shared source. Some of this is free, like the COVID data, and some of it's being monetized. In certain industries, like healthcare, we've got a couple of partners that are making medically related data available, sometimes directly to their partners, not through the marketplace, of course, because of privacy, but through private exchanges, and they're able to charge a subscription fee for it. So it's opening up all sorts of new use cases, where companies that never thought they would monetize their data are doing it, because it's so easy and doesn't involve data transfer. They're able to maintain control and governance, as you mentioned, but get it out there and augment their ecosystem this way. For me, I've been in data warehousing for over a quarter of a century now, and this was the ideal dream for us: to be able to augment our analytics with third-party data. But it was either very expensive, or very time consuming, or both, because you had to figure out how to extract the data from a partner's source system somewhere, put it on secure FTP, then bring it in, build the model inside of your infrastructure, and do the data transformations and all of that, just to join the data together. With this, you've basically got a curated data mart that somebody else put together for you. It looks like a read-only database to you, so you can just start querying it and joining it to your data; there's no restriction on that. So you can truly augment your analytics that way.
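(Editor's note: to make that concrete, on the consumer side a share mounts as a read-only database that you can join directly against your own tables. A minimal sketch, with hypothetical provider, table, and column names:)

    -- Consumer side: mount the share as a read-only database.
    CREATE DATABASE covid FROM SHARE provider_acct.covid_share;

    -- Join the shared, curated data directly against local data; no ETL, no file transfer.
    SELECT s.region,
           s.sales_date,
           s.revenue,
           c.new_cases
    FROM analytics.public.regional_sales s
    JOIN covid.public.case_counts c
      ON c.region      = s.region
     AND c.report_date = s.sales_date;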
Greg Irwin 12:46
Kent, one last one on this, and then I want to go deeper with the group we've got here and some of their focus areas. But one more on this, which is: how pervasive is data sharing across Snowflake? Just so we can get a sense. Maybe this is 5% of your customers that are now sharing data across the Data Cloud. So, where are we today, and what's your optimism in terms of how pervasive it can become? Let's look ahead a good three to four years from now. What do you think?
Kent Graziano 13:26
Well, today there are thousands of organizations already doing it, so at least 50% of our customers are doing data sharing in some form. Granted, it might just be a direct share with another division, or a private exchange with a supply chain provider, or it could be like the COVID-19 data set, where there are thousands of organizations actually accessing that one data set; that's what all the little blobs you see on there are. I was involved very early on in the very first release of data sharing, which we did with a healthcare provider. They immediately pushed us into creating the things that are called reader accounts. They were collecting data from clinics that subscribed to their service, and they wanted to share that data back in an analytic form, so they could do analysis on what was happening at the various clinics. None of the clinics, of course, had the expertise to build their own data warehouses or anything like that. So they created reader accounts for them and just sent them a URL, and the clinic can click into that; they actually did a pre-formatted Tableau dashboard for them. And they had 120 clinics immediately accessing data. Like I said, this was probably three and a half years ago, when we first really released data sharing. So it's becoming more pervasive, because once you've got your data consolidated in Snowflake, a lot of other things become much easier. You can start thinking about stuff like monetizing the data, or simply sharing it with your ecosystem, and improving the communication and the time to value with the data. So I think that at some point, every Snowflake customer, literally every Snowflake customer, will be doing data sharing in some form in the Data Cloud.
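(Editor's note: a reader account, as Kent describes it, is a provider-managed account for a consumer with no Snowflake subscription of their own. A minimal sketch of the provider-side setup, with illustrative names and a placeholder password:)

    -- Provider creates, and pays for, a managed "reader" account for the consumer.
    CREATE MANAGED ACCOUNT clinic_reader
      ADMIN_NAME = clinic_admin,
      ADMIN_PASSWORD = 'REPLACE-WITH-STRONG-PASSWORD',
      TYPE = READER;

    -- Then grant the existing share to that account, same as any other consumer.
    ALTER SHARE clinic_share ADD ACCOUNTS = clinic_reader;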
Greg Irwin 15:39
Awesome, Kent. Great, great intro. Let's go around our group here. By the way, there are some questions that have come in on the side, and thank you for those. Andy, let me bring you in here. Jump right in and give us a real quick intro, and I'd love to just talk a little bit about one use case. Okay, I'm going to pull you into the conversation. Here we go. Andy, nice to meet you.
Andy 16:20
Okay, nice to meet you. So I work for Emory University, in the School of Nursing, in our data science center, and I am the director of database development, which is kind of new; I've been there for about five years. A lot of people do research and put their data in spreadsheets; we're trying to teach them to put data in REDCap, or something that's a little easier to use and developed for that. But we also have this project that I'm the lead on, which is called Project NOW. It's a database built from electronic health records, de-identified data from one million patients. The purpose of it is to teach nurses. If nurses are at the bedside, they get this gut feeling: you know, this would be better if we had this other protocol in place, protocol B instead of protocol A. But they go and speak to someone, and unless they have data to back it up, they really don't have a leg to stand on. So the idea is to teach nurses, while they're trying to learn everything about medicine, about data science and the use of data. They're the people who put data into the electronic health records, so they can see how you can get some value out of it. This is designed so you can ask basically any research question and then get the answer out of it. Anyway, it's built on a MongoDB database right now that's on AWS, and we're all newbies to this; it's a big new thing. It was supposed to be a small little project, and it went from 20,000 patients to a million, and it became crazy overnight. It's not real time or anything, but we're getting there, and it's about ready to be deployed now. It's been a four-year process to get the data cleaned, figure out what we've got there, and build nice generic web pages so it's just point and click, and nurses don't have to learn how to query the data.
Kent Graziano 18:32
So they don't have to write any SQL themselves? No SQL
Andy 18:35
for them, no. And everything is very consistent: we have the same IDs, an encounter ID and a patient ID, in every single table. So you can look at it by patient, you can look at it by visit, whatever you want to pull out of there. And we're hoping to share this with other nursing schools. We've also had, of course, a lot of researchers who are interested in it, because there aren't a whole lot of these available. So yeah, there's a lot of interest in it. And anyway, that's it. I have heard of Snowflake, but I wasn't sure what it was exactly. This might be something I have some interest in, if it's a platform upon which we could deploy. I mean, right now we just have a web page, and people subscribe and take it from there. But is there something else we should be doing?
Greg Irwin 19:38
Maybe from an operational perspective, whether it's a data issue, data movement, data sharing, or an application issue: what's one big hurdle that you're trying to clear? I'm sure you have a lot of them.
Andy 19:54
Our biggest hurdle at the moment? Well, first of all, the fact that we're new to this whole big data process. It's pretty much just me and our programmer, really two of us developing this, and then we have some students with it. So it's a very small team, and our programmer just left for medical school. Right now our biggest problem is finding a MongoDB developer who is seasoned. We're looking to hire probably a consultant right now, not necessarily an employee, although we'd take an employee who would accept university wages.
Greg Irwin 20:36
Talent has got to be number one on everyone's list right now, right? That clearly resonates.
Andy 20:44
Right. And then also managing the subscriptions would be something, managing when people sign up for it. Is there a tool that we should be using? Does Snowflake have that ability within the sharing? Does it manage some of that subscription stuff?
Greg Irwin 21:01
You mean knowing who's accessing, and whether they're the right person to be accessing?
Andy 21:06
We have that set up in the system, you know, but take billing, for example. We can bill a university per student that signs up, but researchers who want to come: how are we going to charge them? And how can that be managed, externally, or somehow internally, but with a modicum of personnel? Because it's a university, and universities are hurting now, after COVID.
Greg Irwin 21:38
I'm not going to put that one out to be solved by the group, because I think that would just create a flurry for about five or ten minutes. What I'll do is pause here and assure you that there are a lot of people who have a lot of ideas on how to solve those types of problems. I'll open it up in the follow-up. Thank you for joining, and thank you for sharing the story.
Kent Graziano 22:08
I'll give the very, very short answer, Andy, which is that yes, Snowflake could help with that. So we'll leave it at that. Thank you. That sounds good. We have a lot of healthcare customers, and we do have quite a few integrating EHR data to do analytics and building applications on top of it, like you're describing.
Greg Irwin 22:34
Perfect. I'm going to go to the other extreme. I want to go over to Greg Dieleman. Greg, I have a feeling that you may have a broader purview in terms of the data platforms that you're managing. Would you care to give a quick intro to the group? Greg, are you with us? I think so. Can you hear me? Yeah, we got you.
Hi, I'm Greg. Nice to meet you. Nice to meet you. Yeah, so I work at US Bank. We're actually in the process of migrating from a Hadoop platform to the cloud, and trying to work towards everything open source. So that's kind of where we're at right now in our journey. Cool. Now, Hadoop for what use cases? Is this a data warehouse, or is this a data lake? Streaming data? All data? What's the scope? Yeah, it really covers everything. We're calling it our unified data, analytics, and AI/ML platform. It's really meant to cover real-time and streaming from our systems of engagement, so online banking, mobile, digital, really any channel, as well as just your standard reporting analytics, and our AI and machine learning use cases. So, trying to cover it all. How's it going? Well, probably our biggest challenge right now is getting through some of our own internal security hurdles and getting sign-off there. Beyond that,
you know, everything being open source, I think some of our other challenges are more in figuring out, when we have issues, how to get through those, whether that's community forums, and just the time it takes to get a response back or figure it out on your own. Right,
Nate Allphin 24:54
Right. Greg, maybe a good question: when we talk open source, are you meaning ETL-type work? And what drove you towards the open source path?
Greg Irwin 25:11
You know, we've had a lot of new leaders introduced to the bank from other industries, trying to make the switch from being a bank to being a tech company that does banking. So I think that drove a couple of things: one, contract-wise, but two, just the flexibility to take it and do what you want with it, versus being locked in to what a vendor's roadmap looks like. Yeah, that makes sense. Got it. Excellent. Well, good luck. That's ambitious, for sure, and I'm sure the team's energized; it sounds exciting. Let's go around. I want to try and stir the pot now, so I'm going to ask the group here to play with the chat. As we're talking, do me a favor: could the group share with us, to what extent, and for what use cases, are you leveraging distributed cloud databases, from whomever it could be? Irrespective of the vendor, I'm interested more in the use cases, and I'm particularly looking in the chat to see how heavy a workload people are comfortable taking to a cloud service. So as we go here, drop some stories in on the side, and I'm sure it'll spark some conversation as we go around. I'd like to invite Eric to maybe share a story. Eric, we had folks from TELUS in the past, so I've heard some of the stories, and I'm coming at it thinking, all right, I'd like to hear more about what TELUS is doing. That's why I called on you. Yeah, sure. Well,
Eric 27:07
I mean, first off, I would say a lot of TELUS's focus is on customer experience, and so the data that we're looking for has several different branches. One is enhancing the customer experience, and how do you monitor and control that from an outsourced customer support type of concept. The other is the AI space. The group that I'm involved with is heavily involved with annotation, or data fabrication, for particular AI training purposes. So one of the big questions, and this is something that comes up all the time, is understanding the business case that you're trying to solve with a particular data application. Incidentally, I do have some experience with Snowflake in a previous role. There was a fantastic ability to slice our data in a SaaS model; a vendor was handling it as a service for us, and we were able to slice data about our own services effectively. It was very easy for us to engage with the Snowflake interface. But getting back to where my head is at: how do you create good data? How do you evaluate the quality of data for a particular use case? We all know the old adage: you go to war with the data that you have. But the question is, how do you know what that is? How do you evaluate it? How do you measure it? Often that's an iterative process where you're experimenting with different tests: can this data tell us something meaningful in this particular use case? And if it doesn't, exactly what characteristics is it missing? How flexible is it to attack that problem space with the data that you have? And if you need more data, or if you need to annotate data or join it with some kind of human-in-the-loop model, how do you evaluate how to do that? That is often a tough case. And I guess the closing thought, and I'm sorry for giving you more questions than answers here:
Eric 29:25
As a business owner, I want to buy insights, business insights, and I'm going to have to spend a bunch of money finding out whether or not some data is even going to suit a particular use case. How can I know that that's a good investment, as a buyer? If I could get a handle on how best to frame that conversation, it would both help me sell the data annotation and data services that we provide, and help steer conversations about this in a constructive manner.
Greg Irwin 29:57
Eric, I love that question. We talk a lot about databases, but maybe we don't spend enough time talking about the quality of the data, the certainty in the data, and building trust in the data and its process. Right? So let's pick that one up. Nate, let's go to you: what have you seen some of your clients do to help improve trust in data, or in new data sources?
Nate Allphin 30:30
Sorry, I didn't catch Eric's organization. Who is he with? I'm with TELUS International. TELUS International, okay. Yeah, I think it's a great question, and quality of data is certainly becoming a big topic. And I think it's a little bit different when you talk about going out and acquiring data and trying to determine whether that data is going to be valuable in helping my organization. A lot of the use cases that we have with our clients are around trying to control the quality of data that's internal already. Whether it's sales, whether it's ERP or CRM information, it maybe hasn't been fully baked when it was put in the system. So how do we complete the loop? When we get information into, say, an ETL-type flow, how do we feed some of that partial information back to the business, to allow them to enrich and complete that information before it makes its full trip all the way through the ETL process? So I think yours is an interesting one, where one thing that can help that data validation process is really having the ability, as quickly as possible, to take whatever information you've acquired, whether you've paid for it or brought it in, and join it against what you've got internally. The quicker you can join that in, put it side by side, and make decisions on it, the better. The challenge that we see with 99% of our clients is just silos: they've got data siloed within applications. If you're acquiring a list, you've got to compare that against Salesforce to determine how good it is versus what you've already got in your Salesforce system. If you don't have that data centralized outside of Salesforce to some extent, utilizing something like Snowflake, you're going to have a very difficult time validating, in a timely manner, what do I have here? So I think the ability to take, whether it's ERP or, as we see most often in these scenarios, CRM, where I've got my Salesforce instance and I've got people using it: how easily can I take that Salesforce information and combine it with other sources? I believe that's where something like Snowflake gives you a huge advantage, because you've got that entire data set there, already centralized and ready. A very easy SQL syntax can join that up, de-dupe, compare, analyze: how good is this, and how much is it going to benefit us? And then hopefully you have a good process for potentially taking that data, feeding it back into Salesforce, and making it actionable for the business. So those are just some of my thoughts on where I think a cloud data platform provides you the flexibility and the ability to react very quickly. If you all of a sudden get a big data set, and you've got to go extract data from 10 sources and join that together, that's going to be a very, very onerous process.
But if you have a lot of that centralization, and you have that ETL process down, where we know that our critical business data is being centralized inside of a central repository that we can work with, that makes those processes a lot easier.
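(Editor's note: as an illustration of the kind of "easy SQL syntax" Nate mentions, once an acquired list and Salesforce data both live in Snowflake, the comparison is a couple of queries. Table and column names here are hypothetical:)

    -- De-dupe the acquired list itself, keeping the most recently loaded row per email.
    SELECT *
    FROM staging.acquired_list
    QUALIFY ROW_NUMBER() OVER (PARTITION BY LOWER(email) ORDER BY loaded_at DESC) = 1;

    -- Find the net-new records: acquired rows with no match in the Salesforce
    -- contacts already centralized in Snowflake.
    SELECT a.email, a.company
    FROM staging.acquired_list a
    LEFT JOIN salesforce.public.contacts c
      ON LOWER(a.email) = LOWER(c.email)
    WHERE c.email IS NULL;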
Eric 34:01
I can totally endorse that value; I'm very familiar with it. If you don't have a data warehouse and you're trying to do queries across different data platforms, it's a tangled mess, and it's a load on the systems. It's inefficient, and you don't know what you're going to get. It's hugely beneficial to bring all the data together, try to structure the inputs so they're at least somewhat standardized, and then be able to handle it, query it, work with it, and model it outside of the source systems. You just use a bus to transport and synchronize, but you don't try to query across different platforms; that's a nightmare, a very bad idea from an operational standpoint. So I'm 100% on board with the data warehouse concept.
Kent Graziano 34:45
You know, one of the things that I have definitely seen for people who are looking at data quality in particular is coming up with some actual KPIs around it: what do we mean by data quality? You were talking about how you assure your customers of the veracity and the quality of the data you're providing them. Having a dashboard that tells us things like how current the data is, whatever metrics you might have over duplicates, matching or not matching, changes from prior iterations of the data set. You have to come up with that from a data steward perspective: what would our KPIs be? How would you use data to improve the quality of your data? Develop a data quality dashboard that is visible, so you have the transparency. And I'm seeing a big increase in folks taking a DevOps or DataOps approach to this: implementing automated testing, whether it's data quality testing or even automated regression testing on the data pipelines as the data is coming in. How often do we check that? Do we have regression testing going on, on the pipeline code itself? As the developers are finding new sources, or maybe you're finding new business rules and doing different transformations, take a note from the test-driven development world of Agile software: if we know we're going to apply these rules, let's develop the test to make sure the rules are being applied. If we had a thousand rows in source system A, did we end up with a thousand rows in the data warehouse? Simple things like that. And if it's a million rows, or a billion rows, that's equally important. You don't want to be constantly having to check all these things by hand, but you want to make sure the process didn't fail. That's one of your metrics: how many of our load processes completed last night? How long did they take? What's the average time of completion? Are we seeing that go up because we're getting more data, or is something else going on? All sorts of things that you can think about doing and putting in place to verify the quality of that data.
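(Editor's note: a minimal sketch of the row-count reconciliation test Kent describes, written as a query a DataOps pipeline could run after each load. The table names, and the assumption that the raw extract is staged before transformation, are illustrative:)

    -- Did every row received from source system A land in the warehouse?
    WITH src AS (
      SELECT COUNT(*) AS n FROM staging.orders_raw      -- rows staged from the source
    ),
    tgt AS (
      SELECT COUNT(*) AS n FROM warehouse.public.orders -- rows loaded into the warehouse
    )
    SELECT src.n AS source_rows,
           tgt.n AS warehouse_rows,
           IFF(src.n = tgt.n, 'PASS', 'FAIL') AS load_check
    FROM src, tgt;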
Eric 37:19
Yeah, those are excellent ideas. Some of them I'm familiar with, and I like the idea of writing your test case before you even start. It's an excellent Agile-based model for understanding the problem you're trying to solve, and verifying that you're solving it with the process, or whatever you're building to deliver it. So, excellent ideas. Anyway, I don't want to monopolize.
Greg Irwin 37:47
That's great, Eric, thanks. Let's keep going. I'm going to try and drive across a couple more use cases if we can. Aaron, Aaron Sorenson, let's pull you into the mix. Do us a favor: a real quick intro to the group, and tell us a little bit about some of your journey here around, we'll call it, data clouds.
Aaron 38:14
Sure. So I work for a mobile health company called emocha, and we help patients improve medication adherence. We were also involved heavily in COVID, with return-to-work and return-to-learn efforts, allowing employers and schools to monitor employees and students remotely via their cell phones, to make determinations about whether they've quarantined long enough to return safely to an in-person setting. And so we're in a situation where we do data analytics both for internal use, so we have all the normal things like marketing and sales and accounting, and we also sell our software for others to use on their own. But often they tell us that they don't have the human bandwidth to do so, so then we offer to do the work for them: we engage with their employees, or their students, or their patients, to try to help those people hit certain health goals and targets. We use a dashboarding system called Domo. We create dashboards and analytics that are consumed internally, but we also have a web portal that our customers log into, and we use Domo to create analytics that are then embedded and white-labeled inside of our own portal. So we were looking for a solution a while ago, and we ended up with this system called Domo, which has a very robust ETL type of data transformation, a large number of connectors, and all that. They do the dashboarding the way Tableau does, and then they have their own internal data warehouse that they store everything in. So when we were comparing them to others, it would be a monolithic, one-stop shop with them, as compared to, say, three different products. For example, we were looking at Looker: we would buy Looker, then we would probably go with BigQuery or Snowflake as the data warehouse, and then a product like Fivetran as the ETL tool. The interesting thing is, one of the main detractors about the Domo approach is that it's this proprietary database: you have access through an API, but you can't just hook up other tools directly to the actual database and write SQL against it from outside tools. But I've heard rumors, and they're totally unfounded, I have no idea if they'll actually prove true, that they are potentially looking to make Snowflake their internal database, and that in the future we could actually have the best of both worlds.
Kent Graziano 41:36
Yeah, and I can confirm that rumor, to a certain extent. I've actually been in conversations with Domo over the last year or so, and they definitely have a connector for Snowflake now. They've done some partnering with us, and I know it's definitely under consideration for them to switch over. We had a similar situation with ThoughtSpot: they used to have their own kind of proprietary database, and they switched to almost exclusively connecting to Snowflake instead, because it's an issue of scale. For certain use cases, the built-in databases are great; it makes it, like you said, very simple for you. But at some point, I'm sure Domo would like to have larger customers with larger datasets, and they're not a database company. So it makes sense for them to partner with someone like Snowflake for the back end, and make it even easier for you all to not have to worry about the scaling.
Nate Allphin 42:46
Yeah, we've heard similar things. Domo's based in Utah, so they're almost in our backyard, and there's some interest in opening that up and making it more accessible. It hasn't happened yet. But I think you described the Domo platform very well, which is that it really tries to put the pieces together. Versus, like you said, Fivetran, Snowflake, and Tableau, Domo really tries to wrap it up and create one single, one-stop shop.
Greg Irwin 43:25
Excellent. Aaron, thanks so much. Nate, I'm going to come back to you for a moment, if that's okay. I'd like to hear from you about some of the projects you're working on, particularly in the context of, I don't know if it's necessarily return to work, but maybe some things you're seeing a lot more frequently this year compared to years past.
Nate Allphin 43:49
Yeah, we saw some return to work. We had a big project with a university in the upper Midwest, and we actually utilized Snowflake to do tracking for it; this was on a college campus. It allowed the college administration to really see, based on the location of students, where things were happening. It wasn't down to cell phone tracking, to see where people are walking around, but it was: okay, this student tested positive, they live in this dorm. If we pull this information together, we can start to get this heat map of the campus that shows us where we've got things happening, and we were able to pull the data into Snowflake. And then, very quickly, and this is one of the advantages I see with Snowflake: I come, similar to Kent, from years of data warehousing experience using OLAP cubes and things like this, and the amount of work it took to stand up that kind of infrastructure was quite onerous. We could take some pretty raw, transactional-type information into Snowflake, and the scalability allows us to visualize that inside of Tableau or Power BI very quickly, generating visualizations without a tremendous amount of data modeling needing to happen with that information. So we were able to pull that information in and provide campus administration with daily snapshots of how we're tracking in terms of positive tests and where people are moving around. That was a great project for us. Another really common thing we're seeing more and more: I'll give an example of a large organization that has been running a SQL Server on-premise data warehouse, a very mature SQL Server data warehouse environment, probably one of the more mature installations I've seen in my career. They approached us; we've done some work with them in their SQL Server environment, and in my prior life I worked a lot in that environment. But this particular organization said: we are long in the tooth on our technology, we need to migrate to the cloud, we don't have the expertise in-house to do this, and we need some help. So we brought our team in, and the effort is really not lift and shift. It's not like we're just spinning up VMs; we are creating a cloud infrastructure for their ETL and their analytics, and it's really cloud native. But there's work, because they've got OLAP cubes and their people are used to using them, so it requires some migration to make sure we don't lose functionality for the end users. It's really this idea of: I understand data warehousing, I've been doing it, but now I need to do it differently, because I've got to get to the cloud. I'm done with my buy cycles with my servers, and I just don't want to do it anymore. That process of not lifting and shifting, but migrating to the cloud, has been one that we've seen more often than not in the last year.
Greg Irwin 47:19
I'm taking longer to get to this, but there's a logical center to it: I think we all understand the benefits of the amazing scale of the cloud, which provides an opportunity for more flexibility, and then the separation of compute and storage, which enables that to scale cost-effectively. It's brilliant. The question that comes up a lot in my sessions is: all right, but what about intensive workloads? What if I really need to query huge databases? What kind of performance or processing load is it going to take to really deliver? Let's get down to specifics. Are you seeing true enterprise data warehouses getting stood up at scale, or are you seeing more data lake use cases? What kinds of projects are landing in Snowflake?
Nate Allphin 48:29
We're seeing both, and I'm sure Kent's got some thoughts on this as well. We're seeing data lake: Snowflake is a very easy place to centralize information. But we're also seeing those heavy analytical workloads, the ones traditionally being done on premise, whether with Teradata or with OLAP infrastructure, coming into Snowflake, and we're seeing organizations migrate completely. In those situations, it's not just a data lake infrastructure; we're migrating the entire EDW structure to a cloud provider, and most often, in our case, that's Snowflake. Obviously, there are things you've got to take into account with those migrations, especially when you're talking about big, big data. But in most of the cases we're dealing with, we're moving the entire EDW; we're not leaving behind legacy on-premise systems to handle the big data pieces. And certainly Snowflake is not the only solution there, but that's what
Greg Irwin 49:56
we're seeing a lot of. Nate, thank you. Kent, can I bring you in on this? Someone wants to know: hey, I have time series, I have graph data. Teradata is a brilliant database, and you can tune it. I think it scares people, that idea that they can't do the tuning. Tell us a little bit about how that works in Snowflake.
Kent Graziano 50:19
Yeah, I know. Having come out of nearly 30 years in the Oracle world, I definitely relate to that, and I spend a lot of time in my role at Snowflake talking to DBAs and allaying their fears. Yeah, there is no way to tune Snowflake; effectively, it's a service, right? But why is that okay? Well, we've got the metrics to prove it. We have massive companies like Capital One, which has petabytes of data, and very similar to what Nate was describing, they've migrated from Hadoop and Teradata and have data lake functionality going on, as well as data warehousing, analytics, and data science. At one point, they were streaming a trillion rows of JSON data into their data lake in Snowflake a day. Now they're down to about a trillion a month, now that they've loaded in the history. The scale is unprecedented, and they're able to get the compute resources they need to make it happen in the time they want it to happen. Rather than: okay, we throw it all at the server, we peg the CPUs, and then we sit back and take a coffee break while it churns, and a couple of days later, hopefully, it finishes and doesn't fail. In Snowflake, they're able to say, we need this to go fast. At one point our largest cluster was called a 4XL, which is huge, and they came to our CEO and said, is there any way we can get a 5XL? And now, a couple of years later, we have a 5XL and a 6XL. People always ask, well, that's going to be pretty expensive. Well, yes, it is going to be pretty expensive, for the little period of time they need to run it. But the point is, we're talking about business value now, and time to value of the data. They made the calculation on the ROI and said: if we can get all this data in, we can provide better service to our customers doing true customer 360, because they were loading in all the web logs, basically, from years of interactions with Capital One customers. They made that calculation and said, yeah, we can achieve our goals if we do this. We don't have to go spend a couple million dollars to buy the biggest piece of hardware ever, just to do this load. We can spin up a 4XL or 5XL in Snowflake, run it for a couple of hours, get everything loaded, and then go about our business with the data science and data profiling and getting stuff into dashboards, in a much more cost-effective manner. And I am seeing, and I've seen this from day one, but it's even more so today, the hybrid environment: data lake functionality, and then curated data warehouse functionality with a high level of governance, and applications and dashboards. I really am seeing a lot of multi-tiered architectures. From the data warehousing world, we would call it a persistent staging area; well, that's the data lake, the raw data from the source systems. The difference now is that it's not just structured data. You can take the semi-structured data and the unstructured data, and load that all into Snowflake, into that data lake concept. And then, as you do the profiling and the data scientists do their investigations,
you find the data that's of value to the customer and to the business, and then move that into a curated data warehouse type environment, for the long-term analytics and the governance and all the things that go with that. And they can do it iteratively; you no longer have to do the whole thing in one big bang. Like Nate said, it's not a lift and shift, it's a migration. I call it a lift and pragmatic re-engineering of the original environment. We don't throw away stuff that's good, stuff that's working great and just needs to scale; you can move that to the cloud. Stuff that wasn't working so well, okay, let's figure out the right way to do that. But this ability to have the raw data in a data lake first gives people such a great jumpstart, because you always have to start with the base data, right? If you're going to have data quality, you've got to be able to trace back, and you've got to have that auditability, which means you need the source data.
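(Editor's note: the burst pattern Kent describes, size up for a heavy load and then stop paying, is a couple of statements in Snowflake. A minimal sketch with an illustrative warehouse name; the size tiers are real, running up to X6LARGE, the "6XL" he mentions:)

    -- Create a big warehouse just for the backfill; suspend automatically when idle.
    CREATE WAREHOUSE IF NOT EXISTS backfill_wh
      WAREHOUSE_SIZE = 'X4LARGE'     -- the "4XL" tier
      AUTO_SUSPEND = 60              -- seconds of idle time before auto-suspend
      INITIALLY_SUSPENDED = TRUE;

    ALTER WAREHOUSE backfill_wh RESUME;
    -- ... run the heavy load here ...
    ALTER WAREHOUSE backfill_wh SUSPEND;  -- compute charges stop the moment it's done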
Greg Irwin 55:14
Kent, I'm going to pause you there; we've got just five minutes left. So a reminder: this isn't the last chance to have a conversation. We're going to send around a list of everybody's names, and I'll encourage you, as I did at the beginning, to make some contacts. And of course, if Eide Bailly or Snowflake can be helpful, that's a big part of the reason we're here. But for the last couple of minutes, let's field any last questions. Let's make use of the people we've got here. Do me a favor: share with us a question you've got for Kent or Nate, and maybe we can do two strong questions and wrap up. I'll ask the group to just jump right in. You don't have to use the chat; you can just turn on your mic. And I'm not afraid to pull people in, so I can do that as well.
Aaron 56:12
I have a question about the data lake. I'm glad, Kent, that you offered some detail on how you define it, because I think it's a term that a lot of people throw around, and if you were to ask them to actually define it in an intelligible way, they'd be unable to. So, JSON can be a terrible thing to deal with, because it can be extremely hierarchical, and you can have repeating values, and it can just be a nightmare to wrangle into a standard table structure. Does Snowflake itself have tools where you could point Snowflake at a few representative JSON files, and it would say: oh, it looks like this is your structure, you should create this type of table with these columns? Or would someone like me have to rely on third-party tools for that?
Kent Graziano 57:12
Well, that is actually coming. We just dropped a blog post the other day about our schema detection. It's available now for Parquet and Avro; JSON and CSVs are coming. So exactly what you're describing: you can load a JSON file into Snowflake and have it automatically detect the schema and build the table. And it's going to give you options, just like when you load a CSV into Excel: align the data types, make some changes, then push the button, and it generates the DDL and does it all. In the meantime, though, the very first ebook I did for Snowflake, several years ago, was on analyzing JSON with SQL. Our syntax has some SQL extensions that are very, very simple to use. I hardly knew anything about JSON, honestly, when I joined the company, and I learned about everything you just said: arrays and nested arrays and all sorts of things. Our SQL makes it very easy to pull that out, and many of our customers are just creating views on top of the JSON inside of Snowflake. We have a data type called VARIANT, which is a smart data type that optimizes queries against native JSON. So you just write a SQL statement, encase it in a view, and make it look like a table for your BI tool, and you don't even have to do the ETL. So I think we really have nailed it on making schema-on-read actually usable for mere mortals. That definitely includes me; I was in that category too, because I was a traditional relational database dude and got into it when I joined Snowflake, and it's just an awesome feature. That's why I ended up doing an ebook on it: because I had to understand this. It's very easy to do; you learn the syntax in literally about 10 minutes. So we do see that. But yeah, the schema detection is coming for JSON specifically; I think it's in public preview now for the other data types. I was trying to remember, because I just saw the blog post the other day.
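(Editor's note: two minimal sketches of what Kent describes, with hypothetical stage, file format, and column names. The first runs schema detection over staged Parquet files, the formats supported first per the discussion; the second shows the VARIANT-plus-view pattern for JSON:)

    -- Schema detection on staged Parquet files.
    SELECT *
    FROM TABLE(INFER_SCHEMA(
      LOCATION    => '@my_stage/events/',
      FILE_FORMAT => 'my_parquet_format'));

    -- For JSON today: land it in a VARIANT column, then expose a relational view.
    CREATE TABLE raw_events (v VARIANT);
    COPY INTO raw_events FROM @my_stage/json_events/
      FILE_FORMAT = (TYPE = 'JSON');

    CREATE VIEW events AS
    SELECT v:patient_id::STRING   AS patient_id,
           v:encounter_id::STRING AS encounter_id,
           d.value:code::STRING   AS diagnosis_code
    FROM raw_events,
         LATERAL FLATTEN(input => v:diagnoses) d;  -- unnest the repeating array

A BI tool can now query the events view like any ordinary table, with no ETL step.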
Greg Irwin 59:48
Guys, we're at our hour, so we're going to wrap it up here. A huge thanks to Nate and Kent, and again, please reach out to these guys and their organizations to go deeper on all of this. You know, it's interesting for me, because we really are talking more about the Data Cloud than about data lakes and EDWs, and trying to redefine how these systems are being architected is pretty exciting. Yeah,
Kent Graziano 1:00:20
I really do think of it that way. We've got to talk about enterprise data platforms now, and enterprise data hubs. To your point, Aaron, a data lake is not a technology; it's a concept, and that's where people mess up. They think it's a specific technology, and it isn't. Same as a data warehouse: it's a concept, an approach, an architecture, a framework, whatever word you want to use. But really, with the Data Cloud, we're getting to having all the data in one place, in the most useful form,
Greg Irwin 1:00:57
for delivering the value. Excellent. Thanks so much for your time today, Nate. Thanks, Kent. Thanks, everyone. All right, everyone, have a great day, and I look forward to the follow-up. Thanks, everybody. Bye-bye.