Motivations and Customer Use Cases, Apache Kafka

  • December 22, 2020
  • In Kafka

          Let's look at some of the motivations people have for using a system like Apache Kafka, and at some use cases: what are people in different industries doing with it? Now, if you're watching this course, you're probably aware that there is a paradigm shift underway in favor of event-driven architectures. The basic idea here is that the way we've been building systems has been focused on state, on the current state of things; when we design systems, we focus on data as things. There's very much a paradigm shift underway from that to a perspective of data as events. There are things that happen in the world, and our primary purpose is to process those events. This relates to a lot of other architectural trends that are happening at the same time, but it's a very real thing. You can see it by way of analogy: a newspaper is a static snapshot of the state of the world as of press time, whereas a news feed, all the news tweets that might appear in your timeline if you follow news accounts on Twitter, is made of events, little discrete descriptions of things that have happened, rather than the summary you'll see in a broadsheet newspaper. And that shift is going to cause us to want a new kind of data infrastructure.

We're going to want a platform to help us manage this: not merely a file system to put things in, not just somewhere we can store events, but a whole set of capabilities that grow up around that central storage and together comprise an entire platform. That platform will unfold as this introduction proceeds. We need to be able to deal with these events in real time, so the new computational tools we build around this data infrastructure are going to be focused on real-time processing. We're also going to take the view that we want to store events for potentially a long time, maybe forever, or at least as long as regulatory considerations will allow. That idea, that we take events and store them in a log, is going to become the new organizing principle of this data infrastructure, with all of these other components built up around it. And lest I bury the lede, it is Apache Kafka that has become the de facto standard for real-time event streaming. It provides all these things. At its foundation, it's a distributed log, so we can gather events from the outside world and store them in that log, and they're stored in a scalable, replicated, fault-tolerant way. We also need to integrate those logs with other systems; not everything is an application built on top of Apache Kafka. If you look at the things on this slide, there are these other systems in the world that need to talk to Kafka.
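To make the log idea concrete, here's a toy, in-memory sketch of the append-only log abstraction at Kafka's core, in plain Python. This is an illustration of the concept only, not Kafka's API: events are appended in order, each gets an offset, and readers can replay from any offset because the log keeps history rather than deleting messages once they're read.

```python
class ToyLog:
    """A toy, in-memory stand-in for one topic partition:
    an append-only sequence of events, addressed by offset."""

    def __init__(self):
        self._events = []

    def append(self, event):
        """Append an event; return the offset it was stored at."""
        self._events.append(event)
        return len(self._events) - 1

    def read(self, offset):
        """Read all events from the given offset onward.
        A real consumer would track its own offset and poll."""
        return self._events[offset:]


log = ToyLog()
log.append({"type": "page_view", "user": "alice"})
log.append({"type": "add_to_cart", "user": "bob"})

# Reading from offset 0 replays everything: the log is durable history,
# not a queue that destroys messages as they are consumed.
print(log.read(0))
print(log.read(1))
```

A real Kafka topic adds the parts this toy omits: partitioning across brokers, replication for fault tolerance, and retention policies that decide how long that history is kept.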

Finally, when your data is stored as a stream of events, the way you run computations over that data is going to look different. We're used to the idea of executing a database query, getting a result set back, and doing some things to it. Now it's going to look different: we're going to write stream processors that transform events one at a time, and if we want to aggregate events and create some kind of snapshot of state, stream processors do that kind of thing too. And it's not just me saying this. Many of the largest companies in the world, with some of the most complex and daunting information-processing requirements and, in some cases, the longest histories of legacy systems stretching back decades, are on this same event-driven journey. You'll see some of the logos here on this page, and really it's a mix: traditional companies, airlines, banks, some of which have been using computers for as long as anybody has been using computers commercially; some of the first generation of classic internet companies like LinkedIn and Netflix; and even more recent entrants like Uber and Lyft. These are all heavy users of Kafka and event-driven architectures. As of the time of this recording, 35% of the Fortune 500 use Kafka for mission-critical applications. So this is not just a group of forward-looking developers in a skunkworks somewhere playing with open source and doing some cool thing. These are real line-of-business applications carrying critical customer data, with executive sponsorship, budget, and visibility across the company. These are very much real applications of Kafka. Let me give you an idea of some of those use cases. Now, I carry a credit card with a logo on it that you saw a few slides ago. It's a financial services company; they issue credit cards, and I'll tell you, their real-time fraud detection is something else.
And it actually does use Kafka to get that done. And it really is real time. This used to be something they would find out the next day: oops, we had fraud. Now I get a text message or a notification from the application within seconds when there's a transaction they think might be fraudulent, and even occasionally one that actually is. I'll get a notification while I'm sitting at home on my couch, not using my credit card. So this is a good thing.
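The stream-processing model described earlier, transforming events one at a time while maintaining an aggregated snapshot of state, can be sketched in plain Python in the spirit of this fraud example. This is a toy, not the Kafka Streams or ksqlDB API, and the rate-threshold rule is made up for illustration: the processor keeps a per-card window of recent transaction times and flags cards whose transaction rate spikes.

```python
from collections import defaultdict, deque

# Toy thresholds, invented for this sketch; real fraud models are far richer.
WINDOW_SECONDS = 60
MAX_TXNS_IN_WINDOW = 3

# Per-card recent timestamps: the aggregated "snapshot of state"
# that a stateful stream processor maintains as events flow by.
state = defaultdict(deque)

def process(event):
    """Handle one event, update per-card state, maybe emit an alert."""
    card, ts = event["card"], event["ts"]
    recent = state[card]
    recent.append(ts)
    # Drop timestamps that have aged out of the window.
    while recent and ts - recent[0] > WINDOW_SECONDS:
        recent.popleft()
    if len(recent) > MAX_TXNS_IN_WINDOW:
        return {"alert": "possible_fraud", "card": card, "ts": ts}
    return None

events = [
    {"card": "c1", "ts": 0},
    {"card": "c1", "ts": 10},
    {"card": "c1", "ts": 20},
    {"card": "c1", "ts": 30},   # fourth transaction within 60s
    {"card": "c2", "ts": 30},
]
alerts = [a for a in (process(e) for e in events) if a]
print(alerts)
```

The shape is the important part: each event is handled once, state is updated incrementally, and output (an alert) is produced within the same pass, rather than by querying a database the next day.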

Automotive: cars are increasingly internet-of-things devices. High-end cars these days, again as of the time of this recording, will typically have a 4G radio in them and connectivity to a data network for the many computers in that car. Yes, folks, your car is itself a distributed system, and it still works. Those computers have a lot of data that they're acquiring and things to report, so there will often be telemetry going back to a central headquarters, and sometimes even bi-directional communication to enable services the car provides that make driving a more pleasant experience. All of that is fundamentally evented data, and that stuff is typically built on top of an event-streaming platform like Apache Kafka. Real-time e-commerce: what's the story here? Commerce, of course, is also full of events. There are people clicking on websites, searching for things, putting things in carts. Those are all events, and you want real-time analysis of them to know how products are performing. You want a real-time, comprehensive view of each customer. That kind of thing is hard to do with traditional databases. It's doable, of course; you can build anything with just about anything. But the problem lends itself much more naturally to an event-streaming platform, and people are finding that they're able to build systems like that with less engineering effort. If you're the one building the system, that means you get to deliver cool functionality faster, with less pain to get that stuff built. And that's huge. Customer 360 in general, even outside the world of e-commerce: I'll just stop and ask you, if you're part of a large company, how many customer databases are there in your organization? Maybe you don't even know. Maybe you do know, and the answer is something like six or seven.
There are these different lines of business that all have different views of the customer, with different applications and different databases stood up around managing customers in some way, and they all see different things. The business, of course, wants a single integrated view of that customer. This is something people have been talking about for a long time, with master data management and the like, but we're finding that event streaming provides very interesting solutions to this problem, and by the end of this course you'll see how some of those pieces fit together and how that kind of thing might work. Banking: now maybe you're thinking, I wired money to somebody recently, and it took two and a half days to clear. You're not going to tell me that's event-driven.
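The Customer 360 idea above can be sketched as a fold over event streams: events from several source systems, keyed by a shared customer id, are merged one at a time into a single running record per customer. This is a toy in plain Python; the source-system names and field names are invented for illustration.

```python
def fold(view, event):
    """Merge one event into the running per-customer snapshot."""
    record = view.setdefault(event["customer_id"], {})
    record.update(event["data"])
    return view

# Events as they might arrive from different lines of business,
# all sharing a customer id (fields are hypothetical).
events = [
    {"customer_id": 42, "data": {"email": "a@example.com"}},    # from CRM
    {"customer_id": 42, "data": {"last_order": "2020-12-01"}},  # from orders
    {"customer_id": 42, "data": {"support_tickets": 2}},        # from support
]

view = {}
for e in events:
    fold(view, e)

# One integrated record, built incrementally from three systems' events.
print(view[42])
```

In a real deployment, each line of business would publish its events to Kafka topics, and a stream processor would maintain this merged view continuously instead of in a batch loop.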

Certainly, banking is one of those industries with legacy systems stretching back to the beginning of commercial computing, so there's a tremendous tradition of batch processing and antiquated data-communication protocols. But this is changing. Some of the largest banks in the world are deploying Kafka at the core for payment processing; you'd be surprised how much of this is happening. And there are some really exciting standards being developed. Taking two days to transfer money between banks is a little embarrassing these days; pretty soon that may no longer be true, and systems like Kafka will be at the center of that. Healthcare: a modern hospital is increasingly bristling with IoT devices. There are lots of medical devices that are connected to a network and reporting back, and there have been some really cool Kafka use cases here. For example, a hospital in Georgia used Kafka and Confluent KSQL to process data from pediatric intracranial pressure monitors, a really neat use case, all Kafka-based, because there are all these devices generating events and sending them somewhere to be processed and understood. By processed and understood, I mean we want to create information that helps healthcare professionals take better care of patients and leads to better health outcomes. And that's a real thing that's happening. Online gaming is another fundamentally event-driven thing. Every player who's doing anything, moving a control, tapping on a screen, hitting W-A-S-D, whatever it is they're doing to play the game, those are all events. And other in-game things, say in-game AIs that are moving around and come into proximity of a player: that's an event. All these things are events, and regardless of how the core game engine works, you need intelligence around those events.
Whether you're just trying to optimize in-game purchases by figuring out who's doing what right now, or it's part of the application that drives the core game, a lot of what happens in online gaming is fundamentally event-driven, and you have potentially very large numbers of events if the game achieves runaway success. So this is another key use case for Kafka. Government: there are many, many public-sector use cases for Kafka. That's not just the traditional three-letter agencies processing events for the sake of national security or law enforcement, but really anything in government. A lot of the same things that drive Kafka adoption in the private sector, like migration to microservices, exist in the public sector too.

Financial services broadly, as distinct from banking proper: again, there's a tremendous drive to be real-time and event-driven here. People expect to interact with financial services on mobile devices, and whether you know it or not, you expect your mobile device to respond to you right away, to be the thing in your pocket that vibrates and tells you, oh, something has happened in the world that I need to respond to. None of that happens in batch mode; those are all events that need to be processed right away. Kafka lies at the core of many of the businesses providing financial services that are making that real-time migration, which these days is pretty much all of them. So that gives you an idea of what people are doing with Apache Kafka and what kinds of things different industries are doing with it.
