Metamorphose your business with Kafka?

Event streaming and fast real-time data storage explainer.
11 July 2022

Get real-time data from multiple sources with event streaming.

If you’re thinking about using Kafka and getting into event streaming, it can feel a little daunting at first. After all, Kafka is technically a whole different way of thinking about information exchange and manipulation than the last handful of decades has taught you. The recent past in programming terms is not exactly the Jurassic era, but it is an age when databases strode the Earth. Like dinosaurs, they came in a whole range of sizes and friendliness levels, but the big ones were lumpy, clumpy, fairly set in their ways, and altogether less efficiently shaped than was ideal.

The dinosaurs, of course, went extinct, unable to deal with a radical change in the world around them.

That’s as far as we’re stretching the metaphor, we promise, but if the dinosaurs had been engineered more like Kafka, they might have survived. Kafka is a shift away from the monolithic transfer and interrogation of data sources toward something altogether more fluid and flexible, able to adapt to your particular needs in real time.

The Way Kafka Works

Instead of monolithic databases, Kafka breaks information storage and manipulation into events, rather than “things with states.” Events of course have states too (and they’re not necessarily what you’d think of as “events” in the wider world), but in Kafka, the events and their states are stored not in monolithic databases but in highly expandable, low-drama logs – simple, append-only files. And whereas with databases you need fairly large programs to update, manipulate, and output data on things and their states, with Kafka you can use lots of much smaller programs to do precisely what you want to the events stored in your logs.
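
As a concrete illustration of those “much smaller programs,” here’s a minimal sketch of appending an event to a log, using the community kafka-python client. The broker address, topic name, and event fields are all illustrative assumptions, not anything your setup will have by default:

```python
# A minimal producer sketch: each event is appended to the end of a log
# (a Kafka "topic") rather than updating a row in a table. Assumes a
# broker at localhost:9092; topic and field names are illustrative.
import json
from datetime import datetime, timezone

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda event: json.dumps(event).encode("utf-8"),
)

# One event: something that happened, with its state, at a moment in time.
event = {
    "type": "order_placed",
    "order_id": "A-1042",
    "total": 24.99,
    "at": datetime.now(timezone.utc).isoformat(),
}

producer.send("orders", value=event)  # append-only: nothing is overwritten
producer.flush()                      # make sure the event actually ships
```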

That’s where a lot of companies are going these days, because the clients are usually easily downloadable and save you writing tons of code to get your event data from A to B and back again. In fact, more than 80% of Fortune 100 companies now use Kafka, and its benefits are proving a wake-up call for the SME sector, too.

Event Streaming: The Basics

Event streaming is a fancy name for capturing data in real time from any event source you choose – databases, mobile devices, cloud services, software applications, and more. But it’s a fancy name for an impressive process – being able to capture all that data from multiple sources (hardware, software, cloudware and more) is a significant step forward from the age of the database. And because recording an event is quick (an append to a log rather than a full SQL transaction, for example), events can be captured at very high frequency, with very fine-grained timestamps.

Any time you capture multiple pieces of data-with-states – multiple events – from several event sources, what you have is an event stream. You can then store your selected event streams durably for later use; you can manipulate the events, process them, even have other hardware or software react to them, all in real time. Or, with the option of durable storage, you can capture them in real time now and work on them retrospectively – the model most commonly deployed.
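
That “capture now, work on it retrospectively” model is easy to sketch: because the log is durable, a consumer can start at the oldest retained event and replay everything. A minimal example, again with kafka-python and the same illustrative “orders” topic:

```python
# A minimal replay sketch: read durably stored events from the start of
# the log. Broker address and topic name are illustrative assumptions.
import json

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",  # begin at the oldest retained event
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for message in consumer:
    # Events arrive in the order they were appended to each partition.
    print(message.timestamp, message.value)
```

Point the same consumer at the tail of the log instead (the default) and it reacts to new events as they arrive – the real-time half of the same model.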

The Beret Example – And Beyond

So, if, for instance, you had an inventory system and you wanted to check sales of raspberry berets within a particular time period, you could easily use a small program to isolate sales invoices for raspberry berets in the last week. With the right modular programs, you could add almost infinitely to the complexity of your interrogation. Ralph Lauren raspberry berets, but not Chanel raspberry berets? Easy – you use the programs to interrogate and manipulate the identified raspberry beret sales events within the time period of interest. Ralph Lauren raspberry berets sold with the promotional shades? Again, the process is simpler and more fluid than interrogating an old-style database – potentially overnight – for sales data reports.
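
In code, that interrogation is just a filter over the event stream. A hedged sketch, assuming sale events carry “product”, “brand”, and “timestamp” fields – the field names, topic, and seven-day window are all illustrative:

```python
# Filter the stream for Ralph Lauren raspberry beret sales in the last
# week. Topic, broker, and field names are illustrative assumptions.
import json
from datetime import datetime, timedelta, timezone

from kafka import KafkaConsumer

one_week_ago = datetime.now(timezone.utc) - timedelta(days=7)

consumer = KafkaConsumer(
    "sales",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    consumer_timeout_ms=5000,  # stop iterating once the log runs dry
)

matches = [
    event
    for event in (message.value for message in consumer)
    if event["product"] == "raspberry beret"
    and event["brand"] == "Ralph Lauren"  # not Chanel
    and datetime.fromisoformat(event["timestamp"]) >= one_week_ago
]

print(f"{len(matches)} Ralph Lauren raspberry beret sales this week")
```

Tightening the query – promotional shades included, say – is another clause in the filter, not an overnight report run.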

That’s a peculiarly beret-driven example, but the flexibility of event streaming in Kafka means it can be used to do a whole range of things for an enormous variety of businesses:

  • You can use it to process payments in real time, which makes it useful for stock exchanges, insurance transactions, and more.
  • The logistics industry uses it to track vehicles across a fleet in real time, allowing companies to save time and money throughout the logistics chain.
  • Internet of Things (IoT) devices can supply continuous data for analysis through event tracking. From factories to wind farms, that’s useful for understanding the state of equipment and the productivity of locations, especially in settings that demand highly precise measurements on a fine-grained timescale.
  • The hotel and travel industry uses event streaming to capture, analyze, and react to customer interactions. The real-time capabilities of event streaming make this a particularly useful application, as issues can be analyzed and resolved fast, often while the customer is still present.
  • Hospitals and other healthcare facilities use event streaming to monitor patients, using the real-time event streams to predict changes in condition and the necessary treatment.
  • It can also make data from various departments within an organization available much more rapidly and accurately than an old-style database could. Combining lots of data sources into one stream is a big plus in many settings.

The variety of use cases in which event streaming can bring productivity, analysis, or even profit advantages means it’s a technology on the rise. It’s easier, more flexible, more fluid, and more scalable as a method of storing and manipulating data than old-style databases. And of course, you can get your events from a whole range of sources rather than from a single monolithic database – with the bonus that your chances of getting accurate, up-to-date information on which to base your business calls go up massively.

Event Streaming As A Nervous System

With event streaming, you not only get to take data from a wide range of sources, but you can also output it to a wide range of places simultaneously. That’s part of the power that event streaming represents: it helps you build a live, real-time, always-on information flow about almost anything you could possibly want to know. Event streaming has been likened to a human nervous system, and it’s a comparison that bears examination. If you step on a Lego brick in the middle of the night, you don’t wait 12 hours for the pain signal to travel to your brain, and then be emailed to your mouth, before you say “Ow!” With event streaming, you have the business-focused version of that lightning reaction to an unseen Lego brick.
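
That “wide range of places simultaneously” is built in: any number of consumer groups can read the same topic independently, each receiving every event. A minimal fan-out sketch – the topic and group names are illustrative:

```python
# Fan-out sketch: two services, two consumer groups, one stream. Each
# group independently sees every event on the topic; run each call in
# its own process. Topic and group names are illustrative assumptions.
from kafka import KafkaConsumer

def consume(group_id: str) -> None:
    consumer = KafkaConsumer(
        "sales",
        bootstrap_servers="localhost:9092",
        group_id=group_id,
    )
    for message in consumer:
        print(f"[{group_id}] event at offset {message.offset}")

# Process 1: consume("analytics-dashboard")
# Process 2: consume("stock-alerts")
```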

Event streaming helps you pull, contain, store, manipulate, and output the data you need at any time, from a wide range of sources to a wide range of locations, and so usually gives you much more fluidity and reaction speed in a business context.

See? Kafka would totally have outlived the dinosaurs.

The Technology of Kafka

Kafka itself is a clever way of handling the event streaming process. It’s made up of a distributed system of servers and clients, communicating via a high-performance TCP network protocol.

As with event streaming itself, Kafka is flexible: you can run it on commodity hardware, on virtual machines, on-premises, and in cloud environments. Your Kafka deployment can also span multiple data centers or cloud regions, increasing that flexibility still further. Kafka is usually run as a cluster of servers, and the principle of scalable logs repeats at this level, too – clusters are designed to be highly scalable and fault-tolerant, meaning that if one server suddenly fails, the others take over its work, with no data loss. Anyone who’s attempted to rescue data from a crashed MySQL database will value an alternative that’s more resilient.
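
That fault tolerance comes from replication: every partition of a topic’s log can be copied to several servers. A hedged sketch of creating such a topic with kafka-python’s admin client – the names and sizes are illustrative, and a replication factor of 3 assumes at least three brokers in the cluster:

```python
# Create a topic whose log is replicated across three servers, so the
# failure of any one of them loses no data. Names are illustrative.
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")

admin.create_topics([
    NewTopic(
        name="sales",
        num_partitions=6,      # six shards, for parallel readers
        replication_factor=3,  # three copies of every partition
    )
])

admin.close()
```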

Clients are those small programs – libraries, really – that let you write applications and microservices that read, write, and process streams of events, in parallel and at scale.
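
“In parallel, and at scale” mostly means partitions: start several copies of the same consumer with a shared group_id, and Kafka divides the topic’s partitions between them automatically. An illustrative sketch, where the topic, group name, and handle_invoice function are all assumptions:

```python
# Run this same script in several processes: consumers that share a
# group_id split the topic's partitions between them, so the group as
# a whole processes events in parallel, each event handled once.
from kafka import KafkaConsumer

def handle_invoice(event_bytes: bytes) -> None:
    """Hypothetical per-event processing; replace with real logic."""
    print("processing", event_bytes)

worker = KafkaConsumer(
    "sales",
    bootstrap_servers="localhost:9092",
    group_id="invoice-processors",  # same group in every process
)

for message in worker:
    handle_invoice(message.value)
```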

The Third-Party Option

If you go for Kafka, you get some of those clients included as standard, and the Kafka community has published plenty more, in a whole range of languages.

The downside, if there is one, to event streaming and Kafka is that you have to know what you’re doing when you set it up. Yes, the system is designed to be as intuitive as possible, and the library of available clients helps boost that intuitive feel, but plenty of companies still host their Kafka setup in the cloud and bring in a third-party company to manage the service, from setup to maintenance.

Whether you go it alone or employ a third-party expert to handle your Kafka implementation, there’s little doubt that it’s a technology to which many businesses are turning, thanks to its unique approach to data manipulation, its real-time data capture, and the highly distributed network it uses to process events.

Is Kafka for you? Probably, at some point. The real question is whether you can make use of real-time event data now, or whether you’re going to be fashionably late to the Kafka party.