Humans are now producing more data on a day-by-day basis than ever before. Every day, users on the internet generate roughly 2.5 quintillion bytes of data. The data varies from selfies and pictures of food to network data created every time we visit a website or suffer a DoS attack.
As humans become more ‘datafied’, the cyber security industry needs to ingest and make decisions on this vast information every day. With the pace of data creation showing no signs of slowing down, the industry needs to adapt to identify threats faster and respond more quickly to more events that could make or break a business.
As the amount of data grows, we need to be able to better understand this information and make decisions from it. Promising technologies like machine learning and neural networks dominate every sector of business today, but the problem is the very fact that they are still promising: they are not yet delivering in a practical sense.
It still takes a lot of work to create good insights from the information, often requiring a data scientist. And, while the data science field is growing rapidly every year, building and perfecting these algorithms takes time and money that many organizations just can’t afford.
Even after organizations invest in all this technology, it doesn’t solve the root problem of how we as humans interact with our data.
Today many interfaces are built around the technical challenges that developers encounter. To be frank, they look like they were created for computers, not humans. These interfaces are either a circus of input elements like checkboxes and text boxes where you don’t even know what to type in them…
…or so complex you have to be a database developer to write a simple query.
As humans, we don’t communicate this way. The above query is actually a simple question you might walk up and ask a colleague: How many sales did we close each day last week? Asking this question of a computer is clearly more complex.
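To make that gap concrete, here is a minimal sketch of what the same question can look like once it is written for a database. Everything in it is an assumption for illustration: the `sales` table, its `status` and `closed_on` columns, the sample rows, and the reading of "last week" as the previous Monday-to-Sunday week. It is not the query from any particular product.

```python
import sqlite3
from datetime import date, timedelta


def sales_per_day_last_week(conn, today):
    """Count closed sales per day for the Monday-Sunday week before `today`."""
    # Assumed (hypothetical) schema: sales(id, status, closed_on),
    # with closed_on stored as ISO date text such as '2024-03-04'.
    this_monday = today - timedelta(days=today.weekday())
    last_monday = this_monday - timedelta(days=7)
    query = """
        SELECT closed_on, COUNT(*) AS closed_sales
        FROM sales
        WHERE status = 'closed'
          AND closed_on >= ?
          AND closed_on < ?
        GROUP BY closed_on
        ORDER BY closed_on
    """
    params = (last_monday.isoformat(), this_monday.isoformat())
    return conn.execute(query, params).fetchall()


def build_demo_db():
    """Create an in-memory database with a few made-up sales rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (id INTEGER PRIMARY KEY, status TEXT, closed_on TEXT)"
    )
    rows = [
        ("closed", "2024-03-04"),
        ("closed", "2024-03-04"),
        ("open", "2024-03-05"),    # not closed, so not counted
        ("closed", "2024-03-06"),
        ("closed", "2024-03-11"),  # falls in the current week, so excluded
    ]
    conn.executemany("INSERT INTO sales (status, closed_on) VALUES (?, ?)", rows)
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    for day, count in sales_per_day_last_week(conn, date(2024, 3, 13)):
        print(day, count)
```

Even in this small sketch, the person asking has to know the table layout, decide what "last week" and "closed" mean, and express grouping and date arithmetic, none of which appears in the one-line question.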
Although software engineering is becoming a more common skill, that’s not a valid reason for us to avoid the root problem: Why can’t we ask simple questions of our data like we do of each other?
Another old-but-new technology emerging to deal with this issue is Natural Language Processing (NLP). This technology dates back to the early days of computing in the 1950s, but until recently we have not been able to harness it well. Even with all the technology at our disposal today, NLP, much like machine learning, is still emerging and relatively buggy.
Just try to ask Siri ‘What time does Best Buy open today?’ That’s a relatively easy question to process and understand, but our technology is still not quite there.
Now imagine trying to use this approach to ask an open-ended question of a petabyte of data about your network traffic as you work to understand whether there was a network attack. The level of complexity cyber security teams require of NLP goes above and beyond what our technologies are able to deliver today.
So it’s easy to see how human and technological limitations are currently holding security teams and analysts back from asking direct questions of our data. But how do we as an industry move forward to overcome these challenges? We’ll explore a variety of approaches and new developments in Part 2 of this post.
Austin McDaniel is a Software Architect with deep experience in building enterprise cybersecurity and data visualization platforms.
He’s a globally recognized leader in the software industry, speaking around the world and authoring popular open-source projects used by some of the largest organizations in the world.
Austin has helped build cybersecurity organizations from the ground up and has worked with some of the most prolific software and security organizations, including Google, RSA, the Department of Defense, and MasterCard. He is also one of the founding members of Swimlane, a cybersecurity Security Orchestration, Automation and Response (SOAR) platform.