While artificial intelligence and machine learning are far from new, many in security suddenly believe these technologies will transform their business and enable them to detect every cyber threat that comes their way. But instead, the hype may create more problems than it solves.
Recently, cybersecurity firm ESET surveyed 900 IT decision makers on their opinions of artificial intelligence and machine learning in cybersecurity practices.
According to the research, “the recent hype surrounding artificial intelligence (AI) and machine learning (ML) is deceiving three in four IT decision makers (75 percent) into believing the technologies are the ‘silver bullet’ to solving their cybersecurity challenges.”
The hype, ESET says, causes confusion among IT teams and could put organizations at greater risk of falling victim to cybercrime. According to ESET’s CTO, Juraj Malcho, “when it comes to AI and ML, the terminology used in some marketing materials can be misleading and IT decision makers across the world aren’t sure what to believe.”
Looking past the hype cycle, IT teams can achieve real value from machine learning and artificial intelligence available today.
Types of ‘Learning’
Despite what marketing-speak suggests, machine learning is not a single technique. It is usually implemented in one of two ways: supervised or unsupervised learning.
In supervised learning, the system is trained on collected data for which the expected output is already defined. This requires actual training of the system. In other words, a human must provide the expected output data to make the system useful. Most IT teams are reluctant to do this because it doesn’t remove the human from the system.
Unsupervised learning is what the market is looking for, because it does remove the human from training. This model needs no expected output. Instead, you feed data into the system and it looks for patterns on its own, building its model dynamically from what it finds.
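To make the distinction concrete, here is a minimal sketch in Python. It assumes a single made-up feature (failed logins per hour for a host), and every function name is hypothetical rather than taken from any product. The supervised half needs human-supplied labels; the unsupervised half only learns what "normal" looks like and flags values far from it.

```python
from statistics import mean, pstdev


def train_supervised(samples):
    """Supervised: samples are (value, label) pairs labeled by a human.

    Returns the mean value per label (a one-dimensional nearest-centroid model).
    """
    by_label = {}
    for value, label in samples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}


def predict_supervised(centroids, value):
    """Assign the label whose centroid is closest to the new value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


def fit_unsupervised(values):
    """Unsupervised: no labels, only the mean and spread of observed data."""
    return mean(values), pstdev(values)


def is_anomaly(model, value, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mu, sigma = model
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Illustrative data only (assumed, not from any real network).
labeled = [(1, "benign"), (2, "benign"), (3, "benign"),
           (40, "malicious"), (60, "malicious")]
centroids = train_supervised(labeled)

unlabeled = [1, 2, 3, 2, 1, 3, 2, 2]
baseline = fit_unsupervised(unlabeled)
```

With this toy data, `predict_supervised(centroids, 45)` returns a label a human defined, while `is_anomaly(baseline, 50)` can only say that the value is unusual, not why.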
Ask the Right Questions
Most IT teams simply want to ask broad questions, such as “find lateral movement,” and get results. Unfortunately, this is not possible today.
But you can use ML/AI to identify characteristics of lateral movement by asking questions like “Has this user logged in during this timeframe?” “Has the user ever connected to this server?” or “Does the user typically use this computer?” These types of questions are descriptive, not predictive. They infer answers by comparing new and historical data.
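As a rough illustration, the three questions above can each be answered by comparing one new event against that user's history. The sketch below is an assumption-laden example: the `LoginEvent` record, its field names, and the choice of "hour of day" as the timeframe are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LoginEvent:
    user: str
    source_host: str     # computer the user logged in from
    server: str          # server the user connected to
    timestamp: datetime


def describe(event, history):
    """Answer descriptive questions about `event` using past events only."""
    past = [e for e in history if e.user == event.user]
    return {
        # Has this user logged in during this timeframe (same hour of day)?
        "usual_hour": event.timestamp.hour in {e.timestamp.hour for e in past},
        # Has the user ever connected to this server?
        "known_server": event.server in {e.server for e in past},
        # Does the user typically use this computer?
        "usual_computer": event.source_host in {e.source_host for e in past},
    }


# Illustrative history and a new event (all values are made up).
history = [
    LoginEvent("alice", "laptop-1", "files-01", datetime(2018, 9, 3, 9, 5)),
    LoginEvent("alice", "laptop-1", "files-01", datetime(2018, 9, 4, 10, 12)),
]
new_event = LoginEvent("alice", "laptop-1", "db-07", datetime(2018, 9, 5, 3, 30))
answers = describe(new_event, history)
```

Here the computer is familiar, but the hour and the server are not. None of the answers predicts an attack; each one only describes how the new event differs from what has been seen before.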
Analysts follow an attack down a logical path and ask questions at each step. Computers identify deviations from baselines and determine the risk level by tracing the anomalies. This is the intersection where machines and humans come together for better results.
What Can Be Done Today With ML/AI?
In reality, you must establish a strong baseline of what normal data looks like to get value from ML/AI. Only then can you evaluate input data and make associations between the input data and the normal state of the network.
Here are threats that ML/AI can identify:
DNS Data Exfiltration
While DNS data exfiltration is difficult to prevent, it is easily detected because the system can examine DNS traffic and know when DNS queries go to an authoritative server but don’t receive a valid response. When queries like 0800fc577294c34e0b28ad2839435945.badguy.example[.]net are sent many times from a given network machine, the system can alert IT professionals.
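One simple way to approximate this check is to count, per client, the queries whose leftmost label is long and looks random. The sketch below is a heuristic under stated assumptions, not a description of any vendor's detector: the length and entropy thresholds are arbitrary, the input is assumed to be (client IP, query name) pairs, and the "no valid response" condition is left out because it depends on the log format. The sample name is the one quoted above, written without the defanging brackets.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(text):
    """Shannon entropy of a string, in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_encoded(label, min_length=20, min_entropy=3.0):
    """Heuristic: long, high-entropy labels often carry encoded data."""
    return len(label) >= min_length and shannon_entropy(label) >= min_entropy


def suspicious_clients(queries, min_queries=5):
    """Count encoded-looking queries per client; keep clients at or over the limit."""
    hits = defaultdict(int)
    for client, name in queries:
        first_label = name.split(".")[0]
        if looks_encoded(first_label):
            hits[client] += 1
    return {client: n for client, n in hits.items() if n >= min_queries}


# Illustrative query log (client addresses are made up).
exfil_name = "0800fc577294c34e0b28ad2839435945.badguy.example.net"
queries = (
    [("10.0.0.5", exfil_name)] * 6
    + [("10.0.0.9", "www.example.net")] * 10
    + [("10.0.0.9", exfil_name)]
)
flagged = suspicious_clients(queries)
```

Only the machine that sends such queries repeatedly is flagged. A single odd lookup from another client stays below the threshold, which keeps the alert tied to the "many times from a given network machine" pattern.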
Credential Misuse
According to Verizon’s 2018 Data Breach Investigations Report, humans are one of the biggest problems for organizations. Email was the vector in 96 percent of social attacks. On average, only 4 percent of people fall for any given phishing attack, but a malicious actor needs only one victim to provide credentials.
Machine learning is useful here because each user’s behavior has been baselined. Users typically connect to and log in to a consistent set of devices each day. It’s easy for a human to see when a credential is tried hundreds of times on a server, but it’s hard to catch someone who tries to connect to 100 different machines on the network and succeeds only once.
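The second pattern is the kind of thing a baseline comparison surfaces quickly. The following sketch assumes two hypothetical inputs: a history of (user, host) logins and one day's attempts as (user, host, succeeded) tuples. It flags any account that touches far more unfamiliar hosts than usual, whether or not the attempts succeed. The limit of 10 new hosts is an arbitrary illustrative value.

```python
from collections import defaultdict


def build_baseline(history):
    """Map each user to the set of hosts they have logged in to before."""
    baseline = defaultdict(set)
    for user, host in history:
        baseline[user].add(host)
    return baseline


def flag_credential_misuse(attempts, baseline, max_new_hosts=10):
    """Return users who attempted logins on more than `max_new_hosts` unfamiliar hosts."""
    new_hosts = defaultdict(set)
    for user, host, _succeeded in attempts:
        if host not in baseline.get(user, set()):
            new_hosts[user].add(host)
    return {
        user: len(hosts)
        for user, hosts in new_hosts.items()
        if len(hosts) > max_new_hosts
    }


# Illustrative data: bob normally uses two hosts, then his credential is
# tried on 100 different machines and works on only one of them.
baseline = build_baseline([
    ("bob", "ws-01"), ("bob", "files-01"),
    ("alice", "ws-02"), ("alice", "files-01"),
])
attempts = [("bob", f"host-{i:03d}", i == 42) for i in range(100)]
attempts.append(("alice", "ws-02", True))
flagged = flag_credential_misuse(attempts, baseline)
```

Counting distinct unfamiliar hosts, rather than failures on one server, is what catches the spread-out attempt: each individual machine sees only a single login try.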
While we are far from a type of artificial intelligence that can solve all cybersecurity problems, it is important to understand what’s real and what’s hype. As Malcho stated, “the reality of cybersecurity is that true AI does not yet exist. As the threat landscape becomes even more complex, we cannot afford to make things more confusing for businesses. There needs to be greater clarity as the hype is muddling the message for those making key decisions on how best to secure their company’s networks and data.”
Ultimately, the best solutions will be a combination of both supervised and unsupervised learning models: leveraging supervised learning to identify granular patterns of malicious behavior, while unsupervised algorithms develop a baseline for anomaly detection. Humans will not be eliminated from this equation any time soon.
(Justin Jett is director of audit and compliance at Plixer with roles ranging from system administration of web services to technical product marketing. He is a graduate of the University of Maine at Farmington and is an avid learner of all things security, with a particular interest in TLS and DNS attacks.)