Artificial Intelligence: A Cybersecurity Tool for Good, and Sometimes Bad

Attractive to both white-hats and cybercriminals, AI’s role in security has yet to find an equilibrium between the two sides.

Artificial intelligence is the new golden ring for cybersecurity developers, thanks to its potential not just to automate functions at scale but also to make contextual decisions based on what it learns over time. This can have big implications for security personnel—all too often, companies simply don’t have the resources to search through the haystack of anomalies for the proverbial malicious needle.

For instance, if a worker normally based in New York suddenly one morning logs in from Pittsburgh, that’s an anomaly — and the AI can tell it’s an anomaly because it has learned to expect that user to log in from New York. Similarly, if a log-in in Pittsburgh is followed within a few minutes by another log-in from the same user in, say, California, that’s likely a malicious red flag.

So, at its simplest level, AI and machine learning in security are oriented around understanding behavioral norms. The system takes some time to observe the environment and establish a baseline of normal behavior, so that it can then pick up on deviations from that norm by applying algorithmic knowledge to a data set.
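
The “impossible travel” example above can be expressed as a simple check even before any learning is layered on top. Below is a minimal sketch in Python; the Login record, the coordinates and the 900-km/h speed ceiling are illustrative assumptions, not drawn from any particular product:

```python
from dataclasses import dataclass
from datetime import datetime
from math import radians, sin, cos, asin, sqrt

@dataclass
class Login:
    user: str
    lat: float   # degrees
    lon: float   # degrees
    when: datetime

def haversine_km(a: Login, b: Login) -> float:
    """Great-circle distance between two login locations, in kilometers."""
    dlat, dlon = radians(b.lat - a.lat), radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

def impossible_travel(prev: Login, curr: Login, max_kmh: float = 900.0) -> bool:
    """Flag a pair of logins whose implied travel speed exceeds a plausible limit."""
    hours = (curr.when - prev.when).total_seconds() / 3600
    if hours <= 0:
        return True
    return haversine_km(prev, curr) / hours > max_kmh

# Pittsburgh at 9:00, then California five minutes later -> flagged.
a = Login("jdoe", 40.44, -79.99, datetime(2018, 9, 1, 9, 0))
b = Login("jdoe", 36.78, -119.42, datetime(2018, 9, 1, 9, 5))
print(impossible_travel(a, b))  # True
```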

AI for security can help defenders in myriad ways. However, there are also downsides to the emergence of AI. For one, the technology has also been leveraged by cybercriminals, and it’s clear that it can be co-opted for various nefarious tasks. These have included at-scale scanning for open, vulnerable ports – or the automated composition of emails that have the exact tone and voice of a company’s CEO, learned over time by 24/7 eavesdropping.

And in the not-too-distant future, that automatic mimicking could even extend to voice. IBM scientists for instance have created a way for AI systems to analyze, interpret and mirror users’ unique speech and linguistic traits – in theory to make it easier for humans to talk to their technology. However, the potential for using this type of capability for malicious spoofing applications is obvious.

[Please also see our sidebar to this story, “Pumping the Brakes on Artificial Intelligence,” which discusses artificial intelligence as an expanding threat surface.]

Meanwhile, the zeal for adopting AI across vertical markets – for cybersecurity and beyond – has opened up a rapidly proliferating new attack surface, one that doesn’t always feature built-in security-by-design. AI has the capacity to revolutionize any number of industries: offering smarter recommendations to online shoppers, speeding along manufacturing processes with automatic quality checks, or even tracking and monitoring wildfire risk, as researchers at the University of Alberta in Canada are doing [for more on this, please see the sidebar to this story].

This dual-nature aspect of AI – a force for good, a force for evil – has yet to find an equilibrium, but interest in AI for security continues to grow.

AI has received plenty of hype when it comes to its applicability to cybersecurity. Because AI relies on analyzing large amounts of data to find relevant patterns and anomalies, it can be asked to learn over time what constitutes a false positive and what doesn’t within the context of a certain prescribed set of policies. As such, it can be an immeasurable boon for intrusion prevention and detection, for instance, along with fraud detection and rooting out malicious activities such as DNS data exfiltration and credential misuse.

AI algorithms can be applied to user and network behavior analytics: machine learning, for instance, can look at the activity of people, endpoints and network devices like printers in order to flag potentially malicious activity by rogue insiders.

Similarly, AI has a role to play in web behavior analytics, which examines user interactions with websites and acts as a complement to online fraud detection.

For instance, say a user logs into a retail application, searches around the site, finds a product to learn more about, and then either saves that product to a shopping cart or checks out. That user now fits a behavior profile as a buyer. If, in the future, that user displays wildly different behavior on the same e-commerce site, the activity could be flagged for further investigation as a potential security event.
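
As a toy illustration of that buyer-profile idea, the sketch below averages action frequencies across a user’s past sessions and scores a new session by how far it deviates. The event names and the scoring rule are invented for illustration; a real web-behavior analytics pipeline would be far richer:

```python
from collections import Counter

# Hypothetical session event streams; a real pipeline would derive these
# from clickstream logs.
BASELINE_SESSIONS = [
    ["login", "search", "view_product", "add_to_cart", "checkout"],
    ["login", "search", "view_product", "view_product", "add_to_cart"],
]

def profile(sessions):
    """Average per-session frequency of each action type."""
    totals = Counter()
    for s in sessions:
        totals.update(s)
    return {action: n / len(sessions) for action, n in totals.items()}

def deviation_score(session, prof):
    """Sum of absolute differences between this session's action counts
    and the user's historical profile; higher means less buyer-like."""
    counts = Counter(session)
    actions = set(prof) | set(counts)
    return sum(abs(counts.get(a, 0) - prof.get(a, 0)) for a in actions)

buyer_profile = profile(BASELINE_SESSIONS)
odd_session = ["login", "change_email", "change_password", "add_payment_method"]
print(deviation_score(odd_session, buyer_profile))  # high score -> investigate
```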

On the DNS front, an AI system can examine DNS traffic to track when DNS queries go to an authoritative server but don’t receive a valid response. “While this is difficult to prevent, it is easily detected,” explained Justin Jett, director of audit and compliance at Plixer, in a recent column for Threatpost. “When queries like 0800fc577294c34e0b28ad2839435945.badguy.example[.]net are sent many times from a given network machine, the system can alert IT professionals.”
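
One way such a detection might be coded: flag machines that repeatedly send queries with long, random-looking (high-entropy) subdomain labels that never get a valid answer. This is a rough sketch; the thresholds and the shape of the event records are assumptions for illustration, not Plixer’s actual implementation:

```python
import math
from collections import Counter, defaultdict

def label_entropy(label):
    """Shannon entropy (bits per character) of a DNS label; encoded data
    tends to score higher than human-chosen names."""
    counts = Counter(label)
    return -sum((c / len(label)) * math.log2(c / len(label)) for c in counts.values())

def exfil_suspects(events, min_len=16, min_entropy=3.0, min_count=50):
    """events: iterable of (source_host, query_name, got_valid_answer) tuples.
    Flag hosts repeatedly sending long, high-entropy queries that go
    unanswered -- the pattern Jett describes."""
    per_host = defaultdict(int)
    for host, qname, answered in events:
        label = qname.split(".")[0]
        if not answered and len(label) >= min_len and label_entropy(label) >= min_entropy:
            per_host[host] += 1
    return [host for host, n in per_host.items() if n >= min_count]

# The example query from the column, repeated from one internal machine.
events = [("10.0.0.7", "0800fc577294c34e0b28ad2839435945.badguy.example.net", False)] * 60
print(exfil_suspects(events))  # ['10.0.0.7']
```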

Identifying credential-stuffing and misuse is another good example. This type of attack is becoming more and more common as people’s emails and passwords flow to the Dark Web from data breaches. The Equifax breach, for instance, resulted in millions of valid emails being exposed; and in 2016, attackers made off with credentials for 500 million accounts in the massive Yahoo! data breach. Because people tend to re-use passwords, criminals will try different sets of emails and passwords on random machines in various contexts, hoping to get a hit.

To identify this kind of attack, “AI is useful here because the users have been baselined,” explained Jett. “Those users connect to and log in to a set number of devices each day. It’s easy for a human to see when a credential is tried hundreds of times on a server, but it’s hard to catch someone that tries to connect to 100 different machines on the network and only succeeds once.”
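
Here is one way the baselining Jett describes could be expressed: compare the set of machines a user touches in a recent window against the set they historically log in to. The data shapes and the slack parameter are illustrative assumptions:

```python
from collections import defaultdict

def stuffing_suspects(attempts, baseline, slack=3):
    """attempts: (user, target_host, success) tuples from recent auth logs.
    baseline: user -> set of hosts that user normally logs in to.
    Flag users who touch far more distinct machines than their baseline,
    even if almost every attempt fails."""
    touched = defaultdict(set)
    for user, host, _success in attempts:
        touched[user].add(host)
    return [
        user for user, hosts in touched.items()
        if len(hosts) > len(baseline.get(user, set())) + slack
    ]

# A user who normally touches three machines suddenly tries 100.
baseline = {"jdoe": {"mail01", "files01", "vpn"}}
attempts = [("jdoe", f"srv{i:02d}", False) for i in range(100)] + [("jdoe", "srv42", True)]
print(stuffing_suspects(attempts, baseline))  # ['jdoe']
```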

AI also can be used to automatically evaluate open-source code for potential flaws. Cybersecurity firm Synopsys, for example, is using AI to automatically map publicly known vulnerabilities to open-source projects and to evaluate the risk impact for companies. For instance, it automatically analyzes hundreds of legal documents (licenses, terms of service, privacy statements, and privacy laws such as HIPAA and the DMCA) to determine the compliance risks of any detected vulnerabilities.

Yet another application on the vulnerability front is retrospection and prognostication. If a new vulnerability is announced, it becomes possible to go back through log data to see if it has been exploited in the past. Or, if it is indeed a new attack, the AI could evaluate whether the evidence is deterministic enough to indicate what the next steps could be for an attacker.
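
A minimal sketch of that retrospection step: once a signature for a newly announced flaw exists, sweep the archived logs for it. The advisory identifier, the regex and the file layout below are placeholders, not real indicators of compromise:

```python
import gzip
import re
from pathlib import Path

def retrospect(log_dir, signatures):
    """Scan archived (gzipped) logs for signatures of newly disclosed flaws.
    signatures: mapping of advisory id -> regex seen in exploit attempts."""
    compiled = {adv: re.compile(rx) for adv, rx in signatures.items()}
    hits = []
    for path in Path(log_dir).glob("*.log.gz"):
        with gzip.open(path, "rt", errors="replace") as fh:
            for lineno, line in enumerate(fh, 1):
                for adv, rx in compiled.items():
                    if rx.search(line):
                        hits.append((adv, path.name, lineno))
    return hits

# Placeholder signature for a hypothetical advisory -- illustration only.
print(retrospect("/var/log/archive", {"CVE-XXXX-YYYY": r"/cgi-bin/.*;echo;"}))
```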

AI also works very well for tedious, repetitive tasks – such as looking for specific patterns. As such, its implementation can alleviate the resource constraints faced by most security operations centers (SOCs), according to Greg Martin, CEO and co-founder of JASK. SOC personnel are fielding hundreds of security flags every day – not all of them actual attacks, of course. This requires them to do things like alert triage, building false-negative/false-positive decision trees, “swivel-chair” correlation across tools, and sifting RSS and email-list intelligence, he said.

“Security teams have always been overwhelmed by information,” said Scott Crawford, a research director at 451 Research, in an interview. “Information about what adversaries are doing, the latest attacker tools, malware variations and the ton of information generated by internal resources. In the intrusion-protection space alone, the amount of log data and alerts that are generated is overwhelming. The SIEM market arose in part to address this, surfacing stuff in principle only when there are things that actually need to be dealt with—but it hasn’t been enough. So now we see the rise in new techniques for handling data at scale and getting meaning out of it with analytics and AI.”

Despite all of its utility in the security space, companies should be careful to understand AI’s limitations; these engines are only as good as the data that goes into them, and merely inputting data into an algorithm will tell an analyst what’s unusual, but not whether it matters. The data scientist who establishes the parameters for the AI needs to know how to ask the right questions to properly harness the AI’s capabilities. What is the AI supposed to be looking for? Once it finds whatever that is, what should it do with the result? Often, complex flow charts are needed to program the AI for the desired results.

To put it in concrete terms, it’s easy to train an AI to, say, pick up on the fact that an asteroid in the asteroid belt is moving anomalously. But if the goal is to know whether it’s headed for Earth, that requires fine-tuning.

And, with so much valuable company information flying around today’s digital workplace, a failsafe in the form of human supervision is a good idea. Simply setting the AI on network watchdog duty could have unintended consequences, such as overzealous quarantining of documents, the deletion of important data or mass rejections of legitimate messages – something that could significantly hamper productivity. For instance, in the previous log-in scenario hypothetically flagged by the AI, the employee could simply be traveling, so closing down access is probably not the best idea.

“No machine can be perfect and account for every potential possibility of behavior out there,” said Nathan Wenzler, principal security architect with AsTech Consulting, in an interview. “Which means it still requires human attention, otherwise you may end up with a lot of legitimate things being flagged as ‘bad,’ or malware and other attacks getting through because they’ve been coded to be seen as ‘good.’ The algorithms involved can only be so good, and yes, refined over time. But, attacks will get smarter as well and find ways to circumvent the learning processes in order to still be effective.”

And, because there will still need to be people who can make sound judgment calls about the anomalies that come up, reducing the number of cases that require human follow-up should be a consideration too. A manual investigation might be launched in the form of a quick email to the employee in question—which sounds like no big deal until one considers that there could be hundreds if not thousands of these types of anomalies happening every few minutes in a large corporation.

The best approach to making the most of AI in cybersecurity is a combination of “leveraging supervised learning to identify granular patterns of malicious behavior, while unsupervised algorithms develop a baseline for anomaly detection,” Jett explained. “Humans will not be eliminated from this equation any time soon.”
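
In code, that combination might look something like the sketch below, which pairs an unsupervised anomaly detector (the learned baseline) with a supervised classifier trained on labeled malicious examples. The features and numbers are invented for illustration; this is not any vendor’s actual pipeline:

```python
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical per-event features, e.g. [bytes_out, distinct_hosts, failed_logins].
normal = rng.normal(loc=[500, 3, 1], scale=[100, 1, 1], size=(500, 3))
labeled_bad = rng.normal(loc=[5000, 40, 30], scale=[500, 5, 5], size=(50, 3))

# Unsupervised: learn a baseline of "normal" and score deviations from it.
baseline = IsolationForest(contamination=0.01, random_state=0).fit(normal)

# Supervised: learn granular patterns from examples labeled good/bad.
X = np.vstack([normal, labeled_bad])
y = np.array([0] * len(normal) + [1] * len(labeled_bad))
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

event = np.array([[4800, 35, 25]])
is_anomalous = baseline.predict(event)[0] == -1   # deviates from the baseline?
looks_malicious = clf.predict(event)[0] == 1      # matches a known-bad pattern?
print(is_anomalous, looks_malicious)              # alert if either fires
```

The design point is that the two models fail differently: the supervised model catches variations of attacks it has seen labeled before, while the unsupervised baseline can surface novel behavior that no one has labeled yet.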

The other side of the AI story is that as these engines’ capabilities become more powerful and widespread, cybercriminals have copped on to the fact that they, too, can leverage the technology — specifically to carry out cyber-attacks cheaper and easier than ever before.

For instance, AI can increase the effectiveness of attacks through, say, the automation of spear-phishing, using real-time speech synthesis for impersonation attacks and fraud, or for carrying out activities such as packet-sniffing and vulnerability-hunting at scale, according to the Malicious Use of Artificial Intelligence report. The report also noted that AI could also be used for exploiting existing software vulnerabilities on a mass level (e.g. automated hacking of tens of thousands of machines per day).

“The use of AI to automate tasks involved in carrying out cyber-attacks will alleviate the existing trade-off between the scale and efficacy of attacks,” the report authors said.

None of this is merely theoretical. In 2017, cybersecurity firm Darktrace observed an attack in India that used “rudimentary” AI to observe and learn patterns of normal user behavior inside the network, for reconnaissance. The malicious activity could also start to parse specific users’ communication patterns, in order to mimic their tone and style. This could be used for the automated composition of business email compromise messages, for example, which would be much more effective and convincing than the standard social-engineering attempt.

On a similar note, the Malicious Use report also noted that AI can be used to automate tasks involved in analyzing mass-collected data, expanding threats associated with privacy invasion and social manipulation and more.

“We also expect novel attacks that take advantage of an improved capacity to analyze human behaviors, moods and beliefs on the basis of available data,” the report cautioned. “These concerns are most significant in the context of authoritarian states, but may also undermine the ability of democracies to sustain truthful public debates.”

Another AI-driven development is the rise of botnet swarms, as seen with the recent Hide and Seek botnet. Hide and Seek is a self-learning cluster of compromised devices, and the world’s first botnet to communicate via a custom-built peer-to-peer protocol. Traditional botnets wait for commands from a bot herder; swarms are able to make decisions independently.

“They can identify and assault – or swarm – different attack vectors all at once,” said Derek Manky, global security strategist at Fortinet and FortiGuard Labs, in a recent Threatpost  interview. “Swarms accelerate the attack chain – or attack cycle. They help attackers move fast. Over time, as defenses improve, the window of time for an attack is shrinking. This is a way for attackers to make up for that lost time.”

Perhaps to keep pace with the bad guys’ efforts, AI is coming into its own from a security standpoint, and companies are building it into their security offerings more frequently. Going forward, the emphasis is on more fully applying it to a rapidly accelerating and complex threat landscape.

“What we’ve seen is continuing sophistication of attacks, coming from a backdrop of security departments being under-resourced and not really knowing where they should put their spend and their people,” said Steve Durbin, managing director at the Information Security Forum, in an interview. “All of this is happening amidst an increasingly complex environment with more and more IoT devices taking feeds from various sources, plus there’s often a hugely complex third-party supply chain. AI is becoming necessary to get one’s arms around all of this.”

The Nirvana, he explained, is the capability to navigate this attack surface end to end. From there, the goal is to anticipate attacks before they happen, or to proactively head off cybercriminals before they get to the network in the first place.

There is some evidence that this is starting to happen. For instance, IBM – which employs its Watson AI and advanced analytics to monitor 60 billion security events on a daily basis – has developed an AI-based “cognitive honeypot” to proactively bait hackers into spending valuable time and resources on nonexistent leads. The technology lures malicious hackers in with email exchanges and interactive websites that divert their attacks.

“IBM with Watson clearly demonstrates what the capability of machine learning is, and what it can actually do from an overall business perspective,” Durbin said. “Information that may appear to be completely unrelated can now be tied together. However, vendors are in the early stages in terms of the maturity of their solutions. From an organizational standpoint, I am in favor of security departments running pilots and working collaboratively with vendors to develop tools going forward.”
