An attack on a high-performance computer (HPC) could stall life-saving pharmaceutical research, for example into COVID-19, and have implications for national security. What does it mean to stand guard over supercomputers?
Some say that the global race to build a more powerful supercomputer is actually a race for knowledge. HPCs perform pharmaceutical research, simulate nuclear tests, and forecast weather and climate trends, among other tasks.
The systems that perform high-speed computations are built on traditional hardware, so many of their cybersecurity issues are similar to those of conventional computers. But the damage caused by a cyberattack might be irreparable.
Highly valuable data
“Supercomputers are used to study everything from nuclear weapons to diseases, to the electrical power grid infrastructure, and there are many other examples in which scientists are seeking to validate scientific discoveries or theories. So losses can accrue to national security and our understanding of the universe,” computer scientist Kevin Barker told CyberNews.
Supercomputers are under a constant threat of attackers seeking to plant malicious software.
Together with his colleague Ang Li at the Pacific Northwest National Laboratory (PNNL), Barker has developed a system to ferret out questionable use of HPC systems within the US Department of Energy (DOE).
“The results of computations run on supercomputers not only impact scientific discovery but may also influence national policy decisions and even public health policy. We have seen that this summer with attacks on European supercomputing centers that seemed to target ongoing COVID-19 research data,” Kevin Barker, director of PNNL’s Center for Advanced Technology Evaluation (CENATE), told CyberNews.
In May, European supercomputers were hacked in mysterious cyberattacks – criminals went after HPCs in Germany, the UK, and Switzerland. The attackers might have been after intellectual property or wanted to slow down the COVID-19 research.
Some supercomputers cost hundreds of millions of dollars and are deployed for no more than five years, so keeping one offline is expensive.
“A supercomputer being offline due to cyberattacks also costs researchers time and compromises our ability to conduct research, potentially with serious consequences. Many critical software applications may run for weeks or months on a large system; a compromise may require portions of computation to be re-executed and require potentially compromised data to be discarded. These costs can all be significant,” Kevin Barker explained.
Particularly attractive targets
According to Kevin Barker, cyberattacks that target central processing unit (CPU) or graphics processing unit (GPU) architectures may potentially compromise supercomputers, as well as many other types of computers. However, there are some key differences in the way supercomputers are used, and they are particularly attractive targets.
“Supercomputers not only provide huge amounts of computational power, but they also tend to work on sensitive data, often with national security implications. Attackers may not be simply attempting to steal the computational capabilities of supercomputers,” said Kevin Barker, adding that more often, attackers are targeting the data that supercomputers process.
He explained that some large-scale computing centers are used solely by one organization, which makes the task of vetting user jobs easier. But supercomputers at universities and within the U.S. Department of Energy are used by researchers across the nation and, in some cases, around the world. This means that a wide variety of tasks from many different domains are being carried out.
“Specifically, it is often the policy that the users of a supercomputer system are vetted while the actual applications they run are not,” Kevin Barker explained.
The misuse of supercomputers could result in financial or information losses and have national security implications. For example, because research projects and organizations are typically billed for usage, stolen time on a supercomputer may result in charges to organizations that did not actually use the system.
Attackers might invalidate scientific experiments by accessing and manipulating data. Corporate espionage could result in an unfair advantage to a competitor.
“Sophisticated attacks of international espionage could result in sensitive information being leaked to a foreign adversary,” Kevin Barker said.
According to him, cyberattacks on large supercomputing systems can come from many different sources, ranging from unsophisticated, unorganized attackers that are attempting to break into a system out of curiosity, to highly sophisticated attacks orchestrated by nation-states seeking to commit acts of espionage.
“Since the attackers are so varied, the defenses used to protect systems need to be aware of many different attack types. This is why Machine Learning is a useful technology to employ here – its ability to learn from attacks that it has seen allows Machine Learning algorithms to be trained to detect and flag or remove unusual behavior resulting from a wide array of attack types,” he said.
Machine Learning algorithms
CENATE has led the development of Machine Learning methods, such as recurrent neural networks (RNNs), to classify the distinctive signatures of authorized and unauthorized workloads. With a prediction accuracy of more than 95 percent, this open-source framework can assist system administrators in identifying and removing unauthorized workloads and intruders, assuring system availability and integrity for legitimate scientific users.
“Machine Learning techniques can help to secure supercomputing resources by learning what is appropriate usage and what is not, and flagging or removing suspicious activity,” Kevin Barker told CyberNews.
In essence, Machine Learning algorithms can build an understanding of “normal” and “abnormal” behavior.
“Machine Learning algorithms can sift through the huge amounts of data gathered (such as what components of the system are utilized, how much data is moved through the system, and even how power is consumed by different components of the computer) during the execution of many programs submitted to the supercomputer by many users,” he said.
Algorithms can identify unusual data transfers or accesses from unusual locations and remove potentially malicious software.
Ang Li and colleagues created their own data set from publicly available sources such as GitHub, GitLab, and Bitbucket to identify fingerprints of malicious activity such as cryptomining applications, password cracking activity, or longer-than-customary computer runtimes.
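To make the idea of learning “normal” and “abnormal” behavior concrete, here is a minimal sketch in Python. It is not the PNNL framework, which the article says uses methods such as recurrent neural networks; it swaps in a much simpler statistical baseline (per-feature z-scores) purely for illustration. The feature names (`gpu_util`, `gb_moved`, `avg_power_w`), the sample values, and the threshold are all invented assumptions, loosely inspired by the kinds of telemetry Barker mentions: component utilization, data movement, and power consumption.

```python
from statistics import mean, stdev

# Hypothetical telemetry from past jobs assumed to be legitimate.
# Feature names and numbers are invented for illustration only.
BASELINE_JOBS = [
    {"gpu_util": 0.80, "gb_moved": 400.0, "avg_power_w": 2900.0},
    {"gpu_util": 0.85, "gb_moved": 420.0, "avg_power_w": 3000.0},
    {"gpu_util": 0.78, "gb_moved": 390.0, "avg_power_w": 2850.0},
    {"gpu_util": 0.83, "gb_moved": 410.0, "avg_power_w": 2950.0},
    {"gpu_util": 0.81, "gb_moved": 405.0, "avg_power_w": 2920.0},
]


def fit_baseline(jobs):
    """Learn the mean and standard deviation of each feature from normal jobs."""
    baseline = {}
    for feature in jobs[0]:
        values = [job[feature] for job in jobs]
        baseline[feature] = (mean(values), stdev(values))
    return baseline


def anomaly_score(job, baseline):
    """Return the largest absolute z-score across all features of a job."""
    scores = []
    for feature, (mu, sigma) in baseline.items():
        if sigma == 0:
            continue  # a constant feature carries no signal in this sketch
        scores.append(abs(job[feature] - mu) / sigma)
    return max(scores, default=0.0)


def is_suspicious(job, baseline, threshold=3.0):
    """Flag a job whose telemetry deviates strongly from the learned baseline."""
    return anomaly_score(job, baseline) > threshold


baseline = fit_baseline(BASELINE_JOBS)

# A job that resembles the baseline workloads.
typical_job = {"gpu_util": 0.82, "gb_moved": 408.0, "avg_power_w": 2930.0}
# A job with near-saturated GPUs, almost no data movement, and high power
# draw: a made-up profile standing in for something like cryptomining.
odd_job = {"gpu_util": 0.99, "gb_moved": 5.0, "avg_power_w": 3400.0}

print(is_suspicious(typical_job, baseline))
print(is_suspicious(odd_job, baseline))
```

A real system would need far more data, richer features, and a model that captures how a workload behaves over time, which is where sequence models such as the RNNs mentioned above come in. A flagged job would also normally be passed to an administrator for review rather than removed automatically.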