'Predicting the future' to stay one step ahead of hackers

A pair of computer scientists have developed an early warning system that can accurately “predict the future” by highlighting which websites are likely to fall victim to hackers, months before an attack takes place.



The servers that host websites are often targeted by hackers, who exploit known but unpatched vulnerabilities to sneak in and plant malicious code in web pages without their owners’ knowledge. Many of these security holes could be closed simply by updating the software to its latest version.

This embedded code can be used to spread malware to any computer that visits the website, or to “poison” search engine results, using search engine optimisation (SEO) techniques to promote certain content.

By using machine learning and data mining techniques to analyse 444,519 websites, comprising a total of 4,916,203 pages, the researchers were able to narrow down a list of the sites likely to be targeted.
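The article does not say which machine learning method the researchers used. As a rough illustration only, the sketch below swaps in plain logistic regression, written with Python’s standard library: it learns from a handful of made-up sites, labelled by whether they were later compromised, and then scores new sites so that only the likely targets make the list. The features, the data and the `train` and `predict` helpers are all hypothetical.

```python
import math


def predict(weights, bias, x):
    """Logistic model: estimated probability that a site with features x is compromised."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the average log-loss."""
    weights, bias, n = [0.0] * len(rows[0]), 0.0, len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(rows, labels):
            error = predict(weights, bias, x) - y
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical training data. Features: [outdated CMS, shared hosting, spam keywords];
# label 1 means the site was compromised within the following year.
rows = [[1, 1, 1], [1, 0, 1], [1, 1, 0], [0, 0, 0], [0, 1, 0], [0, 0, 1]]
labels = [1, 1, 1, 0, 0, 0]
weights, bias = train(rows, labels)

# Score sites and keep only those the model rates as likely targets, highest first.
candidates = {"a.example": [1, 1, 1], "b.example": [0, 0, 0], "c.example": [1, 0, 1]}
scores = {site: predict(weights, bias, x) for site, x in candidates.items()}
watchlist = sorted((s for s in scores if scores[s] >= 0.5), key=scores.get, reverse=True)
print(watchlist)
```

On this toy data the watchlist contains `a.example` and `c.example` but not `b.example`, since the outdated-CMS feature alone separates the made-up examples.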

It is thought that the tool could be useful to search engines, which are keen to remove links to potentially dangerous sites from their results, and that it could also be used to compile a blacklist of potential targets so that their owners can be warned.

The software uses a variety of clues to determine which websites are at risk, such as their content, where they are hosted and which content management system (CMS) they run. It also looks for particular keywords within the pages themselves.
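A minimal sketch of how such clues might be turned into features, assuming the raw HTML of a page and its HTTP `Server` header are available. The keyword list, the use of the `<meta name="generator">` tag to spot the CMS, the use of the `Server` header as a stand-in for hosting information, and the `extract_features` helper are all illustrative assumptions, not the researchers’ method.

```python
import re

# Hypothetical keyword list: the article does not say which keywords were used.
WATCH_KEYWORDS = ("casino", "pharmacy", "payday")

# Many CMSs advertise themselves in a <meta name="generator"> tag.
# This simple pattern only handles the name-before-content attribute order.
GENERATOR_RE = re.compile(
    r'<meta[^>]+name=["\']generator["\'][^>]+content=["\']([^"\']+)["\']',
    re.IGNORECASE,
)


def extract_features(html: str, server_header: str) -> dict:
    """Turn one page and its HTTP Server header into simple risk features."""
    match = GENERATOR_RE.search(html)
    parts = match.group(1).split() if match else []
    text = html.lower()
    features = {
        # Which CMS the site runs, and which version (out-of-date versions are a risk).
        "cms": parts[0].lower() if parts else "unknown",
        "cms_version": parts[1] if len(parts) > 1 else "unknown",
        # Stand-in for a hosting clue: the web server software named in the header.
        "server": server_header.split("/")[0].lower() or "unknown",
    }
    # Content clues: does the page mention any of the watched keywords?
    for keyword in WATCH_KEYWORDS:
        features[f"kw_{keyword}"] = keyword in text
    return features


page = '<meta name="generator" content="WordPress 3.5.1"><p>Cheap pharmacy deals</p>'
print(extract_features(page, "Apache/2.2.22"))
```

For the sample page, this reports the CMS as `wordpress`, version `3.5.1`, the server as `apache` and `kw_pharmacy` as `True`, with the other keyword flags `False`.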

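Checking their predictions one year later, the researchers found a true positive rate of 66 per cent and a false positive rate of 17 per cent. In other words, the system flagged two-thirds of the sites that went on to be compromised, but also wrongly flagged 17 per cent of those that did not.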
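The two rates can be worked out from counts of correct and incorrect predictions. The article gives only the percentages, so the counts below are invented purely to reproduce them, and the `detection_rates` helper is hypothetical.

```python
def detection_rates(tp, fn, fp, tn):
    """Return (true positive rate, false positive rate) from confusion-matrix counts."""
    tpr = tp / (tp + fn)  # share of later-compromised sites that were flagged
    fpr = fp / (fp + tn)  # share of never-compromised sites that were wrongly flagged
    return tpr, fpr


# Hypothetical counts chosen to reproduce the reported rates:
# 100 sites that were later compromised, 1,000 that were not.
tpr, fpr = detection_rates(tp=66, fn=34, fp=170, tn=830)
precision = 66 / (66 + 170)  # share of flagged sites that really were compromised
print(f"TPR {tpr:.0%}, FPR {fpr:.0%}, precision {precision:.0%}")
# prints: TPR 66%, FPR 17%, precision 28%
```

With these assumed numbers, 170 false alarms would outnumber the 66 correct warnings, because far more sites stay safe than are attacked. The real balance depends on what share of sites is actually compromised, which the article does not report.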

The researchers said the results were “very encouraging” given that they were “essentially trying to predict the future”.

“Our implementation illustrates that even with a modest dataset, decent performance can be achieved,” they said.

The researchers are currently working on making their software publicly available.