Protecting Campus from Malware with Machine Learning

Summary

CylancePROTECT is no longer available to the SMU community and has been replaced by Microsoft Defender.

This service has ended. Microsoft Defender is now the preferred anti-virus platform for SMU; more information is available at smu.edu/defender.


CylancePROTECT was introduced to the SMU campus in 2016 as a way to further secure SMU systems against viruses and malware. As the University was regularly inundated with malicious files, employees could not be expected to keep up with the volume and complexity of these new threats. As the threats evolved, so did our method of protection: machine learning.

A new way to protect

In the past, companies would hire workers to sort through information and separate clean files from those that contain threats. Because millions of files, both malicious and non-malicious, are created regularly, those workers are overwhelmed by the volume of data, which delays responses to threats and causes some to be missed altogether. In 2016 alone, SMU systems processed more than 489,000,000 malicious emails. Although the security field has advanced in areas such as vulnerability analysis, the underlying problem remains: human perspective and bias interfere with the analysis and err toward over-simplification. Cylance’s solution was to use mathematical modeling and machine learning as an alternative to manual review.

Machine Learning by Collection and Extraction

Machine learning, a branch of artificial intelligence, allows a system to learn from the data it has processed and to make predictions based on it. This happens in four phases: collection, extraction, learning, and classification. The collection process examines each file to determine its type (.doc, .pdf, .xls, .java, etc.) and then places it into one of three categories: “known and verified valid, known and verified malicious, and unknown.” This review is performed on hundreds of millions of files from industry, proprietary organizations, and active Cylance agents. The process then moves from collection to extraction, where each file’s unique characteristics are reviewed according to its type. Using this method, Cylance removes the human bias that would occur during manual classification of files. A “file genome,” similar in concept to human DNA analysis, is then created, and mathematical models use it to determine the characteristics of the file.
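The collection and extraction steps can be illustrated with a short Python sketch. It is not Cylance’s implementation, which is proprietary: the magic-byte table, the hash lists, and the chosen features (file type, size, byte entropy, printable-character ratio) are all illustrative assumptions. It shows the two ideas described above: identifying a file’s type from its contents and sorting it into one of the three buckets, then reducing the raw bytes to numbers a model can learn from.

```python
import hashlib
import math
from collections import Counter
from enum import Enum

# Hypothetical signature table: leading "magic" bytes for a few file types.
SIGNATURES = {
    b"%PDF": "pdf",
    b"{\\rtf": "rtf",
    b"MZ": "windows_executable",
    b"PK\x03\x04": "zip_or_office_openxml",  # .docx, .xlsx, .jar, ...
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "ole2_office",  # legacy .doc, .xls
}


class Label(Enum):
    """The three collection buckets described in the article."""
    KNOWN_VALID = "known and verified valid"
    KNOWN_MALICIOUS = "known and verified malicious"
    UNKNOWN = "unknown"


def detect_type(data: bytes) -> str:
    """Identify the file type from its leading bytes, not its extension."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return "unknown"


def collect(data: bytes, valid: set, malicious: set) -> Label:
    """Bucket a file by its SHA-256 digest using hypothetical hash lists."""
    digest = hashlib.sha256(data).hexdigest()
    if digest in malicious:
        return Label.KNOWN_MALICIOUS
    if digest in valid:
        return Label.KNOWN_VALID
    return Label.UNKNOWN


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, from 0 to 8 bits per byte."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def extract_genome(data: bytes) -> dict:
    """Turn raw bytes into a few numeric features (a toy 'file genome')."""
    printable = sum(1 for b in data if 32 <= b < 127)
    return {
        "file_type": detect_type(data),
        "size_bytes": len(data),
        "entropy": byte_entropy(data),
        "printable_ratio": printable / len(data) if data else 0.0,
    }
```

High byte entropy combined with few printable characters is a common hint that content is compressed or packed, which is why those two features appear here. A production system would extract far more characteristics per file; four are enough to show the shape of the data that the learning phase consumes.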

The Learning in Machine Learning

Once extraction is complete, the learning and training phase of machine learning begins. In this phase, Cylance mathematicians develop statistical models to predict which files are malicious. Numerous models are measured and tested, and only those that pass multiple levels of testing are put into production for classification, the final phase.

In classification, each file is broken down into thousands of characteristics, which are analyzed against those of other files to distinguish legitimate files from malware. Because the models are tested so rigorously, this review is highly accurate, and it takes only milliseconds to perform. A “confidence score” is then generated and provided to the analyst, who can decide whether to block the file, quarantine it, or send it for further review. With this mathematical approach, the models quickly determine whether a file is malicious.
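The learning and classification phases can be sketched with plain logistic regression in Python. This is a stand-in chosen for illustration, not Cylance’s actual models: the two toy features, the made-up training data, and the block/quarantine/review thresholds are all assumptions. Training fits weights to labeled files; classification turns a new file’s features into a number between 0 and 1 that plays the role of the confidence score.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Learning phase: fit logistic-regression weights by batch gradient descent.

    X is a list of feature vectors; y holds labels (1 = malicious, 0 = valid).
    """
    n_features, m = len(X[0]), len(X)
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log loss: (prediction - label) times the input.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def confidence_score(w, b, x) -> float:
    """Classification phase: estimated probability that file x is malicious."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def triage(score: float, block_at=0.9, quarantine_at=0.6, review_at=0.3) -> str:
    """Map a score to a suggested action; the thresholds are hypothetical."""
    if score >= block_at:
        return "block"
    if score >= quarantine_at:
        return "quarantine"
    if score >= review_at:
        return "review"
    return "allow"


# Toy, made-up training data: [entropy / 8, printable_ratio] for four files.
X = [[0.95, 0.10], [0.90, 0.20], [0.30, 0.90], [0.40, 0.80]]
y = [1, 1, 0, 0]
w, b = train_logistic(X, y)
```

Keeping the score separate from the decision means the thresholds in `triage` can be tuned to trade false positives against missed threats without retraining the model.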

From Learning to Practice

Microsoft Security Advisory 2953095

In 2014, a new vulnerability was discovered in Microsoft Word. This RTF vulnerability allowed an attacker to run code remotely on a vulnerable computer. Of 51 anti-virus engines, only 4 detected the zero-day exploit. Cylance recognized the exploit immediately, without needing an update.

You can read more about machine learning in Cylance’s whitepaper, “Math vs. Malware.” For questions about CylancePROTECT, please contact the IT Help Desk at 214-768-HELP.


Published by

Ian Aberle

Ian Aberle is an Adobe Creative Educator and the Senior IT Communications Specialist & Trainer for the Office of Information Technology (OIT). For over 25 years, he has helped the SMU community use technology and implement digital and web media through multiple roles with the Digital Commons, SMU STAR Program, and now OIT. Ian enjoys photography and road trips with his family in his free time.