CylancePROTECT was introduced to the SMU campus in 2016 as a way to further secure SMU systems against viruses and malware. As the University became regularly inundated with malicious files, employees could not be expected to keep up with the volume and complexity of these new threats. As the threats evolved, so did our method of protection – machine learning.
A new way to protect
In the past, companies would hire workers to sort through files and separate those that are clean from those that contain threats. Because millions of files, both malicious and non-malicious, are created regularly, these analysts are overwhelmed by the magnitude of data, which delays the response to some threats and causes others to be missed altogether. SMU systems processed over 489,000,000 malicious emails in 2016 alone. Though advancements in the security world include vulnerability analysis, the underlying problem is that human perspective and bias interfere and err toward over-simplification. As an alternative, Cylance’s solution was to use mathematical modeling and machine learning as a path to a more secure future.
Machine Learning by Collection and Extraction
Machine learning, a branch of artificial intelligence, allows a system to learn and to make predictions based on data it has processed. This happens through four phases: collection, extraction, learning, and classification. The collection process determines what type of file is being read (.doc, .pdf, .xls, .java, etc.) and places the file into one of three categories: “known and verified valid, known and verified malicious, and unknown.” This review is done with hundreds of millions of files from industry, proprietary organizations, and active Cylance agents. The process then moves from collection to extraction, where the file’s unique characteristics are reviewed according to its file type. By using this method, Cylance removes the human bias that would occur during manual classification of files. A file genome, much like the genome produced in human DNA analysis, is then created, and mathematical models use it to determine the characteristics of a file.
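To make the collection and extraction phases more concrete, here is a minimal Python sketch. It is an illustration only and not Cylance’s actual implementation: the reference sets `KNOWN_VALID` and `KNOWN_MALICIOUS`, the function names, and the three features chosen (file size, byte entropy, and share of printable characters) are all assumptions made for this example.

```python
import hashlib
import math
from collections import Counter
from pathlib import Path

# Hypothetical reference sets of SHA-256 digests (assumption for this sketch).
# A real system would consult very large curated collections of files.
KNOWN_VALID = {hashlib.sha256(b"hello world").hexdigest()}
KNOWN_MALICIOUS = {hashlib.sha256(b"evil payload").hexdigest()}


def collect(name: str, data: bytes) -> dict:
    """Collection: record the file type and assign one of three categories."""
    digest = hashlib.sha256(data).hexdigest()
    if digest in KNOWN_VALID:
        label = "known and verified valid"
    elif digest in KNOWN_MALICIOUS:
        label = "known and verified malicious"
    else:
        label = "unknown"
    return {"file_type": Path(name).suffix.lower(), "label": label}


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def extract(data: bytes) -> list[float]:
    """Extraction: turn raw bytes into a small numeric feature vector."""
    printable = sum(32 <= b < 127 for b in data)
    return [float(len(data)), byte_entropy(data), printable / max(len(data), 1)]
```

For example, `collect("notes.DOC", b"hello world")` records the file type `.doc` and the category “known and verified valid”, while `extract` produces the numbers that a model would later learn from. A production system would extract thousands of characteristics per file rather than three.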
The Learning in Machine Learning
Once extraction has completed, the learning and training phase of machine learning begins. In this phase, Cylance mathematicians develop statistical models to predict which files are malicious. Numerous models are measured and tested. Only those models that pass multiple levels of testing are selected and put into production for classification – the final phase. For each file, thousands of characteristics specific to the file type are reviewed to distinguish legitimate files from malware. The model divides a single file into a considerable number of characteristics and compares them against other files to predict whether the file is normal. Because the models are tested so diligently, the review is highly accurate, and the analysis in the classification stage takes only milliseconds to perform. A “confidence score” is then generated and provided to the analyst, who can decide whether to block the file, quarantine it, or review it further. With Cylance’s mathematical approach, the models quickly determine whether files are malicious.
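The classification step can be pictured with a short Python sketch. This is a toy example under stated assumptions, not the model Cylance uses: the logistic formula, the `WEIGHTS` and `BIAS` values, the thresholds, and the extra “allow” outcome are all hypothetical and chosen only to show how a feature vector might become a confidence score and then a suggested action.

```python
import math

# Hypothetical weights for a toy logistic model (assumption for this sketch).
# Feature order: file size, byte entropy, share of printable characters.
# Real models are trained on labeled files and use far more characteristics.
WEIGHTS = [0.0, 1.2, -3.0]
BIAS = -6.0


def confidence_score(features: list[float]) -> float:
    """Return a score between 0 and 1; higher means more likely malicious."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))


def triage(score: float) -> str:
    """Map a score to a suggested action using illustrative thresholds."""
    if score >= 0.9:
        return "block"
    if score >= 0.7:
        return "quarantine"
    if score >= 0.3:
        return "further review"
    return "allow"  # assumed outcome for low scores; not named in the article


# A high-entropy file with few printable characters scores as suspicious.
suspicious = confidence_score([1000.0, 7.9, 0.1])
# A small, low-entropy text file scores as benign.
benign = confidence_score([20.0, 3.0, 1.0])
print(triage(suspicious), triage(benign))
```

Scoring a file this way is only a handful of arithmetic operations, which is consistent with classification taking milliseconds once a model has been trained.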
From Learning to Practice
In 2014, a new vulnerability was discovered in Microsoft Word. The Microsoft Word RTF vulnerability allowed a hacker to remotely run code on an affected computer. Of 51 anti-virus engines, only 4 detected the 0-day exploit. Cylance was able to recognize the exploit immediately, without needing an update.
You can read more about machine learning in Cylance’s whitepaper Math vs. Malware. For questions about CylancePROTECT, please contact the IT Help Desk at 214-768-HELP.