Summary
CylancePROTECT is no longer available to the SMU community and has been replaced by Microsoft Defender.
CylancePROTECT was introduced to the SMU campus in 2016 as a way to further secure SMU systems against viruses and malware. As the University became regularly inundated with malicious files, employees could not be expected to keep up with the volume and complexity of these new threats. As the threats evolved, so did our method of protection – machine learning.
A new way to protect
In the past, companies would hire a number of workers to sort through information and separate clean files from those that contain threats. Because millions of files, both malicious and non-malicious, are created regularly, these workers are overwhelmed by the magnitude of data, which delays the response to some threats and lets others go unnoticed altogether. SMU systems processed over 489,000,000 malicious emails in 2016 alone. Though advancements in the security world include vulnerability analysis, the underlying problem is that human perspective and bias interfere and err toward over-simplification. Cylance’s solution was to use mathematical modeling and machine learning as an alternative way to secure the future.
Machine Learning by Collection and Extraction
Machine learning, a branch of artificial intelligence, allows a system to learn and to make predictions based on data it has processed. This happens through four phases: collection, extraction, learning, and classification. The collection process determines what kind of file is being read (.doc, .pdf, .xls, .java, etc.) and places the file into one of three categories: “known and verified valid, known and verified malicious, and unknown.” This review is done with hundreds of millions of files from industry, proprietary organizations, and active Cylance agents. The process then moves from collection to extraction, where the unique characteristics of each file are reviewed according to its file type. Because the characteristics are extracted automatically, Cylance removes the human bias that would occur during the manual classification of files. The result is a file genome, much like the one produced in human DNA analysis, which the mathematical models then use to determine the characteristics of a file.
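The collection and extraction phases described above can be illustrated with a minimal sketch. This is not Cylance’s implementation: the function names, the use of the file extension to identify the type, the SHA-256 lookup against two sets of already-verified hashes, and the three characteristics extracted (size, byte entropy, printable ratio) are all assumptions chosen for illustration.

```python
import hashlib
import math
from collections import Counter
from pathlib import PurePath

# The three categories named in the text.
VALID = "known and verified valid"
MALICIOUS = "known and verified malicious"
UNKNOWN = "unknown"


def collect(filename, data, valid_hashes, malicious_hashes):
    """Collection: identify the file type and assign one of three categories.

    Assumption: the type comes from the extension, and the category from a
    SHA-256 lookup in two hypothetical sets of already-verified hashes.
    """
    file_type = PurePath(filename).suffix.lower()
    digest = hashlib.sha256(data).hexdigest()
    if digest in malicious_hashes:
        return file_type, MALICIOUS
    if digest in valid_hashes:
        return file_type, VALID
    return file_type, UNKNOWN


def byte_entropy(data):
    """Shannon entropy of the byte distribution, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(data).values())


def extract(data):
    """Extraction: reduce raw bytes to a few numeric characteristics.

    A real product would review thousands of characteristics per file type;
    these three are illustrative only.
    """
    printable = sum(32 <= b < 127 for b in data)
    return {
        "size": len(data),
        "entropy": byte_entropy(data),
        "printable_ratio": printable / len(data) if data else 0.0,
    }
```

Calling collect("report.PDF", contents, set(), set()) on a file that appears in neither hash set yields the type ".pdf" and the category “unknown”, and extract(contents) returns the dictionary of characteristics that a later learning phase could consume.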
The Learning in Machine Learning
Once the extraction has been completed, the learning and training phase of machine learning begins. In this phase, Cylance mathematicians develop statistical models to predict which files are malicious. Numerous models are measured and tested. Only those models that pass multiple levels of testing are selected and put into production for classification, the final phase. For each file, thousands of characteristics specific to the file type are reviewed to differentiate legitimate files from malware. Due to the diligence of the model testing, the review is completed with an extraordinary degree of accuracy. The classifier divides a single file into a considerable number of characteristics and compares them against those of other files to predict whether the file is normal. The analysis in the classification stage takes only milliseconds to perform. A “confidence score” is then generated and provided to the analyst, who can decide whether to block the file, quarantine it, or send it for further review. With Cylance’s mathematical approach, the models quickly determine whether files are malicious.
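The learning and classification phases can be sketched in the same spirit. The example below swaps in a tiny logistic regression trained by gradient descent, since the text does not say which statistical models Cylance uses. The toy feature vectors, the held-out test, the score thresholds, and the extra “allow” outcome for low scores are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, epochs=500, lr=0.5):
    """Learning: fit logistic-regression weights by stochastic gradient descent."""
    weights, bias = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = confidence((weights, bias), x) - y
            weights = [w - lr * error * v for w, v in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def confidence(model, x):
    """Classification: score between 0 and 1 that the file is malicious."""
    weights, bias = model
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


def accuracy(model, samples, labels):
    hits = sum((confidence(model, x) >= 0.5) == bool(y) for x, y in zip(samples, labels))
    return hits / len(samples)


def decide(score):
    """Suggest an action for the analyst; the thresholds are hypothetical."""
    if score >= 0.9:
        return "block"
    if score >= 0.7:
        return "quarantine"
    if score >= 0.3:
        return "further review"
    return "allow"


# Toy characteristics per file: [normalized entropy, printable ratio].
# Label 1 = malicious, 0 = valid. Invented data, not real measurements.
train_x = [[0.9, 0.1], [0.8, 0.2], [0.95, 0.05], [0.2, 0.9], [0.1, 0.8], [0.3, 0.95]]
train_y = [1, 1, 1, 0, 0, 0]
test_x, test_y = [[0.85, 0.15], [0.15, 0.85]], [1, 0]

# Train several candidate models and keep only those that pass the held-out test.
candidates = [train(train_x, train_y, epochs=e) for e in (1, 50, 500)]
production = [m for m in candidates if accuracy(m, test_x, test_y) == 1.0]
model = production[-1]

score = confidence(model, [0.9, 0.1])
print(decide(score))
```

Scoring a new file is a single weighted sum followed by a sigmoid, which is consistent with classification taking far less time than training. In this sketch the analyst still makes the final call; decide only turns the score into a suggested action.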
From Learning to Practice
In 2014, a new vulnerability was discovered in Microsoft Word. The Microsoft Word RTF vulnerability allowed a hacker to remotely run code on an affected computer. Of the 51 anti-virus engines, only 4 detected the 0-day exploit. Cylance was able to recognize the exploit immediately, without needing an update.
You can read more about machine learning in Cylance’s whitepaper Math vs. Malware. For questions about CylancePROTECT, please contact the IT Help Desk at 214-768-HELP.