Machine learning has never been more prevalent or accessible than it is today. It is transforming how we interact with businesses, devices, and each other: retailers and advertisers recommend relevant products, email providers detect spam, cars drive themselves, and phones recognize their owner’s face. Yet artificial intelligence and machine learning may be the two most misunderstood concepts in tech today. In this post I will tackle some of the most common machine learning misconceptions.
The argument goes: if something has never happened before, its predicted probability must be zero. What else could it be? That is false, because events are composed of many smaller components, all of which have relationships and similarities. The power of machine learning lies in identifying these relationships and using them to predict rare events, even ones that have never been observed in exactly that form, with high accuracy.
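To make this concrete, here is a minimal sketch of how an event that has never been seen can still receive a non-zero probability. It uses a tiny naive Bayes classifier with Laplace smoothing, which is only one of many possible techniques. The event fields (country, time of day, device), the labels, and the toy history are all made up for illustration.

```python
from collections import Counter

# Hypothetical toy history of (country, time of day, device) -> label.
# The exact combination ("BR", "night", "tablet") never appears in it.
history = [
    (("US", "day", "laptop"), "benign"),
    (("US", "day", "phone"), "benign"),
    (("US", "night", "laptop"), "benign"),
    (("BR", "day", "laptop"), "benign"),
    (("US", "day", "tablet"), "benign"),
    (("BR", "night", "phone"), "malicious"),
    (("US", "night", "tablet"), "malicious"),
    (("BR", "night", "laptop"), "malicious"),
]


def fit(rows):
    """Count how often each component value occurs under each label."""
    label_counts = Counter(label for _, label in rows)
    feature_counts = {label: Counter() for label in label_counts}
    values = {}  # position -> set of values seen at that position
    for features, label in rows:
        for pos, value in enumerate(features):
            feature_counts[label][(pos, value)] += 1
            values.setdefault(pos, set()).add(value)
    return label_counts, feature_counts, values


def predict_proba(model, features):
    """Combine per-component evidence into a probability for each label."""
    label_counts, feature_counts, values = model
    total = sum(label_counts.values())
    scores = {}
    for label, n in label_counts.items():
        score = n / total  # prior probability of the label
        for pos, value in enumerate(features):
            # Laplace smoothing: add one to every count so nothing is exactly zero.
            count = feature_counts[label][(pos, value)]
            score *= (count + 1) / (n + len(values[pos]))
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}


model = fit(history)
unseen = ("BR", "night", "tablet")
probabilities = predict_proba(model, unseen)
print(probabilities)
```

The combination itself is new, but each of its components has been seen before, and that is where the prediction comes from.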
Experts in different fields have invested significant effort in developing domain knowledge, and it’s easy to think that this knowledge is lost when a machine learning system is used. In fact, a vital part of machine learning is feature engineering, the process of extracting patterns or features from raw data, and domain experts are often better than machines at suggesting features that hold predictive power. As such, domain experts play a key part in defining the input to a machine learning system, from which preexisting knowledge can be extracted, extended, and refined.
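As a rough illustration of feature engineering, the sketch below turns one raw event into a handful of features that a domain expert might suggest. The event fields (`timestamp`, `bytes_sent`, `duration_s`), the office hours, and the chosen features are hypothetical and not taken from any real system.

```python
from datetime import datetime


def engineer_features(event):
    """Turn one raw event into features an expert might consider predictive."""
    ts = datetime.fromisoformat(event["timestamp"])
    return {
        "hour": ts.hour,
        # Monday is 0, so 5 and 6 are Saturday and Sunday.
        "is_weekend": ts.weekday() >= 5,
        # Assumed office hours of 08:00 to 18:00.
        "outside_office_hours": ts.hour < 8 or ts.hour >= 18,
        # Guard against zero-length sessions when computing a rate.
        "bytes_per_second": event["bytes_sent"] / max(event["duration_s"], 1),
    }


raw_event = {
    "timestamp": "2018-06-16T02:30:00",
    "bytes_sent": 1_200_000,
    "duration_s": 60,
}
features = engineer_features(raw_event)
print(features)
```

A raw timestamp tells a model very little, whereas "a weekend, outside office hours" encodes what the expert already knows about normal behavior.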
In supervised learning an algorithm is given an input and a desired output, and the aim is to learn the mapping from input to output. Think of it as a school test, where there are questions and answers, and you are graded by how close your answers are to the actual ones.
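The school test analogy can be sketched in a few lines. The example below fits a straight line by least squares, one of the simplest supervised methods, and then grades itself by how far its answers are from the real ones. The data points are invented for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# The "questions" are the inputs and the "answers" are the desired outputs.
questions = [1.0, 2.0, 3.0, 4.0, 5.0]
answers = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(questions, answers)


def predict(x):
    """Apply the learned mapping to a new input."""
    return slope * x + intercept


# The "grade": mean squared distance between predicted and actual answers.
mean_squared_error = sum(
    (predict(x) - y) ** 2 for x, y in zip(questions, answers)
) / len(questions)

print(slope, intercept, mean_squared_error)
print(predict(6.0))  # a question the algorithm has not seen before
```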
Now imagine there are no answers. What can you learn if you only have the questions? In unsupervised learning the aim is to uncover the underlying structure or distribution of the data in order to learn more about it, for example by finding groups of users that behave in the same way, or by identifying events that are anomalous. Unsupervised learning can even generate completely new data: given enough images of faces, we can train a computer to create realistic images of people who don’t even exist.
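Here is a small sketch of the anomaly case. Nobody labels any value as "normal" or "abnormal"; the code only looks at how the data is distributed and flags values that sit far from the rest. It uses a median-based score (the modified z-score) with a commonly used cutoff of 3.5. The login counts are made up, and this is only one simple way to approach the problem.

```python
from statistics import median


def find_anomalies(values, threshold=3.5):
    """Flag values far from the median, measured in robust deviations."""
    center = median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        return []  # no spread at all, so nothing can be scored
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [v for v in values if 0.6745 * abs(v - center) / mad > threshold]


# Hypothetical number of logins per day for one user.
daily_logins = [12, 15, 11, 14, 13, 12, 16, 14, 13, 95, 12, 15]
anomalies = find_anomalies(daily_logins)
print(anomalies)
```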
Machine learning can identify redundant and duplicate data, and can therefore represent most of the information in a dataset with only a fraction of the content. However, it can do a lot more, and in reality its main purpose is to make predictions. Summarizing the products you purchased in the past is just a means to predict which ones you might like to buy in the future. Knowing how product sales have fluctuated in the past is a guide to how they will behave over the coming weeks and months.
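The step from summarizing to predicting can be sketched with simple exponential smoothing, a basic forecasting method that condenses past sales into a single running level and uses that level as the forecast for the next period. The sales figures and the smoothing factor `alpha` are assumptions chosen for illustration.

```python
def exponential_smoothing_forecast(history, alpha=0.5):
    """Forecast the next value as a weighted summary of past values.

    A higher alpha gives more weight to recent observations.
    """
    level = history[0]
    for value in history[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


# Hypothetical weekly sales for one product.
weekly_sales = [100, 104, 98, 110, 120, 118]
forecast = exponential_smoothing_forecast(weekly_sales)
print(forecast)
```

The summary of the past (the level) is not the goal in itself; it exists so that we can say something about next week.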
The amount of data and events generated in corporate networks is beyond what human experts can review, making it impossible for them to shoulder the burden of cyber threats alone. Reveal collects billions of events every day and uses a broad range of cutting-edge algorithms to identify when something abnormal is happening. Ava provides a system for experts to focus on the small number of events that really matter, and to investigate an incident from start to finish in more detail than ever before.
This post was originally published in June 2018 and has been updated for comprehensiveness.