Machine learning has never been more prevalent or accessible than it is today. It is transforming how we interact with businesses, devices, and each other: retailers and advertisers recommend products we are likely to want, email providers detect spam, cars drive themselves, and phones recognize their owner’s face. Yet artificial intelligence and machine learning are arguably the two most misunderstood concepts in tech today. In this post, I tackle some of the most common misconceptions about machine learning.

Myth: Machine learning can’t predict previously unseen events

The argument goes: if something has never happened before, its predicted probability must be zero; what else could it be? That’s false, because events are composed of many smaller components, and those components have relationships and similarities to things that have happened before. The power of machine learning lies in identifying these relationships and using them to predict rare or even unseen events, often accurately.
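Here is a minimal Python sketch that makes the unseen-event argument concrete. It assumes a toy logistic regression and entirely made-up data: deliveries described by two components, whether it is snowing and whether it is a weekend. The snowy-weekend combination never appears in the training data. The model still assigns it a probability, because it combines what it learned about each component separately. The data, the function names (`fit_logistic`, `predict`), and the assumption that the components combine additively on the log-odds scale are all illustrative. None of them describe a particular production system.

```python
import math

# Hypothetical training data: each event is described by two components,
# (is_snowing, is_weekend), plus whether a delivery arrived late (1) or not (0).
# The combination (snowing, weekend) never appears in the data.
DATA = (
    [((0, 0), 1)] * 1 + [((0, 0), 0)] * 9    # clear weekday: 1 late in 10
    + [((1, 0), 1)] * 6 + [((1, 0), 0)] * 4  # snowy weekday: 6 late in 10
    + [((0, 1), 1)] * 5 + [((0, 1), 0)] * 5  # clear weekend: 5 late in 10
)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(data, lr=1.0, steps=5000):
    """Fit a bias plus one weight per component with batch gradient descent.

    Assumption: component effects add up on the log-odds scale.
    """
    n_features = len(data[0][0])
    bias, weights = 0.0, [0.0] * n_features
    for _ in range(steps):
        grad_b, grad_w = 0.0, [0.0] * n_features
        for x, y in data:
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            grad_b += err
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
        bias -= lr * grad_b / len(data)
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad_w)]
    return bias, weights


def predict(model, x):
    bias, weights = model
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


model = fit_logistic(DATA)
# Counting past occurrences gives no estimate at all (0 late out of 0 events),
# but the per-component weights combine into a prediction for the unseen case.
print(f"P(late | snowy weekend) = {predict(model, (1, 1)):.2f}")
```

With these numbers, the learned weights for snow and for weekends are both positive. As a result, the unseen combination comes out more likely to be late than any combination in the data. The additive assumption is what makes this extrapolation possible. If the real effects interact, a richer model would be needed.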

Myth: Machine learning ignores preexisting knowledge

Experts in many fields have invested significant effort in developing domain knowledge, and it’s easy to think that this knowledge is thrown away when a machine learning system is used. In fact, a vital part of machine learning is feature engineering: the process of extracting informative patterns, or features, from raw data. Domain experts are often better than machines at suggesting features that hold predictive power. As a result, they play a key role in defining the input to a machine learning system, so their existing knowledge is captured, extended, and refined rather than lost.
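The following sketch is a hedged illustration of feature engineering. It turns hypothetical raw card transactions into the kind of features a fraud analyst might suggest:

- whether a purchase happened at night,
- whether it fell on a weekend,
- how large it was compared with the customer’s usual spend.

The data, the feature names, and the choice of features are all assumptions made for this example.

```python
from datetime import datetime
from statistics import mean

# Hypothetical raw card transactions: (customer_id, ISO timestamp, amount).
RAW = [
    ("alice", "2024-03-01T09:15:00", 42.0),
    ("alice", "2024-03-02T13:40:00", 38.5),
    ("alice", "2024-03-03T03:05:00", 950.0),
    ("bob", "2024-03-01T20:10:00", 12.0),
    ("bob", "2024-03-02T21:30:00", 15.0),
]


def engineer_features(rows):
    """Turn raw rows into features a (hypothetical) fraud analyst might suggest."""
    # Each customer's typical spend, used to express amounts relatively.
    by_customer = {}
    for customer, _, amount in rows:
        by_customer.setdefault(customer, []).append(amount)
    avg_spend = {c: mean(amounts) for c, amounts in by_customer.items()}

    features = []
    for customer, ts, amount in rows:
        when = datetime.fromisoformat(ts)
        features.append({
            "customer": customer,
            # Expert intuition: purchases in the small hours are unusual.
            "is_night": when.hour < 6,
            # weekday() is 0 for Monday, so 5 and 6 are Saturday and Sunday.
            "is_weekend": when.weekday() >= 5,
            # Expert intuition: spend relative to habit matters more than raw amount.
            "amount_vs_avg": amount / avg_spend[customer],
        })
    return features


for row in engineer_features(RAW):
    print(row)
```

In a real system, the baseline spend would be computed from past transactions only, so that a transaction does not influence its own reference point.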

Myth: There’s no such thing as unsupervised learning

In supervised learning, an algorithm is given example inputs together with their desired outputs, and the aim is to learn the mapping from input to output. Think of it as a school test with both questions and answers, where you are graded by how close your answers are to the correct ones.
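Here is a minimal sketch of the school-test analogy. It assumes the simplest possible model: a straight line fitted by ordinary least squares. The training pairs play the role of questions with answers, and the mean squared error on held-out pairs plays the role of the grade. All numbers and names here are hypothetical.

```python
# Hypothetical labeled examples: (input, desired output) pairs.
train = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
# Held-out "test questions" the model never sees during fitting.
test = [(6.0, 12.0), (7.0, 13.9)]


def fit_line(pairs):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def mean_squared_error(model, pairs):
    """The 'grade': average squared gap between predicted and true answers."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in pairs) / len(pairs)


model = fit_line(train)
print("learned mapping: y = %.2f * x + %.2f" % model)
print("test error:", round(mean_squared_error(model, test), 4))
```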

Now imagine there are no answers. What can you learn if you only have the questions? In unsupervised learning, the aim is to uncover the underlying structure or distribution of the data in order to learn more about it: for example, finding groups of users who behave in similar ways, or identifying anomalous events. Unsupervised learning can even generate completely new data. Given enough images of faces, a computer can be trained to create realistic images of people who don’t exist.
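As one example of uncovering structure without answers, the sketch below runs k-means clustering (Lloyd's algorithm) on hypothetical user data that has no labels at all. Several choices are assumptions made for illustration:

- the features,
- the choice of two clusters,
- seeding with the first points.

k-means is only one of many unsupervised methods.

```python
import math

# Hypothetical users described by (minutes on site per day, purchases per month).
# No labels: nobody tells the algorithm which users belong together.
users = [(5, 0), (7, 1), (6, 0), (60, 8), (55, 9), (65, 7), (8, 1)]


def kmeans(points, k, iterations=10):
    """Plain k-means (Lloyd's algorithm), seeded with the first k points."""
    centroids = [points[i] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


centroids, clusters = kmeans(users, k=2)
for centre, members in zip(centroids, clusters):
    print(centre, members)
```

On this data, the algorithm separates light users from heavy users without ever being told that such groups exist. Seeding with the first k points keeps the sketch deterministic, but the result then depends on data order. Library implementations typically use smarter seeding, such as k-means++.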

Myth: Machine learning just summarizes data

Because machine learning can identify redundant and duplicate data, it can represent most of the information in a dataset with only a fraction of the content. But it does much more than summarize: its main purpose is usually to make predictions. Summarizing the products you have bought in the past is just a means of predicting which ones you might like to buy in the future. Likewise, knowing how a product’s sales have fluctuated in the past is a guide to how they are likely to behave over the coming weeks and months.
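The idea that a summary is a means to a prediction can be sketched with a seasonal-average forecast, one of the simplest forecasting baselines. The sketch assumes three weeks of hypothetical daily sales. It first compresses the history into seven numbers, one average per day of the week. It then reuses those same seven numbers to forecast the coming week. The data and function names are illustrative only.

```python
from statistics import fmean

# Hypothetical daily unit sales for one product over three weeks (Mon..Sun).
daily_sales = [
    20, 22, 21, 25, 40, 55, 50,  # week 1
    19, 23, 22, 27, 42, 58, 49,  # week 2
    21, 24, 20, 26, 41, 57, 51,  # week 3
]


def weekday_profile(sales, season=7):
    """Summarize the history as one average per day of the week."""
    return [fmean(sales[day::season]) for day in range(season)]


def forecast(sales, horizon, season=7):
    """Seasonal-average forecast: repeat the weekday profile forward."""
    profile = weekday_profile(sales, season)
    start = len(sales) % season  # weekday index of the first forecast day
    return [profile[(start + h) % season] for h in range(horizon)]


print("summary (21 values -> 7):", [round(v, 1) for v in weekday_profile(daily_sales)])
print("next 7 days:", [round(v, 1) for v in forecast(daily_sales, 7)])
```

Here, 21 observations shrink to 7 values without losing the weekly pattern, and the forecast for the coming Saturday inherits the weekend peak. Real forecasting models usually also account for trend, holidays, and uncertainty.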