Machine learning and Big Data analytics: the perfect marriage

In this blog post, we welcome former NGDATA collaborator and now university professor Willem Waegeman, who reports on a scientific research project he worked on part-time while working with us. NGDATA takes pride in supporting research to advance the field of machine learning. If you want to learn more about our offerings in this area, take a look at our Lab and do not hesitate to contact us.

While 2012 was the year of Big Data technologies, 2013 is becoming the year of Big Data analytics. Gathering and maintaining large collections of data is one thing, but extracting useful information from these collections is even more challenging. Big Data not only changes the tools one can use for predictive analytics, it also changes our entire way of thinking about knowledge extraction and interpretation. Data science has traditionally been dominated by trial-and-error analysis, an approach that becomes impossible when datasets are large and heterogeneous. Ironically, the availability of more data usually leads to fewer options for constructing predictive models, because very few tools can process large datasets in a reasonable amount of time. In addition, traditional statistical solutions typically focus on static analytics, limited to samples that are frozen in time, which often results in outdated and unreliable conclusions.

However, smarter alternatives that overcome these problems have recently been proposed in a novel and rapidly expanding research domain: machine learning. At the interface of statistics, computer science and emerging applications in industry, this research community focuses on developing fast and efficient algorithms for real-time data processing, with the main goal of delivering accurate predictions of various kinds. To name only a few applications, think of business cases such as product recommendation, customer segmentation, fraud detection or churn prevention. Machine learning techniques can address such applications using a set of generic methods that differ from more traditional statistical techniques. The emphasis is on real-time and highly scalable predictive analytics, using fully automatic and generic methods that simplify some of the typical data scientist tasks. And yes, machine learning is finding its way into industry right now!

NGDATA is present this week at the International Conference on Machine Learning in Atlanta (ICML 2013), the premier venue for novel machine learning research. Together with researchers from Germany and Poland, we are organizing a tutorial about multi-target prediction methods. Traditional methods in machine learning and statistics provide data-driven models for predicting one-dimensional targets, such as binary outputs in classification and real-valued outputs in regression. In recent years, novel application domains have triggered fundamental research on more complicated problems where multi-target predictions are required. Such problems arise in diverse application domains, including document categorization, recommender systems, tag prediction for images, videos and music, information retrieval, natural language processing, drug discovery, marketing and biology. Specific multi-target prediction problems have been studied in a variety of subfields of machine learning and statistics, such as multi-label classification, multivariate regression, sequence learning, structured output prediction, preference learning, multi-task learning, recommender systems and collective learning. Despite their commonalities, work on these problems has typically been performed in isolation, without much interaction between the different sub-communities. The main goal of the tutorial is to present a unifying overview of the above-mentioned subfields of machine learning, by focusing on the simultaneous prediction of multiple, mutually dependent output variables.
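
To make the idea of mutually dependent outputs a bit more concrete, here is a minimal Python sketch of a toy tag prediction problem. It is an illustration only and is not taken from the tutorial: the function names `predict_independent` and `predict_joint` and the example data are hypothetical. The first function votes on each tag separately, as a collection of one-dimensional classifiers would; the second treats the complete tag vector as a single joint target.

```python
from collections import Counter


def predict_independent(label_vectors):
    """Majority vote per label, ignoring dependencies between labels."""
    n = len(label_vectors)
    n_labels = len(label_vectors[0])
    return tuple(
        1 if 2 * sum(vec[j] for vec in label_vectors) > n else 0
        for j in range(n_labels)
    )


def predict_joint(label_vectors):
    """Most frequent complete label vector, treating all labels as one target."""
    return Counter(label_vectors).most_common(1)[0][0]


# Hypothetical data: tag vectors (tag_a, tag_b) of seven documents
# that are assumed to be similar to the new document we want to tag.
neighbours = [(1, 0), (1, 0), (1, 0), (0, 1), (0, 1), (1, 1), (1, 1)]

independent = predict_independent(neighbours)
joint = predict_joint(neighbours)
print("independent:", independent)
print("joint:", joint)
```

On this toy data the two strategies disagree: voting per tag assigns both tags, while the most frequent complete tag vector assigns only the first one. Which answer is preferable depends on how prediction errors are counted, and this kind of trade-off is one reason why dependencies between targets deserve explicit attention.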

If you are interested in this topic, you can find out more about our tutorial here, including our presentation. Be sure to share your thoughts on machine learning by commenting below. We hope to see you at the ICML conference in Atlanta!