ICML 2013 Tutorial: Multi-Target Prediction

Introduction

Traditional methods in machine learning and statistics provide data-driven models for predicting one-dimensional targets, such as binary outputs in classification and real-valued outputs in regression. In recent years, novel applications have triggered fundamental research on more complicated problems in which multi-target predictions are required. Such problems arise in diverse domains, such as document categorization, tag recommendation for images, videos and music, information retrieval, medical decision making, drug discovery, marketing, biology, geographical information systems, etc.

According to a general definition, the targets in multi-target prediction problems can be of diverse data types, such as binary, nominal, ordinal and real-valued variables, but also rankings and relational structures, representing different entities of interest. Moreover, the targets often exhibit specific dependencies, such as being structured as a tree-shaped hierarchy or a directed acyclic graph, or being related through mutual exclusion, parent-child and other types of constraints. Specific multi-target prediction problems have been studied in a variety of subfields of machine learning and statistics, such as multi-label classification (prediction of multiple binary targets), multivariate regression (prediction of multiple numerical targets), sequence learning (ordered targets of varying length), structured output prediction (targets with inherent structure), preference learning (prediction of a preference relation between multiple targets, as in label ranking), multi-task learning (prediction of multiple targets in different but related domains) and collective learning (prediction for dependent observations).
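
To make the distinction between some of these settings concrete, the sketch below (a hypothetical toy illustration, not part of the tutorial materials) contrasts the target representations of multi-label classification and multivariate regression: in both cases every instance is mapped to a target vector, and only the type of the individual targets differs.

    import numpy as np

    # Toy illustration of two of the multi-target settings mentioned above.
    # The feature matrix X holds n = 4 instances described by 3 features.
    X = np.array([[0.2, 1.5, 3.0],
                  [1.1, 0.4, 2.2],
                  [0.9, 2.3, 0.7],
                  [1.8, 0.1, 1.9]])

    # Multi-label classification: each instance is associated with a subset
    # of m = 3 labels, encoded as a binary indicator matrix of shape (n, m).
    Y_multilabel = np.array([[1, 0, 1],
                             [0, 1, 1],
                             [0, 0, 0],
                             [1, 1, 0]])

    # Multivariate regression: each instance has m = 3 real-valued targets.
    Y_multivariate = np.array([[2.4, 0.3, 1.1],
                               [0.7, 1.9, 0.2],
                               [1.5, 1.5, 0.8],
                               [0.1, 2.2, 2.6]])

    # In both cases the task is to learn a mapping from X to the full target
    # vector; the settings differ only in the type of the individual targets.
    print(Y_multilabel.shape, Y_multivariate.shape)  # (4, 3) (4, 3)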

Objectives and Targeted Audience

Despite their commonalities, work on solving problems in the above domains has typically been carried out in isolation, with little interaction between the different sub-communities. Moreover, several of the problems have been studied in different communities under different names, and sometimes there is even terminological confusion within the same community. The main goal of the tutorial is to present a unifying overview of the above-mentioned subfields of machine learning, by focusing on the simultaneous prediction of multiple, mutually dependent output variables. Across the subfields of machine learning that cover multi-target prediction, many authors have acknowledged the importance of explicitly modeling the dependencies between the predicted targets. We are convinced that bringing existing solutions together can lead to fruitful cross-fertilization. Roughly speaking, one could argue that multi-label classification and multivariate regression are specific instantiations of structured output prediction and multi-task learning, but not all structured output prediction tasks are also multi-task learning tasks, and vice versa.
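
As an illustration of why modeling target dependencies can matter, the sketch below (assuming scikit-learn is available; synthetic data, not taken from the tutorial slides) contrasts binary relevance, which trains one independent classifier per label, with the classifier chains of Read et al. (2009), which feed earlier label predictions as extra features into later classifiers. Subset 0/1 accuracy, which rewards predicting the entire label vector correctly, is precisely the kind of loss for which exploiting label dependence tends to pay off.

    # Minimal sketch: binary relevance vs. a classifier chain on a
    # synthetic multi-label dataset, both with logistic regression as
    # the base learner.
    from sklearn.datasets import make_multilabel_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from sklearn.multioutput import ClassifierChain, MultiOutputClassifier

    X, Y = make_multilabel_classification(n_samples=1000, n_classes=5,
                                          n_labels=2, random_state=0)
    X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)

    # Binary relevance: one independent logistic regression per label.
    br = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X_tr, Y_tr)

    # Classifier chain: labels are predicted in a (random) order, each model
    # seeing the previously predicted labels as additional inputs.
    cc = ClassifierChain(LogisticRegression(max_iter=1000),
                         order='random', random_state=0).fit(X_tr, Y_tr)

    # Subset 0/1 accuracy counts a prediction as correct only if the
    # entire label vector is right.
    print("binary relevance:", accuracy_score(Y_te, br.predict(X_te)))
    print("classifier chain:", accuracy_score(Y_te, cc.predict(X_te)))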

Despite the encouraging progress that has been made in the last decade, the current understanding of multi-target learning tasks and methods remains shallow. Further communication and education on the fundamental insights behind this type of problem are still required. To date, it remains unclear which of the numerous recently proposed approaches performs better, and under which assumptions. Therefore, the tutorial gives an overview of existing methods, focusing on general learning principles rather than algorithmic details.

With the tutorial we aim to attract both researchers who are already active in one of the above domains and researchers with little or no prior experience in multi-target prediction. We only assume a general familiarity with well-known machine learning techniques for classification and regression (kernel methods, ensemble methods, risk minimization, etc.).

Agenda and Slides

  • Introduction [pdf]
  • Part I: Individual-target view [pdf]
  • Part II: Joint-target view [pdf]

Presenters

Willem Waegeman – NGDATA, Ghent, Belgium
Willem Waegeman works as a Senior Data Scientist at NGDATA. Previously, he was a post-doctoral fellow of the Research Foundation of Flanders and a member of the research unit KERMIT at Ghent University (Belgium), and he has worked at different research institutes in Germany and Finland. His main interests are machine learning and data mining, including theoretical research and various application domains. He has a strong record of articles published in top journals such as Artificial Intelligence, Machine Learning, Neural Networks, Computational Statistics and Data Analysis, Pattern Recognition Letters, and BMC Bioinformatics. He has also presented many papers at well-known machine learning conferences such as NIPS and ECML/PKDD. At Ghent University he has taught a machine learning course for bio-science engineers for several years.

Eyke Hüllermeier – Marburg University, Germany
Eyke Hüllermeier is with the Department of Mathematics and Computer Science at Marburg University (Germany), where he holds an appointment as a full professor and heads the Computational Intelligence group. His research interests focus on machine learning and data mining, uncertainty and approximate reasoning, and applications in the life sciences. He has published almost 200 research papers on these topics in top-tier journals and at major international conferences, several of which have received best paper awards. He serves on the editorial board of several international journals, such as Machine Learning, and is a regular member of the program committee of major conferences in the field of AI; recently, he was area chair at AAAI-2012 (where he received the best area chair award), ECML-2012, and ICML-2010. He is a coordinator of the EUSFLAT working group on Machine Learning and Data Mining and the head of the IEEE CIS Task Force on Machine Learning. He is specifically recognized for his work on preference learning, a topic on which he has recently given several invited talks and tutorials (for example, at ECAI-2012, Discovery Science 2011, and Algorithmic Decision Theory 2011).

Krzysztof Dembczyński – Institute of Computing Science, Poznan University of Technology, Poland
Krzysztof Dembczyński is an assistant professor at Poznan University of Technology, Poland. As a post-doctoral researcher he spent two years, from 2009 to 2011, in the Knowledge Engineering & Bioinformatics (KEBI) Lab at Marburg University, Germany. His research interests span the field of machine learning, in particular multi-label classification, preference learning, and structured output prediction. He was a member of the winning team of the ECML/PKDD 2007 Discovery Challenge and, as a team member, received an award in the Beyond Search 2008 program sponsored and organized by Microsoft Research. As a co-author he won the best paper award at ECAI 2012. He is also a co-organizer of the Preference Learning stream at the European Conference on Operational Research. He currently holds a prestigious scholarship from the Foundation for Polish Science.

References

Boutell M, Luo J, Shen X, Brown C (2004) Learning multi-label scene classification. Pattern Recognition 37(9):1757-1771

Breiman L, Friedman J (1997) Predicting multivariate responses in multiple linear regression. J R Stat Soc Ser B 59(1):3-54

Caruana R (1997) Multitask learning. Machine Learning 28:41-75

Cheng W, Hüllermeier E (2009) Combining instance-based learning and logistic regression for multi-label classification. Machine Learning 76(2-3):211-225

Diez J, del Coz JJ, Bahamonde A (2010) A semi-dependent decomposition approach to learn hierarchical classifiers. Pattern Recognition 43:3795-3804

Dembczynski K, Cheng W, Hüllermeier E (2010) Bayes optimal multilabel classification via probabilistic classifier chains. In: ICML 2010, Omnipress

Dembczynski K, Waegeman W, Cheng W, Hüllermeier E (2011) An exact algorithm for F-measure maximization. In: NIPS 2011

Dembczynski K, Waegeman W, Cheng W, Hüllermeier E (2012) On label dependence and loss minimization in multi-label classification. Machine Learning

Dembczynski K, Waegeman W, Cheng W, Hüllermeier E (2012) An analysis of chaining in multi-label classification. In: ECAI 2012

Dembczynski K, Kotlowski W, Hüllermeier E (2012) Consistent multilabel ranking through univariate losses. In: ICML 2012

Dembczynski K, Kotłowski W, Jachnik A, Waegeman W, Hüllermeier E (2013) Optimizing the F-measure in multi-label classification: Plug-in rule approach versus structured loss minimization. In: ICML 2013

Finley T, Joachims T (2008) Training structural SVMs when exact inference is intractable. In: ICML 2008, Omnipress

Gao W, Zhou ZH (2013) On the consistency of multi-label learning. Artificial Intelligence 199-200:22-44

Hariharan B, Zelnik-Manor L, Vishwanathan S, Varma M (2010) Large-scale max-margin multi-label classification with priors. In: ICML 2010, Omnipress

Hsu D, Kakade S, Langford J, Zhang T (2009) Multi-label prediction via compressed sensing. In: NIPS 22, pp 772-780

Hüllermeier E, Fürnkranz J, Cheng W, Brinker K (2008) Label ranking by learning pairwise preferences. Artificial Intelligence 172(16-17):1897-1916

Izenman A (1975) Reduced-rank regression for the multivariate linear model. Journal of Multivariate Analysis 5:248-262

Joe H (1997) Multivariate models and dependence concepts. Chapman & Hall

Jordan MI (ed) (1998) Learning in graphical models. Kluwer Academic Publishers

Kumar A, Vembu S, Menon A, Elkan C (2013) Beam search algorithms for multilabel learning. Machine Learning

Lafferty JD, McCallum A, Pereira FCN (2001) Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In: ICML 2001, pp 282-289

Neville J, Jensen D (2008) A bias/variance decomposition for models using collective inference. Machine Learning 73:87-106

Petterson J, Caetano TS (2010) Reverse multi-label learning. In: NIPS 23, pp 1912-1920

Petterson J, Caetano TS (2011) Submodular multi-label learning. In: NIPS 24, pp 1512-1520

McAllester D (2007) Generalization bounds and consistency for structured labeling. In: Predicting Structured Data, MIT Press

Pletscher P, Ong CS, Buhmann JM (2010) Entropy and margin maximization for structured output learning. In: ECML/PKDD 2010, Springer

Read J, Pfahringer B, Holmes G, Frank E (2009) Classifier chains for multi-label classification. In: ECML/PKDD 2009, pp 254-269

Ricci E, De Bie T, Cristianini N (2008) Magic moments for structured output prediction. Journal of Machine Learning Research 9:2803-2846

Schapire RE, Singer Y (2000) BoosTexter: A boosting-based system for text categorization. Machine Learning 39:135-168

Tsochantaridis I, Joachims T, Hofmann T, Altun Y (2005) Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research 6:1453-1484

Tsoumakas G, Katakis I (2007) Multi-label classification: An overview. Int J Data Warehousing and Mining 3(3):1-13

Tsoumakas G, Vlahavas I (2007) Random k-labelsets: An ensemble method for multilabel classification. In: ECML/PKDD 2007, pp 406-417

Tsoumakas G, Katakis I, Vlahavas I (2010) Mining multi-label data. In: Maimon O, Rokach L (eds) Data Mining and Knowledge Discovery Handbook, Springer

Weston J, Chapelle O, Elisseeff A, Schölkopf B, Vapnik V (2002) Kernel dependency estimation. In: NIPS 2002, pp 873-880

Yu S, Yu K, Tresp V, Kriegel HP (2006) Multi-output regularized feature projection. IEEE Trans on Knowl and Data Eng 18(12):1600-1613

Zhang ML, Zhang K (2010) Multi-label learning by exploiting label dependency. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, pp 999-1008

Acknowledgements

Krzysztof Dembczyński is supported by the Foundation for Polish Science under the Homing Plus programme, co-financed by the European Regional Development Fund.