DataScouting is happy to announce that Alexander D’yakonov is the winner of our Kaggle competition on Greek Media Monitoring Multilabel Classification. Participants were asked to develop a method for automatically annotating articles with topics, with the aim of providing a better monitoring solution for media monitors. In figures, the competition attracted 121 teams, 128 players and 1210 entries.
Alexander is a professor at the Lomonosov Moscow State University and a Kaggle member since 2010. We asked him to share some insights on the competition:
DT: What made you decide to enter?
AD: I wanted to compete and chose several contests, but I did not have much spare time… so I reached a final solution only in WISE 2014. The problem was quite simple: the input data were real vectors with unit L2 norm, together with their labels. The only difficulty was that one vector might have several labels. I had already solved similar problems, and my previous Kaggle contest (LSHTC) was also related to multi-label text classification. It is interesting that the top two teams on the leaderboard are the same in these two contests.
DT: What preprocessing and supervised learning methods did you use?
AD: I tried generating new features, applying SVD, and transforming the initial data, but this only slightly increased performance. My final solution did not use all these tricks. I realized that linear methods (ridge regression and logistic regression) were more suitable for this problem than kNN and naive Bayes. In my final blend I used all these linear methods together with kNN. My algorithm consisted of two parts: linear combinations of regressors for each label, and a binary decision rule. Such algorithms are very popular in Russia, for example in «the algebraic approach to classification». This technique has been developed by academician Yuri Zhuravlev and his scientific school since 1978 and is unknown in Europe and the USA.
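As an illustration of the two-part structure described above (per-label scores blended linearly, then a binary decision rule), here is a minimal sketch in Python. It is not Alexander's actual code: the function names, the blending weights, the threshold, the choice of closed-form ridge regression plus kNN as the two scorers, and the "always keep the top-scoring label" rule are all assumptions made for the example.

```python
import numpy as np

def ridge_scores(X_train, Y_train, X_test, alpha=1.0):
    """Per-label ridge regression scores via the closed-form solution."""
    d = X_train.shape[1]
    W = np.linalg.solve(X_train.T @ X_train + alpha * np.eye(d),
                        X_train.T @ Y_train)
    return X_test @ W  # shape: (n_test, n_labels)

def knn_scores(X_train, Y_train, X_test, k=3):
    """Per-label kNN scores: fraction of the k nearest neighbours with the label.
    Rows are assumed to have unit L2 norm, so the dot product is cosine similarity."""
    sims = X_test @ X_train.T
    idx = np.argsort(-sims, axis=1)[:, :k]
    return Y_train[idx].mean(axis=1)  # shape: (n_test, n_labels)

def blend(score_list, weights):
    """Linear combination of several score matrices (weights are hypothetical)."""
    return sum(w * s for w, s in zip(weights, score_list))

def decide(scores, threshold=0.5):
    """Binary decision rule (an assumption): keep labels above the threshold,
    and always keep the single best label so no item is left unlabelled."""
    pred = (scores >= threshold).astype(int)
    pred[np.arange(len(scores)), scores.argmax(axis=1)] = 1
    return pred

# Toy data: random unit-norm vectors with random multi-label targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
X /= np.linalg.norm(X, axis=1, keepdims=True)
Y = (rng.random(size=(60, 4)) < 0.3).astype(int)
X_train, Y_train, X_test = X[:50], Y[:50], X[50:]

blended = blend(
    [ridge_scores(X_train, Y_train, X_test), knn_scores(X_train, Y_train, X_test)],
    weights=[0.7, 0.3],
)
predictions = decide(blended, threshold=0.5)
```

In a real setting the blending weights and the threshold would be tuned on held-out data, possibly separately for each label.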
DT: What have you taken away from this competition?
AD: I was in sixth place during the last week of the contest and did not have any new ideas. Suddenly, 4 hours before the end, I came up with my model. I tried it in my local tests and it substantially increased performance, so I ran the model on the whole training set. It took almost 4 hours to build the regressors and tune the parameters, so I made my final submissions several minutes before the end. I was lucky that I didn’t make a mistake in the code. What I took away is that it is possible to win a contest in 4 hours.
To view the entire interview click here
The finish was “thrilling”, as one of the contestants mentioned in the forum, and we would like to congratulate all the teams and players that took part in what they described as a “great” and “interesting” competition and a “real” challenge.
The competition is associated with the WISE 2014 conference that will be held in Thessaloniki, Greece on 12-14 October 2014. The competition was organized by DataScouting, Enimerosi and the Department of Informatics of the Aristotle University of Thessaloniki.