And the winner is…!

DataScouting is happy to announce that Alexander D’yakonov is the winner of our Kaggle competition on Greek Media Monitoring Multilabel Classification. Participants were asked to develop an automated ann...

PaladorScheduler

A job scheduler for large-scale data processing (more info at datascouting.com). DataScouting PaladorScheduler is a software ecosystem for scheduling, distributing, processing...

Highlighting Annotations in SolrJ

Recently at work, we had to use the Solr indexer while creating a RESTful API in Java using the JAX-RS specification. Solr provides wrappers around its API calls for a variety of programming languages...

Amazon’s Mechanical Turk

In other words, how to rent an army of slaves on demand. Quoting from Amazon Web Services (emphasis mine): Amazon Mechanical Turk is a marketplace for work that requires human intelligence. The Mecha...

Trac and SVN quick how-to on Debian Linux

In the following, I will present a mini guide to setting up Trac 0.10.3 and SVN services on Debian stable. I needed per-project authentication in both Trac and SVN. I just finished it, seems ...

Text Analysis inside Lucene

Lucene (http://lucene.apache.org) is a well-known Information Retrieval (IR) library, implemented in Java, which allows you to add powerful indexing and searching capabilities to your application. ...