I came across a blog post here where somebody suggests a simple idea for accelerating algorithm development for a particular application. I welcome suggestions and discussion about the conditions under which such algorithm-development contests would work. Consider the following questions.
1. Who would fund the prize money, and why would they fund it?
2. At what level would researchers be motivated to enter the contest (e.g., at the level of a research lab)?
3. What would be the resources required to implement a contest like this, and what would be the value addition, if any?
Nice blog post you found, I see what you did there :)
Off the top of my head, some issues:
Besides what you mentioned, you need to design the contest so that:
1. There is some "ground truth" against which you measure progress. If this is supervised learning, we need a training set with available labels and a test set with hidden labels. If it is unsupervised learning, we are in trouble (see 2).
2. There can be incremental progress, i.e. the solutions should not just be good or bad but measurably good. For regression, this is often RMS error on the test set or something similar; with unsupervised problems this is more difficult. How do you measure the closeness of a solution? Number of correct clusters recovered? Correlation of the found components with the true sources? Quantization error? And what about indeterminacies like scale, sign, order of components, etc.?
If you subscribe to the philosophical school of thought that says "unsupervised learning = compression" (as I whole-heartedly do), then just ask people to compress the data as much as they can and send in their compression algorithm. You then test the algorithm on a secret data set similar in structure, and you are done. With rules properly crafted against loopholes, this is equivalent to learning the structure.
3. There is immediate market value in improving the solution. Implement the best solution on the side and fund the contest from the extra profit you make. I have no idea about the market for neuroscience algorithms, so I'll pass on this one.
4. Prevent overfitting. Somehow stop contestants from sending in random solutions every second and re-running "The Origin of Species": retaining and recombining the solutions that yielded good results, and finally obtaining something that wins the contest but doesn't generalize to new data.
5. $50K is not enough to get people interested. Offer $1 million for an almost impossibly good score, one that would likely require the invention of new paradigms. At the same time, offer some small sum for incremental progress.
6. Some other things I'm forgetting right now.
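The measurable-progress and anti-overfitting points (2 and 4) can be sketched together. Below is a minimal, hypothetical scoring function: RMS error on a hidden test set (the metric the Netflix prize used for ranking), plus an assumed leaderboard policy of reporting scores only at coarse precision so that rapid-fire probing submissions leak little about the hidden labels. The function names and the rounding policy are illustrative assumptions, not rules from any real contest.

```python
import math

def rmse(predictions, hidden_labels):
    """RMS error on the hidden test set: a single, measurable score
    that makes incremental progress visible (the Netflix prize ranked
    entries this way)."""
    assert len(predictions) == len(hidden_labels)
    n = len(hidden_labels)
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(predictions, hidden_labels)) / n)

# Hypothetical anti-overfitting guard (point 4): publish the public
# leaderboard score only at coarse precision, so that probing the
# leaderboard with many submissions reveals little about the labels.
def public_score(predictions, hidden_labels, decimals=4):
    return round(rmse(predictions, hidden_labels), decimals)
```

A real contest would combine this with a submission rate limit and a final ranking computed on a second, never-queried hold-out set.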
I know of two contests that, as far as I can tell, are almost flawless in the above sense:
- The Netflix prize.
- The Hutter prize.
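The compression-style evaluation suggested above (and embodied by the Hutter prize) could be sketched as follows. This is a minimal illustration using zlib as a stand-in entry; the scoring policy shown is an assumption for illustration, not the actual Hutter prize rules (which, notably, also count the size of the decompression program itself).

```python
import zlib

def score_submission(compress, decompress, secret_data: bytes) -> int:
    """Score a compression-contest entry by the size of its output on
    a secret dataset; smaller is better. The entry counts only if the
    data round-trips losslessly. (The Hutter prize additionally counts
    the decompressor's own size, closing the "memorize the data inside
    the program" loophole.)"""
    blob = compress(secret_data)
    if decompress(blob) != secret_data:
        raise ValueError("submission is not lossless")
    return len(blob)

# A zlib baseline entry, scored on stand-in "secret" data; a real
# contest would hold this data back from contestants.
secret = b"spike train 0101, spike train 0110, " * 200
baseline = score_submission(zlib.compress, zlib.decompress, secret)
```

Any contestant whose algorithm scores below `baseline` on the hidden data has, by the compression-equals-learning argument, captured more of its structure than zlib does.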
There is also the Millennium Prize set by the Clay Mathematics Institute (Cambridge, Mass.) in 2000, a privately funded organization that is awarding $1 million for each of the millennium's seven outstanding unsolved problems: http://www.claymath.org/millennium/.
In this case, the proposed solution must be published in a peer-reviewed journal and must still be considered acceptable two years after publication before the CMI will consider it for the prize.
It seems that G. Perelman is closest to getting the prize for solving one of the problems, except that he doesn't seem to care in the slightest:
http://en.wikipedia.org/wiki/Grigori_Perelman