ExCAPE-DB: an integrated large scale dataset facilitating Big Data analysis in chemogenomics
ExCAPE-DB is a free, public chemogenomics data set that can be used to test scalable multi-task learners on a challenging problem domain. Its 70 million data points represent compound-protein binding experiments from drug discovery and are similar to industrial data sets. The challenge is to accurately predict the unknown compound-protein binding pairs from the training data, which covers only about 4% of the space of possible pairs; success would have a huge societal impact.
Roel Wuyts is Principal Scientist at IMEC and part-time Professor in DistriNet at KU Leuven. His current research interests lie in programming models for HPC, machine learning, scheduling, and runtime load balancing. Previously he did research on adaptive runtime resource management of CPUs and GPUs, on tool chains for embedded devices, on programming language composition, and on logic meta-programming.