The KDD Cup (International Knowledge Discovery and Data Mining Competition) is held during the KDD conference, organized by the ACM (Association for Computing Machinery), and is unofficially known as the AI World Championship. Held since 1989, the KDD conference is the world’s oldest and most significant data mining event. Innovations such as crowdsourcing, large-scale data science competitions, ad personalization algorithms (e.g. Google), data mining (e.g. Facebook, LinkedIn), and recommendation systems (e.g. Netflix, Amazon) have come mainly from KDD.
In 2020, the conference attracted over 3,900 researchers from both industry and academia. KDD participants come from the largest technology companies globally, such as Google, Alibaba, Facebook, Netflix, LinkedIn, Tencent, Microsoft, IBM, Spotify, and Amazon. Equally important to the KDD community is the voice of government institutions such as NIH, NSF, and DARPA.
This year, nearly 2,500 teams worldwide competed in three KDD Cup categories, with three winners awarded in each category. Synerise competed in the most difficult of them, organized by Stanford University, Facebook AI, Google, and Intel.
“With our work, we want to prove that our AI team can compete with innovation leaders from around the world. We have created one of the most accurate and fastest systems – the processing time of the test set using the Synerise model is about 7 minutes. At the same time, the Google DeepMind solution takes as much as 12 hours,” said Michał Daniluk, AI Research Scientist at Synerise.
The competition task was to predict the subject of scientific publications based on the edges of a heterogeneous graph of papers, citations, authors, and scientific institutions. The graph, of unprecedented size (about 250 GB), contained over 244 million vertices of three types, connected by as many as 1.7 billion edges, which made it possible to verify whether the algorithms were ready to operate on very large-scale data.
“Large heterogeneous graphs appear in many practical applications. The graph we process as part of the KDD Cup concerns academic citations, but data with a similar structure is also present in e-commerce (customer transaction graphs), large knowledge bases, and document databases. Mastery in processing this type of data therefore leads to a tangible business advantage in improving the quality of recommendations and data retrieval. I am glad that data on these types of practical problems increasingly appears in competitions at leading conferences,” said Barbara Rychalska, AI Research Scientist at Synerise.
The Polish team, consisting of Jacek Dąbrowski, Michał Daniluk, Barbara Rychalska, and Konrad Gołuchowski, used its proprietary machine learning methods, Cleora and EMDE, unlike most teams, which improved existing algorithms. The methods developed by the Synerise team had previously brought victories in the SIGIR Rakuten Data Challenge 2020 and the WSDM Booking.com Data Challenge 2021. They are also a vital element of the personalization system available to Synerise customers. The Polish team’s solution has already been published on the Stanford University website.
The most technologically advanced companies and universities in the world took part in the competition. Synerise defeated teams from all over the world, including specialists from Intel (manufacturer of computer processors), OPPO Research Topology Lab (manufacturer of OnePlus and Oppo phones), and Huazhong University of Science and Technology.
“At Synerise, we focus on a fundamental understanding of the mathematical phenomena underlying deep learning. Combined with engineering finesse, it allows us to compete with the best research centers in the world, even though we only have a fraction of the resources available to them,” said Jacek Dąbrowski from Synerise.
The company offers a Big Data and AI platform. Its latest technological solutions allow real-time processing of data from various sources, based on proprietary database systems, proprietary artificial intelligence algorithms, and automated execution of business scenarios for retail, banking, telecommunications, and e-commerce. Synerise’s customers include CCC, Carrefour, Żabka, Orange, mBank, and Sharaf DG.