Submodular selection for data summarization
Supervisor: Béres Ferenc
SZTAKI, Informatics Research Laboratory
Email: beres@sztaki.hu
Supervisors
- Bérczi-Kovács Erika Renáta (ELTE, Department of Operations Research)
Project description
Machine learning models, especially deep neural networks, tend to perform much better when they are trained on large data sets. Unfortunately, with millions of training examples, model training time grows as well. Submodular selection is a technique that selects representative subsets from large data sets and offers theoretical guarantees on the quality of the selected subset. If a small representative subset suffices, it can therefore enable significantly faster training with accuracy comparable to training on the full data set.
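The guarantee usually meant here is the classic result for the greedy algorithm, which is covered in the survey [2]. Assume a set function f on a ground set V (the data points) that is normalized (f(∅) = 0), monotone (A ⊆ B implies f(A) ≤ f(B)) and submodular, and a budget of k selected points. The first line is the diminishing returns condition, the second the greedy step, the third the approximation bound:

```latex
f(A \cup \{x\}) - f(A) \;\ge\; f(B \cup \{x\}) - f(B)
  \qquad \text{for all } A \subseteq B \subseteq V,\; x \in V \setminus B

S_0 = \emptyset, \qquad
S_i = S_{i-1} \cup \Big\{ \arg\max_{x \in V \setminus S_{i-1}} \big[ f(S_{i-1} \cup \{x\}) - f(S_{i-1}) \big] \Big\}

f(S_k) \;\ge\; \Big(1 - \frac{1}{e}\Big) \max_{S \subseteq V,\ |S| \le k} f(S) \;\approx\; 0.632 \cdot \mathrm{OPT}
```

In words: the subset picked greedily is never worse than about 63% of the best possible subset of the same size, as measured by f. How well f itself reflects downstream model accuracy is a separate, empirical question.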
Submodularity captures the diminishing returns property of set functions: adding an element to a smaller set increases the function value at least as much as adding it to a larger superset. It has several applications in machine learning [2]. Schreiber et al. developed apricot, a submodular selection framework in Python that implements facility location as well as feature-based functions [1]. Possible future work includes implementing additional submodular selection methods in the apricot framework that scale well to large data sets. It would also be interesting to see how these methods perform on temporal data sets with concept drift, since the best representative subset may change over time.
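To make the two function families concrete, here is a minimal sketch in plain NumPy with a naive greedy loop. It is not apricot's implementation or API. As far as we know, apricot exposes these as `FacilityLocationSelection` and `FeatureBasedSelection`. The helper names `rbf_similarity`, `facility_location`, `feature_based` and `greedy_select` are hypothetical, and the RBF kernel and the square-root concave function are illustrative choices. With non-negative similarities (facility location) and non-negative features with a concave, non-decreasing transform (feature-based), both functions are monotone and submodular, so the greedy bound applies.

```python
import numpy as np


def rbf_similarity(X, gamma=1.0):
    # Pairwise similarities exp(-gamma * ||x_i - x_j||^2); every entry lies in (0, 1]
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)


def facility_location(sim, selected):
    # f(S) = sum_i max_{j in S} sim[i, j]: how well S "covers" every point i
    if not selected:
        return 0.0
    return float(sim[:, selected].max(axis=1).sum())


def feature_based(X, selected):
    # f(S) = sum_u sqrt(sum_{j in S} X[j, u]) for non-negative features X;
    # the concave sqrt makes repeated mass on the same feature pay off less and less
    if not selected:
        return 0.0
    return float(np.sqrt(X[selected].sum(axis=0)).sum())


def greedy_select(f, n_items, k):
    # Naive greedy: k rounds, each adding the item with the largest marginal gain
    selected = []
    for _ in range(min(k, n_items)):
        current = f(selected)
        best, best_gain = None, -np.inf
        for j in range(n_items):
            if j in selected:
                continue
            gain = f(selected + [j]) - current
            if gain > best_gain:  # strict '>' keeps the lowest index on ties
                best, best_gain = j, gain
        selected.append(best)
    return selected


if __name__ == "__main__":
    # Two tight clusters of three points each
    points = np.array([[0.0, 0.0], [0.0, 0.1], [0.1, 0.0],
                       [5.0, 5.0], [5.0, 5.1], [5.1, 5.0]])
    sim = rbf_similarity(points)
    # Picks one index from {0, 1, 2} and one from {3, 4, 5}
    print(greedy_select(lambda s: facility_location(sim, s), len(points), 2))

    # Non-negative feature counts, e.g. word counts per document
    counts = np.array([[3.0, 0.0, 0.0], [3.0, 0.0, 0.0],
                       [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]])
    # Skips the duplicate row 1 in favour of rows covering new features
    print(greedy_select(lambda s: feature_based(counts, s), len(counts), 3))  # [0, 2, 3]
```

This naive loop needs O(k·n) evaluations of f, and facility location additionally needs an n × n similarity matrix. Both costs become prohibitive for millions of points. Scaling to such data is exactly where faster optimizers (apricot offers, for example, a lazy greedy variant) and more scalable selection methods come in.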
References
- Schreiber et al.: apricot: Submodular selection for data summarization in Python, 2019, https://github.com/jmschrei/apricot
- Krause, Golovin: Submodular Function Maximization, in Tractability, Cambridge University Press, pp. 71-104, 2014
Former students
- Bartalis Dávid: Submodular selection for data summarization (2020/21 semester I, Independent project, Professional practice I)
- Bartalis Dávid: Submodular selection for data summarization (2020/21 semester II, Independent project, Professional practice II)
- Bartalis Dávid: Submodular selection for data summarization (2021/22 semester I, Independent project, Professional practice III)