Submodular selection for data summarization

Supervisor:	Béres Ferenc
	SZTAKI, Informatics Laboratory (Informatikai Kutatólaboratórium)
email:	beres@sztaki.hu

Supervisors

Bérczi-Kovács Erika Renáta (ELTE, Department of Operations Research)

Project description

Machine learning models, especially deep neural networks, tend to perform much better when they are trained on large data sets. Unfortunately, with millions of training examples, training time grows as well. Submodular selection is a technique that selects representative subsets from large data sets and offers theoretical guarantees on the quality of the selected sample. A small representative subset therefore has the potential to make learning significantly faster while reaching accuracy comparable to training on the full data set.

Submodularity captures the diminishing returns property of set functions and has several applications in machine learning tasks [2]. Schreiber et al. developed apricot, a submodular selection framework in Python that implements a facility location approach as well as a feature-based approach [1]. One possible direction for future work is to implement additional submodular selection methods in the apricot framework that scale well to large data sets. It would also be interesting to see how these methods perform on temporal data sets with concept drift, since the best representative subset may change over time.
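
To make the facility location idea concrete, here is a minimal pure-Python sketch of greedy subset selection. It is not the apricot implementation: the similarity function, the helper names and the toy data are assumptions chosen for illustration. The facility location value of a subset is the sum, over all points, of the similarity to the most similar selected point. For monotone submodular functions such as this one, the greedy rule is known to reach at least a (1 - 1/e) fraction of the optimal value under a size constraint, which is the kind of quality guarantee referred to above.

```python
import math

def similarity(a, b):
    """Assumed similarity in (0, 1]: 1 for identical points, decaying with squared Euclidean distance."""
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)))

def facility_location(points, subset):
    """f(S): sum over all points of the similarity to the most similar selected point."""
    if not subset:
        return 0.0
    return sum(max(similarity(p, points[j]) for j in subset) for p in points)

def greedy_select(points, k):
    """Pick up to k indices, each time adding the point with the largest marginal gain."""
    selected = []
    for _ in range(min(k, len(points))):
        current = facility_location(points, selected)
        best_idx, best_gain = None, -1.0
        for i in range(len(points)):
            if i in selected:
                continue
            gain = facility_location(points, selected + [i]) - current
            if gain > best_gain:
                best_idx, best_gain = i, gain
        selected.append(best_idx)
    return selected

# Toy data (made up for illustration): two tight clusters far apart.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
summary = greedy_select(points, 2)
print(summary)  # one index from each cluster
```

This naive version recomputes the objective for every candidate, so its cost grows quickly with the data size. Scaling to large data sets, as the project aims to, would need faster variants such as lazy evaluation of the marginal gains.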

References

[1] Schreiber et al.: apricot: Submodular selection for data summarization in Python, 2019, https://github.com/jmschrei/apricot
[2] Krause, Golovin: Submodular Function Maximization, in Tractability: Practical Approaches to Hard Problems, Cambridge University Press, pp. 71-104, 2014
