Machine learning in media services

Anatoly Starostin

Yandex Plus Funtech

16 December, 13:30, «03 Hall. Queen Erato»


The report examines the technological problems faced by modern media services and shows how machine learning helps to solve them. We will talk about a range of technologies used in Yandex media services, such as recognizing music from short, noisy audio fragments, recognizing actors' faces in movie frames, and full-text search over musical compositions. The recently released music generation technology will also be discussed. Examples are drawn from real services with multi-million audiences.

The report provides an overview of the media data processing technologies used in Yandex media services and discusses the role of machine learning and crowdsourcing methods in implementing each of them. Some of these technologies work directly with audio and video, while others use only the associated metadata (usually text). Examples of both cases will be given. We will talk about recognizing a musical composition from short audio fragments recorded by the microphone of a client device or taken from the audio track of a movie. The recognition of actors' faces in movie frames will also be discussed. We will also cover several tasks that underpin the music scenario of the Alice voice assistant and that required machine learning or crowdsourcing techniques to implement. Finally, we will present the automatic music generation technology that became the basis for Neuromusic, a new product of the Yandex Music service. This technology is a hybrid of algorithmic methods based on expert knowledge and machine learning methods: machine learning generates melodic fragments, which are then incorporated into an algorithmically controlled musical canvas. The report discusses the overall structure of the technology and the generation of melodies in particular.

The talk was accepted to the conference program