Multi-modal classification for semantic web entity extraction


Although web scraping is a popular topic among software engineers, it has received comparatively little attention from the research community. At Kompyte, we apply machine learning techniques to extract information from web pages in a multi-modal way.

In this talk we will describe several approaches to combining text, images and structural features from web pages so that they can be used in several downstream tasks. Among others, we will discuss the following topics:
- Feature engineering from web pages.
- Character and sub-span CSS selector embeddings.
- Multi-modal semantic web zone classification.
- Distributional and contextual word embeddings.
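
To make the idea of combining structural and other features concrete, here is a minimal sketch. It is not the method presented in the talk: it uses a fixed hashing trick over character n-grams of a CSS selector as a stand-in for learned selector embeddings, and it fuses modalities by plain concatenation. The function names (`char_ngrams`, `hash_vector`, `fuse`), the example selector, and the placeholder text and image vectors are all assumptions made for illustration.

```python
import zlib
from typing import List


def char_ngrams(selector: str, n: int = 3) -> List[str]:
    """Split a CSS selector into overlapping character n-grams.

    '^' and '$' mark the start and end of the selector (an illustrative choice).
    """
    padded = f"^{selector}$"
    if len(padded) <= n:
        return [padded]
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def hash_vector(ngrams: List[str], dim: int = 32) -> List[float]:
    """Count n-grams into a fixed-size vector using the hashing trick.

    zlib.crc32 is used because it is deterministic across runs,
    unlike Python's built-in hash() for strings.
    """
    vec = [0.0] * dim
    for gram in ngrams:
        vec[zlib.crc32(gram.encode("utf-8")) % dim] += 1.0
    return vec


def fuse(*modalities: List[float]) -> List[float]:
    """Early fusion: concatenate the per-modality feature vectors."""
    fused: List[float] = []
    for vec in modalities:
        fused.extend(vec)
    return fused


# Hypothetical inputs: a selector for a page zone, plus placeholder
# text and image features that a real pipeline would compute.
selector_vec = hash_vector(char_ngrams("div.price > span.amount"))
text_vec = [0.2, 0.7]
image_vec = [0.1]
features = fuse(selector_vec, text_vec, image_vec)
```

The resulting `features` vector could be passed to any standard classifier to label a page zone. A learned embedding would normally replace the hashed counts.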

The talk was cancelled.