Ozon
Manual content moderation consumes a great deal of time and resources, which is why we are deploying ML models to automate it.
However, under high load and with strict fault-tolerance requirements, choosing the right way to integrate ML models is critical. For us, that tool turned out to be NVIDIA's Triton Inference Server.
Triton Inference Server is powerful software that can serve several models at once and allocate computing resources efficiently. However, when high fault tolerance and maximum automation are required, the features of vanilla Triton are not enough.
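As a point of reference, here is a minimal sketch of a per-model `config.pbtxt` showing the built-in features we rely on: multiple model instances and dynamic batching. The model name, shapes, and batch sizes are illustrative assumptions, not our production settings:

```
# Hypothetical config.pbtxt for a single moderation model.
# Name, tensor shapes, and batch sizes are illustrative assumptions.
name: "toxicity_classifier"
platform: "onnxruntime_onnx"
max_batch_size: 32

input [
  {
    name: "input_ids"
    data_type: TYPE_INT64
    dims: [ 128 ]
  }
]
output [
  {
    name: "scores"
    data_type: TYPE_FP32
    dims: [ 2 ]
  }
]

# Run two copies of the model on GPU 0 to serve concurrent requests.
instance_group [
  {
    count: 2
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]

# Let Triton batch incoming requests together for better GPU utilization.
dynamic_batching {
  preferred_batch_size: [ 8, 16 ]
  max_queue_delay_microseconds: 500
}
```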
To meet the requirements that arise when running ML models in production, we developed a number of solutions that improve stability and fault tolerance.
Main topics to be covered:
* Ensuring scalability
* Additional health-monitoring tools
* Full control and automation of model updates (see the sketch after this list)
* Dedicated instances for different models, for efficient resource utilization and fault tolerance
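As a taste of the automation layer, here is a minimal sketch of a health check plus model rollout using the official `tritonclient` package. It assumes Triton was started with `--model-control-mode=explicit` on `localhost:8000`; the model name is a hypothetical example:

```python
# Minimal sketch: server health checks and an automated model (re)load
# via Triton's HTTP API. Assumes --model-control-mode=explicit;
# the URL and model name below are illustrative assumptions.
import tritonclient.http as httpclient

MODEL_NAME = "toxicity_classifier"  # hypothetical model

client = httpclient.InferenceServerClient(url="localhost:8000")

# Health monitoring: liveness and readiness of the server itself.
if not (client.is_server_live() and client.is_server_ready()):
    raise RuntimeError("Triton is not healthy")

# Model update: ask Triton to (re)load the latest version from the
# model repository, then verify the model is ready to serve traffic.
client.load_model(MODEL_NAME)
if not client.is_model_ready(MODEL_NAME):
    # In a real system, roll back and alert here.
    raise RuntimeError(f"{MODEL_NAME} failed to become ready")

print("model reloaded and serving")
```

In production such checks would typically run continuously and feed an alerting or orchestration layer rather than a one-off script.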
In effect, we built a "Triton as a Service" platform that makes model integration easy and improves the stability of the system as a whole.