Professional conference for developers of high-load systems

Architectures, scalability (19)

Aram Mkhitaryan

PicsArt

Expanding From Consumer into Enterprise with APIs: build, learn, refine

16 December, 11:10, «02 Hall. Ararat»

Picsart, known mostly for its photo and video editor and its huge consumer base, grew rapidly over the past few years and ran into the limits of its monolithic solutions. To iterate quickly and ensure reliability, Picsart moved to a modular architecture built on microservices and micro-frontends. This allowed it to experiment and eventually expand the company's offering to SMBs and enterprises with its API and SDK products.

The talk was accepted to the conference program

Alexander Horoshilov

Yandex

Yandex Query - serverless federated query system. Inside view

15 December, 13:30, «04 Hall. Ashot Yerkat»

Result of 5-year experience developing Yandex Query to bring batch/stream processing service into Yandex Cloud. YQ can run SQL-like queries over endless dataflow.

We’ll talk about the details of design trade-offs:
- capacity vs. isolation
- performance vs. reliability
- security vs. UX.

We present our 5-year experience developing Yandex Query. It is a query processing service in Yandex Cloud. Both batch and stream processing are handled similarly with the same syntax. One can debug their queries on batch data samples and run them for the production stream without changes.

YQ reuses our internal system for distributed query processing. A job can span hundreds of nodes to meet load requirements. YQ can fetch data from and upload it into external systems (e.g. object storage or message queues) to join heterogeneous sources in a single query.

We’ll reveal details of our multitenant system design. Design choices we had and decisions we made:

1. Capacity vs. isolation
Control plane is isolated from the compute plane(s). Processing cluster includes several compute planes (like tenants) to reduce blast radius. It reduces the risk of system downtime in a shared environment.

2. Performance vs. reliability
YQ uses cloud compute nodes and enforces limits and quotas to mitigate DDoS in the presence of high-load queries. Data is processed as fast as possible, and we provide exactly-once guarantees under certain conditions.

3. Security vs. UX
YQ conforms to strict cloud policies on data privacy. All data sources support service accounts for flexible access control. The compute plane uses time-limited tokens only. Service is available from cloud console UI and provides API for integration with other services.

Finally, like everything in Yandex, our system is distributed, scalable, and fault-tolerant, with all benefits and complexity of this design.

The talk was accepted to the conference program

Andrew Aksenov

Avito, Sphinx

World constants 2022

16 December, 11:10, «01 Hall. Tigran»

A tribute and update to [Dean2010], if you know what I mean. There's a bunch of "world constants" that directly impact (low-level) system performance limits and your ability to estimate them properly. Alas, most engineers don't really know those by heart. Worse, they degrade over time, and need regular refreshers. This talk does exactly that: it introduces the constants for those who don't yet know them, and refreshes them (and introduces a couple that maybe you didn't yet think about) for those who do.

The talk was accepted to the conference program

Nikolay Izhikov

Apache Ignite PMC, Apache Kafka Contributor

Practical aspects of B+ trees

15 December, 12:20, «04 Hall. Ashot Yerkat»

Many engineers are familiar with the B+ tree data structure and its application inside a DBMS. But how does it work internally in order to provide concurrency, reliability, high throughput and all the other great features? In the talk I will try to give a brief overview of methods and tweaks from real-world databases.

The talk was accepted to the conference program

Oleg Anastasyev

Odnoklassniki

Effective and Reliable Microservices

16 December, 17:00, «02 Hall. Ararat»

Odnoklassniki is one of the most popular social networks in the CIS and is in the top 6 globally. It is among the top 20 sites in SimilarWeb’s list of top global websites. More than 70 million people use Odnoklassniki regularly to share their valuable stories with friends and family, watch and stream videos, listen to music, and play games together.

Odnoklassniki employs hundreds of different microservice applications to serve users’ requests. Many of these services are built as stateful applications - they store their data locally, embedding a Cassandra database into the application’s JVM process. This challenges the usual way of building applications - a stateless microservice with a separate remotely accessible database cluster.

In this talk Oleg will try to cover the advantages of stateful vs stateless microservices, discuss how statefulness affects reliability and accessibility of services and how it helps to build faster applications. We’ll go step-by-step through building a stateful application service, delving into its architecture, major components as well as significant challenges and their solutions.

The talk was accepted to the conference program

Antony Polukhin

Yandex

Microservices on C++, or why we made our own framework

16 December, 13:30, «04 Hall. Ashot Yerkat»

We write IO-bound applications that have CPU-intensive parts, may require a lot of memory, and should be highly available.

Unfortunately, existing solutions did not match our needs, so we made our own framework, with coroutines and dynamic configs.

From this talk you'll get hints on how to combine usage simplicity and C++, production ready coroutines and language without support for them, high development speed, efficiency and safety.

The talk was accepted to the conference program

Nick Shadrin

Software Architect for NGINX

QUIC and HTTP/3

15 December, 14:40, «01 Hall. Tigran»

The new iteration of the HTTP protocol brings more challenges than previous upgrades. We now need UDP transport and a different negotiation method; another difficulty is the need for built-in security and encryption. You will learn how to implement this protocol in the network, and most importantly - why.

Plan:
- Protocol history: HTTP/1, HTTP/1.1, SPDY, HTTP/2
- Main differences of QUIC and HTTP/3
- UDP
- Encryption
- Connection ID
- Protocol negotiation
- Use of HTTP/3 in real networks
- All the boxes between client and server
- Implementation in NGINX
- Conclusions and questions

The talk was accepted to the conference program

Vasily Pantyukhin

VeUP Ltd

Cheats & mistakes to read and create SLAs

16 December, 12:20, «01 Hall. Tigran»

Trust matters. We rely on our providers’ SLAs and share our “designed for” SLOs. We need to trust and gain trust to deliver trustworthy solutions. Availability and Durability are essential system reliability SLAs. Unfortunately, quite often we mismeasure, hide, or even distort them.

During the session we’ll discuss common mistakes and problems with reliability of SLAs. Examples illustrate tips to read, share and compare the numbers of 9s.
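As a rough illustration of what "the number of 9s" means in practice, here is a minimal sketch of the usual arithmetic. The function names are hypothetical, and the serial-composition formula assumes independent failures, which real systems may not satisfy.

```python
def downtime_per_year_hours(availability_percent: float) -> float:
    """Allowed downtime per (non-leap) year for a given availability."""
    hours_per_year = 365 * 24
    return hours_per_year * (1 - availability_percent / 100)


def serial_availability(*availabilities_percent: float) -> float:
    """Availability of a chain of dependencies that must all be up.

    Assumes failures are independent; correlated failures change the result.
    """
    result = 1.0
    for a in availabilities_percent:
        result *= a / 100
    return result * 100


if __name__ == "__main__":
    for nines in (99.0, 99.9, 99.99):
        print(nines, round(downtime_per_year_hours(nines), 2), "hours/year")
    print(round(serial_availability(99.9, 99.9), 4))
```

Note that two "three nines" components in series already fall below three nines overall, which is one reason headline numbers are hard to compare without knowing how they compose.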

The talk was accepted to the conference program

Vladislav Shpilevoy

Senior Developer at VirtualMinds

Fair threaded task scheduler verified in TLA+

15 December, 14:40, «03 Hall. Queen Erato»

Algorithm for a multithreaded task scheduler for languages like C, C++, C#, Rust, Java. C++ version is open-sourced. Features: (1) formally verified in TLA+, (2) even CPU usage across worker threads, (3) coroutine-like functionality, (4) almost entirely lock-free, (5) up to 10 million RPS per thread.

Key points for the potential audience: fair task scheduling with multiple worker threads; open source; algorithms; TLA+ verified; up to 10 million RPS per thread; for backend programmers; algorithm for languages like C++, C, Java, Rust, C# and others.

"Task scheduling" essentially means asynchronous execution of callbacks, functions. Some kind of a "scheduler" is omnipresent in most services - an event loop; a thread-pool for blocking requests; a coroutine engine - you name it. Scheduler is an important basis on top of which the service’s logic can be built.

Gamedev is no exception. I work at Ubisoft, we have miles of code used in thousands of servers, mostly C++. There is a vast deal of task types to execute: download a save, send a chat message, join a clan, etc. They often compose one multi-step task: (1) take a profile lock, (2) download a save, (3) free the lock, (4) respond to the player. There is a waiting time between each step until the operation is done.

One of the game engines’ backend codes had a simple scheduler generic enough to be used for every async job in all the servers. It juggled tasks across several internal worker threads. But it had the following typical issues:
- Unfairness. Tasks were distributed to worker threads in a round-robin. If tasks differ in duration, some threads can appear choking while others are idle.
- Polling. In a naive scheduler multi-step task execution works via periodic wakeup of the task. When awake, the task checks if the current step is done and if it can go to the next one. With many thousands of tasks this polling eats notably more CPU than the actual workload.
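The unfairness issue can be illustrated with a toy simulation. This is only a sketch of the problem, not the scheduler from the talk: it compares round-robin assignment with a model in which whichever worker becomes free first takes the next task. Function names and task durations are made up for the example.

```python
import heapq


def round_robin_load(durations, workers):
    """Total busy time per worker when tasks are dealt out in turn."""
    load = [0] * workers
    for i, duration in enumerate(durations):
        load[i % workers] += duration
    return load


def shared_queue_load(durations, workers):
    """Total busy time per worker when the first free worker takes the next task."""
    free_at = [(0, w) for w in range(workers)]  # (time the worker is free, worker id)
    heapq.heapify(free_at)
    load = [0] * workers
    for duration in durations:
        t, w = heapq.heappop(free_at)
        load[w] += duration
        heapq.heappush(free_at, (t + duration, w))
    return load


if __name__ == "__main__":
    tasks = [10, 1] * 4  # alternating long and short tasks
    print(round_robin_load(tasks, 2))   # one worker gets every long task
    print(shared_queue_load(tasks, 2))  # the load evens out
```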

The talk presents a new highly efficient general purpose threaded task scheduler algorithm, which solves these problems and achieves even more:
- Complete fairness - even CPU usage across worker threads and no task pinning;
- Coroutine-like - API to wake a task up on a deadline and for immediate wakeup;
- No contention - operation is mostly built on lock-free algorithms;
- Formal correctness - the scheduler is formally verified in TLA+.

After the scheduler was implemented in C++ and embedded into several highly loaded servers, it gave a severalfold improvement in both RPS and latency (more than a 10x speedup for one server).

At the same time the talk is not C++-specific. It is rather a presentation of several algorithms combined in the scheduler. They can be implemented in many languages: at least C, C++, Rust, Java, C#. The only requirement is support for atomics and threads.

All that is completely open-source - https://github.com/ubisoft/task-scheduler.

The talk was accepted to the conference program

Andrei Vasilenkov

Yandex

Not your ordinary CDN

16 December, 11:10, «03 Hall. Queen Erato»

Most of you have read something about classical approaches to CDN: anycast, GeoDNS or just a plain web server with enabled cache layer. And it works great for common web applications — reading text or scrolling through doge memes. But when it comes to video streaming — that’s a whole new story!

Scale always brings new challenges. Having a dozen nodes serving your users without fail — we’ve been there. Moving up to horizontal scaling using different locations in our data centers — long gone. Now we have a massive CDN with external locations serving hundreds of thousands of users simultaneously, distributing terabits of media data per second. In my talk I’ll explain the reasoning behind our CDN architecture and tell you how a basic automated systems algorithm can keep you sane.

In this talk I will:
— introduce basic network-related problems of our video streaming platform;
— talk about why standard CDN building approach is no good for our system;
— iterate through our approaches on distributing traffic via our CDN locations;
— show how we use the PID-controller algorithm to control traffic flow.
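For readers unfamiliar with PID controllers, the following is a minimal textbook sketch of the discrete form. It is not the CDN's actual controller: the class name, the gains, and the idea of feeding it a target and a measured load are assumptions made for illustration.

```python
class PID:
    """Textbook discrete PID controller (illustrative only)."""

    def __init__(self, kp: float, ki: float, kd: float):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, setpoint: float, measured: float, dt: float) -> float:
        """Return a control signal from the current error and its history."""
        error = setpoint - measured
        self.integral += error * dt
        # No derivative term on the first sample: there is no previous error yet.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


if __name__ == "__main__":
    # Hypothetical use: nudge a location's traffic share toward a target load.
    pid = PID(kp=0.5, ki=0.1, kd=0.0)
    print(pid.update(setpoint=100.0, measured=80.0, dt=1.0))
```

The proportional term reacts to the current gap, the integral term removes steady-state offset, and the derivative term damps fast changes; how these would be tuned for traffic steering is outside what the abstract states.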

The talk was accepted to the conference program

Mons Anderson

Solution Architect at Exness

How to choose a queue properly

15 December, 12:20, «01 Hall. Tigran»

I will talk about approaches of queue usage and key parameters worth looking at like scalability, durability, guaranteed delivery, availability vs consistency and throughput.

Most microservice architectures and distributed applications require some kind of messaging service. A message broker or message queue is an almost inevitable element of system design.

There are quite a few of them, with their own pros and cons. The wrong choice of this component could lead to problems with scalability or fault tolerance in your application.

In my talk I will point out the key considerations in choosing a queue and guide you through a comparison of RabbitMQ, Kafka, NATS and other candidates.

The talk was accepted to the conference program

Lia Yepremyan

AMD Armenia

FPGA Basic Principles: An Introduction to How It Works

16 December, 14:40, «03 Hall. Queen Erato»

Here we will cover different aspects related to FPGAs. First, we present an overview of the basic FPGA architecture. The presentation focuses on the FPGA design process and the tools required to program an FPGA. We will also discuss programming languages and how to write your first code for an FPGA. Later we will provide a practical example and dive into FPGA design optimization.

The talk was accepted to the conference program

Alexander Makarov

ASAPIRL

Theory of programming: packaging principles

15 December, 11:10, «03 Hall. Queen Erato»

Everyone knows SOLID programming principles, the essence of modern object-oriented programming. But there are additional higher-level principles coined by Robert C. Martin that help to determine and measure isolation boundaries between packages, modules, microservices etc.

In this talk you’ll get into principles of package cohesion and coupling. We’ll highlight the shortcomings, tradeoffs and key points of usage and dive into D-metrics.
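As a reminder of how the D-metric is usually defined in Robert C. Martin's package metrics, here is a small sketch. The function names are hypothetical; the formulas are the commonly cited ones: instability I = Ce / (Ca + Ce), abstractness A = abstract classes / total classes, and distance from the main sequence D = |A + I - 1|.

```python
def instability(afferent: int, efferent: int) -> float:
    """I = Ce / (Ca + Ce): 0 is maximally stable, 1 is maximally unstable."""
    total = afferent + efferent
    return efferent / total if total else 0.0


def abstractness(abstract_classes: int, total_classes: int) -> float:
    """A = share of abstract classes and interfaces in the package."""
    return abstract_classes / total_classes if total_classes else 0.0


def distance(a: float, i: float) -> float:
    """D = |A + I - 1|: distance from the 'main sequence' line A + I = 1."""
    return abs(a + i - 1)


if __name__ == "__main__":
    # A package with 3 incoming and 1 outgoing dependency, 1 abstract class of 4.
    i = instability(afferent=3, efferent=1)
    a = abstractness(abstract_classes=1, total_classes=4)
    print(i, a, distance(a, i))
```

A stable but concrete package like the one above lands far from the main sequence, which is the kind of signal the metric is meant to surface.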

After the talk you’ll have more tools to help you write better code and design better systems overall.

The talk was accepted to the conference program

Denis Filippov

Coins.ph

Kafka for Golang developers: tips and tricks

16 December, 15:50, «02 Hall. Ararat»

It is not Kafka 101. On the contrary, you are familiar with Kafka and use it in your projects. I’ll demonstrate some traps and pitfalls we ran into. We’ll discuss them, have a look at how it works under the hood and try to figure out if Go philosophy can help or may harm you when working with Kafka.

After a short introduction to Kafka (only the things we will need to follow the discussion) I will show some cases in detail:
- Partition rebalancing (and how you can handle it)
- Asynchronous commit: can we make message processing more concurrent?
- Batch producing: don’t let default settings slow down your app.

This talk is sort of the story of a survivor. No boring theory, only practical use cases with a deep dive into details. We’ll have a look at some issues, figure out why they happen and discuss how the Kafka libraries and engine work under the hood.

The talk was accepted to the conference program

Artem Trofimov

CloudIL

Having cake and eating it too: painless and efficient cluster utilization for data scientists

16 December, 10:00, «03 Hall. Queen Erato»

A data science team may become an abyss for expensive hardware. One can allocate a Docker/Jupyter instance with a GPU but spend most of the time writing code or visualizing data. In this talk we will discuss how to ensure efficient hardware utilization while avoiding unpopular restriction policies.

The talk was accepted to the conference program

Aleksei Dashkevich

X5 Tech

From MVP to Reality. Transition Problems and Solutions

16 December, 10:00, «04 Hall. Ashot Yerkat»

We can describe product development as a struggle between business, technology, marketing and others. When launching an MVP, we usually sacrifice quality for the sake of quickly testing a hypothesis. And what’s next? What technical challenges will we have to face and how to solve them?

How do we live, and what do we do, after successfully testing a hypothesis? What if it’s possible to build an architecture at the MVP stage that will make life easier for us in the future? We asked ourselves these questions, and we would like to share our experience in the technical development of fast-growing products:
- how to maintain a balance between development and technical debt when scaling up to 10x, what problems we encountered, and how we solved them;
- what you could think about at the beginning of development to make the transition from MVP to Reality easier.

We’ve seen a lot of products at different stages of development, participated in large-scale rollouts, and developed processes to deal with a high degree of uncertainty. This includes low-level matters such as custom pod scaling, optimisations, technology restrictions, and metrics collection, as well as a high-level understanding of business process monitoring, data architecture, component architecture, and microservices.

The talk was accepted to the conference program

Ignas Bagdonas

Equinix

Everyday Practical Vectorization

15 December, 10:00, «03 Hall. Queen Erato»

Free performance boost! Yes, free - you have already paid for your platform of choice that supports fancy vector processing extensions such as AVX2, AVX-512, SVE, RV-V, and the like - but you were not aware of what those extensions could offer you. Or maybe not that much free? Let’s check and see.

Vectorization has been around for a good while now, and sadly has been undervalued in the software domain - for a multitude of reasons. Trends in compute platform evolution consistently point to vectorization as the leading performance mechanism in the hardware domain. The gap between the views of the software and hardware worlds is quite enormous - and that is something that needs to be addressed.
Historically, vector processing mechanisms were the domain of floating-point calculations. While still of major relevance, FP is becoming a specialized fragment of what contemporary vector processing approaches can provide to the general integer computation domain.
Everyday tasks such as pattern matching, endianness conversion, hash function calculation, cryptography operations, adaptation of inherently scalar algorithms to the vector domain, and the impact and restrictions of data structure layout on vector performance: these are the subtopics to discuss in question-and-answer form, with a focus on analyzing the factors that boost or limit performance.

The talk was accepted to the conference program

Kirill Alekseev

Mail.Ru Email Service, VK

Push-notifications in RuStore: how we built an alternative transport to replace Google Firebase

15 December, 15:50, «02 Hall. Ararat»

We have built a complete transport for push-notifications that can be used instead of (or in conjunction with) Google Firebase. A notification flow in our systems excludes Google APIs which means that if some app gets banned from Google’s push transport, their users can still be reached through our service. In a more optimistic world, you can continue using both systems to increase delivery rate and improve latencies. We will also deliver notifications in real time, with text/pictures etc, like Google does. Our service is free to use but the app that wants to use it is required to be deployed to RuStore.

There are 2 main components: Android SDK and backend API.

Android SDK provides the same interface as Firebase SDK does. It encapsulates registering a new device token, fetching and showing notifications.

Backend API is a drop-in replacement for Firebase API. In Mail (Почта Mail.ru) we managed to integrate RuStore push-notifications by simply changing the API host from Google’s to RuStore’s. We have a stateless API deployed to a distributed k8s cluster, a pub/sub system and a web socket server (for real-time notification delivery), Scylla to store notifications and a Redis Cluster to store device tokens.

The talk was accepted to the conference program

Alik Kurdyukov

UnitedTraders

To Rust or not to Rust: 3 years in production with exchange matching engine

15 December, 17:00, «02 Hall. Ararat»

When you need to implement a new system with tight latency requirements, you face the problem of selecting the right implementation ecosystem. Most low-latency systems nowadays are implemented in C/C++. There are upcoming rivals like Golang. But one contender is not so popular in the mainstream: Rust.

I’ll tell the story of implementing an exchange matching engine in Rust, which started in 2018 and has been in production for more than 3 years now. We faced various problems, from hiring a team to selecting the right architecture and libraries, and then testing methods. We’ll cover lots of practical questions, such as: When is it reasonable to consider Rust? How do you hire Rust developers? What kind of training does the team need? What kind of architecture does Rust force on you? What can you gain or lose with the ecosystem? How do your ways of reasoning change?

The talk was accepted to the conference program
Databases and storage systems (11)

Daniël van Eeden

PingCAP

An introduction to TiDB

15 December, 15:50, «03 Hall. Queen Erato»

TiDB scales writes without adding extra load on the developers working with the database. It can also combine OLTP and OLAP workloads in a hybrid solution (also known as HTAP). And all of this while being compatible with the MySQL protocol.

The talk was accepted to the conference program

Arshak Matevosyan

PicsArt

How to create a fully transparent MongoDB database cluster holding terabytes of data serving hundreds of millions of users simultaneously

16 December, 14:40, «04 Hall. Ashot Yerkat»

Working in a fast-growing hybrid microservice infrastructure, our team had to overcome challenges that arose when the databases received an enormous number and variety of queries, which could slow down or even crash the servers.

This experience was unacceptable as database layer issues can cause slowness or even unresponsiveness of the whole application.
The problems can be very different, starting with enormous amounts of bulk queries and ending with unstructured queries or even queries with infinite loops.

Our team developed a solution that allows us to analyze and monitor what's going on in the database using open-source tooling and provide monitoring capabilities for application engineers to see how their queries perform in real time.

The talk was accepted to the conference program

Chris Bohn

MicroFocus LTD

Designing a more efficient OLAP database data flow and architecture

15 December, 10:00, «01 Hall. Tigran»

Modern database systems feature OLTP databases for recording business facts and dimensions, and OLAP databases for data analytics. These database types feature different fundamental storage architectures. OLTP databases are designed for fast single-record lookup, while OLAP databases are designed for fast analytics like aggregation. This leads to different data storage approaches. To enable fast aggregation, OLAP databases usually feature immutable data storage containers, especially in cloud environments like AWS. This makes update and delete operations very expensive, because those immutable storage containers must be destroyed and rebuilt. Excessive updates and deletes can severely impact OLAP database performance.

Most transactional middleware applications make frequent use of update and delete operations, which have less performance impact on OLTP databases than on OLAP databases. Most businesses use OLTP databases for running the business, with the transaction data then feeding OLAP databases to analyze the business. OLTP and OLAP databases need to live together nicely, but there is an impedance mismatch due to the data storage differences.

At MicroFocus, we set out to minimize the impedance mismatch. We settled on a data flow design where all data loaded into Vertica (our OLAP database) is append-only. We also determined that data integrity would be inherited from the upstream OLTP systems - so why do it again? We thus decided to use no primary or foreign key constraints, because the benefits would be redundant. This allows for much faster ELT data loading and query processing because there is no constraint checking. Again, we are inheriting constraint checking from the upstream OLTP databases, which are much better suited for that. By accepting that the upstream OLTP database has already done referential integrity checks, our OLAP database is freed from the constraint checking overhead. That is a large performance gain.

As mentioned, our ELT is append-only. That means that our Vertica OLAP database has all the iterations of all the records, so we have a complete record of the whole change history. Change history has become a hot topic in data analytics because correlating changes to things such as a product description with sales revenue yields an important data point. Keeping complete change history is becoming essential to data analytics. The OLTP/OLAP design that MicroFocus has taken yields efficient OLAP database performance and retains change history - an important win that comes at no cost.

Summary: The design approach we have taken at MicroFocus with our OLTP/OLAP design has yielded a robust and performant holistic system that enables our Vertica OLAP EDW to perform at its potential, while providing benefits like change history.

The talk was accepted to the conference program

Daniil Gitelson

Lekton

How we cook Foundation DB

16 December, 17:00, «01 Hall. Tigran»

FoundationDB is a low-level ACID database with nice guarantees, designed as a ‘foundation’ for high-level DBMSs. None was actually created, so we had to roll our own.

FDB is a simple key-value ACID database with only about 6 operations, so we had to build a high-level API around it supporting:
– Document-like storage with indexes
– Time-based and client-centric partitioning of historical data (e.g. payments history)
– Queues

We evolved this layer from a simple Kotlin library to a separate service. In this talk I will explain how FDB works and how we implemented that layer, squeezing maximum performance out of it.

The talk was accepted to the conference program

Vladislav Pyatkov

GridGain

How did we build rebalance into the distributed database architecture

15 December, 12:20, «02 Hall. Ararat»

I will describe how the rebalance procedure changed as replication protocols evolved, and how existing processes were modified under the new circumstances. All the material is based on experience of maintaining and developing Apache Ignite.

The talk was accepted to the conference program

Vladimir Bukhonov

Miro

Miro canvas content migration from Postgres to the in-memory DB + S3

16 December, 17:00, «04 Hall. Ashot Yerkat»

Transferring canvas data at Miro is a long and unpredictable process. The current migration is already the second one at Miro. In my presentation, I want to explain why and how we moved the contents of our canvas from one storage to another and why we arrived at this final decision.
I’ll tell you about the criteria for choosing databases, how we compared them, and why we finally came to the conclusion that it’s better to write your own solution.
I will talk about the silly, but non-obvious data errors that we encountered on the way to integrate our new database, as well as the limitations that arose as a result of moving to an in-memory database and how we got around them.
Well, as a bonus, I’ll tell you how we saved about 90% of the financial costs of the database infrastructure.

The talk was accepted to the conference program

Jordan Pittier

Gorgias

PostgreSQL: a journey from 0 TB to 40 TB in 4 years

15 December, 13:30, «02 Hall. Ararat»

PostgreSQL is an amazing database, very versatile and capable of processing both online analytics and transactional workloads. Yet operating PG past a certain scale (>1 TB) is challenging and mistakes are costly. In this talk, we will share our experience as our PG databases grew from 0 TB to 40 TB.

The talk was accepted to the conference program

Konstantin Osipov

ScyllaDB

NoSQL and transactions: getting the numbers out

15 December, 11:10, «01 Hall. Tigran»

We built an open-source instrument that benchmarks transactional workloads with multiple NoSQL vendors in the cloud. In this talk I’ll present the tool, the method, and the benchmarking results. MongoDB, CockroachDB, FoundationDB and YDB are covered to varying extents.

In an effort to provide both consistency and scalability the NoSQL ecosystem has been rapidly providing transaction support. From pioneers, focused squarely on scaling transactional workloads, to followers, adding transaction support to eventually consistent data stores, more and more vendors are trying to get on board of the relational database train. The idea of serverless scalability of a strongly consistent workload is quite attractive, so we evaluated a few vendors from the cost per transaction perspective, comparing their performance with one popular open-source relational database. In this talk I’ll present the method, the tool, which we made available online, and the evaluation results.

The talk was accepted to the conference program

Anton Zhukov

ManyChat

The 2% Solution

16 December, 15:50, «04 Hall. Ashot Yerkat»

After we started receiving insane AWS invoices for our cold events databases, we decided to optimize the data and store it as compact, encrypted and compressed chunks. I’ll tell you about an engineering way of solving the task without using any ready-made solution such as a branded database or data storage.

In this talk I will share the full history of creating a custom data storage. I will speak about two sides of the process: the simple concept of integrating a stateless component beside a database driver, and the complexity of a migration with parallel processes, where each mistake has an expected monthly cost. I will cover the research, development, and troubleshooting of the custom data storage that allowed us to cut our costs to 2% of those of the original PostgreSQL instance.

The talk was accepted to the conference program

Alexey Palazhchenko

FerretDB Inc.

Building an open-source MongoDB-compatible database on top of PostgreSQL

16 December, 15:50, «01 Hall. Tigran»

MongoDB is a life-changing technology for many developers, empowering them to build applications faster than using relational databases. However, MongoDB abandoned its open-source roots, changing the license to SSPL and making it unusable for many open-source and commercial projects. We decided to change that, so we started working on FerretDB – an open-source proxy written in Go. It accepts connections and handles queries from unmodified MongoDB clients, and stores data in PostgreSQL.

In my talk, I will briefly discuss our reasoning for starting this project, our vision, and our plans for the future. I will also cover a lot of technical aspects of FerretDB, such as:
• How did we implement the MongoDB wire protocol?
• How do we store MongoDB/BSON documents in PostgreSQL/jsonb columns?
• How do we query and filter data using SQL, and what problems have we encountered?
• How do we test our implementation?
• And others.

The talk was accepted to the conference program

Igor Loban

Toloka.ai

Transactional queues in PostgreSQL

16 December, 14:40, «01 Hall. Tigran»

Most modern web applications use external message brokers like RabbitMQ, Kafka, or something similar. Developers have to solve the problem of atomically changing data in a DB and sending a message to a queue.

In my talk, I’ll show the Transactional Outbox pattern, how it solves the problem and our recipe for its reliable implementation for PostgreSQL.
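The general shape of the Transactional Outbox pattern can be sketched as follows. This is not the recipe from the talk: it uses Python's built-in `sqlite3` only so that the example is self-contained (the talk targets PostgreSQL), and the table names, column names, and `publish` callback are hypothetical.

```python
import json
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "payload TEXT NOT NULL, "
        "sent INTEGER NOT NULL DEFAULT 0)"
    )


def place_order(conn: sqlite3.Connection, order_id: int) -> None:
    """Write the business change and the outgoing message in ONE transaction."""
    event = json.dumps({"event": "order_created", "order_id": order_id})
    with conn:  # commits on success, rolls back both inserts on error
        conn.execute("INSERT INTO orders (id, status) VALUES (?, 'new')", (order_id,))
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (event,))


def relay_once(conn: sqlite3.Connection, publish) -> int:
    """Publish unsent outbox rows, marking each as sent afterwards.

    Publishing before marking gives at-least-once delivery: a crash between
    the two steps leads to a duplicate, so consumers should be idempotent.
    """
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    place_order(conn, 1)
    relay_once(conn, print)  # a real relay would send to the broker instead
```

In a real PostgreSQL deployment the relay would also need to handle concurrent workers and ordering, which this sketch deliberately leaves out.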

The implementation is pretty challenging, so a common recommendation is to use a ready-to-go solution like PgQ. But PgQ has some disadvantages: it requires a daemon process, provides generic queues that are redundant, lacks documentation, and does not suit everyone (it may be unavailable on managed PostgreSQL).

The talk was accepted to the conference program
Architecture, Design patterns (1)

Valentin Udaltsov

Happy Inc.

ID battle: UUID vs auto increment

16 December, 14:40, «02 Hall. Ararat»

For almost eight years, while developing web applications, I used auto increments exclusively to identify entities. A few years ago I tried UUIDs in a pet project. Since then, my team and I have chosen UUIDs for identification in most cases. We learned how to correlate entities of different modules by ID, we took advantage of different UUID types, and we were among the first to use UUID v6 and v7.

At the conference we will discuss pros and cons of using auto increments and different UUID versions in various situations, study database benchmarks, and find a new winner in a good old battle of identifiers.
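
As an illustration of why UUID v7 is attractive for database keys, here is a small sketch of the v7 bit layout as later standardized in RFC 9562: a 48-bit Unix timestamp in milliseconds in the most significant bits, followed by version, variant, and random bits. The function name `uuid7_sketch` is an assumption made for this example, and the code is not taken from the talk.

```python
import os
import time
import uuid


def uuid7_sketch(unix_ms=None):
    # Use the current time in milliseconds unless a timestamp is supplied.
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    # Top 48 bits: timestamp. Low 80 bits: random (some are overwritten below).
    value = (unix_ms & ((1 << 48) - 1)) << 80
    value |= int.from_bytes(os.urandom(10), "big")
    # Bits 79..76 hold the version number (7).
    value &= ~(0xF << 76)
    value |= 0x7 << 76
    # Bits 63..62 hold the RFC variant (binary 10).
    value &= ~(0x3 << 62)
    value |= 0x2 << 62
    return uuid.UUID(int=value)
```

Because the timestamp occupies the leading bits, identifiers generated in different milliseconds sort in creation order, which tends to keep B-tree index inserts close together, unlike fully random UUID v4 values.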

The talk was accepted to the conference program
BigData and Machine Learning (6)

Dmitrii Kamaldinov

Qrator Labs

On one interesting generalization of the Leaky Bucket algorithm and Morris's counters

15 December, 14:40, «04 Hall. Ashot Yerkat»

The task of reducing the intensity of the event flow often arises in practice. Often it takes the form of limiting internet traffic to reduce the load on a particular service.

In my talk I will cover a slightly more complicated problem: reduce the intensity of the flow by removing only the most frequent elements (or, equivalently, by removing as few unique elements as possible).

It turns out this alternative approach to rate limiting is quite reasonable in some cases. And the talk will cover a bunch of application examples including several from our experience with traffic filtering at Qrator Labs. We will discuss the pros and cons of this approach and will also compare it with such limiting instruments as NGINX.

I will propose an algorithm developed by our team at Qrator Labs which solves this problem. Based on the famous Leaky Bucket algorithm, the algorithm is incredibly simple and requires no prior knowledge from the audience. Yet we find it quite interesting and elegant because it achieves the goal by spending O(1) time on processing each element, despite the apparent complexity of the task which in some way combines the task of rate limiting with the task of searching for the most frequent elements (so-called heavy hitters), that at first glance would require some kind of sorting.
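
For readers unfamiliar with the starting point, here is a minimal sketch of the classic Leaky Bucket used as a rate limiter. It is only the well-known baseline, not the generalization developed at Qrator Labs, and the class and parameter names are assumptions made for this example. Each event is handled in O(1) time.

```python
class LeakyBucket:
    def __init__(self, capacity, leak_rate):
        self.capacity = capacity    # maximum bucket level, in events
        self.leak_rate = leak_rate  # events drained per second
        self.level = 0.0
        self.last = None            # timestamp of the previous event

    def allow(self, now):
        # Drain the bucket for the time elapsed since the previous event.
        if self.last is not None:
            elapsed = now - self.last
            self.level = max(0.0, self.level - elapsed * self.leak_rate)
        self.last = now
        # Accept the event only if it still fits into the bucket.
        if self.level + 1.0 <= self.capacity:
            self.level += 1.0
            return True
        return False
```

The caller passes the current time explicitly, which keeps the sketch deterministic and easy to test.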

In the second part of my talk, I will give a brief overview of Morris’s counters that allow counting a large number of events using a small amount of memory by introducing a probabilistic approach to updating when an event hits.

These counters can find their application in systems where the memory is a critical resource (in particular as a part of the algorithm mentioned above). In addition to describing the working principle and properties of the classical Morris’s counters, the talk will also present some novel ideas obtained in the course of our research.
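
The working principle of the classical Morris counter can be sketched in a few lines. This is the textbook base-2 variant and does not include the novel ideas mentioned in the abstract; the class name and the injectable random generator are assumptions made for this example.

```python
import random


class MorrisCounter:
    def __init__(self, rng=None):
        # Only the exponent is stored, so memory grows like log(log(n)).
        self.exponent = 0
        self.rng = rng or random.Random()

    def increment(self):
        # Raise the exponent with probability 2 ** -exponent.
        if self.rng.random() < 2.0 ** -self.exponent:
            self.exponent += 1

    def estimate(self):
        # Unbiased estimate of the number of increments seen so far.
        return 2 ** self.exponent - 1
```

A single counter has high variance, so in practice several independent counters are usually averaged, or a base closer to 1 is used, to trade memory for accuracy.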

The talk was accepted to the conference program

Roman Grebennikov

DeliveryHero SE

Building an open-source online Learn-to-Rank engine

15 December, 15:50, «04 Hall. Ashot Yerkat»

Building a CTR-optimized ranker takes ~6 months. Most of the time you will be gluing different ML libraries together, repeating the same mistakes everyone made before.

We got tired of it and made Metarank: an open-source LTR service that handles 90% of the most typical ranking tasks in only 10% of the time.

The talk was accepted to the conference program

Dmitry Petrov

Data Version Control (DVC)

ML experiment tracking with VScode, Git and DVC

16 December, 12:20, «03 Hall. Queen Erato»

The machine learning space brings extra challenges in the form of the hundreds and thousands of ML experiments that must be tracked and the large datasets involved. This tracking can be done right from VScode using the DVC extension. We will show how Git can be used as a source of truth for ML experiments.

The machine learning space brings extra challenges in the form of the hundreds and thousands of ML experiments that must be tracked and the large datasets involved. This tracking can be done right from the VScode editor using its DVC extension. We will show how Git can be used as a source of truth for ML experiments and how teams can collaborate by sharing modeling code and metrics using Git and GitHub.
ML teams will learn how to use existing tools such as Git, GitHub, or GitLab and how to collaborate better with software engineering and DevOps teams.

The talk was accepted to the conference program

Roman Smirnov

Exness

Machine Learning in the audio domain: when the neural network is overkill or where are the limits of lightweight models

15 December, 13:30, «03 Hall. Queen Erato»

Machine learning engineers and data scientists typically use neural networks when the task involves media data: texts, images, sounds, or voices. There are many great pretrained architectures for voice processing, e.g. Wav2Vec2 or Whisper. However, such models are really huge and require expensive computational resources or take too long to process data. I am going to describe several audio processing tasks, from classification and regression on audio sequences to diarization and speech recognition, with a focus on the first two. I will present experiments on poor and rich datasets that solve these tasks using a lightweight gradient-boosted decision tree model and the pretrained Wav2Vec2 neural network (the current SotA in many voice processing tasks). My main goal is to discuss where the limits of gradient boosting algorithms are in the audio domain.

The talk was accepted to the conference program

Anatoly Starostin

Yandex Plus Funtech

Machine learning in media services

16 December, 13:30, «03 Hall. Queen Erato»

The report examines the technological problems faced by modern media services and shows how machine learning helps to cope with them. We will talk about a whole range of technologies used in Yandex media services, such as music recognition from short and noisy audio fragments, recognition of actors’ faces in movie frames, full-text search of musical compositions, etc. The recently released music generation technology will also be discussed. Examples from real services with a multi-million audience will be given.

The report provides an overview of the technologies used in Yandex Media Services related to media data processing and discusses the role of machine learning and crowdsourcing methods in the implementation of each of them. Some of these technologies work directly with audio and video and some, in contrast, use only their metadata (usually text). Examples of both cases will be given. We will talk about recognition of a musical composition based on short audio fragments recorded from the microphone of a client device or taken from the audio track of a certain movie. The recognition of actors' faces in movie frames will also be discussed. We will also cover several tasks that support the music scenario of the Alice voice assistant and that required machine learning or crowdsourcing techniques to implement. Finally, we will present the technology of automatic music generation, which became the basis for a new product of the Yandex Music service, called Neuromusic. This technology is a hybrid of algorithmic methods based on expert knowledge and machine learning methods. Machine learning is used to generate melodic fragments, which are later incorporated into an algorithmically controlled musical canvas. The report discusses the structure of the technology in general and the generation of melodies in particular.

The talk was accepted to the conference program

Ashot Vardanian

Unum Cloud

Designing the fastest ACID Key-Value Store

15 December, 12:20, «03 Hall. Queen Erato»

One node. One CPU socket. 20 GB/s of mixed random I/O in ACID transactions in persistent memory on 10 TB+ collections. In production. 35 GB/s in the lab.

How did Unum reach those numbers? What can GPUs bring to the table? And how does the Linux kernel stand in our way?

The talk was accepted to the conference program
Neural networks (1)

Alexey Voropaev

Evocargo

Perception system of a truly autonomous truck

15 December, 17:00, «03 Hall. Queen Erato»

Autonomous cars wear a bunch of sensors to detect obstacles in time, day or night. But perception isn’t only about cameras or lidars. We train neural networks, design data pipelines, annotate data (not draining your budget), build the infrastructure (surely, with micro-services at its base).

Object: autonomous light trucks without a driver’s cabin that transport cargo 24/7
Setting: an enclosed area at a logistics hub
Our mission: to develop a perception system for such trucks

I’ll talk about how we have solved major challenges in building a perception system and learned to…
- detect people, cars, e-scooters, etc.
- recognize debris on the roadbed to exclude it from the drivable area
- annotate data used in neural networks training, and reduce annotation costs
- and, finally, ensure that algorithms and calculations won’t fry the onboard computer.

Bonus: No bragging about theoretical concepts or how others do it — I’ll show you how our autonomous vehicles see the world based on our own experience in the fields and give real-life examples of how they operate at our clients’ sites.

The talk was accepted to the conference program
Enterprise Systems Performance (3)

Daniel Podolsky

.

TTM as a main KPI: pain and humiliation

15 December, 17:00, «01 Hall. Tigran»

First of all: I’m an engineer, and I look at almost everything from an engineer's perspective.
We all know how hard work in a go-go-go project can be. I have been there as a developer, lead, and CTO, and I remember every lovely day of it.
And I think we could make the situation better!

The talk was accepted to the conference program

Igor Solovyov

Yandex

Level up your Optimization Process: How to Implement Distributed Profiling and Why you Want to Have It

15 December, 10:00, «04 Hall. Ashot Yerkat»

In the Yandex advertising infrastructure team we have learned how to solve some problems related to profiling during optimization. The solution is quite different from more common approaches, such as trace-based tools like Jaeger and local profiling with tools like gdb.

I’m going to cover the following topics in my talk:
— What is distributed profiling
— How distributed profiling works in Yandex Recommendation Systems Infrastructure Service
— Why you might want to use distributed profiling even if you know how to profile locally
— How to implement distributed profiling even if you don’t know how to profile locally
— Interesting features for profiling
— Usage examples

The talk was accepted to the conference program

Pavel Lakosnikov

Avito

Documentation as a way not to fail in microservices

15 December, 11:10, «04 Hall. Ashot Yerkat»

Without a good documentation process your microservice architecture will be doomed. Documentation processes are the best way of creating lots of microservices, sharing context between teams.

You’ll see what exactly engineers want to see in good documentation to be successful and happy.

Different types of code generation processes will be in focus.

The talk was accepted to the conference program
DevOps and Maintenance (6)

Addison Schultz

Miro

Developing a best-in-class deprecation strategy for your features or products

16 December, 10:00, «02 Hall. Ararat»

Nobody likes ambiguity, especially when it comes to the stability of an endpoint or a feature and the expectations for its long-term availability. Avoid common pitfalls and explore a critical area where trust is built with developers through a thoughtful, best-in-class deprecation strategy.

The talk was accepted to the conference program

Viktor Vedmich

Amazon Web Services

Karpenter: Efficient scaling of Kubernetes clusters

16 December, 17:00, «03 Hall. Queen Erato»

Karpenter is the new groupless cluster autoscaler that can dramatically improve the efficiency and cost of running workloads on your cluster.

The cloud is all about elasticity and right-sizing. Microservice architectures lend themselves to containers, and Kubernetes is arguably the most common orchestrator to run them. These containers, being (usually) stateless, are good candidates for using cloud elasticity, so scaling them should be a well-known task. We’ll talk about different scaling approaches and focus on Karpenter, the new groupless cluster autoscaler that can dramatically improve the efficiency and cost of running workloads on your cluster.

Note: The session will include a demo part comparing Karpenter with the cluster autoscaler and showing how to work with Karpenter.

The talk was accepted to the conference program

Andrei Kvapil (kvaps)

CEO & Founder at Ænix

KubeVirt, its networking, and how we brought it to the next level

16 December, 15:50, «03 Hall. Queen Erato»

Short abstract
When choosing KubeVirt as our main virtualization solution, we were unsatisfied with the existing networking implementation. We developed and contributed some enhancements to simplify the design and get the most performance out of the network using KubeVirt.

Full abstract
In this talk I’ll show you how the network operates in KubeVirt as well as the unique features we implemented to enhance it. The following topics will be covered:
- How to run mutable VMs in immutable K8s Pods;
- What is the difference between KubeVirt and traditional cloud platforms;
- What’s wrong with networking in KubeVirt;
- How a live VM migration is performed;
- Communication with the community and contributing process.

The talk was accepted to the conference program

Vadim Ponomarev

cloudification.io

You need Cloud to manage Cloud: Kubernetes as best way to manage OpenStack cloud

16 December, 11:10, «04 Hall. Ashot Yerkat»

We will discuss OpenStack as a microservice application: what you will have to face if you want to launch your own cloud based on OpenStack, what problems arise when you deploy such a large and complex system, how to maintain it, and how to provide a highly available cloud for your clients. We will also look at how Kubernetes can (or cannot) help with this. In this talk, based on experience, Vadim will tell you about the tricks, the most common mistakes, and how they can be solved.

The talk was accepted to the conference program

Igor Latkin

KTS

How we reduced logs costs by moving from Elasticsearch to Grafana Loki

16 December, 13:30, «01 Hall. Tigran»

An Elasticsearch cluster with billions of log lines can consume terabytes of disk space. Grafana Loki can be a good candidate for storing and querying logs in large environments. In this talk we will focus on maximizing Loki’s performance and on the task of transferring logs to it in an efficient manner.

For a long time, the Elastic Stack was the de facto standard for collecting and processing logs for Kubernetes clusters. However, it is known to be pretty demanding on computing resources such as CPU, RAM, and disk. Therefore, new players are appearing on the market and offering alternative solutions, and one of them is Grafana Loki. If you decide that you need to change the logging stack, there are several problems and questions that need to be answered.

In this talk I will share our experience at KTS of migrating logs from Elasticsearch cluster to Loki, what difficulties we encountered along the way, how we solved them, and how much money we saved in the end.

We’ll also discuss topics such as:
* Architectural differences between the ELK/EFK stack and Grafana Loki
* How Loki allows you to save a lot on the logging infrastructure
* How to avoid a cloud provider’s vendor lock-in - here we will analyze the principles of boltdb-shipper and interaction with an S3 storage
* What “knobs” you can tweak in the Loki configuration for it to work at maximum performance
* And most importantly - what to do if the logs are currently in the Elasticsearch cluster, and how to transfer them to Loki in adequate time - I will share our own experience and solution.

The talk was accepted to the conference program

Anton Bystrov

Percona/Simbirsoft

How to create dashboard-"story" for highload

16 December, 12:20, «04 Hall. Ashot Yerkat»

I'll tell the story of how we created a new home dashboard for monitoring installations that contain 100…200…300…+ nodes: how we caught performance issues in the old version and which methods we used to find the performance bottleneck. I also want to talk about the differences between monitoring strategies and how we combined these methods in our new dashboard. We solved our main performance problem and created a new dashboard with a story for drilling down into a more detailed level.

The talk was accepted to the conference program
Security, DevSecOps (3)

Alon Kiriati

Dropbox

The clashes of the titans - Usability vs. Security. Can they live together?

16 December, 13:30, «02 Hall. Ararat»

Security is critical for every app, but it can also be annoying and complicate simple flows. In this talk we will see how using a smart algorithm can make your UX simple without compromising security, by focusing on an example use case: password strength estimation.

Security is critical for each and every app. One mistake or one breach can kill a business. However, security can sometimes be annoying and complicate simple flows. In this talk we will dive into one of these use cases: password strength. We will cover the importance of your customers’ passwords, learn how to estimate their strength, and see tools that will help you evaluate it. We will also discover the trade-offs between product and security and how using a smart algorithm can make your UX simple without compromising security.

The talk was accepted to the conference program

Edgar Mikayelyan

Qrator Labs

Evolution of Distributed Denial of Service attacks on the Internet: 1994 up to the present

15 December, 13:30, «01 Hall. Tigran»

Working with customers from a variety of industries, our team has accumulated a large body of data about the evolution of tools for organizing DDoS attacks and methods of their mitigation. I will share our key findings on DDoS attacks and our vision of their future development.

At Qrator Labs we are deeply engaged in research on DDoS attacks, real-time traffic filtering, and bot activity, extending expertise in our core competencies year by year. In my keynote presentation I would like to present some results and conclusions of our longstanding study in the field of network security and business resilience:
- The most important milestones in the evolution of DDoS attacks in terms of technology and public perception of the problem: Panix, Sony, XboxLive/PSN, Mirai, Memcached.
- How tools for parsing web resources and APIs developed alongside DDoS attacks, and how scraper bots were created and grew in sophistication and scale of use.
- Technologies and innovations that improve our life on the Internet and break new ground for DDoS and bot attacks.
- Lessons we learned and conclusions we drew from many years’ experience in this field.

The talk was accepted to the conference program

Artem Bachevsky

Independent Researcher

Breaking license

16 December, 12:20, «02 Hall. Ararat»

We often encounter licensed software, but we do not always dive into how it works.

In the talk we’ll discuss:
- How is software licensed?
- Pros and cons of different ways of software protection
- How to break it?
- And finally, how to develop unbreakable software protection and not to “break” your customers?

The talk was accepted to the conference program
Internet of Things (1)

Alina Dima

Amazon Web Services

Moving beyond prototypes: Building resilience at scale in your IoT application

15 December, 14:40, «02 Hall. Ararat»

If 1% of your 100-device fleet goes offline, it’s 1 device. Maybe your use-case can live with that. But can your use-case be bulletproof with 1% of 10 million devices (100000) going offline? If the answer is no, then it is time to learn about resilience at scale.

At scale, the overall health of your IoT application (Edge and Cloud) can be affected by events outside of your control. For example, the network provider drops the connection, or a high percentage of your fleet comes online at the same time. In the IoT problem space it is your responsibility as an engineer to handle not only the resilience of an Edge application instance and its interaction with the Cloud, but also the collective resilience of tens of thousands of Edge application instances connecting and communicating with your Cloud application, and the impact of them all performing or not performing an action simultaneously. Problems that seem minor at small scale, such as 1% of your fleet going offline and coming back online at the same time, might become major at large scale. Is your IoT application safe from a self-inflicted DDoS attack?

This session will focus on explaining resilience at scale, and how scale uncovers problems you don’t see otherwise, and providing examples of mitigation strategies you can build with AWS IoT, to ensure that your IoT application in its entirety is operating reliably at scale.
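
One widely used mitigation for the "whole fleet reconnects at once" scenario is exponential backoff with random jitter on the device side. The sketch below shows the general technique only; it is not an AWS IoT API and is not confirmed to be part of the session. The function name and its default parameters are assumptions made for this example.

```python
import random


def reconnect_delay(attempt, base=1.0, cap=300.0, rng=random):
    # Exponential backoff ceiling: base * 2^attempt, limited to `cap` seconds.
    # The exponent is clamped so very large attempt counts cannot overflow.
    ceiling = min(cap, base * (2 ** min(attempt, 32)))
    # "Full jitter": pick a uniformly random delay below the ceiling so that
    # devices that went offline together do not all come back together.
    return rng.uniform(0.0, ceiling)
```

With a fixed retry interval, 100000 devices that lost connectivity at the same moment would all retry at the same moment too; with jitter, their reconnect attempts are spread across the whole backoff window.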

Key takeaways:
- Understanding what resilience at scale is, with concrete examples of what could go wrong
- Learn how to ensure your IoT application is resilient at scale using AWS IoT
- Take home a mental model for building resilience at scale

The talk was accepted to the conference program
System administration, hardware (1)

Crux CONCEPTION, M.Psych

CRUX CONCEPTION

Insider Threat: What is Social Engineering?

16 December, 10:00, «01 Hall. Tigran»

Retired Criminal Profiler & Hostage Negotiator, Crux Conception has taken his years of training, education, and experience to develop a method that will allow individuals within The Tech Community to utilize social, people, and observation skills to detect potential theft and acts of company espionage.

By converting your ordinary social and observational skills into simple criminal/psychological profiling techniques.

When we hear the word “Ransomware,” is it possible that before a cyberattack is initiated and hackers/cyberthieves penetrate your online security system, someone on the inside offered valuable information to the hackers, giving them the ability to hold your company for RANSOM?

Is it possible to think that an individual or organization, worlds away, actually knows how much a company is willing to pay? Are we so unaware that we cannot imagine someone on the inside supporting the hackers/cyberthieves with valuable information regarding your company’s security system and protocols?

Is it possible that someone inside has specific information about how much a company is willing to pay, and its product is considered a valuable resource to its vast customer base?

For example, what if AT&T had a disgruntled employee (with pertinent knowledge regarding AT&T’s security system and protocol)? To a SOCIAL ENGINEER, this is the perfect candidate to recruit, gather valuable data from, and then relay that information to a team of hackers/cyberthieves.

The talk was accepted to the conference program
Video, streaming video (2)

Olga Popova

Yandex

Reducing traffic wastage in video player

15 December, 17:00, «04 Hall. Ashot Yerkat»

Content delivery is expensive for video streaming. I will tell you how to reduce video traffic wastage from the client side by changing technical parameters of the video player. By varying the buffer size and the selected video quality in the player we'll save the company money without loss of QoE.

In my speech I’m going to talk about:
1. Theory:
- How do we lose traffic? Basic scenarios in terms of video player processes
- Which video player aspects can we influence? The buffer size and the chosen bit rate
- QoE (quality of experience metric) vs reducing data wastage
- A few words about our logs. How do we connect traffic with product metrics?
- The evolution of the traffic reduction KPI

2. Harsh reality (our hypotheses and results of real experiments):
- Hypothesis: Buffer limit to X seconds
- Hypothesis: Dynamic buffer
- Hypothesis: Skippable fragments
- Hypothesis: Viewport capping
- Hypothesis: Aesthete capping
- Hypothesis: SwitchUp capping

3. Results and conclusions.
Let’s talk about which hypotheses are suitable for online cinemas, and which are suitable for video hosting.

The talk was accepted to the conference program

Anton Kortunov

Yandex

Things they never tell you about video streaming

15 December, 15:50, «01 Hall. Tigran»

There are IT engineers who build video streaming services. There are video engineers who know how to shoot video. These two sets almost don’t intersect. In this talk I, as an IT guy, will share with you what I have learned from video guys over the last 5 years.

Almost nobody knows how easy it is to drop video quality inside the video processing pipeline. In my talk I’ll go through these sections:
* why is audio track more important than video track
* video is not just a sequence of images, how to use shutter speed on video cameras, stroboscopic effects
* interlaced videos: why does it exist and how to deal with it
* frame rate conversion and its disadvantages
* image gamma and image resize: why do you resize images incorrectly
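
The last point, gamma and image resizing, can be illustrated with a small sketch. Pixel values stored in sRGB are gamma-encoded, so averaging them directly (as a naive downscale does) gives results that are too dark; converting to linear light first avoids this. The code uses the standard sRGB transfer functions on single 8-bit channel values. The function names are assumptions made for this example, and the sketch is not taken from the talk.

```python
def srgb_to_linear(v):
    # Decode an 8-bit sRGB channel value into linear light in [0, 1].
    c = v / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(c):
    # Encode linear light in [0, 1] back into an 8-bit sRGB channel value.
    s = c * 12.92 if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055
    return round(s * 255.0)


def average_naive(a, b):
    # Averages the encoded values directly; this is the common mistake.
    return round((a + b) / 2)


def average_linear(a, b):
    # Averages in linear light, then re-encodes the result.
    return linear_to_srgb((srgb_to_linear(a) + srgb_to_linear(b)) / 2)
```

Averaging a black pixel and a white pixel this way gives a noticeably brighter gray than the naive average, which matches what the eye sees when a fine black-and-white pattern is viewed from a distance.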

The talk was accepted to the conference program
QA, Stress testing (2)

Evgeny Potapov

DevOpsProdigy, USA

10 mistakes of a (high)load testing in 2022

15 December, 10:00, «02 Hall. Ararat»

Surprisingly, professionally organized load testing is mostly the domain of enterprise projects, banking, and government systems. There, professionals and teams dedicated specifically to load testing do the work, and the process is well established.

What is even more interesting is that load testing there is a niche for QA specialists: one group of people writes and performs the tests, and separate engineering teams act on the results. The situation is reminiscent of the division between Devs and Ops before 2008.

In commercial web development the situation is different: in most projects, with the exception of very large ones, load testing is carried out only when circumstances allow, most often by the same engineers who developed the project. Time for it is allocated from whatever is left over, and test scenarios are often worked out “by eye”.

While there are attempts to build load testing into CI/CD, this comes with its own challenges. Business people and management want builds and deploys to be as fast as possible, and adding load testing of a service is a huge overhead. Moreover, people are not really ready to DDoS the production environment every time someone deploys. Load testing on such projects happens rarely, on special occasions, so software engineers don’t gain the experience they would get from doing it regularly.

The results of load testing in such cases might be completely incorrect, and the problem is not just the wrong numbers. Those wrong results might suggest that the sky is the limit, and business decisions are then based on them. It might be a huge marketing campaign that overloads the servers, a decision to refrain from investing in architecture scaling, or simply a decision to release the project when it is not ready to accept the traffic.

Common mistakes:
* The project was tested for 5 minutes instead of a long time;
* The test profile was defined incorrectly;
* Staging/Pre-prod environments were tested and production has a completely different infrastructure;
* The site returned HTTP 200 when it wasn't actually working;
* Some of the microservices were tested in isolation from the others, while in production they work in connection with each other;
* And a million more reasons.

In my presentation I want to go over the main problems that we see in our work and which lead to incorrect results of load testing or to incorrect interpretation of test results. I will tell you how to avoid them both from a technical point of view and from an organizational one (in a conversation with a business), and how to try to integrate the testing process into the regular process of development, breaking down silos.

The talk was accepted to the conference program

Karen Mkhitaryan

Ameriabank

The Role of API Testing and the Importance of API Automation

15 December, 11:10, «02 Hall. Ararat»

We all write automated tests for our applications, but...

1. Do we really understand the importance and the advantages of API test automation?
2. Do we write API automated tests?
3. Do we use API automation in UI tests?

During the talk we will discuss many points related to API automation and get familiar with API automation using Java and REST Assured.

The talk was accepted to the conference program
TechTalks (2)

Anton Kortunov

Yandex

Videos in Yandex and where to find them

15 December, 12:10, «02 Hall. Ararat»

Yandex services have been working with video for more than a decade. That includes video search, Kinopoisk, and even an experiment with our own small video hosting service.
In 2017, Yandex Efir came into the world in line with all the latest in technology and powerful infrastructure, priming it to handle Kinopoisk HD and all our other video projects.
In this tech talk, we’ll be going back to the project’s genesis, looking at its infrastructure and development challenges, and taking a peek into the future to see what lies ahead for streaming services.

The talk was accepted to the conference program

Oleg Bondar

Yandex Infrastructure

YDB — Open-Source Distributed SQL Database from Yandex

16 December, 12:10, «02 Hall. Ararat»

There are many well-known open-source projects by Yandex. There are frameworks for writing your own services like userver, complex, math-heavy projects for machine learning like CatBoost, frameworks for frontend developers like DivKit, trained machine learning models (YaLM), resilient scalable databases for high loads, and much more.
In addition to open-source projects, Yandex is engaged in developing various standards. For example, thanks to the WG21 initiative, the new C++ standards now contain ideas and improvements suggested by our developers.
I’m going to talk about open source and YDB — our open-source distributed SQL database.

The talk was accepted to the conference program
