HighLoad++ Armenia Hub

Talks

Architectures, scalability (8)

Kafka is a distributed messaging system built for high performance. In this talk I’ll explain the architecture of the broker and client components, emphasising the design concepts that enable high performance. It will be useful for system designers and for anyone who wants an overall understanding of Kafka.

The talk was accepted to the conference program

Trust matters. We rely on our providers’ SLAs and share our “designed for” SLOs. We need to trust and to earn trust in order to deliver reliable solutions. The vital characteristics of a system’s reliability are Availability and Durability. Unfortunately, we sometimes mismeasure, hide, or even distort them.

During the session, we’ll discuss common mistakes and problems with reliability SLAs. Examples will illustrate how to read, share, and compare the number of 9s. Patterns for improving Availability and Durability support the concept of trustworthy reliability.
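To make “the number of 9s” concrete, here is a small illustrative sketch (not from the talk) that converts an availability figure into the annual downtime it allows and shows how the availabilities of serially dependent components combine. It assumes a 365-day year and independent failures; the function names are hypothetical.

```python
# Illustrative sketch: what an availability figure ("number of 9s") allows.
# Assumes a 365-day year and independent component failures.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(availability: float) -> float:
    """Allowed downtime per year for an availability given as a fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def serial_availability(*components: float) -> float:
    """A request that needs every component to be up succeeds only if all of
    them are up, so (with independent failures) availabilities multiply."""
    result = 1.0
    for a in components:
        result *= a
    return result


if __name__ == "__main__":
    for availability in (0.99, 0.999, 0.9999, 0.99999):
        minutes = downtime_minutes_per_year(availability)
        print(f"{availability:.3%}: {minutes:8.1f} min/year")
    # Three "three nines" dependencies in series end up below three nines.
    print(f"{serial_availability(0.999, 0.999, 0.999):.5f}")
```

For example, 99.9% allows about 525.6 minutes (roughly 8.8 hours) of downtime per year, and a request path through three independent 99.9% components is only about 99.7% available.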

The talk was accepted to the conference program

When you need to implement a new system with tight latency requirements, you face the problem of choosing the right implementation ecosystem. Most low-latency systems today are implemented in C/C++. There are up-and-coming rivals such as Golang. But one contender is not yet so popular in the mainstream: Rust.

The talk was accepted to the conference program

Short abstract
I will talk about approaches to using queues and the key parameters worth looking at, such as scalability, durability, guaranteed delivery, availability vs. consistency, and throughput.

Full abstract
Most microservice architectures and distributed applications require some kind of messaging service, called a message broker or message queue, which is an inevitable element of system design.

There are quite a few of them, each with its own pros and cons. Choosing the wrong one can lead to scalability or fault-tolerance problems in your application.

In my talk I will point out the key considerations in choosing a queue and guide you through a comparison of RabbitMQ, Kafka, NATS, and other candidates.

The talk was accepted to the conference program

Short abstract
In this talk Oleg will cover the advantages of stateful vs. stateless microservices, and discuss how statefulness affects the reliability and accessibility of services and how it helps to build faster applications.

Full abstract
Odnoklassniki is one of the most popular social networks in the CIS and among the top 6 globally. It is in the top 20 of Similarweb’s list of top global websites. More than 70 million people use Odnoklassniki regularly to share their valuable stories with friends and family, watch and stream videos, listen to music, and play games together.

Odnoklassniki employs hundreds of different microservice applications to serve users’ requests. Many of these services are built as stateful applications: they store their data locally, embedding a Cassandra database into the application’s JVM process. This challenges the usual way of building applications, namely a stateless microservice with a separate, remotely accessible database cluster.

In this talk Oleg will cover the advantages of stateful vs. stateless microservices, and discuss how statefulness affects the reliability and accessibility of services and how it helps to build faster applications. We’ll go step by step through building a stateful application service, delving into its architecture and major components, as well as significant challenges and their solutions.

The talk was accepted to the conference program

Short abstract
In only a few years, the number of options available to run containers in the Cloud has literally exploded. It’s easy to get overwhelmed! This talk will categorize the various options available on AWS, Azure & GCP, focusing on what is state-of-the-art in 2022, including some of the new serverless options.

Full abstract
In only a few years, the number of options available to run containers in the Cloud has literally exploded. Each provider now offers dozens of “slightly different” services, each with its own minor trade-offs. Furthermore, running your applications in 2022 is definitely not like doing it in 2019: some of the new serverless options offer unique value propositions that shouldn’t be missed. It’s easy to get overwhelmed! This talk will categorize the various options available on AWS, Azure & GCP, focusing on what is state-of-the-art in 2022. We’ll look at Kubernetes and its evolution, and explain the trade-offs between the different categories from both a technical and an organizational standpoint. Finally, we’ll do a deep dive, with a demo, into some of the recently launched services that are quickly evolving to change the game: GCP Cloud Run, Azure Container Instances, and AWS Fargate.

The Program Committee has not yet taken a decision on this talk

Why don’t we have one FS to rule them all yet?
Let’s look at battle-tested ZFS, its promising architecture and concepts compared with other FSes, why you’ll want to use it (who said bitrot?), and why you may not want to use it (mmm, zero copy on CoW).
And let’s not forget about the people behind the FS.

Which file system should you choose in the open-source world? What are the differences? Why did this talk’s author become a ZFS contributor?
Let’s look at the battle-tested Zettabyte File System, its promising architecture and concepts compared with other FSes, why you’ll want to use it (who said bitrot?), and why you may not want to use it (mmm, zero copy on CoW).
And let’s not forget about the people: we’ll look at the developer community and its main drivers, how an FS can be truly cross-platform, and how an FS can live outside the Linux kernel and still hope to be a first-class citizen.

The talk was accepted to the conference program

Let’s talk about tricks for optimizing an application with millions of users: how to reduce heavy operations and latency, how to ditch expensive GPU machines, and which optimizations come in handy when you add ML-based recommendations.

The talk was accepted to the conference program

Databases and storage systems (9)

Short abstract
In this presentation we will look at the choices you have for running MySQL on Kubernetes, and the best practices to follow in MySQL and Kubernetes to achieve optimal performance, availability, and security.

Full abstract
As Kubernetes becomes an increasingly mature solution for running stateful applications, more and more people consider it as a platform for running MySQL, or are already running MySQL on it in production. Yet Kubernetes is a whole new universe for database engineers, and not following Kubernetes best practices can cause poor performance, downtime, or data loss. In this presentation we will look at the choices you have for running MySQL on Kubernetes, and the best practices to follow in MySQL and Kubernetes to achieve optimal performance, availability, and security.

The Program Committee has not yet taken a decision on this talk

Short abstract
We’ll look at a problem that requires transactional queues. You’ll also see how to build a bulletproof queue in PostgreSQL and look at the existing alternatives.

Full abstract
The talk is about a recipe for implementing the transactional outbox pattern (https://microservices.io/patterns/data/transactional-outbox.html) in PostgreSQL.

This pattern makes an update in PostgreSQL and a message to an external queue (or another service) atomic: either both happen or neither does.

The implementation is fairly challenging, so a ready-made solution such as PgQ (https://github.com/pgq/pgq) is often recommended. But PgQ has some disadvantages (it requires a daemon process running alongside PostgreSQL, provides generic queues with redundant functionality, and lacks documentation), and it doesn’t suit everyone (it isn’t applicable to managed databases).

The talk was accepted to the conference program

Short abstract
Running databases on Kubernetes already? How about boosting performance by using local node storage? Sounds great, but you will also learn about the downsides of this approach and the issues you might face.

Full abstract
At Percona we have lots of experience running databases on Kubernetes: MySQL, MongoDB, PostgreSQL. We have interacted with many users and customers who were tuning their databases for performance on K8s and decided to use local storage. Others were trying to save money, e.g. by using NVMe SSDs instead of EBS. We want to share these stories and provide more insight into the various performance and stability implications of running databases on Kubernetes on local storage.

The plan of the talk is as follows:
- intro to DBs on k8s
- why would anyone want to use local storage
- review of various solutions for local storage
- a deeper look into known issues and possible problems
- performance measurements of various solutions (using MySQL as an example)

The talk was accepted to the conference program

Short abstract
We will look into the differences between MySQL and MariaDB in the core areas such as SQL features, query optimizations, replication, storage engines, and security. We will also discuss the unique features and capabilities MySQL 8 and MariaDB 10.8 offer to different applications.

Full abstract
MySQL 8 and MariaDB 10.8 are the latest major versions of MySQL and MariaDB. While MariaDB started out as a slightly different MySQL variant, it has now grown into a very different database platform that diverges further with every release.

In this presentation, we will look into the differences between MySQL and MariaDB in core areas such as SQL features, query optimizations, replication, storage engines, and security. We will also discuss the unique features and capabilities that MySQL 8 and MariaDB 10.8 offer to different applications. I will also provide some recommendations for choosing between these database options.

Moreover, we will pay attention to Galera 4, the latest version of the replication technology used with both MySQL 8 and MariaDB 10.8 for managing secure, high-performance database clusters.

The Program Committee has not yet taken a decision on this talk

TiDB scales writes without adding extra load on the developers working with the database. It can also combine OLTP and OLAP workloads in a hybrid solution (also known as HTAP). And all of this while being compatible with the MySQL protocol.

The Program Committee has not yet taken a decision on this talk

We built an open source tool that benchmarks transactional workloads across multiple NoSQL vendors in the cloud. In this talk, I’ll present the tool, the method, and the benchmarking results. MongoDB, CockroachDB, FoundationDB, and YDB are covered to varying extents.

In an effort to provide both consistency and scalability, the NoSQL ecosystem has been rapidly adding transaction support. From pioneers focused squarely on scaling transactional workloads, to followers adding transaction support to eventually consistent data stores, more and more vendors are trying to get on board the relational database train. The idea of serverless scalability for a strongly consistent workload is quite attractive, so we evaluated a few vendors from the cost-per-transaction perspective, comparing their performance with one popular open source relational database. In this talk I’ll present the method, the tool (which we have made available online), and the evaluation results.

The talk was accepted to the conference program

PostgreSQL is an amazing database, very versatile and capable of processing both online analytics and transactional workloads. Yet operating PG past a certain scale (>1 TB) is challenging, and mistakes are costly. In this talk, we will share our experience as our PG databases grew from 0 TB to 40 TB.

The Program Committee has not yet taken a decision on this talk

Short abstract
It has been an exciting year in the open source database industry, with more choice, more cloud, and key changes in the industry. We will dive into the key developments over 2022.

Full abstract
It has been an exciting year in the open source database industry, with more choice, more cloud, and key changes in the industry. We will dive into the key developments over 2022, including the most important open source database software releases in general, the significance of cloud-native solutions in a multi-vendor, multi-cloud world, the new criticality of security challenges, and the evolution of the open source software industry.

The Program Committee has not yet taken a decision on this talk

Short abstract
So you’re looking to run your Open Source Database on Kubernetes. What best practices should you follow and what pitfalls should you avoid? We will look at how to run stateful applications on Kubernetes overall as well as cover high availability, security, backups, and disaster recovery.

Full abstract
So you’re looking to run your Open Source Database on Kubernetes. What best practices should you follow and what pitfalls should you avoid? In this presentation we will look at how to run stateful applications on Kubernetes overall as well as what is particularly important for databases - we will cover high availability, security, backups, and disaster recovery. Finally, we will show how these practices can be implemented with Percona Operators for MySQL, MongoDB, and PostgreSQL - one of the leading solutions to run Open Source Databases on Kubernetes.

The Program Committee has not yet taken a decision on this talk

BigData and Machine Learning (1)

Short abstract
AutoML automates each step of the ML workflow, making machine learning easier to use. In this session I will cover AutoGluon, a library for ML practitioners seeking an open source solution, and Amazon SageMaker tools for data scientists who prefer a fully managed service.

Full abstract
Many companies are interested in implementing machine learning in their applications. Most of them decide to build a new custom model, which takes a lot of time, even though many SOTA models are already available as open source packages and libraries. AutoML is one of the concepts that democratise machine learning: it allows you to build a strong model in a few lines of code, without deep expertise. Amazon has invested a lot in this area, building several AWS services and an open source library, and this talk will focus on them.

Note
I will devote 70% of my talk to the open source AutoGluon library, which uses the stack ensembling technique to build very strong models for most problems. The library can even work with multimodal data (when you have text, images, and tabular data in the same dataset). I will explain what the stack ensembling technique is, how we implemented it, and several cool features we implemented in this library, such as Adaptive Early Stopping, Inference limit, etc.
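As a rough illustration of the general idea behind stacking (not AutoGluon’s implementation or API), here is a deliberately tiny pure-Python sketch: two base models produce out-of-fold predictions, and a meta-model learns least-squares weights for combining them. The toy data, the two base models, and all function names are hypothetical.

```python
# Toy 2-level stacking on a 1-D regression problem, pure Python.
# Data, base models and names are illustrative assumptions only.
import math


def fit_line(xs, ys):
    """Base model 1: ordinary least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return lambda x: my + slope * (x - mx)


def fit_nearest(xs, ys):
    """Base model 2: 1-nearest-neighbour regressor."""
    pts = list(zip(xs, ys))
    return lambda x: min(pts, key=lambda p: abs(p[0] - x))[1]


def out_of_fold(fit, xs, ys, k=5):
    """Predict every point with a model that never saw it (k folds by index)."""
    preds = [0.0] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, len(xs), k):
            preds[i] = model(xs[i])
    return preds


def fit_meta(p1, p2, ys):
    """Meta-model: least-squares weights (no intercept) for two base models,
    solved from the 2x2 normal equations with Cramer's rule."""
    a = sum(u * u for u in p1)
    b = sum(u * v for u, v in zip(p1, p2))
    c = sum(v * v for v in p2)
    r1 = sum(u * y for u, y in zip(p1, ys))
    r2 = sum(v * y for v, y in zip(p2, ys))
    det = a * c - b * b
    return (r1 * c - b * r2) / det, (a * r2 - b * r1) / det


def mse(pred, ys):
    return sum((p - y) ** 2 for p, y in zip(pred, ys)) / len(ys)


# Toy data: a trend plus a wiggle that neither base model captures alone.
xs = [i / 10 for i in range(40)]
ys = [x + math.sin(3 * x) for x in xs]

oof_line = out_of_fold(fit_line, xs, ys)
oof_nn = out_of_fold(fit_nearest, xs, ys)
w_line, w_nn = fit_meta(oof_line, oof_nn, ys)

# Final model: base models refit on all data, blended with the learned weights.
line, nn = fit_line(xs, ys), fit_nearest(xs, ys)


def stacked(x):
    return w_line * line(x) + w_nn * nn(x)


if __name__ == "__main__":
    blend = [w_line * u + w_nn * v for u, v in zip(oof_line, oof_nn)]
    print(f"weights: line={w_line:.2f}, nearest={w_nn:.2f}")
    print(f"OOF MSE: line={mse(oof_line, ys):.4f}, nearest={mse(oof_nn, ys):.4f}, "
          f"stacked={mse(blend, ys):.4f}")
```

The key design choice is training the meta-model on out-of-fold predictions: if it saw predictions the base models made on their own training data, it would learn to trust whichever model overfits most.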

The Program Committee has not yet taken a decision on this talk

Neural networks (1)

Short abstract
AutoML democratises machine learning. AWS provides a range of AutoML solutions for all levels of expertise. In this talk I will introduce the open source library AutoGluon, which can achieve high scores in ML competitions with just 3 lines of code and a few hours of training.

Full abstract
AutoGluon is an open source library developed by Amazon. It is based on the stack ensembling technique and can be used for tabular, text, image, multimodal, and time series data. With 3 lines of code you can build a strong model for many problems. While it is super easy to use, you still have full control and can influence how models are trained and ensembled. I will explain what the stack ensembling technique is, how we implemented it, and several cool features we implemented in this library, such as Adaptive Early Stopping, Inference limit, etc.

The Program Committee has not yet taken a decision on this talk

DevOps and Maintenance (4)

Short abstract
Karpenter, the new groupless cluster autoscaler that can dramatically improve the efficiency and cost of running workloads on your cluster.

Full abstract
The cloud is all about elasticity and right-sizing. Microservice architectures lend themselves to being implemented in containers, and Kubernetes is arguably the most common orchestrator for running them. These containers, being (usually) stateless, are good candidates for exploiting cloud elasticity, so scaling them should be a well-understood task. We’ll talk about different scaling approaches and focus on Karpenter, the new groupless cluster autoscaler that can dramatically improve the efficiency and cost of running workloads on your cluster.

Note: The session will include a demo comparing Karpenter with the cluster autoscaler and showing how to work with Karpenter.

The talk was accepted to the conference program

Short abstract
After choosing KubeVirt as our main virtualization solution, we were dissatisfied with its existing networking implementation. We developed and contributed some enhancements that simplify the design and get the most performance out of the network in KubeVirt.

Full abstract
In this talk, I’ll show you how the network operates in KubeVirt as well as the unique features we implemented to enhance it. The following topics will be covered:

- How to run mutable VMs in immutable K8s Pods;
- What is the difference between KubeVirt and traditional cloud platforms;
- What’s wrong with networking in KubeVirt;
- How a live VM migration is performed;
- Communication with the community and contributing process.

The talk was accepted to the conference program

Short abstract
We will begin by examining how MySQL database consistency is achieved via two replication methods:
- Write Set Replication, wsrep (Percona XtraDB Cluster, MariaDB Galera Cluster)
- MySQL Group Replication (MySQL InnoDB Cluster)

Then, using a variety of Operators, we’ll demonstrate database deployment.

Full abstract
We’ll be reviewing the provisioning, maintenance, and monitoring of MySQL instances using the following three operators:
- MySQL Operator for MySQL Group Replication
- Percona Operator for Percona XtraDB Cluster (PXC)
- Percona Operator for MySQL Group Replication

The talk was accepted to the conference program

Short abstract
Imagine that many new customers need your set of services as a single installation inside their own infrastructure. How would you release such a product while deploying and supporting your own production at the same time? This story is about exactly that.

Full abstract
2GIS.KIT is a set of services that can be used as a service. Each service is developed by a separate team with its own process, release cycle, deployment strategy, and so on. One day a customer came and said: “I need all your services inside my infrastructure. I don’t have Internet access.” Since then we have had to invent new processes and technologies for building these services into one product and delivering it to the customer. We had to answer a lot of questions:
- How should we develop our services?
- How should we test them?
- How should we change the support process?
And many, many more. In this talk I will share our approach and results.

The Program Committee has not yet taken a decision on this talk

Security, DevSecOps (1)

Working with customers from a variety of industries, our team has accumulated a large body of data on the evolution of tools for organising DDoS attacks and of methods for mitigating them. I will share our key findings on DDoS attacks and our vision of how they will develop in the future.

At Qrator Labs we are deeply engaged in researching DDoS attacks, real-time traffic filtering, and bot activity, extending our expertise in our core competencies year by year. In my keynote I would like to share some results and conclusions of our long-standing study in the field of network security and business resilience:
- The most important milestones in the evolution of DDoS attacks, in terms of both technology and public perception of the problem: Panix, Sony, XboxLive/PSN, Mirai, memcached.
- How tools for parsing web resources and APIs developed alongside DDoS attacks, and how scraper bots were created and grew in sophistication and scale of use.
- Technologies and innovations that improve our life on the Internet while breaking new ground for DDoS and bot attacks.
- Lessons we learned and conclusions we drew from many years of experience in this field.

The talk was accepted to the conference program

System administration, hardware (2)

As a now-retired criminal/behavioral profiler, I will engage the audience:
By outlining the psychological aspects of Social Engineering, Data Breaching, and the prevention of Data loss.
By converting your everyday social and observational skills.

The talk was accepted to the conference program

Short abstract
An algorithm for a multithreaded task scheduler, for languages like C, C++, C#, Rust, and Java. The C++ version is open source. Features: (1) formally verified in TLA+, (2) even CPU usage across worker threads, (3) coroutine-like functionality, (4) almost entirely lock-free, (5) up to 10 million RPS per thread.

Full abstract
Key points for potential audience: fair task scheduling with multiple worker threads; open-source; algorithms; TLA+ verified; up to 10 million RPS per thread; for backend programmers; algorithm for languages like C++/C/Java/Rust/C#/others.

"Task scheduling" essentially means asynchronous execution of callbacks or functions. Some kind of "scheduler" is present in almost every service: an event loop, a thread pool for blocking requests, a coroutine engine, you name it. The scheduler is an important foundation on top of which the service’s logic is built.

Gamedev is no exception. I work at Ubisoft, where we have miles of code, mostly C++, running on thousands of servers. There is a great variety of task types to execute: download a save, send a chat message, join a clan, etc. Often they form one multi-step task: (1) take a profile lock, (2) download a save, (3) release the lock, (4) respond to the player. Between steps there is a wait until the current operation completes.

The backend code of one of the game engines had a simple scheduler, generic enough to be used for every async job in all servers. It juggled tasks across several internal worker threads. But it had the following typical issues:

- Unfairness. Tasks were distributed to worker threads in a round-robin fashion. If tasks differ in duration, some threads can end up overloaded while others sit idle.
- Polling. In a naive scheduler, multi-step tasks are executed via periodic wakeups. When awake, the task checks whether the current step is done and whether it can move on to the next one. With many thousands of tasks, this polling consumes notably more CPU than the actual workload.

The talk presents a new, highly efficient, general-purpose algorithm for a multithreaded task scheduler, which solves these problems and achieves even more:

- Complete fairness - even CPU usage across worker threads and no task pinning;
- Coroutine-like - API to wake a task up on a deadline and for immediate wakeup;
- No contention - operation is mostly built on lock-free algorithms;
- Formal correctness - the scheduler is formally verified in TLA+.

After the scheduler was implemented in C++ and embedded into several highly loaded servers, it gave an N-fold improvement in both RPS and latency (a more than 10x speedup for one server).

At the same time, the talk is not C++-specific. Rather, it presents several algorithms combined in the scheduler. They can be implemented in many languages, at least C, C++, Rust, Java, and C#: all they need is support for atomics and threads.
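The unfairness problem of round-robin dispatch can be illustrated with a toy single-threaded simulation. This is not the Ubisoft scheduler or its lock-free algorithms; it only contrasts pinning tasks to workers in round-robin order with a generic shared queue from which idle workers pull the next task. The task durations and function names are made up for illustration, and all tasks are assumed to be ready at time 0.

```python
# Toy simulation: per-worker busy time under two dispatch strategies.
# Not the scheduler from the talk; durations and names are hypothetical.
import heapq


def round_robin(durations: list[int], workers: int) -> list[int]:
    """Task i is pinned to worker i % workers regardless of current load."""
    busy = [0] * workers
    for i, d in enumerate(durations):
        busy[i % workers] += d
    return busy


def shared_queue(durations: list[int], workers: int) -> list[int]:
    """Workers pull the next task from one shared queue as soon as they are
    free, so each task goes to whichever worker becomes idle first."""
    free_at = [(0, w) for w in range(workers)]  # heap of (free time, worker)
    busy = [0] * workers
    for d in durations:
        t, w = heapq.heappop(free_at)
        busy[w] += d
        heapq.heappush(free_at, (t + d, w))
    return busy


if __name__ == "__main__":
    # Every fourth task is long; with 4 workers round-robin pins all of them
    # to worker 0, which finishes at t=40 while the others idle after t=4.
    tasks = [10, 1, 1, 1] * 4
    print("round-robin :", round_robin(tasks, 4))   # [40, 4, 4, 4]
    print("shared queue:", shared_queue(tasks, 4))  # [12, 12, 16, 12]
```

Since every worker starts at time 0 and never idles while tasks remain, the largest busy value is also the time at which all work is done: 40 time units for round-robin versus 16 for the shared queue, for the same total work of 52.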

All that is completely open source - https://github.com/ubisoft/task-scheduler.

The talk was accepted to the conference program

Video, streaming video (1)

There are IT engineers who build video streaming services. There are video engineers who know how to shoot video. These two groups barely overlap. In this talk I, as an IT guy, will share what I have learned from video people over the last 5 years.

Almost nobody realizes how easy it is to degrade video quality inside the video processing pipeline. In my talk I’ll go through these sections:
- why the audio track is more important than the video track
- video is not just a sequence of images: how to use shutter speed on video cameras, stroboscopic effects
- interlaced video: why it exists and how to deal with it
- frame rate conversion and its disadvantages
- image gamma and image resizing: why you are probably resizing images incorrectly
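One common gamma-related resizing mistake is averaging gamma-encoded pixel values directly. The sketch below, which is an illustration and not material from the talk, downscales 8-bit grayscale pixels by averaging neighbouring pairs, once on the encoded values and once in linear light using the standard sRGB transfer functions. It assumes sRGB-encoded input; the function names are hypothetical.

```python
# Sketch: 2:1 downscaling by averaging pixel pairs, naively vs. in linear
# light. Assumes 8-bit grayscale values encoded with the sRGB curve.


def srgb_to_linear(v: int) -> float:
    """Decode an 8-bit sRGB value to linear light in [0, 1]."""
    c = v / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(lin: float) -> int:
    """Encode linear light in [0, 1] back to an 8-bit sRGB value."""
    c = 12.92 * lin if lin <= 0.0031308 else 1.055 * lin ** (1 / 2.4) - 0.055
    return round(c * 255)


def downscale_naive(pixels: list[int]) -> list[int]:
    """Average neighbouring pairs directly on the gamma-encoded values."""
    return [round((a + b) / 2) for a, b in zip(pixels[::2], pixels[1::2])]


def downscale_linear(pixels: list[int]) -> list[int]:
    """Decode to linear light, average, then re-encode."""
    return [
        linear_to_srgb((srgb_to_linear(a) + srgb_to_linear(b)) / 2)
        for a, b in zip(pixels[::2], pixels[1::2])
    ]


if __name__ == "__main__":
    stripes = [0, 255] * 4  # fine black/white stripes
    print(downscale_naive(stripes))   # [128, 128, 128, 128]
    print(downscale_linear(stripes))  # [188, 188, 188, 188]
```

Fine black-and-white stripes emit half as much light as solid white, which sRGB encodes as about 188, not 128; the naive average therefore makes such detail look too dark after resizing.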

The talk was accepted to the conference program

QA, Stress testing (1)

A/B testing is the most common way of establishing causality and proving that a feature or product change improves your metrics. This talk will give an overview of what A/B experiments are and take a deep dive into the most common challenges and pitfalls, and how to avoid them.
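As an illustration of the basic statistics behind such experiments (not material from the talk), here is a sketch of a two-sided two-proportion z-test, a common way to check whether a difference in conversion rate between control (A) and variant (B) is larger than random noise would explain. It relies on a normal approximation, so it assumes reasonably large samples and independent users; the numbers in the usage example are hypothetical.

```python
# Sketch: two-sided two-proportion z-test for an A/B conversion experiment.
# Normal approximation; assumes large samples and independent users.
import math


def two_proportion_z_test(conv_a: int, n_a: int,
                          conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z statistic, two-sided p-value) for H0: p_a == p_b."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # common rate under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical: 10,000 users per arm, 5.0% vs 5.6% conversion.
    z, p = two_proportion_z_test(500, 10_000, 560, 10_000)
    print(f"z = {z:.2f}, p = {p:.3f}")
```

With these hypothetical numbers z is about 1.89 and p is about 0.06, so even a 0.6 percentage-point lift on 20,000 users does not clear the conventional 0.05 threshold, which is one reason sample size has to be planned before the experiment starts.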

The talk was accepted to the conference program