  • Architectures and scalability (19)


    George Melikov

    OpenZFS contributor, VK Cloud

    Why write greedy software? Principles of the reconciliation loop (hello, k8s!)

    The world is not ideal - any large system consists of many separate subsystems.
    We cannot control all of them at once during development and operation.
    And according to Murphy's law, if anything can go wrong, it WILL go wrong.
    When applied to the creation of distributed systems, this means that absolutely everything may and will break someday.

    And in such conditions, we need to develop software that does not require constant attention from its creator.

    We'll talk about the practices and our experience of creating self-healing software based on the principles of closed-loop automation
    (finally, let's talk about the reasons for the stability of Kubernetes),
    compare it with the event-based approach that is common in the industry,
    and honestly admit that our good night's sleep costs the employer extra resource overhead and money.
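
    As a minimal sketch of the closed-loop idea described above (not the speaker's implementation and not the Kubernetes API), the following Python snippet repeatedly compares a desired state with the observed state and acts on the difference. The `Cluster` class and the workload names are hypothetical stand-ins.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical stand-in for the real system being managed."""
    running: set = field(default_factory=set)

    def start(self, name: str) -> None:
        self.running.add(name)

    def stop(self, name: str) -> None:
        self.running.discard(name)


def reconcile(desired: set, cluster: Cluster) -> int:
    """One pass of the loop: observe actual state, diff, act.

    Returns the number of corrective actions taken.
    """
    actual = set(cluster.running)          # observe
    to_start = desired - actual            # diff
    to_stop = actual - desired
    for name in sorted(to_start):          # act
        cluster.start(name)
    for name in sorted(to_stop):
        cluster.stop(name)
    return len(to_start) + len(to_stop)


desired = {"web-1", "web-2", "db-1"}
cluster = Cluster(running={"web-1", "stale-job"})

first_pass = reconcile(desired, cluster)   # starts web-2 and db-1, stops stale-job
cluster.stop("db-1")                       # simulate a failure nobody reported
second_pass = reconcile(desired, cluster)  # the next pass repairs it anyway
third_pass = reconcile(desired, cluster)   # nothing to do: state has converged
```

    Because every pass re-reads the whole state instead of reacting to individual events, a missed notification only delays the repair until the next pass. The price is the extra work of polling and diffing on every iteration.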

    The talk was accepted to the conference program


    Vladislav Shpilevoy


    First Aid Kit for C/C++ server performance

    Enhancing server performance typically entails achieving one or more of the following objectives: reduce latency, increase number of requests per second (RPS), and minimize CPU and memory usage. These goals can be pursued through architectural modifications, such as eliminating some network hops, distributing data across multiple servers, upgrading to more powerful hardware, and so forth. This talk is not about that.

    I categorize the primary sources of code performance degradation into three groups:

    - Thread contention. For instance, too hot mutexes, overly strict order in lock-free operations, and false sharing.
    - Heap utilization. Loss is often caused by frequent allocation and deallocation of large objects, and by the absence of intrusive containers at hand.
    - Network IO. Socket reads and writes are expensive due to being system calls. Also they can block the thread for a long time, resulting in hacks like adding tens or hundreds more threads. Such measures intensify contention, as well as CPU and memory usage, while neglecting the underlying issue.

    I present a series of concise and straightforward low-level recipes on how to gain performance via code optimizations. While often requiring just a handful of changes, the proposals might amplify the performance N-fold.

    The suggestions target the mentioned bottlenecks caused by certain typical mistakes. The proposed optimizations might make architectural changes unnecessary, or even allow the setup to be simplified if existing servers start coping with the load effortlessly. As a side effect, the changes can make the code cleaner and reveal more bottlenecks to investigate.

    The talk was accepted to the conference program


    Aleksei Ozeritskii


    On optimizing an SQL engine and SQL queries for OLAP

    The talk discusses how we improved the YQL (YDB) engine on the popular OLAP benchmarks ClickBench and TPC-H. Just a year ago, we couldn't pass the TPC-H tests on a 100-gigabyte dataset, and our ClickBench performance was unsatisfactory. I will talk about how we investigated bottlenecks using the perf and bpftrace tools and what specific problems we found, as well as improvements in the join algorithm and in the way data is transmitted.
    I will also talk about how we rewrote TPC-H plans for optimal execution.
    I will show concrete performance numbers and comparisons to other systems.

    The talk was accepted to the conference program


    Artem Bakuta


    Collaborative editing at scale

    Collaborative editing has become a huge part of our daily lives. Whether it's writing code in IDEs, collaborating on documents using Google Docs, creating diagrams on Miro, conducting job interviews, or even making grocery lists, collaborative editing becomes the backbone of modern communication and productivity.

    However, creating a collaborative service that can handle high load and maintain consistency is not an easy task. That's why two approaches, CRDT and Operational Transformation, have emerged as the go-to solutions. But how do you decide which approach to choose? What are the challenges of creating a collaborative service that can handle high RPS? And what are the limitations of collaborative editing?

    These are all important questions that I'll be answering in my talk. I'll provide practical insights on how Yandex built its collaborative engine and addressed these problems. You'll learn everything you need to know about collaborative editing and how to make it reliable and lightning-fast in this presentation!

    Moreover, I'll be diving into the advantages of using CRDT over Operational Transformation, which allows for decentralization, flexibility, and usability, as well as how Yjs is a fast and efficient implementation of CRDT. Additionally, I'll be discussing the impact of latency on the collaborative editing experience and how to ensure that not a single character is lost.
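
    To make the decentralization claim concrete, here is a minimal sketch of one of the simplest CRDTs, a grow-only counter. It is an illustration chosen for brevity, not the sequence CRDT that Yjs uses for text and not the engine described in the talk. The point it shows is that replicas can merge state in any order, and more than once, and still converge.

```python
class GCounter:
    """Grow-only counter: one slot per replica, merge takes element-wise max."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, amount: int = 1) -> None:
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # max() is commutative, associative and idempotent,
        # which is what makes the merge order irrelevant.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    def value(self) -> int:
        return sum(self.counts.values())


a, b = GCounter("a"), GCounter("b")
a.increment(2)      # changes made on replica a while disconnected
b.increment(3)      # concurrent changes on replica b

a.merge(b)          # replicas exchange state in any order...
b.merge(a)
b.merge(a)          # ...and duplicate deliveries are harmless
```

    No central server has to order the operations here, which is the property that distinguishes CRDTs from Operational Transformation in this comparison.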

    And lastly, I'll be covering how to ensure the availability of the collaborative engine using Node.js, as well as the challenges of implementing a fault-tolerant system. You'll also learn about our experience of using the engine in production and how it performed.

    So, join me in exploring the exciting world of collaborative editing, and discover how CRDT can make it more robust and easy to maintain.

    The talk was accepted to the conference program


    Andrei Novoselov


    The silver bullet for your magnum

    In this talk, we will cover two different approaches to orchestrating Kubernetes clusters: OpenStack Magnum and Cluster API.

    The story will start with: You and your team are now responsible for the Managed Kubernetes as a Service based on the OpenStack Magnum. Great news! And what is OpenStack Magnum?
    We will talk a little about how Magnum works, what its pros and cons are, what’s wrong with Heat, and what Heat is.

    It will continue with: Houston, we have a problem; I’m pretty sure we have to rewrite the whole service from scratch.
    We will talk about how to find the right words for your Product Manager to make him agree that sometimes revolution is much better than evolution, and why you should get rid of legacy ASAP.

    And in the end, you will know how easy it is to swap service core technology on the go.
    We will talk about pets vs. cattle, about immutable vs. mutable, and how you can do big things with a small team if you automate everything. In other words, in this part, we will learn what perfection is.

    The talk was accepted to the conference program


    Alexey Korepov


    Debugging decoupled applications with microservices using OpenTelemetry

    Monitoring and debugging a simple decoupled application just with a frontend and a single backend is already a tricky task, right?

    Let’s throw in a couple of microservices there in different languages like Python, Go, Node.js, or maybe even Rust, which keep crashing, and a pretty slow and unstable third-party API… And suddenly the task became a real pain in the neck!

    From my session, you will learn how to easily apply OpenTelemetry to all components and get the full observability of what’s happening in your decoupled project!

    You will be able to see in a single interface how all the components of your infrastructure perform: all logs, metrics, end-to-end traces - all in one place!

    And even more! For a specific failed operation that causes an error for a user on the frontend, you can easily find and see its whole trace, from the user's click on a button in your React app to the backtrace of the microservice that crashed at that time.

    Come and listen, and grab the magic pill for your microservices pain! ;)

    The talk was accepted to the conference program


    Edoardo Vacchi


    Putting the asm in Wasm: from bytecode to native

    A WebAssembly runtime is an embeddable virtual machine. This allows platforms to dynamically load and execute third-party bytecode without rebuilding their OS and Arch-specific binary. While WebAssembly is a W3C standard, there are a lot of mechanics required to do this, and many critical aspects are out-of-scope, so left to the implementation.

    Most presentations discuss WebAssembly at a high level, focusing on how end users write a program that compiles to Wasm, with a high-level discussion of how a virtual machine enables this technology. This presentation goes the other way around. This talk overviews a function call beginning with bytecode, its translation to machine code, and finally how it is invoked from host code.

    The implementation used for discussion is the wazero runtime, so this will include some anecdotes that affect these stages, as well as some design aspects for user functions that allow them to behave more efficiently. However, we will also indulge in a high-level comparison with other runtimes and similar technologies that in the past tried to solve the same problem.

    When you leave, you'll know how one WebAssembly runtime implements function calls from a technical point of view.

    The talk was accepted to the conference program


    Ruslan Shakhaev

    Yandex Delivery

    What we learned from production incidents

    About 4 years ago, when we started developing Yandex Delivery, we used all the main patterns for building stable and reliable applications:

    - canary release
    - retries and timeouts
    - rate limiting
    - circuit breaker
    - feature toggling
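
    As an illustration of one pattern from the list above, here is a minimal circuit breaker sketch. It is a generic textbook form, not the Yandex Delivery implementation; the thresholds, the injected `clock` callable and the `flaky` function are assumptions made for the example.

```python
class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying it."""


class CircuitBreaker:
    def __init__(self, max_failures: int, reset_after: float, clock):
        self.max_failures = max_failures
        self.reset_after = reset_after    # seconds to stay open
        self.clock = clock                # injected so the example is deterministic
        self.failures = 0
        self.opened_at = None

    def call(self, func):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("failing fast, dependency is unhealthy")
            self.opened_at = None         # half-open: let one trial call through
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0                 # any success closes the breaker
        return result


now = [0.0]                               # fake clock, in seconds
breaker = CircuitBreaker(max_failures=2, reset_after=30.0, clock=lambda: now[0])


def flaky():
    raise TimeoutError("upstream timed out")


outcomes = []
for _ in range(3):
    try:
        breaker.call(flaky)
    except CircuitOpenError:
        outcomes.append("rejected")
    except TimeoutError:
        outcomes.append("failed")

now[0] = 31.0                             # the reset window has passed
recovered = breaker.call(lambda: "ok")
```

    After two failures the third call is rejected immediately instead of waiting for another timeout, which protects both the caller and the struggling dependency.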

    Even if one of our datacenters is unavailable, our users will not notice anything. We can enable/disable and configure our features in production in real time, and much more.

    But all this was not enough to prevent the system from experiencing downtime sometimes.

    I'll tell you about the non-obvious problems we encountered and what lessons we learned from various production incidents.

    Main sections:
    - architectural solutions that lead to problems (inter-service interaction, entity processing, etc.)
    - problems when developing external API
    - specifics of working with mobile clients
    - problems with PostgreSQL and what we did wrong

    The talk was accepted to the conference program


    Andrei Tuchin

    JPMorgan Chase

    FX Risk Calculation: Why/How?

    How can one create a Market Risk System capable of providing real-time calculations for an entire portfolio, conducting strategy backtesting, performing hypothetical scenario analyses, and effectively analyzing and manipulating data?
    Whether it's for an investment bank, a hedge fund, or any other business, the necessity for such a solution becomes apparent at a certain stage.

    Drawing upon my experience in developing such systems from the ground up at several prominent investment banks, I'll endeavor to present common architectural ideas.

    The talk was accepted to the conference program


    Aleksey Uchakin


    The CDN journey: There and Back Again

    In general, a CDN is a very simple thing. You just need a bunch of servers, several fat links to the best ISP, and nginx. Is that enough?

    And how do you choose the best option for your project from several candidates?

    * what issues you can fix with CDN;
    * questions you have to ask before onboarding;
    * black magic of routing: what is the real nearest node;
    * how to rule the world with BGP and DNS.

    The talk was accepted to the conference program


    Tejas Chopra


    Designing media optimized byte transfer and storage at Netflix

    Netflix is a media streaming company and a movie studio with data at exabyte scale. Most of the data generated, transferred and stored at Netflix is very media specific, for example, raw camera footage, or data generated as a result of encoding and rendering for different screen types.

    In this session, I will shed light on how we design a media-aware and optimized transfer, storage and presentation layer for data.

    By leveraging this architecture at Netflix scale, we provide a scalable, reliable, and optimized backend layer for media data.

    Major takeaways from this session
    - Learn about the challenges of designing a scalable object storage layer for data while adhering to the file system POSIX semantics of media applications
    - Learn about the optimizations applied to reduce cloud storage footprint, such as chunking and deduplication
    - Learn about how different applications expect data to be presented at different locations and in different formats.
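
    The chunking and deduplication takeaway can be illustrated with a small content-addressed store. This is a sketch under stated assumptions: fixed-size chunks, SHA-256 digests, and the `DedupStore` class and file names are all choices made for the example, since the abstract does not say how Netflix implements either technique.

```python
import hashlib


def chunk(data: bytes, size: int):
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class DedupStore:
    """Content-addressed store: identical chunks are kept only once."""

    def __init__(self, chunk_size: int = 4):
        self.chunk_size = chunk_size
        self.chunks = {}      # sha256 hex digest -> chunk bytes
        self.manifests = {}   # object name -> ordered list of digests

    def put(self, name: str, data: bytes) -> None:
        digests = []
        for piece in chunk(data, self.chunk_size):
            digest = hashlib.sha256(piece).hexdigest()
            self.chunks.setdefault(digest, piece)   # store only if unseen
            digests.append(digest)
        self.manifests[name] = digests

    def get(self, name: str) -> bytes:
        return b"".join(self.chunks[d] for d in self.manifests[name])

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


store = DedupStore(chunk_size=4)
store.put("take1.raw", b"AAAABBBBCCCC")
store.put("take2.raw", b"AAAABBBBDDDD")   # shares two chunks with take1
```

    The two objects hold 24 bytes of logical data, but only four distinct chunks are stored. Real systems typically use much larger chunks and often content-defined boundaries rather than fixed offsets.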

    The talk was accepted to the conference program


    Ivan Potapov

    Qrator Labs, Product Manager

    One of the ways to expand the network for heavy loads

    You need to scale the network to handle increasing traffic and ensure the required service quality.

    There are two methods for balancing surging user traffic – GeoDNS and BGP Anycast (a brief description of these technologies).
    We'll examine how large companies tackle this task.
    Let's move on to how you can solve these problems yourself, using open utilities and route information data.

    Following this, we'll discuss an approach for expanding the BGP Anycast network.

    For a global network expansion, two key questions arise:
    - Where (in which country) should a new node be installed?
    - Which local provider should it connect to for optimal service quality?

    To answer the first question, we'll utilize the RIPE Atlas public toolkit to create an RTT map, highlighting regions with maximum network delays. This reveals the weak points in our current network.

    To answer the second question, we'll describe a method based on analysing route information, which can be used to select the best providers in the region.

    Put simply, the method can be described as follows:
    1. We gather route information from a BGP collector (e.g., RIPE, Route Views, PCH).
    2. Using this data, we identify major players in the market – regional leaders, employing various metrics (a brief overview of these metrics and the algorithms behind them is provided).
    3. We create a rating of the most promising providers for connection and select candidates from this rating.
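
    The three steps above can be sketched as follows. The metric used here, the number of observed AS paths in which a network appears as a transit hop, is only one plausible choice and is an assumption; the talk's actual metrics are not given in the abstract. The paths are made up and use AS numbers reserved for documentation.

```python
from collections import Counter


def rank_transit_providers(as_paths, top_n=3):
    """Rank ASNs by how many observed paths they carry as a transit hop.

    Each path is a list of ASNs from the collector's peer to the origin;
    the origin (last element) is not counted as transit.
    """
    transit_counts = Counter()
    for path in as_paths:
        # dict.fromkeys drops AS-path prepending duplicates, keeping order
        hops = list(dict.fromkeys(path))[:-1]
        transit_counts.update(hops)
    return transit_counts.most_common(top_n)


# Made-up paths using ASNs reserved for documentation (64496-64511).
paths = [
    [64500, 64501, 64510],
    [64500, 64501, 64511],
    [64502, 64501, 64501, 64510],   # 64501 prepended twice
    [64502, 64503, 64511],
]
ranking = rank_transit_providers(paths, top_n=2)
```

    In this toy data set AS 64501 carries three of the four paths, so it tops the rating. With real collector dumps, the same counting would be restricted to prefixes that originate in the region of interest.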

    This systematic approach swiftly identifies optimal locations for new node installation, enabling effective network development.

    The talk was accepted to the conference program


    Daniel Podolsky


    Practical software architecture: what do we all know about it but are too lazy to use

    Software architecture is a tough subject: we all know we need it, we all put so much energy to get it right and we all are mostly unsatisfied with the result.

    All our efforts in the architectural direction are based, or at least expected to be based, on the two fundamental works the programming world has given us: "Clean Architecture" by Robert Martin and "Domain-Driven Design: Tackling Complexity in the Heart of Software" by Eric Evans.

    But no, it is very rare in our community to use the ideas from either of these books in the day-to-day job. Looks like these books are too good for us!

    Seriously, they are just like the design for a shining castle on a high hill. And we need something much, much simpler, like a barnyard to grow and feed our projects. This is what I have heard from my dear colleagues so many times!

    And finally, I’ve decided to prepare a talk about practical architecture:
    1. Yes, we need a barnyard and yes even a barnyard needs good architecture.
    2. Good architecture is not about beauty but about very practical things: testability, extendability and debuggability.
    3. Good architecture is easy, because
    3.1. All our projects are almost the same
    3.2. A single, slightly varied pattern will lead us to good architecture for any of the projects we are growing in our barnyard
    3.3. Practical architectural requirements are really easy to follow. Actually, it is much harder to not follow them as soon as you finally look at them from this
    3.4. Practical architecture is a self-supporting thing: as soon as you start it right in one part of the project it will magically organize the environment in the proper
    4. The books mentioned above are not about shiny castles but - shocking - about

    The talk was accepted to the conference program


    Ivan Savu


    Achieving scalability with custom locks

    The talk will cover the following topics:

    Memory architecture - CPU/Memory gap, memory hierarchy, multiprocessor cache, cache line contention and its effects on scaling

    Multiprocessor RW Lock - Fast reader/writer lock that scales linearly for reads

    0-bit spin lock - Lock that effectively uses no memory, allowing for cheap fine grained locking.

    The talk was accepted to the conference program


    Alexey Palazhchenko

    FerretDB Inc

    Go performance profiling in theory and practice

    Your Go application does not perform as well as you hoped in production. You connect to it, get an execution trace, open it – and it looks like a cardiogram of 22 football players. So, how do you make sense of it? Unfortunately, you have almost no chance if that’s the first time you see it.

    It takes some time to gain the experience and develop the intuition about backend performance in general and Go applications in particular. That intuition will help you recognize specific patterns in those cardiograms, like a doctor does. My talk will help you start on this journey, beginning with simple teaching examples and ending with real performance problems of production software.

    The talk was accepted to the conference program


    Dmitrii Nekrylov

    Yandex 360

    Night porter's notes on how to provide a non-cloud-native service as a service

    We will discuss concrete examples from Yandex 360.

    * In Yandex Telemost, when we do broadcasts, or manage incoming calls from meeting rooms, we need to allocate heavy VMs as resources. They require warmup, authorization and healthchecks, because at scale any one of these VMs can break or go rogue at any time. And only one of these instances can serve a stream at any given moment.

    We need to maintain up to 99.99% availability of these services. We have specific rules for how it is calculated, and we can formally minimize downtime from planned updates with the help of a wisely chosen strategy. We have historical data at our disposal to test theories, and we have been using it.

    * Sometimes these services are so imbalanced in CPU/RAM ratio, that we need to host multiple user sessions within one container. Otherwise RAM consumption and overhead on PaaS would be enormous. In this case we need to orchestrate a 2-layered service with all of the requirements from above.

    * Yandex Telemost is based on Jitsi. It holds multi-user sessions in memory across several distinct components and registers in several discovery systems to organize the calls. Special care should be taken to prevent an unintentional random split of a conference into several independent rooms, or to prevent a rogue pod from intercepting one of the traffic channels and thus making it impossible for users to join a particular conference at all.

    Based on these examples, we are going to discuss:
    * the problem of stateless single-pod services and our approach to their management and maintenance
    * how we can calculate and minimize downtime of these stateful in-memory components and enable more frequent releases
    * why there should be only one service discovery
    * how split-brain-like situations emerge from scaling a component that provides multi-user sessions when the component is not built for scaling worldwide, and what we did to address the issue.
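
    To give a feel for the 99.99% availability target and the downtime calculation mentioned above, here is a small worked example. It assumes a plain time-based definition of availability and a fixed downtime cost per release; the abstract says Yandex 360 has its own specific calculation rules, which are not reproduced here.

```python
def downtime_budget_minutes(availability_percent: float, period_days: float) -> float:
    """Minutes of downtime allowed over a period at a given availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return period_days * 24 * 60 * unavailable_fraction


def releases_per_period(budget_minutes: float, downtime_per_release_minutes: float) -> int:
    """How many releases fit in the budget if each one costs a fixed downtime."""
    return int(budget_minutes // downtime_per_release_minutes)


# 30 days = 43,200 minutes; 0.01% of that is about 4.32 minutes.
monthly_budget = downtime_budget_minutes(99.99, period_days=30)

# Hypothetical figure: each rollout interrupts sessions for 30 seconds.
max_releases = releases_per_period(monthly_budget, downtime_per_release_minutes=0.5)
```

    Under these assumptions the whole month allows roughly four minutes of downtime, so a release that costs 30 seconds can happen at most eight times. Halving the per-release downtime doubles the number of releases that fit into the same budget.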

    The talk was accepted to the conference program


    Alexander Gilevich


    Let’s talk Architecture: Limits of Configuration-driven Ingestion Pipelines

    Need to continuously ingest data from numerous disparate and non-overlapping data sources and then merge them together into one huge knowledge graph to deliver insights to your end users?

    Pretty cool, huh? And what about multi-tenancy, mirroring access policies and data provenance? Perhaps, incremental loading of data? Or monitoring the current state of ingestion in a highly-decoupled distributed microservices-based environment?

    In my talk I will tell you our story: it all started with a simple idea of building connectors, and we ended up building fully configurable and massively scalable data ingestion pipelines that deliver disparate data pieces to a single data lake for their later decomposition and digestion in a multi-tenant environment. All this while allowing customers and business analysts to create and configure their own ingestion pipelines in a friendly way with a bespoke pipeline designer, where each pipeline building block is a separate decoupled microservice (think Airflow, AWS Step Functions, Azure Data Factory and Azure Logic Apps). Furthermore, we'll touch on such aspects as choreography vs orchestration, incremental loading strategies, ingestion of access control policies (ABAC, RBAC, ACLs), parallel data processing, how frameworks can help in the implementation of cross-cutting concerns, and even briefly talk about the benefits of knowledge graphs.

    The talk was accepted to the conference program


    Denis Babichev

    Hilbert Team

    ClickHouse as backend for Prometheus

    1. A few words about ClickHouse and Prometheus
    2. LTS problems in Prometheus
    3. ClickHouse as a backend
    4. Integration/Configuration
    5. Why Not? (VM/Thanos/Mimir)…
    6. Why Not ClickHouse?
    7. Conclusions

    The talk was accepted to the conference program


    Lia Yepremyan

    AMD Armenia

    The Hardware Behind AI

    AI hardware consists of special parts that drive artificial intelligence technologies. These parts are created to manage the complex calculations needed for recognizing patterns, making decisions, and analyzing data.

    The talk was accepted to the conference program

  • Databases and storage systems (4)


    Nickolay Ihalainen


    Implementing MySQL and PostgreSQL databases in Kubernetes

    While Kubernetes has been able to run databases for quite a long time, production databases require more attention to important details:
    * crash recovery
    * disaster recovery
    * backups
    * access control
    * user management and security benchmarks
    * performance at scale (100GB+ databases)

    The talk was accepted to the conference program


    Alexander Zaitsev


    Object Storage in ClickHouse

    ClickHouse is an ultra-fast analytic database. Object Storage is cheap. Can they work together? Let's learn!

    ClickHouse is an ultra-fast database originally designed for local storage. Since 2020 a lot of effort has gone into making it efficient with object storage such as S3, which is essential for big clusters operated in clouds. In this talk I will explain the ClickHouse storage model and how Object Storage support is implemented. Finally, we will see performance results and discuss further improvements.

    The talk was accepted to the conference program


    Andrew Aksyonoff

    Avito && Sphinx

    All BSONs suck

    I will dissect multiple internal binary JSON representations in several DBs (Mongo, Postgres, YDB, my own Sphinx, maybe more), and rant about how they are so not great for querying.

    My rant will also include a partial benchmark (of course), and a limited way out for the databases: as in, a few techniques I have tried and will be implementing in Sphinx, so that our BSON sucks on par or less. Spoiler alert: BSONs suck and nothing works really well for them, everything you thought is a lie (including the cake), hash tables suck, binary searches suck (even the clever ones and of course the naive ones), AVX2 sucks, maybe AVX512 sucks too (maybe I'll have the time to try that). As for the database users? Weeell, at least you will know how much your specific database sucks, why so, and what the competition can offer.

    The talk was accepted to the conference program


    Evgenii Ivanov

    Yandex Infrastructure

    YDB vs. TPC-C: the Good, the Bad, and the Ugly behind High-Performance Benchmarking

    Modern distributed databases scale horizontally with great efficiency, making them almost limitless in capacity. This implies that benchmarks should be able to run on multiple machines and be very efficient to minimize the number of machines required. This talk will focus on benchmarking high-performance databases, with a particular emphasis on YDB and our implementation of the TPC-C benchmark—the de-facto gold standard in the database field.

    First, we will speak about benchmarking strategies from a user's perspective. We will dive into key details related to benchmark implementations, which could be useful when you create a custom benchmark to mirror your production scenarios. Throughout our performance journey, we have identified numerous anti-patterns: there are things you should unequivocally avoid in your benchmark implementations. We'll highlight these "bad" and "ugly" practices with illustrative examples.

    Next, we’ll briefly discuss the popular key-value benchmark YCSB, which we believe is a prerequisite for robust performance in distributed transactions. Following this, we'll explore the TPC-C benchmark in greater detail, sharing valuable insights derived from our own implementation.

    We'll conclude our talk by presenting performance results from YCSB and TPC-C benchmarks, comparing YDB's performance with that of CockroachDB and YugabyteDB — other trusted and well-known distributed SQL databases.

    The talk was accepted to the conference program

  • BigData and Machine Learning (2)


    Shivay Lamba


    Fine-Tuning Large Language Models with Declarative ML Orchestration

    Large language models like GPT-3 and BERT have revolutionized natural language processing by achieving state-of-the-art performance. However, these models are typically trained by tech giants with massive resources. Smaller organizations struggle to fine-tune these models for their specific needs due to infrastructure challenges.

    This talk will demonstrate how open-source ML orchestration tools like Flyte can help overcome these challenges by providing a declarative way to specify the infrastructure required for ML workloads. Flyte's capabilities can streamline ML pipelines, reduce costs, and make fine-tuning of large language models accessible to a wider audience.

    Specifically, attendees will learn:

    - How large language models work and their potential applications
    - The infrastructure requirements and challenges for fine-tuning these models
    - How Flyte's declarative specification and abstractions can automate and simplify infrastructure setup
    - How to leverage Flyte to specify ML workflows for fine-tuning large language models
    - How Flyte can reduce infrastructure costs and optimize resource usage

    By the end of the talk, attendees will understand how open-source ML orchestration tooling can unlock the full potential of large language models by making their fine-tuning easier and more accessible, even with limited resources. This will enable a larger community of researchers and practitioners to leverage and train large language models for their specific use cases.

    The talk was accepted to the conference program


    Dmitrii Khodakov


    How we built personal recommendations in the world’s most significant classified

    Context: setting the task - a feed of personal recommendations on the main page. How do you launch recommendations in production when you have 150 million items and 100 million users? I will share my experience and tell you about the pitfalls.
    A quick overview of the arsenal of models: the classic ML approach.
    A quick overview of metrics, starting with product metrics.
    The basis of everything: fast experiments and analytics on actual data.
    Where to start? Classical matrix factorization and its launch pattern.
    What problems we encountered at this stage.
    A little more advanced: switching real-time user features and history. An alternative approach with simpler models.
    Advanced models: let's add neural networks; the strength is in diversity.
    Mixing models: the great blender.
    How does it work in production? Replaced Go with Python, what happened to time to market?
    And again, about the experiment cycle: I'll tell you about product metrics.

    The talk was accepted to the conference program

  • DevOps and Maintenance (5)


    Dmitry Tsepelev


    Backend monitoring from scratch

    Almost everyone has monitoring. In an ideal world it is a reliable tool that detects symptoms before they become serious problems. Often, an APM on a free plan with out-of-the-box reports is used as the monitoring tool. As a result, something is measured, some alerts are sent to the chat, no one responds to them, and one day a major incident happens.

    In the talk we will:

    - define monitoring antipatterns;

    - pick the most critical metrics and ways to see insights in charts;

    - represent the system in the terminology of queueing theory;

    - figure out how to choose lower-level metrics and how to use them to find problems;

    - discuss why alerts are helpful, and when they are not needed.
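
    As a small illustration of viewing a backend through queueing theory, the sketch below applies Little's law and a basic utilization formula. The traffic numbers and the worker count are made up for the example, and the talk may use different formulas or metrics.

```python
def littles_law_in_flight(arrival_rate_rps: float, avg_latency_s: float) -> float:
    """Little's law: L = lambda * W (average number of requests in the system)."""
    return arrival_rate_rps * avg_latency_s


def utilization(arrival_rate_rps: float, avg_service_time_s: float, workers: int) -> float:
    """Fraction of total worker capacity in use: rho = lambda * S / c."""
    return arrival_rate_rps * avg_service_time_s / workers


# Hypothetical service: 200 requests per second, 250 ms average latency.
in_flight = littles_law_in_flight(arrival_rate_rps=200, avg_latency_s=0.25)

# Of those 250 ms, assume 20 ms is actual work done by one of 8 workers.
rho = utilization(arrival_rate_rps=200, avg_service_time_s=0.02, workers=8)
```

    Here about 50 requests are in the system at any moment while the workers are only half busy, which suggests that most of the latency is spent waiting on something other than the workers themselves. That is the kind of hint that points to which lower-level metric to look at next.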

    The talk was accepted to the conference program


    Oleg Voznesensky


    Demystifying GitOps. How to upgrade your CIOps to GitOps in a minimalistic way

    The purpose of this talk is to help DevOps engineers understand the GitOps pattern and decide whether or not to use GitOps. I will also discuss the most frequent problems and ways to solve them.

    The talk was accepted to the conference program


    Soumyadip Chowdhury

    Red Hat India

    Simplifying Cloud Native Chaos Engineering: A Deep Dive into Chaos Mesh

    Ensuring reliable and resilient cloud-native applications is crucial in today's evolving digital era. Chaos engineering has emerged as a powerful methodology for testing and validating the robustness of complex applications. We'll also discuss best practices for integrating chaos engineering into your application development lifecycle to improve the reliability and resilience of your cloud-native applications, and how big companies like Netflix, Amazon and Spotify have taken advantage of chaos engineering.

    The talk was accepted to the conference program


    Viktor Vedmich

    Amazon Web Services

    From Middle-Earth to Monitoring: Unifying Systems Performance in a Cloud World

    In today's cloud-centric landscape, achieving optimal systems performance is paramount. This presentation delves deep into the intricacies of monitoring complex systems. We'll begin by exploring the challenges and strategies associated with optimizing system performance, drawing upon foundational principles from renowned references in the field. Transitioning to the practical side, we'll delve into the real-world experiences of a leading cloud vendor in monitoring. By sharing insights from hands-on implementations and how our customers have effectively tailored monitoring solutions, attendees will gain a comprehensive understanding of best practices and pitfalls to avoid.

    The talk was accepted to the conference program


    Tadeh Hakopian

    Energy Vault

    Understanding The Big Picture: Why You Should Show System Architecture With Diagrams

    Great presentations are a powerful tool for conveying ideas, sharing knowledge, and engaging audiences. This talk will delve into the fusion of MARP, a simple yet powerful presentation framework, and Markdown, a lightweight markup language, to create captivating presentations from Readme files.

    Key Takeaways

    Understand the fundamentals of using MARP and Markdown for creating engaging presentations from Readme files.
    Discover the features and customization options offered by MARP, such as themes, layouts, slide transitions, and interactive elements.
    Learn how to effectively communicate your ideas, projects, and expertise in a visually compelling and engaging manner using tools like VS Code, GitHub Actions, and other common editors.
    Learn best practices for structuring content, incorporating visuals, and utilizing storytelling techniques to deliver compelling presentations from Readme files.

    The talk was accepted to the conference program

  • Security, DevSecOps (3)


    Artem Bachevsky


    Container and Kubernetes: modern attacks and mitigations

    Each new technology brings us not only speed and convenience, but also dozens of attack vectors, which in turn give rise to new defense tools.

    To protect your container infrastructure, it helps to understand the tactics and techniques of attacks on it, as well as how to prevent and detect them.

    In this talk, we will go through the top threats to containers at each stage of attack and operation, mapped onto popular framework matrices, and cover what to check once you are back at the office!

    The talk was accepted to the conference program


    Sergey Chubarov


    Offensive Azure Security

    Demo-based session.

    As the cloud grows in popularity, more companies are moving to it. Low initial investment makes entry much easier, but it also leaves room to misconfigure security settings.
    The session demonstrates typical anti-patterns found at many companies.

    The talk was accepted to the conference program


    Ignat Korchagin

    Cloudflare, Linux Guru

    Sandboxing in Linux with zero lines of code

    Linux seccomp is a simple, yet powerful tool to sandbox running processes and significantly decrease potential damage in case the application code gets exploited. It provides fine-grained controls for the process to declare what it can and can’t do in advance and in most cases has zero performance overhead.

    The only disadvantage: to utilise this framework, application developers have to explicitly add sandboxing code to their projects, and developers usually either delay this or omit it completely, as their main focus is on the functionality of the code rather than security. Moreover, the seccomp security model is based around system calls, but many developers, writing their code in high-level programming languages and frameworks, either have little to no experience with syscalls or just don’t have easy-to-use seccomp abstractions or libraries for their frameworks.

    All this keeps seccomp from being widely adopted. But what if there were a way to easily sandbox any application in any programming language without writing a single line of code? This presentation discusses potential approaches with their pros and cons.

    The talk was accepted to the conference program

  • QA, Stress testing (1)


    Ivan Prihodko

    Ozon Tech

    Ozon Performance Testing Service - HighLoad by Schedule

    A million RPS on demand. High load on schedule. How the Ozon Performance Testing platform works.

    About the topic:
    Ozon has been growing twofold every year since 2019. The main technologies are Go/C#, gRPC, Kafka, a lot of code generation, S2S routing and security, etc.
    The performance testing platform started as a service that helps regular Ozon engineers start performance tests with a single CLI command or one click in the UI.

    There are also three main paradigms in Ozon performance testing:
    1) Confidence from performance tests can be achieved only in production.
    2) The bandwidth target for the next season for each specific service is calculated by analytics and uploaded to the performance testing platform.
    3) Once a week, IT management reviews the consolidated performance reports provided by our performance testing platform and gains confidence in Ozon's readiness for the upcoming season.
    The platform grew with Ozon. It contains several microservices that help us and our users start performance tests more easily.
    We standardized and simplified most performance testing activities and provide many integrations with Ozon infrastructure.
    We also created CPU-efficient load generators for HTTP, gRPC, and scenario traffic.

    All of these activities are accompanied by problems caused by the high-load nature of our environment,
    such as:
    - Problems with the payload collection system and Kafka.
    - Load generators that generate too much load for our performance testing system.
    - Statistics that overload our statistics storage.
    - Bandwidth limits in our k8s cluster, which were reached one day.
    - The CPU limit, which was also reached one day.
    In my talk, you will learn how we solved all these problems, how we run 27 thousand tests a month, and how we generate more than a million RPS in production every night.

    The talk was accepted to the conference program

  • Platform engineering (1)


    Stanislav Zmiev


    How to maintain hundreds of API Versions and survive

    Web API versioning is a way to allow your developers to move quickly and break things while your clients enjoy a stable API in long cycles. It is a best practice for any API-first company. Without versioning, the company will either be unable to improve its API, or its clients will have their integrations broken every few months.

    The problem is that there is little information on how to implement it: if you try searching for methods of API versioning, you will see hundreds of articles on whether to put the version into the URL or a header, a few pieces of ASP.NET documentation, and a single article by Stripe that delves deep into the subject matter. Sadly, even they only describe one approach.

    I'll cover all sorts of ways to implement API versioning: extremely stable and expensive, easy-looking but horrible in practice, and even completely version-less yet viable. I will provide you with the best practices for creating a modern API versioning solution and discuss what Stripe and Monite have chosen for their APIs.

    When you leave, you'll have enough information to make your API versioning user-friendly without overburdening your developers.
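
    As a small illustration of the "URL or header" question mentioned in the abstract, here is a minimal sketch of how a server might resolve the requested API version from either place. It is not taken from the talk: the header name `X-API-Version`, the date-based version format, the URL layout, and the default version are all assumptions made up for this example.

```python
import re
from datetime import date

# Hypothetical header name and URL layout, chosen only for this sketch.
VERSION_HEADER = "X-API-Version"
URL_VERSION = re.compile(r"^/(\d{4}-\d{2}-\d{2})(/.*)$")
# Hypothetical version used when the client does not ask for one.
DEFAULT_VERSION = date(2023, 1, 1)


def resolve_version(path, headers):
    """Return (version, path_without_version) for a request.

    A version embedded in the URL takes priority over the header.
    If neither is present, the request is pinned to DEFAULT_VERSION.
    Header lookup is case-sensitive here, which is a simplification.
    """
    match = URL_VERSION.match(path)
    if match:
        return date.fromisoformat(match.group(1)), match.group(2)
    raw = headers.get(VERSION_HEADER)
    if raw is not None:
        return date.fromisoformat(raw), path
    return DEFAULT_VERSION, path


if __name__ == "__main__":
    print(resolve_version("/2023-06-01/invoices", {}))
    print(resolve_version("/invoices", {VERSION_HEADER: "2022-11-15"}))
    print(resolve_version("/invoices", {}))
```

    Whichever location is chosen, the rest of the service only sees a normalized version value and a version-free path, so the choice between URL and header stays a routing detail rather than a design constraint.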

    The talk was accepted to the conference program