  • Architectures and scalability (5)


    Vladislav Shpilevoy

    VirtualMinds

    First Aid Kit for C/C++ server performance

    Enhancing server performance typically entails one or more of the following objectives: reducing latency, increasing the number of requests per second (RPS), and minimizing CPU and memory usage. These goals can be pursued through architectural changes, such as eliminating network hops, distributing data across multiple servers, upgrading to more powerful hardware, and so forth. This talk is not about that.

    I categorize the primary sources of code performance degradation into three groups:

    - Thread contention: for instance, overly hot mutexes, overly strict memory ordering in lock-free operations, and false sharing (a toy illustration follows this list).
    - Heap utilization: performance is often lost to frequent allocation and deallocation of large objects, and to not having intrusive containers at hand.
    - Network IO: socket reads and writes are expensive because they are system calls. They can also block a thread for a long time, which invites hacks like adding tens or hundreds more threads; such measures intensify contention and CPU and memory usage while neglecting the underlying issue.
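
    The recipes themselves are in C/C++, but the first bullet is easy to illustrate in any language. Here is a toy Python sketch of one contention fix: sharding hot shared state per thread instead of guarding it with a single lock. All names and numbers are illustrative, not from the talk.

      import threading

      THREADS, ITERS = 8, 100_000

      # Anti-pattern: one hot lock that every thread fights over.
      hot_lock = threading.Lock()
      hot_counter = 0

      def hot_worker():
          global hot_counter
          for _ in range(ITERS):
              with hot_lock:
                  hot_counter += 1

      # Mitigation: give each thread its own shard and merge at the end,
      # so threads almost never touch shared state.
      shards = [0] * THREADS

      def sharded_worker(idx):
          local = 0
          for _ in range(ITERS):
              local += 1
          shards[idx] = local

      def run(target, args_list):
          threads = [threading.Thread(target=target, args=a) for a in args_list]
          for t in threads: t.start()
          for t in threads: t.join()

      run(hot_worker, [()] * THREADS)
      run(sharded_worker, [(i,) for i in range(THREADS)])
      assert hot_counter == sum(shards) == THREADS * ITERS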

    I present a series of concise, straightforward low-level recipes for gaining performance via code optimizations. While often requiring just a handful of changes, these proposals can improve performance N-fold.

    The suggestions target the bottlenecks above, which are caused by certain typical mistakes. The proposed optimizations might render architectural changes unnecessary, or even allow simplifying the setup once the existing servers start coping with the load effortlessly. As a side effect, the changes can make the code cleaner and reveal further bottlenecks to investigate.

    The talk was accepted to the conference program


    Aleksey Uchakin

    EdgeCenter

    The CDN journey: There and Back Again

    In general, a CDN is a very simple thing: you just need a bunch of servers, several fat links to the best ISPs, and nginx. Is that enough?

    And how do you choose the best option for your project from several candidates?

    Abstract
    * what issues you can fix with a CDN;
    * questions you have to ask before onboarding;
    * black magic of routing: what is the real nearest node (a toy probe is sketched after this list);
    * how to rule the world with BGP and DNS.
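
    To make the "real nearest node" point concrete before the talk does: the nearest node is the one with the lowest latency, not the smallest geographic distance. A minimal sketch that estimates RTT as TCP connect time; the edge hostnames are hypothetical placeholders.

      import socket
      import time

      # Hypothetical CDN edge nodes; replace with real candidates.
      CANDIDATES = ["edge-fra.example.net", "edge-ams.example.net", "edge-waw.example.net"]

      def connect_rtt(host, port=443, timeout=2.0):
          """Rough RTT estimate: time to complete a TCP handshake."""
          start = time.perf_counter()
          try:
              with socket.create_connection((host, port), timeout=timeout):
                  return time.perf_counter() - start
          except OSError:
              return float("inf")

      best = min(CANDIDATES, key=connect_rtt)
      print("lowest-latency node:", best)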

    The talk was accepted to the conference program


    Ivan Potapov

    Qrator Labs, Product Manager

    One of the ways to expand the network for heavy loads

    You need to scale the network to handle increasing traffic and ensure the required service quality.

    There are two main methods for balancing surging user traffic: GeoDNS and BGP Anycast (the talk gives a brief description of both technologies).
    We'll examine how large companies tackle this task.
    Then we'll move on to how you can solve these problems yourself, using open tools and route information data.

    Following this, we'll discuss an approach for expanding the BGP Anycast network.

    For a global network expansion, two key questions arise:
    - Where (in which country) should a new node be installed?
    - Which local provider should it connect to for optimal service quality?

    To answer the first question, we'll utilize the RIPE Atlas public toolkit to create an RTT map, highlighting regions with maximum network delays. This reveals the weak points in our current network.
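
    A minimal sketch of that aggregation step, assuming ping results have already been downloaded from the RIPE Atlas API into a local JSON file and a probe-to-country mapping has been built from probe metadata. The field names follow the Atlas ping result format, but treat the details as assumptions.

      import json
      from collections import defaultdict
      from statistics import median

      # Assumed inputs: Atlas ping results (with "avg" RTT in ms and "prb_id")
      # and a probe-id -> country mapping, e.g. {"12345": "DE", ...}.
      with open("atlas_ping_results.json") as f:
          results = json.load(f)
      with open("probe_countries.json") as f:
          probe_country = json.load(f)

      rtts = defaultdict(list)
      for r in results:
          avg = r.get("avg", -1)
          cc = probe_country.get(str(r.get("prb_id")))
          if avg and avg > 0 and cc:
              rtts[cc].append(avg)

      # Regions with the worst median RTT are candidates for a new node.
      worst = sorted(rtts, key=lambda cc: median(rtts[cc]), reverse=True)
      for cc in worst[:10]:
          print(cc, round(median(rtts[cc]), 1), "ms")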

    To answer the second question, we'll describe a method based on analysing route information, which can be used to select the best providers in the region.

    Simplistically, the method can be described as follows (a toy version is sketched after the list):
    1. We gather route information from a public BGP collector (e.g., RIPE RIS, Route Views, PCH).
    2. Using this data and various metrics, we identify the major players in the market, the regional leaders (a brief overview of these metrics and the algorithms behind them is provided).
    3. We create a rating of the most promising providers for connection and select candidates from this rating.
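
    A toy version of steps 1-3, assuming AS paths have already been extracted from collector dumps into plain strings. The single "reach" metric here is a crude stand-in for the richer metrics the talk covers (customer cone, peering breadth, and so on).

      from collections import defaultdict

      # Assumed input: AS paths for prefixes originated in the target region.
      as_paths = [
          "3356 1299 64500",
          "1299 64501",
          "3356 174 64502",
      ]

      # Crude metric: how many distinct regional origins each transit AS
      # appears in front of.
      reach = defaultdict(set)
      for path in as_paths:
          hops = path.split()
          origin, transits = hops[-1], set(hops[:-1])
          for asn in transits:
              reach[asn].add(origin)

      ranking = sorted(reach.items(), key=lambda kv: len(kv[1]), reverse=True)
      for asn, origins in ranking[:5]:
          print(f"AS{asn} reaches {len(origins)} regional origins")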

    This systematic approach swiftly identifies optimal locations for new node installation, enabling effective network development.

    The talk was accepted to the conference program


    Alexander Gilevich

    EPAM

    Let’s talk Architecture: Limits of Configuration-driven Ingestion Pipelines

    Need to continuously ingest data from numerous disparate, non-overlapping data sources and then merge it all into one huge knowledge graph to deliver insights to your end users?

    Pretty cool, huh? And what about multi-tenancy, mirroring access policies and data provenance? Perhaps, incremental loading of data? Or monitoring the current state of ingestion in a highly-decoupled distributed microservices-based environment?

    In my talk I will tell you our story: it all started with a simple idea of building connectors, and we ended up building fully configurable, massively scalable data ingestion pipelines that deliver disparate data pieces to a single data lake for later decomposition and digestion in a multi-tenant environment. All this while allowing customers and business analysts to create and configure their own ingestion pipelines in a friendly way with a bespoke pipeline designer, each pipeline building block being a separate decoupled microservice (think Airflow, AWS Step Functions, Azure Data Factory and Azure Logic Apps). Furthermore, we'll touch on such aspects as choreography vs. orchestration, incremental loading strategies, ingestion of access control policies (ABAC, RBAC, ACLs), parallel data processing, and how frameworks can help implement cross-cutting concerns, and we'll even briefly talk about the benefits of knowledge graphs.
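
    A minimal sketch of the configuration-driven idea: pipeline blocks are selected and parameterized from data rather than code. In the talk's architecture each block is a separate microservice; here they are plain functions, and all step names are hypothetical.

      # Each block is a named step with its own config, so analysts compose
      # pipelines from configuration, not code.
      def fetch(cfg, payload):
          return {"records": [f"row-{i}" for i in range(cfg["limit"])]}

      def transform(cfg, payload):
          return {"records": [r.upper() for r in payload["records"]]}

      def load(cfg, payload):
          print(f"loading {len(payload['records'])} records into {cfg['target']}")
          return payload

      STEPS = {"fetch": fetch, "transform": transform, "load": load}

      pipeline_config = [
          {"type": "fetch", "limit": 3},
          {"type": "transform"},
          {"type": "load", "target": "data-lake"},
      ]

      payload = {}
      for step_cfg in pipeline_config:
          payload = STEPS[step_cfg["type"]](step_cfg, payload)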

    The talk was accepted to the conference program


    Ruslan Shakhaev

    Yandex Delivery

    What we learned from production incidents

    About 4 years ago, when we started developing Yandex Delivery, we used all the main patterns for building stable and reliable applications:

    - canary release
    - retries and timeouts
    - rate limiting
    - circuit breaker
    - feature toggling

    Even if one of our datacenters is unavailable, our users will not notice anything. We can enable/disable and configure our features in production in real time, and much more.
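
    As an illustration of one of the listed patterns, here is a toy circuit breaker; it is a generic sketch, not the implementation used in Yandex Delivery.

      import time

      class CircuitBreaker:
          """Toy circuit breaker: after `threshold` consecutive failures the
          circuit opens and calls fail fast until `cooldown` seconds pass."""

          def __init__(self, threshold=5, cooldown=30.0):
              self.threshold = threshold
              self.cooldown = cooldown
              self.failures = 0
              self.opened_at = None

          def call(self, fn, *args, **kwargs):
              if self.opened_at is not None:
                  if time.monotonic() - self.opened_at < self.cooldown:
                      raise RuntimeError("circuit open, failing fast")
                  self.opened_at = None  # half-open: let one call through
              try:
                  result = fn(*args, **kwargs)
              except Exception:
                  self.failures += 1
                  if self.failures >= self.threshold:
                      self.opened_at = time.monotonic()
                  raise
              self.failures = 0
              return result

    Wrapped around an RPC to a dependency, this keeps a dying downstream service from tying up every worker thread with timeouts.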

    But all this was not enough to keep the system from occasionally experiencing downtime.

    I'll tell you about the non-obvious problems we encountered and the lessons we learned from various production incidents.

    Main sections:
    - architectural solutions that lead to problems (inter-service interaction, entity processing, etc.)
    - problems when developing an external API
    - specifics of working with mobile clients
    - problems with PostgreSQL and what we did wrong

    The talk was accepted to the conference program

  • Databases and storage systems (2)


    Alexander Zaitsev

    Altinity

    Object Storage in ClickHouse

    ClickHouse is an ultra-fast analytic database. Object Storage is cheap. Can they work together? Let's learn!

    ClickHouse is an ultra-fast database originally designed for local storage. Since 2020, a lot of effort has gone into making it efficient with object storage such as S3, which is essential for big clusters operated in clouds. In this talk I will explain the ClickHouse storage model and how object storage support is implemented. Finally, we will look at performance results and discuss further improvements.
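
    As a taste of the topic, here is a sketch that reads S3 data through ClickHouse's s3() table function over the standard HTTP interface on port 8123; the bucket URL and data format are placeholders.

      import urllib.request

      # Query ClickHouse over its HTTP interface, reading data directly
      # from object storage via the s3() table function.
      query = """
      SELECT count()
      FROM s3('https://my-bucket.s3.amazonaws.com/events/*.parquet', 'Parquet')
      """

      req = urllib.request.Request(
          "http://localhost:8123/", data=query.encode("utf-8")
      )
      with urllib.request.urlopen(req) as resp:
          print(resp.read().decode())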

    The talk was accepted to the conference program


    Andrew Aksyonoff

    Avito && Sphinx

    All BSONs suck

    I will dissect the internal binary JSON representations of several DBs (Mongo, Postgres, YDB, my own Sphinx, maybe more), and rant about how they are so not great for querying.

    My rant will also include a partial benchmark (of course), and a limited way out for the databases: a few techniques I have tried and will be implementing in Sphinx, so that our BSON sucks on par or less. Spoiler alert: BSONs suck and nothing works really well for them, everything you thought is a lie (including the cake), hash tables suck, binary searches suck (even the clever ones and of course the naive ones), AVX2 sucks, maybe AVX512 sucks too (maybe I'll have the time to try that). As for the database users? Weeell, at least you will know how much your specific database sucks, why so, and what the competition can offer.
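
    To feel the querying cost yourself before the talk's much lower-level benchmark: even a "binary" document has to be decoded to read a single field. A toy sketch, assuming the bson module that ships with pymongo:

      import json
      import time

      import bson  # assumption: the bson module bundled with pymongo

      doc = {f"field_{i}": i for i in range(100)}
      blob = bson.encode(doc)
      text = json.dumps(doc)

      N = 10_000
      # Point read from BSON: the whole document is decoded anyway.
      start = time.perf_counter()
      for _ in range(N):
          bson.decode(blob)["field_99"]
      print("bson decode:", time.perf_counter() - start)

      # Same point read from plain-text JSON, for scale.
      start = time.perf_counter()
      for _ in range(N):
          json.loads(text)["field_99"]
      print("json parse :", time.perf_counter() - start)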

    The talk was accepted to the conference program

  • BigData and Machine Learning (1)


    Dmitrii Khodakov

    Avito

    How we built personal recommendations at the world's largest classifieds site

    Context and task: a feed of personal recommendations on the main page. How do you launch recommendations in production with 150 million items and 100 million users? I will share my experience and the pitfalls we hit.

    - A quick overview of the model arsenal: the classic ML approach.
    - A quick overview of metrics, starting with product metrics.
    - The basis of everything: fast experiments and analytics on actual data.
    - Where to start? Classical matrix factorization and its launch pattern (a toy version is sketched after this list), and the problems we encountered at this stage.
    - A little more advanced: switching to real-time user features and history; an alternative approach with simpler models.
    - Advanced models: adding neural networks, because the strength is in diversity.
    - Mixing models: a great blender.
    - How does it work in production? We replaced Go with Python; what happened to time to market?
    - And again about the experiment cycle, with a word on product metrics.
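
    A toy version of the classical matrix factorization step from the outline, trained with plain SGD on synthetic interactions. At Avito's scale this would be a distributed job; everything here is illustrative.

      import numpy as np

      rng = np.random.default_rng(0)
      n_users, n_items, k = 100, 50, 8
      # Sparse toy interactions: (user, item, strength).
      events = [(rng.integers(n_users), rng.integers(n_items), 1.0)
                for _ in range(1000)]

      P = rng.normal(0, 0.1, (n_users, k))  # user factors
      Q = rng.normal(0, 0.1, (n_items, k))  # item factors
      lr, reg = 0.05, 0.01

      for epoch in range(10):
          for u, i, r in events:
              pu, qi = P[u].copy(), Q[i].copy()
              err = r - pu @ qi
              P[u] += lr * (err * qi - reg * pu)
              Q[i] += lr * (err * pu - reg * qi)

      # Recommend: top-scoring items for a user.
      scores = P[0] @ Q.T
      print("top items for user 0:", np.argsort(-scores)[:5])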

    The talk was accepted to the conference program

  • DevOps and Maintenance (2)


    Dmitry Tsepelev

    UULA

    Backend monitoring from scratch

    Almost everyone has monitoring. In an ideal world it is a reliable tool that detects symptoms before they become serious problems. Often an APM on a free plan with out-of-the-box reports is used as the monitoring tool. As a result, something is measured, some alerts are sent to a chat, no one responds to them, and one day a major incident happens.

    In the talk we will:

    - define monitoring antipatterns;

    - pick the most critical metrics and ways to see insights in charts;

    - represent the system in the terminology of queueing theory (a toy calculation follows this list);

    - figure out how to choose lower-level metrics and how to use them to find problems;

    - discuss why alerts are helpful, and when they are not needed.
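
    As a taste of the queueing-theory view, here is a toy M/M/1 calculation relating arrival rate, service rate, latency and in-flight requests via Little's law; the numbers are made up.

      # M/M/1 approximation of a backend: arrival rate lam vs service rate mu.
      lam = 80.0   # observed requests per second
      mu = 100.0   # max throughput of one worker

      rho = lam / mu                # utilization
      wait = 1.0 / (mu - lam)       # mean time in system (seconds)
      queue_len = lam * wait        # Little's law: L = lambda * W

      print(f"utilization {rho:.0%}, mean latency {wait*1000:.0f} ms, "
            f"in-flight requests {queue_len:.1f}")

    Note how latency explodes as lam approaches mu: at 80% utilization the mean time in system is already five times the bare service time.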

    The talk was accepted to the conference program


    Oleg Voznesensky

    Gazprombank

    Demystifying GitOps. How to upgrade your CIOps to GitOps in a minimalistic way

    The purpose of this talk is to help DevOps engineers understand the GitOps pattern and decide whether to adopt GitOps. I will also discuss the most frequent problems and ways to solve them.
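
    For readers new to the pattern, a minimal sketch of the GitOps idea: an agent pulls the desired state from git and continuously reconciles the cluster against it, instead of CI pushing changes (CIOps). The repo URL and manifest path are placeholders; real agents like Argo CD or Flux add drift detection and health checks.

      import subprocess
      import time

      REPO = "https://example.com/infra.git"   # placeholder
      CLONE = "/tmp/infra"

      def sync_once():
          # Pull the desired state from git...
          subprocess.run(["git", "-C", CLONE, "pull", "--ff-only"], check=True)
          # ...and reconcile the cluster against it.
          subprocess.run(["kubectl", "apply", "-k", f"{CLONE}/manifests"], check=True)

      subprocess.run(["git", "clone", REPO, CLONE], check=True)
      while True:
          sync_once()
          time.sleep(60)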

    The talk was accepted to the conference program