15 December, 13:30, «04 Hall. Ashot Yerkat»
Result of 5-year experience developing Yandex Query to bring batch/stream processing service into Yandex Cloud. YQ can run SQL-like queries over endless dataflow.
We’ll talk about details of design trade-offs: - capacity vs. isolation - performance vs. reliability - security vs. UX.
We present our 5-year experience developing Yandex Query. It is a query processing service in Yandex Cloud. Both batch and stream processing are handled similarly with the same syntax. One can debug their queries on batch data samples and run them for the production stream without changes.
YQ reuses our internal system for distributed query processing. The job can spawn over hundreds of nodes to meet load requirements. YQ can fetch data from and upload it into external systems (i.e. object storage or message queues) to join heterogeneous sources in a single query.
We’ll reveal details of our multitenant system design. Design choices we had and decisions we made:
1. Capacity vs. isolation
Control plane is isolated from the compute plane(s). Processing cluster includes several compute planes (like tenants) to reduce blast radius. It reduces the risk of system downtime in a shared environment.
2. Performance vs. reliability
YQ uses cloud compute nodes and enforces limits and quotas to mitigate DDOS in presence of high-load queries. Data is processed as fast as possible, and we provide exactly-once guarantees under certain conditions.
3. Security vs. UX
YQ conforms to strict cloud policies on data privacy. All data sources support service accounts for flexible access control. The compute plane uses time-limited tokens only. Service is available from cloud console UI and provides API for integration with other services.
Finally, like everything in Yandex, our system is distributed, scalable, and fault-tolerant, with all benefits and complexity of this design.
The talk was accepted to the conference program