Designing a more efficient OLAP database data flow and architecture

Photo
Chris Bohn

MicroFocus LTD

15 December, 10:00, «01 Hall. Tigran»

Abstracts

Modern database systems feature OLTP databases for recording business facts and dimensions, and OLAP databases for data analytics. These database types feature different fundamental storage architectures. OLTP databases are designed for fast single-record lookup, while OLAP databases are designed for fast analytics like aggregation. This leads to different data storage approaches. To enable fast aggregation, OLAP databases usually feature immutable data storage containers, especially in cloud environments like AWS. This makes update and delete operation very expensive, because those immutable storage containers must be destroyed and rebuilt. Excessive updates and deletes can severely impact OLAP database performance. Most transactional middleware applications make use of frequent update and delete operations, which have less performance impact on OLTP databases compared to OLAP. Most businesses feature OLTP databases for running the business, the transaction data then feeding OLAP databases to subsequently analyze the business. OLTP and OLAP databases need to live together nicely, but there is an impedance mismatch due to the data storage differences. At MicroFocus, we set about to minimize the impedance mismatch. We settled on a data flow design where all data loaded into Vertica (our OLAP database) is append-only. We also determined that data integrity would be inherited from the upstream OLTP systems - so why do it again? We thus decided to use no primary or foreign key constraints, because the benefits would be redundant. This allows for much faster ELT data loading and query processing because there is no constraint checking. Again, we are inheriting constraint checking from the upstream OLTP databases, which are much better suited for that. By accepting that the upstream OLTP database has already done referential integrity checks, our OLAP database is freed from the constraint checking overhead. That is a large performance gain.

As mentioned, our ELT is append-only. That means that our Vertica OLAP database has all the iterations of all the records. That means we have a complete record of the whole change history for all the records. The change history has become a hot topic in data analytics because the effect of changes to things such as product description and correlating that to sales revenue is an important data point. Keeping complete change history is becoming essential to data analytics. The OLTP/OLAP design that MicroFocus has taken yields efficient OLAP database performance and retains change history - an important win that comes at no cost.

Summary: The design approach we have taken at MicroFocus with our OLTP/OLAP design has yielded a robust and performant holistic system that enables our Vertica OLAP EDW to perform at its potential, while providing benefits like change history.

The talk was accepted to the conference program