MySQL Parallel Query Processing of SSB Queries via InfiniDB
Posted by: jtommaney
on Dec 04, 2009
Let's take a look at MySQL parallel query processing with the InfiniDB storage engine with ~ 6 billion row fact table with the Star Schema Benchmark (SSB).
The star schema benchmark description and a data generator (dbgen) can be found here: http://www.cs.umb.edu/~xuedchen/research/publications/ . The SSB benchmark modifies TPC-H/DBT-3 following data warehousing rules as described in some of the publications.
Note: The load files for the fact table as emitted by dbgen randomizes the dates across the 6 billion rows. For all examples here, the data was split into months, but not sorted, prior to cpimport (bulk load) into InfiniDB. This corresponds more closely to typical data load patterns, and, in my opinion, corresponds to a more realistic data set. This allows for significant I/O elimination to take place, dramatically improving performance.
In addition, for 6 of the 13 queries, an additional predicate was added to duplicate a filter originally applied to a dimension table onto the fact table. For Q1.1, the d_year filter was effectively added to the fact table with the following lo_orderdate predicate. A future enhancement will effectively deliver the same I/O elimination automatically.
-- Q1.1
select sum(lo_extendedprice*lo_discount) as revenue
from dateinfo, lineorder where lo_orderdate = d_datekey
and d_year = 1993
and lo_orderdate between 19930101 and 19931231 -- added matching predicate
and lo_discount between 1 and 3
and lo_quantity < 25;
To give a comparative benefit from the I/O elimination (think partition elimination), the query runs in:
About 22 seconds from disk with the added predicate,
about 10 seconds from cache with the added predicate,
versus about 120 seconds without the predicate.
The timings were captured by running the SSB queries from Q1.1 to Q4.3 in sequence without flushing the cache or restarting system. These times are from the second execution of the script to replicate a warm cache scenario. The system configuration was then modified to add or remove performance modules (servers). The test script starts with 4 PMs (pm1, pm2, pm3, pm4) and removes from service to test different configurations. Re-enabling modules works the same way.
idbmysql -vvv ssb_1000 < ssb_w_trace.sql > run_ssb_trace_1000sf_4PM_run1.log
idbmysql -vvv ssb_1000 < ssb_w_trace.sql > run_ssb_trace_1000sf_4PM_run2.log
cc altersystem-disablemodule pm3,pm4 y
idbmysql -vvv ssb_1000 < ssb_w_trace.sql > run_ssb_trace_1000sf_2PM_run1.log
idbmysql -vvv ssb_1000 < ssb_w_trace.sql > run_ssb_trace_1000sf_2PM_run2.log
cc altersystem-disablemodule pm2 y
idbmysql -vvv ssb_1000 < ssb_w_trace.sql > run_ssb_trace_1000sf_1PM_run1.log
idbmysql -vvv ssb_1000 < ssb_w_trace.sql > run_ssb_trace_1000sf_1PM_run2.log
The test includes a single performance module implementation (similar to open source single server) as well as the scale-out feature offered as part of the InfiniDB Enterprise Edition. The overall parallel query capabilities are similar for the single server open source edition as well as the enterprise. The cc command shown above is a call to the Calpont Console capabilities to administer the system. In addition, these tests were run with the default tuning parameter settings.
From the standpoint of the parallel query capabilities, adding additional cores either by scale-out (enterprise) or scale-up (available with single server) offer approximately the same benefits for many query categories.
The queries/script as run can be found at:
http://docs.google.com/Doc?docid=0AWI1rlqF3OcyZGhjcTVqcnFfNGN3N3RodmMz&hl=en
The timings for the queries against the 6 billion row fact table and up to 30 million rows in the dimension tables for 1,2 and 4 Performance Modules (PMs) are (all times in seconds):
| ----- PM Count ----- | |||
| 1PM | 2PMs | 4PMs | |
| Q1.1 | 81 | 41.03 | 22.11 |
| Q1.2 | 7.91 | 4.15 | 2.46 |
| Q1.3 | 50.73 | 25.43 | 16.06 |
| Q2.1 | 482.25 | 247.68 | 129.15 |
| Q2.2 | 385.21 | 200.88 | 103.3 |
| Q2.3 | 354.32 | 180.51 | 96.78 |
| Q3.1 | 830.53 | 432.64 | 228.84 |
| Q3.2 | 548.38 | 286.23 | 150.81 |
| Q3.3 | 473.07 | 216.29 | 113.29 |
| Q3.4 | 24.48 | 15.87 | 8.49 |
| Q4.1 | 999.43 | 519.98 | 280.04 |
| Q4.2 | 255.01 | 130.94 | 57.27 |
| Q4.3 | 158.44 | 82.65 | 26.35 |
The same metrics displayed graphically show the performance improvement via adding more cores (in this case through scale-out).

To better understand the results we can view the elapsed time as log(seconds) to compare to a linear trend line, so the graph now includes one additonal line titled Linear Trend Line. .png)
The overall trend lines show very good scalability across a wide variety of cases. A look into the details shows a couple of variations from the ideal linear scalabilty trend line. The results could be classified into 3 general groups:
Group 1 - Most queries show the same slope as Linear Trend Line (purple towards the middle of the graph), indicating something approaching linear scaling.
Group 2 - A few show some modest variations from the best case scaling, indicating that some portion of the query is not fully parallel, or potentially some variation in the timings that may go away with a larger sample size.
Group 3 - Some better than linear scaling occurs for queries 4.2 and 4.3 where the query is more than twice as fast as resources double. This is due to a larger cache allowing elimination of significant amounts of physical I/O. So, with caching, greater than linear scalability is possible. Obviously, this is conditional depending on the exact queries, data, and memory.
This chart shows the reduction in elapsed time as the parallel query is distributed across more cores.
1 -- 2 shows the reduction in time as the system scales from 8 cores (on one PM) to 16 cores (on two PMs).
2 -- 4 shows the reduction in time as the system scales from 16 cores to 32 cores (on 4 PMs).
Group 2 queries are highlighted in yellow, and Group 3 queries are highlighted in green.
| Scalability Factor | ||
| 1 -- 2 | 2 -- 4 | |
| Q1.1 | 0.507 | 0.539 |
| Q1.2 | 0.525 | 0.593 |
| Q1.3 | 0.501 | 0.632 |
| Q2.1 | 0.514 | 0.521 |
| Q2.2 | 0.521 | 0.514 |
| Q2.3 | 0.509 | 0.536 |
| Q3.1 | 0.521 | 0.529 |
| Q3.2 | 0.522 | 0.527 |
| Q3.3 | 0.457 | 0.524 |
| Q3.4 | 0.648 | 0.535 |
| Q4.1 | 0.520 | 0.539 |
| Q4.2 | 0.513 | 0.437 |
| Q4.3 | 0.522 | 0.319 |
| Average | 0.522 | 0.519 |
The overall average reduction in time approaches, but does not reach .50, the theoretical best case result for perfect linear scalability. I would characterize the Star Schema Benchmark data set and queries as showing very strong parallel query for MySQL with the InfiniDB storage engine.
Let us know if you have any questions or would like to see some additional data sets profiled. We look forward to helping you put your data to work.
Thanks - JT
