InfiniDB load 60 Billion SSB rows trended

Posted by: jtommaney

Tagged in: 10k Scale Factor

I wanted to offer another InfiniDB load rate metric, this time using the SSB lineorder fact table. In this case we are using a scale factor of 10,000, which translates to roughly 60 billion rows. As a point of reference, the recent Percona benchmark ran at a scale factor of 1000 (6 billion rows): http://www.mysqlperformanceblog.com/2010/01/07/star-schema-bechmark-infobright-infinidb-and-luciddb/ . The hourly load rate varied only slightly across the entire run, averaging about 478 million rows per hour. As always, your actual load rate will vary with your disks, table, and column definitions, but you should expect consistent load times across a very wide range of cardinalities.

The table is the lineorder table, as defined in the desc output further down.

Some disclaimers about the SSB at a scale factor of 10,000. Use of the benchmark at this scale appears to be bleeding edge:
NOTE: Data generation for scale factors >  1000 GB is still in development,
        and is not yet supported.
Your resulting data set MAY NOT BE COMPLIANT!
In addition, I munged one of the 2560 files prior to import, so I am missing about 24 million records relative to the expected count.
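
As a rough check on that shortfall, here is a minimal sketch. It assumes the 2560 input files each hold about the same share of the nominal 60 billion rows, which the post does not state, so treat the result as an estimate only.

```python
# Rough estimate of how many rows one skipped input file represents.
# Assumption (not stated in the post): the 2560 files are about equal in size.
NOMINAL_ROWS = 60_000_000_000  # scale factor 10,000, nominal lineorder row count
FILE_COUNT = 2560              # number of input files mentioned in the post

rows_per_file = NOMINAL_ROWS // FILE_COUNT
print(f"rows per file: {rows_per_file:,}")
```

Under that assumption a single file is a bit over 23 million rows, which is in line with the roughly 24 million missing records.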


mysql> desc ssb_10k.lineorder;
+------------------+---------------+------+-----+---------+-------+
| Field            | Type          | Null | Key | Default | Extra |
+------------------+---------------+------+-----+---------+-------+
| lo_orderkey      | bigint(20)    | YES  |     | NULL    |       |
| lo_linenumber    | int(11)       | YES  |     | NULL    |       |
| lo_custkey       | int(11)       | YES  |     | NULL    |       |
| lo_partkey       | int(11)       | YES  |     | NULL    |       |
| lo_suppkey       | int(11)       | YES  |     | NULL    |       |
| lo_orderdate     | int(11)       | YES  |     | NULL    |       |
| lo_orderpriority | char(15)      | YES  |     | NULL    |       |
| lo_shippriority  | char(1)       | YES  |     | NULL    |       |
| lo_quantity      | decimal(12,2) | YES  |     | NULL    |       |
| lo_extendedprice | decimal(12,2) | YES  |     | NULL    |       |
| lo_ordtotalprice | decimal(12,2) | YES  |     | NULL    |       |
| lo_discount      | decimal(12,2) | YES  |     | NULL    |       |
| lo_revenue       | decimal(12,2) | YES  |     | NULL    |       |
| lo_supplycost    | decimal(12,2) | YES  |     | NULL    |       |
| lo_tax           | decimal(12,2) | YES  |     | NULL    |       |
| lo_commitdate    | int(11)       | YES  |     | NULL    |       |
| lo_shipmode      | char(10)      | YES  |     | NULL    |       |
+------------------+---------------+------+-----+---------+-------+

Example syntax to import:
     /usr/local/Calpont/bin/colxml ssb_10k -t lineorder -j 10000
     /usr/local/Calpont/bin/cpimport -j 10000

Actual row count: 59,977,404,781
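
For a sense of what that rate means in wall-clock terms, here is a small sketch that derives an approximate total load time from the actual row count and the roughly 478 million rows per hour average. The elapsed time itself is not reported in the post, so the figures are an estimate from the average, not a measurement.

```python
# Estimate total load time from the actual row count and the average hourly rate.
# Both inputs come from the post; the derived duration is an approximation.
ACTUAL_ROWS = 59_977_404_781
AVG_ROWS_PER_HOUR = 478_000_000  # "about 478 million rows per hour"

load_hours = ACTUAL_ROWS / AVG_ROWS_PER_HOUR
rows_per_second = AVG_ROWS_PER_HOUR / 3600

print(f"estimated load time: {load_hours:.1f} hours ({load_hours / 24:.1f} days)")
print(f"average rate: {rows_per_second:,.0f} rows/second")
```

That works out to a little over five days of continuous loading, at somewhere around 130 thousand rows per second.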

Example data: 

|1|1|73799965|1551894|41364203|19960102|5-LOW|0|17.00|3307894.00|18660018.00|4.00|3175578.00|116749.00|2.00|19960212|TRUCK|
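
To make the mapping between the sample record and the table definition explicit, here is a minimal sketch that splits the pipe-delimited record and pairs each field with the column names from the desc output. The helper name parse_record is hypothetical, and stripping the leading pipe assumes it is only a display artifact rather than an empty first field.

```python
# Column names in the order shown by "desc ssb_10k.lineorder".
COLUMNS = [
    "lo_orderkey", "lo_linenumber", "lo_custkey", "lo_partkey", "lo_suppkey",
    "lo_orderdate", "lo_orderpriority", "lo_shippriority", "lo_quantity",
    "lo_extendedprice", "lo_ordtotalprice", "lo_discount", "lo_revenue",
    "lo_supplycost", "lo_tax", "lo_commitdate", "lo_shipmode",
]

# The sample record from the post, joined onto one logical line.
RECORD = ("|1|1|73799965|1551894|41364203|19960102|5-LOW|0|17.00|3307894.00|"
          "18660018.00|4.00|3175578.00|116749.00|2.00|19960212|TRUCK|")


def parse_record(line):
    """Hypothetical helper: map one pipe-delimited record to column names."""
    # Assumption: leading and trailing pipes are delimiters, not empty fields.
    fields = line.strip().strip("|").split("|")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    return dict(zip(COLUMNS, fields))


row = parse_record(RECORD)
for name, value in row.items():
    print(f"{name:<17}{value}")
```

The record carries 17 fields, one per column in the table, with dates stored as integers in YYYYMMDD form (for example lo_orderdate 19960102).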

A stable and predictable load rate can be very important when dealing with larger and larger data sets. Let us help you put your data to work!

Thanks - Jim Tommaney,  Chief Product Architect

Comments (2)
written by Ruslan Yalyshev, February 11, 2010
Thanks! These are very promising results!
Can you publish some other details about your tests?
1. What hardware did you use for this test?
2. How much space was taken by source files and what is the result size of db-files after upload?
written by Jim Tommaney, February 12, 2010
The server executing the cpimport load process was an 8-core HP 1U server (~2 GHz) with 16 GB of memory. There were multiple LUNs adding up to about 9 TB, each LUN residing within an LSI storage device and consisting of a RAID 0 stripe across 8 disks. The cpimport parameters were left at their defaults. Note that today we have multi-threaded, but not scale-out, write processing, so the server executing cpimport was primarily writing to one LUN at a time.

I collected but have not yet aggregated the space utilization, but it is about the same ratio as what Vadim @ Percona measured for his scale factor 1000 test (610 GB source, 626 GB within InfiniDB). So something like 6.1 TB of source and 6.26 TB in InfiniDB.
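
The scaling behind that estimate can be sketched as follows. It assumes the scale factor 1000 storage ratio carries over unchanged to scale factor 10,000, which is the approximation made in the reply above rather than an aggregated measurement.

```python
# Scale the SF 1000 storage ratio up to the SF 10,000 data set.
# Assumption: the on-disk ratio is the same at both scale factors.
SF1000_SOURCE_GB = 610    # source file size measured by Percona
SF1000_INFINIDB_GB = 626  # size within InfiniDB measured by Percona

ratio = SF1000_INFINIDB_GB / SF1000_SOURCE_GB

sf10k_source_tb = 6.1     # approximate SF 10,000 source size from the reply
sf10k_infinidb_tb = sf10k_source_tb * ratio

print(f"ratio: {ratio:.3f}")
print(f"estimated size in InfiniDB: {sf10k_infinidb_tb:.2f} TB")
```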
