Posted by: robin
on Mar 15, 2010
In our first alpha of InfiniDB 1.1, we’ve addressed several pieces of feedback from our community, and we’d like to thank those of you who have been working with us on these and other issues. Two comments about our 1.0 version kept recurring: (1) CREATE TABLE statements for wide tables sometimes take a long time to complete; (2) empty and small tables use an excessive amount of disk space.
First, a little background on why these two problems occurred, and then I’ll show you how we’ve responded to your feedback. InfiniDB is designed to work with lots of data, so many of our default options are set for ‘big’ tables. We’ve also tried to optimize everything we can for fast performance. One of these optimizations has been to preallocate contiguous space on disk for tables at CREATE TABLE time, because it’s normally much faster to access data for a table whose underlying storage is not fragmented on disk. As a table gets larger, we also do everything we can to grow it contiguously. Our default has been to initially allocate space for 8 million rows in every table that is created, with the actual storage used depending on the width of the table and the datatype of each column. The end result has been that very wide, empty tables can take up a lot of space on disk, and creating them can take longer than creating MyISAM or InnoDB objects.
Our 1.1.0 alpha includes enhancements that (1) use disk space more efficiently for smaller tables, and even for some ‘large’ tables; (2) create tables much faster than in 1.0; (3) retain our overall fast load and ‘data readiness’ rates. These enhancements will also be backported to our next 1.0 Community maintenance release.
In a nutshell, CREATE TABLE now preallocates enough contiguous space for 256,000 rows (instead of 8 million). Any extents needed afterwards are still allocated contiguously at our old extent size of 8 million rows.
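To make the policy concrete, here is a minimal Python sketch of how much space gets reserved under the old and new defaults. It is only an illustrative model, not InfiniDB code: the function names and the four column widths are made up for the example, and it assumes that capacity simply grows by one whole 8-million-row extent each time the current allocation fills up.

```python
# Illustrative model only; names and column widths are hypothetical.
INITIAL_ROWS_OLD = 8_000_000   # 1.0: rows preallocated at CREATE TABLE
INITIAL_ROWS_NEW = 256_000     # 1.1.0 alpha: rows preallocated at CREATE TABLE
EXTENT_ROWS = 8_000_000        # extents added later keep the old size


def preallocated_rows(row_count, initial_rows):
    """Row capacity reserved for a table that holds row_count rows.

    Assumption: once the initial allocation is full, capacity grows by
    whole extents of EXTENT_ROWS rows each.
    """
    capacity = initial_rows
    while capacity < row_count:
        capacity += EXTENT_ROWS
    return capacity


def preallocated_bytes(row_count, initial_rows, column_widths):
    """Approximate bytes reserved: row capacity times the summed column widths."""
    return preallocated_rows(row_count, initial_rows) * sum(column_widths)


# Hypothetical table with two 4-byte columns and two 8-byte columns.
widths = [4, 4, 8, 8]
old_empty = preallocated_bytes(0, INITIAL_ROWS_OLD, widths)
new_empty = preallocated_bytes(0, INITIAL_ROWS_NEW, widths)
print(f"empty table, old default: {old_empty:,} bytes")
print(f"empty table, new default: {new_empty:,} bytes")
print(f"new/old ratio: {new_empty / old_empty:.3f}")
```

For an empty table the ratio works out to 256,000 / 8,000,000 = 0.032 whatever the column widths are, because both defaults multiply the same per-row width.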
As an example of the improvements you’ll see, consider the tests below, which create and load a TPCH 1 database followed by a TPCH 10 database. First, in terms of storage space used, the new enhancements yield substantial disk savings for both databases:
Disk Usage for Create and Load of TPCH 1 Followed By TPCH 10

| Step | Old KB | Old GB | New KB | New GB | Factor New/Old | % Saved |
|---|---|---|---|---|---|---|
| create tpch1 tables | 6,561,828 | 6.26 | 206,632 | 0.20 | 0.03 | 96.9% |
| import tpch1 | 6,824,228 | 6.51 | 2,431,196 | 2.32 | 0.36 | 64.4% |
| create tpch10 tables | 11,638,672 | 11.10 | 2,582,512 | 2.46 | 0.22 | 77.8% |
| import tpch10 | 22,192,652 | 21.16 | 15,225,764 | 14.52 | 0.69 | 31.4% |

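If you want to check the derived columns of the disk usage table yourself, the short Python sketch below recomputes GB, the new/old factor, and the percentage saved from the KB figures. The KB values are copied from the table; the function name is made up for the example, and the conversion assumes 1 GB = 1024 × 1024 KB, which is what the GB columns appear to use.

```python
# (step, old KB, new KB), copied from the disk usage table.
disk_usage_kb = [
    ("create tpch1 tables", 6_561_828, 206_632),
    ("import tpch1", 6_824_228, 2_431_196),
    ("create tpch10 tables", 11_638_672, 2_582_512),
    ("import tpch10", 22_192_652, 15_225_764),
]

KB_PER_GB = 1024 * 1024  # assumption: binary gigabytes


def summarize(old_kb, new_kb):
    """Return (old GB, new GB, new/old factor, percent saved)."""
    factor = new_kb / old_kb
    return old_kb / KB_PER_GB, new_kb / KB_PER_GB, factor, (1 - factor) * 100


for step, old_kb, new_kb in disk_usage_kb:
    old_gb, new_gb, factor, saved = summarize(old_kb, new_kb)
    print(f"{step:<22} {old_gb:6.2f} {new_gb:6.2f} {factor:5.2f} {saved:5.1f}%")
```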
Creating empty tables is much faster, and the overall time to create and load tables improves as well:
Seconds to Create and Load TPCH 1 Followed By TPCH 10

| Step | Old | New | Factor New/Old | % Saved |
|---|---|---|---|---|
| create tpch1 tables alone | 114.38 | 10.66 | 0.09 | 90.7% |
| create and import tpch1 tables | 144.38 | 77.66 | 0.54 | 46.2% |
| create tpch10 tables alone | 113.19 | 11.55 | 0.10 | 89.8% |
| create and import tpch10 tables | 518.19 | 451.55 | 0.87 | 12.8% |
| Total | 662.57 | 529.21 | 0.80 | 20.1% |

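One note on reading the Total row of the timing table: it appears to be the sum of the two "create and import" rows only, with the "alone" rows left out, presumably because table creation is already included in the create-and-import times. The Python sketch below checks that arithmetic using the figures copied from the table; the variable names are made up for the example.

```python
# step -> (old seconds, new seconds), copied from the timing table.
timings = {
    "create tpch1 tables alone": (114.38, 10.66),
    "create and import tpch1 tables": (144.38, 77.66),
    "create tpch10 tables alone": (113.19, 11.55),
    "create and import tpch10 tables": (518.19, 451.55),
}

# Sum only the end-to-end rows; the "alone" rows are not added in.
end_to_end = [t for step, t in timings.items() if step.startswith("create and import")]
old_total = sum(old for old, _ in end_to_end)
new_total = sum(new for _, new in end_to_end)

factor = new_total / old_total
print(f"total: {old_total:.2f}s -> {new_total:.2f}s")
print(f"factor new/old: {factor:.2f}, saved: {(1 - factor) * 100:.1f}%")
```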
Finally, query times appear to be the same for the old and new methods, so things look good there too.
So again, thanks to the community for the feedback, and we hope you like our new storage paradigm. We’re likely going to make the 256K row default a configurable option, so watch for that. Of course, we’re happy to get more suggestions on how to make things even better, so shoot us your ideas today, and please download our new 1.1 alpha and let us know what you think.