Response to Community Feedback

Posted by: robin

In our first alpha of InfiniDB 1.1, we’ve responded to a number of feedback items from our community, and we’d like to thank those of you who have been working with us on these and other issues. Two comments about our 1.0 version came up repeatedly: (1) CREATE TABLE statements for wide tables sometimes take a long time to complete; (2) empty and small tables use an excessive amount of disk space.

First, a little background on why these two problems occurred, and then I’ll show you how we’ve responded to your feedback. InfiniDB is designed to work with lots of data, so many of our default options are set for ‘big’ tables. We’ve also tried to optimize everything we can for fast performance. One of these optimizations has been to preallocate contiguous space on disk for a table at CREATE TABLE time, because it’s normally much faster to access data for a table whose underlying storage is not fragmented on disk. As a table gets larger, we also do everything we can to grow it contiguously. Our default has been to initially allocate space for 8 million rows in every table that is created, with the actual storage used depending on the width of the table and the datatypes of its columns. The end result has been that very wide, empty tables can take up a lot of space on disk, and the DDL time to create them can be longer than the time to create MyISAM or InnoDB objects.
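To get a feel for why wide, empty tables were expensive under the 8-million-row default, here is a rough back-of-the-envelope sketch. It assumes the reserved space is simply the row count times the sum of fixed column widths; the column widths and the helper name `preallocated_bytes` are hypothetical and are not taken from InfiniDB itself.

```python
# Rough estimate only. Assumption: reserved bytes = rows * sum of column widths.
# The column widths below are hypothetical, not InfiniDB's actual storage sizes.

ROWS_PREALLOCATED_1_0 = 8_000_000  # "8 million rows" default described in the post


def preallocated_bytes(column_widths, rows=ROWS_PREALLOCATED_1_0):
    """Return the estimated number of bytes reserved for an empty table."""
    return rows * sum(column_widths)


narrow = [4, 4, 8]   # a hypothetical 3-column table
wide = [8] * 100     # a hypothetical 100-column table of 8-byte values

for name, widths in (("narrow", narrow), ("wide", wide)):
    gib = preallocated_bytes(widths) / 2**30
    print(f"{name}: about {gib:.2f} GiB reserved before any rows are loaded")
```

Under this simplified model the reserved space grows linearly with table width, which is consistent with wide tables being the ones people noticed.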

Our 1.1.0 alpha now includes enhancements that (1) use disk space more efficiently for smaller tables, and even some ‘large’ tables; (2) create tables much faster than in 1.0; (3) retain our overall fast load and ‘data readiness’ rates. These enhancements will also be backported to our next 1.0 Community maintenance release.

In a nutshell, CREATE TABLE now preallocates enough contiguous space for 256,000 rows (instead of 8 million). Any extents needed afterwards are allocated contiguously at our old extent size of 8 million rows.
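The difference between the two schemes can be sketched in a few lines. This is only a model of the description above: it assumes the initial 256,000-row allocation is followed by whole 8-million-row extents as rows are added, and the function names are made up for illustration.

```python
import math

INITIAL_ROWS = 256_000    # rows preallocated at CREATE TABLE in 1.1 (from the post)
EXTENT_ROWS = 8_000_000   # rows per extent, old and new (from the post)


def rows_reserved_new(rows_needed):
    """Row capacity reserved under the assumed 1.1 schedule."""
    if rows_needed <= INITIAL_ROWS:
        return INITIAL_ROWS
    extra_extents = math.ceil((rows_needed - INITIAL_ROWS) / EXTENT_ROWS)
    return INITIAL_ROWS + extra_extents * EXTENT_ROWS


def rows_reserved_old(rows_needed):
    """Row capacity reserved under the 1.0 behaviour: always whole extents."""
    extents = max(1, math.ceil(rows_needed / EXTENT_ROWS))
    return extents * EXTENT_ROWS


for n in (0, 100_000, 1_000_000, 10_000_000):
    print(f"{n:>10,} rows: old={rows_reserved_old(n):>10,}  new={rows_reserved_new(n):>10,}")
```

In this model the savings are largest for empty and small tables, and shrink once a table grows past the initial allocation.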

As an example of the improvements you’ll see, consider the tests below of a TPCH 1 and a TPCH 10 database. First, in terms of storage space used, the new enhancements yield substantial disk savings for both databases:

Disk Usage for Create and Load of TPCH 1 Followed By TPCH 10

Step                  Old KB       Old GB   New KB       New GB   Factor (New/Old)   % Saved
create tpch1 tables   6,561,828    6.26     206,632      0.20     0.03               96.9%
import tpch1          6,824,228    6.51     2,431,196    2.32     0.36               64.4%
create tpch10 tables  11,638,672   11.10    2,582,512    2.46     0.22               77.8%
import tpch10         22,192,652   21.16    15,225,764   14.52    0.69               31.4%
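The GB, Factor and % Saved columns in the disk-usage figures follow directly from the KB values. The short sketch below recomputes them, assuming 1 GB = 1024 × 1024 KB, which matches the published GB figures. The helper names are illustrative only.

```python
# Recompute the derived columns of the disk-usage table from the KB figures.
# Assumption: 1 GB = 1024 * 1024 KB, which reproduces the published GB values.

disk_kb = {
    "create tpch1 tables": (6_561_828, 206_632),
    "import tpch1": (6_824_228, 2_431_196),
    "create tpch10 tables": (11_638_672, 2_582_512),
    "import tpch10": (22_192_652, 15_225_764),
}


def kb_to_gb(kb):
    """Convert kilobytes to gigabytes, rounded to two decimals."""
    return round(kb / 1024**2, 2)


def savings(old, new):
    """Return (new/old factor, percent saved), rounded as in the table."""
    factor = new / old
    return round(factor, 2), round((1 - factor) * 100, 1)


for step, (old_kb, new_kb) in disk_kb.items():
    factor, pct = savings(old_kb, new_kb)
    print(f"{step}: {kb_to_gb(old_kb)} GB -> {kb_to_gb(new_kb)} GB, "
          f"factor {factor}, {pct}% saved")
```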

Creating empty tables is much faster, and the overall time to create and load tables improves too:

Seconds to Create and Load TPCH 1 Followed By TPCH 10

Step                             Old      New      Factor (New/Old)   % Saved
create tpch1 tables alone        114.38   10.66    0.09               90.7%
create and import tpch1 tables   144.38   77.66    0.54               46.2%
create tpch10 tables alone       113.19   11.55    0.10               89.8%
create and import tpch10 tables  518.19   451.55   0.87               12.8%
Total                            662.57   529.21   0.80               20.1%
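A note on reading the Total row: the figures match the sum of the two "create and import" rows only, since the "create ... alone" times are already included in those. A quick check, with variable names chosen just for this sketch:

```python
# Check that the Total row equals the sum of the two create-and-import rows.
# Timings are in seconds, copied from the table: (old, new).

create_and_import = {
    "tpch1": (144.38, 77.66),
    "tpch10": (518.19, 451.55),
}

old_total = sum(old for old, _ in create_and_import.values())
new_total = sum(new for _, new in create_and_import.values())
factor = new_total / old_total

print(f"old total: {old_total:.2f} s")
print(f"new total: {new_total:.2f} s")
print(f"factor: {factor:.2f}, saved: {(1 - factor) * 100:.1f}%")
```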


Finally, query times appear to be the same for the old and new methods, so things look good there too.

So again, thanks to the community for the feedback, and we hope you like our new storage paradigm. We’re likely going to make the 256,000-row default a configurable option, so watch for that. Of course, we’re happy to get more suggestions on how to make things even better, so send us your ideas, and please download our new 1.1 alpha and let us know what you think.
