Contributed by Calpont, InfiniDB Community Edition is an open source, scale-up analytics database engine for your data warehousing, business intelligence and read-intensive application needs. Enabled via MySQL™ and purpose-built for an analytical workload with column-oriented technology at its core, the multi-threaded capabilities of InfiniDB Community Edition fully encompass query, transactional support and bulk load operations. So come on in, grab a download and get started.

InfiniDB Team Blog

When should I NOT use InfiniDB?

Posted by: robin

It's great to see such good interest in InfiniDB, but now that we are seeing increasing numbers of users, we're also seeing people who are trying to use InfiniDB in places where they shouldn't. The end result is a bad experience for them and a bad impression of InfiniDB, which is a shame. InfiniDB is somewhat like MySQL Cluster: in the right use cases, you can't beat it; in the wrong use cases, it beats you.

So I thought I would quickly speak to when you should NOT use InfiniDB in hopes of helping you know where not to step with it:

  • Queries that routinely issue SELECT * or request the vast majority of a table's columns. Being column-oriented, InfiniDB works far better when a query requests only a subset of columns. Unlike in a row DB, the width of the table doesn't matter to InfiniDB; what matters is how many columns are asked for. Each column is a 'touch' (I/O) for us, whereas a SELECT * in a row DB is normally one 'touch'.
  • OLTP work. Yes, InfiniDB is fully ACID compliant and transactional, but it is not suited to OLTP systems. Singleton inserts and deletes run much slower on *any* column database because each column must be updated, whereas an insert or delete on a row DB is a single I/O. Updates, however, are OK on column DBs because they are column based.
  • Tiny databases. If you've just got a few GB, it's likely you won't be wowed by the performance difference between InfiniDB and another MySQL storage engine like MyISAM. Column databases like InfiniDB show their brawn when much larger data volumes are in play (e.g. 250 GB to TBs). So if you've got little, static databases, stick with a general MySQL storage engine.
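
To put rough numbers on the first point, here is a toy cost model in Python. It is only a sketch under simplifying assumptions, not how InfiniDB actually plans or executes queries: the function names, the 20-column table, and the fixed 8-byte column width are all made up for illustration.

```python
def row_store_cost(num_rows, column_widths, requested):
    """Toy model: a row store reads whole rows, so one touch covers every column."""
    touches = 1
    bytes_read = num_rows * sum(column_widths.values())
    return touches, bytes_read


def column_store_cost(num_rows, column_widths, requested):
    """Toy model: a column store reads only the requested columns, one touch each."""
    touches = len(requested)
    bytes_read = num_rows * sum(column_widths[name] for name in requested)
    return touches, bytes_read


# Hypothetical table: 20 columns of 8 bytes each, one million rows.
widths = {f"c{i}": 8 for i in range(20)}
rows = 1_000_000

narrow = ["c0", "c1"]        # a typical analytic query touching two columns
everything = list(widths)    # the equivalent of SELECT *

for label, cols in [("2 columns", narrow), ("SELECT *", everything)]:
    print(label,
          "row store:", row_store_cost(rows, widths, cols),
          "column store:", column_store_cost(rows, widths, cols))
```

In this model the two-column query reads a tenth of the bytes on the column store, while SELECT * reads the same number of bytes on both but costs the column store 20 touches instead of 1.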

So the obvious next question is - when SHOULD I use InfiniDB? I'll keep it short and sweet:

  • Applications with queries that use only a subset of columns in a table
  • Data warehouses/marts that have a lot of ad-hoc query activity; in other words, systems that are difficult to consistently index because requests change all the time
  • Applications that front big databases - hundreds of GB's to TB's
  • Databases that are primarily updated via load jobs vs. singleton inserts and deletes

Hopefully the above will serve as a guide to when to use, and when not to use, a column DB like InfiniDB. We'd love your feedback on how you're using InfiniDB, so please visit our forums and let us know.


InfiniDB Now Available on Windows

Posted by: robin

Without a doubt, it was the statistic that surprised me the most in the surveys I did at MySQL. In short, MySQL is huge on the Windows platform. In terms of downloads, no other platform comes close to Windows, even when you total up all the Linux variants. And when I specifically asked on our surveys what production O/S platform is used for MySQL, Windows was #2 for MySQL Enterprise customers, with RHEL being number one, and (very surprising!) #1 for Community users. Moreover, when asked what other databases people were moving away from, the Microsoft databases – Access and SQL Server – were the top ones.

These facts, of course, give Microsoft great pause.

As someone who has been a SQL Server DBA for a lot of years, I never thought I’d find a database as easy to use on Windows as SQL Server, but MySQL qualifies. And with the upcoming MySQL Workbench, which combines modeling, administration, and a SQL editor/utility all under the same roof, the story gets even better for Windows users.

These are just a few of the reasons I’m very happy to announce that we now have our first port of InfiniDB for Windows ready for you to try. It’s alpha at the moment, and we still have some tuning to do for performance, but you can now download and try InfiniDB on your 32- or 64-bit Windows machine. Of course, for really heavy lifting (i.e. large data) you’ll want to go 64-bit; 32-bit is fine for getting a feel for InfiniDB on Windows.

You can find the Windows binaries/installer at: http://www.infinidb.org/downloads/cat_view/40-binary-release (InfiniDB Versions 1.1 and higher all provide an installer for Windows). There's also a quick FAQ on installing and using InfiniDB on Windows on our FAQ board. Let us know what you think. And thanks again for your use and support of InfiniDB.


InfiniDB Ignite talk presentation now available

Posted by: robin

Thanks to Brian Aker for including me in Wednesday night's Ignite presentation lineup at the MySQL User's Conference. My slides are now available for viewing and download.


MySQL Conference Presentation Available for Download

Posted by: robin

Just a quick note to say the talk I gave today at the MySQL User's Conference - The Thinking Person's Guide to Data Warehouse Design - is now available for viewing and download on Slideshare.

Update: Sheeri was kind enough to have this session recorded and posted the video at: http://www.youtube.com/watch?v=G_iaJ8TFwy8.


Meet the InfiniDB Team at the MySQL User Conference

Posted by: robin

Just a quick note that the InfiniDB team is at the MySQL show, so make sure you stop by our booth in the Exhibition Hall to say hi. Tonight is the Expo Hall reception, so grab some food and drink, and come on by to see a pretty cool demo we've got with lots of data that will show you what InfiniDB is capable of.  Hope to see you there!


Quick Note on InfiniDB and NoSQL

Posted by: robin

Well, now I have to wade into the NoSQL discussions that have been going on recently, because we’ve been getting some questions on the differences between InfiniDB and the various NoSQL offerings. I’ve posted a FAQ addition to our site that you can read to get a handle on the distinctions between InfiniDB and NoSQL products. It’s a quick read, I promise.

One thing to keep in mind is that InfiniDB is architected in a very modular fashion, and we are basically a MapReduce style implementation under the covers. We follow the divide and conquer strategy and parallelize work across all participating nodes, with built-in failover between the nodes doing the I/O work (what we call Performance Modules). All parallelized work is then pushed back up to the user layer (called User Modules), which takes the ‘map’ work done by the Performance Modules and ‘reduces’ (aggregates) it into the result set that’s returned to the user. All done with a MySQL front end.
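
For readers who think better in code, the flow described above can be sketched roughly as follows. This is a conceptual illustration only, not InfiniDB code: the function names, the thread pool standing in for separate nodes, and the sample (region, amount) rows are all assumptions, and failover is left out entirely.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def performance_module(partition):
    """'Map' step: compute a partial SUM(amount) GROUP BY region for one partition."""
    partial = Counter()
    for region, amount in partition:
        partial[region] += amount
    return partial


def user_module(partials):
    """'Reduce' step: merge the partial aggregates into the final result set."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds values for matching keys
    return dict(total)


def run_query(partitions):
    """Fan the work out in parallel, then aggregate at the user layer."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(performance_module, partitions))
    return user_module(partials)


# Hypothetical data, split across three "nodes".
partitions = [
    [("east", 10), ("west", 5)],
    [("east", 7), ("north", 3)],
    [("west", 1)],
]
print(run_query(partitions))
```

Each partition is aggregated independently, so adding nodes spreads the scan work, and only the small partial results travel up to the aggregating layer.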

We have a new poll running on infinidb.org that asks if you’d like us to offer a NoSQL option to what we already do. We’d appreciate your vote either way…


MySQL Migration Guide to InfiniDB Now Available

Posted by: robin

I finally got some free time to write up this new tech paper on how to migrate the parts of your MySQL databases that make sense to InfiniDB.  For those of you already familiar with InfiniDB, you can skip past the front 'why migrate' sections and go right to the "Migration Strategies" part that covers some general procedures and then goes on to give examples of various migration approaches.  And no, there is no registration on the site needed to get this paper.

If you find any errors in the doc or have other suggestions for additional migration methods, please let me know.


The Data Scientist

Posted by: robin

Here's a very interesting article for those of us who work with big data: Data, data everywhere in the Economist.  One interesting quote predicts the emergence of a new database professional: the data scientist: "Chief information officers (CIOs) have become somewhat more prominent in the executive suite, and a new kind of professional has emerged, the data scientist, who combines the skills of software programmer, statistician and storyteller/artist to extract the nuggets of gold hidden under mountains of data."

This part also caught my eye: "Data are [sic] becoming the new raw material of business: an economic input almost on a par with capital and labour. “Every day I wake up and ask, ‘how can I flow data better, manage data better, analyse data better?’” says Rollin Ford, the CIO of Wal-Mart." That's the puzzle for all of us, eh?


Response to Community Feedback

Posted by: robin

In our first alpha of InfiniDB 1.1, we’ve responded to a number of feedback items from our community, and we’d like to thank those of you who have been working with us on these and other things. Specifically, two recurring comments from the community, with respect to our 1.0 version, were: (1) CREATE TABLE statements for wide tables sometimes take a long time to complete; (2) The amount of space used by empty and/or small tables is excessive.

First, a little background as to why points one and two above were occurring, and then I’ll show you how we’ve responded to your feedback. InfiniDB is designed to work with lots of data and so many of our default options are set for ‘big’ tables. We’ve also tried to optimize everything we can for fast performance. One of these optimizations has been to preallocate contiguous space on disk for tables during CREATE TABLE time because it’s normally much faster to access data for a table whose underlying storage is not fragmented on disk. As a table gets larger, we also do everything we can to grow a table contiguously. Our default has been to initially allocate space for 8 million rows in every table that is created, with the actual underlying storage usage being based on the width of a table and the datatypes used for each column. The end result has been that very wide and empty tables can take up a lot of space on disk and the DDL time to create them can be longer than times to create MyISAM or InnoDB objects.

With our 1.1.0 alpha, we now have enhancements included that (1) More efficiently utilize space on disk for smaller tables, and even some ‘large’ tables; (2) Create tables much faster than in 1.0; (3) Retain our overall fast load and ‘data readiness’ rates. These enhancements are now available in our 1.1.0 alpha, and will also be backported to our next 1.0 Community maintenance release.

In a nutshell, what we do now is preallocate space during a CREATE TABLE execution that yields enough contiguous space for 256,000 rows (instead of 8 million) with any extents needed afterwards being contiguously allocated at our old extent sizes of 8 million rows.  
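
As a back-of-the-envelope illustration of the new scheme, the Python sketch below models how many rows of space are reserved for a table of a given size. The two constants come from this post; everything else is an assumption made for illustration, in particular that growth past the first 256,000 rows happens in whole 8-million-row extents, and the hypothetical 16-column table of 8-byte columns.

```python
INITIAL_ROWS = 256_000     # rows preallocated at CREATE TABLE time (new scheme)
EXTENT_ROWS = 8_000_000    # rows per extent allocated afterwards (old extent size)


def rows_allocated(rows_stored):
    """Rows of space reserved for a table holding rows_stored rows (assumed model)."""
    if rows_stored <= INITIAL_ROWS:
        return INITIAL_ROWS
    # Assumption: once the initial allocation is full, space grows in whole extents.
    overflow = rows_stored - INITIAL_ROWS
    extra_extents = -(-overflow // EXTENT_ROWS)  # ceiling division
    return INITIAL_ROWS + extra_extents * EXTENT_ROWS


def empty_table_bytes(column_widths, preallocated_rows):
    """Bytes reserved for an empty table: preallocated rows times total row width."""
    return preallocated_rows * sum(column_widths)


# Hypothetical wide table: 16 columns, 8 bytes each.
widths = [8] * 16
old_bytes = empty_table_bytes(widths, EXTENT_ROWS)
new_bytes = empty_table_bytes(widths, INITIAL_ROWS)
print(old_bytes, new_bytes, new_bytes / old_bytes)
```

Whatever the column widths, the empty-table ratio in this model is 256,000 / 8,000,000 = 0.032, which is the size of saving you would expect to see for freshly created tables.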

As an example of the improvements you’ll see, consider the below tests of a TPCH 1 and 10 database. First, in terms of storage space used, the new enhancements result in nice disk savings for both the TPCH 1 and 10 database:

Disk Usage for Create and Load of TPCH 1 Followed By TPCH 10

Step                    Old KB        Old GB    New KB        New GB    Factor New/Old    % Saved
create tpch1 tables      6,561,828     6.26        206,632     0.20     0.03              96.9%
import tpch1             6,824,228     6.51      2,431,196     2.32     0.36              64.4%
create tpch10 tables    11,638,672    11.10      2,582,512     2.46     0.22              77.8%
import tpch10           22,192,652    21.16     15,225,764    14.52     0.69              31.4%
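
For anyone who wants to check the arithmetic, the Factor New/Old and % Saved columns follow directly from the Old KB and New KB figures. The short Python sketch below recomputes them; the function and variable names are just illustrative.

```python
def factor_and_saved(old_kb, new_kb):
    """Return (new/old factor, percent saved), rounded the way the table reports them."""
    factor = new_kb / old_kb
    return round(factor, 2), round((1 - factor) * 100, 1)


# (Old KB, New KB) pairs copied from the disk usage figures above.
disk_usage_kb = {
    "create tpch1 tables": (6_561_828, 206_632),
    "import tpch1": (6_824_228, 2_431_196),
    "create tpch10 tables": (11_638_672, 2_582_512),
    "import tpch10": (22_192_652, 15_225_764),
}

for step, (old_kb, new_kb) in disk_usage_kb.items():
    factor, saved = factor_and_saved(old_kb, new_kb)
    print(f"{step}: factor {factor:.2f}, saved {saved:.1f}%")
```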

Time to create empty tables is much faster, and the overall time to create and load tables is very good too:

Seconds to Create and Load TPCH 1 Followed By TPCH 10

Step                               Old (s)    New (s)    Factor New/Old    % Saved
create tpch1 tables alone          114.38      10.66     0.09              90.7%
create and import tpch1 tables     144.38      77.66     0.54              46.2%
create tpch10 tables alone         113.19      11.55     0.10              89.8%
create and import tpch10 tables    518.19     451.55     0.87              12.8%
Total                              662.57     529.21     0.80              20.1%

Finally, query times appear to be the same for the old and new methods, so things look good there too.

So again, thanks to the community for the feedback and we hope you like our new storage paradigm. We’re likely going to make the 256K row default a configurable option, so be watching for that. Of course, we’re happy to get more suggestions on how to make things even better, so shoot us your ideas today and please download our new 1.1 alpha and let us know what you think.


MySQL User Conference

I'll be presenting "The Thinking Person's Guide to Data Warehouse Design" at the upcoming MySQL User conference. While a lot of people think that bad SQL code is the #1 wrecking ball of data warehouses and marts, the fact is that poor database design is the first cause of both downtime and bad performance. In my presentation, I'll do my best to show how up-front work in a data model pays off and how to take that model into a MySQL physical design, with topics like scale-up/out designs, storage engine decisions, partitioning schemes, indexing issues, and much more being discussed. I'll then wrap up with tips on monitoring and tuning of the design.

Hope to see you there!

