Greenplum Community Forum

Community Topics > Greenplum Database Single-Node Support

#1
01-29-2010, 01:27 PM
llonergan
Member
Join Date: Oct 2009
Posts: 46
FAA Benchmark Results using GP SNE and Column Store

There is an FAA benchmark published on a MySQL performance blog here:
http://www.mysqlperformanceblog.com/...t-and-monetdb/

I just ran this benchmark on a single two-CPU server and on my laptop. We are the fastest in most queries (9 of 13) when compared to the fastest column-store databases (MonetDB and Infobright), and in the 4 cases where we weren't, we were very close. This is actually somewhat surprising, given that our executor is not particularly optimized and we don't implement any of the fancy column aggregate optimizations that the others do, and for which this benchmark is a perfect use case.

We also load the data 13 times faster: 15 minutes compared to 3.5 hours, even when including the time to decompress the input data. If you read the blog, you'll also see how difficult it was to load the data there, whereas in Greenplum I just used COPY, unzipping the files into STDIN, with an error table to catch the extra 114 rows of XML in each CSV file.
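A minimal sketch of that load step, for readers who want to try it. It only builds the shell pipeline that pipes `unzip -p` into `COPY ... FROM STDIN` through `psql`, using Greenplum's `LOG ERRORS INTO ... SEGMENT REJECT LIMIT` clause for the error table. The table name `ontime`, the error table `ontime_err`, the database `faa`, the archive name, the `CSV HEADER` options and the reject limit are all assumptions for illustration and are not taken from the attached scripts.

```python
import shlex


def build_load_command(zip_path, table, error_table,
                       reject_limit=1000, database="faa"):
    """Return a shell pipeline that streams one zip archive into COPY.

    All names and the reject limit are hypothetical placeholders.
    """
    copy_sql = (
        f"COPY {table} FROM STDIN WITH CSV HEADER "
        f"LOG ERRORS INTO {error_table} "
        f"SEGMENT REJECT LIMIT {reject_limit} ROWS"
    )
    # unzip -p writes the archive contents to stdout; psql reads them on stdin.
    return "unzip -p {} | psql -d {} -c {}".format(
        shlex.quote(zip_path), shlex.quote(database), shlex.quote(copy_sql)
    )


# Print the pipeline for one hypothetical archive instead of running it.
print(build_load_command("On_Time_1988_1.zip", "ontime", "ontime_err"))
```

Rows that fail to parse, such as stray XML lines, would land in the error table rather than aborting the load, as long as their count stays under the reject limit.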

The result below was obtained on the server, using QuickLZ compression with column store in 3.4-EAP1. All of the data fit in RAM, so disk speed was not a factor (no IO during the test). The setup was 16 segments on a dual 4-core Intel 2.6 GHz Nehalem server with hyperthreading enabled:

With zlib6, load times were about the same and queries were a bit (about 5%) slower, but compression is nearly 300% better: the whole DB fits inside 2.7GB, compared to 3.9GB for the compressed input data. The overall compression ratio is 14:1.
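As a quick check of what these figures imply, here is a small worked calculation. It assumes that the 14:1 ratio means raw (uncompressed) data size divided by stored size, which the post does not state explicitly; the derived raw size and zip ratio are therefore estimates, not measured values.

```python
# Figures quoted in the post.
ZLIB_DB_GB = 2.7      # size of the zlib6 column-store database
INPUT_ZIP_GB = 3.9    # size of the zipped input files
RATIO = 14.0          # stated overall compression ratio (assumed raw/stored)


def implied_raw_size_gb(stored_gb, ratio):
    """Raw data size implied by a stored size and a raw/stored ratio."""
    return stored_gb * ratio


raw_gb = implied_raw_size_gb(ZLIB_DB_GB, RATIO)
zip_ratio = raw_gb / INPUT_ZIP_GB

print(f"implied raw size: {raw_gb:.1f} GB")
print(f"implied ratio of the zipped input: {zip_ratio:.1f}:1")
```

Under that assumption, the raw data would be roughly 38GB, and the zlib6 column store would be more compact than the zipped input files.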

I've attached the benchmark scripts, which get the data, load it and run the queries.

- Luke
Attached Files
File Type: zip FAA-benchmark.zip (58.9 KB, 9 views)

Last edited by llonergan; 01-31-2010 at 09:25 PM.

#2
01-29-2010, 02:01 PM
gabriele@2ndQuadrant.com
Junior Member
Join Date: Nov 2009
Location: Prato, Tuscany, Italy
Posts: 28
Very good

Hi Luke,

Thank you for sharing the scripts with us. At first glance, the results are excellent, especially in terms of data loading and compression.

I am taking some time now to try the scripts myself and to read the blog article you cited more carefully.

Cheers,
Gabriele

#3
01-30-2010, 03:23 PM
Amber
Junior Member
Join Date: Oct 2009
Posts: 24

Hi Luke,
What is 3.4-EAP1? Version 3.4 of Greenplum DB?

#4
01-30-2010, 03:26 PM
llonergan
Member
Join Date: Oct 2009
Posts: 46

Hi Amber,

Yes - it's the "Early Access Program" or Beta version of 3.4/4.0 - it's what I had on my laptop and server (oops).

With respect to these results, it should behave the same, since the column store, compression and execution features used are identical.

- Luke

#5
02-01-2010, 01:22 PM
gabriele@2ndQuadrant.com
Junior Member
Join Date: Nov 2009
Location: Prato, Tuscany, Italy
Posts: 28
Tests on my Mac

Hi Luke,

As promised, I looked at your benchmark scripts during the weekend and tested everything on my Mac, first with QuickLZ and then with zlib (compression level 6).

The more than 250 zipped data files (about 3.8 GB) occupy about 8.5GB on my 2-segment installation in the QuickLZ scenario. Creating the table with ZLIB6 compression, I can confirm I get 2.5GB, equally distributed across the two segments. Data loading took 45 minutes in the first case and 1 hour in the second, not counting ANALYSE.

Here is a report of the timings on the QuickLZ database:

Code:
Q1	32,53
Q2	27,56
Q3	21,8
Q4	3,11
Q5	7,4
Q6	23,84
Q7	104,76
Q8a	4,04
Q8b	4,14
Q8c	7,29
Q8d	12,55
Q8e	45,08
Q9	47,61
And here on the ZLIB6 database:

Code:
Q1	30,85
Q2	29,51
Q3	22,86
Q4	3,17
Q5	7,83
Q6	26,19
Q7	90,64
Q8a	4,39
Q8b	4,28
Q8c	7,84
Q8d	12,7
Q8e	48,95
Q9	48,11
Q1 and Q7 perform better on the ZLIB6 database; the others are slower, by between 1% and 10% (Q6). Of course this is a report of just a single run, but it is enough to confirm the general idea you outlined (about a 70% reduction in disk usage, at the cost of slightly slower queries).
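The per-query differences can be recomputed from the two reports above. The sketch below copies the timings as posted (decimal commas included) and derives the percentage change from QuickLZ to ZLIB6, plus the disk usage reduction from the 8.5GB and 2.5GB figures. The helper names are illustrative only.

```python
QUICKLZ_REPORT = """
Q1 32,53
Q2 27,56
Q3 21,8
Q4 3,11
Q5 7,4
Q6 23,84
Q7 104,76
Q8a 4,04
Q8b 4,14
Q8c 7,29
Q8d 12,55
Q8e 45,08
Q9 47,61
"""

ZLIB6_REPORT = """
Q1 30,85
Q2 29,51
Q3 22,86
Q4 3,17
Q5 7,83
Q6 26,19
Q7 90,64
Q8a 4,39
Q8b 4,28
Q8c 7,84
Q8d 12,7
Q8e 48,95
Q9 48,11
"""


def parse(report):
    """Parse 'name value' lines with decimal commas into {name: float}."""
    timings = {}
    for line in report.strip().splitlines():
        name, value = line.split()
        timings[name] = float(value.replace(",", "."))
    return timings


def percent_change(before, after):
    """Percentage change per query; positive means slower in 'after'."""
    return {q: (after[q] - before[q]) / before[q] * 100.0 for q in before}


changes = percent_change(parse(QUICKLZ_REPORT), parse(ZLIB6_REPORT))
faster = [q for q, c in changes.items() if c < 0]
slower = {q: c for q, c in changes.items() if c > 0}
disk_reduction = (8.5 - 2.5) / 8.5 * 100.0

for query, change in changes.items():
    print(f"{query}\t{change:+.1f}%")
print("faster on ZLIB6:", faster)
print(f"disk usage reduction: {disk_reduction:.0f}%")
```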

I'd be interested in having some results for a standard heap storage of the data. I will try and do some experiments in this direction later.

Thanks,
Gabriele

#6
02-01-2010, 03:53 PM
llonergan
Member
Join Date: Oct 2009
Posts: 46

Hi Gabriele!

Note that ANALYZE is automatic by default, so the load time should include it and you should not have to run it.

This is controlled by the variable "gp_autostats_mode", which should default to "ON_NO_STATS": if a table with no pre-existing statistics receives an INSERT or COPY, an automatic ANALYZE should be triggered.
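The rule described here can be written down as a toy decision function. This is only a model of the behaviour as stated in this post, not Greenplum code; the function name and arguments are hypothetical, and other values of "gp_autostats_mode" are deliberately left out.

```python
def triggers_auto_analyze(mode, statement, table_has_stats):
    """Toy model of ON_NO_STATS as described in the post.

    Returns True when the statement would trigger an automatic ANALYZE.
    Only the ON_NO_STATS mode is modeled in this sketch.
    """
    if mode.upper() != "ON_NO_STATS":
        raise NotImplementedError("only ON_NO_STATS is modeled in this sketch")
    return statement.upper() in {"INSERT", "COPY"} and not table_has_stats


# First COPY into a freshly created table: no statistics exist yet.
print(triggers_auto_analyze("ON_NO_STATS", "COPY", table_has_stats=False))
# A later COPY into the same table: statistics already exist.
print(triggers_auto_analyze("ON_NO_STATS", "COPY", table_has_stats=True))
```

Read this way, a load made of many COPY statements into one table would only trigger the automatic ANALYZE on the first of them.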

- Luke

All times are GMT.