#1
There is an FAA benchmark published on a MySQL performance blog here:
http://www.mysqlperformanceblog.com/...t-and-monetdb/

I just ran this benchmark on a two-CPU single server and on my laptop. We have the fastest result in most (9/13) queries when compared to the fastest column store databases (MonetDB and Infobright), and in the 4 cases where we weren't the fastest we were very close.

This is actually somewhat surprising, given that our executor is not particularly optimized and we don't implement any of the fancy column aggregate optimizations that the others do and that this benchmark is a perfect use case for.

Also, we load the data 13 times faster: 15 minutes compared to 3.5 hours, even when including the time to decompress the input data. If you read the blog, you'll also see how difficult it was to load the data there, whereas in Greenplum I just used COPY and unzipped the data into STDIN, with an error table to catch the extra 114 rows of XML in each csv file.

The server result below used quicklz compression with column-store in 3.4-EAP1. All of the IO fit in RAM, so disk speed was not a factor (no IO during the test). 16 segments on a dual 4-core Intel 2.6 GHz Nehalem server with hyperthreading enabled:

[Image: server query and load times]

Query and load times with zlib6 were about the same, with queries a bit (5%) slower, but compression is nearly 300% better, with the whole DB fitting inside 2.7GB, compared to the compressed input data of 3.9GB. The overall compression ratio is 14:1.

I've attached the benchmark scripts in a tarball; they get the data, load it and run the queries.

- Luke

Last edited by llonergan; 01-31-2010 at 09:25 PM.
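For anyone who wants to picture the load step described above (unzip each file into COPY on STDIN, with an error table catching the XML rows), here is a minimal Python sketch. It is not taken from the attached tarball. The table name `ontime`, the error table `ontime_err`, the reject limit of 1000, the `HEADER` option and the database name are all assumptions for illustration, and the `LOG ERRORS INTO ... SEGMENT REJECT LIMIT` clause should be checked against the COPY documentation for your Greenplum version.

```python
import subprocess


def build_copy_sql(table: str, error_table: str, reject_limit: int) -> str:
    """Build a COPY ... FROM STDIN statement that logs bad rows to an error table.

    The names and the reject limit are hypothetical; adjust them to your schema.
    """
    return (
        f"COPY {table} FROM STDIN WITH CSV HEADER "
        f"LOG ERRORS INTO {error_table} "
        f"SEGMENT REJECT LIMIT {reject_limit} ROWS"
    )


def load_zip(zip_path: str, dbname: str, table: str = "ontime",
             error_table: str = "ontime_err", reject_limit: int = 1000) -> int:
    """Stream one zipped CSV into COPY without writing the unzipped file to disk."""
    sql = build_copy_sql(table, error_table, reject_limit)
    # `unzip -p` writes the archive contents to stdout.
    unzip = subprocess.Popen(["unzip", "-p", zip_path], stdout=subprocess.PIPE)
    # psql runs the COPY and reads the rows from its stdin.
    psql = subprocess.Popen(["psql", "-d", dbname, "-c", sql], stdin=unzip.stdout)
    unzip.stdout.close()  # let unzip see a broken pipe if psql exits early
    psql.wait()
    unzip.wait()
    return psql.returncode
```

Calling `load_zip("some_month.zip", "faa")` once per downloaded archive would load everything; both the file name and the database name here are made up.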
#2
Hi Luke,
thank you for sharing the scripts with us. At first glance, the results are excellent, especially in terms of data loading and compression. I am now taking some time to try the scripts myself and to read more carefully the blog article you cited in your post.

Cheers, Gabriele
#3
Hi Luke,
What is 3.4-EAP1? Version 3.4 of Greenplum DB?
#4
Hi Amber,
Yes - it's the "Early Access Program" or Beta version of 3.4/4.0 - it's what I had on my laptop and server (oops). With respect to these results, it should be the same, as the column store, compression and execution features used are identical.

- Luke
#5
Hi Luke,
as promised, I looked at your benchmark scripts over the weekend and tested everything on my Mac, first with QuickLZ and then with zlib (compression level 6).

The more than 250 zipped data files (about 3.8 GB) occupy about 8.5GB on my 2-segment installation in the QuickLZ scenario. Creating the table with ZLIB6 compression, I can confirm I get 2.5GB, equally distributed across the two segments. Data loading took 45 minutes in the first case and 1 hour in the second, not counting ANALYSE.

Here is a report of the timings on the QuickLZ database:

Code:
Q1 32,53
Q2 27,56
Q3 21,8
Q4 3,11
Q5 7,4
Q6 23,84
Q7 104,76
Q8a 4,04
Q8b 4,14
Q8c 7,29
Q8d 12,55
Q8e 45,08
Q9 47,61

Code:
Q1 30,85
Q2 29,51
Q3 22,86
Q4 3,17
Q5 7,83
Q6 26,19
Q7 90,64
Q8a 4,39
Q8b 4,28
Q8c 7,84
Q8d 12,7
Q8e 48,95
Q9 48,11

I'd be interested in seeing some results for standard heap storage of the data. I will try some experiments in this direction later.

Thanks, Gabriele
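To compare the two timing listings above query by query, a small Python sketch like the following may help. It only parses the numbers exactly as posted (with decimal commas) and makes no assumption about their unit. The names `FIRST` and `SECOND` are just labels for the first and second "Code:" listing, since the second listing carries no label of its own in the post.

```python
def parse_timings(listing: str) -> dict[str, float]:
    """Parse 'Q1 32,53 Q2 27,56 ...' into {'Q1': 32.53, 'Q2': 27.56, ...}."""
    tokens = listing.split()
    # Tokens alternate between query name and value; values use a decimal comma.
    return {
        name: float(value.replace(",", "."))
        for name, value in zip(tokens[0::2], tokens[1::2])
    }


# Values copied verbatim from the two listings in the post.
FIRST = ("Q1 32,53 Q2 27,56 Q3 21,8 Q4 3,11 Q5 7,4 Q6 23,84 Q7 104,76 "
         "Q8a 4,04 Q8b 4,14 Q8c 7,29 Q8d 12,55 Q8e 45,08 Q9 47,61")
SECOND = ("Q1 30,85 Q2 29,51 Q3 22,86 Q4 3,17 Q5 7,83 Q6 26,19 Q7 90,64 "
          "Q8a 4,39 Q8b 4,28 Q8c 7,84 Q8d 12,7 Q8e 48,95 Q9 48,11")

first = parse_timings(FIRST)
second = parse_timings(SECOND)

total_first = sum(first.values())
total_second = sum(second.values())
# Queries whose time went up from the first listing to the second.
slower_in_second = [q for q in first if second[q] > first[q]]

for q in first:
    print(f"{q:4s} {first[q]:8.2f} {second[q]:8.2f} {second[q] / first[q]:6.2f}")
print(f"total {total_first:.2f} vs {total_second:.2f}")
```

The last column printed is the ratio of the second listing to the first, so values above 1.00 mark queries that took longer in the second listing.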
#6
Hi Gabriele!
Note that ANALYZE is automatic by default, so the load time should already include it and you should not have to run it yourself. This is controlled by the variable "gp_autostats_mode", which should default to "ON_NO_STATS": if a table receives an INSERT or COPY and has no pre-existing statistics, an automatic ANALYZE is triggered.

- Luke
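The ON_NO_STATS rule described above can be written down as a tiny Python model. This is only a sketch of the rule as stated in this post, not Greenplum's actual implementation; the function name and its arguments are made up for illustration, and other values of gp_autostats_mode are deliberately left unmodeled.

```python
def triggers_auto_analyze(mode: str, statement: str, has_stats: bool) -> bool:
    """Return True if the statement would trigger an automatic ANALYZE.

    Models only the ON_NO_STATS behaviour as described in the post:
    an INSERT or COPY into a table with no pre-existing statistics.
    """
    if mode.upper() != "ON_NO_STATS":
        # Other modes are outside the scope of this sketch.
        raise NotImplementedError(f"mode {mode!r} is not modeled here")
    return statement.upper() in {"INSERT", "COPY"} and not has_stats


# A first COPY into a freshly created table: no statistics yet.
print(triggers_auto_analyze("ON_NO_STATS", "COPY", has_stats=False))
# A later COPY into the same table: statistics already exist.
print(triggers_auto_analyze("ON_NO_STATS", "COPY", has_stats=True))
```

Under this reading, only the first load into an empty table pays for the ANALYZE; later loads into a table that already has statistics do not trigger it again.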