Greenplum Community Forum Greenplum Community Forum

Go Back   Greenplum Community Forum > Community Topics > Greenplum Database Single-Node Support

Reply
 
Thread Tools Display Modes
  #1  
Old 01-31-2010, 09:13 AM
llonergan llonergan is offline
Member
 
Join Date: Oct 2009
Posts: 46
Default GP SNE versus Postgres 8.4.2 - 22 Query Benchmark Results

In this post we'll look at a set of basic DSS type queries typical of data warehouse / data mart applications in the late 1990s. There are 22 queries in this test case, and they can be run against different sizes of input data generated by a random data set generator. This benchmark is very similar to the queries used in the TPC-H benchmark, but this test is not TPC-H.

The intention of this test is to evaluate the calculation capabilities of GP SNE and compare them to other databases. In this first post, we'll run GP SNE and Postgres 8.4.2.

About pre-aggregation, materialized views, etc and benchmarks

Business users of analytic tools require the flexibility to ask ad-hoc questions of their data. Pre-calculation of answers in advance of the questions violates the principle that a system needs to flexible in this way. There are many available solutions that pre-aggregate data, including MOLAP cube servers (MSFT SSAS, Hyperion, SAP BW, etc) and there are many others that maintain aggregates or materialized views as part of their storage model (Infobright, MonetDB, etc).

Query benchmarks are particularly susceptible to "gaming" through pre aggregation because the usage pattern is fixed a-priori. Most of the work in composing a valid benchmark for an analytic database tool involves avoiding situations where the test can be trivialized by recognizing the repeating query pattern and pre-computing results.

I believe that the best way to estimate usability of the system is to dissallow the use of pre-computed aggregates and materialized views or "projections". In this and other benchmark tests, we will focus on evaluating the computational capability of the tools without the use of precomputed running sums, materialized views or projections.

On tuning

There are always ways to tune the database to achieve better performance.

The tunings allowed in this test are those that apply to all of the queries. Tuning settings for individual queries is not allowed. We do attempt to find the appropriate global tuning parameters for each tool in the test.

For query results, the fastest of 3 runs is used. If there is more than 10% variance in results between runs it is reported, otherwise one result is listed.

The test - Postgres 8.4.2 and GP SNE

The test machine

We use the same machine for all tests:
- Dell R710 server with 2 X5550 CPUs at 2.67GHz
- 64GB of RAM
- Dell md1120 storage unit with 24 disks
- All filesystems use XFS, tuned for sequential IO

The server is tested with gpcheckperf to determine the sequential IO, CPU and memory performance before the tests.
Here are the results of gpcheckperf for reference:
Code:
 disk write avg time (sec): 160.47
 disk write tot bytes: 134842155008
 disk write tot bandwidth (MB/s): 801.37

 disk read avg time (sec): 75.81
 disk read tot bytes: 134842155008
 disk read tot bandwidth (MB/s): 1696.29

 stream tot bandwidth (MB/s): 4332.54
Installation of the database instances

Postgres 8.4.2 was installed using the 64-bit installer from:
http://www.enterprisedb.com/getfile.jsp?fileid=808

A single data directory was implemented on a RAID5 volume with 6 disks in a (5+1) configuration with a measured sequential scan performance of 425 MB/s.

Greenplum SNE was installed using the 64-bit installer from:
http://www.greenplum.com/database/server-downloads/

16 data directories were used to host 16 segments. The 16 directories were located on 4 XFS filesystems. Each XFS filesystem was built on a (5+1) RAID5 configuration. 16 segments were used to match the number of hyperthreaded cores. Basic testing showed that using all 16 hyperthreaded cores was approximately 30% faster than using 8 real cores.

The configuration file used to create the GP SNE instance is:
Code:
ARRAY_NAME="MPP Test"
MACHINE_LIST_FILE=hosts
SEG_PREFIX=lal
PORT_BASE=51000
declare -a DATA_DIRECTORY=( \
/data1/llonergan/A /data1/llonergan/B /data1/llonergan/C /data1/llonergan/D \
/data2/llonergan/A /data2/llonergan/B /data2/llonergan/C /data2/llonergan/D \
/data3/llonergan/A /data3/llonergan/B /data3/llonergan/C /data3/llonergan/D \
/data4/llonergan/A /data4/llonergan/B /data4/llonergan/C /data4/llonergan/D )
MASTER_HOSTNAME=killnz1
MASTER_DIRECTORY=/data1/llonergan
MASTER_PORT=54321
IP_ALLOW=0.0.0.0/0
TRUSTED_SHELL=ssh
CHECK_POINT_SEGMENTS=8
ENCODING=UNICODE
Data Loading

For each size of case tested, we created a new database inside
each instance and loaded the data using:
'cat file | psql -c "COPY ..."'

Results with 1GB of data

The 1GB scale factor is a good test of the computational capability of the database because it fits entirely in memory on most machines. Since there is no IO to/from the disk, theres no need to compare how effective the database engines are at managing IO, it's purely a test of how good the planning and execution of the computational work is.

Here we'll compare the databases by loading the data and computing planner statistics, then we'll run 22 queries sequentially all in a single run file.

First the load timings:
Code:
DB                      Storage         Compression     Size    Time
-----                   -----------     ------------    -----   ----
Postgres 8.4.2          Heap            None            1.3GB   67s
Greenplum SNE 3.3.4     Heap            None            1.3GB   25s
Greenplum SNE 3.3.4     Column          None            1.1GB   23s
Greenplum SNE 3.3.4     Column          zlib6           .25GB   22s
Greenplum SNE 3.3.4     Column          quicklz         .38GB   18s
We see that Greenplum SNE is able to load 3 times faster than Postgres and with quicklz and column compression, it's almost 4 times faster. With GP SNE external tables it would likely be faster than these times due to the increased parallelism.

We note that the load was dominated by CPU time as shown by watching vmstat during the loading and observing that CPU was maxed while write IO was moderate at about 20-40 MB/s, far below the 250-900 MB/s capability of the filesystems for both the Postgres instances and the Greenplum instances. Increasing IO bandwidth or reducing WAL logging overhead would not improve load times.

Also note that Greenplum SNE achieved a 4.4:1 compression for column store with zlib6.

Next are the query times - first reported as the total of all 22 queries:
Code:
DB                      Storage         Compression     Time
-----                   -----------     ------------    ----
Postgres 8.4.2          Heap            None            187.2s
Greenplum SNE 3.3.4     Heap            None            14.3s
Greenplum SNE 3.3.4     Column          None            12.7s
Greenplum SNE 3.3.4     Column          zlib6           14.2s
Greenplum SNE 3.3.4     Column          quicklz         13.7s
Note that the work_mem for Postgres was set to 32MB globally to match the setting used in Greenplum by default.

Greenplum SNE is between 13 and 14 times faster than Postgres 8.4.2 in all configurations.

Now we look at a side-by-side comparison on each query:
Code:
                        GP SNE 3.3.4
Query   PG 8.4.2        Heap            Column
-----   -----------     ------------    ----------
1       49.7            2.1             2.1
2       1.0             .5              .74
3       3.7             .3              .27
4       5.9             .3              .22
5       2.9             .9              .35
6       2.6             .2              .16
7       3.9             .69             .74
8       2.5             .57             .7
9       44.4            1.3             .9
10      5.4             .4              .65
11      .55             .4              .65
12      5.0             .3              .2
13      2.7             .5              .74
14      2.8             .4              .2
15      5.4             .6              .3
16      4.4             1.              .8
17      7.0             .65             .4
18      15.7            1.              1.1
19      4.2             .24             .2
20      4.1             .5              .45
21      12.0            1.              .59
22      1.2             .3              .2
While there are some high runtime outliers in the Postgres 8.4.2 results, it's clear that there is a systematic difference between the two sets of results.

There is only one case where Postgres was close to the Greenplum query time and that was query 11, which is a join of three tables, partsupp, supplier and nation. I have the query plans from both, but they are too large to post here - I'll make a continuation in the next thread with the EXPLAIN plans on request.

Next Up: the 100GB test
Attached Files
File Type: zip runTPCH-exp.tar.zip (61.6 KB, 18 views)

Last edited by llonergan; 01-31-2010 at 10:41 AM.
Reply With Quote
  #2  
Old 02-01-2010, 03:44 AM
llonergan llonergan is offline
Member
 
Join Date: Oct 2009
Posts: 46
Default 100GB case

Results with 100GB of data

The 100GB scale factor is 100 times the size of the 1GB test. This time we're running a case larger than the available memory (64GB), so this is a test of how much the DB helps us to run larger cases on small hardware.

Here we'll look for a couple of things:
1) How does the result scale with size? Does this test take 100 times as long? Worse? Better?
2) How does the 100GB sit on the machine? Does it fit in memory or does it require disk?

We'll run the case the same way as before, but we'll only use the column store with zlib6 compression for Greenplum, as that showed the best results previously and should fit this case in RAM.

As before, we'll compare the databases by loading the data and computing planner statistics, then we'll run 22 queries sequentially all in a single run file.

First the load timings:
Code:
DB                      Storage         Compression     Size    Time
-----                   -----------     ------------    -----   ----
Postgres 8.4.2          Heap            None            130GB   8373s
Greenplum SNE 3.3.4     Column          zlib6            24GB   2602s
As expected, the GP load time is 1/3 of the Postgres load time. Also, during loading Greenplum the machine was 70% idle with the COPY CPU at full load, indicating that loading with EXTERNAL TABLES would likely run 3 times faster still.

As before, the IO was not the bottleneck on Postgres (or GP). For Postgres, it hovered between 20 and 60MB/s writes.

Greenplum SNE achieved a 4.4:1 compression for column store with zlib6.

Next are the query times - first reported as the total of all 22 queries:
Code:
DB                      Storage         Compression     Time
-----                   -----------     ------------    ----
Postgres 8.4.2          Heap            None            348 minutes
Greenplum SNE 3.3.4     Column          zlib6           18.1 minutes
Note that the work_mem for Postgres was set to 32MB globally to match the setting used in Greenplum by default.

Greenplum SNE is 19 times faster than Postgres 8.4.2 for the 100GB case.

We can compare the times for the 1GB case to the 100GB case now:
Code:
DB                      1GB Time    100GB Time     Ratio 100GB/1GB
-----                   --------    ----------     ---------------
Postgres 8.4.2          3.12m       348m           112:1
Greenplum SNE 3.3.4     0.237m      18.1m           76:1
Conclusion #1: With Greenplum SNE, we scaled 24% better from 1GB to 100GB than 100:1 and with Postgres 8.4.2 we scaled 12% worse than 100:1.

Now we look at a side-by-side comparison on each query:
Code:
 Query   PG 8.4.2        GP SNE 3.3.4
-----   -----------     ------------
1       3357            178
2       1397            11.1
3       692             28.8
4       536             20.6
5       1360            46.9
6       195             17.1
7       567             87.7
8       1350            32.7
9       4343            123
10      533             35.4
11       77             6.1
12      377             22.6
13      391             37.8
14      231             20.3
15      404             40.6
16      601             64.6
17      820             35.0
18      1235            108
19      297             28.2
20      453             52.7
21      1529            72.9
22      117             15.0
Again we note that there is a consistent behavior of performance across the 22 queries. In the larger case, even query #11 is more than 12 times faster on Greenplum than on Postgres 8.4.2.

In order to address our 2nd question, we need to evaluate how each product behaved during the test. Was there IO during the test or did the case fit in RAM? Based on the fact that the Postgres instance required 130GB to store the data and the Greenplum instance required 24GB, we expect that it should fit in the 64GB of RAM on the server for Greenplum and not for Postgres, and that was the case.

For example, during the execution of queries in Postgres 8.4.2, a snapshot of the system taken using "vmstat 1" looked like this:
Code:
During Postgres 8.4.2 query execution:
procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0    336 168672    852 65288272    0    0 286720     3 2692  909  4  1 95  0  0
 1  0    336 174692    852 65282852    0    0 237568 34617 2523 1025  5  2 92  1  0
 1  0    336 171680    852 65282632    0    0 229376 69188 2515 1186  4  2 94  0  0
 1  0    336 173084    852 65274048    0    0 180224 75221 1918  797  4  2 93  1  0
 1  1    336 180620    852 65267824    0    0 131072 123470 1747  812  3  1 90  5  0
 1  2    336 176164    852 65278692    0    0 163840 69716 1740  796  4  1 87  8  0
 1  1    336 172848    860 65283404    0    0 221184 86220 2649  906  4  1 90  5  0
 2  1    336 173168    860 65274104    0    0 204800 70500 2197  840  4  1 87  8  0
 1  1    336 183552    860 65272748    0    0 196608 92996 2254  997  3  1 90  6  0
 1  1    336 167860    860 65287160    0    0 229376 65640 2488  853  5  2 85  8  0
 1  1    336 177976    860 65278640    0    0 252800 69436 2735  947  4  1 88  6  0
 1  1    336 183576    860 65273068    0    0 263296 57808 2766  995  5  2 86  8  0
 2  1    336 172168    860 65282740    0    0 163840 96292 1899  700  3  1 91  5  0
 1  1    336 183052    860 65273104    0    0 221184 61740 2428  905  5  1 86  8  0
 2  0    336 168264    860 65288800    0    0 229376 16424 2368  767  4  1 93  2  0
 1  0    336 174012    860 65283632    0    0 294912    11 2607  940  6  2 92  0  0
 1  0    336 190484    860 65266516    0    0 278528     0 2540  888  4  1 95  0  0
 1  0    336 178204    860 65276140    0    0 262144 40540 2476 1017  6  2 92  1  0
We see that the system is running on one core (6% CPU) while doing read IO at a max of 295MB/s and reading/writing while performing spill operations for hashjoins and/or hashaggs and/or sorting.

A similar snapshot taken during Greenplum query execution with "top" shows a very different story:
Code:
During Greenplum SNE 3.3.4 query execution:

Tasks: 333 total,  18 running, 315 sleeping,   0 stopped,   0 zombie
Cpu(s): 99.7%us,  0.3%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  65840952k total, 65587728k used,   253224k free,     1740k buffers
Swap: 68155752k total,      264k used, 68155488k free, 65189792k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
12965 llonerga  20   0  198m 8792 4240 R  100  0.0   0:28.98 postgres
12967 llonerga  20   0  198m 8848 4240 R  100  0.0   0:29.16 postgres
12969 llonerga  20   0  198m 8736 4240 R  100  0.0   0:28.78 postgres
12971 llonerga  20   0  198m 8816 4240 R  100  0.0   0:29.18 postgres
12975 llonerga  20   0  198m 8800 4240 R  100  0.0   0:28.80 postgres
12977 llonerga  20   0  198m 8832 4240 R  100  0.0   0:29.18 postgres
12979 llonerga  20   0  198m 8808 4240 R  100  0.0   0:29.14 postgres
12981 llonerga  20   0  198m 8768 4240 R  100  0.0   0:28.56 postgres
12983 llonerga  20   0  198m 8804 4240 R  100  0.0   0:29.18 postgres
12985 llonerga  20   0  198m 8792 4240 R  100  0.0   0:29.02 postgres
12987 llonerga  20   0  198m 8768 4236 R  100  0.0   0:28.60 postgres
12989 llonerga  20   0  198m 8796 4212 R  100  0.0   0:29.12 postgres
12991 llonerga  20   0  198m 8696 4212 R  100  0.0   0:28.64 postgres
12993 llonerga  20   0  198m 8736 4212 R  100  0.0   0:28.84 postgres
12995 llonerga  20   0  198m 8740 4240 R  100  0.0   0:28.84 postgres
12973 llonerga  20   0  198m 8808 4240 R   98  0.0   0:29.16 postgres
    1 root      20   0  1064  408  340 S    0  0.0   0:05.46 init
    2 root      15  -5     0    0    0 S    0  0.0   0:00.02 kthreadd
    3 root      RT  -5     0    0    0 S    0  0.0   0:00.02 migration/0
    4 root      15  -5     0    0    0 S    0  0.0   0:03.10 ksoftirqd/0
    5 root      RT  -5     0    0    0 S    0  0.0   0:00.04 migration/1
    6 root      15  -5     0    0    0 S    0  0.0   0:01.00 ksoftirqd/1
    7 root      RT  -5     0    0    0 S    0  0.0   0:00.08 migration/2
Here we see that the system is CPU bound running all 16 cores at 100%.

Conclusion #2: Greenplum fits the entire 100GB case in RAM and does not require disk IO.

Next Up: the 1000GB test

Last edited by llonergan; 02-01-2010 at 03:48 AM.
Reply With Quote
  #3  
Old 02-02-2010, 01:12 PM
llonergan llonergan is offline
Member
 
Join Date: Oct 2009
Posts: 46
Default 1000GB case (part 1)

Results with 1000GB of data

The 1000GB scale factor is 10 times the size of the 100GB test. We expect that the Greenplum column compression should result in a 4:1 compression ratio, which means that the size of data should be 250GB, which is about 4 times the size of RAM. In the 100GB test, the entire dataset fit in RAM for Greenplum, so this time we'll expect to see some disk IO.

The Postgres 8.4.2 result for 1000GB would require approximately 3 days of computing time to run so we won't compare the two results - this time we'll just run Greenplum.

We would expect that loading the 1000GB using COPY would take 8 hours, so this time we'll test the theory from our last run that we should be able to get 3x the load speed if we maxed out the CPU. We can do that using the EXTERNAL TABLE feature of Greenplum to load the data.

For the 1000GB case we'll ask three questions:
1) How much faster is loading data using EXTERNAL TABLE? Do we get 3x the speed of COPY and max out the CPU?
2) How does the result scale with size? Does this test take 10 times as long? Worse? Better?
3) What does the IO profile look like? What is the bottleneck? Is it CPU or disk speed?

We'll run the case the same way as before, but we'll only use the column store with zlib6 compression.

As before, we'll compare the databases by loading the data and computing planner statistics, then we'll run 22 queries sequentially all in a single run file.

NoteThe 1000GB scale factor has more than 2B rows, which causes the key values in the lineitem table to overflow the INTEGER datatype. We had to change the datatype of l_orderkey and o_orderkey to BIGINT to fit the case, which may influence performance of some of the queries.

First the load timings:
Code:
DB                      Storage         Compression     Size    Time
-----                   -----------     ------------    -----   ----
Postgres 8.4.2          Heap            None            1300GB (estimated)
Greenplum SNE 3.3.4     Column          zlib6           249GB   10272s
The size of the data is a 1/2% larger due to the use of BIGINT (8 bytes each) versus INTEGER for the ORDERS and LINEITEM keys.

The previous load time for 100GB was 2602 seconds, so we loaded 10 times the data in 3.9 times the time. If we compute load rates:
Code:
Load Type                      Loading Speed
----------                     -------------
COPY from stdin                138 GB/hour
EXTERNAL TABLE from gpfdist    350 GB/hour
So our load rate was 2.5 times faster with EXTERNAL TABLE than with COPY.

Technique: Loading data with EXTERNAL TABLE
The Greenplum EXTERNAL TABLE feature allows us to define network data sources as tables that we can query. This is commonly used for data loading by either using a "CREATE TABLE AS SELECT ..." or "INSERT INTO foo (SELECT ...)" where the SELECT is performed on the EXTERNAL TABLE. External tables are currently read only, but in the next release of Greenplum they will be writable, so we could do "INSERT INTO foo_external (SELECT ...)" to export data.

We expect external tables to be faster at loading data because of the parallelism "pull" model used. When the SELECT is performed, all segments will pull data from the network data sources simultaneously and autonomously, without routing data through a master process.
Here is what one external table definition looked like here:
Code:
CREATE EXTERNAL TABLE LINEITEM_ext  ( LIKE LINEITEM )
  LOCATION ('gpfdist://localhost:8080/lineitem.tbl')
  FORMAT 'TEXT' (DELIMITER '|');
Tip You can use wildcards to specify filenames for LOCATION, like "*january*.tbl'

Note that the "LOCATION" specified here is a "gpfdist" URI. gpfdist is a special web server that Greenplum provides for use with external tables - it is special because it enables parallel access of the source files by clients, in this case our SELECT statements. By using gpfdist, we allow a single file, in this case 'lineitem.tbl' to be read in chunks by many parallel readers, effectively dividing it into parallel units of work.

The process works like this during the SELECT:
- each segment launches a slice that connects to the network source (the "LOCATION")
- the slices each request data from the gpfdist server
- gpfdist provides chunks of the file it's working on, a set of rows close to 32KB for each chunk
- the slices work on reformatting the input data, check for errors, etc
- the resulting rows are sent to the next slice of the query
- request another chunk until done

The actual "load" statement looks like this:
Code:
psql -c "INSERT INTO lineitem (SELECT * FROM lineitem_ext)"
Tip CREATE TABLE AS SELECT or CTAS is the fastest way to create a new table.

Did we saturate the CPU as we had hoped? We can look at the CPU time during loading:

Code:
A snapshot of "top" During loading of the customer table:

top - 17:04:28 up 10 days, 18:37,  1 user,  load average: 24.63, 22.52, 17.10
Tasks: 342 total,  27 running, 313 sleeping,   0 stopped,   2 zombie
Cpu(s): 80.2%us,  5.5%sy,  0.0%ni, 12.8%id,  0.0%wa,  0.0%hi,  1.4%si,  0.0%st
Mem:  65840952k total, 65667016k used,   173936k free,     1020k buffers
Swap: 68155752k total,    76804k used, 68078948k free, 65210868k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1977 llonerga  20   0  197m 7700 4000 R   66  0.0   1:10.72 postgres
 1971 llonerga  20   0  197m 7864 4000 R   66  0.0   1:08.04 postgres
 1981 llonerga  20   0  197m 7780 4000 R   65  0.0   1:10.12 postgres
 1969 llonerga  20   0  197m 7692 4000 R   64  0.0   1:09.17 postgres
 1975 llonerga  20   0  197m 7784 3996 R   64  0.0   1:08.88 postgres
 1973 llonerga  20   0  197m 7796 3996 R   64  0.0   1:09.24 postgres
 1967 llonerga  20   0  261m 7856 3996 R   64  0.0   1:09.07 postgres
 1987 llonerga  20   0  197m 7800 4000 R   64  0.0   1:09.23 postgres
 1991 llonerga  20   0  197m 7884 3996 R   64  0.0   1:08.78 postgres
 1995 llonerga  20   0  197m 7768 3996 R   63  0.0   1:09.21 postgres
 1985 llonerga  20   0  197m 7700 3996 S   63  0.0   1:09.71 postgres
 1983 llonerga  20   0  197m 7768 3996 R   63  0.0   1:09.43 postgres
 1993 llonerga  20   0  197m 7836 4000 R   63  0.0   1:08.49 postgres
 1965 llonerga  20   0  197m 7816 4004 R   62  0.0   1:08.98 postgres
 1979 llonerga  20   0  197m 7828 4000 R   62  0.0   1:07.79 postgres
 1989 llonerga  20   0  197m 7700 4000 R   62  0.0   1:08.93 postgres
 2000 llonerga  20   0  196m 6172 3876 R   25  0.0   0:26.20 postgres
 2002 llonerga  20   0  196m 6164 3868 S   25  0.0   0:26.00 postgres
 2006 llonerga  20   0  196m 6168 3868 R   25  0.0   0:25.96 postgres
 2008 llonerga  20   0  196m 6136 3868 R   25  0.0   0:25.80 postgres
 2012 llonerga  20   0  196m 6128 3868 S   25  0.0   0:26.14 postgres
 2014 llonerga  20   0  196m 6164 3868 R   25  0.0   0:25.92 postgres
 2018 llonerga  20   0  196m 6160 3868 S   25  0.0   0:26.02 postgres
 2020 llonerga  20   0  196m 6164 3868 S   25  0.0   0:25.90 postgres
 1998 llonerga  20   0  196m 6160 3860 R   24  0.0   0:25.90 postgres
 2004 llonerga  20   0  196m 6172 3868 R   24  0.0   0:26.04 postgres
 2016 llonerga  20   0  196m 6116 3856 R   24  0.0   0:25.94 postgres

A snapshot of "vmstat" during loading of customer:

llonergan@killnz1:~/runTPCH> vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
25  0  67916 178888   1008 65228616    0    0   232   191    2    1  1  0 99  0  0
23  0  67920 173584   1008 65232344    0    0 139264     0 18500 41966 83  6 11  0  0
25  0  67936 187188   1008 65217040    0    0 131104   661 19414 43079 83  7 10  0  0
12  0  67936 174348   1008 65233476    0    0 139264     0 17958 38337 81  7 12  0  0
25  0  67936 171104   1004 65234284    0    0 131100     0 18710 39946 82  6 12  0  0
27  0  67936 169756   1004 65234932    0    0 139264   352 19531 42158 83  6 11  0  0
16  0  67940 175792   1012 65227912    0    0 131076    20 18698 39394 83  7 10  0  0
23  0  67940 180236   1012 65223120    0    0 139264     0 18155 38698 80  7 14  0  0
29  0  67940 180380   1012 65223544    0    0 131236   160 19035 39491 83  6 10  0  0
17  0  67940 172128   1012 65229880    0    0 90112     0 11513 23432 50  4 43  3  0
28  0  67940 180144   1012 65221868    0    0 131072     0 18524 39483 81  6 12  0  0
28  0  67940 168632   1012 65234592    0    0 131072   120 17802 37591 79  6 15  0  0
26  0  67940 170460   1012 65232604    0    0 131072     0 18011 42572 81  6 13  0  0
29  0  67940 180084   1012 65221220    0    0 139264     0 19059 40902 83  7 10  0  0
38  0  67940 187664   1012 65214072    0    0 131072     0 18110 37989 81  7 11  0  0
It looks like we're about 13% idle on the CPU and the IO is not bottlenecked at 140MB/s, so we've still got some CPU to spare. If we work a little harder we could probably soak up the extra 13% and speed up the load time. We could try using CTAS to speed things up or if all else fails, run multiple load statements.

Conclusion #1: Loading data with EXTERNAL TABLE is 2.5 times faster than using COPY. We consumed 87% CPU, so could improve load time by up to 13% more.

Last edited by llonergan; 02-02-2010 at 01:23 PM.
Reply With Quote
  #4  
Old 02-02-2010, 01:12 PM
llonergan llonergan is offline
Member
 
Join Date: Oct 2009
Posts: 46
Default 1000GB case (part 2)

Next are the query times - first reported as the total of all 22 queries:
Code:
DB                      Storage         Compression     Time
-----                   -----------     ------------    ----
Greenplum SNE 3.3.4     Column          zlib6           339 minutes
We can compare the times for the 1000GB case to the 100GB and 1GB cases now:
Code:
DB                      1GB Time    100GB Time     1000GB Time
-----                   --------    ----------     -----------
Greenplum SNE 3.3.4     0.237m      18.1m          339m
How did we scale? If we compute the query per hour (QpH) metric from this for the 22 queries, we have a QpH of 73 for the 100GB case and a QpH of 3.9 for the 1000GB case. The speed of the queries would be 10 times less for the larger case if it were linear, so we would expect 7.3 QpH in the linear case. We're running these queries approximately 50% slower for the 1000GB size than for the 100GB size.

Let's look at a side-by-side comparison on each query between 100GB times multiplied by 10 (linear ext) and the actual 1000GB query times::
Code:
Query   1000GB Time     linear ext
-----   -----------     ------------
1       1783            1780
2       134             111
3       543             288
4       310             206
5       1421            469
6       172             171
7       1951            877
8       890             327
9       1475            1230
10      429             354
11       74             61
12      301             226
13      376             378
14      192             203
15      434             406
16      5325            646
17      390             350
18      1783            1080
19      270             282
20      574             527
21      1350            729
22      150             150
Looks like query 16 is not performing so well at scale and maybe 18 and 21 should be looked at also.
Tip When results are not as expected, if you have a support contract, you can log a support ticket in our customer portal at support ticket

Since I've got a support contract (:-), I've sent the query EXPLAIN plan for q16 to be looked at.

I looked at the EXPLAIN plans for the queries that seemed to not scale well from 100GB to 1000GB and q16 was the only one with a "bad" query plan. The other query plans looked great. Something else caused the runtime to increase non-linearly on queries {3,5,7,8,18 and 21}.

Here is the theory I came up with:
- For the queries in {3,5,7,8,18 and 21} the columns involved in joins and aggs grow in size for the TPC-H generated data with an accompanying growth in range of values, increasing the chain depth of the hash tables. In order to achieve a linear scale as the size grows, the work_mem must increase proportionally with the table sizes growth.

Tip: Work_mem sets how much memory can be used by sorting, hashing and other operations in GP. It can be set globally, per user, per queue or per statement. The default is 32MB.

So if I'm right, then we should be increasing the amount of memory used by the DB as the size of data grows, or perhaps just set work_mem globally larger. We should try 10x the default, or 320MB.

Here are the results of re-running the queries using work_mem=320MB:
Code:
                                                   320MB Workmem
DB                      1GB Time    100GB Time     1000GB Time
-----                   --------    ----------     -----------
Greenplum SNE 3.3.4     0.237m      18.1m          202m
Now things look like we should expect - the runtime for the 1000GB case is 11 times longer than for the 100GB case, pretty close to a linear 10:1.

The new breakdown across queries is here:
Code:
        32MB Workmem    100GB        320MB Workmem
Query   1000GB Time     linear ext   1000GB Time
-----   -----------     ------------ -------------
1       1783            1780         1785
2       134             111          103
3       543             288          430
4       310             206          255
5       1421            469          718
6       172             171          171
7       1951            877          1339
8       890             327          350
9       1475            1230         1297
10      429             354          293
11       74             61           60
12      301             226          263
13      376             378          380
14      192             203          190
15      434             406          399
16      5325            646          80
17      390             350          366
18      1783            1080         1524
19      270             282          258
20      574             527          647
21      1350            729          1085
22      150             150          152
Most queries look to be scaling linearly with outliers {5,7,18,21}. Use of additional workmem could potentially further improve the results.

Note the "new" time for query 16 - that's the effect of the re-write posted below. Query 16 is now the second fastest query behind q11.

Also interesting is that we don't seem to be IO limited on the server, as there is no significant difference between fitting the case in RAM and having it use disk.

Conclusion #2: Query speed for the 1000GB case was 10% slower than the 100GB case with a 10x larger setting of default workmem.

Here is a snapshot of the system during query #1:
Code:
A snapshot using 'top' during execution of query 1:

top - 13:54:52 up 11 days, 15:28,  1 user,  load average: 16.01, 15.92, 12.93
Tasks: 339 total,  18 running, 321 sleeping,   0 stopped,   0 zombie
Cpu(s): 99.8%us,  0.2%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  65840952k total, 56743948k used,  9097004k free,     5712k buffers
Swap: 68155752k total,    85748k used, 68070004k free, 55062604k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4036 llonerga  20   0  272m  82m 4236 R  100  0.1  21:06.91 postgres
 4020 llonerga  20   0  272m  82m 4240 R  100  0.1  21:07.33 postgres
 4022 llonerga  20   0  271m  81m 4240 R  100  0.1  21:06.51 postgres
 4024 llonerga  20   0  272m  82m 4236 R  100  0.1  21:07.18 postgres
 4032 llonerga  20   0  272m  82m 4236 R  100  0.1  21:07.31 postgres
 4034 llonerga  20   0  272m  82m 4236 R  100  0.1  21:06.88 postgres
 4038 llonerga  20   0  272m  82m 4236 R  100  0.1  21:07.22 postgres
 4040 llonerga  20   0  271m  81m 4236 R  100  0.1  21:07.68 postgres
 4042 llonerga  20   0  272m  82m 4244 R  100  0.1  21:06.85 postgres
 4044 llonerga  20   0  273m  83m 4236 R  100  0.1  21:07.24 postgres
 4046 llonerga  20   0  271m  81m 4244 R  100  0.1  21:07.64 postgres
 4051 llonerga  20   0  270m  80m 4236 R  100  0.1  21:06.81 postgres
 4026 llonerga  20   0  271m  81m 4236 R  100  0.1  21:06.86 postgres
 4028 llonerga  20   0  270m  80m 4236 R  100  0.1  21:06.92 postgres
 4030 llonerga  20   0  272m  82m 4236 R  100  0.1  21:06.74 postgres
 4048 llonerga  20   0  273m  83m 4244 R  100  0.1  21:07.39 postgres
 4653 llonerga  20   0 17076 1472  944 R    1  0.0   0:00.04 top

A snapshot using "vmstat" during query 1:
procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
16  0  85748 5546120   5852 58503052    0    0   574   395    2    0  5  0 95  0  0
16  0  85748 5536444   5852 58511268    0    0  8192     0 4111  618 100  0  0  0  0
16  0  85748 5495372   5852 58552220    0    0 40960     0 4201  617 100  0  0  0  0
16  0  85748 5462076   5852 58585012    0    0 32768     0 4211  668 100  0  0  0  0
16  0  85748 5403548   5852 58642488    0    0 57344     0 4289  605 100  0  0  0  0
16  0  85748 5378024   5852 58667220    0    0 24576     0 4175  604 100  0  0  0  0
16  0  85748 5344108   5860 58700000    0    0 32768    20 4177  603 100  0  0  0  0
16  0  85748 5335128   5860 58708272    0    0  8192     0 4136  613 100  0  0  0  0
16  0  85748 5293044   5860 58749216    0    0 40960     0 4219  615 100  0  0  0  0
16  0  85748 5267272   5860 58773948    0    0 24576     0 4159  624 100  0  0  0  0
16  0  85748 5241496   5860 58798628    0    0 24576     0 4178  608 100  0  0  0  0
16  0  85748 5240076   5860 58798628    0    0     0     0 4088  619 100  0  0  0  0
We see that the system is CPU bound running all 16 cores at 100% and IO is moderate at about 40-50 MB/s.

Conclusion #3: Greenplum is not IO bound during execution of the 1000GB case.

Last edited by llonergan; 04-17-2010 at 03:25 PM.
Reply With Quote
  #5  
Old 02-02-2010, 09:41 PM
llonergan llonergan is offline
Member
 
Join Date: Oct 2009
Posts: 46
Default 1000GB case (part 3)

A follow-up on query 16:

The GP engineering team found that the NOT NULL constraint was not propagating correctly to allow the choice of hash join for the subquery in query 16.

With a rewritten version of query 16, like this:
Code:
select
        p_brand,
        p_type,
        p_size,
        count(distinct ps_suppkey) as supplier_cnt
from
        part,
        partsupp left join supplier on (ps_suppkey=s_suppkey and s_comment like '%Customer%Complaints%' )
where
        p_partkey = ps_partkey
        and p_brand <> 'Brand#15'
        and p_type not like 'LARGE BRUSHED%'
        and p_size in (4, 41, 17, 9, 15, 2, 25, 34)
        and s_suppkey is null
group by
        p_brand,
        p_type,
        p_size
order by
        supplier_cnt desc,
        p_brand,
        p_type,
        p_size;
The planner uses a hash join and we get a much better time!

So with the re-written query 16, we move from 5300 seconds to 98 seconds of runtime.

GP engineering is working on a fix that will make this behavior happen on the original query.
Reply With Quote
  #6  
Old 02-08-2010, 01:24 PM
Amber Amber is offline
Junior Member
 
Join Date: Oct 2009
Posts: 24
Default

Can you do the test against Infobright Community Edition 3.3.1 and IBM DB2 9.5, one for open source and the other for commercial
Reply With Quote
  #7  
Old 04-15-2010, 09:56 PM
llonergan llonergan is offline
Member
 
Join Date: Oct 2009
Posts: 46
Default

Update: The optimizer enhancement for "NOT IN" clauses that provides a hash based plan for TPC-H query 16 and queries like it is included in Greenplum 4.0.
Reply With Quote
  #8  
Old 04-22-2010, 12:31 PM
pjohnson pjohnson is offline
Junior Member
 
Join Date: Jan 2010
Location: UK
Posts: 8
Thumbs up Great article Luke!

It's great to see the GP SNE value prop laid out so clearly with real-world timings.

Running parallel DBMS operations within a single commodity SMP node puts clear blue water between GP and PG, as the results attest.

The fact that GP SNE uses the same code base as the GP MPP version, allowing 'scale out' when required, is also a very important feature. Crippleware it is not.

The pricing model isn't bad either ;-)
__________________
Paul Johnson
VLDB Solutions Ltd
Reply With Quote
Reply

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT. The time now is 11:04 PM.

Powered by: vBulletin® Copyright ©2000 - 2010, Jelsoft Enterprises Ltd.


Copyright ©2009 Greenplum All rights reserved. Phone +1-650-286-8012