![]() |
|
#1
|
|||
|
|||
|
In this post we'll look at a set of basic DSS type queries typical of data warehouse / data mart applications in the late 1990s. There are 22 queries in this test case, and they can be run against different sizes of input data generated by a random data set generator. This benchmark is very similar to the queries used in the TPC-H benchmark, but this test is not TPC-H.
The intention of this test is to evaluate the calculation capabilities of GP SNE and compare them to other databases. In this first post, we'll run GP SNE and Postgres 8.4.2. About pre-aggregation, materialized views, etc and benchmarks Business users of analytic tools require the flexibility to ask ad-hoc questions of their data. Pre-calculation of answers in advance of the questions violates the principle that a system needs to flexible in this way. There are many available solutions that pre-aggregate data, including MOLAP cube servers (MSFT SSAS, Hyperion, SAP BW, etc) and there are many others that maintain aggregates or materialized views as part of their storage model (Infobright, MonetDB, etc). Query benchmarks are particularly susceptible to "gaming" through pre aggregation because the usage pattern is fixed a-priori. Most of the work in composing a valid benchmark for an analytic database tool involves avoiding situations where the test can be trivialized by recognizing the repeating query pattern and pre-computing results. I believe that the best way to estimate usability of the system is to dissallow the use of pre-computed aggregates and materialized views or "projections". In this and other benchmark tests, we will focus on evaluating the computational capability of the tools without the use of precomputed running sums, materialized views or projections. On tuning There are always ways to tune the database to achieve better performance. The tunings allowed in this test are those that apply to all of the queries. Tuning settings for individual queries is not allowed. We do attempt to find the appropriate global tuning parameters for each tool in the test. For query results, the fastest of 3 runs is used. If there is more than 10% variance in results between runs it is reported, otherwise one result is listed. The test - Postgres 8.4.2 and GP SNE The test machine We use the same machine for all tests: - Dell R710 server with 2 X5550 CPUs at 2.67GHz - 64GB of RAM - Dell md1120 storage unit with 24 disks - All filesystems use XFS, tuned for sequential IO The server is tested with gpcheckperf to determine the sequential IO, CPU and memory performance before the tests. Here are the results of gpcheckperf for reference: Code:
disk write avg time (sec): 160.47 disk write tot bytes: 134842155008 disk write tot bandwidth (MB/s): 801.37 disk read avg time (sec): 75.81 disk read tot bytes: 134842155008 disk read tot bandwidth (MB/s): 1696.29 stream tot bandwidth (MB/s): 4332.54 Postgres 8.4.2 was installed using the 64-bit installer from: http://www.enterprisedb.com/getfile.jsp?fileid=808 A single data directory was implemented on a RAID5 volume with 6 disks in a (5+1) configuration with a measured sequential scan performance of 425 MB/s. Greenplum SNE was installed using the 64-bit installer from: http://www.greenplum.com/database/server-downloads/ 16 data directories were used to host 16 segments. The 16 directories were located on 4 XFS filesystems. Each XFS filesystem was built on a (5+1) RAID5 configuration. 16 segments were used to match the number of hyperthreaded cores. Basic testing showed that using all 16 hyperthreaded cores was approximately 30% faster than using 8 real cores. The configuration file used to create the GP SNE instance is: Code:
ARRAY_NAME="MPP Test" MACHINE_LIST_FILE=hosts SEG_PREFIX=lal PORT_BASE=51000 declare -a DATA_DIRECTORY=( \ /data1/llonergan/A /data1/llonergan/B /data1/llonergan/C /data1/llonergan/D \ /data2/llonergan/A /data2/llonergan/B /data2/llonergan/C /data2/llonergan/D \ /data3/llonergan/A /data3/llonergan/B /data3/llonergan/C /data3/llonergan/D \ /data4/llonergan/A /data4/llonergan/B /data4/llonergan/C /data4/llonergan/D ) MASTER_HOSTNAME=killnz1 MASTER_DIRECTORY=/data1/llonergan MASTER_PORT=54321 IP_ALLOW=0.0.0.0/0 TRUSTED_SHELL=ssh CHECK_POINT_SEGMENTS=8 ENCODING=UNICODE For each size of case tested, we created a new database inside each instance and loaded the data using: 'cat file | psql -c "COPY ..."' Results with 1GB of data The 1GB scale factor is a good test of the computational capability of the database because it fits entirely in memory on most machines. Since there is no IO to/from the disk, theres no need to compare how effective the database engines are at managing IO, it's purely a test of how good the planning and execution of the computational work is. Here we'll compare the databases by loading the data and computing planner statistics, then we'll run 22 queries sequentially all in a single run file. First the load timings: Code:
DB Storage Compression Size Time ----- ----------- ------------ ----- ---- Postgres 8.4.2 Heap None 1.3GB 67s Greenplum SNE 3.3.4 Heap None 1.3GB 25s Greenplum SNE 3.3.4 Column None 1.1GB 23s Greenplum SNE 3.3.4 Column zlib6 .25GB 22s Greenplum SNE 3.3.4 Column quicklz .38GB 18s We note that the load was dominated by CPU time as shown by watching vmstat during the loading and observing that CPU was maxed while write IO was moderate at about 20-40 MB/s, far below the 250-900 MB/s capability of the filesystems for both the Postgres instances and the Greenplum instances. Increasing IO bandwidth or reducing WAL logging overhead would not improve load times. Also note that Greenplum SNE achieved a 4.4:1 compression for column store with zlib6. Next are the query times - first reported as the total of all 22 queries: Code:
DB Storage Compression Time ----- ----------- ------------ ---- Postgres 8.4.2 Heap None 187.2s Greenplum SNE 3.3.4 Heap None 14.3s Greenplum SNE 3.3.4 Column None 12.7s Greenplum SNE 3.3.4 Column zlib6 14.2s Greenplum SNE 3.3.4 Column quicklz 13.7s Greenplum SNE is between 13 and 14 times faster than Postgres 8.4.2 in all configurations. Now we look at a side-by-side comparison on each query: Code:
GP SNE 3.3.4 Query PG 8.4.2 Heap Column ----- ----------- ------------ ---------- 1 49.7 2.1 2.1 2 1.0 .5 .74 3 3.7 .3 .27 4 5.9 .3 .22 5 2.9 .9 .35 6 2.6 .2 .16 7 3.9 .69 .74 8 2.5 .57 .7 9 44.4 1.3 .9 10 5.4 .4 .65 11 .55 .4 .65 12 5.0 .3 .2 13 2.7 .5 .74 14 2.8 .4 .2 15 5.4 .6 .3 16 4.4 1. .8 17 7.0 .65 .4 18 15.7 1. 1.1 19 4.2 .24 .2 20 4.1 .5 .45 21 12.0 1. .59 22 1.2 .3 .2 There is only one case where Postgres was close to the Greenplum query time and that was query 11, which is a join of three tables, partsupp, supplier and nation. I have the query plans from both, but they are too large to post here - I'll make a continuation in the next thread with the EXPLAIN plans on request. Next Up: the 100GB test Last edited by llonergan; 01-31-2010 at 10:41 AM. |
|
#2
|
|||
|
|||
|
Results with 100GB of data
The 100GB scale factor is 100 times the size of the 1GB test. This time we're running a case larger than the available memory (64GB), so this is a test of how much the DB helps us to run larger cases on small hardware. Here we'll look for a couple of things: 1) How does the result scale with size? Does this test take 100 times as long? Worse? Better? 2) How does the 100GB sit on the machine? Does it fit in memory or does it require disk? We'll run the case the same way as before, but we'll only use the column store with zlib6 compression for Greenplum, as that showed the best results previously and should fit this case in RAM. As before, we'll compare the databases by loading the data and computing planner statistics, then we'll run 22 queries sequentially all in a single run file. First the load timings: Code:
DB Storage Compression Size Time ----- ----------- ------------ ----- ---- Postgres 8.4.2 Heap None 130GB 8373s Greenplum SNE 3.3.4 Column zlib6 24GB 2602s As before, the IO was not the bottleneck on Postgres (or GP). For Postgres, it hovered between 20 and 60MB/s writes. Greenplum SNE achieved a 4.4:1 compression for column store with zlib6. Next are the query times - first reported as the total of all 22 queries: Code:
DB Storage Compression Time ----- ----------- ------------ ---- Postgres 8.4.2 Heap None 348 minutes Greenplum SNE 3.3.4 Column zlib6 18.1 minutes Greenplum SNE is 19 times faster than Postgres 8.4.2 for the 100GB case. We can compare the times for the 1GB case to the 100GB case now: Code:
DB 1GB Time 100GB Time Ratio 100GB/1GB ----- -------- ---------- --------------- Postgres 8.4.2 3.12m 348m 112:1 Greenplum SNE 3.3.4 0.237m 18.1m 76:1 Now we look at a side-by-side comparison on each query: Code:
Query PG 8.4.2 GP SNE 3.3.4 ----- ----------- ------------ 1 3357 178 2 1397 11.1 3 692 28.8 4 536 20.6 5 1360 46.9 6 195 17.1 7 567 87.7 8 1350 32.7 9 4343 123 10 533 35.4 11 77 6.1 12 377 22.6 13 391 37.8 14 231 20.3 15 404 40.6 16 601 64.6 17 820 35.0 18 1235 108 19 297 28.2 20 453 52.7 21 1529 72.9 22 117 15.0 In order to address our 2nd question, we need to evaluate how each product behaved during the test. Was there IO during the test or did the case fit in RAM? Based on the fact that the Postgres instance required 130GB to store the data and the Greenplum instance required 24GB, we expect that it should fit in the 64GB of RAM on the server for Greenplum and not for Postgres, and that was the case. For example, during the execution of queries in Postgres 8.4.2, a snapshot of the system taken using "vmstat 1" looked like this: Code:
During Postgres 8.4.2 query execution: procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------ r b swpd free buff cache si so bi bo in cs us sy id wa st 2 0 336 168672 852 65288272 0 0 286720 3 2692 909 4 1 95 0 0 1 0 336 174692 852 65282852 0 0 237568 34617 2523 1025 5 2 92 1 0 1 0 336 171680 852 65282632 0 0 229376 69188 2515 1186 4 2 94 0 0 1 0 336 173084 852 65274048 0 0 180224 75221 1918 797 4 2 93 1 0 1 1 336 180620 852 65267824 0 0 131072 123470 1747 812 3 1 90 5 0 1 2 336 176164 852 65278692 0 0 163840 69716 1740 796 4 1 87 8 0 1 1 336 172848 860 65283404 0 0 221184 86220 2649 906 4 1 90 5 0 2 1 336 173168 860 65274104 0 0 204800 70500 2197 840 4 1 87 8 0 1 1 336 183552 860 65272748 0 0 196608 92996 2254 997 3 1 90 6 0 1 1 336 167860 860 65287160 0 0 229376 65640 2488 853 5 2 85 8 0 1 1 336 177976 860 65278640 0 0 252800 69436 2735 947 4 1 88 6 0 1 1 336 183576 860 65273068 0 0 263296 57808 2766 995 5 2 86 8 0 2 1 336 172168 860 65282740 0 0 163840 96292 1899 700 3 1 91 5 0 1 1 336 183052 860 65273104 0 0 221184 61740 2428 905 5 1 86 8 0 2 0 336 168264 860 65288800 0 0 229376 16424 2368 767 4 1 93 2 0 1 0 336 174012 860 65283632 0 0 294912 11 2607 940 6 2 92 0 0 1 0 336 190484 860 65266516 0 0 278528 0 2540 888 4 1 95 0 0 1 0 336 178204 860 65276140 0 0 262144 40540 2476 1017 6 2 92 1 0 A similar snapshot taken during Greenplum query execution with "top" shows a very different story: Code:
During Greenplum SNE 3.3.4 query execution:
Tasks: 333 total, 18 running, 315 sleeping, 0 stopped, 0 zombie
Cpu(s): 99.7%us, 0.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 65840952k total, 65587728k used, 253224k free, 1740k buffers
Swap: 68155752k total, 264k used, 68155488k free, 65189792k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12965 llonerga 20 0 198m 8792 4240 R 100 0.0 0:28.98 postgres
12967 llonerga 20 0 198m 8848 4240 R 100 0.0 0:29.16 postgres
12969 llonerga 20 0 198m 8736 4240 R 100 0.0 0:28.78 postgres
12971 llonerga 20 0 198m 8816 4240 R 100 0.0 0:29.18 postgres
12975 llonerga 20 0 198m 8800 4240 R 100 0.0 0:28.80 postgres
12977 llonerga 20 0 198m 8832 4240 R 100 0.0 0:29.18 postgres
12979 llonerga 20 0 198m 8808 4240 R 100 0.0 0:29.14 postgres
12981 llonerga 20 0 198m 8768 4240 R 100 0.0 0:28.56 postgres
12983 llonerga 20 0 198m 8804 4240 R 100 0.0 0:29.18 postgres
12985 llonerga 20 0 198m 8792 4240 R 100 0.0 0:29.02 postgres
12987 llonerga 20 0 198m 8768 4236 R 100 0.0 0:28.60 postgres
12989 llonerga 20 0 198m 8796 4212 R 100 0.0 0:29.12 postgres
12991 llonerga 20 0 198m 8696 4212 R 100 0.0 0:28.64 postgres
12993 llonerga 20 0 198m 8736 4212 R 100 0.0 0:28.84 postgres
12995 llonerga 20 0 198m 8740 4240 R 100 0.0 0:28.84 postgres
12973 llonerga 20 0 198m 8808 4240 R 98 0.0 0:29.16 postgres
1 root 20 0 1064 408 340 S 0 0.0 0:05.46 init
2 root 15 -5 0 0 0 S 0 0.0 0:00.02 kthreadd
3 root RT -5 0 0 0 S 0 0.0 0:00.02 migration/0
4 root 15 -5 0 0 0 S 0 0.0 0:03.10 ksoftirqd/0
5 root RT -5 0 0 0 S 0 0.0 0:00.04 migration/1
6 root 15 -5 0 0 0 S 0 0.0 0:01.00 ksoftirqd/1
7 root RT -5 0 0 0 S 0 0.0 0:00.08 migration/2
Conclusion #2: Greenplum fits the entire 100GB case in RAM and does not require disk IO. Next Up: the 1000GB test Last edited by llonergan; 02-01-2010 at 03:48 AM. |
|
#3
|
|||
|
|||
|
Results with 1000GB of data
The 1000GB scale factor is 10 times the size of the 100GB test. We expect that the Greenplum column compression should result in a 4:1 compression ratio, which means that the size of data should be 250GB, which is about 4 times the size of RAM. In the 100GB test, the entire dataset fit in RAM for Greenplum, so this time we'll expect to see some disk IO. The Postgres 8.4.2 result for 1000GB would require approximately 3 days of computing time to run so we won't compare the two results - this time we'll just run Greenplum. We would expect that loading the 1000GB using COPY would take 8 hours, so this time we'll test the theory from our last run that we should be able to get 3x the load speed if we maxed out the CPU. We can do that using the EXTERNAL TABLE feature of Greenplum to load the data. For the 1000GB case we'll ask three questions: 1) How much faster is loading data using EXTERNAL TABLE? Do we get 3x the speed of COPY and max out the CPU? 2) How does the result scale with size? Does this test take 10 times as long? Worse? Better? 3) What does the IO profile look like? What is the bottleneck? Is it CPU or disk speed? We'll run the case the same way as before, but we'll only use the column store with zlib6 compression. As before, we'll compare the databases by loading the data and computing planner statistics, then we'll run 22 queries sequentially all in a single run file. NoteThe 1000GB scale factor has more than 2B rows, which causes the key values in the lineitem table to overflow the INTEGER datatype. We had to change the datatype of l_orderkey and o_orderkey to BIGINT to fit the case, which may influence performance of some of the queries. First the load timings: Code:
DB Storage Compression Size Time ----- ----------- ------------ ----- ---- Postgres 8.4.2 Heap None 1300GB (estimated) Greenplum SNE 3.3.4 Column zlib6 249GB 10272s The previous load time for 100GB was 2602 seconds, so we loaded 10 times the data in 3.9 times the time. If we compute load rates: Code:
Load Type Loading Speed ---------- ------------- COPY from stdin 138 GB/hour EXTERNAL TABLE from gpfdist 350 GB/hour Technique: Loading data with EXTERNAL TABLE The Greenplum EXTERNAL TABLE feature allows us to define network data sources as tables that we can query. This is commonly used for data loading by either using a "CREATE TABLE AS SELECT ..." or "INSERT INTO foo (SELECT ...)" where the SELECT is performed on the EXTERNAL TABLE. External tables are currently read only, but in the next release of Greenplum they will be writable, so we could do "INSERT INTO foo_external (SELECT ...)" to export data. We expect external tables to be faster at loading data because of the parallelism "pull" model used. When the SELECT is performed, all segments will pull data from the network data sources simultaneously and autonomously, without routing data through a master process. Here is what one external table definition looked like here: Code:
CREATE EXTERNAL TABLE LINEITEM_ext ( LIKE LINEITEM )
LOCATION ('gpfdist://localhost:8080/lineitem.tbl')
FORMAT 'TEXT' (DELIMITER '|');
Note that the "LOCATION" specified here is a "gpfdist" URI. gpfdist is a special web server that Greenplum provides for use with external tables - it is special because it enables parallel access of the source files by clients, in this case our SELECT statements. By using gpfdist, we allow a single file, in this case 'lineitem.tbl' to be read in chunks by many parallel readers, effectively dividing it into parallel units of work. The process works like this during the SELECT: - each segment launches a slice that connects to the network source (the "LOCATION") - the slices each request data from the gpfdist server - gpfdist provides chunks of the file it's working on, a set of rows close to 32KB for each chunk - the slices work on reformatting the input data, check for errors, etc - the resulting rows are sent to the next slice of the query - request another chunk until done The actual "load" statement looks like this: Code:
psql -c "INSERT INTO lineitem (SELECT * FROM lineitem_ext)" Did we saturate the CPU as we had hoped? We can look at the CPU time during loading: Code:
A snapshot of "top" During loading of the customer table: top - 17:04:28 up 10 days, 18:37, 1 user, load average: 24.63, 22.52, 17.10 Tasks: 342 total, 27 running, 313 sleeping, 0 stopped, 2 zombie Cpu(s): 80.2%us, 5.5%sy, 0.0%ni, 12.8%id, 0.0%wa, 0.0%hi, 1.4%si, 0.0%st Mem: 65840952k total, 65667016k used, 173936k free, 1020k buffers Swap: 68155752k total, 76804k used, 68078948k free, 65210868k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 1977 llonerga 20 0 197m 7700 4000 R 66 0.0 1:10.72 postgres 1971 llonerga 20 0 197m 7864 4000 R 66 0.0 1:08.04 postgres 1981 llonerga 20 0 197m 7780 4000 R 65 0.0 1:10.12 postgres 1969 llonerga 20 0 197m 7692 4000 R 64 0.0 1:09.17 postgres 1975 llonerga 20 0 197m 7784 3996 R 64 0.0 1:08.88 postgres 1973 llonerga 20 0 197m 7796 3996 R 64 0.0 1:09.24 postgres 1967 llonerga 20 0 261m 7856 3996 R 64 0.0 1:09.07 postgres 1987 llonerga 20 0 197m 7800 4000 R 64 0.0 1:09.23 postgres 1991 llonerga 20 0 197m 7884 3996 R 64 0.0 1:08.78 postgres 1995 llonerga 20 0 197m 7768 3996 R 63 0.0 1:09.21 postgres 1985 llonerga 20 0 197m 7700 3996 S 63 0.0 1:09.71 postgres 1983 llonerga 20 0 197m 7768 3996 R 63 0.0 1:09.43 postgres 1993 llonerga 20 0 197m 7836 4000 R 63 0.0 1:08.49 postgres 1965 llonerga 20 0 197m 7816 4004 R 62 0.0 1:08.98 postgres 1979 llonerga 20 0 197m 7828 4000 R 62 0.0 1:07.79 postgres 1989 llonerga 20 0 197m 7700 4000 R 62 0.0 1:08.93 postgres 2000 llonerga 20 0 196m 6172 3876 R 25 0.0 0:26.20 postgres 2002 llonerga 20 0 196m 6164 3868 S 25 0.0 0:26.00 postgres 2006 llonerga 20 0 196m 6168 3868 R 25 0.0 0:25.96 postgres 2008 llonerga 20 0 196m 6136 3868 R 25 0.0 0:25.80 postgres 2012 llonerga 20 0 196m 6128 3868 S 25 0.0 0:26.14 postgres 2014 llonerga 20 0 196m 6164 3868 R 25 0.0 0:25.92 postgres 2018 llonerga 20 0 196m 6160 3868 S 25 0.0 0:26.02 postgres 2020 llonerga 20 0 196m 6164 3868 S 25 0.0 0:25.90 postgres 1998 llonerga 20 0 196m 6160 3860 R 24 0.0 0:25.90 postgres 2004 llonerga 20 0 196m 6172 3868 R 24 0.0 0:26.04 postgres 2016 llonerga 20 0 196m 6116 3856 R 24 0.0 0:25.94 postgres A snapshot of "vmstat" during loading of customer: llonergan@killnz1:~/runTPCH> vmstat 1 procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------ r b swpd free buff cache si so bi bo in cs us sy id wa st 25 0 67916 178888 1008 65228616 0 0 232 191 2 1 1 0 99 0 0 23 0 67920 173584 1008 65232344 0 0 139264 0 18500 41966 83 6 11 0 0 25 0 67936 187188 1008 65217040 0 0 131104 661 19414 43079 83 7 10 0 0 12 0 67936 174348 1008 65233476 0 0 139264 0 17958 38337 81 7 12 0 0 25 0 67936 171104 1004 65234284 0 0 131100 0 18710 39946 82 6 12 0 0 27 0 67936 169756 1004 65234932 0 0 139264 352 19531 42158 83 6 11 0 0 16 0 67940 175792 1012 65227912 0 0 131076 20 18698 39394 83 7 10 0 0 23 0 67940 180236 1012 65223120 0 0 139264 0 18155 38698 80 7 14 0 0 29 0 67940 180380 1012 65223544 0 0 131236 160 19035 39491 83 6 10 0 0 17 0 67940 172128 1012 65229880 0 0 90112 0 11513 23432 50 4 43 3 0 28 0 67940 180144 1012 65221868 0 0 131072 0 18524 39483 81 6 12 0 0 28 0 67940 168632 1012 65234592 0 0 131072 120 17802 37591 79 6 15 0 0 26 0 67940 170460 1012 65232604 0 0 131072 0 18011 42572 81 6 13 0 0 29 0 67940 180084 1012 65221220 0 0 139264 0 19059 40902 83 7 10 0 0 38 0 67940 187664 1012 65214072 0 0 131072 0 18110 37989 81 7 11 0 0 Conclusion #1: Loading data with EXTERNAL TABLE is 2.5 times faster than using COPY. We consumed 87% CPU, so could improve load time by up to 13% more. Last edited by llonergan; 02-02-2010 at 01:23 PM. |
|
#4
|
|||
|
|||
|
Next are the query times - first reported as the total of all 22 queries:
Code:
DB Storage Compression Time ----- ----------- ------------ ---- Greenplum SNE 3.3.4 Column zlib6 339 minutes Code:
DB 1GB Time 100GB Time 1000GB Time ----- -------- ---------- ----------- Greenplum SNE 3.3.4 0.237m 18.1m 339m Let's look at a side-by-side comparison on each query between 100GB times multiplied by 10 (linear ext) and the actual 1000GB query times:: Code:
Query 1000GB Time linear ext ----- ----------- ------------ 1 1783 1780 2 134 111 3 543 288 4 310 206 5 1421 469 6 172 171 7 1951 877 8 890 327 9 1475 1230 10 429 354 11 74 61 12 301 226 13 376 378 14 192 203 15 434 406 16 5325 646 17 390 350 18 1783 1080 19 270 282 20 574 527 21 1350 729 22 150 150 Tip When results are not as expected, if you have a support contract, you can log a support ticket in our customer portal at support ticket Since I've got a support contract (:-), I've sent the query EXPLAIN plan for q16 to be looked at. I looked at the EXPLAIN plans for the queries that seemed to not scale well from 100GB to 1000GB and q16 was the only one with a "bad" query plan. The other query plans looked great. Something else caused the runtime to increase non-linearly on queries {3,5,7,8,18 and 21}. Here is the theory I came up with: - For the queries in {3,5,7,8,18 and 21} the columns involved in joins and aggs grow in size for the TPC-H generated data with an accompanying growth in range of values, increasing the chain depth of the hash tables. In order to achieve a linear scale as the size grows, the work_mem must increase proportionally with the table sizes growth. Tip: Work_mem sets how much memory can be used by sorting, hashing and other operations in GP. It can be set globally, per user, per queue or per statement. The default is 32MB. So if I'm right, then we should be increasing the amount of memory used by the DB as the size of data grows, or perhaps just set work_mem globally larger. We should try 10x the default, or 320MB. Here are the results of re-running the queries using work_mem=320MB: Code:
320MB Workmem DB 1GB Time 100GB Time 1000GB Time ----- -------- ---------- ----------- Greenplum SNE 3.3.4 0.237m 18.1m 202m The new breakdown across queries is here: Code:
32MB Workmem 100GB 320MB Workmem Query 1000GB Time linear ext 1000GB Time ----- ----------- ------------ ------------- 1 1783 1780 1785 2 134 111 103 3 543 288 430 4 310 206 255 5 1421 469 718 6 172 171 171 7 1951 877 1339 8 890 327 350 9 1475 1230 1297 10 429 354 293 11 74 61 60 12 301 226 263 13 376 378 380 14 192 203 190 15 434 406 399 16 5325 646 80 17 390 350 366 18 1783 1080 1524 19 270 282 258 20 574 527 647 21 1350 729 1085 22 150 150 152 Note the "new" time for query 16 - that's the effect of the re-write posted below. Query 16 is now the second fastest query behind q11. Also interesting is that we don't seem to be IO limited on the server, as there is no significant difference between fitting the case in RAM and having it use disk. Conclusion #2: Query speed for the 1000GB case was 10% slower than the 100GB case with a 10x larger setting of default workmem. Here is a snapshot of the system during query #1: Code:
A snapshot using 'top' during execution of query 1: top - 13:54:52 up 11 days, 15:28, 1 user, load average: 16.01, 15.92, 12.93 Tasks: 339 total, 18 running, 321 sleeping, 0 stopped, 0 zombie Cpu(s): 99.8%us, 0.2%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 65840952k total, 56743948k used, 9097004k free, 5712k buffers Swap: 68155752k total, 85748k used, 68070004k free, 55062604k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 4036 llonerga 20 0 272m 82m 4236 R 100 0.1 21:06.91 postgres 4020 llonerga 20 0 272m 82m 4240 R 100 0.1 21:07.33 postgres 4022 llonerga 20 0 271m 81m 4240 R 100 0.1 21:06.51 postgres 4024 llonerga 20 0 272m 82m 4236 R 100 0.1 21:07.18 postgres 4032 llonerga 20 0 272m 82m 4236 R 100 0.1 21:07.31 postgres 4034 llonerga 20 0 272m 82m 4236 R 100 0.1 21:06.88 postgres 4038 llonerga 20 0 272m 82m 4236 R 100 0.1 21:07.22 postgres 4040 llonerga 20 0 271m 81m 4236 R 100 0.1 21:07.68 postgres 4042 llonerga 20 0 272m 82m 4244 R 100 0.1 21:06.85 postgres 4044 llonerga 20 0 273m 83m 4236 R 100 0.1 21:07.24 postgres 4046 llonerga 20 0 271m 81m 4244 R 100 0.1 21:07.64 postgres 4051 llonerga 20 0 270m 80m 4236 R 100 0.1 21:06.81 postgres 4026 llonerga 20 0 271m 81m 4236 R 100 0.1 21:06.86 postgres 4028 llonerga 20 0 270m 80m 4236 R 100 0.1 21:06.92 postgres 4030 llonerga 20 0 272m 82m 4236 R 100 0.1 21:06.74 postgres 4048 llonerga 20 0 273m 83m 4244 R 100 0.1 21:07.39 postgres 4653 llonerga 20 0 17076 1472 944 R 1 0.0 0:00.04 top A snapshot using "vmstat" during query 1: procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------ r b swpd free buff cache si so bi bo in cs us sy id wa st 16 0 85748 5546120 5852 58503052 0 0 574 395 2 0 5 0 95 0 0 16 0 85748 5536444 5852 58511268 0 0 8192 0 4111 618 100 0 0 0 0 16 0 85748 5495372 5852 58552220 0 0 40960 0 4201 617 100 0 0 0 0 16 0 85748 5462076 5852 58585012 0 0 32768 0 4211 668 100 0 0 0 0 16 0 85748 5403548 5852 58642488 0 0 57344 0 4289 605 100 0 0 0 0 16 0 85748 5378024 5852 58667220 0 0 24576 0 4175 604 100 0 0 0 0 16 0 85748 5344108 5860 58700000 0 0 32768 20 4177 603 100 0 0 0 0 16 0 85748 5335128 5860 58708272 0 0 8192 0 4136 613 100 0 0 0 0 16 0 85748 5293044 5860 58749216 0 0 40960 0 4219 615 100 0 0 0 0 16 0 85748 5267272 5860 58773948 0 0 24576 0 4159 624 100 0 0 0 0 16 0 85748 5241496 5860 58798628 0 0 24576 0 4178 608 100 0 0 0 0 16 0 85748 5240076 5860 58798628 0 0 0 0 4088 619 100 0 0 0 0 Conclusion #3: Greenplum is not IO bound during execution of the 1000GB case. Last edited by llonergan; 04-17-2010 at 03:25 PM. |
|
#5
|
|||
|
|||
|
A follow-up on query 16:
The GP engineering team found that the NOT NULL constraint was not propagating correctly to allow the choice of hash join for the subquery in query 16. With a rewritten version of query 16, like this: Code:
select
p_brand,
p_type,
p_size,
count(distinct ps_suppkey) as supplier_cnt
from
part,
partsupp left join supplier on (ps_suppkey=s_suppkey and s_comment like '%Customer%Complaints%' )
where
p_partkey = ps_partkey
and p_brand <> 'Brand#15'
and p_type not like 'LARGE BRUSHED%'
and p_size in (4, 41, 17, 9, 15, 2, 25, 34)
and s_suppkey is null
group by
p_brand,
p_type,
p_size
order by
supplier_cnt desc,
p_brand,
p_type,
p_size;
So with the re-written query 16, we move from 5300 seconds to 98 seconds of runtime. GP engineering is working on a fix that will make this behavior happen on the original query. |
|
#6
|
|||
|
|||
|
Can you do the test against Infobright Community Edition 3.3.1 and IBM DB2 9.5, one for open source and the other for commercial
|
|
#7
|
|||
|
|||
|
Update: The optimizer enhancement for "NOT IN" clauses that provides a hash based plan for TPC-H query 16 and queries like it is included in Greenplum 4.0.
|
|
#8
|
|||
|
|||
|
It's great to see the GP SNE value prop laid out so clearly with real-world timings.
Running parallel DBMS operations within a single commodity SMP node puts clear blue water between GP and PG, as the results attest. The fact that GP SNE uses the same code base as the GP MPP version, allowing 'scale out' when required, is also a very important feature. Crippleware it is not. The pricing model isn't bad either ;-) |
![]() |
| Thread Tools | |
| Display Modes | |
|
|