CC20 Artifact - Automatic Fusion

Main Authors: Hanfeng Chen, Alexander Krolik, Bettina Kemme, Clark Verbrugge, Laurie Hendren
Format: Dataset (eJournal)
Language: English
Published: 2020
Subjects:
IR
Online Access: https://zenodo.org/record/3608383
Contents:
1. Getting started

The title of our paper submitted to CC20 is "Improving Database Query Performance with Automatic Fusion". This repository demonstrates the reproducibility of the experiments in that paper; it provides the scripts and the original data used in the experiments. There are two main systems involved: HorsePower and the RDBMS MonetDB. We give step-by-step instructions for configuring and deploying both systems.

On this page you will find:

- how to run the experiments (Section 2); and
- the results used in the paper (Section 3).

2. Experiments

All experiments were run on a server called sable-intel, equipped with:

- Ubuntu 16.04.6 LTS (64-bit)
- 4 x Intel Xeon E7-4850 @ 2.00 GHz (40 cores, 80 threads in total)
- 128 GB RAM

Docker setup

Download the docker image cc20-docker.tar (about 13 GB) and load it:

```
docker load < cc20-docker.tar
```

Generate a named container (then exit):

```
docker run --hostname sableintel -it --name=container-cc20 wukefe/cc20-docker
exit
```

Then you can run the container:

```
docker start -ai container-cc20
```

Open a new terminal to access the container (optional):

```
docker exec -it container-cc20 /bin/bash
```

Introduction to MonetDB

Work directory for MonetDB: /home/hanfeng/cc20/monetdb

Start MonetDB (using all available threads):

```
./run.sh start
```

Log in to MonetDB using its client tool, mclient:

```
mclient -d tpch1
## ... MonetDB version v11.33.3 (Apr2019)
sql> SELECT 'Hello world';
+-------------+
| L2          |
+=============+
| Hello world |
+-------------+
1 tuple
```

Show the list of tables in the current database:

```
sql> \d
TABLE  sys.customer
TABLE  sys.lineitem
TABLE  sys.nation
TABLE  sys.orders
TABLE  sys.part
TABLE  sys.partsupp
TABLE  sys.region
TABLE  sys.supplier
```

Leave the session:

```
sql> \q
```

Stop MonetDB before continuing with the experiments:

```
./run.sh stop
```

Reference: how to install MonetDB, and an introduction to its server and client programs.

Run MonetDB with TPC-H queries

MonetDB, server mode: invoke MonetDB with a specific number of threads (e.g. 1):

```
mserver5 --set embedded_py=true --dbpath=/home/hanfeng/datafarm/2019/tpch1 --set monet_vault_key=/home/hanfeng/datafarm/2019/tpch1/.vaultkey --set gdk_nr_threads=1
```

Open a new terminal:

```
docker exec -it container-cc20 /bin/bash
cd cc20/monetdb
```

Note: type \q to exit the server mode.
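If you prefer to script this sanity check instead of typing it into mclient, the short Python sketch below does the same thing against the server started above. It is not part of the artifact: it assumes the pymonetdb client is installed (`pip install pymonetdb`), that the server is running, and that the default MonetDB credentials and port (monetdb/monetdb, 50000) are in effect.

```python
# Optional sanity check, not part of the artifact.
# Assumes `pip install pymonetdb`, a running mserver5, and default credentials/port.
import pymonetdb

conn = pymonetdb.connect(username="monetdb", password="monetdb",
                         hostname="localhost", port=50000, database="tpch1")
cur = conn.cursor()

# Same query as in the mclient session above.
cur.execute("SELECT 'Hello world'")
print(cur.fetchall())                      # [('Hello world',)]

# Rough equivalent of \d: list the user tables loaded for TPC-H.
cur.execute("SELECT name FROM sys.tables WHERE system = FALSE ORDER BY name")
print([row[0] for row in cur.fetchall()])  # customer, lineitem, nation, ...

conn.close()
```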
Run with a specific number of threads (two terminals required)

For each thread count N in 1, 2, 4, 8, 16, 32, and 64, start the server with N threads in one terminal and run the test script in the other, saving the output to the matching log file:

```
## terminal 1
mserver5 --set embedded_py=true --dbpath=/home/hanfeng/datafarm/2019/tpch1 --set monet_vault_key=/home/hanfeng/datafarm/2019/tpch1/.vaultkey --set gdk_nr_threads=N

## terminal 2
(time ./runtest | mclient -d tpch1) &> "log/log_thread_N.log"
```

For example, the 1-thread run uses --set gdk_nr_threads=1 and writes to log/log_thread_1.log; the 2-, 4-, 8-, 16-, 32-, and 64-thread runs change only the thread count and the log file name.

Post data processing - MonetDB

Fetch the average execution time (ms):

```
grep -A 3 avg_query log/log_thread_1.log | python cut.py
699.834133333   // q1
85.9178666667   // q4
65.0172         // q6
101.730666667   // q12
58.212          // q14
60.1138666667   // q16
248.926466667   // q19
77.6482         // q22
```

Repeat for the remaining log files:

```
grep -A 3 avg_query log/log_thread_2.log  | python cut.py
grep -A 3 avg_query log/log_thread_4.log  | python cut.py
grep -A 3 avg_query log/log_thread_8.log  | python cut.py
grep -A 3 avg_query log/log_thread_16.log | python cut.py
grep -A 3 avg_query log/log_thread_32.log | python cut.py
grep -A 3 avg_query log/log_thread_64.log | python cut.py
```

Note: the numbers above can be copied into an Excel file for further analysis before plotting figures. Details can be found in Section 3. A stand-in sketch of this kind of filter is shown right below.
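The cut.py script used above ships inside the image; we do not reproduce it here. If you want to adapt the post-processing, the following stand-in (saved, say, as cut_standin.py; the name is only for illustration) shows the general idea. It is not the original script, and it makes two assumptions about the log format: each `avg_query` block printed by `grep -A 3` contains exactly one floating-point average, and the queries appear in the order q1, q4, q6, q12, q14, q16, q19, q22.

```python
# Stand-in sketch for a cut.py-style filter (not the original script).
# Reads the output of `grep -A 3 avg_query <log>` from stdin and prints the
# first float found after each avg_query marker, labelled with an assumed
# query order.
import re
import sys

QUERIES = ["q1", "q4", "q6", "q12", "q14", "q16", "q19", "q22"]  # assumed order

values = []
expecting = False
for line in sys.stdin:
    if "avg_query" in line:
        expecting = True            # the average should follow within this block
        continue
    if expecting:
        match = re.search(r"\d+(\.\d+)?", line)
        if match:
            values.append(float(match.group()))
            expecting = False

for name, value in zip(QUERIES, values):
    print("%-15s // %s" % (value, name))
```

Used as `grep -A 3 avg_query log/log_thread_1.log | python cut_standin.py`, it would print one number per query, similar to the listing above.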
Run with HorseIR

The HorsePower project can be found on GitHub (https://github.com/Sable/HorsePower). In the docker image it is located at /home/hanfeng/cc20/horse.

Execution time

We run each query 15 times and take the average execution time (ms):

```
(cd /home/hanfeng/cc20/horse/ && time ./run_all.sh)
```

The script run_all.sh runs three versions of generated C code, corresponding to different levels of optimization:

- naive: no optimization
- opt1: with optimizations
- opt2: with automatic fusion

For each version, it first compiles the C code and then runs the generated binary with different numbers of threads (1/2/4/8/16/32/64). Each run computes a query 15 times and reports the average. All output is saved into log files; for example, log/naive/log_q6.log contains the results of query 6 in the naive version for all thread counts.

Log file structures

```
log/naive/*.txt
log/opt1/*.txt
log/opt2/*.txt
```

Fetch a brief summary of execution time from a log file:

```
cat log/naive/log_q6.txt | grep -E 'Run with 15 times'
q06>> Run with 15 times, last 15 average (ms): 266.638 | 278.999 266.134 266.417 <12 more>   # 1 thread
q06>> Run with 15 times, last 15 average (ms): 138.556 | 144.474 137.837 137.579 <12 more>   # 2 threads
q06>> Run with 15 times, last 15 average (ms): 71.8851 | 75.339 72.102 72.341 <12 more>      # 4 threads
q06>> Run with 15 times, last 15 average (ms): 73.111 | 75.867 72.53 72.936 <12 more>        # 8 threads
q06>> Run with 15 times, last 15 average (ms): 56.1003 | 59.263 56.057 56.039 <12 more>      # 16 threads
q06>> Run with 15 times, last 15 average (ms): 56.8858 | 59.466 56.651 57.109 <12 more>      # 32 threads
q06>> Run with 15 times, last 15 average (ms): 53.4254 | 55.884 54.457 52.878 <12 more>      # 64 threads
```

Extracting this information by hand for all queries over the three versions quickly becomes tedious, so we provide a simple solution:

```
./run.sh fetch log | python gen_for_copy.py
```

This outputs the data in the following format:

```
// query id
| naive | opt1 | opt2 |
-----------------------
|  ...  |  ... |  ... |    # 1 thread
|  ...  |  ... |  ... |    # 2 threads
   ...     ...    ...
|  ...  |  ... |  ... |    # 64 threads
```

Note that we copy the generated numbers into the Excel file described in Section 3. Within the Excel file, we compare the performance of MonetDB and the different versions of the generated C code.

Compilation time

Work directory: /home/hanfeng/cc20/horse/codegen

Fetch the compilation time for the different kinds of generated C code:

```
./run.sh compile naive &> log_cc20_compile_naive.txt
./run.sh compile opt1  &> log_cc20_compile_opt1.txt
./run.sh compile opt2  &> log_cc20_compile_opt2.txt
```

Let's look at the result of query 1 in the log file log_cc20_compile_naive.txt:

```
Time variable                                   usr           sys          wall           GGC
 phase setup              :   0.00 (  0%)   0.00 (  0%)   0.01 (  5%)    1266 kB ( 18%)
 phase parsing            :   0.07 ( 54%)   0.07 ( 88%)   0.14 ( 64%)    3897 kB ( 55%)
 phase opt and generate   :   0.06 ( 46%)   0.01 ( 12%)   0.07 ( 32%)    1899 kB ( 27%)
 dump files               :   0.00 (  0%)   0.00 (  0%)   0.02 (  9%)       0 kB (  0%)
 df reg dead/unused notes :   0.01 (  8%)   0.00 (  0%)   0.00 (  0%)      31 kB (  0%)
 register information     :   0.00 (  0%)   0.00 (  0%)   0.01 (  5%)       0 kB (  0%)
 preprocessing            :   0.03 ( 23%)   0.02 ( 25%)   0.08 ( 36%)    1468 kB ( 21%)
 lexical analysis         :   0.00 (  0%)   0.03 ( 38%)   0.05 ( 23%)       0 kB (  0%)
 parser (global)          :   0.04 ( 31%)   0.02 ( 25%)   0.01 (  5%)    2039 kB ( 29%)
 tree SSA other           :   0.00 (  0%)   0.01 ( 12%)   0.00 (  0%)       3 kB (  0%)
 integrated RA            :   0.01 (  8%)   0.00 (  0%)   0.01 (  5%)     726 kB ( 10%)
 thread pro- & epilogue   :   0.02 ( 15%)   0.00 (  0%)   0.00 (  0%)      41 kB (  1%)
 shorten branches         :   0.00 (  0%)   0.00 (  0%)   0.01 (  5%)       0 kB (  0%)
 final                    :   0.00 (  0%)   0.00 (  0%)   0.01 (  5%)      56 kB (  1%)
 initialize rtl           :   0.01 (  8%)   0.00 (  0%)   0.01 (  5%)      12 kB (  0%)
 rest of compilation      :   0.01 (  8%)   0.00 (  0%)   0.00 (  0%)      62 kB (  1%)
 TOTAL                    :   0.13          0.08          0.22           7072 kB
```

The compilation time is split into many parts. We take the total wall-clock time as the time spent on code compilation; for this query, the whole compilation takes 0.22 seconds. (Note that retrieving the compilation time currently requires manual work.)
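To avoid copying the TOTAL wall-clock times by hand, a small helper along the following lines can scan the compile logs. It is not part of the artifact (the script name extract_wall.py is only for illustration); it assumes the logs contain GCC's -ftime-report output as shown above, with the wall time as the third floating-point number on each TOTAL line.

```python
# Hypothetical helper (not part of the artifact): extract the wall-clock time
# from every TOTAL line of GCC's -ftime-report output captured in a compile log.
import re
import sys

for path in sys.argv[1:]:
    with open(path) as log:
        for line in log:
            if line.lstrip().startswith("TOTAL"):
                floats = re.findall(r"\d+\.\d+", line)   # usr, sys, wall
                if len(floats) >= 3:
                    print("%s: wall = %s s" % (path, floats[2]))
```

For example: `python extract_wall.py log_cc20_compile_naive.txt log_cc20_compile_opt1.txt log_cc20_compile_opt2.txt`.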
3. Results

Our experiments generate a large number of performance numbers. We use Excel and R to process them, and R to plot the figures.

There are two kinds of data: (1) data that can be used in R directly, such as the execution times of the various versions of generated C code with different numbers of threads; and (2) data that needs to be pre-processed in Excel before it is fed to R. To make further analysis easy, we intentionally provide scripts that emit an Excel-friendly format, so the numbers can be copied and pasted easily.

Work directory: /home/hanfeng/cc20/plot

For example, the R script for generating Figure 10 is fig10/plot-gmeans.R, and the Excel file is result-book.xlsx.

Note: we use RStudio Version 1.2.1335 to generate figures from our R scripts. Since RStudio is GUI-based software, you can install it on another machine with a GUI and run the scripts there.
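Figure 10 reports geometric means (hence plot-gmeans.R). For readers who want to sanity-check a geometric-mean speedup outside RStudio, the small sketch below shows the computation; the numbers in it are placeholders, not results from the paper, and the actual figure is produced by fig10/plot-gmeans.R from the data in result-book.xlsx.

```python
# Illustrative geometric-mean computation; the timings below are placeholders,
# not measurements from the paper.
from math import exp, log

def geomean(values):
    return exp(sum(log(v) for v in values) / len(values))

# Placeholder per-query execution times (ms) for one configuration.
baseline = {"q1": 100.0, "q6": 40.0, "q14": 25.0}
fused    = {"q1":  50.0, "q6": 32.0, "q14": 20.0}

speedups = [baseline[q] / fused[q] for q in baseline]
print("geometric-mean speedup: %.2fx" % geomean(speedups))
```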
  • See also the docker image on DockerHub: https://hub.docker.com/r/wukefe/cc20-docker.