Running TPC-H using SQL in Druid

Running TPC-H using SQL in Druid

Since 2016, there have been frequent requests to run SQL in the druid. As a result, Apache Druid introduced it as a experimental feature (http://druid.io/docs/latest/querying/sql). Using it, we improve the function to support more powerful SQL than original’s.

Here you can download the Metatron engine powered by Druid here.

Built-in SQL in Apache Druid

Apache Druid’s built-in SQL provides SQL end point in the Druid broker so that can run SQL queries.
Here is an example of the druid query:

{"query": "...", "context": {...}}

Also it provides JDBC driver.

jdbc:avatica:remote:url=http://broker/druid/v2/sql/avatica/

We ported this function to our Metatron engine last year.

Limitations

The Druid built-in SQL only supports DML not DDL, DCL, DTL. And there is no simple commands for searching schema. For example, “show tables” should be written in “select TABLE_NAME from INFO_SCHEMA.TABLES”. Also Druid in SQL supports only SELECT queries, not JOIN. Subqueries are only allowed to use “group by in group by”.

But Metatron engine supports join in broker and also can run and queries on any sub-queries. So that we thought it could be possible to test TPC-H on the engine.

Running TPC-H on Metatron engine

What is the TPC-H?

A TPC-H Benchmark is a transaction processing and database benchmark specific to decision support – i.e. analytics, run and managed by the Transaction Processing Performance Council. It requires system to examine large volumes of data and query with a high degree of complexity.

Here are examples of TPC-H queries. It contains multi-joins, in-subqueries and CTEs.

Running SQL in Druid

In the master branch of Druid, it uses Apache Calcite to run SQL in the Druid.
Apache Calcite does parsing jobs listed below:
– From SQL to SQLNode using SQL parser
– From SQLNode to Logical RelNode
– From Logical RelNode to Druid RelNode using rules

After Calcite parses SQL into Druid RelNodes, we can make physical JSON query to run on Druid.

Make improvements

Cause SQL Parser and Rules are extensible, we can add new grammar easily by importing and modifying some files in Apache Calcite. We modified the files imported from the Calcite to implement the additional functionality we needed.

Extending Grammar
– New parser definition in parserImpls.ftl
– New parser template to Parser.jj
– New extensions to config.fmpp
– Rewrite “show tables” to select

Extending Rules
– Define pattern for applying target nodes
– Convert DruidRelNode
– Implement DruidRelNode
– Return druid query in here

Result of running TPC-H on Druid

The result was impressive that the success rate is about 95%. But some queries are too slow in plan or process.

Next Plan

We can think of list of further improvements in performance and usability below. Wish to share them in next meetup.
– Join optimization like bush-tree or multi-way
– Join reordering via Calcite MetaQuery
– Grammar for window functions which we already has
– CTAS/LOAD/Materialized View
– TPC-DS

You can download the Metatron engine powered by Druid here.

One Response

  1. Lee jung ryong says:

    Metatron distributed Druid can execute “show tables”.

Leave a Reply

Your email address will not be published. Required fields are marked *