How to Use Hive UDFs with Druid

Metatron distributed Druid supports SQL, like Apache Druid, and adds further usability by supporting Hive UDFs and UDAFs. In this article, we will see how to register and use UDFs written for Hive with Metatron distributed Druid.

Note: You must download the latest hotfix release of Metatron distributed Druid to use the UDF plugin; earlier versions contain bugs.

First, prepare the UDF jar file used by Hive. In this example, we create a simple UDF by following the tutorial at the URL below.

https://docs.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-hive-java-udf

If you create a UDF by following the tutorial above and complete the build, an ExampleUDF-1.0-SNAPSHOT.jar file is generated. Copy this file to $DRUID_HOME/extensions/druid-hive-udf-extensions:

> cp ExampleUDF-1.0-SNAPSHOT.jar $DRUID_HOME/extensions/druid-hive-udf-extensions

Then create a hive.function.properties file in the $DRUID_HOME/extensions/druid-hive-udf-extensions directory with the following line to register the UDF in Druid:

tolower=com.microsoft.examples.ExampleUDF
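The two steps above (copying the jar and writing the properties file) can be sketched as a small shell script. This is illustrative only: the fallback /tmp/druid-demo path is an assumption used for the sketch; point DRUID_HOME at your actual installation.

```shell
# Sketch of the registration steps above. The /tmp fallback is for
# illustration only; set DRUID_HOME to your real installation.
DRUID_HOME="${DRUID_HOME:-/tmp/druid-demo}"
UDF_DIR="$DRUID_HOME/extensions/druid-hive-udf-extensions"
mkdir -p "$UDF_DIR"

# Copy the jar built earlier (skipped here if it is not present):
if [ -f ExampleUDF-1.0-SNAPSHOT.jar ]; then
  cp ExampleUDF-1.0-SNAPSHOT.jar "$UDF_DIR"/
fi

# Map the SQL function name to the Hive UDF class:
printf 'tolower=com.microsoft.examples.ExampleUDF\n' > "$UDF_DIR/hive.function.properties"
cat "$UDF_DIR/hive.function.properties"
```

Each line of hive.function.properties follows the same `name=fully.qualified.ClassName` pattern, so additional UDFs can be registered by appending more lines.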

Finally, modify Druid's configuration. Add the Hive UDF plugin to the extension load list in the $DRUID_HOME/conf/druid/_common/common.runtime.properties file as follows:

druid.extensions.loadList=["druid-jdbc-firehose", "druid-histogram", "druid-datasketches", "druid-hive-extensions", "mysql-metadata-storage", "druid-hdfs-storage", "druid-range", "druid-area", "druid-stats", "druid-orc-extensions", "druid-kafka-indexing-service", "druid-lucene-extensions", "druid-geotools-extensions", "druid-hive-udf-extensions"]
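If you prefer to script this change, a sketch like the following appends the entry to an existing loadList. The throwaway temp file and the sed expression are assumptions for illustration; the real file lives at $DRUID_HOME/conf/druid/_common/common.runtime.properties, and you should back it up before editing in place.

```shell
# Illustrative sketch: append druid-hive-udf-extensions to the loadList.
# A throwaway copy stands in for common.runtime.properties here.
CONF=$(mktemp)
echo 'druid.extensions.loadList=["druid-histogram", "druid-hive-extensions"]' > "$CONF"

# Append the plugin only if it is not already listed (keeps the edit idempotent):
grep -q 'druid-hive-udf-extensions' "$CONF" || \
  sed -i 's/\]$/, "druid-hive-udf-extensions"]/' "$CONF"
cat "$CONF"
```

Note that GNU sed's -i flag edits the file in place; on BSD/macOS sed the flag takes a backup-suffix argument (`sed -i ''`).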

Now restart the server.

stop-single.sh
start-single.sh

When Druid starts up successfully, the following message will be displayed in the log.

...
2019-11-20T05:09:12,626 INFO [main] io.druid.hive.HiveUDFDruidModule - > 'tolower' is registered with class com.microsoft.examples.ExampleUDF
...

Now check that the UDF works properly using the node shell:

bin/node.sh tools shell
>sql
sql>select tolower(City) from sales_geo limit 10;
[EXPR$0]
  --------
  [houston]
  [naperville]
  [naperville]
  [naperville]
  [philadelphia]
  [henderson]
  [athens]
  [henderson]
  [henderson]
  [henderson]

Hive UDAFs can be registered and used in the same way. When the Hive UDF plugin is activated, many ready-made UDFs are registered automatically; check the list below and use the functions you need. If you are running a distributed environment, repeat these steps on every Druid node.

sql> select * from sys.functions;
[name, type, external]
  ----------------------
  [$assign, UDF, false]
  [$assignfirst, UDF, false]
  [$avg, UDF, false]
  [$delta, UDF, false]
  [$dense_rank, UDF, false]
  [$first, UDF, false]
  [$histogram, UDF, false]
  [$irr, UDF, false]
  [$lag, UDF, false]
  [$last, UDF, false]
  [$lead, UDF, false]
  [$max, UDF, false]
  [$mean, UDF, false]
  [$min, UDF, false]
  [$mirr, UDF, false]
  [$next, UDF, false]
  [$npv, UDF, false]
  [$nth, UDF, false]
  [$percentile, UDF, false]
  [$prev, UDF, false]
  [$rank, UDF, false]
  [$row_num, UDF, false]
…
  [tolower, UDF, true]
  [toradians, UDF, false]
  [trim, UDF, false]
  [trunc, UDF, false]
  [truncate, UDF, false]
  [truncatedrecent, UDF, true]
  [ulp, UDF, false]
  [unix_timestamp, UDF, false]
  [upper, UDF, false]
  [variance, UDAF, true]
  [weekofweekyear, UDF, false]
  [weekyear, UDF, false]
  [year, UDF, false]
  [||, UDF, false]
> Retrieved 483 rows in 22 msec
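As noted above, in a distributed environment the extension directory must be present on every Druid node. A dry-run sketch of that distribution step follows; the hostnames are placeholders, and the script only echoes the copy commands you would run (for example with scp or rsync) rather than executing them.

```shell
# Hedged, dry-run sketch for a distributed setup. Hostnames are
# placeholders; substitute your actual node list. The echo prints the
# command instead of running it.
DRUID_HOME="${DRUID_HOME:-/tmp/druid-demo}"
for node in druid-data1 druid-data2; do
  echo "scp -r $DRUID_HOME/extensions/druid-hive-udf-extensions $node:$DRUID_HOME/extensions/"
done
```

After copying, each node also needs druid-hive-udf-extensions in its own loadList, followed by a restart.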

If you have any questions about this post, leave a comment or ask in our forum. We hope this post helps you!
