How to Use Hive UDFs with Druid


Metatron distributed Druid supports SQL, like Apache Druid, and adds further usability by supporting Hive UDFs and UDAFs. In this article, we will see how to register and use UDFs written for Hive in Metatron distributed Druid.

Note: You must download the latest hotfix release of Metatron distributed Druid to use the UDF plugin; some bugs exist in previous versions.

First, prepare the UDF jar file for Hive. In this example, we will create a simple UDF by following the tutorial at the URL below.

https://docs.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-hive-java-udf
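The UDF in that tutorial simply lowercases a string. As a sketch of its core logic, here is the `evaluate` method stripped down to plain Java (in the real UDF, the class extends `org.apache.hadoop.hive.ql.exec.UDF` and the method takes and returns Hadoop `Text` objects; those dependencies are omitted here so the snippet compiles standalone):

```java
public class ExampleUDFSketch {
    // In the actual Hive UDF this is `public Text evaluate(Text input)`
    // on a class extending org.apache.hadoop.hive.ql.exec.UDF.
    public static String evaluate(String input) {
        if (input == null) {
            return null; // Hive UDFs must tolerate NULL inputs
        }
        return input.toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(evaluate("Houston")); // prints "houston"
    }
}
```

The NULL check matters: Hive (and Druid) will pass NULL column values straight into the UDF, and an unguarded `toLowerCase()` call would throw a `NullPointerException` at query time.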

If you create the UDF by referring to the URL above and complete the build, the ExampleUDF-1.0-SNAPSHOT.jar file is generated. Copy this file to $DRUID_HOME/extensions/druid-hive-udf-extensions:

> cp ExampleUDF-1.0-SNAPSHOT.jar $DRUID_HOME/extensions/druid-hive-udf-extensions

Then create hive.function.properties in the $DRUID_HOME/extensions/druid-hive-udf-extensions directory and write the following to register the UDF in Druid.

tolower=com.microsoft.examples.ExampleUDF
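Each line in hive.function.properties maps the SQL function name (left of `=`) to the fully qualified Hive UDF class (right of `=`), one mapping per line. For example, registering a second function would look like this (the second entry is hypothetical, assuming such a class exists in your jar):

```properties
tolower=com.microsoft.examples.ExampleUDF
# hypothetical second mapping for illustration only
toupper=com.example.udf.ExampleUpperUDF
```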

Finally, modify the Druid configuration. Add the Hive UDF extension to the load list in the $DRUID_HOME/conf/druid/_common/common.runtime.properties file as follows:

druid.extensions.loadList=["druid-jdbc-firehose", "druid-histogram", "druid-datasketches", "druid-hive-extensions", "mysql-metadata-storage", "druid-hdfs-storage", "druid-range", "druid-area", "druid-stats", "druid-orc-extensions", "druid-kafka-indexing-service", "druid-lucene-extensions", "druid-geotools-extensions", "druid-hive-udf-extensions"]

Now restart the server:

stop-single.sh
start-single.sh

When Druid starts up successfully, the following message will be displayed in the log.

...
2019-11-20T05:09:12,626 INFO [main] io.druid.hive.HiveUDFDruidModule - > 'tolower' is registered with class com.microsoft.examples.ExampleUDF
...

Now check that the UDF works properly using the node shell:

bin/node.sh tools shell
>sql
sql>select tolower(City) from sales_geo limit 10;
[EXPR$0]
  --------
  [houston]
  [naperville]
  [naperville]
  [naperville]
  [philadelphia]
  [henderson]
  [athens]
  [henderson]
  [henderson]
  [henderson]

Hive UDAFs can be registered and used in a similar way. When you activate the Hive UDF plugin, many pre-built UDFs are registered automatically. Check the list below and use the functions you need. If you are running a distributed environment, apply the same steps on all Druid nodes.

sql> select * from sys.functions;
[name, type, external]
  ----------------------
  [$assign, UDF, false]
  [$assignfirst, UDF, false]
  [$avg, UDF, false]
  [$delta, UDF, false]
  [$dense_rank, UDF, false]
  [$first, UDF, false]
  [$histogram, UDF, false]
  [$irr, UDF, false]
  [$lag, UDF, false]
  [$last, UDF, false]
  [$lead, UDF, false]
  [$max, UDF, false]
  [$mean, UDF, false]
  [$min, UDF, false]
  [$mirr, UDF, false]
  [$next, UDF, false]
  [$npv, UDF, false]
  [$nth, UDF, false]
  [$percentile, UDF, false]
  [$prev, UDF, false]
  [$rank, UDF, false]
  [$row_num, UDF, false]
…
  [tolower, UDF, true]
  [toradians, UDF, false]
  [trim, UDF, false]
  [trunc, UDF, false]
  [truncate, UDF, false]
  [truncatedrecent, UDF, true]
  [ulp, UDF, false]
  [unix_timestamp, UDF, false]
  [upper, UDF, false]
  [variance, UDAF, true]
  [weekofweekyear, UDF, false]
  [weekyear, UDF, false]
  [year, UDF, false]
  [||, UDF, false]
> Retrieved 483 rows in 22 msec

If you have any questions about this post, leave a comment or ask in our forum. We hope this post helps you!
