Time Series Data Analysis Using Metatron Discovery
Metatron Discovery uses Druid as its data processing engine, which makes it well suited to analyzing large volumes of time series data. Metatron Discovery was developed to take full advantage of these Druid features. In this post, we walk through an example of analyzing time series data with Metatron Discovery.
Visualizing time series data
Time series data is basically expressed as the relationship between Druid’s __time field and a measure. Which field to use as Druid’s __time field is decided at the ingestion stage. If none is specified, Metatron Discovery uses the current time and names the field event_time. First, let’s simply visualize the time series data. In a bar chart, place event_time, which is defined as the time field, on the column shelf and place the measure field on the measure shelf.
[Basic bar chart]
The event_time field is defined as a time field in Discovery, so you can set its unit of time. Changing the unit aggregates the measure at the granularity you want, such as by day or by month.
[Changing the unit of time field]
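As a rough illustration of what changing the unit of time means, the following Python sketch sums a measure per time bucket. It is not how Metatron Discovery or Druid implements this; the sample events, the `UNIT_FORMATS` table, and the `aggregate_by_unit` helper are all hypothetical names made up for this example.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample events: (event_time, number of visits).
EVENTS = [
    (datetime(2020, 1, 1, 9, 30), 3),
    (datetime(2020, 1, 1, 17, 5), 2),
    (datetime(2020, 1, 2, 8, 0), 4),
    (datetime(2020, 2, 10, 12, 0), 5),
]

# strftime patterns that truncate a timestamp to the chosen unit of time.
UNIT_FORMATS = {
    "hour": "%Y-%m-%d %H:00",
    "day": "%Y-%m-%d",
    "month": "%Y-%m",
    "year": "%Y",
}


def aggregate_by_unit(events, unit):
    """Sum the measure per time bucket for the given unit of time."""
    totals = defaultdict(int)
    for event_time, value in events:
        bucket = event_time.strftime(UNIT_FORMATS[unit])
        totals[bucket] += value
    return dict(sorted(totals.items()))


by_day = aggregate_by_unit(EVENTS, "day")
by_month = aggregate_by_unit(EVENTS, "month")
print(by_day)
print(by_month)
```

The same events produce three buckets at the day level and two at the month level, which is the effect you see in the chart when you switch the unit of time.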
You can also define a unit of time as either continuous or discontinuous. The following example combines a continuous and a discontinuous unit of time to check the number of weekly user visits for each month.
[Combination of continuous and discontinuous time]
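One way to picture the difference is sketched below in Python. It assumes that a continuous unit truncates each timestamp to a point on the timeline (for example 2020-01, then 2020-02), while a discontinuous unit extracts a recurring part of the timestamp (for example the week within the month). That reading, the week-of-month rule `(day - 1) // 7 + 1`, and all names here are assumptions made for illustration, not Metatron Discovery's definitions.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample events: (event_time, number of visits).
EVENTS = [
    (datetime(2020, 1, 3), 10),
    (datetime(2020, 1, 10), 7),
    (datetime(2020, 1, 12), 1),
    (datetime(2020, 2, 5), 4),
    (datetime(2020, 2, 13), 6),
]


def continuous_month(ts):
    """Continuous unit: truncate to the calendar month on the timeline."""
    return ts.strftime("%Y-%m")


def discontinuous_week_of_month(ts):
    """Discontinuous unit: week number within the month (assumed rule)."""
    return (ts.day - 1) // 7 + 1


def aggregate(events, key_fn):
    """Sum the measure per key produced by key_fn."""
    totals = defaultdict(int)
    for event_time, value in events:
        totals[key_fn(event_time)] += value
    return dict(sorted(totals.items()))


per_month = aggregate(EVENTS, continuous_month)
per_week_of_month = aggregate(EVENTS, discontinuous_week_of_month)
# Combining both units gives weekly visits within each month.
combined = aggregate(
    EVENTS, lambda ts: (continuous_month(ts), discontinuous_week_of_month(ts))
)
print(per_month)
print(per_week_of_month)
print(combined)
```

With the discontinuous unit alone, the first weeks of January and February fall into the same bucket. Combining it with the continuous month keeps them apart.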
Add series to compare
There are two ways to add series to a chart for comparison: add more measures, or place additional dimensions on a shelf. First, if you want to compare multiple measures together, simply put all the measures you want to compare on the shelf.
[Adding measure fields]
Next, you can add series to compare by using a dimension field. If you add a dimension to a shelf, the measure is split by that dimension (the cube’s split concept). For example, in a bar chart, you can place the dimension you want to split by on any of the row, column, or measure shelves. The measure is then split according to the shelf where the dimension is placed.
[Example of adding a dimension field to a row or column shelf]
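Conceptually, splitting a measure by a dimension resembles a group-by that produces one series per dimension value. The Python sketch below illustrates that idea only. The rows, the `device` dimension, the `visits` measure, and the `split_measure` helper are hypothetical and do not come from Metatron Discovery.

```python
from collections import defaultdict

# Hypothetical rows: a time bucket, one dimension, and one measure.
ROWS = [
    {"event_time": "2020-01", "device": "mobile", "visits": 5},
    {"event_time": "2020-01", "device": "desktop", "visits": 3},
    {"event_time": "2020-02", "device": "mobile", "visits": 2},
    {"event_time": "2020-02", "device": "mobile", "visits": 4},
    {"event_time": "2020-02", "device": "desktop", "visits": 1},
]


def split_measure(rows, time_field, dimension, measure):
    """Return one series per dimension value: {value: {time: total}}."""
    series = defaultdict(lambda: defaultdict(int))
    for row in rows:
        series[row[dimension]][row[time_field]] += row[measure]
    return {name: dict(points) for name, points in series.items()}


by_device = split_measure(ROWS, "event_time", "device", "visits")
print(by_device)
```

Each key of `by_device` corresponds to one series in the chart, so a single measure becomes as many series as the dimension has values.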
Identifying series by changing color
Metatron Discovery lets you assign colors by series, dimension, or measure. ‘Color by series’ assigns a color to each measure field, ‘Color by dimension’ assigns a color to each value of a dimension field, and ‘Color by measure’ assigns a color according to the measured value.
[Example of ‘Color by series’]
Create a new measurement field using a formula
Metatron Discovery supports calculation formulas for creating user-defined measures. In addition to basic operators, formulas can use more than 100 functions, which you can combine to build your own formula.
[Calculation of the number of visitors when a user ID exists]
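The logic behind such a formula can be sketched in plain Python as a conditional count. This is not Metatron Discovery formula syntax. It assumes that "a user ID exists" means the `user_id` value is neither missing nor empty, and the rows and the `visitors_with_id` helper are hypothetical.

```python
# Hypothetical rows: one visit per row, user_id may be missing or empty.
ROWS = [
    {"event_time": "2020-01-01", "user_id": "u1"},
    {"event_time": "2020-01-01", "user_id": None},
    {"event_time": "2020-01-01", "user_id": "u2"},
    {"event_time": "2020-01-02", "user_id": ""},
    {"event_time": "2020-01-02", "user_id": "u1"},
]


def visitors_with_id(rows):
    """Count visits per time bucket, counting only rows that have a user ID."""
    counts = {}
    for row in rows:
        # Contribute 1 when a user ID exists, otherwise 0.
        has_id = 1 if row["user_id"] else 0
        counts[row["event_time"]] = counts.get(row["event_time"], 0) + has_id
    return counts


visits = visitors_with_id(ROWS)
print(visits)
```

Summing a 1-or-0 indicator is a common way to express a conditional count as a single measure, which is why it maps naturally onto a calculated field.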
If you can’t create the measure you want with a calculation formula, you can develop your own function using the Hive UDF/UDAF add-on provided by the Metatron distribution of Druid. For the Hive UDF/UDAF add-on, see “How to Use Hive UDFs with Druid”.
So far, we have looked at several ways to analyze time series data. Discovery provides other functions beyond those described here (e.g., time series analysis using a time type dimension other than the __time field, sorting data, etc.). For more information, please refer to the Discovery Manual or the user group, or contact firstname.lastname@example.org.