Currently, Metatron Discovery supports five DB types:

  • MySQL
  • Hive
  • PostgreSQL
  • Presto
  • Druid

You may, however, need to connect to a DB other than those listed above, for example Microsoft SQL Server.
This is not difficult as long as the DB you want to connect to provides a JDBC driver.
The following guide walks through adding a connection extension for Microsoft SQL Server.

Connection extension basics

The connection extension consists of three main interfaces: JdbcDialect, JdbcAccessor, and JdbcConnector.

The functions of each interface are summarized as follows.

  • JdbcDialect: The dialect for a specific DB. Defines the JDBC connection URL format, queries, formats, and so on.
  • JdbcConnector: Manages the connection to the DB.
  • JdbcAccessor: Uses JdbcConnector and JdbcDialect to query the actual DB data.

JdbcConnector and JdbcAccessor already have abstract implementations, so you do not have to add anything else; you only need to define a JdbcDialect.
When the module is built, all of the library dependencies, including the JDBC driver, are packaged into a zip file.
After distributing the generated extension binary to Metatron Discovery’s extensions path and restarting Metatron Discovery, the extension becomes available in Discovery.
Let’s create a Microsoft SQL Server connection extension step by step.

Step 1. Create an extension module

You can create a simple module project using the maven archetype.

1. Navigate to the directory of the discovery-extensions module.
metatron-discovery $ cd discovery-extensions
2. Create an extension module using archetype.
discovery-extensions $ mvn archetype:generate -DarchetypeGroupId=app.metatron.discovery -DarchetypeArtifactId=discovery-extension-connection-archetype -DarchetypeVersion=1.0.0
3. In interactive mode, enter the desired values.

After entering the above command, enter the following values in interactive mode to define the basic structure of the extension module.

  • groupId : app.metatron.discovery
  • artifactId : mssql-connection
  • version : 7.3.0
  • package :
[INFO] Scanning for projects...
 [WARNING]
 [WARNING] Some problems were encountered while building the effective model for app.metatron.discovery:discovery-extensions:pom:3.2.0
 [WARNING] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-compiler-plugin is missing. @ app.metatron.discovery:metatron-discovery:3.2.0, /../metatron-discovery/pom.xml, line 105, column 21
 [WARNING]
 [WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
 [WARNING]
 [WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
 [WARNING]
 [INFO]
 [INFO] ------------------------------------------------------------------------
 [INFO] Building discovery-extensions 3.3.0-rc1
 [INFO] ------------------------------------------------------------------------
 [INFO]
 [INFO] >>> maven-archetype-plugin:3.0.1:generate (default-cli) > generate-sources @ discovery-extensions >>>
 [INFO]
 [INFO] <<< maven-archetype-plugin:3.0.1:generate (default-cli) < generate-sources @ discovery-extensions <<<
 [INFO]
 [INFO]
 [INFO] --- maven-archetype-plugin:3.0.1:generate (default-cli) @ discovery-extensions ---
 [INFO] Generating project in Interactive mode
 [INFO] Archetype repository not defined. Using the one from [app.metatron.discovery:discovery-extension-connection-archetype:1.0.0] found in catalog local
 Define value for property 'groupId': app.metatron.discovery
 Define value for property 'artifactId': mssql-connection
 Define value for property 'version' 1.0-SNAPSHOT: : 7.3.0
 Define value for property 'package' app.metatron.discovery: :
 Confirm properties configuration:
 groupId: app.metatron.discovery
 artifactId: mssql-connection
 version: 7.3.0
 package: app.metatron.discovery
 Y: : Y
 [INFO] ----------------------------------------------------------------------------
 [INFO] Using following parameters for creating project from Archetype: discovery-extension-connection-archetype:1.0.0
 [INFO] ----------------------------------------------------------------------------
 [INFO] Parameter: groupId, Value: app.metatron.discovery
 [INFO] Parameter: artifactId, Value: mssql-connection
 [INFO] Parameter: version, Value: 7.3.0
 [INFO] Parameter: package, Value: app.metatron.discovery
 [INFO] Parameter: packageInPathFormat, Value: app/metatron/discovery
 [INFO] Parameter: package, Value: app.metatron.discovery
 [INFO] Parameter: version, Value: 7.3.0
 [INFO] Parameter: groupId, Value: app.metatron.discovery
 [INFO] Parameter: artifactId, Value: mssql-connection
 [INFO] Parent element not overwritten in /../metatron-discovery/discovery-extensions/mssql-connection/pom.xml
 [INFO] Project created from Archetype in dir: /../metatron-discovery/discovery-extensions/mssql-connection
 [INFO] ------------------------------------------------------------------------
 [INFO] BUILD SUCCESS
 [INFO] ------------------------------------------------------------------------
4. Created module structure

The Maven archetype generates just a single class (WelcomeConnectionExtension.java):

discovery-extensions $ tree mssql-connection
mssql-connection
 ├── pom.xml
 └── src
     └── main
         ├── assembly
         │   └── assembly.xml
         ├── java
         │   └── app
         │       └── metatron
         │           └── discovery
         │               └── WelcomeConnectionExtension.java
         └── resources


Step 2. Customize extension

Let’s modify the extension project to support Microsoft SQL Server.

1. Rename Extension Class

The first thing to do is to rename the Welcome classes.

  • Modify the name of the WelcomeConnectionExtension class to MssqlConnectionExtension. (modify the file name too)
  • Modify the name of the WelcomeDataAccessor class to MssqlDataAccessor.
  • Modify the name of the WelcomeDialect class to MssqlDialect.
2. Open and modify the created pom.xml.

– In the parent.version property, enter the metatron-discovery version (e.g., 3.3.0-rc1):

<parent>
  <artifactId>discovery-extensions</artifactId>
  <groupId>app.metatron.discovery</groupId>
  <version>3.3.0-rc1</version>
</parent>

– Change the properties as desired. Most of them need no modification; only the fully qualified name of the plugin class must change.
Set the plugin.class property to the fully qualified name of the class you renamed above, as in the sample below.

<properties>
  <project.distribution.path>${basedir}/../../discovery-distribution</project.distribution.path>

  <!-- extension id-->
  <plugin.id>${project.artifactId}-extension</plugin.id>

  <!-- extension Class (needs to be modified) -->
  <plugin.class>app.metatron.discovery.MssqlConnectionExtension</plugin.class>

  <!-- extension version -->
  <plugin.version>${project.version}-${project.parent.version}</plugin.version>

  <plugin.provider></plugin.provider>
  <plugin.dependency></plugin.dependency>
</properties>
3. Add the Microsoft SQL Server JDBC Driver as a dependency

The version of mssql-jdbc that corresponds to the version of SQL Server you want to connect to can be found here:
https://docs.microsoft.com/en-us/sql/connect/jdbc/system-requirements-for-the-jdbc-driver.

<dependencies>
  <dependency>
    <groupId>com.microsoft.sqlserver</groupId>
    <artifactId>mssql-jdbc</artifactId>
    <version>7.3.0.jre8-preview</version>
  </dependency>
</dependencies>
4. Modify MssqlConnectionExtension.java

Let’s look at the extension class.
The class only serves to link the application with the @Extension-annotated classes; the actual implementation is the JdbcDialect, as described above.
Although the JdbcDialect interface consists of many methods, implementing the essential ones below is sufficient for a working connection.
  • Connection Name : The name of the connection shown to the user. Simply enter MS-SQL.
@Override
public String getName() {
  return "MS-SQL";
}


  • Connection Implementor : A unique identifier string for this connection type.
@Override
public String getImplementor() {
  return "MSSQL";
}


  • InputSpec : The spec that the user must fill in to establish a connection to the DB. Host and Port are included by default. For SQL Server, the three values Username, Password, and Database are required.
@Override
public InputSpec getInputSpec() {
  return (new InputSpec())
      .setAuthenticationType(InputMandatory.MANDATORY)
      .setUsername(InputMandatory.MANDATORY)
      .setPassword(InputMandatory.MANDATORY)
      .setCatalog(InputMandatory.NONE)
      .setSid(InputMandatory.NONE)
      .setDatabase(InputMandatory.MANDATORY);
}


  • DriverClass : The fully qualified name of the driver class used to create the connection. For the mssql-jdbc dependency added above, the driver is “com.microsoft.sqlserver.jdbc.SQLServerDriver”.
@Override
public String getDriverClass(JdbcConnectInformation connectInfo) {
  return "com.microsoft.sqlserver.jdbc.SQLServerDriver";
}


  • DataAccessorClass : The fully qualified name of the DataAccessor class to use. An instance of this class is created and used to query the actual DB data.
@Override
public String getDataAccessorClass(JdbcConnectInformation connectInfo) {
  return "app.metatron.discovery.MssqlConnectionExtension$MssqlDataAccessor";
}


  • Connection URL : makeConnectUrl composes the JDBC connection URL in the following format:
jdbc:sqlserver://<hostname>:<port>;database=<database>
@Override
public String makeConnectUrl(JdbcConnectInformation connectInfo, String database, boolean includeDatabase) {
  if(StringUtils.isNotEmpty(connectInfo.getUrl())) {
    return connectInfo.getUrl();
  }

  StringBuilder builder = new StringBuilder();
  builder.append("jdbc:sqlserver:/");

  // Set HostName
  builder.append(URL_SEP);
  builder.append(connectInfo.getHostname());

  // Set Port
  if(connectInfo.getPort() != null) {
    builder.append(":").append(connectInfo.getPort());
  }

  // Set DataBase
  if(StringUtils.isNotEmpty(connectInfo.getDatabase()) && includeDatabase) {
    builder.append(";");
    builder.append("database=");
    builder.append(connectInfo.getDatabase());
  }
  return builder.toString();
}
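
For orientation, the overall shape of the extension class after these changes looks roughly like the sketch below. The supertype names (Plugin, PluginWrapper, AbstractJdbcDataAccessor) are assumed from the archetype’s generated Welcome* template and the plugin conventions implied by the plugin.class/@Extension settings above, so treat them as an abridged sketch rather than a definitive listing:

// Abridged sketch only; supertype names are assumed from the archetype template.
public class MssqlConnectionExtension extends Plugin {

  public MssqlConnectionExtension(PluginWrapper wrapper) {
    super(wrapper);
  }

  // The accessor inherits the abstract implementation; nothing to add.
  @Extension
  public static class MssqlDataAccessor extends AbstractJdbcDataAccessor {
  }

  // The dialect carries the actual implementation shown above.
  @Extension
  public static class MssqlDialect implements JdbcDialect {
    // getName(), getImplementor(), getInputSpec(), getDriverClass(),
    // getDataAccessorClass(), makeConnectUrl(), ...
  }
}

Note how the nested accessor class matches the "MssqlConnectionExtension$MssqlDataAccessor" name returned by getDataAccessorClass above.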

You are now ready to connect to Microsoft SQL Server through Metatron Discovery.


Step 3. Build & Test Extension

1. Build Metatron Discovery
metatron-discovery $ mvn clean install -DskipTests
2. Start Metatron Discovery

Untar the archive binary file of Metatron Discovery.

$ tar zxf metatron-discovery-{VERSION}-{TIMESTAMP}-bin.tar.gz
$ bin/metatron.sh
3. Create Connection to Microsoft SQL Server

Log in with the admin account (admin) and go to Management -> Data Storage -> Data Connection. In the Data Connection list, click the “Create new Data Connection” button to create a data connection.

Create data connection

Connectable DB types

You can see that the “MS-SQL” type has been added on the right side.
Enter the Microsoft SQL Server information, then test and create the data connection.

4. Create Workbench

Now create a workbench using the newly created data connection. Go to the admin’s workspace and select the Workbench button in the bottom right corner.


Create workbench

Select the Microsoft SQL Server Connection you just created and create a workbench.

5. Execute SQL

Execute simple query

You can see that executing a simple query returns results successfully.


An additional feature for Microsoft SQL Server only

So far we have covered a typical DB connection extension; now let’s add functionality specific to Microsoft SQL Server.

Schema Browser

The Workbench’s Schema Browser has an Information Tab that allows you to view table details.


Information Tab before modification

Currently, if you open the Information tab, it says No Data.


Let’s extend the dialect to query Microsoft SQL Server’s sys schema for additional information and display the results in the Information tab. The JdbcDialect interface declares the following method for this:

/**
 * Gets table desc query.
 *
 * @param connectInfo the connect info
 * @param catalog the catalog
 * @param schema the schema
 * @param table the table
 * @return the table desc query
 */
 String getTableDescQuery(JdbcConnectInformation connectInfo, String catalog, String schema, String table);


Below is a simple sample query that returns the creation and modification timestamps of a table.

SELECT
  t.name as TableName,
  t.create_date as CreateDate,
  t.modify_date as ModifyDate
FROM
  sys.tables t
WHERE t.NAME = 'some_table_name';


Implement the getTableDescQuery method using this query.

MssqlConnectionExtension.java
@Override
public String getTableDescQuery(JdbcConnectInformation connectInfo, String catalog, String schema, String table) {
  return "SELECT \n" +
      "    t.name as TableName,\n" +
      "    t.create_date as CreateDate,\n " +
      "    t.modify_date as ModifyDate\n" +
      "FROM \n" +
      "    sys.tables t\n" +
      "WHERE t.NAME = '" + table + "';";
}
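
One caution about this sketch: the table name is concatenated directly into the SQL string. The schema browser passes identifiers it already knows, but if you want to be defensive about quoting, escaping single quotes first costs one line (StringUtils here is the same Apache Commons class already used in makeConnectUrl):

@Override
public String getTableDescQuery(JdbcConnectInformation connectInfo, String catalog, String schema, String table) {
  // Escape single quotes so an unusual table name cannot break out of the string literal.
  String safeTable = StringUtils.replace(table, "'", "''");
  return "SELECT t.name as TableName, t.create_date as CreateDate, t.modify_date as ModifyDate \n" +
      "FROM sys.tables t \n" +
      "WHERE t.NAME = '" + safeTable + "';";
}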


Once you have modified the code above, build and deploy it again, then restart Metatron Discovery.


Information Tab after modification

If you go to the Workbench’s Schema Browser, you can see that additional information appears on the information tab.


Conclusion

We have created a connection extension for Microsoft SQL Server.
There are various database services, for example in Microsoft Azure, and some of them are not yet supported by Metatron Discovery.
However, you can easily create additional connection extensions for them as shown above.

Once again, we bring you release news from Metatron Discovery! Starting with this release, the release cycle has changed from two weeks to three so that we can pay more attention to the completeness of the product. With the longer development period, we intend to ship with more care for the integrity of the product. This time, we bring you two releases: one carries the latest new features for the 3.3.0 minor release but is not yet stabilized; the other is a stable version of the last releases. Let’s see what has been improved.

Metatron Discovery 3.3.0_rc1

For the 3.3.0 release, we are improving data source usability for ingestion and metadata management.

Support changing column schema on ingestion stage (#1202, #1920)

Now you can do more when creating data sources. When setting up the schema of a data source at the ingestion stage, you can now change column names and delete user-defined columns. Edit your data schema to suit your analysis jobs from now on.

Enhanced re-ingestion of file data sources (#2178)

When a Druid data source ingestion failed, you had to redo all of the ingestion steps. From 3.3.0-rc1, it can be done with just one click. Quick re-ingestion is available only for local-file data sources for now, but by the next release we will cover DB-connection data sources as well.

Creating metadata by database in a data connection (#1539)

Manage your metadata on all DB connections supported by Metatron Discovery. We now support direct registration of metadata from your databases. Are you a data manager? Connect to your DBs and register multiple metadata tables at once.

Metatron Discovery 3.2.4

This is a stable version of the previous 3.2.x releases. We’ve worked hard to catch bugs in this release, and we hope you will experience more stability. For instance, we fixed an error related to a DB connection when loading a dataset in data preparation. Check out our release note for the details.

Metatron optimized Druid 3.2.4

From now on, we highly recommend upgrading your Druid to match the Metatron Discovery version. Use druid-metatron-3.2.4.tar.gz. We have massive updates as follows:

Update easymock to 3.4 (#2162)
Support Hangul tokens in SQL parser (#2160)
Support router in single mode (#2155)
Fix tasks command at DruidShell (#2143)
Support more result formats and column header at SQL (#2141)
Support full param to supervisor api (#2140)
Support for router forwarding request to active coordinator/overlord (#2139)
Implement SQLMetadataStorageActionHandler (#2138)
Improve task retrieval APIs on Overlord (#2137)
Apply java8 (#2136)
Introduce SystemSchema tables (#2135)
Add Unified web console (#2134)
Hadoop ingestion in ReduceMerge mode does not progress from second index (#2151)
Do not keep index in memory for index viewer (#2121)
Minor log message improvements (#2124)
Remove classpath from logging in MiddleManager (#2123)
Regard U+00A0 as whitespace (#2122)
Fix NPE when dimension spec is not exists (#2117)
Result of hive udaf is not deterministic (#2116)

How to get version 3.2.4 or 3.3.0-rc1

You can always download the latest version of Metatron Discovery at metatron.app/download/. Choose from three installation types: binary, VM, or Docker image. Post on our forum if you get stuck or need help with installation.

Thanks, as always, for using Metatron Discovery! We’ll be back soon with a new release.

Metatron Discovery was launched four years ago to facilitate SK Telecom’s in-house network analysis. Over the past four years, SK Telecom has made several innovations in data analysis with Metatron Discovery, and it has grown into a daily platform for over 900 users today. We believe that this platform has had a very significant impact on improving analytical work, and we have been working to extend it to other industries as well. We have successfully applied Metatron Discovery to several other companies in Korea, and we are confident that it can be extended to a wider area. Above all, what we want is to help more users improve their analysis environment using Metatron Discovery. This will lead to many improvements and make our product better as well.

Metatron Discovery Usecases

We are proud that Metatron Discovery is part of the open source revolution, and supporting the open source community is what matters most to us. We have also found every issue raised by the open source community to be more important and valuable than anything else. We will continue to maintain Metatron Discovery as open source and to grow a community where more developers and vendors can participate. Based on this belief, we will soon be releasing new features currently under development, such as Stream Analyzer and Metadata Management, as open source.

Metatron Discovery Github Repository

However, in the enterprise analysis environment there are some areas where it is difficult to respond quickly as open source. Many difficulties in the enterprise environment, such as the installation of clusters, the resolution of urgent errors, and operational support for complex systems, are hard to solve with open source and the community alone. We have been working with our partners and professional developers to quickly resolve many of the challenges of applying Metatron Discovery in the enterprise, and today we are finally releasing that enterprise version.

Data Portal: Special feature for Enterprise Edition

The enterprise edition includes not only the products of the Metatron Discovery team but also the products of the partners we have worked with. We are happy to develop and expand together with our partners. Check out the pricing plan and compare it with the open source version.

We believe that the enterprise edition will provide the foundation for using Metatron Discovery in enterprise environments or for building products based on it. We are sure Metatron Discovery will grow even further. Feel free to contact us if you have questions about the enterprise edition.

Apache Druid introduced a unified web console in version 0.14. Before that, Druid had three separate web consoles, for the coordinator, the overlord, and the old coordinator, which were disconnected from each other. That is why Apache Druid proposed the new unified web console. (The old consoles still exist.) The detailed improvement history is linked here.

We also have our own engine monitoring console, but for compatibility with Apache Druid we have now backported Apache Druid’s unified console as well.

To run the unified console, follow the process below.

Run router

Let’s start the router, which the web console runs on.

Configuration

The configuration files should be placed as follows, like those of any other Druid server.

conf/druid
        +----_common
        +----broker
        +----coordinator
        +----historical
        +----middleManager
        +----overlord
        +----router
                +----jvm.config
                +----runtime.properties

jvm.config example:

-server
-Xms512m
-Xmx512m
-XX:+UseG1GC
-XX:MaxDirectMemorySize=512m
-XX:+ExitOnOutOfMemoryError
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager

runtime.properties example:

druid.service=druid/router
druid.port=8888
 
# HTTP proxy
druid.router.http.numConnections=50
druid.router.http.readTimeout=PT5M
druid.router.http.numMaxThreads=100
druid.server.http.numThreads=100
 
# Service discovery
druid.router.defaultBrokerServiceName=druid/broker
druid.router.coordinatorServiceName=druid/coordinator
 
# Management proxy to coordinator / overlord: required for unified web console.
druid.router.managementProxy.enabled=true

Start router

Now start the router with a simple shell script.

$ bin/router.sh start

bin/router.sh example:

#!/bin/bash -eu
 
usage="Usage: router.sh (start|stop)"
 
if [ $# -lt 1 ]; then
  echo $usage
  exit 1
fi
 
sh ./bin/node.sh router $1

Access to web console

If the router is running, you can access its console at:

http://{router_host}:{router_port}/unified-console.html

For example, http://localhost:8888/unified-console.html can be the URL.
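
If the console page does not come up, first make sure the router process itself is healthy. Every Druid process serves a simple status endpoint, so (assuming the default port configured above) a quick check is:

$ curl http://localhost:8888/status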

That’s it! Now you can see the console running in your browser.

In a previous post we looked at how to get data into Druid via Metatron Discovery and summarized some things to note.

This time, I’ll show you how to deal with data ingestion failures.

Check for data source load failures

When you create a data source in Metatron Discovery, a new list item is created under “Data Storage > Data Source” in the Preparing state.

Click the data source in the list to open the detail window, where you can check the loading status in real time. If the load status is Failed, or if it remains in the Preparing state for a long time, there is a problem loading the data source.

There are two things you need to know to figure out what the problem is.

  • Druid ingestion task
  • Hadoop Map Reduce job

Checking Druid ingestion task status

There are five node types in Druid.

  • Coordinator processes manage data availability on the cluster.
  • Overlord processes control the assignment of data ingestion workloads.
  • Broker processes handle queries from external clients.
  • Historical processes store queryable data.
  • MiddleManager processes are responsible for ingesting data.

The Overlord manages data ingestion, so if you have a problem with ingestion, you should check the detailed log in Druid’s Overlord console. The Overlord console port is set to 8090 by default. Find your data source’s task and inspect its detailed logs.

If you can’t access the server, you can easily check the log in the data source details. If you search for “error”, the detailed error message can be retrieved.

If it is an “index” error, it may have occurred because a record has an incorrect type or a null value. In this case, you can fix the problem by preprocessing the data with Metatron Discovery’s data preparation feature and taking a snapshot with “HIVE”. I’ll cover this case in another post.

2019-06-05T07:42:39,744 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_apartment_trade_3_2019-06-05T07:41:30.109Z, type=index_hadoop, dataSource=apartment_trade_3}] 

If the Overlord task appears to have succeeded but the data source is still in the Preparing state, make sure the data source was created in Druid’s Coordinator (port 8081 by default).
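
You can list the data sources the Coordinator currently knows about through its standard REST API (assuming the default port; adjust the host to your Coordinator):

$ curl http://localhost:8081/druid/coordinator/v1/datasources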

Even if the task terminated normally, there may be no data available due to an error in the parsing rules. Check that the delimiter is set correctly and that the timestamp format matches the ingested data.

Checking Hadoop Map Reduce job

If the failure occurred in the MapReduce job, so you cannot see it in the Overlord log, you should check the Hadoop YARN logs instead.

You have to go through the logs to figure out the problem, fix it, and try again. In particular, it is a good idea to check whether there are enough mappers, whether the job stopped due to insufficient memory, or whether a library failed to load.

As we accumulate more experience with these errors, we will try to summarize possible troubleshooting steps for data source loading. If you have a problem right now, please feel free to contact us in our discussion channel.

With Metatron Discovery, you can analyze various data using ‘Workbook’ and ‘Workbench’.
Additionally, for more advanced analysis, it supports integration with third-party notebook applications.

In this post, we will learn how to install the Jupyter and Zeppelin Notebook server.

Jupyter

Install Jupyter through Anaconda. Installing Anaconda is recommended because data analysis requires many Python libraries.

Anaconda3

  • https://www.anaconda.com/distribution/ (shows the latest version of Anaconda)
  • We need Python 3.x. You can download it here.
$ bash ~/Anaconda3-2018.12-MacOSX-x86_64.sh

After the installation, install the R kernel. (Only the Python 3 kernel comes with the package.)

$ conda install -c r r --yes
$ conda install -c r r-essentials --yes
$ conda install -c r r-httr
$ conda install -c r r-jsonlite

# if you want to install more packages…

$ conda install -c r r-rserve --yes
$ conda install -c r r-devtools --yes
$ conda install -c r r-rcurl --yes
$ conda install -c r r-RJSONIO --yes
$ conda install -c r r-jpeg --yes
$ conda install -c r r-png --yes

# if you want to update to the latest R packages

$ conda update -c r --all
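
After the packages are installed, you can verify that both the Python and R kernels are registered using Jupyter’s standard kernelspec command:

$ jupyter kernelspec list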

To use the R kernel on Jupyter, install the native libraries, set symbolic links as below, and verify the versions (for CentOS).

/usr/lib64/libpng12.so.0 -> /usr/lib64/libpng12.so.0.50.0
/usr/lib64/libXrender.so.1 -> /usr/lib64/libXrender.so.1.3.0
/usr/lib64/libXext.so.6 -> /usr/lib64/libXext.so.6.4.0
/usr/lib64/libc.so.6 -> libc-2.17.so

In addition, if you’d like to install deep learning libraries or sparklyr, run the following commands (for CentOS):

$ conda install -c conda-forge tensorflow
$ conda install -c conda-forge keras
$ conda install -c r r-sparklyr

And to use matplotlib on Jupyter, install the native libraries, set symbolic links as below, and verify the versions (for CentOS).

/usr/lib64/libGL.so.1 -> /usr/lib64/libGL.so.1.2.0
/usr/lib64/libxshmfence.so.1 -> /usr/lib64/libxshmfence.so.1.0.0
/usr/lib64/libglapi.so.0 -> /usr/lib64/libglapi.so.0.0.0
/usr/lib64/libXdamage.so.1 -> /usr/lib64/libXdamage.so.1.1.0
/usr/lib64/libXfixes.so.3 -> /usr/lib64/libXfixes.so.3.1.0
/usr/lib64/libXxf86vm.so.1 -> /usr/lib64/libXxf86vm.so.1.0.0

Generate-config

Generate a jupyter-config file for configuring pgcontents.

$ jupyter notebook --generate-config
$ vi /home/metatron/.jupyter/jupyter_notebook_config.py

Open the config file and add the lines below.

c.NotebookApp.notebook_dir = '/user/Metatron/jupyter'  # common config

# Basically, it is assumed that the notebook server connected with Discovery does not support authentication.
c.NotebookApp.allow_origin = '*'
c.NotebookApp.disable_check_xsrf = True
c.NotebookApp.token = ''

# listen on all interfaces, not just localhost
c.NotebookApp.ip = '0.0.0.0'

Custom Packages

pymetis

A utility package for the Python kernel used by Metatron on Jupyter.

$ git clone https://github.com/metatron-app/discovery-jupyter-py-utils.git
$ cd discovery-jupyter-py-utils/

$ python setup.py sdist
$ pip uninstall pymetis

$ cp dist/pymetis-0.0.3.tar.gz {ANACONDA_HOME}/pkgs/
$ pip install {ANACONDA_HOME}/pkgs/pymetis-x.x.x.tar.gz (current ver. 0.0.3)

RMetis

A utility package for the R kernel used by Metatron on Jupyter.

$ git clone https://github.com/metatron-app/discovery-jupyter-r-utils

$ cd discovery-jupyter-r-utils
$ R CMD build ${SOURCE_DIR}   # relative or absolute path is fine, e.g. /home/metatron/discovery-jupyter-r-utils

$ cp RMetis_0.0.3.tar.gz ${ANACONDA_HOME}/pkgs/
$ R CMD INSTALL --no-multiarch ${ANACONDA_HOME}/pkgs/RMetis_x.x.x.tar.gz (current ver. 0.0.3)

Run

When all the above configurations are done, start the Jupyter process with the commands below. After that, connect to http://localhost:8888 and check if everything works fine.

If needed, you can change the port in ~/.jupyter/jupyter_notebook_config.py.

$ mkdir {ANACONDA_HOME}/logs
$ nohup jupyter notebook >> {ANACONDA_HOME}/logs/jupyter.log 2>&1 &

Set the Spark directory

To execute the scripts created with Jupyter as an API, you need to install Spark on the same server as Metatron. (It runs as the Spark driver node.)

After installation, set the directory in the METATRON_SPARK_HOME environment variable.

$ vi conf/metatron-env.sh
export METATRON_JAVA_OPTS="-Dspark.home.dir={SPARK_HOME}"




Zeppelin

Download and extract the installer from the link below.

Install

Download the binary package from the Zeppelin homepage, http://zeppelin.apache.org/download.html, and extract it. (You can follow the install guide on the Zeppelin homepage.)

Custom Packages

Discovery-interpreter

A utility package for the Spark interpreter used by Metatron on Zeppelin.

$ git clone https://github.com/metatron-app/discovery-zeppelin-interpreter.git
$ mvn clean package -P prod -P spark-2.2 -DskipTests (use “-Dspark.version=${spark version}” instead of “-P spark-2.2” to target another Spark version)
$ cp target/discovery-zeppelin-interpreter-{spark.version}-1.0.0.jar {ZEPPELIN_HOME}/lib/interpreter

Run

When all the above configurations are done, start the Zeppelin process with the command below. After that, connect to http://localhost:8080 and check if everything works fine.

If needed, you can change the port in conf/zeppelin-site.xml.

$ ./{ZEPPELIN_HOME}/bin/zeppelin-daemon.sh start

(optional) run in yarn-client mode

If you want to run the Zeppelin Spark interpreter’s master in yarn-client mode, you need to install and set up the Zeppelin-Spark-Hadoop configuration.

from https://zeppelin.apache.org/docs/0.7.3/install/yarn_install.html

$ vi {ZEPPELIN_HOME}/conf/zeppelin-env.sh
  
 export MASTER=yarn-client
 export SPARK_HOME=/home/metatron/servers/spark-2.2.0-bin-hadoop2.7
 export HADOOP_CONF_DIR=/home/metatron/servers/hadoop-2.7.2/etc/hadoop

(optional) run with R interpreter

from https://zeppelin.apache.org/docs/0.7.3/interpreter/r.html

To run Zeppelin with the R Interpreter, the SPARK_HOME environment variable must be set. The best way to do this is by editing conf/zeppelin-env.sh. If it is not set, the R Interpreter will not be able to interface with Spark. You should also copy conf/zeppelin-site.xml.template to conf/zeppelin-site.xml. That will ensure that Zeppelin sees the R Interpreter the first time it starts up.

Hello! We’re posting again for the newly released Metatron Discovery 3.2.3. We wanted to improve chart usability and give more information to developers who use Metatron Discovery. And here it is! We humbly introduce the main improvements in version 3.2.3:

Highlights

Metatron Discovery API document

We are finally opening our RESTful API documentation! You can look up the API list through the URL link within the server. It is currently a draft, but we will add more API content soon. To use the listed APIs, please refer to this document first.

Better “Map Chart” (#1965)

We now support the “MultiLineString” WKT type in the map chart, and we have focused on solving several minor map chart problems so that users get better usability when creating charts.

Fixed chart selection filter bug (#1907) & workspace permissions (#1850)

There were several bugs in the filters that should run when selecting part of a chart. We also found that user privileges acted somewhat strangely at some points. Both are now fixed and work like a charm.

More improvements and bug fixes

More than 40 other fixes and improvements were crammed into this jam-packed release. Check out the full list of enhancements.

How to get version 3.2.3

You can always download the latest version of Metatron Discovery at metatron.app/download/. Choose from three installation types: binary, VM, or Docker image. Leave your problems here if you get stuck or need help with installation.

Getting data into Druid is not an easy job, especially for those who are new to Druid. You have to set unfamiliar options such as segments and granularity, and when ingestion fails it is not easy to find out why. So today I’ve summarized what to keep in mind while creating a data source.

Creating a data source

What is a data source?

A data source is a unit of data held in Druid, the data processing engine of Metatron Discovery. If you want to analyze your data with Metatron Discovery, you must add the data to Druid.

Where can I get the data?

There are six ways to load data sources.

  • File: Load files (.csv, .xlsx) stored on your personal computer
  • Database: Load data via JDBC from a SQL engine
  • Staging DB: Load large amounts of data from Druid’s staging DB (mostly Hive) via MapReduce
  • Stream: Load data in real time as it arrives via Kafka
  • Data Snapshot: Load the results of processing in data preparation
  • Metatron Engine: Load data from another version or an already installed Druid engine

The staging database is the intermediate waypoint for data loading. Druid uses Hive for staging, so importing data from the staging DB means importing a table already created in Hive.

Setting up the data source schema

Next, you need to specify the role of each column. Looking at a sample of the data, strings are usually set as dimensions and numbers as measures (though not always). For a description of dimensions and measures, see here: link

The important point here is that a timestamp column, which is the basis for data partitioning, is mandatory, and its time format must match the data. Otherwise, the load is very likely to fail.

Setting the druid ingestion options

Next, set the data source loading options in detail. Defaults are provided, but you should adjust them for your own data.

  • Partition keys: If a partition key exists in the Hive table, you can load only specific partition data
  • Query Granularity: The unit of data queries
  • Segment Granularity: The unit of data separation for distributed processing
  • Data Range: Limits the range of data to load on a periodic basis; it also determines the number of reducers in the later MR job
  • Rollup: Whether to pre-aggregate data when loading. Disable it if you want the data source saved as-is; enable it if you need query performance
  • Advanced Setting: Other settings can be added in JSON format. Classpath settings for library conflicts and memory settings for out-of-memory issues are the most common (see the example below)
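
For example, the classpath and memory settings mentioned above are usually passed as Hadoop job properties. The fragment below is illustrative only; the exact keys and values depend on your Hadoop and Druid setup:

{
  "jobProperties": {
    "mapreduce.job.classloader": "true",
    "mapreduce.map.memory.mb": "4096",
    "mapreduce.reduce.memory.mb": "8192"
  }
}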

Finally, enter the name and description of the data source to begin creating it.

Summary

To summarize, when you create a data source, the column types you set must match the actual data. Pay particular attention to the timestamp format. Also remember that you may need to set the classpath or memory in the Druid ingestion options.

In the next post, I’ll cover what you need to know when loading fails. See you then!

Did you create a workbook with Metatron Discovery?
Then you can share your insightful workbook with your colleagues.

1. Create a shared workspace

The first thing you need to do is create a shared workspace.
Go to the workspace list and click the button on the Shared Workspace page.
Then just type a name and description.


You need to set up the permission schema, but if you want to give members the same permissions for workbooks, notebooks, and workbench, the defaults will suffice.

 
2. Add members and groups

Click the icon at the top-right of the shared workspace home, and click Set shared member & group.

Simply click the names and groups you want to share your workbook with.


3. Move your workbook to the shared workspace

Click the checkbox on the workbook, then click Move selections.


If you can’t find the workspace you want to move it to, check that the data source used in the workbook has been published to that workspace.


The workbook has been shared in 3 simple steps.
I hope you can share your insights with Metatron Discovery.

If you’ve analyzed your business data and created a great dashboard or chart, you might want to include it in another system or web page.
Fortunately, Metatron Discovery provides a way to embed dashboards or charts in other applications.
First, let’s assume that a dashboard and chart have already been created by a particular user, ‘metatron’.

1. Check the ID of a chart or dashboard

Use your browser’s developer tools to check the ID of each dashboard and chart.
These IDs are used for subsequent API calls.

2. Authenticate users and get tokens

Metatron Discovery allows users to issue OAuth Tokens and proceed with the authentication process.
To do this, Discovery provides an API to generate and deliver JWT-style tokens.

(POST) /oauth/token

Headers:
  • Authorization : Base64-encoded clientId/secret information. Discovery default client sample: “Basic cG9sYXJpc19jbGllbnQ6cG9sYXJpcw==”

Parameters:
  • grant_type : set to “password”
  • scope : set to “read”
  • username : target username
  • password : password
Sample Request
var data = new FormData();
data.append("grant_type", "password");
data.append("scope", "read");
data.append("username", "metatron");
data.append("password", "metatron");

var xhr = new XMLHttpRequest();
xhr.withCredentials = true;
xhr.open("POST", "https://dev-discovery.metatron.app/oauth/token");
xhr.setRequestHeader("Authorization", "Basic cG9sYXJpc19jbGllbnQ6cG9sYXJpcw==");
xhr.send(data);
Sample Response
{
  "access_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE1NTYxMjY5MTUsInVzZXJfbmFtZSI6ImFkbWluIiwiYXV0aG9yaXRpZXMiOlsiUEVSTV9TWVNURU1fTUFOQUdFX0RBVEFTT1VSQ0UiLCJQRVJNX1NZU1RFTV9NQU5BR0VfUFJJVkFURV9XT1JLU1BBQ0UiLCJQRVJNX1NZU1RFTV9NQU5BR0VfVVNFUiIsIlBFUk1fU1lTVEVNX01BTkFHRV9TWVNURU0iLCJfX1BFUk1JU1NJT05fTUFOQUdFUiIsIl9fQURNSU4iLCJQRVJNX1NZU1RFTV9NQU5BR0VfU0hBUkVEX1dPUktTUEFDRSIsIl9fU0hBUkVEX1VTRVIiLCJQRVJNX1NZU1RFTV9WSUVXX1dPUktTUEFDRSIsIl9fREFUQV9NQU5BR0VSIiwiUEVSTV9TWVNURU1fTUFOQUdFX01FVEFEQVRBIiwiUEVSTV9TWVNURU1fTUFOQUdFX1dPUktTUEFDRSIsIl9fUFJJVkFURV9VU0VSIl0sImp0aSI6IjA2MWIxMGIzLWMzNTYtNGFkMC05YWI1LTA3MDJhYWI5MzVjMyIsImNsaWVudF9pZCI6ImpvdnRLciIsInNjb3BlIjpbInJlYWQiXX0.yqJhBDHZ3U6t2e9g2v6SlOcSyUn6JsMRHjSwDwsdiA4",
  "token_type": "bearer",
  "refresh_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VyX25hbWUiOiJhZG1pbiIsInNjb3BlIjpbInJlYWQiXSwiYXRpIjoiMDYxYjEwYjMtYzM1Ni00YWQwLTlhYjUtMDcwMmFhYjkzNWMzIiwiZXhwIjoxNTU4Njc1NzE1LCJhdXRob3JpdGllcyI6WyJQRVJNX1NZU1RFTV9NQU5BR0VfREFUQVNPVVJDRSIsIlBFUk1fU1lTVEVNX01BTkFHRV9QUklWQVRFX1dPUktTUEFDRSIsIlBFUk1fU1lTVEVNX01BTkFHRV9VU0VSIiwiUEVSTV9TWVNURU1fTUFOQUdFX1NZU1RFTSIsIl9fUEVSTUlTU0lPTl9NQU5BR0VSIiwiX19BRE1JTiIsIlBFUk1fU1lTVEVNX01BTkFHRV9TSEFSRURfV09SS1NQQUNFIiwiX19TSEFSRURfVVNFUiIsIlBFUk1fU1lTVEVNX1ZJRVdfV09SS1NQQUNFIiwiX19EQVRBX01BTkFHRVIiLCJQRVJNX1NZU1RFTV9NQU5BR0VfTUVUQURBVEEiLCJQRVJNX1NZU1RFTV9NQU5BR0VfV09SS1NQQUNFIiwiX19QUklWQVRFX1VTRVIiXSwianRpIjoiYzM4ZjgzYWQtYmNkMy00MzZhLTkzOWUtYjE3NzE5N2UwMWE4IiwiY2xpZW50X2lkIjoiam92dEtyIn0.NsJabkUQLmaLnnd7sPlzVecJfGNMQFDEVUd79J0D14Q",
  "expires_in": 43199,
  "scope": "read",
  "jti": "061b10b3-c356-4ad0-9ab5-0702aab935c3"
}

3. Include dashboards and charts in iFrame

Discovery provides APIs for passing charts and dashboard pages with authentication information.
This allows you to include widgets in external applications.

(POST) /api/sso

Parameters:
  • token : OAuth token
  • refreshToken : OAuth refresh token
  • type : set to “bearer”
  • userId : username to view
  • forwardUrl : embedded URL to forward to
      – Chart : http://{discovery domain}/app/v2/embedded/page/{chart_widget_id}
      – Dashboard : http://{discovery domain}/app/v2/embedded/dashboard/{dashboard_id}
Sample Request
http://{discovery domain}/api/sso?token=token&refreshToken=refreshToken&type=bearer&userId=metatron&forwardUrl=http://discovery.metatron.app/app/v2/embedded/dashboard/id
Iframe Sample Code

(for JavaScript)

function openMetatron(token, refreshToken, type, userId, redirectUri) {
    var target = 'metatron';
    var formName = 'metatronForm';
    let existForm = document.getElementsByName(formName)[0];
    if (existForm) {
        existForm.remove();
    }
    let form = document.createElement('form');
    form.setAttribute('name', formName);
    form.setAttribute('method', 'post');
    form.setAttribute('action', 'https://discovery.metatron.app/api/sso?token='+token+'&refreshToken='+refreshToken+'&type='+type+'&userId='+userId+'&forwardUrl='+redirectUri);
    form.setAttribute('target', target);
    document.getElementsByTagName('body')[0].appendChild(form);
    window.open('', target);
    form.submit();
}

(for TypeScript)

private openMetatron(returnUrl: string) {
    const target = 'metatron';
    const formName = 'metatronForm';
    const token = this.cookieService.get(CookieConstant.KEY.METATRON_TOKEN);
    const refreshToken = this.cookieService.get(CookieConstant.KEY.METATRON_REFRESH_TOKEN);
    const type = this.cookieService.get(CookieConstant.KEY.METATRON_TOKEN_TYPE);
    const userId = this.cookieService.get(CookieConstant.KEY.USER_ID);
    let existForm = document.getElementsByName(formName)[0];
    if (existForm) {
        existForm.remove();
    }
    let form = document.createElement('form');
    form.setAttribute('name', formName);
    form.setAttribute('method', 'post');
    form.setAttribute('action', `https://discovery.metatron.app/api/sso?token=${token}&refreshToken=${refreshToken}&type=${type}&userId=${userId}&forwardUrl=${returnUrl}`);
    form.setAttribute('target', target);
    document.getElementsByTagName('body')[0].appendChild(form);
    window.open('', target);
    form.submit();
}
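
Note that the samples above open the embedded page in a new browser window named metatron via window.open. To render it inside an actual iframe instead, give the iframe the same name and drop the window.open call; the form’s target attribute then posts the SSO request into the frame. A minimal sketch:

<!-- The form's target="metatron" resolves to this frame, so the forwarded
     dashboard renders here instead of in a new window. -->
<iframe name="metatron" width="100%" height="800"></iframe>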

4. It’s finished!

The process above allows you to embed a dashboard or chart, as shown in the image below.