Apache Druid
A native connector to Druid ships with Superset (behind the DRUID_IS_ACTIVE flag), but it is
gradually being deprecated in favor of the SQLAlchemy / DBAPI connector made available in the
pydruid library.
The connection string looks like:
druid://<User>:<password>@<Host>:<Port-default-9088>/druid/v2/sql
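As a minimal sketch of filling in that template, the snippet below assembles the URI from its parts and percent-encodes the credentials so that characters such as `@` or `/` in a password do not break the URL. The helper name `build_druid_uri`, the host name, and the credentials are hypothetical placeholders; the default port simply mirrors the template above.

```python
from urllib.parse import quote_plus


def build_druid_uri(user, password, host, port=9088, path="druid/v2/sql"):
    """Assemble a druid:// URI (hypothetical helper, not part of Superset or pydruid)."""
    # Percent-encode the credentials so reserved characters stay unambiguous.
    return f"druid://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{path}"


# Placeholder values for illustration only.
uri = build_druid_uri("admin", "p@ss/word", "druid.example.com")
print(uri)
```

The resulting string is what you would paste into the SQLAlchemy URI field when adding the database.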
Customizing Druid Connection
When adding a connection to Druid, you can customize the connection in a few different ways in the Add Database form.
Custom Certificate
You can add certificates in the Root Certificate field when configuring the new database connection to Druid.
When using a custom certificate, pydruid automatically uses the https scheme.
Disable SSL Verification
To disable SSL verification, add the following to the Extras field:
engine_params:
{"connect_args":
{"scheme": "https", "ssl_verify_cert": false}}
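One way to avoid hand-typing mistakes (for example Python's `False` instead of JSON's `false`) is to generate the value with a short script. This is only a sketch: it assumes the Extras field accepts a JSON object with `engine_params` as a top-level key, which you should confirm against your Superset version.

```python
import json

# Same settings as above, expressed as a Python dict.
# Assumption: the Extras field expects a JSON object keyed by "engine_params".
extras = {
    "engine_params": {
        "connect_args": {"scheme": "https", "ssl_verify_cert": False}
    }
}

# json.dumps converts Python's False into the JSON literal false.
extras_json = json.dumps(extras, indent=2)
print(extras_json)
```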
Aggregations
Common aggregations or Druid metrics can be defined and used in Superset. The first and simpler use case is to use the checkbox matrix exposed in your datasource’s edit view (Sources -> Druid Datasources -> [your datasource] -> Edit -> [tab] List Druid Column).
Clicking the GroupBy and Filterable checkboxes will make the column appear in the related dropdowns while in the Explore view. Checking Count Distinct, Min, Max or Sum will result in creating new metrics that will appear in the List Druid Metric tab upon saving the datasource.
By editing these metrics, you’ll notice that their JSON element corresponds to a Druid aggregation definition. You can create your own aggregations manually from the List Druid Metric tab by following the Druid documentation.
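To illustrate what such a JSON element can look like, the sketch below builds a Druid `longSum` aggregation for a column and serializes it. The column name `added`, the metric naming scheme, and the helper `sum_metric` are hypothetical; check the aggregator type against the Druid documentation for your column's data type.

```python
import json


def sum_metric(column, metric_name=None, kind="longSum"):
    """Build a Druid sum aggregation definition (hypothetical helper)."""
    return {
        "type": kind,                                # e.g. longSum or doubleSum
        "name": metric_name or f"sum__{column}",     # output name of the metric
        "fieldName": column,                         # column being aggregated
    }


# "added" is a placeholder column name.
metric_json = json.dumps(sum_metric("added"))
print(metric_json)
```

The printed string is the kind of value you would paste into the metric's JSON field.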
Post-Aggregations
Druid supports post-aggregation, and this works in Superset. All you have to do is create a metric,
much like you would create an aggregation manually, but specify postagg as the Metric Type. You
then have to provide a valid JSON post-aggregation definition (as specified in the Druid docs) in
the JSON field.
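As a rough example of such a definition, the sketch below builds an `arithmetic` post-aggregation that divides one existing metric by another, using `fieldAccess` entries to reference them. The metric names `sum__added` and `count`, the output name, and the helper `ratio_postagg` are hypothetical; the referenced metrics are assumed to already exist on the datasource.

```python
import json


def ratio_postagg(name, numerator, denominator):
    """Build a Druid arithmetic post-aggregation dividing two metrics (hypothetical helper)."""
    return {
        "type": "arithmetic",
        "name": name,
        "fn": "/",
        "fields": [
            # Each fieldAccess entry points at an existing aggregation by name.
            {"type": "fieldAccess", "name": numerator, "fieldName": numerator},
            {"type": "fieldAccess", "name": denominator, "fieldName": denominator},
        ],
    }


# Placeholder metric names for illustration only.
postagg_json = json.dumps(ratio_postagg("avg_added", "sum__added", "count"), indent=2)
print(postagg_json)
```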