Running on Kubernetes
Running on Kubernetes is supported with the provided Helm chart found in the official Superset helm repository.
Prerequisites
- A Kubernetes cluster
- Helm installed
Running
- Add the Superset helm repository
helm repo add superset https://apache.github.io/superset
"superset" has been added to your repositories
- View charts in repo
helm search repo superset
NAME CHART VERSION APP VERSION DESCRIPTION
superset/superset 0.1.1 1.0 Apache Superset is a modern, enterprise-ready b...
- Configure your setting overrides
Just like any typical Helm chart, you'll need to craft a values.yaml
file that would define/override any of the values exposed into the default values.yaml, or from any of the dependent charts it depends on:
More info down below on some important overrides you might need.
- Install and run
helm upgrade --install --values my-values.yaml superset superset/superset
You should see various pods popping up, such as:
kubectl get pods
NAME READY STATUS RESTARTS AGE
superset-celerybeat-7cdcc9575f-k6xmc 1/1 Running 0 119s
superset-f5c9c667-dw9lp 1/1 Running 0 4m7s
superset-f5c9c667-fk8bk 1/1 Running 0 4m11s
superset-init-db-zlm9z 0/1 Completed 0 111s
superset-postgresql-0 1/1 Running 0 6d20h
superset-redis-master-0 1/1 Running 0 6d20h
superset-worker-75b48bbcc-jmmjr 1/1 Running 0 4m8s
superset-worker-75b48bbcc-qrq49 1/1 Running 0 4m12s
The exact list will depend on some of your specific configuration overrides but you should generally expect:
- N
superset-xxxx-yyyy
andsuperset-worker-xxxx-yyyy
pods (depending on yourreplicaCount
value) - 1
superset-postgresql-0
depending on your postgres settings - 1
superset-redis-master-0
depending on your redis settings - 1
superset-celerybeat-xxxx-yyyy
pod if you havesupersetCeleryBeat.enabled = true
in your values overrides
- Access it
The chart will publish appropriate services to expose the Superset UI internally within your k8s cluster. To access it externally you will have to either:
- Configure the Service as a
LoadBalancer
orNodePort
- Set up an
Ingress
for it - the chart includes a definition, but will need to be tuned to your needs (hostname, tls, annotations etc...) - Run
kubectl port-forward superset-xxxx-yyyy :8088
to directly tunnel one pod's port into your localhost
Depending how you configured external access, the URL will vary. Once you've identified the appropriate URL you can log in with:
- user:
admin
- password:
admin
Important settings
Security settings
Default security settings and passwords are included but you SHOULD override those with your own, in particular:
postgresql:
postgresqlPassword: superset
Make sure, you set a unique strong complex alphanumeric string for your SECRET_KEY and use a tool to help you generate a sufficiently random sequence.
- To generate a good key you can run,
openssl rand -base64 42
configOverrides:
secret: |
SECRET_KEY = 'YOUR_OWN_RANDOM_GENERATED_SECRET_KEY'
If you want to change the previous secret key then you should rotate the keys.
Default secret key for kubernetes deployment is thisISaSECRET_1234
configOverrides:
my_override: |
PREVIOUS_SECRET_KEY = 'YOUR_PREVIOUS_SECRET_KEY'
SECRET_KEY = 'YOUR_OWN_RANDOM_GENERATED_SECRET_KEY'
init:
command:
- /bin/sh
- -c
- |
. {{ .Values.configMountPath }}/superset_bootstrap.sh
superset re-encrypt-secrets
. {{ .Values.configMountPath }}/superset_init.sh
Dependencies
Install additional packages and do any other bootstrap configuration in this script. For production clusters it's recommended to build own image with this step done in CI. The following example installs the Big Query and Elasticsearch database drivers so that you can connect to those datasources in your Superset installation.
bootstrapScript: |
#!/bin/bash
pip install psycopg2==2.9.1 \
redis==3.2.1 \
pybigquery==2.26.0 \
elasticsearch-dbapi==0.2.5 &&\
if [ ! -f ~/bootstrap ]; then echo "Running Superset with uid {{ .Values.runAsUser }}" > ~/bootstrap; fi
superset_config.py
The default superset_config.py
is fairly minimal and you will very likely need to extend it. This is done by specifying one or more key/value entries in configOverrides
, e.g.:
configOverrides:
my_override: |
# This will make sure the redirect_uri is properly computed, even with SSL offloading
ENABLE_PROXY_FIX = True
FEATURE_FLAGS = {
"DYNAMIC_PLUGINS": True
}
Those will be evaluated as Helm templates and therefore will be able to reference other values.yaml
variables e.g. {{ .Values.ingress.hosts[0] }}
will resolve to your ingress external domain.
The entire superset_config.py
will be installed as a secret, so it is safe to pass sensitive parameters directly... however it might be more readable to use secret env variables for that.
Full python files can be provided by running helm upgrade --install --values my-values.yaml --set-file configOverrides.oauth=set_oauth.py
Environment Variables
Those can be passed as key/values either with extraEnv
or extraSecretEnv
if they're sensitive. They can then be referenced from superset_config.py
using e.g. os.environ.get("VAR")
.
extraEnv:
SMTP_HOST: smtp.gmail.com
SMTP_USER: user@gmail.com
SMTP_PORT: "587"
SMTP_MAIL_FROM: user@gmail.com
extraSecretEnv:
SMTP_PASSWORD: xxxx
configOverrides:
smtp: |
import ast
SMTP_HOST = os.getenv("SMTP_HOST","localhost")
SMTP_STARTTLS = ast.literal_eval(os.getenv("SMTP_STARTTLS", "True"))
SMTP_SSL = ast.literal_eval(os.getenv("SMTP_SSL", "False"))
SMTP_USER = os.getenv("SMTP_USER","superset")
SMTP_PORT = os.getenv("SMTP_PORT",25)
SMTP_PASSWORD = os.getenv("SMTP_PASSWORD","superset")
System packages
If new system packages are required, they can be installed before application startup by overriding the container's command
, e.g.:
supersetWorker:
command:
- /bin/sh
- -c
- |
apt update
apt install -y somepackage
apt autoremove -yqq --purge
apt clean
# Run celery worker
. {{ .Values.configMountPath }}/superset_bootstrap.sh; celery --app=superset.tasks.celery_app:app worker
Data sources
Data source definitions can be automatically declared by providing key/value yaml definitions in extraConfigs
:
extraConfigs:
import_datasources.yaml: |
databases:
- allow_file_upload: true
allow_ctas: true
allow_cvas: true
database_name: example-db
extra: "{\r\n \"metadata_params\": {},\r\n \"engine_params\": {},\r\n \"\
metadata_cache_timeout\": {},\r\n \"schemas_allowed_for_file_upload\": []\r\n\
}"
sqlalchemy_uri: example://example-db.local
tables: []
Those will also be mounted as secrets and can include sensitive parameters.
Configuration Examples
Setting up OAuth
extraEnv:
AUTH_DOMAIN: example.com
extraSecretEnv:
GOOGLE_KEY: xxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.apps.googleusercontent.com
GOOGLE_SECRET: xxxxxxxxxxxxxxxxxxxxxxxx
configOverrides:
enable_oauth: |
# This will make sure the redirect_uri is properly computed, even with SSL offloading
ENABLE_PROXY_FIX = True
from flask_appbuilder.security.manager import (AUTH_OAUTH, AUTH_DB)
AUTH_TYPE = AUTH_OAUTH
OAUTH_PROVIDERS = [
{
"name": "google",
"icon": "fa-google",
"token_key": "access_token",
"remote_app": {
"client_id": os.getenv("GOOGLE_KEY"),
"client_secret": os.getenv("GOOGLE_SECRET"),
"api_base_url": "https://www.googleapis.com/oauth2/v2/",
"client_kwargs": {"scope": "email profile"},
"request_token_url": None,
"access_token_url": "https://accounts.google.com/o/oauth2/token",
"authorize_url": "https://accounts.google.com/o/oauth2/auth",
"authorize_params": {"hd": os.getenv("AUTH_DOMAIN", "")}
},
}
]
# Map Authlib roles to superset roles
AUTH_ROLE_ADMIN = 'Admin'
AUTH_ROLE_PUBLIC = 'Public'
# Will allow user self registration, allowing to create Flask users from Authorized User
AUTH_USER_REGISTRATION = True
# The default user self registration role
AUTH_USER_REGISTRATION_ROLE = "Admin"
Enable Alerts and Reports
For this, as per the Alerts and Reports doc, you will need to:
Install a supported webdriver in the Celery worker
This is done either by using a custom image that has the webdriver pre-installed, or installing at startup time by overriding the command
. Here's a working example for chromedriver
:
supersetWorker:
command:
- /bin/sh
- -c
- |
# Install chrome webdriver
# See https://github.com/apache/superset/blob/4fa3b6c7185629b87c27fc2c0e5435d458f7b73d/docs/src/pages/docs/installation/email_reports.mdx
apt update
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
apt install -y --no-install-recommends ./google-chrome-stable_current_amd64.deb
wget https://chromedriver.storage.googleapis.com/88.0.4324.96/chromedriver_linux64.zip
unzip chromedriver_linux64.zip
chmod +x chromedriver
mv chromedriver /usr/bin
apt autoremove -yqq --purge
apt clean
rm -f google-chrome-stable_current_amd64.deb chromedriver_linux64.zip
# Run
. {{ .Values.configMountPath }}/superset_bootstrap.sh; celery --app=superset.tasks.celery_app:app worker
Run the Celery beat
This pod will trigger the scheduled tasks configured in the alerts and reports UI section:
supersetCeleryBeat:
enabled: true
Configure the appropriate Celery jobs and SMTP/Slack settings
extraEnv:
SMTP_HOST: smtp.gmail.com
SMTP_USER: user@gmail.com
SMTP_PORT: "587"
SMTP_MAIL_FROM: user@gmail.com
extraSecretEnv:
SLACK_API_TOKEN: xoxb-xxxx-yyyy
SMTP_PASSWORD: xxxx-yyyy
configOverrides:
feature_flags: |
import ast
FEATURE_FLAGS = {
"ALERT_REPORTS": True
}
SMTP_HOST = os.getenv("SMTP_HOST","localhost")
SMTP_STARTTLS = ast.literal_eval(os.getenv("SMTP_STARTTLS", "True"))
SMTP_SSL = ast.literal_eval(os.getenv("SMTP_SSL", "False"))
SMTP_USER = os.getenv("SMTP_USER","superset")
SMTP_PORT = os.getenv("SMTP_PORT",25)
SMTP_PASSWORD = os.getenv("SMTP_PASSWORD","superset")
SMTP_MAIL_FROM = os.getenv("SMTP_MAIL_FROM","superset@superset.com")
SLACK_API_TOKEN = os.getenv("SLACK_API_TOKEN",None)
celery_conf: |
from celery.schedules import crontab
class CeleryConfig(object):
broker_url = f"redis://{env('REDIS_HOST')}:{env('REDIS_PORT')}/0"
imports = ('superset.sql_lab', "superset.tasks", "superset.tasks.thumbnails", )
result_backend = f"redis://{env('REDIS_HOST')}:{env('REDIS_PORT')}/0"
task_annotations = {
'sql_lab.get_sql_results': {
'rate_limit': '100/s',
},
'email_reports.send': {
'rate_limit': '1/s',
'time_limit': 600,
'soft_time_limit': 600,
'ignore_result': True,
},
}
beat_schedule = {
'reports.scheduler': {
'task': 'reports.scheduler',
'schedule': crontab(minute='*', hour='*'),
},
'reports.prune_log': {
'task': 'reports.prune_log',
'schedule': crontab(minute=0, hour=0),
},
'cache-warmup-hourly': {
'task': 'cache-warmup',
'schedule': crontab(minute='*/30', hour='*'),
'kwargs': {
'strategy_name': 'top_n_dashboards',
'top_n': 10,
'since': '7 days ago',
},
}
}
CELERY_CONFIG = CeleryConfig
reports: |
EMAIL_PAGE_RENDER_WAIT = 60
WEBDRIVER_BASEURL = "http://{{ template "superset.fullname" . }}:{{ .Values.service.port }}/"
WEBDRIVER_BASEURL_USER_FRIENDLY = "https://www.example.com/"
WEBDRIVER_TYPE= "chrome"
WEBDRIVER_OPTION_ARGS = [
"--force-device-scale-factor=2.0",
"--high-dpi-support=2.0",
"--headless",
"--disable-gpu",
"--disable-dev-shm-usage",
# This is required because our process runs as root (in order to install pip packages)
"--no-sandbox",
"--disable-setuid-sandbox",
"--disable-extensions",
]