flynn.gg

Christopher Flynn

Data Scientist
Data Engineer
PhD Mathematician


Airflow (1.9) Admin Extensions

2018-10-18

I’ve been maintaining an Apache Airflow cluster in production for the last six months. In that time, the number of automated jobs has grown from about 20 to nearly 100. The jobs ingest data from over 30 sources and run as distributed workflows on Celery workers, with some work offloaded to AWS ECS Fargate and to Spark clusters on our Databricks platform.

Many of the existing DAGs were created from legacy ingestion jobs that had been written carelessly for single use cases. Onboarding a new client for an existing data source meant a lot of extra work: generalizing the code for that source’s data retrieval, and generalizing the structure of the service’s credentials, which are stored as an Airflow Variable in the backend metastore (PostgreSQL).
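One way to picture a generalized credential structure is a single JSON document keyed by client, so that onboarding a client means adding an entry rather than editing code. The sketch below is an assumption for illustration, not the structure used in production: the key names (`clients`, `api_key`, `account_id`) and the helper `credentials_for` are hypothetical. In Airflow, a JSON string like this could be stored as a Variable and loaded with `Variable.get(..., deserialize_json=True)`; here it is parsed with the standard library so the example runs on its own.

```python
import json

# Hypothetical JSON payload, as it might be stored in one Airflow Variable per service.
SERVICE_A_VARIABLE = """
{
    "clients": {
        "client_one": {"api_key": "key-1", "account_id": "1001"},
        "client_two": {"api_key": "key-2", "account_id": "1002"}
    }
}
"""


def credentials_for(raw_variable, client):
    """Return the credential dict for one client, or raise KeyError if unknown."""
    config = json.loads(raw_variable)
    try:
        return config["clients"][client]
    except KeyError:
        raise KeyError("no credentials configured for client %r" % client)


creds = credentials_for(SERVICE_A_VARIABLE, "client_two")
print(creds["account_id"])
```

With a layout like this, the ingestion code only needs the client name to find its credentials, and adding a client does not require a code change.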

Regularly updating the codebase in this way was time consuming. In addition, some of these tasks involved provisioning or modifying infrastructure elsewhere in the backend stack, or adding configuration or credential entries to an external centralized database. Since Airflow had already become the core component of our ETL, it made sense to extend its web UI with functionality that automates these processes.

The Airflow web application is built in Flask, and the UI is exposed using Flask-Admin. Airflow supports plugins that extend it with additional admin views, blueprints, and templates. Since figuring out how to extend the web UI, I’ve used this feature extensively to automate the addition (and validation) of new credential sets, the creation of new ingestion DAGs, and the configuration and provisioning of backend components.
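As a rough idea of what validating a new credential set can look like before anything is saved, here is a minimal sketch. It is not the validation used in production: the `REQUIRED_FIELDS` mapping, its field names, and the `validate_credentials` helper are all hypothetical, and a real check would likely also test the credentials against the service itself.

```python
# Hypothetical required fields per service; real services will differ.
REQUIRED_FIELDS = {
    "service_a": ("api_key", "account_id"),
    "service_b": ("username", "password"),
}


def validate_credentials(service, payload):
    """Return a list of human-readable problems; an empty list means valid."""
    if service not in REQUIRED_FIELDS:
        return ["unknown service: %s" % service]
    errors = []
    for field in REQUIRED_FIELDS[service]:
        value = payload.get(field, "")
        # Treat missing, non-string, and whitespace-only values as invalid.
        if not isinstance(value, str) or not value.strip():
            errors.append("missing or empty field: %s" % field)
    return errors


print(validate_credentials("service_a", {"api_key": "abc", "account_id": " "}))
```

Returning a list of problems rather than raising on the first one lets an admin view show every issue with a submitted form at once.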

Here is a simple example of a plugin that adds a Services tab, containing a Service Management link, to the Airflow admin UI menu at the top of the web view. It consists of a Flask blueprint and an instance of a subclassed Flask-Admin BaseView that holds your endpoints and functionality.

from airflow.plugins_manager import AirflowPlugin
from flask import Blueprint
from flask_admin import BaseView
from flask_admin import expose


# example dynamic data for the view
SERVICES = [
    "service_a",
    "service_b",
    "service_c",
]

# This blueprint is necessary to expose the templates folder to the admin view.
service_blueprint = Blueprint(
    "services", __name__,
    template_folder="templates",  # templates folder relative to this file
    static_folder="static",  # static folder relative to this file
    static_url_path="/static/self_service"
)


class ServiceView(BaseView):

    @expose("/")
    def services(self):
        # business logic here
        return self.render("main.html", content=SERVICES)

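    @expose("/<service>", methods=["GET", "POST"])
    def service(self, service):
        # business logic here
        # placeholder context (an assumption): pass the service name to the template
        content = {"service": service}
        return self.render("service.html", content=content)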


service_view = ServiceView(
    category="Services",  # The name of the tab on the airflow top menu
    name="Service Management",  # The name of the drop down item from the tab
    endpoint="extension"  # The path of the view, e.g. airflow.example.com/admin/extension
)


class AirflowServicePlugin(AirflowPlugin):
    # The name of your plugin (str)
    name = "service"
    # A list of objects created from a class derived
    # from flask_admin.BaseView
    admin_views = [service_view]
    # A list of Blueprint objects created from flask.Blueprint
    flask_blueprints = [service_blueprint]
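The "business logic here" placeholders are where the real work goes. One option is to keep that logic in plain functions so it can be tested without a running web server. The sketch below is an assumption, not part of the plugin above: `build_service_context` and the keys it returns are hypothetical. Inside the `service` view it could be called with `request.form.to_dict()`, and a `None` result could be turned into a 404 with `flask.abort(404)`.

```python
SERVICES = ["service_a", "service_b", "service_c"]


def build_service_context(service, form=None):
    """Build the template context for one service page.

    Returns None for an unknown service so the caller can respond with a 404.
    `form` is any mapping of submitted fields (for example, a POST body).
    """
    if service not in SERVICES:
        return None
    submitted = dict(form or {})
    return {
        "service": service,
        "submitted": submitted,
        # True only when the request carried form fields; nothing is persisted here.
        "has_submission": bool(submitted),
    }


print(build_service_context("service_a", {"api_key": "abc"}))
print(build_service_context("unknown_service"))
```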

To keep Airflow’s existing layout, you can extend the admin/master.html template that Airflow uses for all its admin pages. Here’s the templates/main.html Jinja2 template, for example.

{% extends 'admin/master.html' %}
{% block body %}
    <h1>Services</h1>
    <h3>Select a service to modify configs</h3>
    {% for entity in content %}
        <p><a href="{{ url_for('extension.service', service=entity) }}">{{ entity }}</a></p>
    {% endfor %}
{% endblock %}

Airflow’s web interface also uses Bootstrap, so you can include the Bootstrap library’s components in your templates to make them look a bit more consistent with the rest of the admin views.

