ZEMOSO ENGINEERING STUDIO
November 30, 2018
2 min read

Deploying Airflow on Kubernetes 

Apache Airflow is an open-source tool, originally developed at Airbnb, for managing complex workflows as Directed Acyclic Graphs (DAGs). It can manage complex, repetitive workflows with relatively little code, and it handles scheduling and task delegation.

Kubernetes is an open-source container orchestration system, originally developed at Google, for managing production environments at very large scale.

[Figure: Architecture]

Why Airflow on Kubernetes?

Kubernetes is a widely used way to run containers in production, and Airflow is a tool for managing complex workflows at scale. Airflow provides the KubernetesPodOperator, which lets you create and run pods on a Kubernetes cluster.

KubernetesPodOperator is simple: you add it to a DAG, provide the container image name, and it runs. You can also add secrets, volumes, pod affinity, Docker run-time arguments, and so on. But what if you want to run a container whose image name is known only at run time?
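For reference, the static case might look like the sketch below. It is a minimal illustration and not the author's DAG: the task id, pod name, namespace, image, and command are hypothetical, and the import path shown is the Airflow 1.10 contrib location, which differs in later releases. The import is kept inside the function so the settings can be inspected without Airflow installed.

```python
# Static pod settings: everything is fixed when the DAG file is parsed.
# All values here are hypothetical placeholders.
STATIC_POD_KWARGS = {
    "task_id": "run_fixed_image",
    "name": "fixed-image-pod",
    "namespace": "default",
    "image": "python:3.11-slim",  # the image is hard-coded in the DAG file
    "cmds": ["python", "-c"],
    "arguments": ["print('hello from the pod')"],
    "env_vars": {"GREETING": "hello"},
}


def build_task(dag):
    """Attach a KubernetesPodOperator to `dag` (requires Airflow with Kubernetes support)."""
    # Assumption: Airflow 1.10 contrib import path; newer versions ship the
    # operator in the cncf.kubernetes provider package instead.
    from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

    return KubernetesPodOperator(dag=dag, **STATIC_POD_KWARGS)
```

Because the image is part of the DAG definition, changing it means editing and redeploying the DAG file.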

[Figure: Execution of KubernetesPodOperator]

The Airflow run time calls the KubernetesPodOperator's execute method, which creates the pod with the container image we specified.

Then, how do we do that?

Airflow has the concept of DagRun configuration. With it, we can pass run-time configuration to a specific Airflow DAG run. That means we can modify the above execute method of KubernetesPodOperator to read its settings from that configuration, like this.

[Figure: Modified execution of KubernetesPodOperator]
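The idea can be sketched in a few lines. This is an illustration under stated assumptions rather than the code from the linked repository: `StandInPodOperator` is a hypothetical stand-in for KubernetesPodOperator so the example runs without Airflow, and the list of overridable fields is an assumption. The only Airflow behavior relied on is that the execute context carries a `dag_run` object whose `conf` attribute holds the run-time configuration.

```python
from types import SimpleNamespace

# Fields that a DAG run is allowed to override (an assumption for this sketch).
OVERRIDABLE_FIELDS = ("image", "cmds", "arguments", "env_vars")


class StandInPodOperator:
    """Hypothetical stand-in for KubernetesPodOperator; runs without Airflow."""

    def __init__(self, image, cmds=None, arguments=None, env_vars=None):
        self.image = image
        self.cmds = cmds or []
        self.arguments = arguments or []
        self.env_vars = env_vars or {}

    def execute(self, context):
        # The real operator would create the pod here; this one only reports
        # the settings the pod would be created with.
        return {field: getattr(self, field) for field in OVERRIDABLE_FIELDS}


class RuntimeConfiguredPodOperator(StandInPodOperator):
    """Applies overrides from the DagRun configuration before creating the pod."""

    def execute(self, context):
        dag_run = context.get("dag_run")
        # conf may be missing or None when the run was triggered without it.
        run_conf = getattr(dag_run, "conf", None) or {}
        for field in OVERRIDABLE_FIELDS:
            if field in run_conf:
                setattr(self, field, run_conf[field])
        return super().execute(context)


# Simulate a DAG run triggered with a configuration payload.
operator = RuntimeConfiguredPodOperator(image="busybox:1.36", cmds=["echo"])
context = {"dag_run": SimpleNamespace(conf={"image": "alpine:3.19", "arguments": ["hi"]})}
pod_settings = operator.execute(context)
print(pod_settings)
```

Fields absent from the run configuration keep the values given in the DAG file, so a run triggered without any configuration behaves like the unmodified operator.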

The result: the Docker image name, commands, arguments, and environment variables can all be passed at run time via the REST Application Programming Interface (API) or the command line.

Instead of running as below:

We run it as follows: 
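As a hedged sketch of the difference, the helper below builds the two command lines: one that triggers a DAG with no configuration and one that passes a JSON payload through `--conf`. The DAG id `pod_dag` and the payload keys are hypothetical. The subcommand is `airflow trigger_dag` in Airflow 1.x and `airflow dags trigger` in Airflow 2.x.

```python
import json
import shlex


def trigger_command(dag_id, conf=None, airflow_major=1):
    """Build the CLI arguments that trigger `dag_id`, optionally with a conf payload."""
    if airflow_major >= 2:
        args = ["airflow", "dags", "trigger", dag_id]
    else:
        args = ["airflow", "trigger_dag", dag_id]
    if conf is not None:
        # The payload is passed as a JSON string and surfaces as dag_run.conf.
        args += ["--conf", json.dumps(conf)]
    return args


# Hypothetical DAG id and payload, for illustration only.
fixed = trigger_command("pod_dag")
dynamic = trigger_command("pod_dag", {"image": "alpine:3.19", "arguments": ["hi"]})
print(shlex.join(fixed))
print(shlex.join(dynamic))
```

The same JSON object would serve as the configuration payload when triggering the run through the REST API.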

Feel free to check the code at: https://github.com/saivarunr/KubernetesPodOperator

An earlier version of this blog was published on Medium by the author.

