ZEMOSO ENGINEERING STUDIO
November 30, 2018
2 min read

Deploying Airflow on Kubernetes 

Apache Airflow is an open-source tool, originally created at Airbnb, for managing complex workflows defined as Directed Acyclic Graphs (DAGs). It can manage complex, repetitive workflows with relatively little code, and it also takes care of scheduling and delegating tasks.

Kubernetes is an open-source system, originally designed by Google, for managing containerized applications in production at very large scale.

[Figure: Architecture]

Why Airflow on Kubernetes?

Kubernetes is one of the most widely used ways to run containers in production, and Airflow is a tool for managing complex workflows at scale, so the two fit together naturally. Airflow provides the KubernetesPodOperator, which lets a task create and run a pod on a Kubernetes cluster.

Using the KubernetesPodOperator is simple: you add it to a DAG, provide the container image, and it runs. You can also configure secrets, volumes, pod affinity, container run-time arguments, and more. But what if the image to run is known only at run time?

[Figure: Execution of KubernetesPodOperator]

At run time, Airflow calls the KubernetesPodOperator's execute method, which creates the pod from the image specified when the DAG was written.

So how do we supply the image at run time instead?

Airflow has the concept of a DagRun configuration: when you trigger a DAG, you can pass a run-time configuration that applies only to that specific DAG run. That means we can modify the execute method of the KubernetesPodOperator described above so that it reads the pod settings from this configuration.
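A minimal, framework-free sketch of this idea follows. It is not the code from the linked repository: the helper name resolve_pod_settings and the allowed keys (image, cmds, arguments, env_vars, named after the KubernetesPodOperator parameters of the same names) are illustrative assumptions. The helper layers the values passed in a DagRun configuration over the defaults the operator was defined with.

```python
from typing import Any, Dict, Mapping, Optional

# Illustrative assumption: the conf keys a trigger may override, named after
# KubernetesPodOperator's image, cmds, arguments and env_vars parameters.
OVERRIDABLE_KEYS = ("image", "cmds", "arguments", "env_vars")


def resolve_pod_settings(
    defaults: Mapping[str, Any], conf: Optional[Mapping[str, Any]]
) -> Dict[str, Any]:
    """Return pod settings with DagRun conf values layered over static defaults."""
    conf = conf or {}  # a run triggered without any conf may carry None
    unknown = set(conf) - set(OVERRIDABLE_KEYS)
    if unknown:
        # Fail loudly on typos such as "imgae" instead of silently ignoring them.
        raise ValueError(f"unsupported conf keys: {sorted(unknown)}")
    settings = dict(defaults)  # copy so the operator's defaults stay untouched
    for key in OVERRIDABLE_KEYS:
        if key in conf:
            settings[key] = conf[key]
    if not settings.get("image"):
        raise ValueError("no image in the operator defaults or the DagRun conf")
    return settings


if __name__ == "__main__":
    defaults = {"image": "busybox", "cmds": ["echo"], "arguments": ["default run"]}
    print(resolve_pod_settings(defaults, {"image": "alpine:3.18", "arguments": ["hello"]}))
```

Inside a running task, the DagRun configuration is normally available as context["dag_run"].conf, so a hypothetical subclass of KubernetesPodOperator could call a helper like this at the top of its execute method, copy the result onto attributes such as self.image and self.cmds, and then call the parent execute. How attributes like env_vars are stored differs between Airflow versions, so check the operator source for the version you run. Rejecting unknown keys is a deliberate choice: a misspelled key in the trigger payload fails the task instead of quietly running the default image.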

[Figure: Modified execution of KubernetesPodOperator]

As a result, the Docker image name, commands, arguments, and environment variables can all be passed at run time, through either the REST Application Programming Interface (API) or the command line.
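On the triggering side, the run configuration is just a JSON object. The sketch below uses hypothetical helper names, a hypothetical DAG id (dynamic_pod_dag), and illustrative keys named after the operator's image, cmds, arguments, and env_vars parameters. It builds that object and wraps it for the two entry points mentioned above: the JSON body for Airflow 2's stable REST API (POST /api/v1/dags/{dag_id}/dagRuns, which takes the run configuration under a top-level "conf" key) and an airflow dags trigger command line. It assumes Airflow 2 syntax; the Airflow 1.10 CLI used airflow trigger_dag with -c instead.

```python
import json
import shlex
from typing import Dict, List, Optional


def build_dag_run_conf(
    image: str,
    cmds: Optional[List[str]] = None,
    arguments: Optional[List[str]] = None,
    env_vars: Optional[Dict[str, str]] = None,
) -> Dict[str, object]:
    """Build the per-run conf; the key names are illustrative assumptions."""
    conf: Dict[str, object] = {"image": image}
    # Only include optional settings that were actually supplied.
    if cmds:
        conf["cmds"] = cmds
    if arguments:
        conf["arguments"] = arguments
    if env_vars:
        conf["env_vars"] = env_vars
    return conf


def rest_body(conf: Dict[str, object]) -> str:
    """JSON body for POST /api/v1/dags/{dag_id}/dagRuns (Airflow 2 stable API)."""
    return json.dumps({"conf": conf})


def cli_command(dag_id: str, conf: Dict[str, object]) -> str:
    """Shell-quoted Airflow 2 CLI call (Airflow 1.10: airflow trigger_dag -c ...)."""
    return shlex.join(["airflow", "dags", "trigger", dag_id, "--conf", json.dumps(conf)])


if __name__ == "__main__":
    conf = build_dag_run_conf("alpine:3.18", cmds=["sh", "-c"], arguments=["echo hello"])
    print(rest_body(conf))
    print(cli_command("dynamic_pod_dag", conf))
```

Quoting the CLI call with shlex.join keeps the JSON intact when it contains spaces or quotes, which is easy to get wrong when the command is assembled by hand.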

Instead of running as below:

We run it as follows: 

Feel free to check the code at: https://github.com/saivarunr/KubernetesPodOperator

An earlier version of this blog post was published by the author on Medium.

© 2021 Zemoso Technologies
September 19, 2022