ZEMOSO ENGINEERING STUDIO
August 1, 2022
2 min read

Deploying Airflow on Kubernetes 

Apache Airflow is an open-source tool, originally created at Airbnb, for managing complex workflows as Directed Acyclic Graphs (DAGs). It can express complex, repetitive workflows in relatively little code, and it handles scheduling and task delegation for you.
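To make the DAG idea concrete, here is a minimal sketch in plain Python. It does not use Airflow and is not how Airflow's scheduler is implemented. It only shows how declaring dependencies between tasks is enough to derive a valid run order. The task names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}


def execution_order(dependencies):
    """Return one valid run order; raises CycleError if the graph is not acyclic."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(workflow)
print(order)
```

The "acyclic" part matters: if two tasks depended on each other, no valid order would exist, and `execution_order` would raise `CycleError`.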

Kubernetes is an open-source system, originally developed by Google, for deploying and managing containerized workloads in production at very large scale.

[Figure: Architecture]

Why Airflow on Kubernetes?

Kubernetes is a widely used way to run containers in production, and Airflow is a tool for managing complex workflows at scale. Airflow provides KubernetesPodOperator, which lets a DAG task create and run a pod on a Kubernetes cluster.

KubernetesPodOperator is simple to use: you add it to a DAG, provide the container image name, and it runs. You can also add secrets, volumes, pod affinity, Docker run-time arguments, and so on. But what if you want to run a container whose image name is known only at run time?

[Figure: Execution of KubernetesPodOperator]

At run time, Airflow calls the KubernetesPodOperator's execute method, which creates the pod from the container image name we specified in the DAG.

So how do we supply that name at run time?

Airflow has the concept of DagRun configuration: with it, we can pass run-time configuration to a specific run of an Airflow DAG. That means we can modify the execute method of KubernetesPodOperator to read those values, like this.

[Figure: Modified execution of KubernetesPodOperator]
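As a rough sketch of the idea, assuming nothing about the code in the linked repository: the snippet below uses a stand-in class instead of the real KubernetesPodOperator so that it runs without Airflow installed. The class names, the `OVERRIDABLE` tuple, and the conf keys (`image`, `cmds`, `arguments`, `env_vars`) are hypothetical. The only Airflow behavior relied on is that `execute` receives a context dictionary whose `dag_run` entry exposes the trigger-time configuration as `conf`.

```python
from types import SimpleNamespace


class StaticPodOperator:
    """Stand-in for KubernetesPodOperator (hypothetical, not the Airflow class)."""

    def __init__(self, image, cmds=None, arguments=None, env_vars=None):
        self.image = image
        self.cmds = cmds or []
        self.arguments = arguments or []
        self.env_vars = env_vars or {}

    def execute(self, context):
        # The real operator would create the pod here; this sketch only
        # returns the settings the pod would be launched with.
        return {
            "image": self.image,
            "cmds": list(self.cmds),
            "arguments": list(self.arguments),
            "env_vars": dict(self.env_vars),
        }


class RuntimeConfigPodOperator(StaticPodOperator):
    """Applies DagRun configuration values before launching the pod."""

    # Hypothetical conf keys; the original repository may use other names.
    OVERRIDABLE = ("image", "cmds", "arguments", "env_vars")

    def execute(self, context):
        dag_run = context.get("dag_run")
        conf = getattr(dag_run, "conf", None) or {}
        for key in self.OVERRIDABLE:
            if key in conf:
                setattr(self, key, conf[key])
        return super().execute(context)


# Simulated task context: Airflow passes the DagRun under the "dag_run" key.
context = {
    "dag_run": SimpleNamespace(conf={"image": "python:3.11", "cmds": ["python", "-V"]})
}
operator = RuntimeConfigPodOperator(image="busybox")
pod_settings = operator.execute(context)
print(pod_settings["image"])
```

Values missing from the run configuration fall back to whatever the DAG author set on the operator, so the same DAG still works when triggered without any configuration.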

The result: the Docker image name, commands, arguments, and environment variables can all be passed via the REST API or the command line.

Instead of running as below:

We run it as follows: 
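A run-time trigger might be assembled roughly as follows. This is a sketch under stated assumptions, not the author's exact commands: the DAG id `runtime_pod_dag` and the conf keys are hypothetical, and the CLI form shown is the Airflow 2.x `airflow dags trigger --conf` syntax (Airflow 1.x used `airflow trigger_dag -c`). The snippet only builds the command and the request body; it does not call Airflow.

```python
import json
import shlex


def build_trigger(dag_id, conf):
    """Build a CLI argument list and a REST request body for triggering a DAG run."""
    payload = json.dumps(conf)
    # Airflow 2.x CLI form; the run configuration is passed as a JSON string.
    cli = ["airflow", "dags", "trigger", "--conf", payload, dag_id]
    # Body for a REST trigger request; the configuration goes under "conf".
    rest_body = {"conf": conf}
    return cli, rest_body


# Hypothetical DAG id and conf keys, for illustration only.
run_conf = {
    "image": "python:3.11",
    "cmds": ["python", "-c"],
    "arguments": ["print('hello')"],
    "env_vars": {"STAGE": "dev"},
}
cli, rest_body = build_trigger("runtime_pod_dag", run_conf)
print(shlex.join(cli))
print(json.dumps(rest_body))
```

In Airflow 2.x the REST body would be sent as a POST to the stable API's `/api/v1/dags/{dag_id}/dagRuns` endpoint; other versions use different paths.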

Feel free to check the code at: https://github.com/saivarunr/KubernetesPodOperator

An earlier version of this blog was published on Medium by the author.
