ZEMOSO ENGINEERING STUDIO
June 5, 2022
8 min read

How we built a big data platform for a futuristic AgriTech product

The world population is expected to increase to 9.7 billion by 2050, and the demand for food by 60%. Will the agricultural industry be able to keep up? How will the world ensure food security? AgriTech products are leveraging modern technologies to fill the gap here.  

There's enough data generated by the agricultural industry and adjacent industries to bring predictability to food production forecasts and to create contingency plans. But this data is scattered, siloed, unstructured, and warehoused by different private and public organizations using different metrics, local languages, and formats. These siloed data sets cover production, demand, supply, crop prices, satellite imagery, topography reports, weather forecasts, and soil quality reports. Many companies are trying to break through these barriers.

We worked with one such company and built a comprehensive data and analytics platform that provided insights on demand and supply, climate, and soil conditions. It helped stakeholders across the agriculture value chain make informed decisions, faster.

How did we do it?

Zemoso helped them ingest the diverse data sets, analyze them, and build a Software-as-a-Service (SaaS) platform with dashboards and modeling systems.

The platform combined global and regional data with daily updates, gathered through both scraping and Application Programming Interfaces (APIs); a minimal sketch of one such daily pull follows the list below. The feeds included:

1. Raw satellite imagery such as Normalized Difference Vegetation Index (NDVI)

2. Rainfall and evapotranspiration data from National Oceanic and Atmospheric Administration (NOAA) satellites

3. Data from the United Nations Food and Agriculture Organization (FAO)

4. Data from the World Health Organization (WHO)

5. Commodities market data

6. Data from grain markets

Figure: Agricultural data sets ingested and analyzed to build a SaaS platform
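As a rough illustration of that daily ingestion, a pull from one external feed might look like the sketch below. The endpoint, credentials, and response shape are hypothetical; the real sources and schemas aren't shown here.

```python
# Minimal sketch of a daily API pull; endpoint, key, and response shape are
# hypothetical placeholders, not the actual feeds used on the project.
import datetime as dt

import requests

FEED_URL = "https://example.org/api/v1/rainfall"  # hypothetical endpoint
API_KEY = "REPLACE_ME"                            # hypothetical credential


def fetch_daily_feed(day: dt.date) -> list[dict]:
    """Pull one day's worth of records from an external feed."""
    resp = requests.get(
        FEED_URL,
        params={"date": day.isoformat()},
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=30,
    )
    resp.raise_for_status()          # surface transient upstream failures
    return resp.json()["records"]    # assumed response shape


if __name__ == "__main__":
    records = fetch_daily_feed(dt.date.today())
    print(f"ingested {len(records)} records")
```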

Deep tech solutions applied here include image processing, data pipelines, and distributed cluster architectures.

Our team normalized the data and automated continuous quality checks on everything ingested from external sources. We then implemented the computing logic and built the data processing platform to manage the data at scale.
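As a minimal sketch of what those automated checks can look like, here is a pandas-based pass over one ingested batch; the column names, schema mapping, and thresholds are illustrative, not the client's actual rules.

```python
# Illustrative normalization and quality gate for one ingested batch.
import pandas as pd


def normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Bring an external feed into a common schema and common units."""
    df = df.rename(columns={"rain_mm": "rainfall_mm", "dt": "date"})  # assumed source columns
    df["date"] = pd.to_datetime(df["date"], utc=True)
    df["rainfall_mm"] = df["rainfall_mm"].astype(float)
    return df


def quality_checks(df: pd.DataFrame) -> list[str]:
    """Return human-readable violations; an empty list means the batch passes."""
    problems = []
    if df["date"].isna().any():
        problems.append("missing dates")
    if df.duplicated(subset=["date", "region_id"]).any():
        problems.append("duplicate (date, region) rows")
    if not df["rainfall_mm"].between(0, 500).all():   # sanity range, illustrative
        problems.append("rainfall outside plausible range")
    return problems
```

A batch that produces any violations would be quarantined for review instead of flowing into downstream computations.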

The big data processing challenge

In addition to incrementally processing daily feeds, one recurring challenge was to process massive satellite raster datasets covering the entire globe. This data is vital to analyze the progression of planetary climate and identify patterns that help forecast impending weather conditions. 

We helped our client develop innovative processing techniques: overlaying pre-generated geo-rasters for efficient processing (sketched below), leveraging distributed storage and workers, and scheduling and orchestrating the work.
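To illustrate the overlay idea, the sketch below lays a pre-generated mask raster (for example, a cropland extent) over an incoming scene so that only relevant pixels are processed. It uses rasterio and NumPy and assumes both rasters share the same grid; file layouts and band choices are assumptions.

```python
# Overlay a pre-generated mask raster on an incoming scene and compute a
# statistic over only the pixels of interest. Assumes aligned grids.
import numpy as np
import rasterio


def masked_mean(scene_path: str, mask_path: str) -> float:
    with rasterio.open(scene_path) as scene, rasterio.open(mask_path) as mask:
        values = scene.read(1).astype("float32")   # e.g., a daily NDVI band
        keep = mask.read(1) == 1                   # pre-generated overlay raster
    values[~keep] = np.nan                         # drop pixels outside the mask
    return float(np.nanmean(values))
```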

Instead of relying on a few very large machines, we made the most of many smaller instances, which enabled us to iterate and scale rapidly. We built a processing cluster on an open-source distributed scheduler that could be scaled up and down as we needed to speed up or slow down the work.
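The post doesn't name the scheduler, so purely as an illustration, here is how a small elastic pool of workers might be driven with Dask, one widely used open-source distributed scheduler. The worker counts and the per-tile function are placeholders.

```python
# Fan per-tile work out across a Dask cluster; illustrative only.
from dask.distributed import Client


def process_tile(tile_id: str) -> float:
    # Placeholder for the real per-tile work, e.g., the masked_mean() step above.
    return float(len(tile_id))


if __name__ == "__main__":
    # Local cluster for the sketch; in production this would connect to a
    # remote scheduler whose worker pool is scaled with the backlog.
    client = Client(n_workers=8, threads_per_worker=2)
    futures = client.map(process_tile, [f"tile_{i:05d}" for i in range(10_000)])
    results = client.gather(futures)
    print(len(results), "tiles processed")
    client.close()
```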

We accomplished in days what would otherwise have taken months to process.

Consider this: data that covers every 10 m x 10 m block of the planet, captured every day, for over 15 years.
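A quick back-of-envelope calculation (using round public numbers, not project figures) shows the scale involved:

```python
# Rough scale estimate with round numbers; not figures from the project.
earth_surface_m2 = 510e12                      # ~510 million km^2 of surface
cells_per_day = earth_surface_m2 / (10 * 10)   # 10 m x 10 m blocks -> ~5.1e12
days = 15 * 365                                # ~15 years of daily snapshots
print(f"{cells_per_day * days:.1e} cell-observations")   # on the order of 1e16
```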

We optimized memory usage and right-sized virtual machines (VMs) to ensure that no task was starved of random-access memory (RAM). We used libraries like Mapnik (C++) to re-render these rasters so that consumers could view them and travel back and forth in time over a region. Additionally, we performed multiple layering and mapping optimizations to enable this capability.
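As an illustrative sketch of that rendering step, the snippet below uses Mapnik's Python bindings (the underlying engine is the same C++ library) to turn one date's raster into a viewable image; the paths and the flat styling are assumptions, and "time travel" then amounts to rendering one frame per date.

```python
# Render one date's raster to an image with Mapnik; paths are illustrative.
import mapnik


def render_day(raster_path: str, out_png: str, size: int = 1024) -> None:
    m = mapnik.Map(size, size)

    # A minimal style that just draws the raster band as-is.
    style, rule = mapnik.Style(), mapnik.Rule()
    rule.symbols.append(mapnik.RasterSymbolizer())
    style.rules.append(rule)
    m.append_style("raster", style)

    layer = mapnik.Layer("ndvi")
    layer.datasource = mapnik.Gdal(file=raster_path)  # GDAL-readable raster
    layer.styles.append("raster")
    m.layers.append(layer)

    m.zoom_all()
    mapnik.render_to_file(m, out_png, "png")


# One frame per date gives the "time travel" view over a region:
# render_day("ndvi_2021-06-01.tif", "frames/2021-06-01.png")
```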

The data (including imagery) was fed to other pipelines to compute metrics, which in turn helped create aggregate visualizations, both map-based and chart-based. We generated data and map layers to visualize the data and imagery over time and across the world.
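As a simple sketch of that downstream step, per-region daily metrics can be rolled up into the time series that the charts and map layers consume; the column names and the monthly aggregation here are illustrative.

```python
# Roll per-region daily metrics up into a monthly time series for charting.
import pandas as pd


def regional_timeseries(daily_means: pd.DataFrame) -> pd.DataFrame:
    """Expects columns: date (datetime), region_id, ndvi_mean (one row per region per day)."""
    return (
        daily_means
        .groupby(["region_id", pd.Grouper(key="date", freq="MS")])["ndvi_mean"]
        .mean()
        .reset_index()
        .rename(columns={"ndvi_mean": "monthly_ndvi"})
    )
```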

Everything was packaged as a highly customizable platform that allowed users to select and dissect indicators of interest, compare them, visualize them, and share them.

The future

With the essentials firmly in place, Zemoso transitioned the future growth activities for the platform to the client’s internal team. 

The platform continues to evolve, incorporating new data feeds and capabilities such as Artificial Intelligence (AI)-based learning models. That, coupled with the human insights enabled by powerful data visualization capabilities, makes it a vital industry asset for navigating the uncertainty of the global food and agri-market.

P.S. Since we work on early-stage products, many of them in stealth mode, we operate under strict non-disclosure agreements (NDAs). The data, insights, and capabilities discussed in this blog have been anonymized to protect our client’s identity and don’t include any proprietary information.
