ZEMOSO ENGINEERING STUDIO
June 3, 2022
8 min read

How we built a big data platform for a futuristic AgriTech product

The world population is expected to increase to 9.7 billion by 2050, and the demand for food to rise by 60%. Will the agricultural industry be able to keep up? How will the world ensure food security? AgriTech products are leveraging modern technologies to fill that gap.

There's enough data generated by the agricultural industry and many adjacent industries to bring predictability to food production forecasts and to create contingency plans. But this data is scattered, siloed, unstructured, and warehoused by different private and public organizations using different metrics, local languages, and formats. These siloed data sets include production, demand, supply, crop prices, satellite images, topography reports, weather forecasts, and soil quality reports. Many companies are trying to break through these barriers.

We worked with one such company and built a comprehensive data and analytics platform that provided insights on demand and supply, climate, and soil conditions. It helped stakeholders across the agriculture value chain make informed decisions, faster.

How did we do it?

Zemoso helped them ingest the diverse data sets, analyze them, and build a Software-as-a-Service (SaaS) platform with dashboards and modeling systems.

The data included global and regional data with daily updates, gathered through both scraping and Application Programming Interfaces (APIs). It also included:

1. Raw satellite imagery and derived indices such as the Normalized Difference Vegetation Index (NDVI)

2. Rainfall and evapotranspiration data from National Oceanic and Atmospheric Administration (NOAA) satellites

3. Data from the United Nations Food and Agriculture Organization (FAO)

4. Data from the World Health Organization (WHO)

5. Commodities market data

6. Data from grain markets

 Agricultural data sets ingested and analyzed to build a SaaS platform

Deep tech solutions applied here included image processing, data pipelines, and distributed cluster architectures.

Our team normalized the data and automated continuous quality checks on data ingested from external sources. We implemented the computing logic and built the data processing platform for managing the data at scale.
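As a flavor of what normalization plus automated quality checks can look like, here is a minimal sketch. The record shape, unit table, and rejection rules are hypothetical, standing in for the many metrics, languages, and formats the real platform handled:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical unit-conversion table; production covered far more
# metrics, locales, and source formats.
TONNES_PER_UNIT = {"t": 1.0, "kt": 1_000.0, "lb": 0.000453592}

@dataclass
class ProductionRecord:
    region: str
    crop: str
    quantity: float
    unit: str

def normalize(record: ProductionRecord) -> Optional[ProductionRecord]:
    """Convert a raw record to tonnes; reject obviously bad data."""
    factor = TONNES_PER_UNIT.get(record.unit.lower())
    if factor is None:        # unknown unit -> quarantine for review
        return None
    if record.quantity < 0:   # negative production is a feed error
        return None
    return ProductionRecord(
        region=record.region.strip().title(),
        crop=record.crop.strip().lower(),
        quantity=record.quantity * factor,
        unit="t",
    )
```

In the real pipeline, rejected records would be routed to a quarantine store and surfaced in the continuous quality-check dashboards rather than silently dropped.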

The big data processing challenge

In addition to incrementally processing daily feeds, one recurring challenge was to process massive satellite raster datasets covering the entire globe. This data is vital to analyze the progression of planetary climate and identify patterns that help forecast impending weather conditions. 

We helped our client develop innovative techniques such as overlaying pre-generated geo-rasters for efficient processing, leveraging distributed storage and workers, and scheduling and orchestrating the activities.

Instead of relying on a few very large machines, we maximized the use of smaller instances, which enabled us to iterate and scale rapidly. We built a processing cluster on an open-source distributed scheduler that could be scaled up and down as needed to speed up or slow down the work.
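The post doesn't name the scheduler, so as an illustration only, here is the scatter/gather pattern the workers followed, sketched with the standard library on a single machine (in production, a distributed scheduler such as Dask would fan `process_tile` out across many small instances; `process_tile` itself is a hypothetical placeholder):

```python
from concurrent.futures import ThreadPoolExecutor

def process_tile(tile_id: int) -> int:
    """Placeholder for per-tile raster work (reprojection, masking, stats)."""
    return tile_id * tile_id

def process_all(tile_ids, workers: int = 4):
    """Fan tile tasks out to a worker pool and gather results in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_tile, tile_ids))
```

Because the work is expressed as many small, independent tile tasks, the pool (or cluster) size can be dialed up or down without changing the processing code, which is what made scaling on small instances practical.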

We accomplished in days what would otherwise have taken months to process.

Consider this: data that captures every 10 m x 10 m block of the planet, every day, for over 15 years.
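To put rough, illustrative numbers on that scale (using Earth's total surface area, rounded):

```python
# Back-of-the-envelope scale of the raster archive (rounded figures).
EARTH_SURFACE_KM2 = 510_000_000       # ~510 million km^2
CELLS_PER_KM2 = 100 * 100             # 10 m x 10 m cells in one km^2
DAYS = 15 * 365                       # ~15 years of daily snapshots

cells_per_day = EARTH_SURFACE_KM2 * CELLS_PER_KM2
total_observations = cells_per_day * DAYS
print(f"{cells_per_day:.1e} cells/day, {total_observations:.1e} total")
# -> 5.1e+12 cells/day, 2.8e+16 total
```

Trillions of cells per daily snapshot, and tens of quadrillions of observations across the archive, which is why incremental processing and a scalable cluster mattered.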

We optimized memory usage and virtual machine (VM) sizing to ensure that no task was starved of Random Access Memory (RAM). We used libraries such as Mapnik (a C++ toolkit) to re-render these rasters so that consumers could view them and travel back in time over a region. Additionally, we performed multiple layering and mapping optimizations to enable this capability.
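One common way to keep per-task RAM bounded, consistent with the sizing work described above, is to process rasters in fixed-size windows rather than whole scenes. A minimal sketch of the windowing logic (the tile size is illustrative; the actual reader and renderer are not shown):

```python
def windows(width: int, height: int, tile: int = 256):
    """Yield (x, y, w, h) windows that exactly cover a width x height raster.

    Each window is read, processed, and released before the next one,
    so peak memory is bounded by one tile, not the whole scene.
    """
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            yield x, y, min(tile, width - x), min(tile, height - y)
```

Pairing a window iterator like this with right-sized VMs is what keeps every worker comfortably inside its RAM budget regardless of scene size.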

The data (including imagery) was fed to other pipelines to compute metrics, which in turn helped create aggregate visualizations, both map-based and chart-based. We generated data and map layers to visualize the data and imagery over time and across the world.
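As an illustrative example of one such metric step, rolling per-pixel NDVI samples up to a per-region daily mean is the kind of aggregate that would feed the map and chart views; the field names and tuple shape here are hypothetical:

```python
from collections import defaultdict
from statistics import mean

def regional_mean_ndvi(samples):
    """Aggregate (region, date, ndvi) tuples to a mean per region-date.

    Returns a dict keyed by (region, date), suitable for feeding a
    time-series chart or a choropleth map layer.
    """
    buckets = defaultdict(list)
    for region, date, ndvi in samples:
        buckets[(region, date)].append(ndvi)
    return {key: mean(values) for key, values in buckets.items()}
```

Aggregates like this are cheap to recompute incrementally as each daily feed arrives, which keeps the dashboards current without reprocessing history.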

The whole platform was packaged as a highly customizable product that allowed users to select, dissect, compare, visualize, and share indicators of interest.

The future

With the essentials firmly in place, Zemoso transitioned the future growth activities for the platform to the client’s internal team. 

The platform continues to evolve, incorporating new data feeds and capabilities such as Artificial Intelligence (AI)-based learning models. Coupled with human insights enabled by powerful data visualization, it is a vital industry asset for navigating the uncertainty of the global food and agri-market.

P.S. Since we work on early-stage products, many of them in stealth mode, we operate under strict non-disclosure agreements (NDAs). The data, insights, and capabilities discussed in this blog have been anonymized to protect our client's identity and don't include any proprietary information.
