ZEMOSO ENGINEERING STUDIO
June 3, 2022
8 min read

How we built a big data platform for a futuristic AgriTech product

The world population is expected to increase to 9.7 billion by 2050, and the demand for food to rise by 60%. Will the agricultural industry be able to keep up? How will the world ensure food security? AgriTech products are leveraging modern technologies to fill that gap.

There's enough data generated by the agricultural industry and many adjacent industries to bring predictability to food production forecasts and to create contingency plans. But this data is scattered, siloed, unstructured, and warehoused by different private and public organizations using different metrics, local languages, and formats. These siloed data sets include production, demand, supply, crop prices, satellite images, topography reports, weather forecasts, and soil quality reports. Many companies are trying to break through these barriers.

We worked with one such company and built a comprehensive data and analytics platform that provided insights on demand and supply, climate, and soil conditions. It helped stakeholders across the agriculture value chain make informed decisions, faster.

How did we do it?

Zemoso helped them ingest the diverse data sets, analyze them, and build a Software-as-a-Service (SaaS) platform with dashboards and modeling systems.

The data included global and regional data with daily updates, gathered through both scraping and Application Programming Interfaces (APIs). It also included:

1. Raw satellite imagery and derived indices such as the Normalized Difference Vegetation Index (NDVI)

2. Rainfall and evapotranspiration data from National Oceanic and Atmospheric Administration (NOAA) satellites

3. Data from the United Nations Food and Agriculture Organization (FAO)

4. Data from the World Health Organization (WHO)

5. Commodities market data

6. Data from grain markets

 Agricultural data sets ingested and analyzed to build a SaaS platform

Deep tech solutions applied here included image processing, data pipelines, and distributed cluster architectures.

Our team normalized the data and automated continuous quality checks on data ingested from external sources. We implemented the computing logic and built the data processing platform for managing the data at scale.
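As a flavor of what normalization plus automated quality checks can look like, here is a minimal sketch. The record shape, unit table, and rejection rules are hypothetical, standing in for the many metrics, languages, and formats the real platform handled:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical unit-conversion table; production covered far more
# metrics, locales, and source formats.
TONNES_PER_UNIT = {"t": 1.0, "kt": 1_000.0, "lb": 0.000453592}

@dataclass
class ProductionRecord:
    region: str
    crop: str
    quantity: float
    unit: str

def normalize(record: ProductionRecord) -> Optional[ProductionRecord]:
    """Convert a raw record to tonnes; reject obviously bad data."""
    factor = TONNES_PER_UNIT.get(record.unit.lower())
    if factor is None:        # unknown unit -> quarantine for review
        return None
    if record.quantity < 0:   # negative production is a feed error
        return None
    return ProductionRecord(
        region=record.region.strip().title(),
        crop=record.crop.strip().lower(),
        quantity=record.quantity * factor,
        unit="t",
    )
```

In the real pipeline, rejected records would be routed to a quarantine store and surfaced in the continuous quality-check dashboards rather than silently dropped.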

The big data processing challenge

In addition to incrementally processing daily feeds, one recurring challenge was to process massive satellite raster datasets covering the entire globe. This data is vital to analyze the progression of planetary climate and identify patterns that help forecast impending weather conditions. 

We helped our client develop innovative techniques such as overlaying pre-generated geo-rasters for efficient processing, leveraging distributed storage and workers, and scheduling and orchestrating the activities.

Instead of relying on a few very large machines, we maximized the use of smaller instances, which enabled us to iterate and scale rapidly. We built a processing cluster on an open-source distributed scheduler that could be scaled up and down as needed to speed up or slow down the work.
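The post doesn't name the scheduler, so as an illustration only, here is the scatter/gather pattern the workers followed, sketched with the standard library on a single machine (in production, a distributed scheduler such as Dask would fan `process_tile` out across many small instances; `process_tile` itself is a hypothetical placeholder):

```python
from concurrent.futures import ThreadPoolExecutor

def process_tile(tile_id: int) -> int:
    """Placeholder for per-tile raster work (reprojection, masking, stats)."""
    return tile_id * tile_id

def process_all(tile_ids, workers: int = 4):
    """Fan tile tasks out to a worker pool and gather results in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_tile, tile_ids))
```

Because the work is expressed as many small, independent tile tasks, the pool (or cluster) size can be dialed up or down without changing the processing code, which is what made scaling on small instances practical.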

We accomplished in days what would otherwise have taken months to process.

Consider this: data that captures every 10 m x 10 m block of the planet, every day, for over 15 years.
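To put rough, illustrative numbers on that scale (using Earth's total surface area, rounded):

```python
# Back-of-the-envelope scale of the raster archive (rounded figures).
EARTH_SURFACE_KM2 = 510_000_000       # ~510 million km^2
CELLS_PER_KM2 = 100 * 100             # 10 m x 10 m cells in one km^2
DAYS = 15 * 365                       # ~15 years of daily snapshots

cells_per_day = EARTH_SURFACE_KM2 * CELLS_PER_KM2
total_observations = cells_per_day * DAYS
print(f"{cells_per_day:.1e} cells/day, {total_observations:.1e} total")
# -> 5.1e+12 cells/day, 2.8e+16 total
```

Trillions of cells per daily snapshot, and tens of quadrillions of observations across the archive, which is why incremental processing and a scalable cluster mattered.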

We optimized memory usage and virtual machine (VM) sizing to ensure that no task was starved of Random Access Memory (RAM). We used libraries such as Mapnik (a C++ toolkit) to re-render these rasters so that consumers could view them and travel back in time over a region. Additionally, we performed multiple layering and mapping optimizations to enable this capability.
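One common way to keep per-task RAM bounded, consistent with the sizing work described above, is to process rasters in fixed-size windows rather than whole scenes. A minimal sketch of the windowing logic (the tile size is illustrative; the actual reader and renderer are not shown):

```python
def windows(width: int, height: int, tile: int = 256):
    """Yield (x, y, w, h) windows that exactly cover a width x height raster.

    Each window is read, processed, and released before the next one,
    so peak memory is bounded by one tile, not the whole scene.
    """
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            yield x, y, min(tile, width - x), min(tile, height - y)
```

Pairing a window iterator like this with right-sized VMs is what keeps every worker comfortably inside its RAM budget regardless of scene size.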

The data (including imagery) was fed to other pipelines to compute metrics, which in turn helped create aggregate visualizations, both map-based and chart-based. We generated data and map layers to visualize the data and imagery over time and across the world.
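As an illustrative example of one such metric step, rolling per-pixel NDVI samples up to a per-region daily mean is the kind of aggregate that would feed the map and chart views; the field names and tuple shape here are hypothetical:

```python
from collections import defaultdict
from statistics import mean

def regional_mean_ndvi(samples):
    """Aggregate (region, date, ndvi) tuples to a mean per region-date.

    Returns a dict keyed by (region, date), suitable for feeding a
    time-series chart or a choropleth map layer.
    """
    buckets = defaultdict(list)
    for region, date, ndvi in samples:
        buckets[(region, date)].append(ndvi)
    return {key: mean(values) for key, values in buckets.items()}
```

Aggregates like this are cheap to recompute incrementally as each daily feed arrives, which keeps the dashboards current without reprocessing history.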

The whole platform was packaged as a highly customizable product that allowed users to select, dissect, compare, visualize, and share indicators of interest.

The future

With the essentials firmly in place, Zemoso transitioned the future growth activities for the platform to the client’s internal team. 

The platform continues to evolve, incorporating new data feeds and capabilities such as Artificial Intelligence (AI)-based learning models. Coupled with human insights enabled by powerful data visualization, it is a vital industry asset for navigating the uncertainty of the global food and agri-market.

P.S. Since we work on early-stage products, many of them in stealth mode, we operate under strict non-disclosure agreements (NDAs). The data, insights, and capabilities discussed in this blog have been anonymized to protect our client's identity and don't include any proprietary information.
