How did we do this?

The team aligned quickly, pivoted as needed, tested rapidly, and delivered consistent, incremental wins throughout the partnership.

We helped our client speed up development and evolution cycles while consistently delivering well-designed applications. After evaluating multiple providers, we chose Amazon Web Services (AWS) for its flexibility, its scalability, and the low developer friction of its SDKs. In partnership with the client’s internal engineering team, we designed a multi-layered platform on a microservices architecture to keep the system modular and agile. The team followed DevOps best practices for continuous integration and deployment so that new capabilities could be added more quickly.

Upgrading the tech stack

For each function, we evaluated several competing technologies and selected the best-suited option together with the client’s engineering leadership.

User interface:

Polymer JS was chosen to build the next-generation user interface, with added controls for tiered access protocols. It made it easier to set up the different application elements and the relations between them.

Events stream processing:

Event stream processing, which is key to the deduplication functionality, was built on Apache Storm, a well-established solution for real-time stream processing. The team thoroughly compared Apache Spark and Apache Storm. Storm was the better fit for this use case because Spark is a far more complex, general-purpose computation engine.

Messaging layer:

To ensure smart deduplication whenever a record is added or updated, a Kafka messaging layer transfers the data between applications. Kafka is a fast and scalable event distributor. The ‘deduplication’ and ‘match and merge’ queues were robust and processed high volumes of data with minimal downtime or data loss. Kafka also integrated seamlessly with Apache Storm for real-time analysis of streaming data, which helped the client’s customers gain insights faster.
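To make the flow of change events concrete, here is a minimal sketch of publishing an event whenever a record is added or updated. It is an illustration under stated assumptions, not the client's implementation: a standard-library queue stands in for a Kafka topic, and the `ChangeEvent` fields, `publish`, and `consume_all` are hypothetical names.

```python
import json
import queue
from dataclasses import dataclass, asdict

# In-memory stand-in for the 'deduplication' topic. A real deployment
# would publish to Kafka through a Kafka client library instead.
dedup_topic = queue.Queue()


@dataclass
class ChangeEvent:
    # Hypothetical event shape; the real schema is not described in the post.
    record_id: str
    action: str  # "added" or "updated"
    payload: dict


def publish(event):
    """Serialize the event to JSON bytes and append it to the topic."""
    dedup_topic.put(json.dumps(asdict(event)).encode("utf-8"))


def consume_all():
    """Drain the topic in arrival order and decode each message."""
    events = []
    while not dedup_topic.empty():
        events.append(json.loads(dedup_topic.get().decode("utf-8")))
    return events


publish(ChangeEvent("sku-1", "added", {"name": "Blue Mug"}))
publish(ChangeEvent("sku-1", "updated", {"name": "Blue Mug 12oz"}))
```

Calling `consume_all()` afterwards returns the two events in the order they were published, which is what a downstream deduplication consumer would process.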

Improved searchability:

For search, the team chose Elasticsearch. It is a distributed, free, and open-source search and analytics engine that powers search solutions for global giants like Microsoft, Netflix, Slack, and Uber. This text-based NoSQL search tool proved highly useful for indexing the data points needed to fulfill search parameters. The team used advanced indexing techniques such as n-grams to generate better match results. For instance, with only the first three letters as input, the search engine could match and return the product name.
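The three-letter example can be illustrated with a small sketch of n-gram indexing. This is a plain-Python approximation, not Elasticsearch itself: it uses edge n-grams (word prefixes), which is one n-gram variant, and the post does not say which variant or which settings the team used. The function names, the length limits, and the sample catalog are assumptions made for illustration.

```python
from collections import defaultdict


def edge_ngrams(term, min_len=3, max_len=10):
    """Return the prefixes of `term` that are min_len..max_len characters long."""
    term = term.lower()
    return [term[:n] for n in range(min_len, min(len(term), max_len) + 1)]


def build_index(product_names):
    """Map every edge n-gram of every word to the product names containing it."""
    index = defaultdict(set)
    for name in product_names:
        for word in name.split():
            for gram in edge_ngrams(word):
                index[gram].add(name)
    return index


def search(index, query):
    """Look up the lowercased query as one exact token in the n-gram index."""
    return sorted(index.get(query.lower(), set()))


# Hypothetical catalog, for illustration only.
catalog = ["Espresso Machine", "Espadrille Sandal", "Garden Hose"]
index = build_index(catalog)

print(search(index, "esp"))  # the two names containing a word that starts with "esp"
print(search(index, "mac"))  # the name containing a word that starts with "mac"
```

Because the n-grams are generated at index time, a short query becomes a single exact lookup rather than a scan over every product name.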

Custom deduplication algorithm

As different operators, suppliers, and vendors enter product data into the platform, it is important to maintain the integrity of the product information. Deduplication is a key function that makes this possible. Core functions provided by a retailer depend on these systems accessing and displaying accurate information. We helped our MDM client ensure that the same product is not listed twice under different unique IDs.

Our team developed and deployed the deduplication algorithm on top of the newly upgraded tech stack. The algorithm also demonstrated that the tech stack we had evaluated and set up worked as intended.

The algorithm used similarity triggers on attributes such as name, brand, and color to flag potential duplicates, each with a probability index, in a database of over a million records. Every change in the system is an ‘event’. Each event is added to a queue and goes through a ‘match and merge’ protocol. In the case of a suspect match, the event is assigned to a separate queue to be resolved manually, which keeps the right records in place for critical business functions.
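The scoring and routing described above can be sketched roughly as follows. This is not the client's algorithm: the field weights, the two thresholds, the queue names, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions chosen to keep the example self-contained.

```python
from difflib import SequenceMatcher

# Hypothetical weights and thresholds; the real values are not given in the post.
WEIGHTS = {"name": 0.6, "brand": 0.3, "color": 0.1}
MERGE_THRESHOLD = 0.95   # at or above this, treat as a confident duplicate
REVIEW_THRESHOLD = 0.75  # at or above this, treat as a suspect match


def field_similarity(a, b):
    """Case-insensitive string similarity in the range 0..1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(record, candidate):
    """Weighted sum of per-field similarities, used as the probability index."""
    return sum(w * field_similarity(record[f], candidate[f])
               for f, w in WEIGHTS.items())


def route(record, existing):
    """Return (queue_name, best_score, best_match_id) for an incoming record."""
    best_id, best = None, 0.0
    for candidate in existing:
        score = match_score(record, candidate)
        if score > best:
            best_id, best = candidate["id"], score
    if best >= MERGE_THRESHOLD:
        return "match_and_merge", best, best_id
    if best >= REVIEW_THRESHOLD:
        return "manual_review", best, best_id
    return "no_match", best, None


# Hypothetical records, for illustration only.
existing = [{"id": "p1", "name": "Ceramic Mug", "brand": "Acme", "color": "blue"}]
incoming = {"name": "Ceramic Mug", "brand": "Acme", "color": "red"}
queue_name, score, match_id = route(incoming, existing)
print(queue_name, match_id)
```

Here the name and brand match exactly but the color differs, so the score lands between the two thresholds and the event goes to the manual review queue instead of being merged automatically. Comparing each incoming record against every existing record would not scale to a million records; a production system would first narrow the candidates, for example through a search index.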

Reporting dashboards

We co-created dashboards that make insights from the master data easier for the business to access and act on. These were designed to help the client’s customers visualize data and create intelligent reports for better analysis. The dashboards covered areas such as change trends, update summaries, workflow SLAs, and governance summaries.

The team used Kibana, a browser-based analytics and search dashboard for Elasticsearch. This enabled quicker analysis and faster compilation of the data, and users could easily share the resulting reports with stakeholders. Its machine learning features also helped surface patterns in the Elasticsearch data.

P.S. Since we work on early-stage products, many of them in stealth mode, we have strict non-disclosure agreements (NDAs). The data, insights, and capabilities discussed in this blog have been anonymized to protect our client’s identity and do not include any proprietary information.