ZEMOSO ENGINEERING STUDIO
October 12, 2022
7 min read

Testing what doesn’t exist with a Wizard of Oz twist

When Dorothy, Toto, Scarecrow, Tin Woodman, and the Cowardly Lion set off for The Emerald City, [SPOILER ALERT: the city that doesn’t really exist], L. Frank Baum could hardly have imagined that the world of product innovation would one day find an analogy in there. 

And yet, our industry has found it and is using it (and allow us to celebrate our nerdy side here) to test all kinds of things that don't exist. We used it extensively when we built our own proprietary product: Textractive, a natural language processing (NLP)-led application that summarizes meetings, intuitively extracts action items, and lists them for follow-up.

Zappos' founder would buy shoes from a local store after he got an order. Stripe processed credit card payments manually on the back end. Decades ago, IBM's speech-to-text experiment was a couple of people with incredible typing speeds hidden in a back room. Aardvark did something similar as well. 

Why does it make sense to do Wizard of Oz testing?

Why does one need to test by creating an illusion? Why do you need Wizard of Oz testing at all? Simply put, artificial intelligence, machine learning (ML), augmented reality, or NLP-led product initiatives are expensive, require multiple iterations, and are time-consuming. They involve complex development cycles and incredible engineering expertise. Therefore, before any entrepreneur goes down that route, validating its value to customers before investing time, effort, and money is the only smart thing to do. 

Wizard of Oz testing lets you test the problem-solution proposition in one fell swoop, in a world where 35% of start-ups fail due to lack of demand and 8% fail due to flawed products.

So, what is Wizard of Oz Testing?

Wizard of Oz testing simulates the experience for the user as-is, but the backend system is actually operated by a person, and answers are human-generated, as opposed to algorithm-generated.

For example, and we are oversimplifying this immensely: on the front end, a user wants to see which moisturizer is most frequently bought with the serum they just purchased. Now, instead of an algorithm making the Application Programming Interface (API) call and having a recommendation engine populate the answer, an actual person feeds the answer back to the user based on certain rules. 
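
To make the mechanics concrete, here is a minimal sketch of what such a setup can look like, assuming a hypothetical recommend() function and an operator_answers.csv sheet that a human keeps up to date. The interface mirrors the eventual product; only the answers are human-generated.

```python
# Minimal Wizard of Oz backend sketch (hypothetical names throughout).
import csv
from pathlib import Path

# The human operator maintains this file; columns: product_id,recommendation
ANSWER_SHEET = Path("operator_answers.csv")

def load_operator_answers() -> dict[str, str]:
    """Read the human-maintained answer sheet into a lookup table."""
    if not ANSWER_SHEET.exists():
        return {}
    with ANSWER_SHEET.open(newline="") as f:
        return {row["product_id"]: row["recommendation"] for row in csv.DictReader(f)}

def recommend(product_id: str) -> str:
    """Same contract the eventual recommendation engine would expose."""
    answers = load_operator_answers()
    # To the user this is just an API response; they never see who produced it.
    return answers.get(product_id, "We'll email you a personalized pick shortly.")

if __name__ == "__main__":
    print(recommend("serum-123"))
```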

The most important questions to ask and answer are: 

   Is this a solution that businesses are actively looking for? 

   Are people willing to pay for this? 

When do we use Wizard of Oz testing?

Wizard of Oz testing is best employed when you need to verify the functionality of a product feature or capability and see whether it has value to the user. It's ideally done when the build itself, even a pared-down version of the offering, is expensive, requires specific expertise, and takes a lot of time. 

It is the ultimate way to test ideas and the value they will have for a buyer or user where a traditional wireframe or high-fidelity mock-up just won’t cut it. 

For many situations, a wireframe is sufficient. We would not recommend this kind of testing investment for simple user journey elements, but we will insist on it to test an Artificial Intelligence (AI) algorithm, a sensor-led solution, an NLP-driven platform, and so on. In short: anywhere the technical investment is significant and the outcomes are unknown. 

A few examples of companies that have done this spectacularly well: Zappos, Stripe, and Expensify. 

What are the challenges in setting up a Wizard-of-Oz test?

Legality

The ideal environment for the most informative results is when the user doesn’t know about the test. However, in most jurisdictions, we must ensure data transparency: share exact details on what data is collected and how and where it is used, without giving the test away. In healthcare, security, and FinTech, this can get especially tricky.

Ethics

While setting up a Wizard of Oz test is an exercise in learning, the last thing you want is your users and buyers walking away feeling like they have been taken advantage of. And Theranos hasn’t done a world of skeptics any favors with the fast one it pulled. 

Feasibility

Machines are doing a really good job of imitating humans these days, but it can be difficult to train a human to imitate a machine. For a Wizard of Oz test to give you consistent data, the human needs to follow machine logic in most circumstances, so you will need to write incredibly detailed instructions to make it successful and get the data you are after. 

You need to have a clear understanding of how the product will operate, and what is technically feasible, and then train humans to provide this information in the same way that you architected it. 
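
As a rough illustration, those "incredibly detailed instructions" can be codified as an explicit rule list the operator steps through for every request, so responses stay consistent with the machine logic you architected. The rule names and conditions below are illustrative assumptions, not a real protocol.

```python
# Illustrative only: codifying the wizard's instructions as explicit rules,
# evaluated in order, so the human responds the way the algorithm would.
WIZARD_RULES = [
    ("missing_query", lambda req: not req.get("query"),
     "Reply: 'Could you rephrase your request?'"),
    ("out_of_scope",  lambda req: req.get("category") not in {"skincare"},
     "Send the generic fallback message."),
    ("standard_case", lambda req: True,
     "Look up the answer sheet and reply within two minutes."),
]

def next_instruction(request: dict) -> str:
    """Return the first matching instruction for this request."""
    for name, condition, instruction in WIZARD_RULES:
        if condition(request):
            return f"[{name}] {instruction}"
    return "Escalate to the protocol owner."

print(next_instruction({"query": "best moisturizer?", "category": "skincare"}))
```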

How do you overcome the challenges of setting up a Wizard of Oz test?

In general, overcoming the first two hurdles comes down to consulting with counsel and being transparent. If you are in a highly regulated market with low trust, go over the details of what you propose to do, and if you can, fill the users in while you are providing the service. They are the lucky recipients of an early version: they get the same service, with a bit of hand-holding, while some things are still being worked out. For the last one, feasibility, hire experts who have done it before and can create the right setup. Depending on who you ask, this may have become harder or easier with remote environments. 

Case studies

Pretending to be a human language processing algorithm

It can take months (if not years) to train a machine learning algorithm with data, and most successful entrepreneurs we’ve encountered are unwilling to pour in that kind of investment, or wait that long, without some validation first. 

The product we were building, in this case, was an NLP solution that summarizes phone calls, takes notes, and detects intent to complete a financial transaction. 

Humans are better at summarizing human speech and detecting intent than computers are (tone being an incredible giveaway). So the trick with this one was to work with data scientists, data engineers, and natural language processing gurus (all of whom are at Zemoso) to develop a human protocol that mimicked the proposed machine protocol. 

It was decided that the key components of this algorithm would be 1) transcription, 2) summarization, and 3) action items. We would deliver the service via email, with a two-minute delay. 

For transcription, we used readily available transcription software and copy-pasted the output into the email in a predefined format. For the summary, we copy-pasted key sentences directly from the transcript, taking out pauses. For action items, we looked at what each party agreed to do, copied those portions into the action items section of the email, and scored the conversation for higher or lower intent. 

The key here was to make it something a machine could feasibly do, using keyword cues around verb usage, time commitments, and so on. We made sure that tone, intonation, and the like weren’t accounted for, because the first version of the Minimum Viable Product (MVP) wouldn’t be able to do that. 
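
For illustration, here is a minimal sketch of the kind of keyword-cue logic the human protocol emulated, assuming hypothetical cue lists and scoring weights; the real protocol's cues were more detailed.

```python
import re

# Illustrative cue lists; the actual protocol used richer ones.
COMMITMENT_VERBS = {"will", "send", "schedule", "follow up", "review", "pay", "sign"}
TIME_CUES = re.compile(
    r"\b(today|tomorrow|next week|by (monday|friday|end of (day|week|month)))\b", re.I
)

def extract_action_items(sentences: list[str]) -> list[dict]:
    """Flag sentences that read like commitments and give each a rough intent score."""
    items = []
    for sentence in sentences:
        text = sentence.lower()
        has_verb = any(verb in text for verb in COMMITMENT_VERBS)
        has_time = bool(TIME_CUES.search(text))
        if has_verb:
            items.append({
                "text": sentence.strip(),
                # A commitment verb paired with a time cue signals higher intent.
                "intent_score": 0.9 if has_time else 0.5,
            })
    return items

print(extract_action_items([
    "I will send the signed contract by Friday.",
    "It was great catching up.",
]))
```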

This Wizard of Oz setup tested well on value parameters for usage and on monetization benchmarks for buyers. The product is now the core of a billion-dollar offering. Though it was significantly refined later, that early validation is what these startup visionaries needed to double down on it. 

Pretending to be an insurance marketplace

The healthcare industry, and by extension the related insurance industry, is highly regulated. Trust and compliance are keys to success in this market, and several apparent Wizard of Oz testing setups have made the news (and even some movies) that betrayed customer and industry trust. Therefore, we're exceptionally conscious of that when setting up testing scenarios for the healthcare industry. 

This was a solution for businesses looking to offer benefits digitally, which employees could use in their everyday lives to stay healthier and reduce costs. The entrepreneurial team had relationships with insurance providers, and data handling was set up to ensure privacy and compliance with Health Insurance Portability and Accountability Act (HIPAA) regulations. 

It was decided that the key components of this platform would be identifying lower-cost, vetted, and highly rated providers near the employee’s address; letting employees track and earn rewards for meeting health goals; and working one-on-one with a health coach. 

The experiment was run with one large office of one large company. Employees were told that this was a new service the company was evaluating and that it would be provided only on an opt-in basis. 

Employees were given a human-researched list of providers in their area, indexed by services, reviews, office hours, and distance from the office. They were also given a spreadsheet, made with a popular spreadsheet tool, where they could input their goals and track their progress toward them, and they had weekly meetings with a certified health coach. 
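
As a sketch, the ranking the researchers applied by hand is the same logic the eventual product would automate along those attributes. The field names, ordering, and sample data below are illustrative assumptions.

```python
# Illustrative data; the pilot's provider list was researched by hand.
providers = [
    {"name": "Clinic A", "service": "physiotherapy", "rating": 4.8, "miles_away": 1.2, "est_cost": 90},
    {"name": "Clinic B", "service": "physiotherapy", "rating": 4.2, "miles_away": 0.5, "est_cost": 60},
    {"name": "Clinic C", "service": "dental",        "rating": 4.9, "miles_away": 2.0, "est_cost": 120},
]

def rank_providers(providers: list[dict], service: str) -> list[dict]:
    """Filter by service, then prefer cheaper, closer, better-rated options."""
    matches = [p for p in providers if p["service"] == service]
    return sorted(matches, key=lambda p: (p["est_cost"], p["miles_away"], -p["rating"]))

for p in rank_providers(providers, "physiotherapy"):
    print(p["name"], p["est_cost"], p["miles_away"], p["rating"])
```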

Based on the test results, businesses that took this approach could save $100 million annually on healthcare. Now that this is a product, the actual savings are considerably more. 

Pretending to be a FinTech platform

The financial services industry is another highly regulated one. We mocked up the technical solution to gauge market and user viability. 

The idea was a buy-now, pay-later solution for certain high-spend purchases, where acceptance would be decided by a machine learning algorithm. It was decided that the key components of this platform would be an easy application process, with an algorithm making the final decision, and a simple interface for users to input the information that would determine the result. 

Success would be measured by a business's time to close a deal, the increase in deal closures, and successful payback completions. Initially, some guardrails were put in place to limit risk, such as a cap on the maximum deal amount.

We created a questionnaire that asked the applicant standard financial questions. Users knew that an actual person would be looking at their data and that it was stored and handled in compliance with privacy and industry regulations. Once the form was filled out, a human looked through the data and decided whether to approve the application, using the same criteria that the algorithm would eventually use. They would send a text within five minutes with a go or no-go decision. 
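
For illustration, the reviewer's go/no-go criteria can be written down as explicit rules, the same kind of rules the eventual algorithm would codify. The thresholds below are illustrative assumptions, not the actual underwriting criteria.

```python
# Thresholds are illustrative assumptions, not the real underwriting rules.
MAX_DEAL_AMOUNT = 50_000       # the kind of max-deal guardrail the pilot used
MIN_MONTHLY_REVENUE = 10_000
MAX_DEBT_TO_REVENUE = 0.4

def approve_application(app: dict) -> tuple[bool, str]:
    """Return (decision, reason) so every outcome is explainable and repeatable."""
    if app["deal_amount"] > MAX_DEAL_AMOUNT:
        return False, "deal amount exceeds the pilot guardrail"
    if app["monthly_revenue"] < MIN_MONTHLY_REVENUE:
        return False, "insufficient monthly revenue"
    if app["existing_debt"] / max(app["monthly_revenue"], 1) > MAX_DEBT_TO_REVENUE:
        return False, "debt-to-revenue ratio too high"
    return True, "meets all criteria; send the go text"

print(approve_application({"deal_amount": 20_000, "monthly_revenue": 25_000, "existing_debt": 5_000}))
```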

The business received the full amount for the deal upfront. The test ultimately proved that there were plenty of businesses willing to pay for this. In fact, many of the Wizard of Oz test participants became early adopters. 

Conclusion

Wizard of Oz testing is tricky to get right, but when it is done well, it answers important questions about the business and financial viability of an idea while limiting risk. It is best applied when the cost of being wrong is too high and the investment in time or money is extensive. 

We generally recommend it for technology products that involve a machine learning algorithm that needs to be trained with extensive data sets to mimic a human, and for platforms that need to integrate with a bevy of other products. You need significant technical knowledge to design an experiment that mimics a technology product convincingly, to set it up so it yields the data you need to decide whether to make a significant investment, and to judge when to hide or show the curtain and how that choice will affect the data you collect.
