Machine Learning and Artisanal Testing: An Interview with Daria Mehra

[interview]
Summary:
In this interview, Daria Mehra, the director of quality engineering at Quid, explains how people can use machine learning to better contextualize data, details the complexity of test automation and how to be sure you have enough test coverage, and defines the term “artisanal testing.”

Jennifer Bonine: All right, we are back with another interview. Daria, thanks for joining us.

Daria Mehra: Thank you, I'm glad to be here.

Jennifer Bonine: We're excited you're here with our virtual audience. So, you work for a really interesting organization, so maybe for the folks out there, first start with the company you work for, what they do, and then how maybe we can transition into talking a little bit about machine learning and where you see that going, and the analytics piece of it, and getting smarter as we look at patterns and repeatability ... things like that. So maybe start with who you work for and what they're doing, which is exciting in and of itself, and then we can talk a little bit about the machine learning piece of it.

Daria Mehra: Sounds good. I work at a San Francisco startup called Quid. I'm the director of quality engineering for Quid, and Quid is in the natural language processing data analytics space. And what we do is business decision-making support for executives, for enterprise customers. We've presented to the World Economic Forum on several occasions now, we're kind of big in the data science space.

So, imagine Google search, but you don't get a long page of results from your search. You get this amazing-looking network where there are clusters of information collected based on the content of the news articles that match your search. So you can quickly look at different opinions on the subject, different events that happened, and you don't have to dig through pages and pages of results about the same thing, we just take care of all of that.

Jennifer Bonine: Right.

Daria Mehra: And then once you've made sense of the topic, you can slice and dice with different charts and make PowerPoint presentations, or take it to your board of directors ... It's really great stuff. So you can do analysis on current news. We're sitting on a year's worth of English language news-

Jennifer Bonine: Wow.

Daria Mehra: Company data, patent data ... or you can upload your own data.

Jennifer Bonine: Wow.

Daria Mehra: Amazon reviews, or whatever have you ... It's huge, it's complex, there's a lot of data science core to this thing.

Jennifer Bonine: Yeah.

Daria Mehra: And interesting. So it's a really exciting product to be testing.

Jennifer Bonine: Well, it sounds like it. Very complex, though, I mean, in nature, right?

Daria Mehra: Very complex.

Jennifer Bonine: We're not just talking ... I mean, the nature of what you're trying to do and provide in the accuracy of the search is so critical, and providing good, quality information back to the consumer is what they're looking for. So, how does the product work? Is it ... You purchase it through licensing, or is it ... How is it available to folks I guess?

Daria Mehra: It's enterprise licenses, so we work with some really large companies, some smaller clients with just one or two licenses.

Jennifer Bonine: Right.

Daria Mehra: So you have to come with quality guarantees, because it's not like a social media product where, if your Facebook goes down for ten minutes, well, your Facebook went down ...

Jennifer Bonine: Right.

Daria Mehra: I mean, I've heard people call 911 when Facebook goes down.

Jennifer Bonine: Oh no.

Daria Mehra: True story. But when Quid goes down, that's a bit more impactful to the decision-makers of the world.

Jennifer Bonine: Yep. No, exactly, not a good thing for you or your organization that has to test. Now, one of the things you talked about here at the conference was your experience in test automation, because this is, again, complex, and needing to have assurances that it is working, and having the appropriate coverage and level of testing. So I believe you compared multiple techniques. Do you want to talk about what those are that you compared, and your experience with those?

Daria Mehra: Yes. So for Quid, we absolutely need end-to-end tests that interact with the product like a user would. So one known way to do that would be through automating tests with Selenium, Selenium WebDriver.

Jennifer Bonine: Absolutely.

Daria Mehra: It's a tool that a lot of people use, and to make those tests reliable, or try to best that you can make them reliable. You have to write a lot of code, so you have to have very highly-qualified coding test engineers who will write code in Python, or Ruby, or JavaScript or what have you-

Jennifer Bonine: Yeah.

Daria Mehra: That is layered with a Page Object Model in the middle, and then talking to the browser through Selenium, so you can get this automated workflow happening. I've done this. It can be made to work, but it's a lot of work for a highly qualified team, and I've found that you need at least a one-to-four ratio of test engineers to developers. One to three is preferred for that path ...
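The layering Daria describes (tests on top, a Page Object Model in the middle, Selenium talking to the browser) can be sketched roughly as follows. The class names and selectors here are illustrative stand-ins, not Quid's actual code, and a minimal fake driver is used so the sketch runs without a browser; in a real suite you would pass a Selenium WebDriver such as `webdriver.Chrome()` instead.

```python
# A rough sketch of the Page Object Model layering: tests call a page
# object, and only the page object knows URLs and CSS selectors, so a
# UI rewrite means updating one class rather than every test.
# FakeDriver stands in for a real Selenium WebDriver so this runs
# without a browser. All names and selectors here are illustrative.

class FakeDriver:
    """Minimal stand-in for a Selenium WebDriver, serving canned results."""
    def get(self, url):
        self.current_url = url

    def find_elements(self, css_selector):
        # Pretend the page always renders two search results.
        return ["result-1", "result-2"] if css_selector == ".result" else []

class SearchPage:
    """Page object: the only layer that knows this page's selectors."""
    RESULTS_SELECTOR = ".result"

    def __init__(self, driver):
        self.driver = driver

    def open(self):
        self.driver.get("https://example.com/search")

    def result_count(self):
        return len(self.driver.find_elements(self.RESULTS_SELECTOR))

# The test reads like a user workflow and never mentions a selector:
page = SearchPage(FakeDriver())  # swap in a real WebDriver for real runs
page.open()
assert page.result_count() > 0
```

The point of the middle layer is exactly the cost Daria mentions: keeping tests this stable takes a lot of code and highly qualified engineers to maintain it.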

Jennifer Bonine: Yeah.

Daria Mehra: It doesn't have to be hired test engineers. Yeah, you can make your developers do it, but they would spend a quarter of their time doing it.

Jennifer Bonine: Doing it.

Daria Mehra: So it comes with that cost. So this other path that I've been trying in the last year and a half, going on two years now, is this new wave of crowdsourcing solutions. So I use a platform called Rainforest QA, and what they let you do is write the tests in English. There is no code. The tests are written in English. They can be completely unstructured, free-form English for people to understand, or you can make them a little bit structured so you can reuse test steps.

Jennifer Bonine: Right.

Daria Mehra: It's up to you. And when you run your tests, the crowdsourcing platform finds testers on demand, thousands of them, so your tests run as massively parallelized as you want.

Jennifer Bonine: Yeah.

Daria Mehra: And they step through the test steps exactly as you define them, and the platform uses machine learning to figure out quorum decision-making. So once results are reported back to you, it's not what one person said, it's a majority vote on your test, because it's executed by multiple people. So you can trust that if they're saying it failed, it actually did fail. It's not some random occurrence of somebody's network failing on you, it's actually a product failure.
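The quorum decision-making Daria describes can be illustrated with a small sketch. The function name `quorum_verdict` and the 60 percent agreement threshold are assumptions made for illustration; Rainforest QA's actual aggregation logic is not described in the interview.

```python
from collections import Counter

def quorum_verdict(votes, min_agreement=0.6):
    """Aggregate per-tester verdicts ("pass"/"fail") into one result.

    A verdict counts only when a sufficient majority of testers agree,
    which filters out one-off flakes such as a single tester's network
    failing. The threshold is an illustrative assumption, not the
    platform's documented behavior.
    """
    if not votes:
        return "inconclusive"
    verdict, count = Counter(votes).most_common(1)[0]
    if count / len(votes) >= min_agreement:
        return verdict
    return "inconclusive"

# One tester's flaky environment is outvoted by the majority:
print(quorum_verdict(["pass", "pass", "fail", "pass", "pass"]))  # pass
# Agreement on "fail" points to a genuine product failure:
print(quorum_verdict(["fail", "fail", "fail", "pass"]))  # fail
```

This is why, as Daria says, a reported failure can be trusted as a product failure rather than a random environmental one.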

Jennifer Bonine: Yeah.

Daria Mehra: And if it passes, you can be assured that it actually passed and that there's nothing wrong with that workflow.

Jennifer Bonine: Right.

Daria Mehra: And I found that to be a really good approach to doing end-to-end UI based testing, because it's a much lighter requirement on your test engineers. So I have a highly technical team of test engineers, but it's small. And they're focused on what I'm calling "artisanal testing." You heard it here first.

Jennifer Bonine: Yeah. I love that term.

Daria Mehra: Artisanal testing. That's what I want my people to do ...

Jennifer Bonine: Yeah.

Daria Mehra: ... who have this knowledge and context on the Quid product. I want them focused on this exploratory bug discovery, something that cannot possibly be crowdsourced, outsourced ...

Jennifer Bonine: Right.

Daria Mehra: ... automated, given to anybody else. I need my people on that. However, regression tests, doing something over and over ...

Jennifer Bonine: Yeah.

Daria Mehra: I'm really happy with the crowd out there doing it for me.

Jennifer Bonine: Yeah.

Daria Mehra: And I don't even feel bad that their job is boring, because it's not. Because at 11 in the morning they're testing Quid, at noon they're testing whatever have you, some payment application ...

Jennifer Bonine: Right. Yeah.

Daria Mehra: Something else. So they have an exciting job too.

Jennifer Bonine: It's a lot of variety.

Daria Mehra: A lot of variety for them. And my tests are no longer tied to the implementation of the UI, so if my developers pick up a JavaScript framework of the day, rewrite the UI, it looks the same but it's completely differently powered.

Jennifer Bonine: Yeah.

Daria Mehra: On the code level, I don't care, because my tests are not talking to it at the code level.

Jennifer Bonine: No.

Daria Mehra: They're talking to it like a human would so it continues working for me. Now of course if we change the business logic, then there's no magic.

Jennifer Bonine: No.

Daria Mehra: We'll change the test. It's just what has to happen.

Jennifer Bonine: Yeah, exactly. Yep.

Daria Mehra: And then when my test fails, I have a human leaving a note for me on why they failed it. They would tell me, "This red error message popped up, and here's the screenshot." And then I know exactly why it failed. There is no debugging process about it.

Jennifer Bonine: Right.

Daria Mehra: I have reports coming to Slack, that's kind of where I live, on Slack.

Jennifer Bonine: Yeah. Exactly.

Daria Mehra: And I just see the notes, and I'm like, "Oh, okay. So ten tests failed for clearly the same reason. I have one bug to file. Great."

Jennifer Bonine: Right. Interesting.

Daria Mehra: So it's been working really well, and I'm excited about the future of this, because what I hear the provider wants to do is use this data that they're assembling, because it's a huge data-labeling task that the testers are basically performing for them.

Jennifer Bonine: Yeah.

Daria Mehra: On a lot of applications, they're recording, "Here's the expected behavior, and here are the ways it can fail."

Jennifer Bonine: Yeah.

Daria Mehra: That's an enormous pre-labeled data set that they're sitting on, growing bigger by the minute.

Jennifer Bonine: Yeah.

Daria Mehra: So the eventual direction is, of course, to feed all of this into machine learning and have these tests automated ...

Jennifer Bonine: Right, where you don't even then need the person.

Daria Mehra: I don't need to ... Except when it fails, it can be farmed out right back to the person. So there's this fallback path, so you will know that if the test failed, the failure gets confirmed by a human. That is great. That's major. So I'm really looking forward to this, because I think it's a great addition to our toolkit as testers to have this way to automate.

Jennifer Bonine: Yeah. Interesting. And I like that term "artisanal testers," because people have worried or talked about what happens to testers with AI and machine learning, where does that go, but understanding that they're still there, they perform different functions, and it's actually an even more specialized, highly coveted skill.

Daria Mehra: Oh yes. And it's the context that it's all about. Because yes, you can hand a lot of things over to the machine if you have either a wealth of experience that can be fed into the machine learning algorithm, or you have specifications that are very precisely defined. So again, yes, you can train a machine on specifications, but when was the last time you had complete specifications for a new product?

Jennifer Bonine: Right.

Daria Mehra: That does not happen. It's very creative and intuitive, and flowing, and a lot of word-of-mouth hallway conversations.

Jennifer Bonine: Yeah.

Daria Mehra: How do you feed that into the machine?

Jennifer Bonine: Right.

Daria Mehra: So what you can farm out is this, not quite testing, but checking, you know, regression checks.

Jennifer Bonine: Yeah, the checking.

Daria Mehra: That's what the machine learning is for today. And potentially, some cookie-cutter applications like yet another shopping cart website, yeah, you can probably automate the testing of that right off the bat, not putting the human into the loop at all, because it's not a new product.

Jennifer Bonine: No.

Daria Mehra: What we're doing at Quid is revolutionary. It's completely new.

Jennifer Bonine: Yeah, completely new.

Daria Mehra: There is no history of this kind of thing existing, so there is nothing to train the machines on.

Jennifer Bonine: Right.

Daria Mehra: And that's what I imagine the future of software development being. Right now, yeah, there are a lot of people who are building the same, or kind of variants of the same. Again, I look at JavaScript, how many frameworks are there to do like charts on top of …

Jennifer Bonine: Right.

Daria Mehra: There's like a dozen plus. We keep doing the same, kind of trying it in different ways. That's temporary. That will go away. The future is people working on unique, new things.

Jennifer Bonine: Yep.

Daria Mehra: Because same-old, same-old can be handed over to the machine.

Jennifer Bonine: Right. Interesting. Such an interesting topic. Our time is already up, it goes so fast. Daria, if people want to contact you or have more questions about what we talked about, or say, "I'm really interested in learning more," how is the best way to contact you?

Daria Mehra: Google for "Tester's Digest." That's a testing newsletter I run. It comes out about weekly.

Jennifer Bonine: Tester's Digest.

Daria Mehra: Once you go to Tester's Digest, you will find a Contact Me button, and just email me.

Jennifer Bonine: That's great. Perfect, there you go. You heard it here, Tester's Digest, and you'll be able to find Daria. Thank you, Daria. And we will be back with more interviews after lunch.

Daria Mehra has been a bug huntress in the SF Bay Area since the roaring 2000s. Her software development background in distributed systems prepared her for the challenges of testing SaaS and appliance products in the data storage space and, more recently, web-based data analytics. As a director of quality engineering, Daria built two QE teams from scratch—once at a startup that open sourced the new analytics programming language Juttle and now at Quid, Inc., a data intelligence platform. Daria is excited to experiment with novel testing approaches for data science and data quality, including crowdsourced test execution. A proud recipient of the “Best QA Team” award at Testathon 2014, Daria now runs Tester's Digest, a QA newsletter.

