#158: The Evolution of Testing & Optimization: Looking Back and Looking Forward with Ton Wesseling

59:25
By Michael Helbling, Tim Wilson, and Moe Kiss — discovered by Player FM and our community. Copyright is owned by the publisher, not Player FM, and audio is streamed directly from their servers. Hit the Subscribe button to track updates in Player FM, or paste the feed URL into other podcast apps.

Google bought Urchin in 2005 and, virtually overnight, made digital analytics available to all companies, no matter how large or how small. Optimizely was founded in January 2010 and had a similar (but lesser) impact on the world of A/B testing. What can we learn from ruminating on the past, the present, and the future (server-side testing! sample ratio mismatch checking! Bayesian approaches!) of experimentation? Quite a bit, if we pull in an industry veteran and pragmatic thinker like Ton Wesseling from Online Dialogue!
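As a concrete illustration of the "sample ratio mismatch checking" mentioned above: before trusting a test result, you verify that the observed traffic split matches the intended one. A minimal sketch in Python, with invented counts and an assumed 50/50 split (this is a generic chi-square check, not any particular vendor's implementation):

```python
import math

def srm_pvalue(observed_a: int, observed_b: int, expected_ratio: float = 0.5) -> float:
    """Chi-square test of the observed split against the planned split.

    A tiny p-value means the bucketing is broken (sample ratio
    mismatch) and the experiment's results should not be trusted.
    """
    total = observed_a + observed_b
    expected_a = total * expected_ratio
    expected_b = total - expected_a
    chi2 = ((observed_a - expected_a) ** 2 / expected_a
            + (observed_b - expected_b) ** 2 / expected_b)
    # One degree of freedom: P(Chi2_1 > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))

print(srm_pvalue(5210, 4790))  # a suspicious split for a planned 50/50 test
```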

Books, Concepts, and Communities Mentioned in the Show

Episode Transcript

[music]

0:00:04.4 Announcer: Welcome to the Digital Analytics Power Hour. Michael, Moe, Tim, and the occasional guest discussing analytics issues of the day, and periodically using explicit language while doing so. Find them on the web at analyticshour.io, and on Twitter, @analyticshour. And now, the Digital Analytics Power Hour.

0:00:27.7 Michael Helbling: Hi everyone. Welcome to the Digital Analytics Power Hour, this is episode 158. It’s a new year, 2021. At the beginning of a new year, it’s great. It’s a great time to look forward to what is coming up, but it’s also still a good time to look back and use what has happened in the past. In this case, testing and optimization. It’s been a while since we’ve done a show on this topic, and like any good A/B test, we’re gonna go into some of the history and also some of the future, and you can let us know which you like best. Okay, Tim Wilson, you ready to kick off another year of podcasting?

0:01:06.0 Tim Wilson: Well, I think the big question is whether people know which version of this episode they’re actually listening to.

0:01:13.2 MH: Yeah, but if our technology works the way it’s supposed to, it’ll be seamless to the end user.

0:01:18.3 TW: Unless they come back and listen to it again, if they stop and pause it and come back for a subsequent listen.

0:01:24.7 MH: Yeah, right, yeah, there’s no way for us to keep you in the same control group. Okay, Moe, we gotta introduce you too before Tim and I keep making stupid A/B testing technology jokes. 2021, it’s gonna be a pretty big year. Do you feel ready?

0:01:38.8 Moe Kiss: I am absolutely terrified for the experiment I’m about to embark on.

[chuckle]

0:01:44.0 MH: Excellent. And I’m Michael Helbling, and 2021 is going to be an amazing year, at least that is what I keep telling myself. Okay, to really do this topic justice, we wanted a guest who is steeped in testing and optimization, so we turned to Ton Wesseling. Ton is the founder of Online Dialogue, he’s also the founder of the very popular optimization conference, Conversion Hotel. He’s an internationally recognized speaker and thought leader on the topic of testing and optimization, and today he is our guest. Welcome to the show, Ton.

0:02:17.9 Ton Wesseling: Thank you very much. Thank you for having me on.

0:02:20.4 MH: Well, it’s great that you’re here. So I think what’s interesting is we’ve all met at Super Week in the past and I had heard of Conversion Hotel, but up until we started prepping for this show, I did not realize that you were the conference organizer. That’s been one that I’ve sort of looked at and been like, “Man, I hope someday I could make it to that.” So I know this year it’s gonna be virtual but I’m kind of excited to see what happens with it as you progress it, I’ve heard many good things about that conference over the years.

0:02:47.0 Ton: We’re hoping that indeed, November 2021 is gonna be a month when we are allowed again to run a conference like this. So indeed, this whole year is virtual. We broke the conference down into 12 small snacks and are doing it virtually this whole year, but we're looking forward to November 19, 2021, and hopefully it’s gonna be a blast, and if not, then we’re just gonna wait another year.

0:03:09.9 MH: Nice. [chuckle] Alright, so let’s jump into testing and optimization and just sort of maybe give… Let’s start with maybe just giving people a lay of the land of where we are today. So much has been changing, what are some things that are top of mind for you coming into this year as it pertains to the space?

0:03:28.1 Ton: Top of mind, of course, is moving to server-side. Testing has had a big boom since it became client-side testing in 2010, but now it’s moving to server-side because of all the cookie issues we are having. But also moving to logged-in users, though that’s a broader move than just optimization, of course, combined with the server side. I think those are the two major things. I think many companies will have a lot more budget for digital in 2021 than they had in previous years, so scaling up your tech stack is something a lot of companies will be working on this year. Investing in new technologies for optimization and experimentation will also be on the table.

0:04:15.1 MH: Did you say “locked-in users”?

0:04:16.6 MK: Logged-in.

0:04:17.8 MH: Logged-in users, logged-in users.

0:04:20.3 Ton: Yeah, that’s my accent, yeah.

[chuckle]

0:04:24.7 MH: No that’s my American inability to… But testing with logged-in users, that’s always been a little bit easier, harder to set up and test maybe, but easier to…

0:04:35.8 Ton: It’s always been the big challenge. Testing has always been perceived as the holy grail for making business decisions, or at least for making digital business decisions. But your samples get really polluted because the same user is using a phone and a laptop, or maybe even a tablet, if they’re still using that nowadays, and you’re not able to recognize a user if they don’t make themselves known to you. So we were always already working with somewhat polluted data, but then again…

0:05:04.7 Ton: And now that cookies are being deleted almost immediately, it’s harder to recognize guest users. So of course, you can switch to session metrics to run your experiments on, but it’s so much more valuable to be able to optimize for lifetime value, and for that you have to really recognize your users. That shift is of course already underway, getting more mature in optimization, testing, and analytics. Especially with all the browsers changing how they work with cookies, it makes sense to invest in that this year.

0:05:37.6 MK: So when you say “logged-in users”, do you mean only starting the experiment once a user has logged in, so that you have a less polluted sample? I’m just trying to get my head around it: I’m across the server-side stuff, but maybe less the logged-in user stuff. Is that what you mean? Is that how most people in the industry are tackling it?

0:05:57.5 Ton: Of course, experimentation can be used for several things throughout the whole customer journey. In product experimentation specifically, to optimize your digital products, you’ll of course have a user that’s logged into your application, and then it’s way easier to run the experiments. And then you can even still run it client-side if you want to, as well as server-side, because you’re probably working on a software solution.

0:06:17.9 Ton: So that’s where experimentation is getting more mature really fast. On the marketing side of things (attracting users, buying users from social media, getting them to a landing page), it’s quite hard to tell them, “Please identify yourself first, and after that’s done, then we can serve you the proper variation or the proper segment of whatever we wanna do with you, and then you can take the next step.” Of course, they are gone before that. So over there, identification will probably be done with some sort of fingerprinting.

0:06:49.7 Ton: Because the best thing that’s happened, at least in Europe with all the GDPR legislation, is that we now really have to ask for opt-in: are we allowed to store stuff on your computer? In the old days, we never asked, we just did. And there was some sort of gray area, an ethical border, for how far you want to go as a company, as a brand, in storing stuff on the computer of the user.

0:07:21.0 Ton: And of course, for some companies there was no ethical border, and this is why we ended up here. But nowadays we just ask, really politely: “We’d like to store stuff on your computer, and you have to opt in for this, and here you can read everything we’re gonna store on your computer. Will you allow this, yes or no?” And if not, you can edit things and make small tweaks to what we can store and what not. But all companies are really making it hard for users not to have data stored on their computer, because in the end, if you press “Yes”, then you’re done: finally, the application or the website opens and you’re a happy woman or a happy man, because finally you are able to do what you wanted to do.

0:08:03.4 Ton: So we moved from just having an ethical border with no rules to every company asking for a yes, and training every user in the world to just click yes, besides maybe that small 1% of users, maybe like Tim, or like you Moe, or like you Michael, that will go into the depths of cookies on a website. But then again, we’re not optimizing the website for you guys, we are optimizing the website for the world.

0:08:29.4 Ton: So everyone is saying yes to everything. I think with marketing, of course, browsers are making it harder to store those cookies, but if users just click yes on the opt-in, it’s quite easy to also ask permission for digital fingerprinting: make a combination of your browser, tools, installation, and everyone has a unique fingerprint. That was across the ethical border in the past, but nowadays users are just saying yes. So I think from the marketing side of things, that’s the way to be able to recognize users, and then of course it’s up to the company whether they think they’re crossing an ethical border or not.
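The attribute combination Ton describes can be sketched as follows. This is a purely illustrative toy: real fingerprinting libraries use many more signals (canvas and font rendering, audio, and so on), the attribute names here are invented, and, as discussed above, this is only defensible with explicit opt-in consent.

```python
import hashlib

def fingerprint(attributes: dict) -> str:
    """Combine browser/device attributes into one stable identifier.

    Sorting the keys makes the hash independent of dict ordering, so
    the same attribute set always yields the same fingerprint.
    """
    canonical = "|".join(f"{k}={attributes[k]}" for k in sorted(attributes))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

fp = fingerprint({
    "user_agent": "Mozilla/5.0 (example)",
    "screen": "2560x1440",
    "timezone": "Europe/Amsterdam",
    "language": "nl-NL",
})
print(fp)
```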

0:09:07.4 MH: That’s fascinating. Because I remember a Web Analytics Wednesday years ago, I can’t even remember which company it was we had, and they were just presenting on some topic, and boy, he was so excited. They were using Flash cookies, and he was like, “Even if people delete their cookies, we can still track it.” I was sitting in a room of digital analysts who were like, “Dude, that’s not… That’s not cool.”

0:09:32.0 MH: And digital fingerprinting has gotten to the point where you’re working around stuff. I hadn’t thought about the fact that if you’re getting people to opt in and you’re giving them a fair opt-in experience, there’d be an argument that that’s moving you to a whiter side of that gray area, because you’re asking them and they’re opting in. Now, the reality is no one’s reading it, but they are at least getting an interrupt that says “we’re tracking you”, so why do they care how I’m tracking you?

0:10:04.8 MH: I think I might have just drifted to where maybe digital fingerprinting has a way to be not quite as bad. But you mentioned 2010 as when client-side testing boomed, and I’m assuming that… My history in the space is not super detailed, but that was like the Optimizely explosion, and I know Target existed before that, and other platforms. On the scale of the last 15 years, do you see specific periods? I remember Optimizely coming out and it was: this is easy, drop it in, you get a WYSIWYG interface, drag and drop, just test, test, test, test, test. And before that, it was a higher lift. But what have you seen as the volume of testing over the last 10 years? I still have clients who aren’t testing, or who are doing three tests a year, but at the same time, you hear the space and the perception is: who’s not doing hundreds of tests a year? So how has that arc of history moved?

0:11:21.3 Ton: It all started in 1872… No. [chuckle] I think the best example you can get is the Google Analytics shift in the use of analytics. Before Google Analytics, of course, Urchin was around, and of course Adobe, or Omniture back then, was around, but analytics was done by looking at log files or using Webalizer. There were some free solutions out there, but they were really silly, and the paid enterprise solutions were quite amazing in what they were capable of: they were quite slow, but they could do great things. But then Google Analytics suddenly came to the marketplace, and there was a huge explosion of users, people using Google Analytics.

0:12:02.0 Ton: In the beginning, companies selling enterprise analytics software were afraid: “Oh my gosh, analytics is now free.” But after two months, they were really happy, because they saw a big shift of people using analytics and finally also buying a premium package from a different vendor than just free Google Analytics. It’s the same with the explosion in 2010, when simultaneously VWO from India and Optimizely from the US came into the marketplace. VWO was the one conquering Europe before Optimizely came over to Europe. It was around the time when the DOM…

0:12:34.9 MH: Again, sorry, VWO, the Visual Website Optimizer, you said VMO?

0:12:40.1 Ton: Yes. Yeah, yeah, yeah.

0:12:40.2 MH: Okay.

0:12:41.9 Ton: And of course, some other vendors out there already had tools, but not the marketing money to blow it up big. I think in 2009 or so, the DOM manipulation technique was made freely available by some developers, the drag-and-drop solution, being able to drag and drop stuff in your browser. That was of course the big push forward for tools like Optimizely and VWO, because suddenly you didn’t have to know how to code. Before that, to run experiments, even if you were using Adobe Target, or back then still Omniture Test&Target, you had to be able to understand HTML, or at least some JavaScript; you had to be some kind of front-end developer to be able to run experiments.

0:13:25.3 Ton: And suddenly, as a marketer, you could just drag and drop your stuff: “Oh, this headline should change, or this headline should go here and that picture should go over there, and this is really amazing.” Try another experiment, and another one, and another one, and another one. That’s of course what happened: there was this big explosion of mostly marketing experiments trying to get new users to your e-commerce store or to your tool or whatsoever, but the…

0:13:49.8 Ton: The quality went wrong both ways: from a statistical perspective, so the trustworthiness of the experiments, but also in what they were experimenting on. Because of course, changing the button color can have an effect in the end. If your button really has the same color as the rest of your website and you choose one that really stands out, from a color perspective, it could bring you some extra clicks and maybe even some extra sales. But this is of course not what you wanna run an experiment for, and I think that went wrong in the first years of Optimizely and VWO: all marketers being happy, finally being able to change stuff on the website without having to use the content management system or even go to IT to ask for a developer to make changes. So we have seen everything, from a whole website running only on experiments, being pushed 100% live for every single page, with page-loading times going over 20 seconds before something answers.

0:14:50.6 Ton: So in the beginning there was a big explosion; even the tool vendors had their statistics wrong. They didn’t know about the frequentist approach and shifted to Bayesian in the end. Nowadays, over those 10 years, we moved from aggressive, marketing-focused, try-out-a-new-tool experimentation to “okay, how can we implement a proper framework, delivered by a software provider, in our tool stack, to be able to run every code change as an experiment or as a test?”

0:15:23.7 Ton: Of course, if you are a smaller company it’s a different story, but we’ve been through a lot: we went from a few fun experimenters, to suddenly a large group of real amateurs making every mistake you can ever make in experimentation, to a proper understanding of what experimentation is about.

0:15:42.6 Ton: And we still have a couple of miles to go, but I think most companies now running experiments at least know something about statistics. And the ones that are not running experiments yet are still out there, or they are low-traffic. If they are a high-traffic company selling a digital product, not running experiments when they ship a code change on the digital product, how can they still survive? They must have been really, really, really lucky, or really good at user research, or really good at analytics, and just not run any experiments.
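The statistical shift Ton mentions, vendors moving from frequentist significance testing to Bayesian reporting, can be illustrated on one made-up A/B result. Both analyses below are standard textbook approaches, not any particular vendor's implementation, and every number is invented for the example.

```python
import math
import random

# Invented counts: ~5.0% vs ~5.6% conversion rate.
control = {"visitors": 10_000, "conversions": 500}
variant = {"visitors": 10_000, "conversions": 560}

# Frequentist view: two-proportion z-test.
p1 = control["conversions"] / control["visitors"]
p2 = variant["conversions"] / variant["visitors"]
pooled = (control["conversions"] + variant["conversions"]) / (
    control["visitors"] + variant["visitors"]
)
se = math.sqrt(pooled * (1 - pooled)
               * (1 / control["visitors"] + 1 / variant["visitors"]))
z = (p2 - p1) / se
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided

# Bayesian view: Beta(1, 1) priors on each rate, Monte Carlo
# estimate of P(variant beats control) from the two posteriors.
random.seed(0)
draws = 20_000
wins = sum(
    random.betavariate(1 + variant["conversions"],
                       1 + variant["visitors"] - variant["conversions"])
    > random.betavariate(1 + control["conversions"],
                         1 + control["visitors"] - control["conversions"])
    for _ in range(draws)
)
print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
print(f"P(variant beats control) ~ {wins / draws:.2f}")
```

With these counts the frequentist test hovers just above the conventional 0.05 threshold while the Bayesian summary reports a high probability that the variant is better: the same data, two quite different-sounding answers, which is part of why the vendor shift confused people.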

0:16:20.0 MK: But do you think… I feel like there has been a volume increase, and I do think it’s better… Well, marginally better than the button color changing…

0:16:32.8 MH: Volume and number of… Number of experiments per company.

0:16:35.0 MK: Not in number of experiments…

0:16:38.2 MH: And number of companies, I guess. Okay.

0:16:40.9 MK: Yeah, and it becomes like a point of almost gloating about how many experiments you can run as a company, that sort of thing, and I remember chatting to you at Super Week last year where you completely blew my mind away, ’cause I just hadn’t thought about it. I mean, I do a little bit of stuff with experimentation, but I probably hadn’t given it, obviously the same thought as you do, because that’s your bread and butter, about how to prioritize experiments though, and do you feel like most companies in industry are nailing that of how to actually judge what kind of impact an experiment is gonna have? You have this whole scoring methodology that I remember just being like, “This is the best thing I’ve ever heard,” and only a year later are we starting to really implement this now. And yeah, I still feel like lots of companies, it’s about the quantity, not about the quality.

0:17:33.4 Ton: It’s a really good remark, Moe. It’s always hard for me, because I’m really into that whole experimentation thing and mostly work for companies that really get experimentation, because I’m getting too old to keep on fighting to help companies that don’t get it set this up; I just wanna help companies that do get it to become better at what they do. But then again, you’re fully right. I think in the last 10 years we’ve become great at optimizing velocity, running more experiments and maybe even making them a bit trustworthy, and the vendors are really helping out with that. But indeed, look at what tests are being run, so the winning percentage: if you take out all the false positives, how many real wins did you get from running experiments? For most companies that’s way too low, because they don’t have a proper way of prioritizing what they work on.

0:18:29.0 Ton: But this is of course a bigger problem than just running an experiment, ’cause in the end, if you are a product team, you wanna add value to the company, you wanna add value to the business, also as a marketing team. So it kind of makes sense that you prioritize what you work on, and when you prioritize, you wanna look at evidence. Past experiments are really valuable, but also data and user surveys and feedback sessions; everything is evidence with some level of quality, and of course experimentation is higher-quality evidence, with less risk of bias. You should use that to prioritize what you’re gonna work on, and then you work on it as a product team or a marketing team. And then either you finish it, run it as an experiment, and push it live if it’s valuable, or you do some pre-experiments in a lean setup: okay, if we do this and this and this and fill in this little thing over here, then it only takes 12 hours to create instead of 200 hours, and we can run an experiment to see if it has an impact.

0:19:23.7 Ton: If so, then we work on this as a team and spend those 200 hours to push it forward. That way of thinking, to me, is still not there. Most product teams and marketing teams I see in companies just do their utter best based mostly on gut feel and some data, and just keep on running, and then they experiment, or don’t experiment, on the final result. So if you wanna move forward and keep on growing, this will become a really important part, because in the end, if you’re able to calculate the impact of experimentation, you might find out that it’s not bringing in any money.

0:19:58.2 Ton: And that’s maybe not because of the money they’re spending to run those experiments, but because the quality being put in is not good. Of course it will bring money, because experimenting on every line of code you ship helps you understand whether you’re hurting the business. So if you make a business case on the stuff not being shipped because it was hurting the business, then you can probably still pay for the experimentation team. But it makes sense to use experimentation to grow the company, and not just to stop people from doing stupid stuff.

0:20:29.8 MK: So when you just said that, because you said it twice now and both times I’m like, I wanna drill into exactly what that means. When you say experimenting on every line of code, what does that look like?

0:20:41.3 Ton: If you work on a digital product (and to me, a landing page generating leads is also a digital product) and you’re gonna make a change to the product, change a line of code, you can just change it on the fly and push it live. But then you will never know if it had some sort of impact. Of course, you will still be able to see a trend line of your sales or leads coming in, or tasks being completed over time, but you will never know what caused it, especially if your numbers are low. So in the ultimate scenario, the thing you wanna do (and this is not for the really small companies; this is for a company with an IT department, a marketing department, product teams) is this: once something gets implemented, most of these companies implement it at the server-side level already. So it’s not a client-side experiment, it’s a server-side experiment. They release it behind a feature flag, and, say, 20% of the users will get the variation, to understand if it’s working.

0:21:41.5 Ton: And if you do this for every code release, every line of code that’s being changed, then you continuously know whether a code change is hurting the business or not. You can even skip your quality assurance a bit, because if you give it to just a couple of users and it’s hurting the business, then we don’t know why (it could be a bug, it could be the change of wording or whatsoever), but just stop it and move on.

0:22:04.0 Ton: So running experiments on every single line of code: this is the solution the bigger companies are focusing on. And of course, it really depends on the size of your company whether it’s valuable to test every single line of code, because if you would ask Airbnb, Microsoft, Amazon, the big ones, “Is it feasible to really run an experiment on every line of code change?”, at some point they will say no.

0:22:28.1 Ton: Some changes will just be pushed live because it’s too expensive to analyze. And maybe analyzing experiments can be automated, but then telling the team that shipped that line of code that it had a negative versus positive impact, and that they should do something with it, costs brain power and time. At some point that doesn’t make sense, so the team just has to move forward and not look back, because the chances of this having some sort of impact are so low: don’t waste money and brain power on it, just ship it. But in the end, those companies want to test everything they ship.
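The "20% of users get the variation" rollout Ton describes depends on deterministic bucketing, so a user always lands in the same group. A minimal sketch; the hashing scheme, function name, and experiment name are illustrative, not any specific vendor's API:

```python
import hashlib

def bucket(user_id: str, experiment: str, rollout_pct: float) -> str:
    """Deterministically assign a user to 'variant' or 'control'.

    Hashing the user ID together with the experiment name keeps the
    assignment stable across sessions (for logged-in users) and
    independent between experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    # Map the first 8 hex chars to a score in [0, 1].
    score = int(digest[:8], 16) / 0xFFFFFFFF
    return "variant" if score < rollout_pct else "control"

# Ton's example: roughly 20% of users get the new code path.
assignments = [bucket(f"user-{i}", "new-checkout-flow", 0.20)
               for i in range(10_000)]
share = assignments.count("variant") / len(assignments)
print(f"variant share: {share:.1%}")  # roughly 20%
```

Checking that this observed share actually matches the intended 20% is exactly the sample ratio mismatch check mentioned in the show notes.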

0:23:03.1 MK: I feel like you’re really challenging my perspective here, because as you’re saying all of these things… This is the second time we’ve had an in-depth conversation, and both times I’ve been like, wow, you know all the stuff, and I hear myself agreeing with you. But then I also hear me in a work context being like, this is ridiculous, we shouldn’t be testing this. And I’ll make the case that I make regularly at work, which is basically: if we know we’re gonna roll this out no matter what, 100%, it’s definitely going forward. I often have product managers who are like, “No, I wanna test it so I know the impact,” and I’m like, “I don’t give a shit, that’s a waste of an analyst doing analysis on a test that you are gonna roll out regardless of the outcome.” But then it sounds like your position is just the counter to that, which I also agree with. So I’m feeling really conflicted.

0:23:58.0 Ton: Of course, there is a difference between the theoretical solution and the real company. If there’s politics involved, or some decisions just have to be made, then you become an unbalanced company compared to the theoretically perfect solution, and you have to work around that. From a theoretical perspective, it makes sense to test everything. And if it doesn’t cost any resources anymore, if you’ve automated everything, it makes sense to test everything, because then at least you know for sure that you’re not hurting the business, and every position is being tested.

0:24:32.6 Ton: So every product owner or marketing manager understands what’s going on. This is the ideal scenario. But then you’re in a company like six, seven years ago: you had two people in the company running experiments, and they were the optimizers, the conversion rate optimizers, the experimentation team.

0:24:48.5 Ton: And once they started growing, because they were proving they were bringing money to the company, they became four people, five people. And at some point in the maturity growth they saw, “Okay, we need the product teams and the marketing teams to run those experiments, because they know what to work on, they know about marketing, they know about the products, they should prioritize experiments, and we should be the team enabling those product and marketing teams to run those experiments. This is how we can scale experimentation through the whole organization.”

0:25:18.6 Ton: But then if you look at an experiment (setting the hypothesis, designing the experiment, developing the experiment, running the analysis), that takes like six or seven weeks. But those product teams do sprints, two-week sprints, and the marketing teams prepare a campaign and run it for like four weeks: Black Friday campaigns, Christmas campaigns, Easter campaigns, whatever. So they have a different pace.

0:25:42.7 Ton: So if we try to tell them to run experiments, to experiment on everything before they do something, to them it feels like slowing down. Only if they really have the need to understand why a specific campaign message or a specific product change is working or not working will they be willing to spend another six or seven weeks to get that learning and become better. But it is slowing them down. So in the companies that reached that maturity, at some point we just went to the IT department and told them, “Okay, are you able to change the content management system, or the tech that you’re using, to run everything that’s being shipped by the marketing team or the product team as an experiment?” Then it doesn’t cost those teams any effort, so they can just build it, and they can collect the evidence they want to collect: user research, past experiments, whatever they do.

0:26:36.4 Ton: So they would just create the change, and it’s run as an experiment, so for them it’s invisible. It’s the normal way of working: they just ship code, they ship the campaign, and at some point they’re told, “Okay, this campaign was successful or not successful. This product change was pulled back because it was really hurting the business. We think it was a bug, but here’s the data; you have an analyst on your team, go find out yourself.”

0:27:02.5 MH: But to clarify, for that to work, that means when you’re shipping, you’re kind of always running in an A/B mode, ’cause you’ve got basically a challenger and the control. What’s being pushed out is… They’re saying, “I’m deploying new X,” which could be a feature, or it could be content, or it could be marketing. And the process or the system is saying, “Well, we’re gonna use what was already there as our control, and what you’re pushing out, we’re not gonna 100% deploy.” Is that how that works?

0:27:36.7 Ton: Yeah, yeah.

0:27:37.9 MH: ‘Cause that is different from an email test, where we’re gonna push an email and we need to create a subject line for A and a subject line for B, because that then does have to be a conscious… Or even content, where we’re pushing out new content and we wanna test the headline, A versus B. There are types of tests where I need to design the test, because I have to come up with my variations, and then there are tests where it’s like, “No, my baseline is my status quo, that’s my control, and what I’m coming up with is the singular challenger.” Is it fair to break up tests into those two groups?

0:28:22.4 Ton: If you create something new (it could even be a new landing page, a new product), then there’s no data, there’s no default to compare with; you just create something new. So with an email campaign, to me it doesn’t make sense to create an A and a B version and ship them both out fully at 50%. If you create a new email, you’re doing this because of past learnings; you probably have some sort of campaign you wanna push out. Ship that campaign to 10% of your email database, and that could be an A/B, then pick the winner and ship it to the other 90%. It could even be that you ship only 5% with the one you created, to see if the open rates and the click-throughs are as expected, and if so, ship it.
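The 10%-test-then-90%-rollout Ton describes boils down to a simple winner pick on the test send's open rates. A minimal sketch, with invented subject-line names and counts:

```python
def pick_winner(results: dict) -> str:
    """results maps variant name -> (sent, opened); returns the
    variant with the highest open rate on the test send."""
    return max(results, key=lambda v: results[v][1] / results[v][0])

# Each subject line went to 5% of the list (5,000 addresses each).
test_send = {
    "subject_a": (5000, 1100),  # 22% open rate
    "subject_b": (5000, 1350),  # 27% open rate
}
winner = pick_winner(test_send)
print(f"ship '{winner}' to the remaining 90% of the list")
```

A fuller version would also check that the gap is larger than the noise (for example with the two-proportion test shown earlier) before committing the remaining 90%.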

0:29:07.9 MH: But that’s then back to where the people who are the marketers need to have some awareness of what’s going on. It’s like, “We’re deploying this campaign on February 4th.” And then it’s like, “Well, if we’re doing it through the model… ” I think there’s that whole challenge of just like the, “Hey, these test platforms, you don’t need IT.” And now we’ve turned our test platforms into a content management system, and that really starts to not work well. With testing, it seems like there does need to be a base level of buy-in and education on a range of parties.

0:29:50.5 MH: Even if you’re trying to make it so they don’t have to think about it, they still need to understand that that is the corporate process and rationale, or you’ll wind up with these frustrations. Even if you’ve got the feature where you pushed it out and then somebody says, “What the heck’s going on? I deployed my new feature and I just pulled it up and I’m not seeing it.” Well, ’cause they happened to not be in that… And that seems like that’s this constant tension that hasn’t gone away. As a program gets more sophisticated, you still have to have the culture and the awareness and the education across a pretty broad range of constituents within the organization.

0:30:31.2 TW: Yeah. And I do also have to share with the listeners that there really is a big difference between aiming to be a big tech product company and being a small challenger in some sort of e-commerce niche. If you look at a company like Booking.com, which is based in the Netherlands over here, of course they had a hard time in the whole crisis, but they are an experimentation machine; they run 2,000 experiments at any given time. It's always [0:31:01.9] __. Their Head of Product, David Vismans, was even quoted in the book "Experimentation Works" by Stefan Thomke. He said large-scale, high-velocity experimentation is not a technical thing, it's a cultural thing. You can buy the technology, you can make sure that every line of code is run as an experiment, automatically in the right buckets, with all the statistics taken care of.

0:31:24.9 TW: But then if the teams don't understand this way of working, and if they don't start prioritizing their work based on the outcomes of the experiment results, then the only thing you're sure of is that you're not pushing stuff live that's really hurting the business. But you're probably not lifting the business either, so it's a cultural thing.

0:31:44.5 MK: For smaller companies, campaigning is interesting. You mentioned email; experimentation can really be done well on every medium you truly own. So if it's your own platform, if you're serving emails from your own platform as an image, so you're still able to change the email once someone opens it and able to bypass all the commercial folders in email programs nowadays, then it's still owned by you, and you can run a proper experiment.

0:32:12.4 TW: And especially your own digital products are really good to run experiments on. If you buy media, if you run campaigns on Google, like ads, or on Facebook advertising, then it's way harder to run it in a "we're gonna test everything" setup. It's not controlled by you, it's not owned by you. So you have to take a little more risk, and maybe run fewer experiments and just try, try, try to understand what's working. It's still a data-driven approach, but it will be less effective than running experiments on your own platform, because there you are in control.

0:32:48.4 MH: So it seems like… I mean, one thing I always think about in terms of scaling testing is this cultural aspect that you just mentioned, and the problem has always been: how do we get a broad set of users to think through a good design of experiments? To think through all these things. And it sounds like the answer to that is sort of, "Well, we're just going to have this layer of testing that exists," sort of removing that piece from it. What about the statistical piece? Do you feel like everyone who's involved with these programs…

0:33:20.5 MH: ‘Cause it seems like the statistical rigor is necessary, design of experiments is necessary if you’re gonna run it sort of what I’ll call the old-fashioned way, the way I’ve kind of learned about testing. So then, is that kind of how you remove that obstacle from organizations? And I’d like to hear kind of how that interplay works in your head a little bit.

0:33:41.9 TW: It's something we learned over the years. Trying to convince everyone in a company that they should run experiments is really hard to do. Convincing non-believers? [chuckle] It takes time. So taking away experimentation and making it invisible really makes sense if you want more teams running experiments. And the statistical side also. If you are running experiments yourself, there's the knowledge you've gotta have on statistics, the knowledge on user behavior… I'd rather have those teams focus on understanding more of, say, the work of Daniel Kahneman, System 1 and System 2 thinking, the work of Richard Thaler. Teach them that, because you should have a Center of Excellence where some people really know about statistics and design of experiments, and they make sure that's taken care of, that every experiment run in their company is trustworthy. Because if those teams need to take care of trustworthiness themselves… There are still really good optimization teams out there, running experiments centrally at quite large companies, that don't understand the full depth of the statistical side of experimentation.

0:34:57.8 TW: They know about power; they know about significance, type I errors, type II errors. But then if you talk about, "Okay, well, your type M error and your [0:35:06.3] __ error, how big is this? Are you applying CUPED to get less variance in your experiment outcomes?" Then people are like, "What?!" [chuckle] I know.
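For listeners who hit the same "What?!": CUPED (Controlled-experiment Using Pre-Experiment Data) is the variance-reduction technique Ton is referringing to. It regresses the experiment metric on a pre-experiment covariate and analyzes the residualized metric instead, which tightens confidence intervals without biasing the mean. A sketch on simulated data; the data and function name are illustrative:

```python
import numpy as np

def cuped_adjust(y: np.ndarray, x: np.ndarray) -> np.ndarray:
    """CUPED: remove the variance in the experiment metric y that is
    predictable from a pre-experiment covariate x (e.g. each user's
    pre-period spend). theta is the OLS slope of y on x; subtracting
    theta * (x - mean(x)) leaves the mean of y unchanged."""
    theta = np.cov(y, x, ddof=1)[0, 1] / np.var(x, ddof=1)
    return y - theta * (x - x.mean())

# Simulated users: the in-experiment metric is strongly driven by pre-period behavior
rng = np.random.default_rng(42)
x = rng.normal(100.0, 15.0, size=10_000)         # pre-experiment metric
y = 0.8 * x + rng.normal(0.0, 5.0, size=10_000)  # in-experiment metric
y_adj = cuped_adjust(y, x)                       # same mean, much lower variance
```

The stronger the correlation between the pre-period covariate and the experiment metric, the more variance this strips out, and the smaller the uplift you can detect with the same traffic.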

0:35:17.0 MK: I've been there; I've made all the statistical mistakes myself, because my background is not statistics. I was educated in management science and growing companies, but I was never really taught this level of statistics, let alone the whole Bayesian approach, or machine learning, or whatever is coming up now with algorithms. So to have that knowledge in all the teams, that doesn't make sense. It's just too complex, and it makes conversion rate optimization, or experimentation, something you don't wanna do. If you are in a marketing or product team, and you start listening to talks like this one, mine too, on statistics, then you're probably like, "Boring."

[laughter]

0:35:58.5 MH: Well, there's that constant pressure. I feel like… Maybe this is just reinforcing that the idea of A/B testing is, "Oh, I'm splitting the universe." It sounds simple, but the reality is it's complicated: both the statistics as well as just the mechanics of actually doing it, and doing it in a way that is actually rigorous. And so it feels like we haven't cracked that nut. If it's a large company where it would just be ludicrous for them not to be testing and doing constant experimentation, that's one thing. But it feels like in either the mid-sized companies, or the ones that may be large but have a long sales cycle or no online conversion, somebody becomes the champion for testing, and they sell it as, "We're gonna do this A/B. We're gonna split the universe into two, and we're gonna definitively figure out causality between what's driving behavior."

0:37:01.0 MH: And then all of a sudden, the cold, hard dose of reality of getting a platform in place, getting alignment on what’s gonna be tested, making sure there’s the rigor, and it does feel like there’s this tendency to feel like there’s a need to sell it as being simple and straightforward and definitive, and then the reality hits, and it’s kind of very frustrating and oh, by the way, even if people embrace that reality, it’s like… And there’s a lot more. You’re gonna keep getting deeper and deeper and deeper into those details of it. It seems like that… No one’s solved for that. The technology companies like to say they’ve solved for it, and that’s really not where the solution comes from.

0:37:47.8 TW: Oh, I fully agree with this. I think that started already with the launch of companies like Optimizely, because they just wanted clients. [chuckle] So the message wasn't, "This is really hard to do; you should really know about statistics and development to be able to run this [chuckle] software properly."

0:38:05.9 MH: Sign up here!

0:38:06.7 TW: Sign up here. It's working, of course. And of course, they made it easy, and they made experimentation really big, so I praise them for that. But indeed, it's like going down the rabbit hole. If you are the one experimentation or marketing person at a mid-sized company, a small-medium business, who heard about conversion optimization, like, "This is something you should do, you should run A/B experiments," and you start teaching yourself how to do this, then you end up in a nightmare of continuous negative messages that this is something you will never be able to do right. [chuckle] But in the end, the larger companies…

0:38:53.0 TW: Scale-ups, but also enterprise companies, really know that data is their solution to growth. Evidence-based growth, data-driven growth, insights-driven growth, whatever they wanna call it, it's the way forward. So what they do is invest heavily in data, heavily in experimentation platforms. They invest heavily in buying other brands that have a large group of users, so they have more data to make decisions on. Because as a company, you have to be fast, and you have to have lots of data to be able to understand what's going on. And then you can outperform the competition. And if you're a smaller company and you don't have this buying power or funding, or you're just in a niche market, it still makes sense to run experiments. But then you have to take a different perspective. Just say, "Okay, I'm a business owner, I'm making decisions. Of course I became this company of eight people or 10 people, or 20, or maybe even a hundred, because I have this gut feel and I'm really passionate about this specific product or service I'm selling to the market, and that's brought me somewhere."

0:39:57.4 TW: But if you take out that specific person, and the luck that person has had, then you wanna make business decisions based on evidence. And maybe, in that company, it's not possible to run every decision as an experiment, but you still need to look at this hierarchy of evidence. Okay, we have data from users, we have interviews with users, we have cohort analyses of the data, and this is what we use to make decisions. Because at some point, when other people also start making decisions in this company, they don't have the luck and the passion of the first mover. So they need to make decisions, and if they use experimentation, they should know what's going on when they make a decision based on experiments. But the good thing is, compared to 10 years ago… Back then, statistics [0:40:47.2] __, being able to implement a correct system. It's the same with analytics: if you don't check the funnel, then it doesn't make sense to say anything about the outcomes, because you're not tracking the funnel.

0:41:01.0 TW: But the tools are really mature now. So nowadays, if you implement even a client-side solution, if you implement an Optimizely and look at their stats engine, or one of their competitors, you can be quite sure that the decision you're making is probably the right decision.

0:41:14.5 TW: And of course, there are still some minor errors in there, and they're not testing for sample ratio mismatches and so on, but they are mature. If you compare it to 10 years ago: a decision you made 10 years ago on that tech stack, versus asking someone to throw a dart at a board for a yes-or-no decision, would probably have led to the same outcome, or maybe a little better. But nowadays, it really helps to use the software to make that decision.
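The sample ratio mismatch (SRM) check Ton mentions is simple to run yourself even when a platform doesn't: compare the observed split against the planned allocation with a chi-squared goodness-of-fit test, and treat a tiny p-value as a bucketing bug rather than a result. A sketch using scipy; the alpha of 0.001 follows common SRM practice but is a judgment call:

```python
from scipy.stats import chisquare

def srm_check(n_a: int, n_b: int, expected_share_a: float = 0.5,
              alpha: float = 0.001) -> tuple[float, bool]:
    """Sample ratio mismatch check: chi-squared goodness-of-fit test of
    the observed split against the planned allocation. A tiny p-value
    means the bucketing itself is likely broken, so any 'winner' from
    the experiment should not be trusted."""
    total = n_a + n_b
    expected = [total * expected_share_a, total * (1 - expected_share_a)]
    stat, p_value = chisquare([n_a, n_b], f_exp=expected)
    return float(p_value), bool(p_value < alpha)  # True => investigate first

# A planned 50/50 split that came back 50,000 vs 48,500 users:
p, mismatch = srm_check(50_000, 48_500)
# mismatch is True: a 1.5% skew on ~100k users is far outside chance
```

The counterintuitive part is that at large sample sizes even a split that looks close, like 50.8/49.2, can be a strong SRM signal, which is exactly why it's worth checking mechanically rather than by eye.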

0:41:41.1 MK: Okay, so Tim always thinks that my professional experience is really warped because I tend to work at start-up, scale-up businesses that have unique…

0:41:53.0 MH: It’s not your professional experience that Tim thinks is warped, Moe, unfortunately.

[laughter]

0:41:57.3 TW: I am gonna drop off now.

0:42:03.4 MH: Sorry Moe.

0:42:04.5 MK: I guess for me, the issue is not convincing people about the need to experiment, because everyone's on board… Everyone gets it. I did pick up on a really interesting point you made a little while ago, when you were talking about that email example: "Okay, so we've got this email, we're gonna send it to 10% of users." And that's what our engineering team does. When we roll out product changes, we'll send it to 5%, then 10%, then 20%, and eventually we work our way up to 100%. But that's not what our marketing team does, because our marketing team has this perception that whatever they're doing is going to be better than what was there previously.

0:42:43.5 MK: So they want to expose it to 90% of users in the first hit, versus the 10%. Whereas I feel like the engineers are cautious and they wanna make sure they’re not fucking anything up. So for me, it’s more about, how do we get things to be consistent across the business, or how do we make sure that the experiments we’re running are the most valuable experiments we could be running. Because, you’ve got cats running in totally different directions, and I feel like that’s a totally different problem to a lot of the stuff that we’ve been discussing.

0:43:14.9 TW: Is the marketing team also running experiments on existing users? Or just trying to get more new users in?

0:43:20.8 MK: Both.

0:43:22.2 TW: The product one is a good one. Of course, the product team doesn't want to do harm; they're cautious, they don't wanna have downtime and so on, or leaks or hacking attempts. And that makes sense. If your business is a digital product, like yours, then it's kind of strange not to deliver something progressively, because you wanna understand if users are dying from this new drug you're pushing out, if we were in that kind of business. With medicines, you wanna test the medicine really, really well before you ship it to the market. But it's the same for the product: you wanna test the product, because if you're not testing the product, you're hurting the product, and it could die.

0:44:06.0 TW: But then for marketing… If it's an existing user, then it's really interesting to see if the marketing messaging you're using for existing users, who already know the platform, could potentially hurt your business. It could hurt the perception of the product. So if it's on your platform, for existing users, or emails to existing users, then in my opinion it should be a product approach.

0:44:30.3 TW: But if it's just the marketing team bringing in new users, running campaigns… Of course, you still wanna understand what they're doing with your brand. [chuckle] But let them go, and just calculate how much money they're spending, what their costs are, and how many new users they're bringing in. And if you understand the quality of those users, you can probably also calculate a potential lifetime value. So with whatever effort they approach the user, it kind of makes sense that they use an experimentation approach if they have high traffic, and they should run the experiments in AdWords or wherever. But if that's a different approach than optimizing your real product, that's okay by me, because the last thing you wanna do is slow those teams down and dampen their creativity; they just need to keep on running. They're enthusiastic, they're competitive, they wanna move forward.

0:45:19.8 MK: That's a really interesting delineation between existing versus new users. With existing users you could be doing harm, but with new users it's… Yeah, I guess it's about trying to run fast and, let's say, enhance the creativity, but I'm skeptical.

[laughter]

0:45:39.1 MK: Yeah, that’s a really nice way of framing it. I think that… Yeah, I need to mull that over…

0:45:46.7 TW: Probably at some point you will need an ethical board for your marketing teams, to make sure they're not doing stuff you don't want them to do with your brand, especially if you're outsourcing to specific marketing firms or agencies. But in the end, it's your product. If your product is really good, it will probably sell itself and run its own marketing through your loyal users. So whatever you do, don't hurt the product and the user experience on the product, because growth comes fast, but it moves out even faster.

0:46:16.3 MH: Alright, we’ve gotta start to wrap up. This has been such an amazing conversation though, and I feel like we have so much more we could be digging into, so it means there needs to be a round two at some point in the future Ton, so let’s definitely… Tentatively schedule that at some point. But we do need to do that, and one of the things we like to do on the show is called the last call, we go around the horn. Just share something with our listeners we think they would be interested in, or we think it was interesting to us. So, Ton, you’re our guest, do you have a last call you’d like to share?

0:46:48.0 TW: Yeah, I have two. I’m not sure if it’s allowed to have two last calls but I want to…

0:46:53.0 MH: Yeah, A and B.

0:46:54.5 TW: Okay, there you go thank you very much.

[laughter]

0:46:57.3 TW: I have one for you three, because you are talking to so many people from the industry, from analytics to experimentation. There is a difference, like we mentioned in this show, between the theoretically awesome approach and the real daily work. I'm always intrigued by what's holding people back from becoming more data-driven, more experimentation-driven. So if you can keep gathering that knowledge by asking that specific question of coming guests, and at some point share it with everyone, that would be amazing, because with the number of people you're talking to, that's such a scale of knowledge that it's even hard for me to get. So that would be my tip for you and…

0:47:41.8 MH: I like it.

0:47:43.0 MK: Nice.

0:47:43.4 TW: The one for those listening: in the end, experimentation is not a technical thing, it's a cultural thing. So you should work on that, work on those growth mindsets, but I think you all know this. If you want to optimize the outcome of people's efforts across their different decision-making, you really need to understand consumer behavior. And maybe this is not for the analysts listening, but there should be someone on your specific team who understands consumer behavior.

0:48:16.1 TW: And there are lots of papers out there, the same as for analytics, that are really hard to understand and hard to read, with difficult scientific language. But the change… If I look at myself, my change came when I read Daniel Kahneman's Thinking, Fast and Slow. It's really easy to understand, and it really gives you a feeling for how the brain works: System 1 and System 2 thinking, emotions and ratio. If you understand this, and you also know that in the end you are optimizing product and marketing for the user and for the brain of the user, then this is the first book I would recommend.

0:48:57.6 TW: Love that.

0:48:58.3 MH: He has been invited to be on the show and he didn’t respond, which we actually counted as the victory that he actually…

0:49:05.0 TW: Yeah, we were like, “He replied to your email, oh my God!”

[laughter]

0:49:10.4 MK: I don’t find his book easy to read, though. I’ve read it twice and I find it quite heavy reading, but that’s… I think it’s cause you have to do all the thinking.

0:49:21.4 TW: Okay, I've heard that before; other people have told me this too. Maybe it wasn't the easiest book to read. Then I can always suggest Dan Ariely. Dan Ariely is even more approachable…

0:49:36.6 MH: Predictably Irrational?

0:49:37.6 TW: Predictably Irrational, Predictably Irrational.

0:49:41.6 TW: Well, that's the one, yeah. But also just reading Nudge by Richard Thaler is an easy-to-read book. Because I'm an analyst, my background is in web and digital analytics; I was analyzing log files 20 years ago. And some things that I thought… I thought the brain worked like this, that people were making decisions based just on ratio. Once I started reading those books, I started to understand, "Oh no, I think they make decisions like this," but no, it's vice versa. The brain works differently. And for me as an analyst and also a CRO specialist, that was a big eye-opener.

0:50:21.6 MH: Nice, thank you. Alright, Moe, what about you? What’s the last call you’d like to share?

0:50:27.8 MK: Okay, well, we got to the end of 2020; I still sometimes can't believe we made it. And one of the saddest things that came out for me, in our culture survey, was just how overworked the team was; they felt like they were drowning in work, and I felt the same way over 2020. It just seemed to be a year where everyone's workload increased, but I don't think everyone had data to back that up. Well, Atlassian wrote this really great blog article quantifying the impact of remote work on work-life balance, in which they used data from Jira tickets, when people were opening and closing tickets, all that sort of stuff, to see the increase in people's day, the range of hours they were working. They were able to show that people were starting earlier and finishing later and all that sort of stuff, which…

0:51:23.4 MK: Yeah, it just kind of solidified the feeling I already had, 'cause I felt like I was working longer. But it was a really nice write-up, and the graphs in it are pretty shit hot, so definitely check that one out.

0:51:36.9 TW: Nice. Ironically, on a day when you happen to be working at 10 o'clock at night, and Michael and I are working at 6:00 in the morning.

[laughter]

0:51:44.7 MH: Do it for love. Okay, Tim. Now, Tim, remember, this is gonna be for the C group of the audience. Okay, go ahead with your last call.

0:51:53.7 TW: Okay. [laughter] This might be the variation that doesn’t get tested at all, ’cause it just gets discarded.

0:52:01.7 MH: Well, just running the experiment.

0:52:05.2 TW: So I’m gonna do a mish mash of the “Better Yourself”, and I guess maybe some of this is outside hours. As we were talking, I was flashing back to when we had Kelly Wortham on the podcast, who… She is the creator and runner of the Test and Learn Community, which Ton has also been on as a panelist, as well as just a participant. I actually had to look that back up. She was episode 25.

0:52:32.3 MH: Yeah, I think that might’ve been our first episode we ever did about testing and optimization.

0:52:37.1 TW: Yeah, I remember… Funnily, having actually worked somewhat in the space, Kelly made the point around a similar thing, this idea that it needs to be a program; it's not just test, test, test. I remember her enthusiasm and passion about the topic, and that was five years ago. So, a plug for the Test and Learn Community, which is a community with a Slack team. It's people who are very, very deep into this space; even Lucas from Booking.com is pretty active. So if you wanna get your jollies with sample ratio mismatch, boy, there are people who will talk in depth about that. But then my other plug, on maybe more of the pure analytics side… And I want to say, I remember having dinner with Stein and Ton on this after Ton and I both spoke in…

0:53:32.0 TW: At a Copenhagen Web Analytics Wednesday, which was 2016, and there was a discussion about MeasureCamp London coming up, and who had which tickets. But MeasureCamp North America is coming up very soon. It's kind of all of North America, so basically four time zones, but depending on where you are and what hours you wanna be up… MeasureCamp North America is coming up on the 23rd of January, and since it's virtual, the tickets should still be available, and it's free; it's just four hours on a Saturday. So, another way to work outside of work hours, Moe. You can find all of those. So, lots of great community and content, even in the virtual world we're living in. So what do you have, Michael?

0:54:24.1 MH: Excellent. Well, it’s 2021, we’re getting the year started. One big thing that started in 2020, and I wanna kick off 2021 by keeping a focus on it, which is, here in the US anyway, we had an opportunity to engage as a culture and society around issues of diversity and things like that. One thing I ran across a little bit ago was an initiative called BlackInData. And one thing that’s definitely true of our analytics space is we don’t have, especially in the United States, enough representation of minorities in data and analytics, and so I was really excited to find this group.

0:55:03.2 MH: I didn't see any digital analytics folks among the people who are part of this, but definitely go check it out, especially if you're Black or you're an ally. Let's see ways we can support communities like this and continue to grow representation and inclusion in our space; it'll benefit all of us greatly. Alright, so that's my last call.

0:55:26.7 MH: Alright, I’m sure you’ve been listening and you’re like, “Man, Ton is making such great points. I’ve got this idea, that idea”, and actually Tim, you mentioned one area where it’s a great place to interact, which is on the Test and Learn Community Slack TLC, and also on the Measure Slack. So we would love to hear from you, and you can feel free to reach out, and all these topics, I think are ones… It seems like Ton, you like to talk about, so I don’t think it’s a hard problem to engage you in that way. Ton, are you… Do you maintain a Twitter account or anything like that?

0:56:00.4 TW: Yeah, I’m “tonw” on Twitter, but also active on LinkedIn, so just ask questions. At some point I’m getting too many questions, so I lower down my Slack usage a bit to be able to produce some work, but I do log on the Measure Slack and also TLC Slack. So just reach out and if I’m not responding, it’s not because of you, just try again.

0:56:21.4 MH: Yeah, it's no problem. And we also have our LinkedIn group, and we'd love to hear from you there as well. And no show would be complete without a little shout-out to our producer, Josh Crowhurst, putting the show together and dealing with all of our audio idiosyncrasies, in this case even splicing the episode into our different testing control versions, so thank you, Josh, for all of your hard work. We're really excited to see the results of this experiment. Alright, well, I'm excited about this. I loved the conversation. Ton, thank you so much for coming on the show, really appreciate it.

0:56:53.8 TW: You’re welcome, thank you for having me.

0:56:55.6 MH: Awesome. And I know that I speak for my two co-hosts, Tim and Moe, when I say: even if you don't feel like you have the right process, you know there is a process for ongoing experimentation and optimization, and part of that process is to keep analyzing.

0:57:14.3 Announcer: Thanks for listening, and don’t forget to join the conversation on Twitter or in the Measure Slack, we welcome your comments and questions. Visit us on the web at analyticshour.io, or on Twitter @AnalyticsHour.

0:57:28.3 Charles Barkley: So smart guys want to fit in, so they’ve made up a term called “Analytics”. Analytics don’t work.

0:57:34.3 Thom Hammerschmidt: Analytics. Oh my god, what the fuck does that even mean?

0:57:45.2 TW: By the way, before you both tuned in, I was talking to Tim and I gave him a compliment on the preparation you guys did for the podcast, because it was to me, really clear what to do, what to expect, which program to launch, what questions to prepare, and so well done. And I told Tim a lot of organizers should take an example from a preparation like this, because if the information would be like this then my life would be easy.

0:58:13.1 MH: That’s something that the three of us share equally on the podcast. [laughter]

0:58:21.6 TW: I’m just glad I hit record and got the second take.

0:58:26.9 TW: My in-laws were running through the city of Gouda… It’s not pronounced Gouda in Denmark, it’s like Huda or something.

0:58:35.2 TW: In the Netherlands we call it Huda.

0:58:40.4 MH: Huda doing Tim?

0:58:42.8 TW: We had an episode where I had no audio, which, as the most OCD person on the show, within 24 hours I actually went and listened to it and re-overlaid my audio; it was fresh enough that it was just kind of…

0:58:58.1 MK: That was pre-me wasn’t it?

0:59:00.8 TW: No, that was with Elliot. Yeah, that was Elliot Morrice.

0:59:07.7 TW: Rock flag and optimization or rock flag and experimentation.

0:59:18.9 MH: Oh good.
