EA Talks

Responding to evidence: what we do wrong and how to do better | Anna Edmonds | EA Global London 2021

May 27, 2022

One of the most important parts of reasoning well involves understanding how to change your beliefs in response to new information. Unfortunately, this is something almost none of us does well without training. In this workshop, Anna will walk you through the odds formulation of Bayes’ Theorem in order to identify the three independent components we need to update properly. Since Bayesian updating on the fly can be hard, you’ll practice implementing a bundle of trigger-action plans that studies have shown to be successful in helping people respond more accurately to new evidence.

This talk was taken from EA Global London 2021. Click here to watch the talk with the PowerPoint presentation. 

Effective Altruism is a social movement dedicated to finding ways to do the most good possible, whether through charitable donations, career choices, or volunteer projects. EA Global conferences are gatherings for EAs to meet. You can also listen to this talk along with its accompanying video on YouTube.

0:00:00. -->
hello thank you for joining me um i'm anna edmonds and i'm a philosopher at university of michigan at michigan i teach large reasoning and ethics courses i started thinking that spending some time rethinking the standard university level ethics and reasoning curriculum would be a really worthwhile cause for ea to get behind i was asked to give a little primer for the for how the community often talks about responding to evidence so this workshop is going to try to get us up to speed with some of the more fundamental thoughts about how to rationally respond to new information scanning down the list of attendees and vocations i know that we have a really wide range of backgrounds so this is definitely going to be review to some of you and it will probably feel new to a lot of you too

0:01:01. -->
so right now we all believe a bunch of things and we had arrived at those beliefs in different ways some of them through pretty reliable processes so seeing this lectern right in front of me quite reliable process some of our beliefs we've arrived at through fairly unreliable processes so if i believe that uh international trade is always a zero-sum game because of what my father who is not an economist taught me growing up that's not a very reliable belief formation process we are overly likely to regard our beliefs in a a binary way either true or false either we believe a thing full stop or we disbelieve that thing full stop this is a problem that we're going to come back to at the end but we should and we do often have different levels of confidence attached so if i asked you right now whether nitrogen is flammable you might

0:02:02.2 -->
feel less confident than you would about whether seattle or if you're from the uk cardiff gets precipitation uh on more than 50 days per year so call the beliefs that we start with before we go on to get new information call those beliefs our priors and think of what i'm going to talk to you about today as how to update your priors how to change your beliefs in light of the new information you're going to encounter so much of being a good reasoner is about responding well to new information being aware of the priors that we already have understanding what and how much information we're getting how it relates to what we already believe if i believe that adding more teachers and say new uniforms will help attendance and school performance and without changing anything else we add those teachers we add the new

0:03:01.5 -->
uniforms and nothing significant changes how am i supposed to respond what pieces of information do i need what are the ways in which i'm likely to go astray call this process of responding to new information by changing our beliefs updating bad news none of us does it very well without training plenty of data on just how unreliable we are even the very high iq'd amongst us good news there is a formula that specifies correct updating it's just not very easy to use so i'm going to show us one that is a little bit easier to use and let me say a thing about the language that we're using before we go any further we're going to be talking about hypotheses evidence conditionalizing on evidence priors updating and presumably almost everyone has heard people talk about these things and it all sounds very scientific

0:04:00.5 -->
of course it is often used in scientific contexts but it's also describing how we should be interacting with new information all the time in our ordinary everyday reasoning contexts so the examples that you're going to see me use some of them are silly questions that we might be mulling around in the backs of our minds thinking about associated likelihoods some of them are questions that are fitting for precise empirical analysis so to start with think of h our hypothesis as any candidate view or claim or explanation that you're wondering about think of e the evidence that we're going to get as just some new information some observation that you have that you come into contact with relevant to thinking about the likelihood of h of course e could be some single observation that you make or it could be

0:05:01.2 -->
a collection of observations taken together so thinking more about the bad we're not doing very well without training what's going wrong when we fail to update properly so one problem is that most of us can't even say precisely what it is to get evidence for a hypothesis much less how precisely to uh gauge the strength of that evidence and we're not usually testing hypotheses views explanations randomly we're considering the ones that we already have ones that have naturally seemed plausible to us and sometimes they're ones that we are subtly uh rooting for we wish that they would be true if we don't understand exactly what it is to constitute evidence to begin with and don't understand how to determine the strength of that evidence it means that we're going to fail to

0:06:01.2 -->
notice information that isn't exactly what we would expect given our preferred explanation or we'll end up underestimating the strength of that observation why well because we're on the lookout for the information that confirms the views that we already have and so we're likely to miss to proper fail to properly take into account the information that's going to make our view less likely an even more basic problem even when we're trying to do everything right many of us don't know what to do look at two examples with me so for the next two questions that we look at i want you to take 30 seconds or so and think about your answer if the answer isn't obvious to you right now which i expect it not to be for many of you don't work to figure it out just yet just see what in what answer comes to mind so here's the setup that i want you

0:07:00.2 -->
to think about you're looking for your buddy and a reliable source tells you that say this time of the day on this day of the week it's 80 percent likely that your buddy is gonna be at the neighborhood pub here's what the pub is like it's got two rooms and it's equally likely that your buddy is going to be in either one of those rooms here's the information you get you walk into one of the rooms you thoroughly check that room and your buddy's not in there new question how likely is it now having checked the first room that your buddy is at the pub think for a second about your answer to that question second question i want you to think about here's the setup in an effort to crack down on crime a city installs a facial recognition

0:08:01.5 -->
software camera that sounds an alarm when it detects a registered offender's face so the city that we're talking about it's got a hundred thousand people in it there are thought to be about a hundred criminals on the loose the system is such that it correctly identifies 99 percent of the criminals and it incorrectly identifies one percent of non-criminals so let's say you get some information you hear the alarm how likely is it that it's been triggered by a criminal take a few seconds to think about it you can jot down your answers if you want to bummer that was the example all right so after you think about your answers to both of those this is the correct way to get the answer

0:09:01.6 -->
it's bayes theorem most of you are probably at least mildly familiar with it along with a conditionalization rule that tells us that the updated probability of the hypothesis is equal to the prior probability of the hypothesis given the evidence that we've observed bayes theorem then tells us that the probability of the hypothesis given the evidence that we've observed is equal to all of that mess so one thing that we could do is we could plug in the values for all of that we could use a calculator to calculate it but it's a bit of a thorny equation and even people who work at least some of the time with bayes theorem forget the ordering of these terms on occasion and more importantly you'd need a calculator and the absolute values of these terms if you wanted to come out with an answer so let me introduce you to a lesser

0:10:02.9 -->
known but i soon hope to be greater loved formulation of bayes theorem call it the odds formulation and what happens is that we take the prior probability of the hypothesis and we're going to express it in odds and we're going to combine it with an assessment of the strength of the evidence so the first step that we're going to work on is making sure that we understand how to express the prior probability of the hypothesis in odds the second step is that we're going to learn how to assess the strength of the evidence that we're getting and then the third step will be to go ahead and update i'm showing you this formula for two reasons the first is that it is by far an easier way to update and if you practice for just a bit you're going to end up being able to do it in your head so
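
Written out, the two formulations described above look roughly like this. The slide itself is not part of the transcript, so the notation (H for the hypothesis, E for the evidence, \neg H for the negation of the hypothesis) is an assumption; the formulas are the standard statements of Bayes' theorem and its odds form.

```latex
% Standard form of Bayes' theorem ("all of that mess")
P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E \mid H)\,P(H) + P(E \mid \neg H)\,P(\neg H)}

% Odds formulation: new odds = prior odds x strength of the evidence
\frac{P(H \mid E)}{P(\neg H \mid E)} = \frac{P(H)}{P(\neg H)} \times \frac{P(E \mid H)}{P(E \mid \neg H)}
```

The second line keeps the three components visually separate: the prior odds, the comparison that measures the strength of the evidence, and the new odds.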

0:11:01.4 -->
most of you in your day-to-day reasoning lives won't actually be updating on the fly so i'm sort of using it for another reason in addition it's a really good tool to illustrate some of the common ways that we fail in responding properly to evidence the reason that i think it's a nice tool is that it's a nice visual representation of the separation between the components we've got we've got the prior odds and then we've got the strength of the evidence which next we're going to see it's a comparison of two values and just those pieces together end up getting us the new odds and so if we can actually use this visual separation of the components to remind ourselves of what the pieces are that we need in order to update properly uh i'm thinking that it will

0:12:02.1 -->
help us stop making some of the most common updating errors that we make by really focusing us in on what the component pieces of information are all right so first thing i'm going to have you do is just get a little bit of practice moving between uh probability claims and odds we're more used to hearing uh claims about probability likelihoods expressed in fractions or decimals as opposed to odds but it is easy to move back and forth from fractions or percentages to odds we just have to remind ourselves from middle school how to do it so think about odds as a ratio of the true to the false and start from a quick example so if i say that rain is 90 percent likely or that there's a nine tenths chance of rain i want you to be able to quickly express

0:13:00.1 -->
that in odds say that the odds of rain are nine to one all right so right now just turn to someone else at your table and talk them through each one of these three move between percentage assessments to odds or if i gave it to you in odds go backwards to likelihood so let's see for the first one we've got a 75 percent chance of rain somebody raise their hand tell me what that is in odds you three to one good all right second one we've got a one in three chance the store's open somebody raise their hand what is that in odds yes excellent one to two odds that it's open we start with six to five odds of being in group a somebody give that to me as a fraction yes good both of you six out of 11. all right pretty easy to move back and forth
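
The back-and-forth between probabilities and odds practiced above can be sketched in a few lines of Python. The function names `prob_to_odds` and `odds_to_prob` are made up for this illustration; exact fractions are used so the answers match the ones called out in the room.

```python
from fractions import Fraction

def prob_to_odds(p):
    """Turn a probability (below 1) into (for, against) odds in lowest terms."""
    p = Fraction(p)
    ratio = p / (1 - p)  # odds are the ratio of the true to the false
    return ratio.numerator, ratio.denominator

def odds_to_prob(odds_for, odds_against):
    """Turn 'for to against' odds back into a probability."""
    return Fraction(odds_for, odds_for + odds_against)

print(prob_to_odds(Fraction(3, 4)))  # 75 percent chance of rain -> (3, 1)
print(prob_to_odds(Fraction(1, 3)))  # one in three chance the store's open -> (1, 2)
print(odds_to_prob(6, 5))            # six to five odds of group a -> 6/11
```

Note that a probability of exactly 1 has no finite odds, so this sketch would raise a division error for that input.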

0:14:01.1 -->
between likelihoods and odds so let's move on to the next example okay so we're thinking now in terms of how to gauge the strength of evidence this is step two in our uh updating training but it's also so important on its own it's the money question which is why i put those monies up on the screen it would be better if they were all 100 bills because it's that much of a money question why is it the money question well it's a way of reminding us to consider a piece of information about how we naturally a piece of information that the way that we naturally reason has set us up to neglect namely how might things look if our hypothesis were actually false what are the ways that you could still get

0:15:00.2 -->
the evidential information that you're getting even if your considered view or explanation isn't right so this is phrased as a comparison a comparison of the probability of e given h to the probability of e given not h and compare you must do you may not assume that you're getting evidence for your hypothesis just because you're noticing just what you would expect to notice if your explanation is right so you must consider how likely it would be to observe this thing that you're observing even if your hypothesis is false so this question comparison is going to tell you whether or not the information that you're getting is actually evidence and if so how strong the evidence is so if you divide these and you get a one out here you're getting no evidence for or against h if you get a number over one

0:16:02.7 -->
you're getting evidence for h under one you're getting evidence against h the number that you get is often referred to as the bayes factor some of you might have heard it called that before i'll usually refer to it more descriptively as the strength factor so note that there are two components of the strength test sometimes you're actually going to be ballparking them independently and sometimes you're going to have precise numerical inputs for those values other times gauging the strength of the evidence is naturally comparative so you might be thinking how much likelier am i to observe this thing that i'm observing if h is true as opposed to false and maybe you're thinking i think i'm about 10 times likelier to get this information if h is true as opposed to if it's false so if that's the case if
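
The strength test described above is just a division followed by a comparison with one. A minimal sketch, with hypothetical function names and made-up input probabilities chosen only to show the three cases:

```python
from fractions import Fraction

def strength_factor(p_e_given_h, p_e_given_not_h):
    """Bayes factor: how much likelier the evidence is if h is true than if h is false."""
    return Fraction(p_e_given_h) / Fraction(p_e_given_not_h)

def direction(factor):
    """Read off what the strength factor says about h."""
    if factor > 1:
        return "evidence for h"
    if factor < 1:
        return "evidence against h"
    return "no evidence for or against h"

# Illustrative inputs only; none of these numbers come from the talk.
for p_h, p_not_h in [(Fraction(1, 2), Fraction(1, 20)),
                     (Fraction(1, 2), Fraction(1, 2)),
                     (Fraction(1, 4), Fraction(1, 2))]:
    factor = strength_factor(p_h, p_not_h)
    print(factor, direction(factor))
```

When the judgment is naturally comparative ("about 10 times likelier if h is true"), the factor is available directly and the two inputs never need to be estimated separately.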

0:17:01.3 -->
you're comparing how likely they are and you're thinking i think it's about 10 times likelier to get this if the hypothesis is true well you have a ready-made strength factor of 10 in that case all right so look through an example with me say you're wondering whether someone is into you and you get some information you're having a halloween party and you've invited them and they show up early wearing a very elaborate costume so you're thinking about whether this observation that you've just made them showing up in this elaborate costume early is evidence that they like you and so you ask the question how much likelier is it that they show up early wearing this costume given that they like me compared to given that they don't like me and you're thinking definitely likelier if they like me

0:18:00.7 -->
so you know that you're getting some evidence and you gauge that evidence to be say five times likelier than if they didn't like you all right so working through uh this problem let's walk through it together i started at a one in four chance that they liked me uh and so i want to think about that in terms of prior odds let's see we can go ahead and write this out together conveniently wore an eraser all right so i'm going to do prior odds times my strength factor and that's going to tell me my new odds so my prior odds um i'm at a one in four chance so i'm going to express that as one to three in odds and then i'm thinking about my strength factor and uh i thought initially that if i got that information it would be five times

0:19:01.4 -->
likelier uh my odds you know here is where i'm starting with um not like me here i'm i'm uh thinking about like me um reverse um and so the evidence that i get i said that it was five times likelier to get that uh if they did like me so i'm asking myself which side of my odds is that supporting and i say yeah it's definitely evidence that they like me as opposed to evidence that they don't like me so i'm actually just going to apply it here what i mean by that is multiplying this side of my odds by my strength factor and so when i multiply 5 by 1 i end up at odds of five to three if i wanted to go back and and express that in terms of a fraction or in terms of probability i would do five out of

0:20:01.4 -->
eight so i moved from being uh 25 percent confident that they liked me prior probability and then got this information and uh updating makes me end up over 50 percent likely that uh they like me all right so uh going back to our pub case and in this example we get an opportunity to actually plug in for the different components of our strength test so here um let's think about prior odds of well we have four to one because we had a prior probability of 80 percent so 80 percent in prior odds is four to one uh and then we want to figure out what our strength factor is so we're going to
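
The halloween update just completed (prior odds of one to three, strength factor of five, new odds of five to three) can be reproduced with a small helper. The name `update_odds` is hypothetical; the numbers are the ones used in the talk.

```python
from fractions import Fraction

def update_odds(prior_for, prior_against, factor):
    """Multiply the 'for' side of the prior odds by the strength factor."""
    new_odds = Fraction(prior_for, prior_against) * Fraction(factor)
    return new_odds.numerator, new_odds.denominator

# One in four chance they like me -> prior odds of 1 to 3; strength factor of 5.
new_for, new_against = update_odds(1, 3, 5)
probability = Fraction(new_for, new_for + new_against)

print(new_for, new_against)  # 5 3
print(probability)           # 5/8
```

Five out of eight is 62.5 percent, which is the "over 50 percent" result reached on the board.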

0:21:03.2 -->
compare the probability of the probability of the evidence given the hypothesis to the probability of the evidence given the negation of the hypothesis and here's how i'll do that my hypothesis of course is that our buddy is in the pub so i say assume buddy's in the pub how likely would it be for me to get the information that i in fact got namely that i checked that room thoroughly and they're not in there so if they're in the pub i've checked half of the space in the pub so the likelihood that i get that is one over two fifty percent one half all right and then i'm going to compare that divide that by the probability of the evidence given the negation of the hypothesis start right here say assume that it's not the case that buddy's in the pub how likely would it be for me to get the evidence that i got namely that i checked a room and they're not in there well if they're not

0:22:00.7 -->
in the pub it's guaranteed uh that i would get that 100 percent so what i'm doing is i'm dividing 50 percent by a hundred percent i end up with a strength factor of one-half now one-half i was testing for the hypothesis so i know that since it's a fraction less than one it's against the hypothesis i could either do uh since h and not h are um complementary i could do uh a strength factor of 2 for not h or i could do a strength factor of one half for h so let me go ahead and multiply the one-half on the pub side so this is pub this is not pub i'm getting one half evidence against pub all right so when i multiply that i end up with new odds of two to one if i want to express that as a likelihood or a

0:23:00.7 -->
fraction i'm at two out of three all right so that's how we use the odds formulation to update on pub example what i want you guys to do now is if you didn't quite know how to do criminal case feel free to go back and use the odds formulation to do the facial recognition case for criminal detection if you feel like you got that one right away go ahead and practice a new one the phone charge case so take a few minutes you can work through it on your own if you get stuck turn to someone at your table and see if you can work through this with them all right let me tell you how uh i would actually think through it for the criminal case now you know these are pretty big numbers and maybe i don't
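
For anyone checking the pub answer, here is a minimal sketch that runs the odds update and then cross-checks it by directly weighing the possibilities that survive the search. Variable names are illustrative; the inputs (80 percent prior, two equally likely rooms, one room searched) are from the example.

```python
from fractions import Fraction

prior_odds = Fraction(4, 1)          # 80 percent likely at the pub -> 4 to 1

p_e_given_pub = Fraction(1, 2)       # at the pub, but in the room not yet checked
p_e_given_not_pub = Fraction(1)      # not at the pub: the room is certainly empty
factor = p_e_given_pub / p_e_given_not_pub

posterior_odds = prior_odds * factor
posterior_prob = posterior_odds / (1 + posterior_odds)
print(factor, posterior_odds, posterior_prob)  # 1/2 2 2/3

# Cross-check: of the cases consistent with an empty first room,
# how much of the weight has the buddy at the pub?
at_pub_other_room = Fraction(4, 5) * Fraction(1, 2)
not_at_pub = Fraction(1, 5)
print(at_pub_other_room / (at_pub_other_room + not_at_pub))  # 2/3
```

Both routes give two out of three, so the odds formulation is doing the same work as the longer calculation.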

0:24:00.1 -->
want a totally precise uh answer if i did want a totally precise answer i'm gonna be starting uh with let's see it would be prior odds of um uh 100 to 99,900 what i'm actually going to do here is i'm going to say okay 100 to 100,000 knock off two zeros so i'm thinking okay roughly the prior odds here something like one to a thousand then i'm going to think about the strength factor and i'm going to think okay assume that it is a criminal how likely is it to detect it 99 percent assume it's not a criminal how likely is it to detect it one percent so i'm thinking okay strength factor of around 99. great i'm going to call it a strength factor of about a hundred so i'm starting with one to a thousand odds i'm getting a strength factor of about a hundred i know that i'm going to apply that on the side of the criminal

0:25:00.4 -->
one to a thousand so i'm thinking okay i'm gonna end up at roughly a hundred to a thousand odds if i reduce that i'm gonna be at basically one to ten so that's not precise um the precise answer if i were doing it as a fraction and i had done the odds precisely would actually be 99 out of 1098 or approximately 9 percent but see how easy it is to do just to ballpark how these numbers are working together to realize that you're starting at around one to a thousand odds you're getting a strength factor about a hundred it's supporting the side of the one so you're ending up at roughly a hundred to a thousand and that's a factor of 10. so you end up at roughly 10 percent likely a really easy thing to do in your head so if we were going to show how exactly we would do the phone case so let's see it tells us that we are starting out 50 50. so i'm going to think of that as one to one prior odds
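
To see how close the ballpark is to the precise answer in the facial recognition case, a short sketch that runs both versions side by side. The helper name `odds_to_probability` is made up for this illustration; the inputs are the ones given in the setup.

```python
from fractions import Fraction

def odds_to_probability(odds):
    """Convert odds (as a single ratio for/against) to a probability."""
    return odds / (1 + odds)

# Precise version: 100 criminals among 100,000 people, 99 percent vs 1 percent alarm rates.
exact_prior_odds = Fraction(100, 99_900)
exact_factor = Fraction(99, 100) / Fraction(1, 100)
exact_prob = odds_to_probability(exact_prior_odds * exact_factor)

# Ballpark version: roughly 1 to 1000 prior odds, strength factor of about 100.
rough_prob = odds_to_probability(Fraction(1, 1000) * 100)

print(exact_prob, float(exact_prob))  # 11/122, the reduced form of 99/1098
print(rough_prob, float(rough_prob))  # 1/11
```

The exact answer is about 9.0 percent and the ballpark is about 9.1 percent, so the mental shortcut loses almost nothing here.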

0:26:02.6 -->
and then i get some evidence namely that when i come home the phone is still plugged in downstairs and i'm told that that happens roughly once a week so when i'm thinking about what the strength factor is well i'm thinking about the probability of the evidence given the hypothesis assume that uh partner's already in bed how likely would it be to see what i saw assume partner's not in bed how likely would it be to see what i saw um all right so when i'm dividing that i'm realizing i'm getting a a strength factor of six for um not in bed yet so if i'm thinking of this as bed and this as not bed well i realize that i'm getting uh evidence with a strength factor of six that i'm going to apply to the not-bed side so i'm

0:27:00.5 -->
going to end up at odds of one to six if i wanted to express that in terms of probability i'm at a one seventh chance upon seeing the phone still plugged in that that they have um that they have gone to bed so probably still up all right so um those are our answers already went through them and this is think of this as a little chart that you can just run through to get some kind of benchmarks in your head so uh i'm thinking about starting with a pro a prior probability that's pretty unlikely that's why i put it in red maybe you're at one out of a thousand and then think of getting some strong evidence uh a strength factor of a hundred how does that work uh against

0:28:01.2 -->
each other well you're starting out really unlikely you get some pretty strong evidence and you end up with a likelihood of around ten percent um when i see numbers like that it's just reminding me how important it is to take into account prior probability because when you get evidence with a strength factor of 100 that's a lot of evidence but when you see how that interacts with something that has a quite low prior probability where you end up as only 10 percent likely on the other side you can really see the role that the prior probability is playing start with a prior probability of around 1 out of 10 if you get a strength factor of decent evidence strength factor of 10 um what that does is it makes you even roughly 50 percent likely that your starting hypothesis is true start out 50 50 get a little bit of evidence with a strength factor of three what does that

0:29:00.4 -->
do it gets you 75 percent likely start out 50 50 get really strong evidence of a hundred what does that do it practically swamps it practically guarantees a little bit under 100 percent that your hypothesis is true all right so the reason that i'm doing this is because i don't really think that what you guys are going to be doing after this is just updating on the fly uh taking into account what your prior probabilities are actually gauging the strength of your evidence actually updating and spitting out your new probability but what it does for me is it just gives me a couple of sort of stock examples to remind myself how prior probability and strength of evidence are interacting in order to get me closer so the interesting cases what we see when we're looking at how people generally do in these cases well what happens is that people are way off
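
The benchmark chart walked through above is not reproduced in the transcript, but its four rows can be regenerated from the numbers mentioned. A minimal sketch, with a hypothetical helper name:

```python
from fractions import Fraction

def posterior_probability(prior_prob, factor):
    """Prior probability -> prior odds -> times strength factor -> new probability."""
    prior_prob = Fraction(prior_prob)
    new_odds = prior_prob / (1 - prior_prob) * factor
    return new_odds / (1 + new_odds)

# (prior probability, strength factor) pairs mentioned in the talk
benchmarks = [
    (Fraction(1, 1000), 100),
    (Fraction(1, 10), 10),
    (Fraction(1, 2), 3),
    (Fraction(1, 2), 100),
]

for prior, factor in benchmarks:
    post = posterior_probability(prior, factor)
    print(f"prior {float(prior):.1%}  factor {factor:>3}  ->  {float(post):.1%}")
```

Run as written, the four rows come out at roughly 9.1, 52.6, 75.0 and 99.0 percent, matching the "around ten percent", "roughly 50 percent", "75 percent" and "a little bit under 100 percent" stock examples.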

0:30:00.4 -->
it's not that they're close but don't know precisely how to do it it's usually that they end up forgetting a component entirely so what goes on in uh the initial pub case well i've taught this example a lot to students and it's it's roughly two percent of students that are getting the right answer so the most common answer that i get is that they continue to think it's 80 percent likely that uh the person is in the pub uh so that person is just not updating on new information at all they're stuck they saw the prior and they're just like well it could still be true we haven't eliminated the possibility that it's true uh when we look in one room you told me it was 80 percent so i'm just going to remain 80 percent likely uh that person has just failed to update as they get more and more information you can imagine them uh checking a space where they only have a little bit of space left are they still

0:31:01. -->
gonna remain eighty percent confident the way they started as they get that information no we want to respond dynamically we want to as we get new information incorporate that and the weird thing that we see uh in terms of something like neglecting uh the base rate where we just look at something like the evidence that we're getting is it happens to even professionals it's happening to doctors in medical systems where if you see that something has a test rate of correctly identifying 99 percent of cases and uh incorrectly identifying one percent of cases what ends up happening if you start one out of a thousand likely and you get a positive test even doctors are very often getting uh the result that it's 99 percent likely upon getting that test positive that you've got the disease this is nuts it's really good indication of just how naturally bad we are uh at doing this so

0:32:03.6 -->
if we can just get in the habit of reminding ourselves that we've got all three of these components we've got the prior odds it's separate from the assessment of the strength and the assessment of the strength necessarily has to take into account both the numerator and the denominator of this even when it is delivered to us as a kind of comparison so what we want to do is we want to make sure in each case that we've properly taken into account all of these uh components and so what we're going to do next is i'm going to run you through some common kinds of reasoning and i want you to think about what if anything you think is going wrong in these cases all right so first case the scenario is that after a school year

0:33:00.4 -->
in which the average number of absences per student is higher than usual superintendent institutes a policy in which students lose recreation periods when they exceed the allowed number of absences this is what happens the following year the average number of absences per student dropped significantly superintendent concludes this is strong evidence that the policy works all right so i want you to keep in mind the things that we've learned how the strength test for evidence works uh think about this policy and talk with the people at your table about what if anything you think might have gone wrong in this kind of reasoning uh taking into account what the superintendent's conclusion was so take just a minute to turn to your table see what's going wrong here we're thinking about the superintendent

0:34:00.8 -->
getting that information that there's a significant drop and concluding that there's uh some strong evidence uh that the policy was effective all right what's going wrong here how can we use our strength test to pinpoint exactly what it is well first thing we know how to determine whether or not there's evidence we need to compare these values right and so looks like what the superintendent is doing is saying okay assume that as she expected the the policy is effective what would she expect to see well she instituted it in order to see a significant drop it's exactly what she expects to see and so she's thinking yeah good probability of evidence given the hypothesis exactly what i was expecting to see concludes strong evidence and we say no no you do not know whether you

0:35:00.4 -->
have any evidence because to know whether you have evidence you must necessarily compare it to the probability of the evidence given the negation of the hypothesis so what we want her to do is we want to say okay assume that in fact your uh policy was not effective uh how likely is it that you could observe the thing that we in fact observed and what i'm thinking is that uh in fact it's pretty likely that we could observe that significant uh drop even if her policy wasn't effective okay why well in particular uh i'm thinking that she should worry about something like regression to the mean so why did she institute the policy to begin with presumably because the problem had gotten unusually bad and so you know if you're starting at a point where uh the problem is pretty bad what

0:36:01. -->
should you expect to see uh the next year well probably you're gonna see it improve some anyway so uh an improvement in the in the numbers the following year is exactly what she should expect to see due to regression to the mean even if her policy uh isn't effective and i'm also thinking about problems like what if in response to the same information that the superintendent is picking up on parents say are cracking down on uh their kids about attendance okay so that's another explanation of how we could how it could come about that we're seeing exactly the evidence that we're seeing but that it's not in fact evidence specifically for the hypothesis uh it could be a reason that's just as good to believe some competing hypothesis so what would actually happen numerically here is that yeah we're going to get a high number for the probability of the evidence given uh the hypothesis but the

0:37:01.3 -->
problem is it seems um maybe even just as likely but minimally plenty of reason to think that this number is also going to be high if it's just as likely we're not going to get a number greater than 1 out of this meaning that we're not even getting any evidence and at best we're going to get a small amount of evidence out if there are plenty of ways that we could observe the evidence even if not h um was true all right so boiling this down to a kind of problem that we want to be on the lookout for uh call this problem one-sided strength testing and one-sided strength testing is happening when you're only considering the likelihood of observing the evidence if the hypothesis is true but you're not thinking about the probability of the evidence if the

0:38:01.6 -->
hypothesis is false so how to detect it how to be on the lookout for it well when you're expecting a certain result and you end up observing that result exactly what you were expecting it'll feel like you're getting even more evidence for your view for your candidate explanation your hypothesis and so what ends up actually happening is we just keep sort of ratcheting up uh our confidence in response to the evidence that we are anticipating getting being on the lookout for it spotting it and assuming that it's the kind of information that should increase our hypothesis further so what should you be doing in response a couple of questions that really help address this kind of problem remind yourself to ask might things still look like this might i have still gotten this information even if the hypothesis isn't true try to channel what someone with the
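
One way to see why one-sided strength testing fails is to hold the numerator fixed and vary only the denominator. In the sketch below every probability is a made-up illustration for the superintendent case (the talk gives no numbers for it): the drop in absences is assumed 90 percent likely if the policy works, and the chance of a drop without a working policy is varied to stand in for regression to the mean or parents cracking down.

```python
from fractions import Fraction

def strength_factor(p_e_given_h, p_e_given_not_h):
    """Compare how likely the evidence is if h is true versus if h is false."""
    return p_e_given_h / p_e_given_not_h

# Hypothetical: chance of a significant drop if the policy works.
p_drop_if_policy_works = Fraction(9, 10)

# Hypothetical alternatives for the chance of the same drop if the policy does nothing.
for p_drop_if_policy_fails in (Fraction(9, 10), Fraction(6, 10), Fraction(1, 10)):
    factor = strength_factor(p_drop_if_policy_works, p_drop_if_policy_fails)
    print(f"P(drop | no effect) = {float(p_drop_if_policy_fails):.0%}"
          f"  ->  strength factor {float(factor):.1f}")
```

The numerator never changes, yet the strength factor runs from 1 (no evidence at all) up to 9, which is why looking only at "exactly what i was expecting to see" cannot tell you how much evidence you have.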

0:39:01.7 -->
opposite view as you have would say about the evidence if someone didn't believe that h was true if that person observed the same evidence that you observed how would they explain it so you started out confident that you were getting evidence that should increase the likelihood of h but if there are other views that account for the evidence just as well then it's just not going to be evidence uh for h and sometimes what's going to happen when you ask yourself uh this question you might end up discovering that there are views that explain the information that you got even better than your view all right next scenario i want you to think about the day after the notorious halloween party you text your potential someone and by that night you haven't received a reply so you remind yourself that it was a

0:40:00.8 -->
real rager and so you take it as no evidence that they're not interested when you fail to get that reply because you think they're probably just sleeping it off all right i want you to think about this way of responding to that information again think about uh what we know about the strength test for for evidence talk with the people at your table about whether that kind of reasoning is okay take 60 seconds to do that okay so we're thinking this way when you get evidence what it means to be evidence is to be something that ought to increase your confidence in the hypothesis and we know that we determine whether there's evidence by comparing these two values the likelihood to observe what we observed if the hypothesis is true as compared to whether it's false so this person is

0:41:01.5 -->
dismissing what we should take to be evidence against the hypothesis that they like you why well assume that they in fact like you how likely would it be for them to text back when you text the next day well minimally we think it's going to definitely be likelier than that they don't text back when you text the next day so not hearing back is less likely if they like you than if they don't and so as long as this value is at least somewhat bigger than this value it means that you are getting evidence so if our hypothesis in this case is that they like you this value is going to be bigger we're going to end up with a fraction less than one so we're getting evidence for the negation of the hypothesis all right so what we're often inclined to do is we're inclined to do something

0:42:01.5 -->
like uh consistency testing to see whether an observation could be true and that our hypothesis remains true hasn't been falsified by the observation of that information so call this problem when we don't update at all in response to the fact that actually we have gotten evidence call this problem this comes from economist bryan caplan heads i win tails we're even reasoning here's how we often operate instead of treating all of our beliefs as degrees of confidence we think about something like thresholds required to have enough support to count as believing or enough evidence against to count as disbelieving but then what happens if we're in that kind of binary threshold mentality when little bits of evidence

0:43:01.1 -->
come in we're inclined to just ask whether it's enough to push us over the precipice to say disbelieving and if it's not then we just end up ignoring that evidence entirely and of course the real problem is that depending on how we feel about the hypothesis that is going to determine different levels where that threshold exists so it turns out that for hypotheses that we prefer say we already have that view we're in some sense maybe even kind of subconsciously rooting for that view to be true we ask a much more permissive question when uh disconfirming information shows up we ask something like can i still believe my hypothesis and if we think yeah uh the evidence is still consistent with my hypothesis being true what happens is we just end up disregarding the evidence and not updating at all
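
Editor's illustration (not part of the talk): a minimal sketch of what the threshold mentality described above costs, compared with updating a little on every observation. All of the numbers (the starting confidence, the per-observation strength of 0.8, the count of ten observations, the 0.5 cutoff) and the helper names are hypothetical, chosen only to make the contrast visible.

```python
def update_odds(prior_odds, likelihood_ratio):
    """Odds form of Bayes: posterior odds = prior odds * P(E|H) / P(E|not H)."""
    return prior_odds * likelihood_ratio


def odds_to_probability(odds):
    """Convert odds in favour of H back into a probability."""
    return odds / (1.0 + odds)


# Hypothetical inputs, for illustration only.
prior_probability = 0.9         # starting confidence in H
weak_disconfirming_ratio = 0.8  # each observation is slightly likelier if H is false
observations = 10               # number of small disconfirming observations
belief_threshold = 0.5          # the "can I still believe it?" cutoff

prior_odds = prior_probability / (1.0 - prior_probability)

# Threshold reasoner: asks only whether one observation, taken alone, would
# push H below the cutoff. It never does, so every observation is ignored.
after_one = odds_to_probability(update_odds(prior_odds, weak_disconfirming_ratio))
threshold_confidence = prior_probability if after_one > belief_threshold else after_one

# Incremental reasoner: moves the odds a little on every observation.
odds = prior_odds
for _ in range(observations):
    odds = update_odds(odds, weak_disconfirming_ratio)
incremental_confidence = odds_to_probability(odds)

print(f"threshold reasoner:   {threshold_confidence:.3f}")
print(f"incremental reasoner: {incremental_confidence:.3f}")
```

Under these assumed numbers the threshold reasoner stays at its starting confidence, while the incremental reasoner ends up below one half, even though no single observation looked decisive.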

0:44:02.1 -->
so it's feeling like um well you know this evidence that i've gotten it's not uh it's not the case that it necessarily disconfirms my hypothesis my hypothesis could still be true so i remain exactly as confident this is the remedy this is what i want you to do in response to this kind of feeling i want you to think about something that is called the opposite evidence rule so mathematically we can show that if e increases the likelihood of h then not e must decrease the likelihood of h you don't have to know exactly why this is true but just remember the rule that if you'd treat some piece of information some observation e as evidence for h then learning the opposite of that

0:45:00. -->
information experiencing not e has to be at least a little bit of evidence for not h so ask yourself if your potential someone had texted right back saying that they'd had a great time would you have treated that as evidence that they're into you uh if so and i think definitely we would have then you must take into account how you should respond to that opposite evidence so if getting an affirmative text response is e for h well failing to get that response has to be at least some evidence against h what do we know when we're getting some evidence against h well we need to decrease our confidence in h at least a little bit so notice that for these numbers we haven't uh we haven't put in any kind of numerical precision so we don't know exactly how much evidence we're getting because we didn't consider that comparison to begin with

0:46:01.1 -->
but the opposite evidence rule can often really confirm if you try on for size having the opposite evidential experience and you know right away that you would have responded the opposite way well you need to understand that you're getting at least some evidence against your hypothesis all right next scenario that i want you to think about you've been wanting to track down your long-separated biological sibling one day you run into someone at the airport who shares a striking number of traits with your sibling she's about the right age about the right build she's got curly hair and has facial vitiligo you reason the likelihood of having all of these features if it isn't her is only around one in a hundred thousand so you say wow given how unlikely that is it'd be super unlikely for for me to see someone like

0:47:00.4 -->
this at the airport if it wasn't her maybe only one in a hundred thousand you're thinking all right so think about somebody reasoning this way take a second talk to people at your table and ask what if anything of course trick it's always something um ask the people at your table what do you think is going wrong in this scenario so here what's going on you uh you know you're maybe what you've done is you've multiplied out uh all of the likelihoods of those traits to come up with the very tiny likelihood that somebody has all of them and so you're thinking of the probability that they have all those if it's not in fact her you know you're thinking about this probability of evidence given not h being really really low and then you think okay assuming it is her how likely would it be to look like that

0:48:00.4 -->
and you're like yeah really likely um and so what happens when you're comparing these values and this one is high and this one's uh really really low what you're noticing is that you're getting a ton of evidence what happens when we get a ton of evidence well if you get really strong evidence for something then you're likely to think the hypothesis has got to be true and so what's happening is we're just thinking about the strength of the evidence without having taken into account the prior probability and this in particular is why we should worry think just about the population of the u.s well if it's around 330 million how many uh sets of 100 000 are in there well there's gonna end up being over three thousand about thirty three hundred uh people that look roughly like that in a population of three hundred thirty million so um what has happened is that in neglecting the

0:49:00.5 -->
base rate and failing to take into account uh the prior probability of someone looking roughly like that person that you saw you end up just focusing on the massive amount of evidence that you think that you're getting um and uh fail to recognize that actually maybe it's something like a one in over three thousand chance all right so if we get a lot of evidence and we move directly to h without having considered uh the prior probability you've probably heard the phrase neglecting the base rate remedy don't do that don't neglect the base rate um notice that we're inclined to actually make a kind of similar error in statistics sometimes so notice that the denominator of the strength test um is actually the p-value if you're

0:50:02.5 -->
familiar with that term from statistics so the probability of observing data at least that extreme if the null hypothesis is true and what we do often see in statistics even with people who are fairly well versed is that people are inclined to move from a highly significant result or a low p-value to the automatic assumption that their hypothesis is true but the p-value hasn't factored in the prior probability of that hypothesis so note that um we need to take into account once again all of those pieces of information before we conclude uh how likely the hypothesis is all right so what i want to do is take a few minutes to um uh sum up what i think some of the upshots of updating uh this way are um

0:51:02.8 -->
and i'll also take questions at the end i'm told that we can stick on in this room a little bit longer if you do have questions so i'm thinking that the two uh major uh upshots of thinking about updating with the odds formulation in this way the first i'm going to say is that it helps us inhabit the degreed perspective let's say a little bit more about that in a second the second is that it's combating our unfortunately very natural tendency towards confirmation bias all right so putting putting together the upshots of thinking about this way the first that it's getting us to inhabit the degreed perspective so we should be thinking in terms of degrees of confidence anyway it's really rare that our evidence justifies all-out belief or disbelief so

0:52:03.9 -->
in one way it's helping us avoid overconfidence to think explicitly in terms of our confidence level that a given claim or view that we're considering is true or false and when we do that and no longer are thinking of how much evidence would be required to push us over the precipice from say belief to disbelief or the reverse we're likely to take into account all of the evidence we want to be updating incrementally on all of the new evidence that we get why well because small amounts of evidence add up and if we keep ignoring little bits of evidence that wouldn't be enough to shift us over to the opposite of the belief that we currently hold it means that we can't sort of slowly amass enough considerations against our view to eventually change our beliefs so

0:53:02.4 -->
we want to know that we are routinely responding even to the small amounts of uh evidence that uh that could eventually collectively change our belief and then the next thought most of you have heard of confirmation bias we're naturally inclined to notice or take seriously information that supports what we already believe inclined to fail to notice or dismiss entirely information that goes against what we already believe these strategies of focusing us in on both components of the strength test are ways broadly of doing something that we might call considering the opposite in a way that we rarely naturally do so asking how people with the opposite view would respond to the evidence that you got asking how we would have responded to having gotten the opposite

0:54:01.1 -->
evidence so in fact studies show that although almost no amount of prompting people to just really consider the evidence carefully or make sure to be as fair or as unbiased as you can helps to mitigate our massive natural tendency towards bias it turns out that getting people to ask these questions getting people to think about how a proponent of the opposite view would have responded or how they would have responded to getting the opposite evidence it does turn out to significantly help so getting people to actively engage in the kind of mindset where you are habitually asking those questions that turns out to be highly effective so uh what we want to do is we want to remember these things that the prior probability is separate from must be decoupled from our assessment

0:55:02 -->
independent assessment of the strength of the evidence and that when we are assessing the strength of the evidence that we need to take into account not just uh how likely it would be to observe the thing that we're observing if our view is right but importantly that one how likely we might be to observe it even if uh our view is not right so thinking about uh those components uh separately and actually getting getting yourself to think through those questions uh best way i think we know of at this point uh to uh get yourself to respond more rationally to new information all right that's it uh happy to take as many questions as there are so feel free to run out if you've got someplace else to be and otherwise i'll stick around and answer some questions thank you so much for showing up [Applause]
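
Editor's illustration (not part of the talk): a minimal sketch of the odds formulation summarized above, keeping the prior probability separate from the strength test. The function names are hypothetical. The airport inputs use the talk's figures (a population of about 330 million and a one in a hundred thousand match rate); the uniform prior over the population and the assumption that the sibling would certainly match the description are simplifications added here. The text-reply probabilities are invented purely to show the opposite evidence rule.

```python
def strength(p_evidence_if_true, p_evidence_if_false):
    """Strength test: P(E | H) divided by P(E | not H)."""
    return p_evidence_if_true / p_evidence_if_false


def posterior_probability(prior_probability, evidence_strength):
    """Odds form of Bayes: posterior odds = prior odds * strength of evidence."""
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = prior_odds * evidence_strength
    return posterior_odds / (1.0 + posterior_odds)


# Airport example (figures from the talk; prior and certain match are assumptions).
population = 330_000_000
one_in = 100_000                      # chance a non-sibling has all the traits
lookalikes = population / one_in      # people expected to fit the description
sibling_prior = 1.0 / population      # assumed: any resident equally likely
sibling_strength = strength(1.0, 1.0 / one_in)
sibling_posterior = posterior_probability(sibling_prior, sibling_strength)

print(f"expected lookalikes: {lookalikes:.0f}")
print(f"chance it is her: about 1 in {1.0 / sibling_posterior:.0f}")

# Opposite evidence rule (hypothetical probabilities of getting a reply).
p_reply_if_interested = 0.8
p_reply_if_not_interested = 0.4
reply_strength = strength(p_reply_if_interested, p_reply_if_not_interested)
no_reply_strength = strength(1.0 - p_reply_if_interested,
                             1.0 - p_reply_if_not_interested)

print(f"strength of a reply:  {reply_strength:.2f}")
print(f"strength of no reply: {no_reply_strength:.2f}")
```

With these inputs the very strong evidence at the airport still leaves the hypothesis at roughly one in a few thousand, because the prior is so low. In the text example, a reply has strength above 1 and so the missing reply has strength below 1, which is the opposite evidence rule in numbers.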

0:56:01.4 -->
so you mentioned the uh opposite evidence rule it reminded me of when people say absence of evidence is not evidence of absence uh it seems conflicting to that is there some nuance there or are they wrong the people that say that absence so failing to get any evidence is not evidence that basically that evidence doesn't exist yeah so one thing that i would want to say uh about that is that um we're set up antecedently to spot uh particular uh pieces of evidence call this something like selective noticing and we're set up antecedently to be way more inclined to miss other pieces of evidence and so if we know that uh you know the way that our

0:57:00.7 -->
brains are naturally going to be um on the lookout for some evidence and inclined to miss the kinds of pieces that would be evidence were we to consider them but in fact we'll be way less likely to notice them given that they're not fitting as well with the hypothesis that we already have then that's that's a nice way of fleshing out why uh it is that you know it's not great information that you know that conflicting evidence uh doesn't in fact exist so yeah you know just thinking about different kinds of um views that i might have and if i think that there are things that fit particularly well that would look a certain way with that view but i just have really failed to consider like how things would look um if that view isn't true well you know those things um are potential pieces of evidence and i'm just um missing them uh so um i think i think that's a nice way of thinking about it you know just in terms

0:58:02 -->
of which pieces of evidence i'm inclined to bump up against yeah so you mentioned how sort of the p-value was always yeah very strong evidence of whether you should believe something or not do you have any like thoughts on how given this updating formula we should respond to scientific evidence yeah so um lots lots to say about um p-values so one thing uh is that you know depending on what the prior probability of the hypothesis is um you know if it was decently high and we end up getting a p-value a statistically significant result with you know a p-value that's quite low what is what is that meaning well it is meaning that we're you know we're getting a lot of evidence and so

0:59:00.7 -->
if the prior probability wasn't terribly low which i think often in scientific testing cases that is the case well we might be we might be getting really really strong uh evidence to believe the truth of the hypothesis but it's also the case that we can't assume that at the outset so some of the time we're going to be in cases where um you know we actually it's very hard for us to even gauge the prior probability of the hypothesis and if it's the case where actually you know it wasn't fairly um it wasn't all that likely to begin with then you know even with a fairly low p-value we should be pretty worried the other thing is that you know what p-values are telling us uh the probability uh that we get this result given the truth of the null hypothesis well you probably already know this because you're asking asking this question in particular but um let's say that we have

1:00:00 -->
a standard p-value of 0.05 well that's telling us that roughly there's a 5 percent chance that we're going to um get this evidence you know even if not h is true but think about the way science is done right and we've got this kind of background scenario in which scientific results are likelier to get published if they are surprising if they're counter-intuitive well what does that kind of background do well if we're not being a good scientist who is pre-registering all of our trials we end up in a kind of situation where um well let's say 20 trials have been done if it's way likelier to have published like the surprising results or strong strong positive confirmation results and to have just you know filed away the ones that you know come out as not significant or aren't very surprising well then it

1:01:01.9 -->
makes it a whole lot likelier that the ones that we bump up against in scientific literature are in that five percent chance um and so that you know uh if i don't know anything about what the methodology is with say pre-registration um of uh you know hypothesis and trials and whatnot and i don't feel like i have a good sense of if i have uh if i'm able to tell whether there have been lots of negative results that i haven't heard about well i'm going to be really worried about something that is clearing the bar for statistical significance on my p-value but if it's likelier that it's.