Statistical Significance

  • Statistical Significance

    From the EQLive boards.

    Originally posted by Ngreth
    We ran a fairly large set of tests on this.

    Jewelcraft Skill of 299, No AA's, No mod, trivial 335
    Expected Success Rate: 95%
    1000 combines - 961 = 96.1% Success Rate

    Jewelcraft Skill of 300, No AA's, No mod, trivial 335
    Expected Success Rate: 95%
    1000 combines - 966 = 96.6% Success Rate

    Jewelcraft Skill of 299, No AA's, 15% mod (maxed trophy), trivial 335
    Expected Success Rate: 95%
    1000 combines - 949 = 94.9% Success Rate

    Jewelcraft Skill of 300, No AA's, 15% mod (maxed trophy), trivial 335
    Expected Success Rate: 95%
    1000 combines - 960 = 96.0% Success Rate

    The differences in the numbers are statistically insignificant, and show that in both cases there is no effect on failure rate from having the trophy.
    Now, my statistics and probability are a bit rusty, but here goes.

    The chance to succeed, "S", is a binomial random variable, with (in theory) P=.95.

    Let E(X) be the expected value of a random event, V(X) be the variance, SD(X) the standard deviation, and C(X) be the 95% confidence radius.

    (by definition, SD(X) = sqrt(V(X)), and C(X) = 1.96*SD(X))

    More notation: S^1000 means "1000 trials of the random event S".

    The following are "stat 101" formulas, given that successive trials are independent:

    E(S^1000) = 1000 * E(S) = 950
    V(S^1000) = 1000 * V(S) = 1000 * .95 * .05 = 47.5
    SD(S^1000) = sqrt( V(S^1000) ) =~ 6.892
    C(S^1000) = 1.96 * SD(S^1000) =~ 13.51

    So, 19 times out of 20, we would expect after 1000 trials to get between 936.49 and 963.51 successes. (ie, 950 - 13.51 and 950 + 13.51).

    Note that
    Jewelcraft Skill of 300, No AA's, No mod, trivial 335
    Expected Success Rate: 95%
    1000 combines - 966 = 96.6% Success Rate
    966 is outside of this interval. In other words, the above post demonstrates evidence that with a JC skill of 300 and no AAs and no MOD, on a trivial 335 item, the chance of success is greater than 95%.
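
    The interval check above can be reproduced with a short sketch. The function name binomial_interval is made up for illustration, and the code assumes independent combines with a fixed 95% success chance, as stated in the post.

```python
import math

def binomial_interval(n, p, z=1.96):
    """Return (mean, sd, low, high) for the success count of n trials."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))  # SD of a binomial count
    radius = z * sd                  # 95% confidence radius when z = 1.96
    return mean, sd, mean - radius, mean + radius

mean, sd, low, high = binomial_interval(1000, 0.95)
print(f"expected {mean:.0f}, SD {sd:.3f}, interval {low:.2f} to {high:.2f}")

# Ngreth's four observed success counts, in the order posted.
for successes in (961, 966, 949, 960):
    verdict = "inside" if low <= successes <= high else "outside"
    print(successes, verdict)
```

    Only the 966 run should land outside the interval; the other three fall inside it.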

    Now, is it my math that is off, or did Ngreth incorrectly claim there was no statistical evidence for an anomaly?

    Going deeper into the rabbit hole, let us compare the differences between trials.

    2 and 3 are the furthest apart:
    Jewelcraft Skill of 300, No AA's, No mod, trivial 335
    Expected Success Rate: 95%
    1000 combines - 966 = 96.6% Success Rate

    Jewelcraft Skill of 299, No AA's, 15% mod (maxed trophy), trivial 335
    Expected Success Rate: 95%
    1000 combines - 949 = 94.9% Success Rate
    The success percent on 1000 combines is a random number.

    The null hypothesis is that both are 1000 trials at an independent 95% success rate.

    If the null hypothesis is true, the expected value is 95% and the SD is 0.689%.

    The difference between two independent random variables is a random variable. The variances (not SDs) add, and the expected values subtract.

    So, the expected value of the difference between trial 2 and 3 is 0%, with a standard deviation of 0.975%. This gives a confidence radius of 1.91%.

    I.e., for any two of the above trials, the difference should be less than 1.91%, or we have an anomaly.

    The observed difference between trials 2 and 3 is 1.7% -- within the confidence interval.
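
    As a rough sketch of the difference test above (the variable names are made up for illustration, and both runs are assumed to be independent 1000-combine samples at 95%):

```python
import math

n, p, z = 1000, 0.95, 1.96

# Variance of one run's success rate (a proportion, not a count).
var_rate = p * (1 - p) / n
sd_rate = math.sqrt(var_rate)

# Variances add for the difference of two independent runs.
sd_diff = math.sqrt(var_rate + var_rate)
radius = z * sd_diff

# Trial 2 (966 successes) versus trial 3 (949 successes).
observed = (966 - 949) / n

print(f"SD of one run: {sd_rate:.3%}")
print(f"SD of the difference: {sd_diff:.3%}, radius: {radius:.2%}")
print(f"observed difference: {observed:.1%}, within radius: {observed < radius}")
```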

    So maybe that is what Ngreth did -- compared the differences between the samples, and assumed the 95% success chance was accurate?

    On one hand, the samples are not statistically different from each other. On the other hand, one of the samples is statistically different from a 1000 repeated 95% chance trial.

    Combining all 4 trials together, we get 3836 successes out of 4000 attempts.

    Assuming the null hypothesis (95% chance of success), the expected number of successes would be 3800. The variance would be 190. The SD would be 13.78. The confidence radius would be 27.

    3836 has a z-score of (3836-3800)/13.78 = 2.6. The chance of a result this high happening by random luck, assuming the null hypothesis, is about 1 in 215 (one-tailed).
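
    The pooled z-score can be checked with a sketch like the one below. It uses the normal approximation to the binomial and reports the upper-tail probability only; the helper name pooled_z is hypothetical. The unrounded z is slightly above 2.6, so the odds come out a little longer than the figure obtained from the rounded value.

```python
import math
from statistics import NormalDist

def pooled_z(successes, trials, p):
    """z-score of an observed success count under a binomial null hypothesis."""
    mean = trials * p
    sd = math.sqrt(trials * p * (1 - p))
    return (successes - mean) / sd

# All four runs combined: 961 + 966 + 949 + 960 successes in 4000 combines.
total = 961 + 966 + 949 + 960
z = pooled_z(total, 4000, 0.95)

# Upper-tail probability under the normal approximation (one-tailed).
upper_tail = 1 - NormalDist().cdf(z)

print(f"total successes: {total}")
print(f"z = {z:.2f}, upper tail = {upper_tail:.5f}, about 1 in {1 / upper_tail:.0f}")
```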

    Ngreth's post is strong evidence that the chance of success at that recipe is not 95%. It is higher. This matters to the discussion at hand because a higher underlying success chance gives a lower expected variance, so the difference between trials 2 and 3 becomes more statistically significant.

    If you assume a 96% chance of success, the difference between trial 2 and 3 becomes equal to the confidence radius -- ie, there is only a 2.5% chance that a given two trials would be that far apart by random chance, assuming the 96% null hypothesis.
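
    To see how the assumed success chance changes the picture, here is a minimal sketch that recomputes the confidence radius of the difference for a few candidate rates. The function name difference_radius and the choice of candidate rates are assumptions for illustration.

```python
import math

def difference_radius(n, p, z=1.96):
    """95% radius for the difference in success rate of two n-trial runs."""
    sd_diff = math.sqrt(2 * p * (1 - p) / n)  # variances of the two runs add
    return z * sd_diff

# Trial 2 (966) versus trial 3 (949), as a fraction of 1000 combines.
observed = (966 - 949) / 1000

radii = {p: difference_radius(1000, p) for p in (0.95, 0.96, 0.965)}
for p, radius in radii.items():
    relation = "inside" if observed < radius else "outside"
    print(f"p = {p:.1%}: radius {radius:.2%}, observed {observed:.1%} is {relation}")
```

    The radius shrinks as the assumed rate rises, so the same 1.7% gap sits comfortably inside it at 95%, roughly on the boundary at 96%, and outside it at 96.5%.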

    My stats are rusty. Anyone want to tear my work apart?
    --
    I am not the Yakatizma you are looking for.
    No, really.

  • #2
    As posted in other threads, there could be some other factors at work (ie. lag).

    I am not sure what your point is other than to throw a lot of mathematics at this.

    There could be slight rounding errors in the approximations. (just dropping the decimal instead of rounding up, etc.)

    There could be a variance between items 1 point above your trivial compared to at the peak of the 95% success range, i.e., at 300 skill attempting a 301 trivial compared to a 342.

    I am not certain. And the worst case in there was only 94.9%. If I know that on a trivial or near trivial combine I am going to have a 95% success rate (or better) then that is fine with me.

    If they claimed it was a 96% success rate, what difference would it really make? On any one combine you could get unlucky. Also, it lends support to the theoretical postings that every 40 points of raw skill over the trivial of the combine translates to a ~1% increase in success rate, to the point that 200 raw skill over trivial is roughly no-fail.
    Shawlweaver Sphynx on Cazic Thule
    Master Artisan Aldier on Cazic Thule

    • #3
      Jewelcraft Skill of 299, No AA's, 15% mod (maxed trophy), trivial 335
      Expected Success Rate: 95%
      1000 combines - 949 = 94.9% Success Rate

      How do you get a maxed trophy if your skill is not 300?
      Liwsa 75 Druid Prexus - Retired


      • #4
        He's a dev; he can literally summon the top-level trophy on his Dev account and hand it to himself on the player account.
        Sir KyrosKrane Sylvanblade
        Master Artisan (300 + GM Trophy in all) of Luclin (Veeshan)
        Master Fisherman (200) and possibly Drunk (2xx + 20%), not sober enough to tell!
        Lightbringer, Redeemer, and Valiant servant of Erollisi Marr

        • #5
          I haven't verified Yakk's calculations, but his procedures and formulas appear correct. Assuming no math errors, that one trial does appear statistically significant.

          Originally posted by Aldier
          As posted in other threads, there could be some other factors at work (ie. lag).

          I am not sure what your point is other than to throw a lot of mathematics at this.
          The point is the one you just made. The devs claimed that there is no difference on success rates between using a trophy and not using a trophy. Ngreth performed a number of experiments to see if that was true or not. One of those experiments actually shows that there IS a difference. As you said, some other factor may be stepping in and causing a slight change in the expected result.

          At this point, we can't actually draw a firm conclusion. One of the cornerstones of scientific analysis is that a result must be repeatable for it to be valid. If this trial is repeated a few more times and the result is consistently outside the expected range, that means there's a problem and the devs or coders should investigate. However, if upon repetition, the results fall within the expected range, that just means this one trial got unlucky.

          To give a more common example, let's say you're doing a survey to determine the average height of people in the United States. You randomly pick 100 people from all across the US, measure their height, then take the average. Assuming you really did get a random selection, the average height would probably be somewhere between 5-6 feet. Now let's say that due to a glitch in your system (or maybe due to just pure luck of the draw), the 100 people picked for your survey were all NBA players. Suddenly your average height skyrockets to 6-7 feet. That doesn't mean the entire US suddenly got taller; it just means there's a glitch in your system. If someone else repeats your survey but doesn't encounter your glitch, they'll probably get an average height of 5-6 feet.
          Sir KyrosKrane Sylvanblade
          Master Artisan (300 + GM Trophy in all) of Luclin (Veeshan)
          Master Fisherman (200) and possibly Drunk (2xx + 20%), not sober enough to tell!
          Lightbringer, Redeemer, and Valiant servant of Erollisi Marr

          • #6
            The main purpose of the post was asking for help with the accuracy of my math.

            Because if my math is right, then
            1> The max chance of success isn't 95% like Ngreth claims. It is closer to 96%, and reasonably could be as high as 96.5%.

            This possibly changes an assumption in Ngreth's statistical analysis, which calls his conclusions into question.

            2> If the success chance is high enough, then Ngreth's claim that there is no statistical difference displayed here between these samples becomes incorrect. Rather, test runs 2 and 3 show a statistically significant difference, with the nominally higher-skill player having a significantly lower chance of success.

            ...

            But, like I said, my statistics are rusty. So I'm hoping someone out there who knows some statistics/probability will point out whether and where I went wrong.
            --
            I am not the Yakatizma you are looking for.
            No, really.

            • #7
              OK, I understand what you mean. I feel that I have a reasonable grasp of statistics and your numbers seem correct. I am now stepping back from the mathematics and looking at this.

              Dev_01 claims it is a 95% chance of success.
              Tests show it is actually closer to 96%.

              Is this a big deal to worry about and try to correct? What results do you want to see from this? Do you want it adjusted down to the 95%? Do you want to show that the developers' math was off? To the untrained eye these appear similar enough that most would say, OK, 95/96%, that is not a huge difference. I am just wondering if you are trying to split hairs here.
              Shawlweaver Sphynx on Cazic Thule
              Master Artisan Aldier on Cazic Thule

              • #8
                Can we get a recipe for Aspirin, please? This thread gave me a pounding headache.


                • #9
                  Well, if the chance of success is, say, 96.5%, then Ngreth just posted statistical evidence that having a trophy reduces your chance of success.

                  If the chance of success is 95%, then Ngreth posted numbers which show no statistical evidence that having a trophy reduces your chance of success.

                  So the exact success rate has an impact.

                  Dinner time.
                  --
                  I am not the Yakatizma you are looking for.
                  No, really.

                  • #10
                    Originally posted by Yakk
                    If the chance of success is 95%, then Ngreth posted numbers which show no statistical evidence that having a trophy reduces your chance of success.
                    Is this not what he was trying to show?
                    Shawlweaver Sphynx on Cazic Thule
                    Master Artisan Aldier on Cazic Thule

                    • #11
                      Originally posted by Aldier
                      Is this not what he was trying to show?
                      TRYING yes.

                      The question this thread is intended to discuss is DID he actually show it? Or did he in fact, show the opposite?

                      I won't comment on the math... just gonna read this one.

                      • #12
                        I still say 1000 combines is not nearly enough to determine anything other than vague approximations (which is what I assume was the intent).

                        My main concern lies in the fact that your argument is based on 3 too many successes (i.e., 963 successes out of 1000 would fall within the expected range) and you are forcing your argument against only one of the samples. Thus you are making an argument over 0.3% of a single sample (not all the data), which in a random system is not something to make a case against.


                        If 9,000 more combines are done for each example (or even a brand new full set of 10,000, I don't care), and if the results remain the same, then I would be willing to consider it something needing to be looked at.


                        Many of us tradeskillers have experienced much crazier results than 1 in 215. For example, I have had a run of 488 non-trivial combines to get a single skill-up, and many others have reported runs of over 1000 non-trivial combines without a skill-up. While these are statistical anomalies, it does not mean that the system is broken; it is just that, an anomaly.




                        And there was a question earlier about max trophy with 299 skill. These were not real characters they were talking about, they were specific Developer created characters used to run tests, so they can create them however they want. Thus you can also run 1000 combines without a skill-up (hehe, which some have already proven in game).



                        Gorse

                        • #13
                          So my statement "Statistically significant" may have been technically wrong.

                          But realistically... we all know how fickle random numbers are.

                          To really do this right it would have to be like 100000 combines or some other ridiculous number which is just NOT going to happen.

                          What it does show, though, is that the people who say

                          "Before I had the trophy I was succeeding 99% of the time, now I am failing 50% of the time" are just having bad runs of luck,

                          or "Before I had 95 out of 100 successes, now with X I am only 7 for 10" (with X being trophy, skill 300, AA's combinations) are just unlucky, and not using a large enough sample size.
                          Ngreth Thergn

                          Ngreth nice Ogre. Ngreth not eat you. Well.... Ngreth not eat you if you still wiggle!
                          Grandmaster Smith 250
                          Master Tailor 200
                          Ogres not dumb - we not lose entire city to froggies

                          • #14
                            I've gone on 200-combine runs of something supposed to give a 95% success rate and didn't fail at all, or failed only 2 combines. Does that mean that is out of line? No.

                            On the flip side of the coin, I've done 200 of the same combine and failed 28... is that out of line? No.

                            One could say my sample size is too small, and they are probably right. However, my average number of failures at that success rate should be 10... both runs were considerably off. I also can't count the times that it has been in the range of 8-12 fails... which has by far been the most consistent result.
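
                            To put rough numbers on runs like these, the sketch below computes exact binomial probabilities for 200 combines. It assumes independent combines with a flat 5% failure chance, and the helper name binom_pmf is made up for illustration.

```python
import math

def binom_pmf(n, k, p):
    """Probability of exactly k failures in n trials with failure chance p."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, fail_rate = 200, 0.05  # assumption: flat 5% failure chance per combine

# 28 or more failures in 200 combines.
p_many = sum(binom_pmf(n, k, fail_rate) for k in range(28, n + 1))

# 2 or fewer failures in 200 combines.
p_few = sum(binom_pmf(n, k, fail_rate) for k in range(0, 3))

# The "typical" band of 8 to 12 failures.
p_typical = sum(binom_pmf(n, k, fail_rate) for k in range(8, 13))

print(f"P(28+ fails)  = {p_many:.2e}")
print(f"P(0-2 fails)  = {p_few:.2e}")
print(f"P(8-12 fails) = {p_typical:.2f}")
```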

                            Now, if you were doing 10,000-100,000 combines and those percentages still showed up the same, I would say there is a cause for concern, but at least then you know that one bad run here or there didn't skew the numbers. Even at 1,000 a bad enough run will skew the numbers enough to put them this way.

                            • #15
                              Hey Ngreth, I think we think too much alike :P Note the post times.
