Mathias Brandewinder on .NET, F#, VSTO and Excel development, and quantitative analysis / machine learning.
by Mathias 16. April 2011 17:27

In my last post, I looked into running a simple simulation using F# sequences; our model mapped a sequence of random numbers to 2 states, Rainy and Sunny.

What if we wanted to model something a bit more realistic, like a system where the weather tomorrow depends on the weather today? Let’s say, for instance, that if the weather is Sunny today, there is a 60% chance that it’s still Sunny tomorrow, but if it’s Rainy today, we have a 70% chance that tomorrow is Rainy.

Technicality: we will also assume that if we know today’s weather, what happened yesterday brings us no additional information on the probability of rain or sun tomorrow.

Let’s start like last time, and define first a Weather type, with 2 states, Rainy and Sunny, and represent the transitions from state to state, using pattern matching:

type Weather = Sunny | Rainy

let NextDay today proba =
    match today with
    | Rainy -> if proba < 0.7 then Rainy else Sunny
    | Sunny -> if proba < 0.6 then Sunny else Rainy

Armed with this, starting from an initial state, we want to generate the next state, based on the current state and the next probability coming from the sequence of random numbers. This part got me stumped for a while. Using a Sequence map is clearly not going to work, because, unlike in the previous post, we can’t determine the Weather based on the probability alone, we need both the probability and the previous Weather. Conversely, Sequence unfold has the opposite problem: it generates a sequence of states based on the previous State, but doesn’t take in another Sequence as input.

More...

by Mathias 8. July 2010 11:03

A new method of forecasting has been brought to my attention: Paul. Paul is an English-born octopus, currently living in Germany, and has been predicting with high accuracy the results of the German soccer team, by picking between two boxes marked with the flag of the two competing teams:

How unlikely is it that Paul would have such a string of correct forecasts? Pretty unlikely. If you assume that Paul’s picks were completely random, with a 50% chance of correctly calling the winner, the probability of making 11 good calls out of 12 comes down to 0.29%. Does this mean Paul is the next big thing in forecasting? It’s possible, but I don’t think so (this said with all due respect to Paul and his skills). Leonard Mlodinow, in his excellent book, The Drunkard's Walk, makes the following comment:

This example illustrates an important point: even with data significant at, say, the 3 percent level, if you test 100 nonpsychic people for psychic abilities […], you ought to expect a few people to show up as psychic.

Mallarme In other words, if a phenomenon is random, you should typically see the “average” case regularly, but you should also see highly unlikely cases happen from time to time – observing such a rare occurrence doesn’t contradict the randomness of the phenomenon. Or, in the words of the French poet Mallarmé, Un Coup de Dés Jamais N'Abolira Le Hasard (A throw of the Dice will Never Abolish Chance).

by Mathias 28. June 2010 13:14

A client asked me recently a fun probability question, which revolved around figuring out the probability of success of a research program. In a simplified form, here is the problem: imagine that you have multiple labs, each developing products which have independent probabilities of succeeding – what is the probability of more than a certain number of products being eventually successful?

Let’s illustrate on a simple example. Product A has a 30% probability of success, and product B a 60% probability of success. Combining these into a probability tree, we work out that there is an 18% chance of having 2 products successful, 18% + 12 % + 42% = 72% chance of having 1 or more products succeed, and 28% chances of a total failure.

SimpleBinaryTree

It’s not a very complicated theoretical problem. Practically, however, when the number of products increases, the number of outcomes becomes large, fairly fast – and working out every single combination by hand is extremely tedious.

Fortunately, using a simple trick, we can generate these combinations with minimal effort. The representation of integers in base 2 is a decomposition in powers of 2, resulting in a unique sequence of 0 and 1. In our simplified example, if we consider the numbers 0, 1, 2 and 3, their decomposition is

0 = 0 x 2^2 + 0 x 2^1 –> 00

1 = 0 x 2^2 + 1 ^ 2^1 –> 01

2 = 1 x 2^2 + 0 x 2^1 –> 10

3 = 1 x 2^2 + 1 x 2^2 –> 11

As a result, if if consider a 1 to encode the success of a product, and a 0 its failure, the binary representation of integers from 0 to 3 gives us all possible outcomes for our two-products scenario.

More...

by Mathias 8. August 2009 18:03

Today I came across a solution to Euler Problem 205 on The Daily Dose of Excel. The problem is stated as follows:

Peter has nine four-sided (pyramidal) dice, each with faces numbered 1, 2, 3, 4.
Colin has six six-sided (cubic) dice, each with faces numbered 1, 2, 3, 4, 5, 6.

Peter and Colin roll their dice and compare totals: the highest total wins. The result is a draw if the totals are equal.

What is the probability that Pyramidal Pete beats Cubic Colin? Give your answer rounded to seven decimal places in the form 0.abcdefg

I thought it was a pretty cool problem; I love probability problems, and had never come across something similar, so it piqued my interest. The solution presented in The Daily Dose was essentially a pretty efficient brute-force enumeration, and I wondered if it was possible to go a bit faster that 6 minutes follow a different approach – using my language of predilection, C#. [Edited August 9. Note to self: before commenting on other people’s blog posts, I should make sure I read them properly. Especially when discussing their code’s performance. Otherwise, I will look foolish].

The probability that Pete wins can be written as:

ProbaOfColinWin

Refreshing a bit my memory in probability through Wikipedia, “the exact probability distribution Fs,i of a sum of i s-sided dice can be calculated as the repeated convolution of the single-die probability distribution with itself” as follows:

DistributionOfSumOfRolls

More...

by Mathias 28. June 2009 12:41

I finished “The Drunkard’s Walk – how randomness rules our lives”, by Leonard Mlodinow, a few days after Sway; I have postponed writing about it, because a minor storm of work has hit my parts – but I really loved this book, and thought I should say a few words about it.

The book’s point is that humans are very bad at drawing conclusions from observations of random phenomena. They routinely make gross mistakes when dealing with conditional probability (92% of Americans, some of them with pretty solid mathematical credentials, get the Monty Hall problem wrong), and fall prey to the Law of Small Numbers, seeing patterns where there is none, and refusing to admit the importance of randomness in shaping our fates. In the end, the message is pretty upbeat – I loved this quote from Thomas Watson:

If you want to succeed, double your failure rate.

I really enjoyed how the book builds up following the history of probability and statistics; some of the individuals who contributed to its development are truly remarkable (just lookup Cardano for instance), and the book contains a fair share of anecdotes about them. If anything, it gave me my first introduction to Benford’s law, which I am still digesting – and which has been weirdly prominent in the news, via the Iran election issue.

In short, I strongly recommend this book if you are interested in either history of sciences, probability, or decision making.

Comments

Comment RSS