Mathias Brandewinder on .NET, F#, VSTO and Excel development, and quantitative analysis / machine learning.
12. October 2014 06:46

Well, last year’s F# tour was so much fun, I figured I would try to do it again, in Europe this time. I am becoming quite fond of F# tourism: after all, what better way to discover a place than going there and meeting locals who happen to have at least one common interest – and spread the F# love in the process?

Anyways, if everything goes according to plan, I should be visiting F# communities in 7 different countries in 6 weeks :) As an aside, if you are running a meetup/user group that is somewhat on my way, have a couch I can crash on, and would like me to stop by, ping me on twitter. I can’t make promises (obviously the schedule is a bit tight already), but if I can, I will.

One thing I find pretty exciting is that all of a sudden, conferences are starting to have very nice F# and functional programming offerings. In particular, huge props to:

• Build Stuff in Vilnius: the conference last year was fantastic, fun, diverse, stimulating, and just a great atmosphere. And the F#/functional lineup this year is awesome. Trust me, if you can go – just go, you won’t regret it.
• NDC London: when you see a major conference like NDC putting together one entire track solely dedicated to functional programming, you know something is happening. I am really stoked – at that point, I don’t see why I would attend conferences without a solid functional track. My daily work is primarily functional, and in my (biased) opinion, functional is where a lot of the innovation is happening lately. So… thanks to NDC for bridging the gap, and putting together a program that I can enjoy!

At any rate, here is the current plan – stay tuned for updates and more details, and hope to see you somewhere along the way!

Nov 3 & 4: Aarhus, Denmark

Nov 6 & 7: London, UK

Nov 8: London, UK

Nov 10: Dublin, Ireland

Nov 11: Munich, Germany

• Munich .NET user group: F# for the C# developer

Nov 12: Zurich, Switzerland

Nov 17: Paris, France

Nov 19-23: Vilnius, Lithuania

Nov 25 & 26: Berlin, Germany

Nov 27: Frankfurt, Germany

• Frankfurt .NET group: TBA

Dec 1 – 5: London, UK

Dec 8 & 9: Oslo, Norway

21. September 2014 11:40

If you have ever come across my blog before, it will probably come as no surprise if I tell you that I enjoy coding with F# tremendously. However, there is another reason why I enjoy F#, and that is the Community aspect. One thing we have been trying to do in San Francisco is to build a group that is inclusive, and focused on learning together.

This is why we started the coding dojos a while back: one of our members mentioned that while he was convinced from talks that F# was a good language, presentations were not quite enough to help him get over the hump and feel comfortable coding, so we started sessions completely focused on writing code in groups to solve fun problems. This has been an amazingly fun experience.

During a discussion with my sister last year, we ended up talking about gender inequality, a topic that is also dear to my heart – and, in her great wisdom, she made the following remark: scheduling a meeting at 6:00 PM is possibly the worst time you could pick for a mom. In hindsight, this is totally obvious; it also goes to show that everyone has blind spots.  For that matter, it applies more broadly: choosing to go coding after work instead of going back home is not feasible for everyone. So I thought, why not try meetings in completely different time slots?

At the same time, I came across the Alt.NET Paris group (which is pretty awesome); one thing they do is run Coding Breakfasts, which they expanded into Coding Mojitos, and Coding Candies. I really liked the idea, and adapted it a bit for F# Coding Breakfast.

Here is the format we have been following so far:

• If you want people to carve out a bit of time in a working day, respecting their time is crucial. So the format is strict: start on time, code in pairs for 45 minutes, show-and-tell for 15 minutes, and then, off you go!
• In order to be able to code something from scratch in 45 minutes, the problem needs to be reasonably small, and accessible for beginners. We have been working initially on some of the 99 ocaml problems, and lately settled on Project Rosalind, which people seemed to find more interesting.
• Some of the early feedback I got was that knowing the problems in advance would help, especially for beginners – so every time we pick and announce two problems. If people want to work on other stuff, that’s fine, too :). As an illustration, here is what the current prototypical event invite looks like.
• One of the nice aspects is that the logistics requirements are virtually zero. Essentially all you need is a couple of tables, and ideally some wifi. In San Francisco, we have been meeting in a bakery. People show up around 8:15 in the morning, grab coffee and pastries, and start coding. No projector, no speaker – just open your laptop and go.
• While the equipment of the venue is not that important, location matters. If you want to reach people before they go to work, it makes sense to find a place that is close to offices. In San Francisco, we are meeting downtown, close to public transportation.
• I shamelessly borrowed another idea, this time from the NashFP group. They have a GitHub organization repository, which makes it possible for everyone to share their code, see what others have been doing, and potentially reuse bits of code.

So far, we have had 4 breakfasts in San Francisco, and the response has been very positive. It’s usually a smaller crowd than the evenings, but different people show up, and it has a different energy than evening sessions. Minds are still fresh (well, most minds – I have a hard time booting my brain before 9 AM), there is light outside...

The next step in San Francisco is to try out different time slots. After all, mornings are also not convenient for all, so this week, we will have our first F# Coding Lunch, hosted at Terrace Software (thanks Clayton!). Same general idea, but, you guessed it, 12:00 to 1:00. We’ll see how that goes!

So if you are considering starting or developing an F# community in your area, I encourage you to try that out! It is tremendously easier to set up than an evening presentation (you don’t really need a venue or a speaker), it has potential to be owned or replicated by multiple people (my dream is to see regular F# breakfasts everywhere in the Bay Area), and I suspect it would make a great way to introduce F# in a company as well…

If you have questions or comments, ping me on Twitter – I’d love to hear your thoughts or ideas!

13. September 2014 17:18

Let’s face it, @fsibot in its initial release came with a couple of flaws... er, undocumented features. One aspect that was particularly annoying was the mild Tourette’s syndrome that affected the bot; on a fairly regular basis, it would pick up the same message, and send the same answer over and over again to the brave soul that tried to engage in a constructive discussion.

I wasn’t too happy about that (nobody likes spam), and, being all about the enterprise and stuff, I thought it was time to inject a couple more buzzwords. In this post, I’ll briefly discuss how I ended up using the Azure Service Bus to address the problem, with a sprinkle of Azure Storage for good measure, and ended up liking it quite a bit.

So what was the problem?

The issue came from a combination of factors. Fundamentally, @fsibot is doing two things: pulling on a regular basis recent mentions from Twitter, and passing them to the F# Compiler Services to produce a message with the result of the evaluation.

Mentions are pulled via the Twitter API, which offers two options: grab the latest 20, or grab all mentions since a given message ID. If you have no persistent storage, this implies that when the service starts, you pull the 20 most recent ones, and once you have retrieved some messages, you can start pulling only from the last one seen.

This is a great strategy, if your service runs like a champ and never goes down (It’s also very easy to implement – a coincidence, surely). Things start to look much less appealing when the service goes down. In that scenario, the service reboots, and starts re-processing the 20 most recent mentions. In a scenario where, say, a couple of enthusiastic F# community members decide to thoroughly test the bot’s utter lack of security, and send messages that cause it to have fits and go down in flames multiple times in a short time span, this becomes a real problem.
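The restart problem can be sketched in a few lines. This is only an illustration: the `MentionsQuery` type and the `nextQuery` and `updateLastSeen` functions are hypothetical names, not part of @fsibot or of any Twitter library. They just model the two options described above.

```fsharp
// Hypothetical model of the two ways to ask Twitter for mentions.
type MentionsQuery =
    | Latest of int        // grab the N most recent mentions
    | SinceId of uint64    // grab everything after a given status ID

// With no remembered ID (e.g. right after a restart), fall back to the latest 20.
let nextQuery (lastSeen: uint64 option) =
    match lastSeen with
    | None -> Latest 20
    | Some id -> SinceId id

// Keep track of the highest status ID retrieved so far.
let updateLastSeen (lastSeen: uint64 option) (retrieved: uint64 list) =
    match retrieved with
    | [] -> lastSeen
    | ids ->
        let newest = List.max ids
        match lastSeen with
        | Some previous when previous >= newest -> lastSeen
        | _ -> Some newest
```

As long as the last seen ID lives only in memory, every crash resets it to `None`, and the service falls back to `Latest 20`, which is exactly where the repeated answers come from.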

So what can we do to fix this?

A first obvious problem is that a failure in one part of the service should not bring down the entire house. Running unsafe code in the F# Compiler Service should not impact the retrieval of Twitter mentions. In order to decouple the problem, we can separate these into two separate services, and connect them via a queue. This is much better: if the compiler service fails, messages keep being read and pushed to the queue, and when it comes back on line, they can be processed as if nothing happened. At that point, the only things that will disrupt the retrieval of mentions are either a problem in that code specifically, or a reboot of the machine itself.

So how did I go about implementing that? The most lazy way possible, of course. In that case, I used the Azure Service Bus queue. I won’t go into all the details of using the Service Bus; this tutorial does a pretty good job at covering the basic scenario, from creating a queue to sending and receiving messages. I really liked how it ended up looking from F#, though. In the first service, which reads recent mentions from Twitter, the code simply looks like this:

let queueMention (status:Status) =
    let msg = new BrokeredMessage ()
    msg.MessageId <- status.StatusID |> string
    msg.Properties.["StatusID"] <- status.StatusID
    msg.Properties.["Text"] <- status.Text
    msg.Properties.["Author"] <- status.User.ScreenNameResponse
    mentionsQueue.Send msg

From the Status (a LinqToTwitter class) I retrieve, I extract the 3 fields I care about, create a BrokeredMessage (the class used to communicate via the Azure Service Bus), add key-value pairs to Properties and send it to the Queue.

On the processing side, this is the code I got:

let (|Mention|_|) (msg:BrokeredMessage) =
    match msg with
    | null -> None
    | msg ->
        try
            let statusId = msg.Properties.["StatusID"] |> Convert.ToUInt64
            let text = msg.Properties.["Text"] |> string
            let user = msg.Properties.["Author"] |> string
            Some { StatusId = statusId; Body = text; User = user; }
        with
        | _ -> None

let rec pullMentions( ) =
    // assumption: the line binding `mention` is missing from this listing;
    // receiving the next message from the queue is the natural reading
    let mention = mentionsQueue.Receive ()
    match mention with
    | Mention tweet ->
        tweet.Body
        |> processMention
        |> composeResponse tweet
        |> respond
        mention.Complete ()
    | _ -> ignore ()

    pullMentions ()

I declare a partial Active Pattern (the (|Mention|_|) “banana clip” bit), which allows me to use pattern matching against a BrokeredMessage, a class which by itself knows nothing about F# and discriminated unions. That piece of code itself is not beautiful (it’s just a try-catch block, trying to extract data from the BrokeredMessage into my own Record type), but the part I really like is the pullMentions () method: I can now directly grab messages from the queue, match against a Mention, and here we go, a nice and clean pipeline all the way through.
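To play with the idea without an Azure account, here is a minimal sketch of the same partial Active Pattern, applied to a plain dictionary instead of a BrokeredMessage. The `Tweet` record and the `describe` function are made up for the illustration; only the shape of the pattern mirrors the code above.

```fsharp
open System
open System.Collections.Generic

// Hypothetical record, standing in for the one used by the bot.
type Tweet = { StatusId: uint64; Body: string; User: string }

// Partial active pattern: Some tweet if every key is present and parses, None otherwise.
let (|Mention|_|) (props: IDictionary<string, obj>) =
    match props with
    | null -> None
    | props ->
        try
            let statusId = props.["StatusID"] |> Convert.ToUInt64
            let text = props.["Text"] |> string
            let user = props.["Author"] |> string
            Some { StatusId = statusId; Body = text; User = user }
        with
        | _ -> None

// The pattern can now be used like any other case in a match expression.
let describe (props: IDictionary<string, obj>) =
    match props with
    | Mention tweet -> sprintf "%s said: %s" tweet.User tweet.Body
    | _ -> "not a mention"

let complete = dict [ "StatusID", box 42UL; "Text", box "1+1"; "Author", box "someone" ]
let incomplete = dict [ "Text", box "1+1" ]
```

Here `describe complete` matches the `Mention` case, while `describe incomplete` falls through to the wildcard, because the missing keys raise an exception that the pattern turns into `None`.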

So now that the two services are decoupled, one has a fighting chance to survive when the other goes down. However, it is still possible for the Twitter reads to fail, too, and in that case we will still get mentions that get processed multiple times.

One obvious way to resolve this is to actually persist the last ID seen somewhere, so that when the Service starts, it can read that ID and restart from there. This is what I ended up doing, storing that ID in a blob (probably the smallest blob in all of Azure); the code to write and read that ID to a blob is pretty simple, and probably doesn’t warrant much comment:

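let updateLastID (ID:uint64) =
    let lastmention = container.GetBlockBlobReference blobName
    // assumption: the upload line is missing from this listing
    lastmention.UploadText (string ID)

// assumption: the function header is missing from this listing; the name is a guess
let readLastID () =
    let lastmention = container.GetBlockBlobReference blobName
    if lastmention.Exists ()
    then
        // assumption: the download line is missing from this listing
        lastmention.DownloadText ()
        |> System.Convert.ToUInt64
        |> Some
    else None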

However, even before doing this, I went an even lazier road. One of the niceties about using the Service Bus is that the queue behavior is configurable in multiple ways. One of the properties available (thanks @petarvucetin for pointing it out!) is Duplicate Detection. As the name cleverly suggests, it allows you to specify a time window during which the Queue will detect and discard duplicate BrokeredMessages, a duplicate being defined as “a message with the same MessageID”.

So I simply set a window of 24 hours for Duplicate Detection, and the BrokeredMessage.MessageId equal to the Tweet Status ID. If the Queue sees a message, and the same message shows up within 24 hours, no repeat processing. Nice!

Why did I add the blob then, you might ask? Well, the Duplicate Detection only takes care of most problem cases, but not all of them. Imagine that a Mention comes in, then fewer than 20 mentions arrive over the next 24 hours, and then the service crashes – in that case, the message WILL get re-processed, because the Duplicate Detection window has expired. I could have increased that to more than a day, but it already smelled like a rather hacky way to solve the problem, so I just added the blob, and called it a day.
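To make the window behavior concrete, here is a toy in-memory model of a queue that discards duplicates. It is not how the Service Bus is implemented, and `DedupQueue` is a made-up class; in particular, measuring the window from the moment an ID was last accepted is an assumption of this sketch.

```fsharp
open System
open System.Collections.Generic

// Toy model: a queue that ignores a message ID seen again within the window.
type DedupQueue(window: TimeSpan) =
    let seen = Dictionary<string, DateTime>()
    let accepted = Queue<string>()
    /// Returns true if the message was enqueued, false if it was discarded as a duplicate.
    member this.Send(messageId: string, now: DateTime) =
        match seen.TryGetValue messageId with
        | true, lastAccepted when now - lastAccepted < window -> false
        | _ ->
            seen.[messageId] <- now
            accepted.Enqueue messageId
            true
    member this.Count = accepted.Count

let start = DateTime(2014, 9, 13, 12, 0, 0)
let queue = DedupQueue(TimeSpan.FromHours 24.0)
let first = queue.Send("1234", start)                   // new ID: accepted
let repeat = queue.Send("1234", start.AddHours 1.0)     // inside the window: discarded
let tooLate = queue.Send("1234", start.AddHours 25.0)   // window expired: accepted again
```

The third call is the problem case described above: once the window has expired, the same status ID goes through again, which is why persisting the last ID is still needed.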

So what’s the point here? Nothing earth shattering, really – I just wanted to share my experience using some of the options Azure offers, in the context of solving simple but real problems on @fsibot. What I got out of it is two things. First, Azure Service Bus and Azure Storage were way easier to use than what I expected. Reading the tutorials took me about half an hour, implementing the code took another half an hour, and it just worked. Then (and I will readily acknowledge some F# bias here), my feel is that Azure and F# just play very nicely together. In that particular case, I find that Active Patterns provide a very clean way to parse out BrokeredMessages, and extract out code which can then simply be plugged in the code with a Pattern Match, and, when combined with classic pipelines, ends up creating very readable workflows.

24. August 2014 15:35

My recollection of how this all started is somewhat fuzzy at that point. I remember talking to @tomaspetricek about the recent “A pleasant round of golf” with @relentlessdev event in London. The idea of Code Golf is to write code that fits in as few characters as possible – a terrible idea in most cases, but an interesting one if you want to force your brain into unknown territory. Also, a very fun idea, with lots of possibilities. If I recall correctly, the discussion soon drifted to the conclusion that if you do it right (so to speak), your code should fit in a tweet. Tweet, or GTFO, as the kids would say (or so I hear).

Of course, I began obsessing about the idea, that’s what I do. The discussion kept going at LambdaJam, with @rickasaurus, @pblasucci and @bbqfrito (beers, too). So I thought I had to try it out: what if you set up a twitter bot, which would respond to your F# inquiries, and send back an evaluation of whatever F# expression you sent it?

As it turns out, it’s not that difficult to do, thanks to the F# Compiler Services, which lets you, among many things, host an FSI session. So without further ado, I give you @fsibot. Tweet a valid expression to @fsibot, and it will run it in an F# interactive session, and reply with the result:

Note that you need to send an expression, as opposed to an interaction. As an example, printfn "Hello, world" won’t do anything, but sprintf "Hello, world" (which evaluates to a string) will.
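The difference is visible in the types. The two bindings below are just an illustration (the names are made up): printfn writes to the console and returns unit, so there is no value to report back, whereas sprintf returns the formatted string itself.

```fsharp
// printfn performs a side effect and evaluates to unit: nothing useful to send back.
let interaction : unit = printfn "Hello, world"

// sprintf evaluates to a string, which can be returned as a result.
let expression : string = sprintf "Hello, world"
```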

What else is there to say?

A couple of things. First, my initial plan was to run this on an Azure worker role, which seemed to make a lot of sense. Turns out, after spending countless hours trying to figure out why it was working just great on my machine, using the Azure emulator, but exploding left and right the moment I deployed it in production, I just gave up, and changed track, rewriting it as a Windows Service hosted in an Azure virtual machine (it’s still a cloud-based architecture!), using the awesome TopShelf to simplify my life (thank you @phatboyg for saving my weekend, and @ReedCopsey for pointing me in the right direction).

You can find the whole code here on GitHub. As you might notice, the whole TopShelf part is in C# – nothing wrong with it, but I plan on moving this over to F# as soon as I can, using existing work by @henrikfeldt, who discreetly produces a lot of awesome code made in Sweden.

Another lesson learnt, which came by way of @panesofglass, was that if your code doesn’t do anything asynchronous, using async everywhere is probably not such a hot idea. Duh – but I recently got enamored with mailbox processors and async workflows, and started initially building a gigantic pipe factory, until Ryan aptly pointed out that this was rather counter-productive. So I simplified everything. Thanks for the input, Ryan!

That’s it! I am not entirely sure the bot will handle gracefully non-terminating expressions, but in traditional San Francisco fashion, I’ll call this a Minimum Viable Product, and just ship it – we can pivot later. Now have fun with it :) And if you have some comments, questions or suggestions, feel free to ping me on twitter as @brandewinder.

Source code on GitHub

31. July 2014 14:15

It is the summer, a time to cool off and enjoy vacations – so let’s keep it light, and hopefully fun, today! A couple of days ago, during his recent San Francisco visit, @tomaspetricek brought up an idea that I found intriguing. What if you had two images, and wanted to recreate an image similar to the first one, using only the pixels from the second?

To make this real, let’s take two images - a portrait by Velasquez, and one by Picasso, which I have conveniently cropped to be of identical size. What we are trying to do is to re-arrange the pixels from the Picasso painting, and recombine them to get something close to the Velasquez:

You can find the original images here: http://1drv.ms/XmlTN4

My thinking on the problem was as follows: we are trying to arrange a set of pixels into an image as close as possible to an existing image. That’s not entirely trivial. Being somewhat lazy, rather than work hard, I reverted to my patented strategy “what is the simplest thing that could possibly work (TM)”.

Two images are identical if all of their matching pixels are equal; the greater the difference between pixels, the less similar they are. In that frame, one possible angle is to try and match each pixel and limit the differences.

So how could we do that? If I had two equal groups of people, and I were trying to pair them by skill level, here is what I would do: rank each group by skill, and match the lowest person from the first group with his counterpart in the second group, and so on and so forth, until everyone is paired up. It’s not perfect, but it is easy.
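On plain lists, the pairing idea looks like this. It is only a sketch: `pairByRank` and the two sample groups are invented for the illustration, and the function assumes both groups have the same size (List.zip fails otherwise).

```fsharp
// Sort both groups by the same ranking function, then pair them position by position.
let pairByRank rank first second =
    List.zip (List.sortBy rank first) (List.sortBy rank second)

// Made-up sample data: (name, skill level)
let groupA = [ "Ann", 7; "Bob", 3; "Cid", 9 ]
let groupB = [ "Dee", 8; "Eve", 2; "Fay", 5 ]

// Rank by the second element of each tuple, i.e. the skill level.
let pairs =
    pairByRank snd groupA groupB
    |> List.map (fun ((nameA, _), (nameB, _)) -> nameA, nameB)
```

The weakest of each group end up together (Bob with Eve), and so on up to the strongest (Cid with Dee).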

Problem here is that there is no obvious order over pixels. Not a problem – we’ll create a sorting function, and replace it with something else if we don’t like the result. For instance, we could sort by “maximum intensity”; the value of a pixel will be the greatest of its Red, Green and Blue values.

At that point, we have an algorithm. Time to crank out F# and try it out with a script:

open System.IO
open System.Drawing

let combine (target:string) ((source1,source2):string*string) =
    // open the 2 images to combine
    let img1 = new Bitmap(source1)
    let img2 = new Bitmap(source2)
    // create the combined image
    let combo = new Bitmap(img1)
    // extract pixels from an image
    let pixelize (img:Bitmap) = [
        for x in 0 .. img.Width - 1 do
            for y in 0 .. img.Height - 1 do
                yield (x,y,img.GetPixel(x,y)) ]
    // extract pixels from the 2 images
    let pix1 = pixelize img1
    let pix2 = pixelize img2
    // sort by most intense color
    let sorter (_,_,c:Color) = [c.R;c.G;c.B] |> Seq.max
    // sort, combine and write pixels
    (pix1 |> List.sortBy sorter,
     pix2 |> List.sortBy sorter)
    ||> List.zip
    |> List.iter (fun ((x1,y1,_),(_,_,c2)) ->
        combo.SetPixel(x1,y1,c2))
    // ... and save, we're done
    combo.Save(target)

… and we are done. Assuming you downloaded the two images in the same place as the script, you can then run:

let root = __SOURCE_DIRECTORY__

let velasquez = Path.Combine(root,"velasquez.bmp")
let picasso = Path.Combine(root,"picasso.bmp")

let picasquez = Path.Combine(root,"picasquez.bmp")
let velasso = Path.Combine(root,"velasso.bmp")

(velasquez,picasso) |> combine velasso
(picasso,velasquez) |> combine picasquez

… which should create two images like these:

Not bad for 20 lines of code. Now you might argue that this isn’t the nicest, most functional code ever, and you would be right. There are a lot of things that could be done to improve that code; for instance, handling pictures of different sizes, or injecting an arbitrary Color sorting function – feel free to have fun with it!

Also, you might wonder why I picked that specific, and somewhat odd, sorting function. Truth be told, it happened by accident. In my first attempt, I simply summed the 3 colors, and the results were pretty bad. The reason for it is, Red, Green and Blue are encoded as bytes, and summing up 3 bytes doesn’t necessarily do what you would expect. Rather than, say, convert everything to int, I went the lazy route again…
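For the curious, here is a small sketch of what goes wrong with the sum. The function names are made up, and the pixels are plain byte values rather than System.Drawing colors: with the default operators, byte arithmetic silently wraps around at 256.

```fsharp
// Summing bytes with the default (unchecked) operators wraps around at 256.
let naiveSum (r: byte) (g: byte) (b: byte) = r + g + b

// Converting to int first gives the sum one would expect.
let intSum (r: byte) (g: byte) (b: byte) = int r + int g + int b

// The "lazy" alternative used in the script: the largest of the three channels.
let maxIntensity (r: byte) (g: byte) (b: byte) = max r (max g b)

let bright = naiveSum 200uy 100uy 0uy   // 300 wraps around to 44
let dark = naiveSum 100uy 100uy 0uy     // 200 fits in a byte
```

With the naive sum, the brighter pixel (44) ends up ranked below the darker one (200), which scrambles the ordering; `intSum` and `maxIntensity` both avoid the overflow.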