Mathias Brandewinder on .NET, F#, VSTO and Excel development, and quantitative analysis / machine learning.
by Mathias Brandewinder 20. December 2014 01:55

This post is the December 20th installment of the English F# Advent #fsAdvent series; make sure to also check out the Japanese series, which also packs the awesome!

As I was going around Paris the other day, I ended up in the Concorde metro station. Instead of the standard issue white tiles, this station is decorated with the French constitution, rendered as a mosaic of letters.

[Source: Wikipedia]

My mind started wandering, and by some weird association, it reminded me of Calligrammes, a collection of poems by Guillaume Apollinaire, where words are arranged on the page to depict images that complement the text itself.

[Guillaume Apollinaire, Calligramme: La Mandoline, l'œillet et le bambou]

[Source: Wikipedia]

After some more drifting, I started wondering if I could use this as an inspiration for some playful F# fun. How about taking a piece of text, an image, and fusing them into one?

There are many ways one could approach this; being rather lazy, I thought a reasonably simple direction would be to decompose the original image into dark and light blocks, and fill the dark blocks with the desired text. As simple as it may sound, the task is not entirely trivial. First, we need to decide on an appropriate threshold to separate "dark" and "light" areas of the image, to get a contrast good enough that the image is still recognizable when rendered in black & white. Then, we also have to resize the image into a new grid, where the characters from the original text fit the mapped dark area as closely as possible.

Note: I don't think the warning "don't put this in production, kids" is useful, unless someone thinks there is a market for Calligramme as a Service. However, I'll say this: this is me on "vacation hacking fun" mode, so yes, there are quite probably flaws in that code. I put it up as a Gist here - flame away, or tell me how to make it better on Twitter ;)

Separating dark and light

So how could we go about splitting an image into dark and light pixels? First, we can use Color.GetBrightness from the System.Drawing namespace to determine how light the color of an individual pixel is:

open System.Drawing

let brightness (c:Color) = c.GetBrightness ()

let pixels (bmp:Bitmap) =
    seq { for x in 0 .. bmp.Width - 1 do
            for y in 0 .. bmp.Height - 1 ->
                (x,y) |> bmp.GetPixel }

We still need to decide where to draw the boundary between dark and light. What we want in the end is an image that is reasonably balanced, that is, neither overwhelmingly dark nor overwhelmingly light. A simple way to enforce that is to arbitrarily require that at least one third of the pixels end up dark, and at least one third light. Within the remaining middle third, we then want a boundary value that is as clear-cut as possible, for instance one with a large brightness gap around it. Let's do that:

let breakpoint (bmp:Bitmap) =
    let pixelsCount = bmp.Width * bmp.Height
    let oneThird = pixelsCount / 3
    let pixs = pixels bmp
    let threshold = 
        pixs
        |> Seq.map brightness
        |> Seq.sort
        |> Seq.pairwise
        |> Seq.skip oneThird
        |> Seq.take oneThird
        |> Seq.maxBy (fun (b0,b1) -> b1 - b0)
        |> snd
    let darkPixels = 
        pixs 
        |> Seq.map brightness
        |> Seq.filter ((>) threshold)
        |> Seq.length
    (threshold,darkPixels)

We iterate over every pixel, sort them by brightness, retain only the middle third, and look for the largest brightness increase between two consecutive values. Done - breakpoint returns both the threshold value (the brightness level which decides whether a pixel will be classified as dark or light), and how many pixels will be marked as dark.

Resizing

Now that we have a boundary value, and know how many pixels will be marked as dark, we need to determine the size of the grid where our text will be mapped. Ignoring for a moment rounding issues, let's figure out a reasonable size for our grid.

First, how many characters do we need? We know the number of dark pixels in the original image - and in our target image, we want the text to cover the same proportion of the grid, so the total number of cells in the final image should be roughly total chars ~ text length * (width * height) / dark pixels.

Then, what should the width of the target image be, in characters? First, we want [1] target width * target height ~ total chars. Then, ideally, the proportions of the target image should be similar to the original image, so target width / target height ~ width / height, which gives us target height ~ target width * height / width. Substituting into [1] gives us target width * target width * height / width ~ total chars, which simplifies to target width ~ sqrt (total chars * width / height).
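To make that concrete with made-up numbers: for a 200 x 100 pixel image where 8,000 of the 20,000 pixels are dark, and a 200-character text, total chars ~ 200 * 20,000 / 8,000 = 500, so target width ~ sqrt (500 * 200 / 100) ~ 31, and target height ~ 31 * 100 / 200 ~ 15. That gives a grid of roughly 465 cells for 500 characters, which is close enough, given all the rounding we are about to ignore.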

Translating this to code, conveniently ignoring all the rounding issues, we get:

let sizeFor (bmp:Bitmap) (text:string) darkPixels =
    let width,height = bmp.Width, bmp.Height
    let pixels = width * height
    let textLength = text.Length
    let chars = textLength * pixels / darkPixels 
    let w = (chars * width / height) |> float |> sqrt |> int
    let h = (w * height) / width
    (w,h)

Rendering

Good, now we are about ready to get down to business. We have an original image, a threshold to determine which pixels to consider dark or light, and a "target grid" of known width and height. What we need now is to map every cell of our final grid to the original image, decide whether it should be dark or light, and if dark, write a character from our text.

Ugh. More approximation ahead. At this point, there is no chance that the cells from our target grid map one to one to the pixels from the original image. What should we do? This is my Christmas vacation, a time of rest and peace, so what we will do is be lazy again. For each cell in the target grid, we will retrieve the pixels it overlaps on the original image, and simply average out their brightness, not even bothering with a weighted average based on the overlap area. As other lazy people before me nicely put it, "we'll leave that as an exercise for the reader".

Anyways, here is the result: a mapping function that returns the coordinates of the pixels intersected by a cell, a reducer that averages the aforementioned pixels by brightness, and a somewhat unnecessary function that transforms the original image into a 2D array of booleans, marking where a letter should go (I mainly created it because I had a hard time keeping track of rows and columns):

let mappedPixels (bmp:Bitmap) (width,height) (x,y) =

    let wScale = float bmp.Width / float width
    let hScale = float bmp.Height / float height

    let loCol = int (wScale * float x)
    let hiCol = 
        int (wScale * float (x + 1)) - 1
        |> min (bmp.Width - 1)
    let loRow = int (hScale * float y)
    let hiRow = 
        int (hScale * float (y + 1)) - 1
        |> min (bmp.Height - 1)

    seq { for col in loCol .. hiCol do
            for row in loRow .. hiRow -> (col,row) }

let reducer (img:Bitmap) pixs =
    pixs 
    |> Seq.map img.GetPixel
    |> Seq.averageBy brightness

let simplified (bmp:Bitmap) (width,height) threshold =

    let map = mappedPixels bmp (width,height)
    let reduce = reducer bmp
    let isDark value = value < threshold

    let hasLetter = map >> reduce >> isDark

    Array2D.init width height (fun col row ->
        (col,row) |> hasLetter)

Almost there - wrap this with 2 functions: applyTo, which transforms the text into a sequence of characters laid out on the grid, and rebuild, which recreates the final string:

let applyTo (bmp:Bitmap) (width,height) threshold (text:string) =
    
    let chars = text |> Seq.toList
    let image = simplified bmp (width,height) threshold
    
    let nextPosition (col,row) =
        match (col < width - 1) with 
        | true -> (col+1,row) 
        | false -> (0,row+1)
    
    (chars,(0,0)) 
    |> Seq.unfold (fun (cs,(col,row)) ->
        let next = nextPosition (col,row)
        match cs with
        | [] -> Some(' ',(cs,next))
        | c::tail ->
            if image.[col,row]
            then 
                Some(c,(tail,next))
            else Some(' ',(cs,next)))

let rebuild (width,height) (data:char seq) =
    seq { for row in 0 .. height - 1 -> 
            data 
            |> Seq.map string
            |> Seq.skip (row * width)
            |> Seq.take width
            |> Seq.toArray  
            |> (String.concat "") }
    |> (String.concat "\n")

Trying it out

Let's test this out, using the F# Software Foundation logo as an image, and the following text, from fsharp.org, as a filler:

F# is a mature, open source, cross-platform, functional-first programming language. It empowers users and organizations to tackle complex computing problems with simple, maintainable and robust code.

Run this through the grinder…

let path = @"c:/users/mathias/pictures/fsharp-logo.jpg"
let bmp = new Bitmap(path)

let text = """F# is // snipped // """

let threshold,darkPixels = breakpoint bmp
let width,height = sizeFor bmp text darkPixels

text
|> applyTo bmp (width,height) threshold    
|> rebuild (width,height)

… and we get the following:

                        
           F#           
           is           
         a matu         
        re, open        
        source, c       
      ross-platfor      
     m, fun  ctiona     
    l-firs t   progr    
   amming  l   anguag   
  e. It  emp    owers   
 users  and      organi 
 zation s to     tackle 
   compl ex    computi  
   ng pro bl  ems wit   
    h simp l e, main    
     tainab le and      
      robust code.      

Not too bad! The general shape of the logo is fairly recognizable, with some happy accidents, like for instance isolating "fun" in "functional". However, quite a bit of space has been left empty, most likely because of the multiple approximations we did along the way.

Improved resizing

Let's face it, I do have some obsessive-compulsive behaviors. As lazy as I feel during this holiday break, I can't let go of this sloppy sizing issue. We can't guarantee a perfect fit (there might simply not be one), but maybe we can do a bit better than our initial sizing guess. Let's write a mini solver, a recursive function that will iteratively attempt to improve the fit.

Given a current size and count of dark cells, if the text is too long to fit, the solver will simply expand the target grid size, adding one row or one column, picking whichever keeps the grid's horizontal/vertical proportions closest to the original image. If the text fits better in the new solution, keep searching; otherwise, done (similarly, the size is reduced if the text is too short to fit).

For the sake of brevity, I won't include the solver code here in the post. If you are interested, you can find it in the gist here.
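That said, to give an idea of what such a solver could look like, here is a rough sketch, written for this post rather than lifted from the gist: it is a greedy simplification of the approach described above, which simply keeps moving to whichever neighboring grid size fits the text length best.

let darkCells (bmp:Bitmap) (width,height) threshold =
    simplified bmp (width,height) threshold
    |> Seq.cast<bool>
    |> Seq.filter id
    |> Seq.length

let improveSize (bmp:Bitmap) (text:string) threshold (width,height) =
    // distance between the number of dark cells and the text length
    let fit (w,h) = abs (darkCells bmp (w,h) threshold - text.Length)
    // candidate sizes: grow or shrink by one row or one column
    let candidates (w,h) = [ (w+1,h); (w,h+1); (max 1 (w-1),h); (w,max 1 (h-1)) ]
    let rec search current =
        let best = candidates current |> List.minBy fit
        if fit best < fit current then search best else current
    search (width,height)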

Below are the results of the original and shiny new code, which I ran on a slightly longer bit of text.

Before:

                                  
               F#is               
              amatur              
             eopenso              
            urcecross             
           -platformfun           
          ctional-firstp          
         rogramminglangua         
        geItempowersusersa        
       ndorganiz  ationstot       
      acklecomp    lexcomput      
     ingproble ms   withsimpl     
    emaintain abl    eandrobus    
   tcodeF#ru nson     LinuxMacO   
  SXAndroid iOSWi      ndowsGPUs  
 andbrowse rsItis       freetouse 
  andisope nsourc       eunderanO 
  SI-approv edlic      enseF#isu  
   sedinawid eran     geofappli   
     cationar eas   andissuppo    
      rtedbybo th   anactive      
      opencommu    nityandin      
       dustry-le  adingcomp       
         aniesprovidingpr         
          ofessionaltool          
          s                       

… and after:

               F #               
              is am              
             atu reo             
            pens ourc            
           ecros s-pla           
          tformf unctio          
         nal-fir stprogr         
        ammingla nguageIt        
       empowers   usersand       
      organiza t   ionstota      
     cklecomp le    xcomputi     
    ngproble msw     ithsimpl    
   emaintai nabl      eandrobu   
  stcodeF# runso       nLinuxMac 
 OSXAndro  idiOS       WindowsGP 
  Usandbro wsers      Itisfreet  
   ouseandi sope     nsourceun   
    deranOSI -ap    provedlic    
     enseF#is us   edinawide     
      rangeofa p  plication      
       areasand  issupport       
        edbyboth anactive        
         opencom munitya         
          ndindu stry-l          
           eadin gcomp           
            anie spro            
             vid ing             
              pr of              
               e s               

We still have a small mismatch, but the fit is much better.

And this concludes our F# Advent post! This was a rather useless exercise, but then, the holidays are about fun rather than productivity. I had fun doing this, and hope you had some fun reading it. In the meanwhile, I wish you all a holiday period full of fun and happiness, and… see you in 2015! And, as always, you can ping me on twitter if you have comments or questions. Cheers!

by Admin 26. October 2014 17:01

From time to time, I get absorbed by questions for no clear reason. This is one of these times – you have been warned.

So here is the question: can I use a logistic map to encode an arbitrary list of 1s and 0s into a single float, and generate back the series by applying the logistic map? I don’t think there is a clear theoretical or practical interest in this question, but for some reason I couldn’t shake it off, and had to do it.

Just to clarify a bit what I have in mind, here is the expression for the logistic map:

x(n+1) = alpha * x(n) * (1-x(n))

This is a recurrence relation, and has been well studied, because it illustrates very nicely some important ideas in chaos theory. In particular, for x0 in ] 0.0; 1.0 [ and values of alpha between 0 and 4, the series will remain in the interval ] 0.0; 1.0 [, and for certain values of alpha, 4.0 for instance, the series will exhibit chaotic behavior.

[Image: the logistic map]
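As a quick illustration (a snippet of my own, not from the original post), take two nearby starting points and iterate with alpha = 4.0: the two trajectories agree at first, then diverge completely after a handful of iterations.

let logistic alpha x = alpha * x * (1. - x)

let series x0 = x0 |> Seq.unfold (fun x -> Some (x, logistic 4.0 x))

// compare the two lists element by element to watch them drift apart
let a = series 0.123456 |> Seq.take 20 |> Seq.toList
let b = series 0.123457 |> Seq.take 20 |> Seq.toList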

Anyways – so here is what I have in mind. If I gave you an arbitrary float in the unit interval, I could “decrypt” binary values this way:

let f x = 4. * x * (1. - x)

let decrypt root = 
    root 
    |> Seq.unfold (fun x -> Some(x, f x)) 
    |> Seq.map (fun x -> if x > 0.5 then 1 else 0)

let test = decrypt 0.12345 |> Seq.take 20 |> Seq.toList

Running that example produces the following result:

val test : int list =
  [1; 1; 0; 1; 1; 1; 1; 0; 0; 1; 0; 1; 1; 0; 1; 1; 0; 0; 0; 1; 1; 1; 0; 0; 0;
   1; 1; 0; 1; 0; 0; 0; 1; 0; 1; 1; 0; 0; 0; 1; 0; 0; 0; 1; 0; 1; 1; 1; 1; 1;
   1; 1; 0; 1; 0; 0; 1; 1; 0; 0; 0; 1; 1]

This illustrates how, from a single float value, in this case, 0.12345, I could generate a list of 0s and 1s. The question is, can I generate any sequence? That is, if I gave you (for instance) the following series [ 0; 1; 1; 0; 1 ], could you give me a float that would produce that sequence? And are there sequences that I couldn’t generate by that mechanism?

As it turns out, any sequence is feasible. If I start from the last number in the series (1 in our case), the value that generated it, x4, had to be in [0.5;1.0], because it got rounded up to 1. But then, x4 = f(x3), which implies that 4.0 * x3 * (1.0 – x3) belongs in [0.5;1.0]. I’ll let you work through the math here (it involves solving a second-degree polynomial) – what you should end up with is that there are exactly 2 segments that, when transformed by f, result in [0.5;1.0] (see the diagram below for a more visual explanation, illustrating how to find the two segments that f transforms into a given segment x(n)).

[Diagram: the two segments that f maps into a given interval]

Because we know that the 4th value in our series is a 0, we also know that x3 has to be in [0.0; 0.5], so we can just compute the intersection of f inverse (interval(x4)) and [0.0; 0.5], and repeat the process over and over again, until we have finished covering the sequence and reached x0, and we are left with one interval. Any number we pick in that interval will produce the desired sequence.
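To spell out the second-degree polynomial step: finding the values of x such that f(x) = y means solving 4 * x * (1 - x) = y, that is, 4 * x * x - 4 * x + y = 0, whose two roots are x = (1 - sqrt(1 - y)) / 2 and x = (1 + sqrt(1 - y)) / 2, one on each side of 0.5. These are exactly the two values returned by invf in the code below.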

So how does this look in code? Not awesome, but not too bad. invf is the inverse of f, which returns 2 possible values – and backsolve computes the current interval x has to belong to, given the interval its successor has to be in, and the desired value, a 0 or a 1:

let f x = 4. * x * (1. - x)

let decrypt root = 
    root 
    |> Seq.unfold (fun x -> Some(x, f x)) 
    |> Seq.map (fun x -> if x > 0.5 then 1 else 0)

let empty (lo,hi) = lo >= hi
// two inverses, low value then high value
let invf x = (1.-sqrt(1.-x))/2.,(1.+sqrt(1.-x))/2.

// interval = where f(x) needs to be
// binary = whether x is 0 or 1
let backsolve interval binary =
    let lo,hi = 
        match binary with
        | 0 -> (0.0,0.5)
        | 1 -> (0.5,1.0)
        | _ -> failwith "unexpected"
    
    let lo',hi' = interval
    let constraint2 = 
        let x1,x2 = invf lo'
        let y1,y2 = invf hi'
        (x1,y1),(y2,x2) // this is union
    // compute intersect
    let (lo1,hi1),(lo2,hi2) = constraint2
    let sol1 = (max lo1 lo,min hi1 hi)
    let sol2 = (max lo2 lo,min hi2 hi)
    if empty sol1 then sol2 else sol1 

let solve (xs:int list) = 
    let rec back bins curr =
        match bins with
        | [] -> curr
        | hd::tl -> back tl (backsolve curr hd)
    back (xs |> List.rev) (0.,1.)

Does this work? Let's try it out, by generating a random sequence of 25 0s and 1s, and checking that if we encrypt and decrypt it, we get the initial series back:

let validate xs =
    let l = List.length xs
    let lo,hi = solve xs
    decrypt (0.5*(lo+hi)) |> Seq.take l |> Seq.toList

let rng = System.Random ()
let sample = List.init 25 (fun _ -> rng.Next(2))
sample = validate sample

It does work – up to a limit. If you start expanding the length of the series you are trying to encrypt, at some point you will observe that the encrypted/decrypted version stops matching the original. This should not come as a surprise: we are operating in finite precision here, so there would be something deeply flawed if we managed to encode a potentially infinite amount of information, by simply using a float. However, in the world of math, where infinite precision exists, we could transform any sequence of 0s and 1s, of any length, into a segment in [ 0.0; 1.0]. Pretty useless, but fun.
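Out of curiosity, here is a small probe one could run to locate roughly where the round trip starts breaking (my own addition, not in the original post):

// first series length for which encrypt/decrypt no longer round-trips
let firstFailure =
    Seq.initInfinite (fun i -> i + 1)
    |> Seq.find (fun n ->
        let sample = List.init n (fun _ -> rng.Next(2))
        validate sample <> sample)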

One thing I started playing with was representing segments better, with a discriminated union. After all, the algorithm can be expressed entirely as a sequence of interval unions and intersections – I'll leave that to the reader as a fun F# modeling problem!
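For what it's worth, here is one bare-bones way such a type could look, just as a starting point (a sketch of mine, not necessarily the representation hinted at above):

type Interval =
    | Empty
    | Segment of float * float // lower bound, upper bound

let intersect i1 i2 =
    match i1, i2 with
    | Empty, _ | _, Empty -> Empty
    | Segment (lo1, hi1), Segment (lo2, hi2) ->
        let lo, hi = max lo1 lo2, min hi1 hi2
        if lo >= hi then Empty else Segment (lo, hi)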

If you have questions, or even, who knows, find the problem interesting, ping me on Twitter!

by Admin 12. October 2014 06:46

Well, last year's F# tour was so much fun, I figured I would try to do it again, in Europe this time. I am becoming quite fond of F# tourism: after all, what better way to discover a place than going there, meeting locals who happen to have at least one common interest, and spreading the F# love in the process?

Anyways, if everything goes according to plan, I should be visiting F# communities in 7 different countries in 6 weeks :) As an aside, if you are running a meetup/user group that is somewhat on my way, have a couch I can crash on, and would like me to stop by, ping me on twitter. I can't make promises (obviously the schedule is a bit tight already), but if I can, I will.

One thing I find pretty exciting is that all of a sudden, conferences are starting to have very nice F# and functional programming offerings. In particular, huge props to:

  • Build Stuff in Vilnius: the conference last year was fantastic, fun, diverse, stimulating, and just a great atmosphere. And the F#/functional lineup this year is awesome. Trust me, if you can go – just go, you won’t regret it.
  • NDC London: when you see a major conference like NDC putting together one entire track solely dedicated to functional programming, you know something is happening. I am really stoked – at that point, I don’t see why I would attend conferences without a solid functional track. My daily work is primarily functional, and in my (biased) opinion, functional is where a lot of the innovation is happening lately. So… thanks to NDC for bridging the gap, and putting together a program that I can enjoy!

At any rate, here is the current plan – stay tuned for updates and more details, and hope to see you somewhere along the way!

Nov 3 & 4: Aarhus, Denmark

Nov 6 & 7: London, UK

Nov 8: London, UK

Nov 10: Dublin, Ireland

Nov 11: Munich, Germany

Nov 12: Zurich, Switzerland

Nov 17: Paris, France

Nov 19-23: Vilnius, Lithuania

Nov 24: Lodz, Poland

  • Details TBA

Nov 25 & 26: Berlin, Germany

Nov 27: Frankfurt, Germany

  • Frankfurt .NET group: TBA

Dec 1 – 5: London, UK

Dec 8 & 9: Oslo, Norway

by Admin 21. September 2014 11:40

If you have ever come across my blog before, it will probably come as no surprise if I tell you that I enjoy coding with F# tremendously. However, there is another reason why I enjoy F#, and that is the Community aspect. One thing we have been trying to do in San Francisco is to build a group that is inclusive, and focused on learning together.

This is why we started the coding dojos a while back: one of our members mentioned that while he was convinced from talks that F# was a good language, presentations were not quite enough to help him get over the hump and feel comfortable coding, so we started sessions completely focused on writing code in groups to solve fun problems. This has been an amazingly fun experience.

During a discussion with my sister last year, we ended up talking about gender inequality, a topic that is also dear to my heart – and, in her great wisdom, she made the following remark: 6:00 PM is possibly the worst time you could pick for a meeting if you are a mom. In hindsight, this is totally obvious; it also goes to show that everyone has blind spots. For that matter, it applies more broadly: choosing to go coding after work instead of going back home is not feasible for everyone. So I thought, why not try meetings in completely different time slots?

At the same time, I came across the Alt.NET Paris group (which is pretty awesome); one thing they do is run Coding Breakfasts, which they expanded into Coding Mojitos, and Coding Candies. I really liked the idea, and adapted it a bit for F# Coding Breakfast.

Here is the format we have been following so far:

  • If you want people to carve out a bit of time in a working day, respecting their time is crucial. So the format is strict: start on time, code in pairs for 45 minutes, show-and-tell for 15 minutes, and then, off you go!
  • In order to be able to code something from scratch in 45 minutes, the problem needs to be reasonably small, and accessible for beginners. We initially worked on some of the 99 OCaml problems, and lately settled on Project Rosalind, which people seemed to find more interesting.
  • Some of the early feedback I got was that knowing the problems in advance would help, especially for beginners – so every time, we pick and announce two problems ahead of the session. If people want to work on other stuff, that's fine, too :). As an illustration, here is what the current prototypical event invite looks like.
  • One of the nice aspects is that the logistics requirements are virtually zero. Essentially all you need is a couple of tables, and ideally some wifi. In San Francisco, we have been meeting in a bakery. People show up around 8:15 in the morning, grab coffee and pastries, and start coding. No projector, no speaker – just open your laptop and go.
  • While the equipment of the venue is not that important, location matters. If you want to reach people before they go to work, it makes sense to find a place that is close to offices. In San Francisco, we are meeting downtown, close to public transportation.
  • I shamelessly borrowed another idea, this time from the NashFP group. They have a GitHub organization repository, which makes it possible for everyone to share their code, see what others have been doing, and potentially reuse bits of code.

So far, we have had 4 breakfasts in San Francisco, and the response has been very positive. It’s usually a smaller crowd than the evenings, but different people show up, and it has a different energy than evening sessions. Minds are still fresh (well, most minds – I have a hard time booting my brain before 9 AM), there is light outside...

The next step in San Francisco is to try out different time slots. After all, mornings are also not convenient for all, so this week, we will have our first F# Coding Lunch, hosted at Terrace Software (thanks Clayton!). Same general idea, but, you guessed it, 12:00 to 1:00. We’ll see how that goes!

So if you are considering starting or developing an F# community in your area, I encourage you to try that out! It is tremendously easier to set up than an evening presentation (you don't really need a venue or a speaker), it has potential to be owned or replicated by multiple people (my dream is to see regular F# breakfasts everywhere in the Bay Area), and I suspect it would make a great way to introduce F# in a company as well…

If you have questions or comments, ping me on Twitter – I’d love to hear your thoughts or ideas!

by Admin 13. September 2014 17:18

Let's face it, @fsibot in its initial release came with a couple of flaws, er, undocumented features. One aspect that was particularly annoying was the mild Tourette's syndrome that affected the bot: on a fairly regular basis, it would pick up the same message, and send the same answer over and over again to the brave soul who tried to engage in a constructive discussion.

I wasn’t too happy about that (nobody likes spam), and, being all about the enterprise and stuff, I thought it was time to inject a couple more buzzwords. In this post, I’ll briefly discuss how I ended up using the Azure Service Bus to address the problem, with a sprinkle of Azure Storage for good measure, and ended up liking it quite a bit.

So what was the problem?

The issue came from a combination of factors. Fundamentally, @fsibot does two things: it pulls recent mentions from Twitter on a regular basis, and passes them to the F# Compiler Services to produce a message with the result of the evaluation.

Mentions are pulled via the Twitter API, which offers two options: grab the latest 20, or grab all mentions since a given message ID. If you have no persistent storage, this implies that when the service starts, you pull the 20 most recent ones, and once you have retrieved some messages, you can start pulling only from the last one seen.

This is a great strategy, if your service runs like a champ and never goes down (it's also very easy to implement – a coincidence, surely). Things start to look much less appealing when the service does go down. In that scenario, the service reboots, and starts re-processing the 20 most recent mentions. In a scenario where, say, a couple of enthusiastic F# community members decide to thoroughly test the bot's utter lack of security, and send messages that cause it to have fits and go down in flames multiple times in a short time span, this becomes a real problem.

So what can we do to fix this?

A first obvious problem is that a failure in one part of the service should not bring down the entire house. Running unsafe code in the F# Compiler Service should not impact the retrieval of Twitter mentions. To decouple the two, we can separate them into two services, connected by a queue. This is much better: if the compiler service fails, messages keep being read and pushed to the queue, and when it comes back online, they can be processed as if nothing had happened. At that point, the only things that can disrupt the retrieval of mentions are either a problem in that code specifically, or a reboot of the machine itself.

So how did I go about implementing that? The laziest way possible, of course. In this case, I used an Azure Service Bus queue. I won't go into all the details of using the Service Bus; this tutorial does a pretty good job of covering the basic scenario, from creating a queue to sending and receiving messages. I really liked how it ended up looking from F#, though. In the first service, which reads recent mentions from Twitter, the code simply looks like this:

let queueMention (status:Status) =
    let msg = new BrokeredMessage ()
    msg.MessageId <- status.StatusID |> string
    msg.Properties.["StatusID"] <- status.StatusID
    msg.Properties.["Text"] <- status.Text
    msg.Properties.["Author"] <- status.User.ScreenNameResponse
    mentionsQueue.Send msg

From the Status (a LinqToTwitter class) I retrieve, I extract the 3 fields I care about, create a BrokeredMessage (the class used to communicate via the Azure Service Bus), add key-value pairs to Properties, and send it to the queue.
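The mentionsQueue itself, which doesn't appear in the snippet, is just a QueueClient; creating it is essentially a one-liner. As a sketch (with a placeholder connection string and queue name, using the classic Microsoft.ServiceBus SDK):

open Microsoft.ServiceBus.Messaging

let connectionString = "Endpoint=sb://..." // placeholder
let mentionsQueue = 
    QueueClient.CreateFromConnectionString (connectionString, "mentions")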

On the processing side, this is the code I got:

let (|Mention|_|) (msg:BrokeredMessage) =
    match msg with
    | null -> None
    | msg ->
        try
            let statusId = msg.Properties.["StatusID"] |> Convert.ToUInt64
            let text = msg.Properties.["Text"] |> string
            let user = msg.Properties.["Author"] |> string
            Some { StatusId = statusId; Body = text; User = user; }
        with 
        | _ -> None

let rec pullMentions( ) =
    let mention = mentionsQueue.Receive ()
    match mention with
    | Mention tweet -> 
        tweet.Body
        |> processMention
        |> composeResponse tweet
        |> respond
        mention.Complete ()
    | _ -> ()

    Thread.Sleep pingInterval
    pullMentions ()

I declare a partial Active Pattern (the (|Mention|_|) "banana clip" bit), which allows me to use pattern matching against a BrokeredMessage, a class which by itself knows nothing about F# and discriminated unions. That piece of code itself is not beautiful (it's just a try-catch block, trying to extract data from the BrokeredMessage into my own Record type), but the part I really like is the pullMentions () function: I can now directly grab messages from the queue, match against a Mention, and here we go, a nice and clean pipeline all the way through.

So now that the two services are decoupled, one has a fighting chance to survive when the other goes down. However, it is still possible for the Twitter reads to fail, too, and in that case we will still get mentions that get processed multiple times.

One obvious way to resolve this is to actually persist the last ID seen somewhere, so that when the Service starts, it can read that ID and restart from there. This is what I ended up doing, storing that ID in a blob (probably the smallest blob in all of Azure); the code to write and read that ID to a blob is pretty simple, and probably doesn’t warrant much comment:

let updateLastID (ID:uint64) =
    let lastmention = container.GetBlockBlobReference blobName
    ID |> string |> lastmention.UploadText

let readLastID () =
    let lastmention = container.GetBlockBlobReference blobName
    if lastmention.Exists ()
    then 
        lastmention.DownloadText () 
        |> System.Convert.ToUInt64
        |> Some
    else None

However, even before doing this, I went an even lazier road. One of the niceties about using the Service Bus is that the queue behavior is configurable in multiple ways. One of the properties available (thanks @petarvucetin for pointing it out!) is Duplicate Detection. As the name cleverly suggests, it allows you to specify a time window during which the Queue will detect and discard duplicate BrokeredMessages, a duplicate being defined as “a message with the same MessageID”.

So I simply set a window of 24 hours for Duplicate Detection, and made the BrokeredMessage.MessageID equal to the tweet's Status ID. If the Queue sees a message, and the same message shows up within 24 hours, no repeat processing. Nice!
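Duplicate Detection has to be switched on when the queue is created. As a sketch (placeholder names, classic Microsoft.ServiceBus SDK, my own code rather than the bot's), the setup could look like this:

open System
open Microsoft.ServiceBus
open Microsoft.ServiceBus.Messaging

let createMentionsQueue connectionString queueName =
    let ns = NamespaceManager.CreateFromConnectionString connectionString
    if not (ns.QueueExists queueName) then
        let description = QueueDescription queueName
        // messages with an already-seen MessageId are discarded for 24 hours
        description.RequiresDuplicateDetection <- true
        description.DuplicateDetectionHistoryTimeWindow <- TimeSpan.FromHours 24.
        ns.CreateQueue description |> ignore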

Why did I add the blob then, you might ask? Well, Duplicate Detection takes care of most problem cases, but not all of them. Imagine that a mention comes in, fewer than 20 mentions arrive over the following 24 hours, and then the service crashes – in that case, the message WILL get re-processed, because the Duplicate Detection window has expired. I could have increased the window to more than a day, but that already smelled like a rather hacky way to solve the problem, so I just added the blob, and called it a day.

So what's the point here? Nothing earth-shattering, really – I just wanted to share my experience using some of the options Azure offers, in the context of solving simple but real problems on @fsibot. I got two things out of it. First, Azure Service Bus and Azure Storage were way easier to use than I expected. Reading the tutorials took me about half an hour, implementing the code took another half an hour, and it just worked. Then (and I will readily acknowledge some F# bias here), my feeling is that Azure and F# just play very nicely together. In this particular case, I find that Active Patterns provide a very clean way to parse out BrokeredMessages, extracting code which can then simply be plugged in with a pattern match and, combined with classic pipelines, ends up creating very readable workflows.
