Mathias Brandewinder on .NET, F#, VSTO and Excel development, and quantitative analysis / machine learning.
by Mathias 1. September 2013 13:39

I have been back for about a week now, after nearly three weeks on the road, talking about F# all over the US. The first day I woke up in my own bed, my first thought was “where am I again? And where am I speaking tonight?” Now life is slowly getting back to normal, and I thought it would be a good time to share some impressions from the trip.

  • I am very proud to have inaugurated two new F# meetup groups during that trip! The Washington DC F# meetup, organized by @devshorts, is off to a great start: we had a full house at B-Line Medical that evening, with a great crowd mixing F# fans, C# developers, as well as OCaml and Python people. My favorite moment there was with Sam. Sam, a solid C# developer, looked very worried about writing F# code for the first time. Two hours later, he was so proud (and legitimately so) of having a nice classifier working, all in F#, that he couldn’t resist, and presented his code to the entire group. Nice job! Detroit was my final stop on the road, and didn’t disappoint: the Detroit F# meetup was awesome. It was hosted at the Grand Trunk Pub; while the location had minor logistical drawbacks, these were amply compensated by having food and drinks right there, as well as a great crowd. Thanks to @OldDutchCap and @JohnBFair for making this happen, this was a fitting grand finale for this trip!
  • In general, August seems to be the blossoming period for F# meetups – two other groups popped up in the same month, one in Minsk, thanks to the efforts of @lu_a_jalla and @sergey_tihon, and one in Paris, spearheaded by @tjaskula, @robertpi and @thinkb4coding. This is very exciting, and I am looking forward to meeting some F#ers next time I stop back home!
  • A lesson I learnt the hard way is that San Francisco is most definitely not a good benchmark for what to wear in August in the US. My first stops were all in the south – Houston, Nashville, Charlotte and Raleigh, and boy was I not ready for the crazy heat and humidity! On the other hand, I can confirm the rumor, the South knows how to make a guest welcome. For that matter, I am extremely grateful to everyone who hosted me during this trip – you know who you are, thank you for all the help.
  • One surprise during this trip was the general level of interest in F#. I regularly hear nonsense sentences like “F# is a niche language”, so I expected smaller crowds in general .NET groups. Well, apparently someone forgot to tell the .NET developers, because I got pretty solid audiences in these groups as well, with an amazing 100 people showing up in Raleigh. Trinug rocked!
  • In general, I was a bit stressed out by running a hands-on machine learning lab with F# novices; for an experienced F# user, it’s not incredibly complex, but for someone who hasn’t used the language before, it’s a bit of a “here is the deep end of the swimming pool, now go see if you can swim” moment. I was very impressed by how people did in these groups: everyone either finished or ended up very close. Amusingly, in one of the groups, the first person who completed the exercise, in a very short time, was… a DBA, who explained that he immediately went for a set-oriented style. Bingo! The lesson for me is that F# is not complicated, but you have to embrace its flow, and largely forget about C#. One trick which seemed to help was to ask the question “how would you write it if you were using only LINQ?” Otherwise, C# developers often seemed to over-think and build code blocks too large for their own good, whereas F# works best by creating very small and simple functions, and then assembling them into larger workflows.
  • Another fun moment was in Boston, where I ran the Machine Learning dojo at Hack/Reduce, language agnostic (thanks @JonnyBoats for making the introductions!). Pretty much every language under the sun was represented (C#, Java, F#, Scala, Python, Matlab, Octave, R, Clojure, Ruby) – but one of the participants still managed to pull “something special”, and tried to implement a classifier entirely in PostgreSQL. It didn’t quite work out, but hats off nevertheless, that was a valiant experiment!
  • As a Frenchman, I take food seriously. As a scientist, I want to see the data. Therefore, I was very excited to have the opportunity to investigate first hand whether North Carolina-style BBQ is indeed a heresy. I got the chance to try out BBQ in Houston and Raleigh, and I have to give it to Texas, hands down.

Texas-BBQ

  • Lesson learnt the hard way: do not ever depend on the internet for a presentation. Some of my material was in a Gist on GitHub, and a couple of hours before a presentation, I realized that GitHub was under a DoS attack. Not happy times.
  • I am more and more of a fan of the hands-on, write code in groups format. It has its limitations – you can’t really do it with a very large crowd, and it requires more time than a traditional talk – but it’s a very different experience. One thing I really enjoyed when starting with F# was its interactivity; the “write code and see what happens” experience rekindled the joy of coding for me. The hands-on format captures some of that “happy hacking” spirit, and gets people really engaged. Once someone starts writing code, they own it – and working in groups is a great way to accelerate the learning process, and build a community.

Great afternoon with @phillyaltnet crowd hacking at #kaggle machine learning dataset with #fsharp

Machine learning and lots of fun with #fsharp @trinug tonight, you guys rocked!

  • I have been complacent with the story “it works on environments other than Windows/Visual Studio”. It does, but the best moment to figure out how to make it work exactly is not during a group coding exercise. In these situations, fsharp.org is your friend – and since I came back, I started actually trying all that out, because “I heard it should work” is just not good enough.
  • I saw probably somewhere between 500 and 1,000 developers during this trip, and while this was completely exhausting, I don’t regret any of it. One of the high points of the whole experience was to just get some time to hang out with old or new friends from the F#/functional community – @panesofglass in Houston, @bryan_hunter and the FireFly Logic & @NashFP crew in Nashville, @rickasaurus, @tomaspetricek, @pblasucci, @mitekm and @hmansell in New York City, and @plepilov, @kbattocchi and @talbott in Boston (sorry if I forgot anyone!). If this trip taught me one thing, it’s that there is actually a lot of interest in F# in the .NET community, and beyond – but we, the F# community, are very scattered, and from our smaller local groups, it’s often hard to get a sense for that. Having a chance to talk to all of you guys who have been holding the fort and spreading F# around, discussing what we do, what works and what doesn’t, and simply having a good time, was fantastic. We need more of this – I am incredibly invigorated, and very hopeful that 2014 will be a great year for F#!
by Mathias 1. December 2012 13:55

I have been obsessing about the following idea lately – what if I could run a FSI session from within Excel? The motivation behind this is twofold. First, one thing Excel is good at is creating and formatting charts. If I could use F# for data manipulation, and Excel for data visualization, I would be a happy camper. Then, I think F# via FSI could provide an interesting alternative for Excel automation. I’d much rather leverage existing .NET libraries to, say, grab data from the internet, than write some VBA to do that – and the ability to write live code in FSI would be less heavy-handed than VSTO automation, and closer to what people typically do in Excel, that is, explore data. Having the ability to execute F# scripts would be, at least for me, very useful.

Seeing Tim Robinson’s awesome job with FsNotebook.net kicked me out of procrastination. Even though FsNotebook is still in early development, it provides a very nice user experience – on the web. If something that nice can be done on the web, it should be feasible on a local machine.

As an aside, Tim is looking for feedback and input on FsNotebook – go try it out, it’s really fun!

Anyways – this is the grand plan, now we need to start with baby steps. If I want to embed FSI in Excel (presumably via a VSTO add-in), I need a way to talk to FSI from .NET, so that I can create a Session and send arbitrary strings of code to be evaluated.

As usual, StackOverflow provided two good starting points (this answer, and this answer) – so I set out to look into the Process class, which I didn’t know much about, and attempted to spawn a FSI.EXE process, redirecting input and output. Turns out it’s not overly complicated – here are the 34 lines of code I ended up with so far (see it on GitHub):

namespace ClearLines.FsiRunner

open System.Diagnostics

type public FsiSession(fsiPath: string) =
    let info = new ProcessStartInfo()
    let fsiProcess = new Process()

    do
        info.RedirectStandardInput <- true
        info.RedirectStandardOutput <- true
        info.RedirectStandardError <- true // without this, ErrorReceived never fires
        info.UseShellExecute <- false      // required to redirect the streams
        info.CreateNoWindow <- true
        info.FileName <- fsiPath
        fsiProcess.StartInfo <- info

    [<CLIEvent>]
    member this.OutputReceived = fsiProcess.OutputDataReceived

    [<CLIEvent>]
    member this.ErrorReceived = fsiProcess.ErrorDataReceived

    member this.Start() =
        fsiProcess.Start() |> ignore       // Start returns a bool we do not need
        fsiProcess.BeginOutputReadLine()
        fsiProcess.BeginErrorReadLine()

    member this.AddLine(line: string) =
        fsiProcess.StandardInput.WriteLine(line)

    member this.Evaluate() =
        this.AddLine(";;")
        fsiProcess.StandardInput.Flush()

This is a fairly straightforward class. The constructor expects the path to FSI.EXE, and the do block configures the process to run headless, with its input and output streams redirected. Start() starts the process and begins reading the FSI output asynchronously. AddLine(line) writes an arbitrary line of F# code to the FSI input, and Evaluate() sends the ;; terminator, so that FSI evaluates all the lines written since the last evaluation, and flushes the input stream. The two events OutputReceived and ErrorReceived let the client listen to the FSI results.
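Since the end goal is to call this from an add-in, here is a rough sketch of what a C# client of the class could look like. It assumes a reference to the assembly containing ClearLines.FsiRunner.FsiSession; FsiSessionDemo and Run are names made up for the illustration, and the path to FSI.EXE has to be supplied by the caller, because it depends on the installation.

```csharp
using System;
using ClearLines.FsiRunner;

// Hypothetical client: not part of the FsiRunner project
public static class FsiSessionDemo
{
   // fsiPath: full path to FSI.EXE on this machine (assumption: supplied by the caller)
   public static void Run(string fsiPath)
   {
      var session = new FsiSession(fsiPath);

      // Each line FSI writes to its output raises the event;
      // e.Data is null once the output stream has been closed
      session.OutputReceived += (sender, e) =>
      {
         if (e.Data != null)
         {
            Console.WriteLine("FSI> " + e.Data);
         }
      };

      session.Start();

      // Lines are only evaluated once Evaluate() sends the ;; terminator
      session.AddLine("let x = 21");
      session.AddLine("x * 2");
      session.Evaluate();

      // Results arrive asynchronously: wait before shutting down
      Console.ReadLine();

      // #quit is the FSI directive that ends the interactive session
      session.AddLine("#quit");
      session.Evaluate();
   }
}
```

The client never blocks waiting for a result: it pushes code in, and whatever FSI prints comes back through the event, which is what an interactive window inside Excel would need.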


by Mathias 12. September 2011 14:39

Just saw that version 3.1 of Microsoft Solver Foundation has been released. I haven’t had time to try it out yet, but the list of improvements looks promising. A better nonlinear solver, better MIP and MIQP – sounds like Santa has come early! And I am curious about the faster bindings with LINQ…

by Mathias 24. July 2011 11:17

In our last post, we explored how the Microsoft Solver Foundation can be used to solve simple maximization/minimization problems from C#. The problem we looked at is the following: given a set of products, each with a unit cost, a reselling price, and a weight, how can we maximize profit if we have only a limited budget and a limited weight capacity available?

Expressing and solving the problem for a particular set of inputs was rather easy. However, the example we presented was very static: the decisions, the goal and the constraints were completely hard-coded.

The real value I found in the Microsoft Solver Foundation is that it can be completely integrated in your .NET code, working with strongly-typed objects. Today, we will revisit the same example we presented previously, but our goal will be to make the optimization program “generic”, so that we can solve the same prototypical problem, given any set of inputs.

At a high level, what we are looking for is a class which, given a collection of Products, a Budget and a Capacity, returns a “recommended” purchase quantity for each product, maximizing our profit:

public class Profit
{
   public static IDictionary<Product, int> Maximize(
      IEnumerable<Product> products, 
      double budget, 
      double capacity)
   {
      // do stuff here
   }
}

Let’s first define what a Product is:

public class Product
{
   public Product(string name, double cost, double price, double weight)
   {
      this.Name = name;
      this.Cost = cost;
      this.Price = price;
      this.Weight = weight;
   }

   public string Name { get; private set; }
   public double Cost { get; private set; }
   public double Price { get; private set; }
   public double Weight { get; private set; }
   public double Margin
   {
      get { return this.Price - this.Cost; }
   }
}


by Mathias 13. March 2011 17:52

I had heard good things about OpenXML, but until now I didn’t have time to give it a try. After attending a rather intimate session on the topic at the MVP Summit, I realized I should look into it. For those of you like me who haven’t kept up with the news, the general idea is that, since the release of Office 2007, Office files are no longer saved as obscure proprietary files: they are essentially zipped xml files. If you rename an Excel file from MyFile.xlsx to MyFile.zip and open it, you will see that it is simply a collection of xml files, describing the various parts of your Workbook and their relationships. This has a few interesting implications, one of them being that you can create or edit an Excel file without using Excel, or even having Excel installed on your machine.
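To see this without renaming anything, the workbook can be opened as a plain zip archive from code. The snippet below is only a sketch: it assumes .NET 4.5 or later (for System.IO.Compression.ZipArchive, which needs a reference to the System.IO.Compression assembly), and PackageContents and ListParts are names I made up for the illustration, not something from the OpenXML SDK.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper, not part of the OpenXML SDK
public static class PackageContents
{
   // Returns the path of every file stored inside the package.
   // The stream is closed when the archive is disposed.
   public static IList<string> ListParts(Stream package)
   {
      using (var archive = new ZipArchive(package, ZipArchiveMode.Read))
      {
         return archive.Entries.Select(entry => entry.FullName).ToList();
      }
   }
}
```

Calling PackageContents.ListParts(File.OpenRead(path)) on a workbook saved by Excel should return the same list of xml files you see after renaming the file to .zip, typically including one for the workbook and one per worksheet.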

The OpenXML SDK is a free library which provides strongly typed .NET classes to manipulate these files without having to deal with raw XML. The classes are LINQ-friendly, which is awesome.

One scenario where this comes in very handy is if you have some form of a .NET application which needs to read input data from an Excel file; another interesting case is a .NET application which needs to produce some Office outputs for the user. Rather than launch an instance of the Office application and use the COM Interop, you can perform all these tasks safely in .NET, without having to worry about cleanly closing the application.

In line with the first scenario, my initial goal was to see if I could read the contents of an Excel Workbook with a console app. Rather than going into lengthy explanations, here is the code I ended up with, which borrows heavily from the samples provided with the SDK:

namespace OpenXmlApp
{
   using System;
   using System.Collections.Generic;
   using System.Linq;
   using DocumentFormat.OpenXml;
   using DocumentFormat.OpenXml.Packaging;
   using DocumentFormat.OpenXml.Spreadsheet;

   public static class Program
   {
      private static void Main(string[] args)
      {
         var filePath = @"C:/Tests/protectedFile.xlsx";
         using (var document = SpreadsheetDocument.Open(filePath, false))
         {
            var workbookPart = document.WorkbookPart;
            var workbook = workbookPart.Workbook;

            // The shared string table is workbook-wide, so read it once;
            // the part is missing when the workbook contains no strings
            var sharedStringPart = workbookPart.SharedStringTablePart;
            var values = sharedStringPart == null
               ? new SharedStringItem[0]
               : sharedStringPart.SharedStringTable.Elements<SharedStringItem>().ToArray();

            var sheets = workbook.Descendants<Sheet>();
            foreach (var sheet in sheets)
            {
               var worksheetPart = (WorksheetPart)workbookPart.GetPartById(sheet.Id);

               var cells = worksheetPart.Worksheet.Descendants<Cell>();
               foreach (var cell in cells)
               {
                  Console.WriteLine(cell.CellReference);
                  // The cell contains a string that is not a formula:
                  // its value is an index into the shared string table
                  if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
                  {
                     var index = int.Parse(cell.CellValue.Text);
                     var value = values[index].InnerText;
                     Console.WriteLine(value);
                  }
                  else if (cell.CellValue != null)
                  {
                     // Other values are stored in the cell itself;
                     // a cell with no stored value has a null CellValue
                     Console.WriteLine(cell.CellValue.Text);
                  }

                  if (cell.CellFormula != null)
                  {
                     Console.WriteLine(cell.CellFormula.Text);
                  }
               }
            }
         }

         Console.ReadLine();
      }
   }
}

A few comments:

  • I am opening the document as read-only, by setting the second argument (isEditable) to false.
  • workbook.Descendants<Sheet>() returns an IEnumerable<Sheet>, which means that you can now query it using LINQ if you please.
  • I am still wrapping my head around the organization of elements. Coming from “classic” Excel, I expect to be able to navigate down directly from a Workbook into its Worksheets; here, the Sheet contained in the Workbook is merely a key which indicates what sheets exist, and what Id to use when requesting them. Navigating between the parts of the file will take a bit of getting used to.
  • I love the fact that you can directly iterate over the Cells of a Worksheet. The cells variable above retrieves only the cells that are actually stored in the file, and nothing more. No need to read cells into 2-d arrays and iterate over all of them.
  • On the other hand, I found the organization of the cells content a bit disorienting at first. Interestingly, cells that contain strings that are not formulas do not store the value in the cell element itself. They are stored in a SharedStringTable, and the cell contains an index, in Cell.CellValue.Text, which indicates which element of that table it contains. This seems to be true only for strings that are not formulas, however: if the cell contains a formula, or some non-string type, then the content is stored in CellValue.Text, and there is no record in SharedStringTable. I am sure this will make sense to me some day.
  • I am interested to see how easy or painful it is to work with Cells addressed by their index (as in, Cells[3,2] ). This is fairly straightforward using the Interop, but from what I have seen so far, I expect it will be a bit more involved here, because that’s just not how the data is organized.
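As a first step towards index-based addressing, here is the kind of helper I have in mind: a sketch which converts the A1-style text found in Cell.CellReference into 1-based row and column indices. CellAddress and Parse are hypothetical names, not part of the SDK, and the code assumes plain references such as "B3", without sheet names or $ markers.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of the OpenXML SDK
public static class CellAddress
{
   // Converts an A1-style reference such as "B3" into 1-based indices:
   // Item1 is the row, Item2 is the column, so "B3" gives (3, 2)
   public static Tuple<int, int> Parse(string reference)
   {
      if (string.IsNullOrEmpty(reference))
      {
         throw new ArgumentException("Empty cell reference", "reference");
      }

      var column = 0;
      var position = 0;

      // The leading letters are the column, read as a base-26 number
      // where A = 1, Z = 26, AA = 27, and so on
      while (position < reference.Length)
      {
         var letter = char.ToUpperInvariant(reference[position]);
         if (letter < 'A' || letter > 'Z')
         {
            break;
         }

         column = column * 26 + (letter - 'A' + 1);
         position++;
      }

      // Whatever follows the letters must be the row number, digits only
      int row;
      var isNumber = int.TryParse(
         reference.Substring(position),
         NumberStyles.None,
         CultureInfo.InvariantCulture,
         out row);

      if (position == 0 || !isNumber || row < 1)
      {
         throw new FormatException("Not an A1-style reference: " + reference);
      }

      return Tuple.Create(row, column);
   }
}
```

With this, CellAddress.Parse(cell.CellReference.Value) gives the row and column of each cell while iterating. Finding the cell at a given row and column still means searching through the cells, so this only covers half of what Cells[3,2] does in the Interop.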

In short, I found the SDK pleasant to install and use so far (and well documented), and I can definitely see scenarios where I will be using it in the future. On the other hand, I suspect I will end up writing quite a few helper methods to make it more usable – probably trying to make it look closer to the classic Interop. I suspect also that it will turn out to be better suited for applications like Word and PowerPoint, because of the more hierarchical nature of their content.
