Mathias Brandewinder on .NET, F#, VSTO and Excel development, and quantitative analysis / machine learning.
by Mathias 27. November 2011 07:59

Earlier this month, on Simon Murphy’s blog, the project to organize a UK Excel Developer conference 2012 in London quickly took shape. As of now, the planned date is in the January 24 to 26 range, with a pretty cool agenda, covering most of the recent developments Excel developers should know about. I am looking forward to it!

I’ll be talking for about an hour on VSTO (Visual Studio Tools for Office). VSTO can be both awesome and painful: I plan on demoing building an Excel add-in from the ground up, illustrating some of the benefits and drawbacks/pitfalls of that approach, so that you know when it’s the right time to go that route.

If there are any topics or questions you are specifically interested in, please let me know in the comments – I’ll be happy to take requests.

As an aside, I usually talk at .NET developers events, where I need to convince the audience that developing for Excel (or Office) isn’t a terrible idea. This will be my first time with an Excel developers audience, and I expect the opposite challenge, namely why bother with C#/VB.NET and Visual Studio when VBA is free and works just fine?

Check out the Excel Conference page and Simon’s blog for updates.

London-calling

Picture from the amazing series “The Kitten Covers

by Mathias 6. November 2011 15:20

I am somewhat tests-obsessed, and as a result, often find Excel frustrating to work with, because writing automated tests against it isn’t trivial. So recently, while perusing the chapter on Scripting in Programming F#, I came across an Office automation example, and started wondering whether this would be a practical way to write automated tests against Excel.

The use case I have in mind is an existing Excel Workbook, which contains a model (say, your typical Financial model), with a fixed structure, and maybe a sprinkle of VBA, and no .NET.

For illustration purposes, let’s work with the following: our workbook, Model.xlsx, contains one worksheet, “Finances”, with a Profit cell in B3, computed as the difference between the revenue and cost named cells. Pretty impressive stuff.

image

What I want is a way to automatically set the Revenue and Cost to some arbitrary value, and check that the result in Profit is what it should be – so that I don’t have to do it myself by hand, and don’t have to remember how this Workbook was supposed to work later on.

Here is how this could look like in a F# script – create a Script file, say WorkbookTest.fsx, with the following code inside:

#r "Microsoft.Office.Interop.Excel"

open System
open Microsoft.Office.Interop.Excel
      
Console.WriteLine("Press [Enter] to start")
Console.ReadLine()

let excel = new ApplicationClass(Visible=false)
let workbooks = excel.Workbooks

let workbookPath = @"C:\Users\Mathias\Desktop\Model.xlsx"

let workbook = workbooks.Open(workbookPath)
let worksheets = workbook.Worksheets
let sheet = worksheets.["Finances"]
let worksheet = sheet :?> Worksheet

let revenueCell = worksheet.Range "Revenue"
revenueCell.Value2 <- 100

let costCell = worksheet.Range "Cost"
costCell.Value2 <- 10

let profitCell = worksheet.Range "Profit"
let profit = profitCell.Value2

Console.WriteLine("Check profit calculations")
Console.WriteLine("Expected: {0}, Actual {1}", 90, profit)

workbook.Close(false, false, Type.Missing)
excel.Quit()

Console.WriteLine("Done, press [Enter] to close")
Console.ReadLine()

The script launches Excel in Invisible mode, opens the workbook, sets the Revenue and Cost to 100 and 10, retrieves the value from Profit, printouts the value it found as well as the expected value – and closes back the Workbook without saving any of the changes.

The nice thing here is that I can now drop that file on my desktop, and simply right-click and select “Run with F# Interactive” to execute it, without building anything, and I’ll see something like this happen:

image

Nothing earth shattering, but still pretty nice: now I got a script which I can run anytime I want, to check whether the Workbook is behaving properly. Furthermore, what’s nice is that I don’t need to open Visual Studio to work with it: I can simply open WorkbookTest.fsx with Notepad, edit my code, and run it again.

There are some clear issues with the code in its current form. For instance, if anything goes wrong in the code (say, for instance, that I mis-typed a name which doesn’t exist with the workbook), the script will crash miserably, and let the hidden Excel instance hang in the background, waiting for someone to kill it manually. This would require some work to make sure that if exceptions are raised, everything is properly disposed, and no matter what, the file gets closed without saving any modification.

In any case, I thought it was worth sharing, even in its rough state – if only because it was fun, and also because the F# code looks surprisingly more appealing than the usual C# Interop code. Now the fun part would be to turn this into a decent testing framework for Excel…

by Mathias 2. August 2011 16:06

When working with Excel, it is common to use small optimization tricks, like setting Excel.ScreenUpdating to false to avoid un-necessary screen refreshes, during lengthy operations – something along these lines:

public void DoJob(Worksheet  worksheet)
{
   var excel = worksheet.Application;

   var initialScreenUpdating = excel.ScreenUpdating;
   var initialCursor = excel.Cursor;

   excel.ScreenUpdating = false;
   excel.Cursor = XlMousePointer.xlWait;

   // do stuff

   excel.ScreenUpdating = initialScreenUpdating;
   excel.Cursor = initialCursor;
}

This is a good outline of what we would like to happen, but as is, this code has limitations. We would like to be certain that whenever we capture the state, the final two lines, which reset the state to what it was originally, are executed.

More...

by Mathias 13. March 2011 17:52

I had heard good things about OpenXML, but until now I didn’t have time to give it a try. After attending a rather intimate session on the topic at the MVP Summit, I realized I should look into it. For those of you like me who haven’t kept up with the news, the general idea is that, since the release of Office 2007, Office files are no longer saved as obscure proprietary files: they are essentially zipped xml files. If you rename an Excel file from MyFile.xlsx to MyFile.zip and open it, you will see that it is simply a collection of xml files, describing the various parts of your Workbook and their relationships. This has a few interesting implications, one of them being that you can create or edit an Excel file without using Excel, or even having Excel installed on your machine.

The OpenXML SDK is a free library which provides strongly typed .NET classes to manipulate these files without having to deal with raw XML, and are LINQ-friendly, which is awesome.

One scenario where this comes very handy is if you have some form of a .NET application which needs to read input data from an Excel file; another interesting case is a .NET application which needs to produce some Office outputs for the user. Rather than launch an instance of the Office application and use the COM Interop, you can perform all these tasks safely in .NET, without having to worry about cleanly closing the application.

In line with the first scenario, my initial goal was to see if I could read the contents of an Excel Workbook with a console app. Rather than going into lengthy explanations, here is the code I ended up with, which borrows heavily from the samples provided with the SDK:

namespace OpenXmlApp
{
   using System;
   using System.Collections.Generic;
   using System.Linq;
   using DocumentFormat.OpenXml;
   using DocumentFormat.OpenXml.Packaging;
   using DocumentFormat.OpenXml.Spreadsheet;

   public static class Program
   {
      private static void Main(string[] args)
      {
         var filePath = @"C:/Tests/protectedFile.xlsx";
         using (var document = SpreadsheetDocument.Open(filePath, false))
         {
            var workbookPart = document.WorkbookPart;
            var workbook = workbookPart.Workbook;

            var sheets = workbook.Descendants<Sheet>();
            foreach (var sheet in sheets)
            {
               var worksheetPart = (WorksheetPart)workbookPart.GetPartById(sheet.Id);
               var sharedStringPart = workbookPart.SharedStringTablePart;
               var values = sharedStringPart.SharedStringTable.Elements<SharedStringItem>().ToArray();
               
               var cells = worksheetPart.Worksheet.Descendants<Cell>();
               foreach (var cell in cells)
               {
                  Console.WriteLine(cell.CellReference);
                  // The cells contains a string input that is not a formula
                  if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
                  {
                     var index = int.Parse(cell.CellValue.Text);
                     var value = values[index].InnerText;
                     Console.WriteLine(value);
                  }
                  else
                  {
                     Console.WriteLine(cell.CellValue.Text);
                  }

                  if (cell.CellFormula != null)
                  {
                     Console.WriteLine(cell.CellFormula.Text);                    
                  }
               }
            }
         }

         Console.ReadLine();
      }
   }
}

A few comments:

  • I am opening the document as read-only, setting the second argument to false.
  • workbook.Descendants<Sheet>() returns an IEnumerable<Sheet>, which means that you can now query it using Linq if you please.
  • I am still wrapping my head around the organization of elements. Coming from “classic” Excel, I expect to be able to navigate down directly from a Workbook into its Worksheets; here, the Sheet contained in the Workbook is merely a key which indicates what sheets exist, and what Id to use when requesting them. Navigating between the parts of the file will take a bit of getting used to.
  • I love the fact that you can directly iterate over the Cells of a Worksheet. The cells variable above retrieves only cells that have some content, and nothing more. No need to read cells into 2-d arrays and iterating over all of them.
  • On the other hand, I found the organization of the cells content a bit disorienting at first. Interestingly, cells that contain strings that are not formulas do not store the value in the cell element itself. They are stored in a SharedStringTable, and the cell contains an index, in Cell.CellValue.Text, which indicates which element of that table it contains. This seems to be true only for strings that are not formulas, however: if the cell contains a formula, or some non-string type, then the content is stored in CellValue.Text, and there is no record in SharedStringTable. I am sure this will make sense to me some day.
  • I am interested to see how easy or painful it is to work with Cells addressed by their index (as in, Cells[3,2] ). This is fairly straightforward using the Interop, but from what I have seen so far, I expect it will be a bit more involved here, because that’s just not how the data is organized.

In short, I found the SDK pleasant to install and use so far (and well documented), and I can definitely see scenarios where I will be using it in the future. On the other hand, I suspect I will end up writing quite a few helper methods to make it more usable – probably trying to make it look closer to the classic Interop. I suspect also that it will turn out to be better suited for applications like Word and PowerPoint, because of the more hierarchical nature of their content.

by Mathias 20. February 2011 17:27

While browsing the ListObject documentation today, I realized that, while all the examples given focused on binding to a DataSet, it also supports databinding to “any component that implements the IList interface”. This is something I wasn’t aware of, so I figured I would give it a try.

I quickly created a Excel 2007 Template project in VS2010, and added a simple Product class as follow:

public class Product
{
   public string Name { get; set; }
   public double Price { get; set; }
}

I then added the following code behind Sheet1, creating a straightforward list of Product, as well as a ListObject, setting the DataSource to the list:

public partial class Sheet1
{
   private List<Product> products;
   private ListObject listObject;

   private void Sheet1_Startup(object sender, System.EventArgs e)
   {
      this.listObject = this.Controls.AddListObject(this.Range["B2"], "Products");

      this.products = new List<Product>();
      this.products.Add(new Product() { Name = "Alpha", Price = 10d });
      this.products.Add(new Product() { Name = "Bravo", Price = 20d });
      this.products.Add(new Product() { Name = "Charlie", Price = 30d });

      this.listObject.DataSource = products;
      this.listObject.AutoSetDataBoundColumnHeaders = true;
   }
   // auto-generated code omitted
}

Hit F5, and watch:

image

Out of the box, we get a nicely formatted list, with filters in the headers. If anything, that’s a convenient way to display a list of items on a Worksheet. I didn’t have time to dig deeper into it, but I am now very curious about how much more can be done with this mechanism. Can I add data validation? Can I control what properties to display?

Comments

Comment RSS