Mathias Brandewinder on .NET, F#, VSTO and Excel development, and quantitative analysis / machine learning.
by Mathias 17. October 2010 16:13

The current project I am working on requires writing large amount of data to Excel worksheets. In this type of situation, I create an array with all the data I want to write, and set the value of the entire target range at once. I know from experience that this method is much faster than writing cells one by one, but I was curious about how much faster, so I wrote a little test, writing larger and larger chunks of data and measuring the speed of both methods:

private static void WriteArray(int rows, int columns, Worksheet worksheet)
{
   var data = new object[rows, columns];
   for (var row = 1; row <= rows; row++)
   {
      for (var column = 1; column <= columns; column++)
      {
         data[row - 1, column - 1] = "Test";
      }
   }

   var startCell = (Range)worksheet.Cells[1, 1];
   var endCell = (Range)worksheet.Cells[rows, columns];
   var writeRange = worksheet.Range[startCell, endCell];

   writeRange.Value2 = data;
}
private static void WriteCellByCell(int rows, int columns, Worksheet worksheet)
{
   for (var row = 1; row <= rows; row++)
   {
      for (var column = 1; column <= columns; column++)
      {
         var cell = (Range)worksheet.Cells[row, column];
         cell.Value2 = "Test";
      }
   }
}

Clearly, the array approach is the way to go, performing close to 1000 times faster per cell. It also seems to improve as size increases, but that would require a bit more careful testing.

WriteDataToExcel

However, one additional thing I needed to do was to format the data, using NumberFormat as well as font, borders and color fills, and I thought I would use the same approach – and I observed a significant performance degradation.

private static void WriteNumberFormatArray(int rows, int columns, Worksheet worksheet)
{
   var data = new object[rows, columns];
   for (var row = 1; row <= rows; row++)
   {
      for (var column = 1; column <= columns; column++)
      {
         data[row - 1, column - 1] = "0.000%";
      }
   }

   var startCell = (Range)worksheet.Cells[1, 1];
   var endCell = (Range)worksheet.Cells[rows, columns];
   var writeRange = worksheet.Range[startCell, endCell];

   writeRange.NumberFormat = data;
}
private static void WriteNumberFormatCellByCell(int rows, int columns, Worksheet worksheet)
{
   for (var row = 1; row <= rows; row++)
   {
      for (var column = 1; column <= columns; column++)
      {
         var cell = (Range)worksheet.Cells[row, column];
         cell.NumberFormat = "0.000%";
      }
   }
}

Here is the benchmark I ran, comparing writing NumberFormat by array vs. cell by cell:

WriteNumberFormat

The cell-by-cell version performs about the same writing values or number formats; however, the array version works about 100 times worse for NumberFormat compared to Value2. It still runs way faster than the cell-by-cell approach, but it’s not night-and-day any more.

Fortunately, when you are writing large amount of data like this, chances are, you are really writing records to a worksheet. And while every cell could potentially have a different value, the format is likely consistent, either by row or by column. That is, every cell in a column probably has the same number format. In that case, we have an alternative, which is to apply the format to an entire range at once, like this:

private static void WriteNumberFormatByColumn(int rows, int columns, Worksheet worksheet)
{
   for (var column = 1; column <= columns; column++)
   {
      var startCell = (Range)worksheet.Cells[1, column];
      var endCell = (Range)worksheet.Cells[rows, column];
      var writeRange = worksheet.Range[startCell, endCell];

      writeRange.NumberFormat = "0.000%";
   }
}

I ran my test again, and observed the following:

FormatByColumn

The format by column runs initially as fast as the array-based approach, but its time remains roughly constant as we increase the number of rows, making it an increasingly attractive option as the number of rows increases.

How do you handle writing large amounts of data to Excel? Any Jedi tricks you care to share?

And for completeness, here is the code I used to run my tests; I used an Action delegate in my test loop, which allowed me to easily swap the functions I wanted to compare – feel free to comment and criticize!

namespace ExcelSpeedTest
{
   using System;
   using System.Diagnostics;
   using Microsoft.Office.Interop.Excel;

   class Program
   {
      static void Main(string[] args)
      {
         var excel = new Application();
         excel.DisplayAlerts = false;

         var workbooks = excel.Workbooks;
         var stopwatch = new Stopwatch();
         var blockSize = 10;

         Console.WriteLine("Write by array.");
         MeasureOverIncreasingSize(workbooks, blockSize, stopwatch, WriteNumberFormatArray);

         Console.WriteLine("Write by column.");
         MeasureOverIncreasingSize(workbooks, blockSize, stopwatch, WriteNumberFormatByColumn);

         Console.WriteLine("Write cell by cell.");
         MeasureOverIncreasingSize(workbooks, blockSize, stopwatch, WriteNumberFormatCellByCell);

         Console.ReadLine();
         excel.Quit();
      }

      private static void MeasureOverIncreasingSize(Workbooks workbooks, int blockSize, Stopwatch stopwatch, Action<int, int, Worksheet> method)
      {
         for (int size = 1; size <= 10; size++)
         {
            var workbook = workbooks.Add(Type.Missing);
            var worksheets = workbook.Sheets;
            var worksheet = (Worksheet)worksheets[1];

            var rows = blockSize * size;
            var columns = blockSize;

            stopwatch.Reset();
            stopwatch.Start();

            method(rows, columns, worksheet);

            stopwatch.Stop();

            WriteEvaluation(stopwatch, rows, columns);
            workbook.Close(false, Type.Missing, Type.Missing);
         }
      }

      private static void WriteArray(int rows, int columns, Worksheet worksheet)
      {
         var data = new object[rows, columns];
         for (var row = 1; row <= rows; row++)
         {
            for (var column = 1; column <= columns; column++)
            {
               data[row - 1, column - 1] = "Test";
            }
         }

         var startCell = (Range)worksheet.Cells[1, 1];
         var endCell = (Range)worksheet.Cells[rows, columns];
         var writeRange = worksheet.Range[startCell, endCell];

         writeRange.Value2 = data;
      }

      private static void WriteCellByCell(int rows, int columns, Worksheet worksheet)
      {
         for (var row = 1; row <= rows; row++)
         {
            for (var column = 1; column <= columns; column++)
            {
               var cell = (Range)worksheet.Cells[row, column];
               cell.Value2 = "Test";
            }
         }
      }

      private static void WriteNumberFormatArray(int rows, int columns, Worksheet worksheet)
      {
         var data = new object[rows, columns];
         for (var row = 1; row <= rows; row++)
         {
            for (var column = 1; column <= columns; column++)
            {
               data[row - 1, column - 1] = "0.000%";
            }
         }

         var startCell = (Range)worksheet.Cells[1, 1];
         var endCell = (Range)worksheet.Cells[rows, columns];
         var writeRange = worksheet.Range[startCell, endCell];

         writeRange.NumberFormat = data;
      }

      private static void WriteNumberFormatByColumn(int rows, int columns, Worksheet worksheet)
      {
         for (var column = 1; column <= columns; column++)
         {
            var startCell = (Range)worksheet.Cells[1, column];
            var endCell = (Range)worksheet.Cells[rows, column];
            var writeRange = worksheet.Range[startCell, endCell];

            writeRange.NumberFormat = "0.000%";
         }
      }

      private static void WriteNumberFormatCellByCell(int rows, int columns, Worksheet worksheet)
      {
         for (var row = 1; row <= rows; row++)
         {
            for (var column = 1; column <= columns; column++)
            {
               var cell = (Range)worksheet.Cells[row, column];
               cell.NumberFormat = "0.000%";
            }
         }
      }

      private static void WriteEvaluation(Stopwatch stopwatch, int rows, int columns)
      {
         var cells = rows * columns;
         var time = stopwatch.ElapsedMilliseconds;
         var timePerCell = Math.Round((double)time / (double)cells, 5);

         Console.WriteLine(string.Format("Writing {0} values took {1} ms or {2} ms/cell.", cells, time, timePerCell));
      }
   }
}

Comments

10/18/2010 5:24:47 AM #

Dennis Wallentin (XL-Dennis)

Interesting to read. What would the outcome be if You used classic ADO and with the use of CopyFromRecordset?

Thanks and all the best,
Dennis

Dennis Wallentin (XL-Dennis) Sweden | Reply

10/19/2010 1:27:20 PM #

Mathias

Hi Dennis,
It's a good question, and I don't have an answer just yet - I'll try to compare these approaches soon. I have actually never played with ADO, and see this approach mentioned regularly, thank you for giving me an excuse to look into it Smile
Mathias

Mathias United States | Reply

10/25/2010 12:16:31 AM #

Ivan Thalasso

Hi Matthias,

That's a really nice approach but i'm wondering about the suitability of compiling a full-blown C# app when a vbScript (using ADO) could do the same thing. (You can have it in JScript too). You'll have to go through those old ASP books to find the solution. I have the ASP bible. Very useful.

Nice code by the way.

Ivan Thalasso France | Reply

10/31/2010 4:41:40 AM #

Mathias

Hi Ivan,
Thank you for the feedback - you bring up some fair points. Yes, you could do all the above using VBA. I would probably not write a C# app to do "just" this, but I was interested specifically in comparing how these approaches compared, in C#. That being said, you are the second person who brings up ADO now, so I should really add this to the mix, and compare!
Mathias

Mathias United States | Reply

10/27/2010 3:11:15 PM #

Esha Putra

Hai there, is it possible to using it n Linux environment, not Microsoft. because it's hard to compiling it under Ubuntu...

Esha Putra Indonesia | Reply

11/2/2010 11:25:23 PM #

Ivan Thalasso

Hello Esha.

Yes it would be hard as there are libraries in the code that are specific to windows such as Microsoft.Office.Interop.Excel

Now i'm wondering if the MS script and JET engines can run under WINE ;)

Ivan Thalasso France | Reply

11/19/2010 7:27:05 AM #

David McCormack

Wow! I never knew you could assign a 2-D array of values to a range. The performance of my add-in has gone up by several orders of magnitude. Thank you so much.

David McCormack Ireland | Reply

11/19/2010 3:11:10 PM #

Mathias

Thanks for the comment, David - and I am glad this helped! When you don't have too much data to handle, it doesn't matter much, but it does make a big difference when you have to read or write a lot.

Mathias United States | Reply

1/4/2011 4:57:57 AM #

Ashfaq

Hey thanks for this blog post. I m still a newbie to c#, can you explain more what this code does
private static void WriteEvaluation
How does it work.

Ashfaq United States | Reply

1/8/2011 6:55:59 AM #

mathias

Hi Ashfaq,
thank you for your question. The purpose of the WriteEvaluation method is to produce a summary of each of the writes to a worksheet. To compare each writing episode, I compute the number of cells written per millisecond. The overall program uses various methods to write to a worksheet, and increase the number of cells to write, and measure the time it takes for each method and each size. WriteEvaluation receives the stopWatch - the timer which recorded how long it took to write - and the number of cells that has been written, and writes out to the console the performance.
Hope this helps!
Mathias

mathias United States | Reply

1/7/2011 3:20:58 AM #

Michael

Mathias, the only way I could get snippets from your sample to compile in my project was by changing the following line:

var writeRange = worksheet.Range[startCell, endCell];

to

var writeRange = worksheet.get_Range(startCell, endCell);

Any idea what I'm missing?

Thanks,

Mike

Michael United States | Reply

1/8/2011 6:26:29 AM #

mathias

Hi Mike,

That's intriguing. I tried this code with various versions of the Excel interop in a console app, and it worked just fine, so this is not it. When I looked up the msdn reference for Worksheet.Range, it indicates that the corresponding assembly is Microsoft.Office.Tools.Excel.v9.0. Can you try out to add that reference, and see if that solves the problem for you? I am having a hard time reproducing the issue, but I'd be interested in hearing whether this solved it or not!  
msdn.microsoft.com/.../...heet.range(v=VS.90).aspx

Mathias

mathias United States | Reply

1/10/2011 2:23:32 AM #

Michael

Mathias,

Thanks for the suggestion.  I checked, and my project already has a reference to that assembly.

I'm curious what using statements are in your code file.

Mike

Michael United States | Reply

1/22/2011 5:25:24 AM #

mathias

Hi Michael,
Actually, my project references and using statements are fairly straightforward, I can't see any obvious issue there - I wonder if this has to do with VSTO4 being installed, a different version of the Interop, and/or Visual Studio 2010. What version of Office and VS are you using? The source of the difference is unfortunately a bit painful to track down, but I would like to be able to solve that mystery Smile

mathias United States | Reply

2/13/2011 10:18:40 AM #

pingback

Pingback from excel.darmowe-blogi.pisz.pl

Write data to an Excel worksheet with C#, fast | excel

excel.darmowe-blogi.pisz.pl | Reply

2/18/2011 4:23:52 PM #

trackback

Codebix.com - Your post is on Codebix.com

This post has been featured on Codebix.com. The place to find latest articles on programming. Click on the url to reach your post's page.

Codebix.com | Reply

7/13/2011 1:56:18 AM #

anton

Hello,
where and how you write the data on  WriteNumberFormatByColumn method?

thanks

anton Austria | Reply

7/20/2011 3:56:44 AM #

Mathias

Hi Anton,
Not totally sure I understand your question. WriteNumberFormatByColumn is used to write data column by column, it is called in the Main method (the second version), in the evaluation loop. Hope this helps!
Mathias

Mathias | Reply

12/14/2011 1:34:23 PM #

EL Denis0

heh very nice and good job but can we do this with qbasic Smile @echo
have i nice day

EL Denis0 Canada | Reply

4/15/2012 11:30:32 PM #

Koti

Hi,

While compiling i am getting the below error message.
Even i added "using Excel1 = Microsoft.Office.Interop.Excel;"

can any one please help.

Error  1  The type or namespace name 'Office' does not exist in the namespace 'Microsoft' (are you missing an assembly reference?)  
C:\Documents and Settings\in211249\Local Settings\Application Data\Temporary Projects\WindowsFormsApplication1\Form1.cs  3  26  WindowsFormsApplication1


Thanks,
Koti

Koti India | Reply

6/7/2012 8:16:23 PM #

Azhar Mansuri

Thank you so much to share this blog. I reduce my time from minutes to ms using your code.

Thanks & Regards,
Azhar Mansuri.

Azhar Mansuri India | Reply

6/21/2012 12:06:33 PM #

Arbiter

Thanks for the Tip.
Take me 3ms to do what I want to do.
But It still take half a second for xlApp to launch Cry

Arbiter France | Reply

1/4/2013 12:29:40 AM #

Asad Naeem

I never knew about 2D array writing to Excel. It saved my lot of time but after seeing this solution. I will change my previous implementation which was cell by cell. Because simple I love this solution.

Asad Naeem Islamic Republic of Pakistan | Reply

4/18/2013 10:04:06 AM #

JMX

Would it perhaps be faster if you removed the boxing/unboxing penalty per cell? I'm not sure if that is even possible but sending explicit types, i.e., string if your field is intended to be a string would theoretically prevent Excel from having to box/unbox.

Just a thought.

JMX United States | Reply

4/27/2013 1:04:28 PM #

latelearner

Can a progrramme be written for the following situation? in C or C+ or C# or dot net or any other language

1. Nine numbers 1,2,3,4,5,6,7,8and 9 are to be filled in 12 adjacent cells in a row in an excel
    worksheet.

2.  A cell can be blank. It can contain one or more numbers.
3. In a row a number can appear only once.
4. In a cell the order of numbers  does not matter. ie., 2,3,5, is the same as 2,5,3 or 3,5,2, or 3,2,5
    or 5,2,3 or 5,3,2

How many rows will be needed to fill the cells in all possible combinations with the conditions stated above

latelearner India | Reply

4/28/2013 8:02:18 PM #

pingback

Pingback from blogosfera.co.uk

C#: Cell by Cell writing is very slow, but I'm facing a situation in range writing | BlogoSfera

blogosfera.co.uk | Reply

5/18/2013 1:11:08 AM #

alex leblois

Hi!

This is a nice artile. Thanks for sharing your knowlodge. I have read some other links (Write data in EXCEL, CSV and XML file using C#) that's also helpful for

developers.

www.codeproject.com/.../Export-Tabular-Data-in-CSV-and-Excel-Formats-Throu

http://www.mindstick.com/Blog/257/Write%20data%20in%20EXCEL%20CSV%20and

alex leblois United States | Reply

5/29/2013 5:44:09 PM #

Anil

Hi Mathias, In the WriteArray method when i actually assign the actual value from the dataset rows instead of data[row - 1, column - 1] = "Test", i get the insufficient memory error for large number of rows for ex: 50K rows. But, it works just with value "Test". Any idea where i am missing?

Thanks
Anil

Anil India | Reply

7/10/2013 6:19:25 PM #

pingback

Pingback from word.deveronline.com

How to optimize exporting data to excel in Excel Interop? | Excel Tips | Word Tutorials

word.deveronline.com | Reply

11/22/2014 10:23:28 AM #

pingback

Pingback from ecanswers.org

Is there a proceed to boost opening to conceal rows in Excel controlling Excel Add-in and C# | Zicherman

ecanswers.org | Reply

Add comment




  Country flag

biuquote
  • Comment
  • Preview
Loading



Comments

Comment RSS