Mathias Brandewinder on .NET, VSTO and Excel development, and quantitative analysis.
by Mathias 20. October 2009 10:46

When working with Excel workbooks with C#, I often need to retrieve the entire contents of a particular worksheet, so that I can process the data within C# code. By “the entire contents”, I mean the content of every cell between cell A1 and the last cell of the sheet, that is, the cell such that there is no cell on its right or below it that contains anything.

To do this, I use the following code, where excelWorksheet is a Worksheet (duh):

Excel.Range firstCell = excelWorksheet.get_Range("A1", Type.Missing);
Excel.Range lastCell = excelWorksheet.Cells.SpecialCells(Excel.XlCellType.xlCellTypeLastCell, Type.Missing);

object[,] cellValues;
object[,] cellFormulas;

Excel.Range worksheetCells = excelWorksheet.get_Range(firstCell, lastCell);
cellValues = worksheetCells.Value2 as object[,];
cellFormulas = worksheetCells.Formula as object[,];

The 2 resulting arrays of objects, cellValues and cellFormulas, contain the values and formulas, or null if the cell has no content.

However, while I was working on Akin recently, I realized 2 interesting things I had never noted before. First, the resulting array is 1-based, even though “C# arrays are zero indexed; that is, the array indexes start at zero”. Then, this code will fail if your spreadsheet contains only one value, in cell A1.

To prove my point, here is a snapshot of a QuickWatch of the cellValues array, reading a small spreadsheet. As you can see, the indexing begins at indexes 1 and 1.

OneBasedArray

It isn’t especially difficult to handle, but it really came as a surprise to me. Once you got used to zero-based indexes, which are pretty much the norm in .NET, this really is an intriguing oddity.

Once I realized this, I based all my code on the assumption that the array was going to be one-based, and everything went fine, until I realized that the following line was throwing an exception when the worksheet contained only a single populated cell in A1:

cellValues = worksheetCells.Value2 as object[,];

If you want to keep the same code, returning a 2-dimensional array, you have to handle that special case slightly differently, along the lines of:

if (lastCell.Row == 1 && lastCell.Column == 1)
{
    cellValues = new object[1,1];
    cellFormulas = new object[1,1];
    cellValues[0, 0] = firstCell.Value2;
    cellFormulas[0, 0] = firstCell.Formula;
}

But now this creates its own particular problem, because cellValues and cellFormulas are of course 0-based. I looked around and couldn’t find a way to declare a one-based array in C# (does anyone know if this is feasible?), so your best options are to either transform the array obtained in the standard case into a 0-based array, or, much less elegant, read the “special case” into a 0-based array which can be read through as a 1-based array:

if (lastCell.Row == 1 && lastCell.Column == 1)
{
    cellValues = new object[2,2];
    cellFormulas = new object[2,2];
    cellValues[1, 1] = firstCell.Value2;
    cellFormulas[1, 1] = firstCell.Formula;
}
I am frankly a bit torn between the 2 approaches. On the one hand, I feel much more comfortable working with a standard, 0-based array. On the other hand, while I strongly dislike having a method which returns sometimes a non-standard 1-based array, and sometimes a 0-based array but should be handled as a 1-based array (just describing this makes me queasy), it seems counter-productive to incur the cost of transforming a larger array most of the times, for the sake of the one-cell case, which is clearly an unusual boundary case. I’ll let you decide how you want to handle this!

Comments

10/17/2009 4:36:51 AM #

Jon Peltier

That's one of the nice things about VBA and pre-.Net VB: You can declare an array with any starting and ending index values:

Dim vData(2 to 5, 3 to 6) As Variant

This array may have been declared thus to relate to the range C2:F5, and the indexes of an array element matches the row and column indexes of the corresponding cell.

.Net has made all arrays 0-based. Good for consistency, until you run into an OM-imposed inconsistency.

Jon Peltier United States | Reply

10/26/2009 10:01:34 PM #

Mathias

I guess I am torn on the issue. I really like the consistency of .NET, where everything is zero-based. I tend to regularly make mistakes whenever indexes, upper- and lower-bounds are involved, so having one less thing to worry about, the lower bound, is definitely a plus for me! I wish the arrays coming from Excel into .NET were systematically converted to 0-based, to keep everything coherent, but I assume there must have been a good reason for it not being the case...

Mathias | Reply

3/29/2010 5:55:18 PM #

Lonely

I go the following error: outOfMemoryException or "Insufficient Memory to continue the execution of the program". That exception occurred when assigning the values of the range to the worksheet to the Object array ValueArray:

object[,] valueArray = (object[,]) worksheetCells.Value2;

Lonely United States | Reply

3/29/2010 6:13:20 PM #

mathias

I am assuming you are not getting this systematically, but only with a specific worksheet, is that correct? I have never encountered the issue, but when I see outOfMemoryException my first instinct would be to look for a size problem - is the worksheet/array large?

mathias United States | Reply

3/29/2010 9:20:05 PM #

Lonely

Actualy my excel input file has 1.3 millions record and I am not sure if that is the problem. If it is the problem, Is there a way to overcome this problem and use the same application I developed. Keep in mind that the excel file is a stream of data.

Regards,

Lonely United States | Reply

3/30/2010 8:37:30 AM #

Mathias

I suspect the size of the array is the guilty party. I just tried instantiating a large array on my PC with the following statement, and I got an OutOfMemoryException.
var data = new object[1300000,10000];
If this is indeed the source of the problem, you should be able to address it by reading the contents in chunks, like 100,000 rows at a time, using a smaller array.
That being said, 1,300,000 rows is an awful lot of data to store in Excel! Is there a reason why you wouldn't use a database to store that?

Mathias | Reply

3/30/2010 1:51:03 PM #

Lonely

I already opened the same file in Excel 2007. I do not know if there is another solution to avoid using arrays in retrieving data from Excel sheets. Please help me to find out if there is another way to resolve this problem and access the sheet data without moving the data to another data structure like arrays.

Regards,

Lonely United States | Reply

3/30/2010 2:17:00 PM #

Lonely

Microsoft.Office.Interop.Excel.Range firstCell = well_output_worksheet.get_Range("A1", Type.Missing);
                    Microsoft.Office.Interop.Excel.Range lastCell = well_output_worksheet.Cells.SpecialCells(Microsoft.Office.Interop.Excel.XlCellType.xlCellTypeLastCell, Type.Missing);

I really do not understand what the Range object mean. I am really new to .net programming. Would you mind to explain to me what it means?

Regards,

Lonely United States | Reply

3/30/2010 2:33:32 PM #

Lonely

Another question to you. I would like to know if the array and range are the same. I mean any of them can be converted to the other.

Regards,

Lonely United States | Reply

3/31/2010 7:07:51 AM #

mathias

@Lonely,
The Range object is a collection of Excel cells; internally, Excel has no "cell" but only ranges. A cell is a Range, but a Range can be a selection of cells. In the code example, what we are doing is selecting the first cell of the sheet (A1), the last cell of the sheet (special cell), and the range defined by these 2 cells (ex: A1: F5 if F5 is the last one).
Regarding reading from a workbook, these 2 threads on StackOverflow may help, too - they describe
different techniques:
stackoverflow.com/.../reading-excel-files-from-c
stackoverflow.com/.../reading-excel-files-as-a-server-process

mathias United States | Reply

4/26/2010 8:33:15 AM #

trackback

Create an Excel 2007 VSTO add-in: read worksheets

Create an Excel 2007 VSTO add-in: read worksheets

Clear Lines Blog | Reply

5/17/2010 8:14:52 PM #

pingback

Pingback from itbiancheng.com

IT编程之家  » Blog Archive   » Create an Excel 2007 VSTO add-in: read worksheets

itbiancheng.com | Reply

7/30/2010 4:30:34 PM #

home bedding

I keep getting error on "cellValues = worksheetCells.Value2 as object[,];" What the heck?

home bedding United States | Reply

Add comment




  Country flag

biuquote
  • Comment
  • Preview
Loading



Comments

Comment RSS