Mathias Brandewinder on .NET, F#, VSTO and Excel development, and quantitative analysis / machine learning.
by Mathias 20. October 2009 10:46

When working with Excel workbooks with C#, I often need to retrieve the entire contents of a particular worksheet, so that I can process the data within C# code. By “the entire contents”, I mean the content of every cell between cell A1 and the last cell of the sheet, that is, the cell such that there is no cell on its right or below it that contains anything.

To do this, I use the following code, where excelWorksheet is a Worksheet (duh):

Excel.Range firstCell = excelWorksheet.get_Range("A1", Type.Missing);
Excel.Range lastCell = excelWorksheet.Cells.SpecialCells(Excel.XlCellType.xlCellTypeLastCell, Type.Missing);

object[,] cellValues;
object[,] cellFormulas;

Excel.Range worksheetCells = excelWorksheet.get_Range(firstCell, lastCell);
cellValues = worksheetCells.Value2 as object[,];
cellFormulas = worksheetCells.Formula as object[,];

The 2 resulting arrays of objects, cellValues and cellFormulas, contain the values and formulas, or null if the cell has no content.

However, while I was working on Akin recently, I realized 2 interesting things I had never noted before. First, the resulting array is 1-based, even though “C# arrays are zero indexed; that is, the array indexes start at zero”. Then, this code will fail if your spreadsheet contains only one value, in cell A1.

To prove my point, here is a snapshot of a QuickWatch of the cellValues array, reading a small spreadsheet. As you can see, the indexing begins at indexes 1 and 1.

OneBasedArray

It isn’t especially difficult to handle, but it really came as a surprise to me. Once you got used to zero-based indexes, which are pretty much the norm in .NET, this really is an intriguing oddity.

Once I realized this, I based all my code on the assumption that the array was going to be one-based, and everything went fine, until I realized that the following line was throwing an exception when the worksheet contained only a single populated cell in A1:

cellValues = worksheetCells.Value2 as object[,];

If you want to keep the same code, returning a 2-dimensional array, you have to handle that special case slightly differently, along the lines of:

if (lastCell.Row == 1 && lastCell.Column == 1)
{
    cellValues = new object[1,1];
    cellFormulas = new object[1,1];
    cellValues[0, 0] = firstCell.Value2;
    cellFormulas[0, 0] = firstCell.Formula;
}

But now this creates its own particular problem, because cellValues and cellFormulas are of course 0-based. I looked around and couldn’t find a way to declare a one-based array in C# (does anyone know if this is feasible?), so your best options are to either transform the array obtained in the standard case into a 0-based array, or, much less elegant, read the “special case” into a 0-based array which can be read through as a 1-based array:

if (lastCell.Row == 1 && lastCell.Column == 1)
{
    cellValues = new object[2,2];
    cellFormulas = new object[2,2];
    cellValues[1, 1] = firstCell.Value2;
    cellFormulas[1, 1] = firstCell.Formula;
}
I am frankly a bit torn between the 2 approaches. On the one hand, I feel much more comfortable working with a standard, 0-based array. On the other hand, while I strongly dislike having a method which returns sometimes a non-standard 1-based array, and sometimes a 0-based array but should be handled as a 1-based array (just describing this makes me queasy), it seems counter-productive to incur the cost of transforming a larger array most of the times, for the sake of the one-cell case, which is clearly an unusual boundary case. I’ll let you decide how you want to handle this!

Comments

10/17/2009 4:36:51 AM #

Jon Peltier

That's one of the nice things about VBA and pre-.Net VB: You can declare an array with any starting and ending index values:

Dim vData(2 to 5, 3 to 6) As Variant

This array may have been declared thus to relate to the range C2:F5, and the indexes of an array element matches the row and column indexes of the corresponding cell.

.Net has made all arrays 0-based. Good for consistency, until you run into an OM-imposed inconsistency.

Jon Peltier United States | Reply

10/26/2009 10:01:34 PM #

Mathias

I guess I am torn on the issue. I really like the consistency of .NET, where everything is zero-based. I tend to regularly make mistakes whenever indexes, upper- and lower-bounds are involved, so having one less thing to worry about, the lower bound, is definitely a plus for me! I wish the arrays coming from Excel into .NET were systematically converted to 0-based, to keep everything coherent, but I assume there must have been a good reason for it not being the case...

Mathias | Reply

3/29/2010 5:55:18 PM #

Lonely

I go the following error: outOfMemoryException or "Insufficient Memory to continue the execution of the program". That exception occurred when assigning the values of the range to the worksheet to the Object array ValueArray:

object[,] valueArray = (object[,]) worksheetCells.Value2;

Lonely United States | Reply

3/29/2010 6:13:20 PM #

mathias

I am assuming you are not getting this systematically, but only with a specific worksheet, is that correct? I have never encountered the issue, but when I see outOfMemoryException my first instinct would be to look for a size problem - is the worksheet/array large?

mathias United States | Reply

3/29/2010 9:20:05 PM #

Lonely

Actualy my excel input file has 1.3 millions record and I am not sure if that is the problem. If it is the problem, Is there a way to overcome this problem and use the same application I developed. Keep in mind that the excel file is a stream of data.

Regards,

Lonely United States | Reply

3/30/2010 8:37:30 AM #

Mathias

I suspect the size of the array is the guilty party. I just tried instantiating a large array on my PC with the following statement, and I got an OutOfMemoryException.
var data = new object[1300000,10000];
If this is indeed the source of the problem, you should be able to address it by reading the contents in chunks, like 100,000 rows at a time, using a smaller array.
That being said, 1,300,000 rows is an awful lot of data to store in Excel! Is there a reason why you wouldn't use a database to store that?

Mathias | Reply

3/30/2010 1:51:03 PM #

Lonely

I already opened the same file in Excel 2007. I do not know if there is another solution to avoid using arrays in retrieving data from Excel sheets. Please help me to find out if there is another way to resolve this problem and access the sheet data without moving the data to another data structure like arrays.

Regards,

Lonely United States | Reply

3/30/2010 2:17:00 PM #

Lonely

Microsoft.Office.Interop.Excel.Range firstCell = well_output_worksheet.get_Range("A1", Type.Missing);
                    Microsoft.Office.Interop.Excel.Range lastCell = well_output_worksheet.Cells.SpecialCells(Microsoft.Office.Interop.Excel.XlCellType.xlCellTypeLastCell, Type.Missing);

I really do not understand what the Range object mean. I am really new to .net programming. Would you mind to explain to me what it means?

Regards,

Lonely United States | Reply

3/30/2010 2:33:32 PM #

Lonely

Another question to you. I would like to know if the array and range are the same. I mean any of them can be converted to the other.

Regards,

Lonely United States | Reply

3/31/2010 7:07:51 AM #

mathias

@Lonely,
The Range object is a collection of Excel cells; internally, Excel has no "cell" but only ranges. A cell is a Range, but a Range can be a selection of cells. In the code example, what we are doing is selecting the first cell of the sheet (A1), the last cell of the sheet (special cell), and the range defined by these 2 cells (ex: A1: F5 if F5 is the last one).
Regarding reading from a workbook, these 2 threads on StackOverflow may help, too - they describe
different techniques:
stackoverflow.com/.../reading-excel-files-from-c
stackoverflow.com/.../reading-excel-files-as-a-server-process

mathias United States | Reply

4/26/2010 8:33:15 AM #

trackback

Create an Excel 2007 VSTO add-in: read worksheets

Create an Excel 2007 VSTO add-in: read worksheets

Clear Lines Blog | Reply

5/17/2010 8:14:52 PM #

pingback

Pingback from itbiancheng.com

IT编程之家  » Blog Archive   » Create an Excel 2007 VSTO add-in: read worksheets

itbiancheng.com | Reply

7/30/2010 4:30:34 PM #

home bedding

I keep getting error on "cellValues = worksheetCells.Value2 as object[,];" What the heck?

home bedding United States | Reply

1/31/2011 9:56:02 AM #

charlie

This is very useful, I appreciate this example a lot...

... but what about the opposite way? Do you know any fast (bulk) method of updating Excel range object with values from 2d array?

I found out that using .NET to update sheets works very slow, even when Screen Update is set to false.

charlie United States | Reply

1/31/2011 10:44:54 AM #

Jon Peltier

If you update the sheet one cell at a time, it's slow, particularly if you have to keep crossing the .Net/VBA barrier.

Given an array A with i rows and j columns, use this approach. It's in VBA, but you can figure out the .Net equivalent:

ActiveSheet.Range("A1").Resize(i, j).Value = A

One single operation, not i x j operations.

Jon Peltier United States | Reply

2/1/2011 9:37:32 AM #

charlie

Thanks Jon! It works just great.

Now, let's say I want to do the same with Range.Interior.ColorIndex property.
I already tried the pattern:
Range.Interior.ColorIndex = myArray
This will not work.  Then I tried to do the same using Range.Interior and creating array of Excel.Interior objects but I failed again.
The only way I see right now is to define the range containing only selected cells that I want to change, like:
Excel.Range tmpRange =  (Excel.Range) mySheet.get_Range(selectedCells);
tmpRange.Interior.ColorIndex = 6;
The only inconvenience I see here, is that selectedCells string must be in "A1,B2" format, not "R1C1,R2C2".

charlie United States | Reply

2/1/2011 10:50:07 AM #

Jon Peltier

You can use an array to populate the contents of cells, but not the formats. The contents can be assigned using e.g., tmpRange.Value, tmpRange.Text, tmpRange.Formula, tmpRange.FormulaR1C1.

Ath R1C1 to A1 conversion is not too troublesome. In fact, in VBA you can use

mySheet.Cells(i,j)

to reference a cell you know the row and column numbers for. To define a rectangular region, use

mySheet.Range(mySheet.Cells(i,j),mySheet.Cells(k,l))

which is the rectangle with cells(i,j) on one corner and cells(k,l) on the diametrically opposite corner (need not be top left and bottom right).

Jon Peltier United States | Reply

2/2/2011 10:13:00 AM #

charlie

Actually, I want to avoid referring to Excel when my range is being prepared. That's why I concatenate the string of cell addresses. Let's say, my string looks like that:
String myCells = "A1,W7,AB11,[etc]";
Next, I just create myRange containing all selected cells:
Excel.Range myRange = mySheet.get_Range(myCells);
Then I can change their formatting quickly in one line:
myRange.Interior.ColorIndex = 6;
Now, there is a problem if the string with cell addresses is too long...
Excel throws an error when I try to pass more that 255 bytes.
My solution for now: create a "buffer" of cells to be changed and update excel sheet in chunks. It's still faster than editing cells one by one Smile


charlie United States | Reply

8/7/2011 11:12:00 AM #

trackback

Create an Excel 2007 VSTO add-in: read worksheets

Create an Excel 2007 VSTO add-in: read worksheets

Clear Lines Blog | Reply

1/3/2012 9:31:10 PM #

Sandeep

Hi Mathias,

I have an excel file, with many sheets, each sheet having data. I need to read each and every sheet and display the sheet data in multiple tabs. Could you please suggest me what is the best approach for this. I am currently using " Select * from Sheetname " using oledbConnection. I do have some cells which are merged. Using the select * from approach does not retrieve the merged cells properly.

Should I go for the interop. But somewhere on Stackoverflow I read for large number of rows, select * from Sheet is the best approach as reading cell by cell would be slower.

I tried reading cell by cell to, but it was considerably slower. So please help me out, as I have to read the cells as well as find out about merged cells.

Thanks

Sandeep India | Reply

6/26/2012 11:50:15 AM #

pingback

Pingback from cutebeachvacationideas.sims2dl.com

Possible details of Google Nexus tablet emerge | Cute Beach Vacation Ideas

cutebeachvacationideas.sims2dl.com | Reply

11/21/2012 5:21:38 PM #

SUJIL T

Hi,
Thanks for sharing the code ,i used your code this is gud,  i need to export cellvalues to xml is it possible?

SUJIL T India | Reply

12/4/2012 6:24:55 PM #

Mathias

Hi Sujil, it is definitely possible - once you have read data, there should be no issue saving it in whatever format you please. If your whole goal is to export to XML, you probably want to avoid Office interop, and grab data using the OpenXML SDK.

Mathias United States | Reply

Add comment




  Country flag

biuquote
  • Comment
  • Preview
Loading



Comments

Comment RSS