Wednesday, December 20, 2006

Read Excel table via ADO

I think it's quite a common task to read data from Excel files. A straight-forward way is to use Excel COM interfaces (Workbook, Worksheet, Range, Cell, etc), but damn it's slow! On the other hand there is ADO (ActiveX Data Objects) that is fast, flexible and familiar to many developers. So it would be nice to access Excel files via ADO to speed up data import and simplify application code.

Here is a utility class that allows reading Excel worksheets as Recordsets. It uses ADOX.Catalog to collect Excel worksheet names and ADODB.Connection, ADODB.Recordset to access worksheet data with a simple "select * from [sheetname]" Recordset's command.

A typical scenario would be something like this:

Counter         cnTotal = 0;
ItemId          itemId;
ItemName        itemName;
AmountCur       price;
Filename        strFilename;
container       conSheets;
ExcelImportADO  xlImport;
;
strFilename = @"c:\import.xls";
xlImport    = new ExcelImportADO(strFilename);
try
{
    // open the first sheet by default
    if (!xlImport.openFile())
    {
        throw error(strfmt("Error opening Excel file '%1'", strFilename));
    }
    if (xlImport.getFieldsCount() < 3)
    {
        throw error(strfmt("Too few columns in the recordset: found %1, expected at least %2",
                           xlImport.getFieldsCount(), 3));
    }
    while (!xlImport.eof())
    {
        itemId   = xlImport.getFieldValue(1);
        itemName = xlImport.getFieldValue('ItemName');
        price    = xlImport.getFieldValue('ItemPrice', false);
        // process data...
        cnTotal++;
        xlImport.moveNext();
    }
    xlImport.finalize();
    Box::info(strfmt("%1 records read", cnTotal));
}
catch (Exception::Error)
{
    xlImport.finalize();
}

You can download the source code of the class at axaptapedia.com. I would like to thank Gustav and blokva for inspiration and very useful tips on improving the class.

Note: ADOX.Catalog returns table names (Excel sheet names) in an alphabetical order - not in the order they appear in a worksheet, and by default the class uses the first name returned by ADOX.Catalog!