I've been using
Spreadsheet::Read
backed by
Spreadsheet::ParseExcel
and
Spreadsheet::XLSX
to process Excel spreadsheets, and the combination works reasonably
well (despite various frustrating problems in retrieving dates).
Today, I wrote code with lots of diagnostics to parse a large
spreadsheet I'd never seen before. The code ran properly for a while,
and then complained of various inconsistencies in the data. After much
digging, I realised that Spreadsheet::XLSX wasn't parsing columns beyond
ZZ. LibreOffice showed me data that my Perl code wasn't seeing; but it,
too, has an arbitrary 1024 column limit. I had to unzip the XLSX file
and read the XML source to find the real number of columns: 1059.
Until today, I'd never imagined a spreadsheet that big either.
The fault lies with the following code in Spreadsheet/XLSX.pm, which
assumes that the column name is two letters followed by a number:
foreach ($member_sheet -> contents =~ /(\<.*?\/?\>|.*?(?=\<))/g) {
if (/^\<c r=\"([A-Z])([A-Z]?)(\d+)\"/) {
$col = ord ($1) - 65;
if ($2) {
$col++;
$col *= 26;
$col += (ord ($2) - 65);
}
Fortunately, it's not hard to extend the code to handle longer names:
foreach ($member_sheet -> contents =~ /(\<.*?\/?\>|.*?(?=\<))/g) {
if (/^\<c r=\"([A-Z])([A-Z]*)(\d+)\"/) {
$col = ord ($1) - 65;
my $rest = $2;
while (my $l = substr ($rest, 0, 1, '')) {
$col++;
$col *= 26;
$col += (ord ($l) - 65);
}
I sent
the patch to Dmitry Ovsyenko,
author of Spreadsheet::XLSX, but received no response.