Spreadsheet::XLSX and humongous spreadsheets
By Abhijit Menon-Sen <ams@toroid.org>
I've been using Spreadsheet::Read backed by Spreadsheet::ParseExcel and Spreadsheet::XLSX to process Excel spreadsheets, and the combination works reasonably well (despite various frustrating problems in retrieving dates).
Today, I wrote code with lots of diagnostics to parse a large spreadsheet I'd never seen before. The code ran properly for a while, and then complained of various inconsistencies in the data. After much digging, I realised that Spreadsheet::XLSX wasn't parsing columns beyond ZZ. LibreOffice showed me data that my Perl code wasn't seeing; but it, too, has an arbitrary 1024 column limit. I had to unzip the XLSX file and read the XML source to find the real number of columns: 1059.
Until today, I'd never imagined a spreadsheet that big either.
The fault lies with the following code in Spreadsheet/XLSX.pm, which assumes that the column name is two letters followed by a number:
foreach ($member_sheet -> contents =~ /(\<.*?\/?\>|.*?(?=\<))/g) { if (/^\<c r=\"([A-Z])([A-Z]?)(\d+)\"/) { $col = ord ($1) - 65; if ($2) { $col++; $col *= 26; $col += (ord ($2) - 65); }
Fortunately, it's not hard to extend the code to handle longer names:
foreach ($member_sheet -> contents =~ /(\<.*?\/?\>|.*?(?=\<))/g) { if (/^\<c r=\"([A-Z])([A-Z]*)(\d+)\"/) { $col = ord ($1) - 65; my $rest = $2; while (my $l = substr ($rest, 0, 1, '')) { $col++; $col *= 26; $col += (ord ($l) - 65); }
I sent the patch to Dmitry Ovsyenko, author of Spreadsheet::XLSX, but received no response.