Excel Spreadsheet Transformations

Today I had an request to transform data that came from a printed report into usable Excel data. The data had been printed to PDF, so a PDF to Excel converter tool did the bulk of the work. However, even after being placed in Excel, there were two remaining requirements to achieve this goal. The first was to remove the recurring headers and footers of the report. The second was to reassemble each reported item's three lines of data back into to a single coherent row of data.

To do the initial transformation (removing the headers) I used the following VB for Applications code:

Sub RemoveHeaders()
Dim CellRow As Integer, CellCol As Integer

CellRow = 1
CellCol = 1
Dim eof As Integer
eof = 0
Do Until eof = 1
    Select Case Cells(CellRow, CellCol).Value
    Case "EOF" 'We have found the manually inserted EOF. Stop.
        Exit Do
    Case "MIGBCL01" 'We have found the page header tag. 
        'The next two lines are not actually used in the final run, 
        'but were mearly a visual confirmation that the targeted lines were accurate.
        'Set myRange = range(Cells(CellRow, CellCol), Cells(CellRow + 13, CellCol))
        'myRange.Select

        'Remove the row and the 13 rows after it.
        For i = CellRow To CellRow + 13
            rows(CellRow).Delete
        Next
    Case Else 'We have an ordinary row, continue
        CellRow = CellRow + 1
    End Select
Loop
Cells(CellRow, CellCol).Select 'Select EOF, we are done.

End Sub

This is a very simple routine, but hopefully others will find the ideas in it helpful when confronted with data that is oddly formatted. There were a few manually performed steps: first was the insertion of the EOF to stop the routine and the second was manually handling the totals at the bottom of the report. However, those changes only took seconds to perform: the bulk of the work was removing the hundreds of page breaks. The code completed in under a minute (and would have been faster if Application.ScreenUpdating had been set to false for the actual run).

After getting the headers out of the way, it was time to transform the data itself. First, I created extra columns to move the data into:

Sub createNewColumns()
'This routine creates two extra columns for each existing
'column. This will allow dumping the data from the rows
'below a main row into the column.
Dim i As Integer
For i = 11 To 2 Step -1 'We have 11 columns,
    'we wish to insert two more for each to "flatten" the data.
    Columns(i).Insert
    Columns(i).Insert
Next
End Sub

Then we needed to move the data itself. Here we loop over the range, migrating the data from the old rows into the new columns.

Sub moveData()
'Move through the data grid, tranferring data from columns
'into a single row.

Dim startRow As Integer 'The first row of a data group.
Dim columnNumber As Integer 'The column to work in
Dim dataRow As Integer 'The subdata row to work with
For startRow = 13 To 3249 Step 4 'Hard coded start, end and skip length for run.
    For columnNumber = 1 To 31 Step 3 'Gap is 4, only 3 rows of data, one blank
        For dataRow = 1 To 2 'Simply 1,2 to get at the rows below the main.
            Cells(startRow, columnNumber + dataRow) = _
                Cells(startRow + dataRow, columnNumber) 'Move from rows to columns
        Next dataRow
    Next columnNumber
Next startRow
End Sub

Finally we remove the extra data rows, which are no longer necessary with the data in the new columns

Sub removeExtraRows()
'Remove the extra rows now that data is in single rows.
Dim CellRow As Integer, i As Integer
CellRow = 13 'Start on first data line
Do
    If Cells(CellRow, 1) = "EOF" Then
        Exit Do 'We are done.
    End If
    
    CellRow = CellRow + 1
    For i = 1 To 3 'three extra rows each.
        rows(CellRow).Delete
    Next
Loop
Cells(CellRow, 1).Select 'We are done, select EOF
End Sub

With these steps performed, I formatted a few columns to improve data display and forwarded the transformed spreadsheet on to the end user. I have not turned this into a generic tool because each situation will be different and require a lot of cusomization, but hopefully these code snippets will help someone looking for a template on how to manipluate Excel spreadsheets.

Category: