Excel Spreadsheet Transformations
Today I had an request to transform data that came from a printed report into usable Excel data. The data had been printed to PDF, so a PDF to Excel converter tool did the bulk of the work. However, even after being placed in Excel, there were two remaining requirements to achieve this goal. The first was to remove the recurring headers and footers of the report. The second was to reassemble each reported item's three lines of data back into to a single coherent row of data.
To do the initial transformation (removing the headers) I used the following VB for Applications code:
Sub RemoveHeaders() Dim CellRow As Integer, CellCol As Integer CellRow = 1 CellCol = 1 Dim eof As Integer eof = 0 Do Until eof = 1 Select Case Cells(CellRow, CellCol).Value Case "EOF" 'We have found the manually inserted EOF. Stop. Exit Do Case "MIGBCL01" 'We have found the page header tag. 'The next two lines are not actually used in the final run, 'but were mearly a visual confirmation that the targeted lines were accurate. 'Set myRange = range(Cells(CellRow, CellCol), Cells(CellRow + 13, CellCol)) 'myRange.Select 'Remove the row and the 13 rows after it. For i = CellRow To CellRow + 13 rows(CellRow).Delete Next Case Else 'We have an ordinary row, continue CellRow = CellRow + 1 End Select Loop Cells(CellRow, CellCol).Select 'Select EOF, we are done. End Sub
This is a very simple routine, but hopefully others will find the ideas in it helpful when confronted with data that is oddly formatted. There were a few manually performed steps: first was the insertion of the EOF to stop the routine and the second was manually handling the totals at the bottom of the report. However, those changes only took seconds to perform: the bulk of the work was removing the hundreds of page breaks. The code completed in under a minute (and would have been faster if Application.ScreenUpdating had been set to false for the actual run).
After getting the headers out of the way, it was time to transform the data itself. First, I created extra columns to move the data into:
Sub createNewColumns() 'This routine creates two extra columns for each existing 'column. This will allow dumping the data from the rows 'below a main row into the column. Dim i As Integer For i = 11 To 2 Step -1 'We have 11 columns, 'we wish to insert two more for each to "flatten" the data. Columns(i).Insert Columns(i).Insert Next End Sub
Then we needed to move the data itself. Here we loop over the range, migrating the data from the old rows into the new columns.
Sub moveData() 'Move through the data grid, tranferring data from columns 'into a single row. Dim startRow As Integer 'The first row of a data group. Dim columnNumber As Integer 'The column to work in Dim dataRow As Integer 'The subdata row to work with For startRow = 13 To 3249 Step 4 'Hard coded start, end and skip length for run. For columnNumber = 1 To 31 Step 3 'Gap is 4, only 3 rows of data, one blank For dataRow = 1 To 2 'Simply 1,2 to get at the rows below the main. Cells(startRow, columnNumber + dataRow) = _ Cells(startRow + dataRow, columnNumber) 'Move from rows to columns Next dataRow Next columnNumber Next startRow End Sub
Finally we remove the extra data rows, which are no longer necessary with the data in the new columns
Sub removeExtraRows() 'Remove the extra rows now that data is in single rows. Dim CellRow As Integer, i As Integer CellRow = 13 'Start on first data line Do If Cells(CellRow, 1) = "EOF" Then Exit Do 'We are done. End If CellRow = CellRow + 1 For i = 1 To 3 'three extra rows each. rows(CellRow).Delete Next Loop Cells(CellRow, 1).Select 'We are done, select EOF End Sub
With these steps performed, I formatted a few columns to improve data display and forwarded the transformed spreadsheet on to the end user. I have not turned this into a generic tool because each situation will be different and require a lot of cusomization, but hopefully these code snippets will help someone looking for a template on how to manipluate Excel spreadsheets.