How to create a nicely formatted Excel table from a pandas DataFrame using openpyxl

When I want to save the current state of a pandas DataFrame for “manual consumption”, I often write df.to_excel('foo.xlsx') within my IPython session or Jupyter Notebook. However, the default style does not look pretty and often needs manual adjustments (e.g. column widths) to be usable.

If you want to create custom reports from pandas, you therefore might want to to this styling programmatically. Package openpyxl does this job nicely. Here is a motivational example that shows you the basic ways of its API. First, have some random data:

import pandas as pd
from io import StringIO

df = pd.read_csv(StringIO("""\
               alpha      beta     gamma
2000-01-01 -0.173215  0.119209 -1.044236
2000-01-02 -0.861849 -2.104569 -0.494929
2000-01-03  1.071804  0.721555 -0.706771
2000-01-04 -1.039575  0.271860 -0.424972
2000-01-05  0.567020  0.276232 -1.087401
2000-01-06 -0.673690  0.113648 -1.478427
2000-01-07  0.524988  0.404705  0.577046
2000-01-08 -1.715002 -1.039268 -0.370647
"""), sep="\s+", parse_dates=True)

output_filename = 'pandas-to-excel.xlsx'

No customization

Just for comparison, a plain export of the dataframe to the first worksheet:

sheet_name = 'No customization'
df.to_excel(output_filename, sheet_name)

Little customisation

As the first column is not readable with the default column width, we increase it slightly. The unit roughly corresponds to Excel’s column width unit. On my machine, the resulting worksheets has a column width of 20.29…

with pd.ExcelWriter(
        output_filename,
        mode='a',  # append; default='w' (overwrite)
        engine='openpyxl') as xlsx:
    sheet_name = 'Little customization'
    df.to_excel(xlsx, sheet_name)
    
    # set index column width
    ws = xlsx.sheets[sheet_name]
    ws.column_dimensions['A'].width = 21

Full customisation

Motivational example of what you can do with some more specifications, namely:

  • Conditional formatting, here with a custom color scale using percentiles for lower and upper bound (10%-90%), while the midpoint is defined by value (0 is white). Symbols and color bars are possible as well, of course.
  • Number format, using the same syntax as in Excel’s cell properties dialogue. Here, the index column’s date includes the shortened weekday name (Mo-Su).
  • Applying a builtin named style by name (Headline 2) to the title row.
  • Creating and applying a custom named style (Index Style) including number format, font and alignment.

First, the imports and rule/style definitions:

from openpyxl.formatting.rule import ColorScaleRule
from openpyxl.styles import Alignment, Font, NamedStyle
from openpyxl.utils import get_column_letter

percentile_rule = ColorScaleRule(
    start_type='percentile',
    start_value=10,
    start_color='ffaaaa',  # red-ish
    mid_type='num',
    mid_value=0,
    mid_color='ffffff',  # white
    end_type='percentile',
    end_value=90,
    end_color='aaffaa')  # green-ish
    
# custom named style for the index
index_style = NamedStyle(
    name="Index Style",
    number_format='YYYY-MM-DD, DDD',
    font=Font(color='999999', italic=True),
    alignment=Alignment(horizontal='left'))

# pass keyword args as dictionary
writer_args = {
    'path': output_filename,
    'mode': 'a',
    'engine': 'openpyxl'}

With that, the actual writing looks like this:

with pd.ExcelWriter(**writer_args) as xlsx:
    sheet_name = 'Full customization'
    df.to_excel(xlsx, sheet_name)
    ws = xlsx.sheets[sheet_name]
    
    # cell ranges
    index_column = 'A'
    value_cells = 'B2:{col}{row}'.format(
        col=get_column_letter(ws.max_column),
        row=ws.max_row)
    title_row = '1'

    # index column width
    ws.column_dimensions[index_column].width = 21

    # color all value cells
    ws.conditional_formatting.add(value_cells, 
                                  percentile_rule)
    
    # for general styling, one has to iterate over
    # all cells individually
    for row in ws[value_cells]:
        for cell in row:
            cell.number_format = '0.00'
    
    # builtin or named styles can be applied by using
    # the style object or their name (shown below)
    for cell in ws[index_column]:
        cell.style = index_style
    
    # style title row last, so that headline style
    # wins over index style in top-left cell A1
    for cell in ws[title_row]:
        cell.style = 'Headline 2'

If you want the above in one script, try my gist pandas-to-excel.py. It creates a single Excel file with the three appropriately named spreadsheets.


Posted

in

,

by

Tags: