Extract the HTML from an email in mbox format
Posted on 2017-08-07 in Tips and tricks
Here is a small script that extracts the HTML from emails stored in the mbox format. For the script to work, the HTML code must be in the body of the message (recommended) or in its first attachment.
By default, the script converts every .mbox file in the current folder. You can also pass the path to another folder as the first argument. The HTML is saved in the same folder as the original file, under the same name with the .html extension.
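#!/usr/bin/env python3

import sys
from glob import glob
from mailbox import mboxMessage
from os.path import join, splitext

# Folder to scan: the current folder, or the first command-line argument.
folder = '.'
if len(sys.argv) > 1:
    folder = sys.argv[1]

for mbox_path in glob(join(folder, '*.mbox')):
    with open(mbox_path, 'r') as mail_file:
        message = mboxMessage(mail_file.read())

    content = message.get_payload()
    if isinstance(content, str):
        # Single-part message: the body itself holds the HTML.
        html = message.get_payload(decode=True).decode('utf-8')
    else:
        # Multipart message: take the first part.
        html = content[0].get_payload(decode=True).decode('utf-8')

    # The output is written as UTF-8, so update the charset declared in the HTML.
    html = html.replace('charset=iso-8859-1', 'charset=utf-8')

    # Save next to the original file, swapping the extension for .html.
    file_name, _ = splitext(mbox_path)
    with open(f'{file_name}.html', 'w', encoding='utf-8') as html_file:
        html_file.write(html)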
The script is also available on GitHub: https://github.com/Jenselme/dot-files-shell/blob/master/bin/extract-html-email.py
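The script expects the HTML in the body or in the first part of the message, and it always decodes it as UTF-8. If your emails do not follow that layout, a possible variation is to walk through every part and pick the first one declared as text/html, decoding it with the charset that part announces. This is only a sketch, not part of the original script: the `find_html` helper, the `SAMPLE` message and the UTF-8 fallback are assumptions made for the illustration.

```python
from email import message_from_string
from email.message import Message
from typing import Optional


def find_html(message: Message) -> Optional[str]:
    """Return the first text/html part of `message` as a str, or None."""
    # walk() yields the message itself, then every nested part, depth first.
    for part in message.walk():
        if part.get_content_type() != 'text/html':
            continue
        # decode=True undoes base64 / quoted-printable and returns bytes.
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        # Use the charset declared by the part; UTF-8 is an assumed fallback.
        charset = part.get_content_charset() or 'utf-8'
        return payload.decode(charset, errors='replace')
    return None


# Hypothetical message, written by hand for the example.
SAMPLE = (
    'From sender@example.com Mon Aug  7 10:00:00 2017\n'
    'From: sender@example.com\n'
    'Subject: demo\n'
    'MIME-Version: 1.0\n'
    'Content-Type: multipart/alternative; boundary="sep"\n'
    '\n'
    '--sep\n'
    'Content-Type: text/plain; charset="utf-8"\n'
    '\n'
    'plain text\n'
    '--sep\n'
    'Content-Type: text/html; charset="iso-8859-1"\n'
    'Content-Transfer-Encoding: quoted-printable\n'
    '\n'
    '<p>caf=E9</p>\n'
    '--sep--\n'
)

if __name__ == '__main__':
    print(find_html(message_from_string(SAMPLE)))
```

Since `mboxMessage` is a subclass of `email.message.Message`, the helper should also accept the `message` object built in the script. It returns `None` when no HTML part is found, so the caller can skip that file instead of crashing.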