• Taming tasks with Python

    How would I even start? Where would I find the time? What about all the mistakes I’d make?

    I had been asked to insert and edit over 204 images in a 459-page Word document. All the images had to be exactly the same size and in exactly the same place.

    Manually it would take forever. There had to be a better way.

    Many office tasks are boring and repetitive. If only there were a way to automate the boring stuff. In this example, adding the images manually would have been time-consuming, error-prone, and just plain boring. In a departure from my usual blog posts on architectural photography, I thought I would include a post on some of the technical solutions I’ve worked on in the office.

    Episode #19 of the “Talk Python to Me” podcast featured Al Sweigart. His goal is to teach the Python programming language to students, office workers and admin staff so they can improve their work. His method teaches Python by automating common tasks. The site www.automatetheboringstuff.com is free to read online, allowing anyone to delve in and learn the Python programming language at their own pace.

    Python is simple and easy to learn, and its syntax emphasizes readability. The language reads much like English. Python is used by many of the world’s biggest software companies. It has also gained popularity as a first language for university students.
    Python can open, edit, save and delete files, and it can find text and add images to a document. This is just what I needed.

    For my project the task was to open the document, search for each mention of a .jpg file, then insert that image file into the paragraph text.

    To open the .docx file we use the python-docx library (imported as docx); the searching uses re, Python’s built-in regular expression module. Then the script simply loops through all runs within all paragraphs and inserts the relevant image from a folder.
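    The search step can be tried on its own before touching any Word file. This is only a minimal sketch, assuming image names of six digits followed by _00.jpg; the helper name find_image_name and the sample sentences are made up for illustration.

```python
import re

# Six digits followed by "_00.jpg"; the dot is escaped so it only matches a literal "."
IMAGE_NAME = re.compile(r"\d{6}_00\.jpg")


def find_image_name(text):
    """Return the first image file name mentioned in text, or None if there is none."""
    match = IMAGE_NAME.search(text)
    return match.group() if match else None


# Hypothetical run text, made up for illustration
print(find_image_name("See photo 123456_00.jpg for the north elevation."))
print(find_image_name("No image is mentioned here."))
```

    The first call prints the file name and the second prints None, which is the same test the script uses to decide whether a run needs a picture.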

    Is there anything more?

    Only one thing: the add_picture call is wrapped in a “try/except” block so that any failures can be captured in a list.

    Simple, it’s so simple. Overall it took about 30 minutes to write and test the script. Here’s the code.
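    The idea of collecting failures instead of stopping at the first one can be shown without a Word document at all. This is a rough sketch, not the script itself: insert_all and fake_insert are hypothetical names, and fake_insert merely stands in for the real picture-inserting call.

```python
def insert_all(names, insert):
    """Call insert(name) for every name and return the names that raised an error."""
    failed = []
    for name in names:
        try:
            insert(name)
        except Exception:
            # Remember the failure and carry on with the next image
            failed.append(name)
    return failed


# Hypothetical stand-in for the real insert step: only one image "exists"
available = {"123456_00.jpg"}


def fake_insert(name):
    if name not in available:
        raise FileNotFoundError(name)


print(insert_all(["123456_00.jpg", "654321_00.jpg"], fake_insert))
```

    Only the second name ends up in the printed list, so one missing image does not stop the other inserts.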

    – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – –
    # The .docx part of the script is an adaptation of code from https://automatetheboringstuff.com/chapter18/
    # The regex part of the script is an adaptation of code from https://automatetheboringstuff.com/chapter7/
    import re  # Python's built-in regular expression module
    import docx  # The python-docx library
    from docx.shared import Cm

    # testword.docx is stored in the same folder as the script.
    doc = docx.Document('testword.docx')
    faillist = []  # Collects the names of images that could not be inserted
    # Matches all filenames with a 6-digit prefix followed by _00.jpg
    imgname = re.compile(r'\d{6}_00\.jpg')
    # Print the number of paragraphs in the document to the console
    print(len(doc.paragraphs))
    for paragraph in doc.paragraphs:
        # Loop over the runs (blocks of text) in each paragraph
        for run in paragraph.runs:
            # Search the run's text for a pattern match
            imgret = imgname.search(run.text)
            if imgret is not None:
                # The run mentions an image, so look for a file with that name
                run.add_break()
                try:
                    # Insert the matching image from a local folder called "image_folder", 6 cm high
                    run.add_picture('image_folder/' + imgret.group(), height=Cm(6))
                except Exception:
                    # If the image cannot be inserted, add its name to a list so I can see any problems
                    faillist.append(imgret.group())
    print(faillist)
    # Save under a new name (the file name here is an assumption) so the original stays untouched
    doc.save('testword_with_images.docx')
    # print("This is awesome")
    # This is an optional print statement if you wish to celebrate your awesomeness.
    – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – –
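    Before running a script like this on a long document, it may help to check which of the mentioned images are actually in the folder. The following is a sketch under assumptions: the helper missing_images is hypothetical, it reuses the six-digit _00.jpg naming pattern, and the demo runs against a temporary folder rather than real project files.

```python
import re
import tempfile
from pathlib import Path

# Six digits followed by "_00.jpg", with the dot escaped
IMAGE_NAME = re.compile(r"\d{6}_00\.jpg")


def missing_images(texts, folder):
    """Return the sorted image names mentioned in texts that are not files in folder."""
    mentioned = set()
    for text in texts:
        mentioned.update(IMAGE_NAME.findall(text))
    return sorted(name for name in mentioned if not (Path(folder) / name).is_file())


# Demo with a temporary folder that holds one (empty) image file
with tempfile.TemporaryDirectory() as folder:
    (Path(folder) / "123456_00.jpg").touch()
    print(missing_images(["See 123456_00.jpg and 654321_00.jpg."], folder))
```

    With python-docx, the texts could be supplied as the text of each paragraph in the document, and the folder as "image_folder"; an empty result would suggest that every mentioned image is present.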

    In Python all this can be completed with relative ease. There is a library for opening .docx files, and a library for matching text. Fortunately the file name of each image is mentioned on each page.
    https://www.python.org/doc/essays/blurb/
    https://automatetheboringstuff.com/