Initial commit
orf committed May 24, 2015
0 parents commit 9c091e6
Showing 50 changed files with 2,397 additions and 0 deletions.
58 changes: 58 additions & 0 deletions .gitignore
@@ -0,0 +1,58 @@
# Created by .ignore support plugin (hsz.mobi)
### Python template
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]

# C extensions
*.so

# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
lib/
lib64/
parts/
sdist/
var/
*.egg-info/
.installed.cfg
*.egg

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.cache
nosetests.xml
coverage.xml

# Translations
*.mo
*.pot

# Django stuff:
*.log

# Sphinx documentation
docs/_build/

# PyBuilder
target/

HtmlToWord.old
2 changes: 2 additions & 0 deletions MANIFEST
@@ -0,0 +1,2 @@
# file GENERATED by distutils, do NOT edit
setup.py
57 changes: 57 additions & 0 deletions README.md
@@ -0,0 +1,57 @@
WordInserter
===
This module lets you insert HTML or Markdown into a Word document, and also lets you build Word documents
programmatically in pure Python. The API is simple to use:

``` python
from wordinserter import parse, render

operations = parse(html, parser="html") # or parser="markdown"
render(operations, document=document, constants=constants)
```

Inserting HTML or Markdown into a Word document is a two-step process: first the input is *parsed* into a sequence
of operations, which is then *rendered* into a Word document. This library currently only supports inserting through the
Word COM interface, which means it is Windows-specific at the moment.
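
The parse-then-render split can be illustrated with a toy sketch that needs neither Word nor Windows. Everything in it (`toy_parse`, `toy_render`, the `(kind, value)` tuples) is hypothetical and chosen for illustration only; it is not how wordinserter represents operations internally, and it only shows why separating the two steps keeps the input format independent of the output target.

```python
from html.parser import HTMLParser


class _OperationCollector(HTMLParser):
    """Hypothetical stand-in for a parser: flattens HTML into (kind, value) tuples."""

    def __init__(self):
        super().__init__()
        self.operations = []

    def handle_starttag(self, tag, attrs):
        self.operations.append(("open", tag))

    def handle_endtag(self, tag):
        self.operations.append(("close", tag))

    def handle_data(self, data):
        # Skip whitespace-only runs between tags.
        if data.strip():
            self.operations.append(("text", data))


def toy_parse(html):
    """Step 1: turn the input into a flat sequence of operations."""
    collector = _OperationCollector()
    collector.feed(html)
    collector.close()
    return collector.operations


def toy_render(operations):
    """Step 2: consume the operations. The 'document' here is just a string."""
    markers = {"b": "**", "i": "_"}  # tags without a marker render as nothing
    parts = []
    for kind, value in operations:
        parts.append(value if kind == "text" else markers.get(value, ""))
    return "".join(parts)


operations = toy_parse("<p>Hi <b>there</b></p>")
print(toy_render(operations))
```

Because the renderer only ever sees operations, a second parser (for Markdown, say) could feed the same renderer unchanged, which is the property the two-step design relies on.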

Below is a more complete example, including starting Word, that inserts a representation of the HTML
into a new Word document, including the image, caption and list.

``` python
from wordinserter import render, parse
from comtypes.client import CreateObject

# This opens Microsoft Word and creates a new document.
word = CreateObject("Word.Application")
word.Visible = True # Don't set this to True in production!
document = word.Documents.Add()
# comtypes generates this module once Word has been loaded, so import it after CreateObject.
from comtypes.gen import Word as constants

html = """
<h3>This is a title</h3>
<p><img src="http://placehold.it/150x150" alt="I go below the image as a caption"></p>
<p><i>This is <b>some</b> text</i> in a <a href="http://google.com">paragraph</a></p>
<ul>
<li>Boo! I am a <b>list</b></li>
</ul>
"""

# Parse the HTML into a list of operations then feed them into render.
operations = parse(html, parser="html")
render(operations, document=document, constants=constants)
```

What's with the constants part? WordInserter is agnostic to the COM library you use. Each library exposes the constant
values that WordInserter needs in a different way: the pywin32 library exposes them as win32com.client.constants,
whereas the comtypes library exposes them as a module that resides in comtypes.gen. Rather than guess which one you
are using, WordInserter requires you to pass the right one in explicitly.
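
As a rough sketch of what that choice looks like in practice, the helper below returns a Word application object together with the matching constants for either library. The function name `open_word` and its `backend` argument are made up for this example and are not part of wordinserter; the comtypes branch mirrors the example above, and the pywin32 branch assumes the usual `gencache.EnsureDispatch` route for populating `win32com.client.constants`.

```python
def open_word(backend):
    """Return (word_application, constants) for the chosen COM library.

    Hypothetical helper: imports are deferred so that only the selected
    library has to be installed.
    """
    if backend == "comtypes":
        from comtypes.client import CreateObject

        word = CreateObject("Word.Application")
        # The generated module is only importable once Word has been loaded.
        from comtypes.gen import Word as constants
    elif backend == "pywin32":
        from win32com.client import constants, gencache

        # EnsureDispatch builds the type library cache that fills `constants`.
        word = gencache.EnsureDispatch("Word.Application")
    else:
        raise ValueError("unknown COM backend: %r" % (backend,))
    return word, constants


# Usage on Windows with Word installed (not executed here):
#   word, constants = open_word("comtypes")
#   document = word.Documents.Add()
#   render(operations, document=document, constants=constants)
```

Either pair can then be handed to `render` through its `document` and `constants` arguments, so the rest of the calling code does not need to know which library is in use.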


### Install
Get it [from PyPI here](https://pypi.python.org/pypi/wordinserter). This has been built against Word 2010 and 2013; older
versions may produce different results.


## Supported Operations
WordInserter currently supports a range of different operations, including code blocks, font size/colors, images,
hyperlinks, numbered and bullet lists (
1 change: 1 addition & 0 deletions Tests/docs/BoldInLink.html
@@ -0,0 +1 @@
<p>(<a href="http://google.com"><strong>bold</strong></a>) This should not be bold <strong>But this should be</strong></p>
21 changes: 21 additions & 0 deletions Tests/docs/break.html
@@ -0,0 +1,21 @@
<p>Some Text</p>

<p>
TEXT YO<br/>
More break<br/>
This should not have a break in it<br/>
:)
</p>

<ul>
<li>83.3.136.74<br>
</li>
<li>1.100.136.75<br>
</li>
<li>83.2.1.76</li>
</ul>
<ul>
<li>www.paystobeprescot.co.uk</li>
</ul>

<p>sup doods</p>
79 changes: 79 additions & 0 deletions Tests/docs/complex_document.html
@@ -0,0 +1,79 @@
<h1>Heading 1</h1>
<h2>Heading 2</h2>
<h3>Heading 3</h3>
<h4>Heading 4</h4>

<p>This is a test document showing what HtmlToWord can do. I hope this doesn't break.</p>

<p><b>Bold Text.</b> <i>Italic Text.</i> <b><i>Mix</i>ed</b><i> <b>St</b>yles</i><b><i>!</i></b></p>

<ul>
<li style="font-size: 20px;">ul tags are nested within li tags in this example</li>
<ul>
<li>I'm a child of the previous li</li>
</ul>
</ul>

<ul>
<li><strong>Bullet lists</strong></li>
<ul>
<li>With Indents</li>
<ul>
<li>Lots of <strong>Indents</strong></li>
</ul>
<li>And back</li>
</ul>
</ul>

<div>
<ol>
<li>Ordered Lists</li>
<ol>
<li>With indents</li>
<ol>
<li>Ad more indents</li>
</ol>
</ol>
<li>Test2</li>
</ol>
<div>
<img src="https://www.google.co.uk/images/srpr/logo3w.png"
style="cursor: default; float: none; margin: 0px; "
alt="Images! This is the 'alt' attribute"><br>
</div>
</div>

<table id="table49885" border=1>
<tbody>
<tr>
<td class="">TABLES&nbsp;</td>

<td><b>with styles</b>&nbsp;</td>

<td><i><u>and stuff</u></i>&nbsp;</td>

<td>cool eh?&nbsp;</td>
</tr>

<tr>
<td>
<ul>
<li>We can have these here&nbsp;<br>
</li>
</ul></td>

<td class="">
<ol>
<li>&nbsp;and these<br>
</li>
<ol>
<li>here</li>
</ol>
</ol></td>

<td class="">&nbsp;<img src="https://www.google.co.uk/images/srpr/logo3w.png" style="cursor: default; "></td>

<td class="current">&nbsp;meh</td>
</tr>
</tbody>
</table>