pdf2htmlEX: converts PDF to HTML without losing formatting

ocean

11 years ago

There are basically two types of PDF-to-HTML converters:
One is essentially a PDF-to-text converter that applies a few predefined formats in HTML.
The other renders everything as images, which loses all the text and generates huge files.

But pdf2htmlEX takes advantage of both approaches, retaining both text and styling.
Features:
1. Extracts and embeds fonts from the PDF
2. Optimizes the output for the web while keeping the rendering precise
3. Renders non-text objects as images
4. Single-file output mode (I know you hate separate font/image files)

To compile and install, you need:
a recent poppler (>=0.20.3); make sure '--enable-xpdf-headers' is passed to configure
the latest git version of fontforge (https://github.com/fontforge/fontforge), because I submitted a few features and bug fixes to it for pdf2htmlEX
the boost C++ library (see the project home page for the detailed list of dependencies)
cmake
a GCC that supports C++11
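
A version requirement like the poppler minimum (>=0.20.3) has to be compared number by number, not as a plain string (as strings, "0.9.9" would sort after "0.20.3"). Here is a minimal sketch of such a check. It is not part of pdf2htmlEX or its build scripts, and the helper names parse_version and meets_minimum are made up for illustration:

```python
import re


def parse_version(text):
    """Turn a string like '0.20.3' into the tuple (0, 20, 3).

    Anything after a '-' (for example a '-git' suffix) is ignored.
    """
    numeric_part = text.split("-")[0]
    return tuple(int(p) for p in re.findall(r"\d+", numeric_part))


def meets_minimum(installed, minimum="0.20.3"):
    """Return True if the installed version is at least the minimum.

    The default minimum is the poppler version named in the post.
    """
    a = parse_version(installed)
    b = parse_version(minimum)
    # Pad the shorter tuple with zeros so '0.21' compares like '0.21.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    for candidate in ("0.20.2", "0.20.3", "0.21", "0.9.9"):
        print(candidate, meets_minimum(candidate))
```

The installed version string has to come from somewhere. If pkg-config is available and poppler installed its pkg-config file, "pkg-config --modversion poppler" is one likely source, but treat that as an assumption about your system rather than a documented step.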

Any suggestions, forks or stars on GitHub, and bug reports are appreciated.

Demo comes first:
http://coolwanglu.github.com/pdf2htmlEX/demo/demo.html

Another (with CJK):
http://coolwanglu.github.com/pdf2htmlEX/demo/chn.html

Home page:
https://github.com/coolwanglu/pdf2htmlEX

Ubuntu PPA:
https://launchpad.net/~coolwanglu/+archive/pdf2htmlex