{"id":3903,"date":"2013-04-24T00:58:10","date_gmt":"2013-04-23T16:58:10","guid":{"rendered":"https:\/\/www.icocean.com\/wp\/?p=3903"},"modified":"2013-04-24T00:58:10","modified_gmt":"2013-04-23T16:58:10","slug":"pdf2htmlex-converts-pdf-to-html-without-losing-format","status":"publish","type":"post","link":"https:\/\/www.icocean.com\/blog\/?p=3903","title":{"rendered":"pdf2htmlEX: converts PDF to HTML without losing format"},"content":{"rendered":"<p>There are basically two types of PDF-to-HTML converters:<br \/>\nOne is roughly a PDF-to-text converter that applies a few predefined HTML formats.<br \/>\nThe other is a render-everything-as-images converter, which loses all text and generates huge files.<\/p>\n<p>pdf2htmlEX takes advantage of both approaches, retaining both text and styling.<br \/>\nFeatures:<br \/>\n1. Extracts and embeds fonts from the PDF<br \/>\n2. Optimizes for the web while keeping the rendering precise<br \/>\n3. Non-text objects are rendered as images<br \/>\n4. Single-file output mode, because I know you hate separate font\/image files<!--more--><\/p>\n<p>To compile &#038; install:<br \/>\ngrab a recent poppler (>=0.20.3), and make sure &#8216;--enable-xpdf-headers&#8217; is passed to configure<br \/>\ngrab the latest git version of fontforge https:\/\/github.com\/fontforge\/fontforge, because I submitted a few features\/bugs there for pdf2htmlEX<br \/>\nthe Boost C++ library. 
(See the detailed list of dependencies on the project home page)<br \/>\ncmake<br \/>\na GCC that supports C++11<\/p>\n<p>Any suggestions, forks\/stars on GitHub, and bug reports are appreciated.<\/p>\n<p>Demo comes first:<br \/>\nhttp:\/\/coolwanglu.github.com\/pdf2htmlEX\/demo\/demo.html<\/p>\n<p>Another (with CJK):<br \/>\nhttp:\/\/coolwanglu.github.com\/pdf2htmlEX\/demo\/chn.html<\/p>\n<p>Home page:<br \/>\nhttps:\/\/github.com\/coolwanglu\/pdf2htmlEX<\/p>\n<p>Ubuntu PPA<br \/>\nhttps:\/\/launchpad.net\/~coolwanglu\/+archive\/pdf2htmlex<\/p>\n","protected":false},"excerpt":{"rendered":"<p>There are basically two types of PDF-to-HTML converters:  <a href='https:\/\/www.icocean.com\/blog\/?p=3903' class='excerpt-more'>[&#8230;]<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4],"tags":[2952,2346,2427,1967,2429],"class_list":["post-3903","post","type-post","status-publish","format-standard","hentry","category-4","tag-convert","tag-html","tag-pdf","tag-1967","tag-2429","category-4-id","post-seq-1","post-parity-odd","meta-position-corners","fix"],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/www.icocean.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/3903","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.icocean.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.icocean.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.icocean.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.icocean.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=3903"}],"version-history":[{"count":0,"href":"https:\/\/www.icocean.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/3903\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.icocean.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2F
media&parent=3903"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.icocean.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=3903"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.icocean.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=3903"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}