Convert HTML to PDF by iText & XMLWorker with Polish Letters




I've got a string containing some example HTML. It converts really well, but when I add Polish letters, they disappear from the PDF. I tried something like this:


byte[] byteArray = str.getBytes(Charset.forName("UTF-8"));
ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(byteArray);
worker.parseXHtml(pdfWriter, document, byteArrayInputStream, Charset.forName("UTF-8"));



but it doesn't change anything. How can I get the Polish letters to show up?



EDIT: It still doesn't work.



Code:


document.open();

XMLWorkerHelper worker = XMLWorkerHelper.getInstance();
String str = "<html><head></head><body style=\"font-size:12.0pt; font-family:Times New Roman\">"+
"<a href='http://www.rgagnon.com/howto.html'><b>Real's HowTo</b></a>" +
"<h1>Show your support</h1>" +
"<p>It DOES cost a lot to produce this site - in ISP storage and transfer fees</p>" +
"<p>TEST POLSKICH ZNAKÓW: ĄąćĆÓóŁłŻżŹźĘę</p>" +
"<hr/>" +
"<p>the huge amounts of time it takes for one person to design and write the actual content.</p>" +
"<p>If you feel that effort has been useful to you, perhaps you will consider giving something back?</p>" +
"<p>Donate using PayPalŽ</p>" +
"<p>Contributions via PayPal are accepted in any amount</p>" +
"<p><br/><table border='1'><tr><td>Java HowTo</td></tr><tr>" +
"<td style='background-color:red;'>Javascript HowTo</td></tr>" +
"<tr><td>Powerbuilder HowTo</td></tr></table></p>" +
"</body></html>";

byte[] byteArray = str.getBytes(Charset.forName("UTF-8"));
ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(byteArray);
worker.parseXHtml(pdfWriter, document, byteArrayInputStream, Charset.forName("UTF-8"));

document.close();



Maybe someone can spot the bug.
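The `getBytes`/`parseXHtml` step above already uses UTF-8 on both sides. As a quick sanity check using only the JDK (the class name `Utf8RoundTrip` is hypothetical), the sketch below shows that a matched UTF-8 round trip keeps the Polish letters intact, while decoding the same bytes with a different charset mangles them. If the round trip is fine, the letters are most likely being lost at a later stage, for example when a font is chosen, which is where the comments below end up.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf8RoundTrip {
    // The Polish letters from the question's test paragraph, written as Unicode escapes
    static final String POLISH =
            "\u0104\u0105\u0106\u0107\u00d3\u00f3\u0141\u0142\u017b\u017c\u0179\u017a\u0118\u0119";

    // Encode the text to bytes with one charset, then decode the bytes with another
    static String roundTrip(String s, Charset encode, Charset decode) {
        byte[] bytes = s.getBytes(encode);
        return new String(bytes, decode);
    }

    public static void main(String[] args) {
        // Same charset on both sides: prints true, the letters survive
        System.out.println(roundTrip(POLISH, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(POLISH));
        // UTF-8 bytes read back as ISO-8859-1: prints false, the letters are mangled
        System.out.println(roundTrip(POLISH, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1).equals(POLISH));
    }
}
```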





Maybe you are using a font that doesn't know how to draw Polish glyphs. Check your PDF (Document Properties > Fonts). Which fonts do you see? Do you see the Standard Type 1 font Helvetica? That font doesn't support Polish characters.
– Bruno Lowagie
Mar 17 '15 at 15:06
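A rough way to see why Helvetica fails, without iText: the standard Type 1 fonts are normally used with WinAnsi encoding, which Java exposes as the `windows-1252` charset. Treating that mapping as an assumption about iText's default font setup, the sketch below (class name `WinAnsiCheck` is hypothetical) uses `CharsetEncoder.canEncode` to list which Polish letters `windows-1252` cannot represent.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class WinAnsiCheck {
    // All Polish diacritic letters, upper and lower case
    static final String POLISH =
            "\u0104\u0105\u0106\u0107\u0118\u0119\u0141\u0142\u0143\u0144"
            + "\u00d3\u00f3\u015a\u015b\u0179\u017a\u017b\u017c";

    // Returns the characters of s that the named charset cannot encode
    static String unsupported(String s, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        StringBuilder missing = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (!encoder.canEncode(c)) {
                missing.append(c);
            }
        }
        return missing.toString();
    }

    public static void main(String[] args) {
        System.out.println(unsupported(POLISH, "windows-1252"));
    }
}
```

Only `Ó` and `ó` can be encoded; every other Polish diacritic is missing, which is consistent with the letters vanishing from a PDF that uses Helvetica.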





@BrunoLowagie Yes, sir, you're right: it's Helvetica (Type 1). So how can I change the font with the XMLWorker parser?
– KurdTt-
Mar 17 '15 at 15:15





There are different examples introducing fonts here: itextpdf.com/sandbox/xmlworker
– Bruno Lowagie
Mar 17 '15 at 15:33





I found something like this: itextpdf.com/sandbox/xmlworker/D06_ParseHtmlFonts, but I don't know how to get a font on Android. Should I use FontFactory.getFont("arial");? I'm not sure I understood it correctly. But thank you! I'll try it.
– KurdTt-
Mar 17 '15 at 15:45
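iText 5's `FontFactory.register(String path)` accepts the path of a font file, so on Android one option is to point it at a TrueType file that is actually present. The helper below only locates such a file. The candidate paths under `/system/fonts/` are assumptions that differ between devices and Android versions, and bundling your own TTF that covers Polish glyphs is likely more predictable. The class and method names are hypothetical.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class FontLocator {
    // Returns the first candidate path that exists as a regular file, if any
    static Optional<String> firstExisting(List<String> candidates) {
        for (String path : candidates) {
            if (new File(path).isFile()) {
                return Optional.of(path);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical candidates: actual font file names vary by device and Android version
        List<String> candidates = Arrays.asList(
                "/system/fonts/Roboto-Regular.ttf",
                "/system/fonts/DroidSans.ttf");
        System.out.println(firstExisting(candidates).orElse("no font found"));
    }
}
```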




2 Answers



I have taken your sample HTML and I have used it to create the ParseHtml2 example. The resulting PDF, html_2.pdf, looks like this:



[Screenshot of html_2.pdf]



At first sight, I don't see any issues with the Polish characters.



The code I used looks like this:


import java.io.ByteArrayInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorkerHelper;

public class ParseHtml2 {

    public void createPdf(String file) throws IOException, DocumentException {
        // step 1
        Document document = new Document();
        // step 2
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));
        // step 3
        document.open();
        // step 4: non-ASCII characters are written as Unicode escapes
        String str = "<html><head></head><body style=\"font-size:12.0pt; font-family:Times New Roman\">" +
            "<a href='http://www.rgagnon.com/howto.html'><b>Real's HowTo</b></a>" +
            "<h1>Show your support</h1>" +
            "<p>It DOES cost a lot to produce this site - in ISP storage and transfer fees</p>" +
            "<p>TEST POLSKICH ZNAK\u00d3W: \u0104\u0105\u0106\u0107\u00d3\u00f3\u0141\u0142\u0179\u017a\u017b\u017c\u017d\u017e\u0118\u0119</p>" +
            "<hr/>" +
            "<p>the huge amounts of time it takes for one person to design and write the actual content.</p>" +
            "<p>If you feel that effort has been useful to you, perhaps you will consider giving something back?</p>" +
            "<p>Donate using PayPal\u017d</p>" +
            "<p>Contributions via PayPal are accepted in any amount</p>" +
            "<p><br/><table border='1'><tr><td>Java HowTo</td></tr><tr>" +
            "<td style='background-color:red;'>Javascript HowTo</td></tr>" +
            "<tr><td>Powerbuilder HowTo</td></tr></table></p>" +
            "</body></html>";

        XMLWorkerHelper worker = XMLWorkerHelper.getInstance();
        InputStream is = new ByteArrayInputStream(str.getBytes(StandardCharsets.UTF_8));
        worker.parseXHtml(writer, document, is, StandardCharsets.UTF_8);
        // step 5
        document.close();
    }

    public static void main(String[] args) throws IOException, DocumentException {
        new ParseHtml2().createPdf("html_2.pdf");
    }
}



Note that your HTML defines Times New Roman as the font. It is essential that your OS has access to a font with that name; otherwise you'll still end up with Helvetica.
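The fallback behaviour can be modelled with a small resolver. This is a simplified, hypothetical illustration (class `FontFamilyFallback` and its method are invented for this sketch), not XMLWorker's actual font-lookup code: it takes a CSS `font-family` value, returns the first family that is registered (matched case-insensitively), and otherwise falls back to Helvetica.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class FontFamilyFallback {
    static final String DEFAULT_FONT = "Helvetica";

    // Returns the first family in a comma-separated CSS font-family list that is
    // registered; registeredLowerCase must hold lower-case family names.
    static String resolve(String cssFontFamily, Set<String> registeredLowerCase) {
        for (String raw : cssFontFamily.split(",")) {
            // Strip whitespace and optional surrounding quotes from each family name
            String name = raw.trim().replaceAll("^['\"]|['\"]$", "");
            if (registeredLowerCase.contains(name.toLowerCase(Locale.ROOT))) {
                return name;
            }
        }
        // Nothing matched: fall back to the default font
        return DEFAULT_FONT;
    }

    public static void main(String[] args) {
        Set<String> registered = new HashSet<>(Arrays.asList("arial", "dejavu sans"));
        System.out.println(resolve("Times New Roman", registered));          // Helvetica
        System.out.println(resolve("'Times New Roman', Arial", registered)); // Arial
    }
}
```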





Also be aware that using non-ASCII characters in source code is considered a crime against good taste. Source code is stored as a text file, but in which encoding? There is no guarantee that your file will be stored as UTF-8, no guarantee that a compiler will read it as UTF-8, no guarantee that a versioning system will accept UTF-8... Hence I replaced all non-ASCII characters with their Unicode escape sequences, which allows me to keep the source file in ASCII.
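The conversion itself is mechanical. A minimal sketch (the class name `AsciiEscaper` is hypothetical) that replaces every non-ASCII `char` with a Java-style `\uXXXX` escape in lowercase hex, matching the style used in the code above:

```java
public class AsciiEscaper {
    // Keeps ASCII characters as-is and turns every other char into a
    // backslash-u escape with four lowercase hex digits
    static String escapeNonAscii(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            if (c < 128) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints the text with the O-acute and the two A-ogoneks escaped
        System.out.println(escapeNonAscii("ZNAK\u00d3W: \u0104\u0105"));
    }
}
```

Characters outside the Basic Multilingual Plane come out as two escapes (a surrogate pair), which the Java compiler accepts. JDKs up to version 8 also shipped a `native2ascii` tool for the same job.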



I have taken Bruno's sample HTML and adapted that function for C# users. I am using PdfFileName as a property to get and set the file name.


public string PdfFileName { get; set; }

public void CreatePdf()
{
    // replace this code with the full name of the PDF you want to create
    PdfFileName = EU.Master_Data_Utility.obj.Get_Current_DateTimeInteger(_connFlag) + ".pdf";

    String str = "<html><head></head><body style=\"font-size:12.0pt; font-family:Times New Roman\">" +
        "<a href='http://www.rgagnon.com/howto.html'><b>Real's HowTo</b></a>" +
        "<h1>Show your support</h1>" +
        "<p>It DOES cost a lot to produce this site - in ISP storage and transfer fees</p>" +
        "<p>TEST POLSKICH ZNAKÓW: \u0104\u0105\u0106\u0107\u00d3\u00f3\u0141\u0142\u0179\u017a\u017b\u017c\u017d\u017e\u0118\u0119</p>" +
        "<hr/>" +
        "<p>the huge amounts of time it takes for one person to design and write the actual content.</p>" +
        "<p>If you feel that effort has been useful to you, perhaps you will consider giving something back?</p>" +
        "<p>Donate using PayPal\u017d</p>" +
        "<p>Contributions via PayPal are accepted in any amount</p>" +
        "<p><br/><table border='1'><tr><td>Java HowTo</td></tr><tr>" +
        "<td style='background-color:red;'>Javascript HowTo</td></tr>" +
        "<tr><td>Powerbuilder HowTo</td></tr></table></p>" +
        "</body></html>";

    StringReader sr = new StringReader(str);
    Document doc = new Document(PageSize.A4, 10f, 10f, 10f, 10f);
    PdfWriter pdfWriter = PdfWriter.GetInstance(doc, new FileStream(Server.MapPath(PdfFileName), FileMode.Create));
    doc.Open();
    XMLWorkerHelper.GetInstance().ParseXHtml(pdfWriter, doc, sr);
    doc.Close();
    // Created a new function to open the created file
    OpenPDFFile();
}

protected void OpenPDFFile()
{
    // Open the PDF file
    Process.Start(Server.MapPath(PdfFileName));
}



