Hi,
I need a way to split PDFs into individual pages within a Talend job so that I can process them further.
Does anybody have a good solution for this?
Thanks
In the meantime I've found a solution, so I thought I'd post it here in case someone needs it.
I've written a small routine:
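package routines;

import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;

// The class name is a placeholder; in Talend it has to match the name of the routine.
public class PdfSplitter {

    // Writes every page of the PDF at sourcePath to <directory>/1.pdf, <directory>/2.pdf, ...
    // PDDocument.load(File) is the PDFBox 2.x API.
    public static void splitPdf(String sourcePath, String directory) throws IOException {
        try (PDDocument document = PDDocument.load(new File(sourcePath))) {
            // By default the Splitter produces one document per page.
            List<PDDocument> pages = new Splitter().split(document);
            int i = 1;
            for (PDDocument page : pages) {
                // Close each page document after saving so it does not leak.
                try (PDDocument single = page) {
                    // new File(parent, child) works with or without a trailing separator.
                    single.save(new File(directory, i + ".pdf"));
                }
                i++;
            }
        }
    }
}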
It takes the given PDF and writes every page as a separate file (1.pdf, 2.pdf, ...) into the given directory.
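Since the pages are just numbered 1.pdf, 2.pdf, ..., splitting several PDFs into the same directory would overwrite earlier results, and the files do not sort in page order once there are more than nine. One possible way around that is to build the file name separately. This is only a sketch using the standard library; the names PageFileNames and pageFile and the baseName_NNN.pdf pattern are my own assumptions and not part of the routine above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

public class PageFileNames {

    // Hypothetical helper: returns <directory>/<baseName>_<page>.pdf,
    // with the page number zero-padded to three digits.
    public static Path pageFile(String directory, String baseName, int page) {
        // Locale.ROOT keeps the digits ASCII regardless of the JVM's default locale.
        String fileName = String.format(Locale.ROOT, "%s_%03d.pdf", baseName, page);
        // Paths.get joins the parts with the platform's separator.
        return Paths.get(directory, fileName);
    }

    public static void main(String[] args) {
        // Example values only; prints the path for page 1 of "invoice".
        System.out.println(pageFile("out", "invoice", 1));
    }
}
```

The resulting Path could then be turned into a File with toFile() and passed to save(...) in place of the plain page number.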