Solved: split a pdf into single pages - Qlik Community

Anonymous · 2019-10-15

Hi,

I need a way to split PDFs into single pages within a Talend job so that I can process them further.

Does anybody have a good solution for this?

Thanks

Anonymous · 2019-10-15

In the meantime I've found a solution, so I thought I'd post it here in case someone needs it.

I've written a small routine:

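package routines;

import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;

// The class name is an assumption; use the name of your own Talend routine.
public class PdfUtils {

    // Splits the PDF at sourcePath into one file per page: <directory>/1.pdf, 2.pdf, ...
    // Uses the PDFBox 2.x API (PDDocument.load).
    public static void splitPdf(String sourcePath, String directory) throws IOException {
        // try-with-resources closes the source document even if a save fails
        try (PDDocument document = PDDocument.load(new File(sourcePath))) {
            Splitter splitter = new Splitter();
            List<PDDocument> pages = splitter.split(document);

            int i = 1;
            for (PDDocument p : pages) {
                // each single-page document has to be closed as well
                try (PDDocument page = p) {
                    // new File(directory, name) adds the path separator if it is missing
                    page.save(new File(directory, i + ".pdf"));
                }
                i++;
            }
        }
    }
}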

It takes the given PDF and saves every page as its own file (1.pdf, 2.pdf, ...) in the given directory.
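One thing to watch if the pages are processed further: names like 1.pdf, 2.pdf, 10.pdf do not sort in page order when a later step lists the directory alphabetically. Below is a minimal sketch of a naming helper that zero-pads the page number and joins it to the directory. PagePaths, pageFileName and pagePath are hypothetical names of my own and not part of the routine above or of PDFBox; it only uses the standard library.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical helper, not part of the original routine or of PDFBox.
final class PagePaths {

    private PagePaths() {
    }

    // File name for a 1-based page number, zero-padded to the number of
    // digits in totalPages, so that alphabetical order equals page order.
    static String pageFileName(int pageNumber, int totalPages) {
        if (pageNumber < 1 || pageNumber > totalPages) {
            throw new IllegalArgumentException(
                    "page " + pageNumber + " is outside 1.." + totalPages);
        }
        int width = String.valueOf(totalPages).length();
        // Locale.ROOT keeps the digits ASCII regardless of the default locale
        return String.format(Locale.ROOT, "%0" + width + "d.pdf", pageNumber);
    }

    // Full output path. Paths.get inserts the separator whether or not
    // the directory string already ends with one.
    static Path pagePath(String directory, int pageNumber, int totalPages) {
        return Paths.get(directory, pageFileName(pageNumber, totalPages));
    }
}
```

Inside the loop of the routine, the save call could then take PagePaths.pagePath(directory, i, pageCount).toFile() as its argument, where pageCount is the size of the list returned by splitter.split(document).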

Talend Data Integration

v6.x

v7.x