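I can see the issue here. Every PDF document is effectively a piece of code: each page's content is a stream of drawing operators written in a syntax derived from PostScript (PDF is not PostScript itself, but it borrows PostScript's imaging model and operator style). To get to each paragraph of text and each embedded image of text, you have to parse the ...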
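As a rough illustration of what that parsing involves, here is a minimal Python sketch that tokenizes a tiny, uncompressed page content stream and collects the strings drawn by the `Tj` text operator. The sample stream and the `extract_text_ops` function are hypothetical, written for this example only. Real PDFs usually compress their content streams and also use `TJ` arrays, hex strings, escape sequences, and font encodings, none of which this sketch handles.

```python
import re

# Hypothetical, hand-written page content stream (uncompressed) for illustration.
SAMPLE_STREAM = b"""
BT
/F1 12 Tf
72 720 Td
(Hello, PDF) Tj
0 -14 Td
(Second line) Tj
ET
"""

# Token kinds: literal strings in parentheses (no nesting; escapes kept raw),
# names like /F1, numbers, and bare alphabetic operators like Tj or Td.
TOKEN_RE = re.compile(
    rb"\((?:[^()\\]|\\.)*\)"      # literal string
    rb"|/[^\s/()\[\]<>]+"         # name
    rb"|[-+]?\d*\.?\d+"           # number
    rb"|[A-Za-z]+"                # operator
)


def extract_text_ops(stream: bytes) -> list[str]:
    """Return the string operand of each Tj operator, in stream order."""
    operands: list[bytes] = []
    texts: list[str] = []
    for tok in TOKEN_RE.findall(stream):
        if tok.startswith(b"("):
            operands.append(tok[1:-1])  # strip the surrounding parentheses
        elif tok[:1].isalpha():
            # An operator: Tj shows the preceding string operand.
            if tok == b"Tj" and operands:
                texts.append(operands[-1].decode("latin-1"))
            operands.clear()  # every operator consumes its operands
        else:
            operands.append(tok)  # a number or name operand
    return texts


if __name__ == "__main__":
    print(extract_text_ops(SAMPLE_STREAM))  # ['Hello, PDF', 'Second line']
```

Because operands come before their operator (postfix order, as in PostScript), the parser simply buffers operands until it meets an operator and then decides what to do with them.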