PostScript & Ghostscript

 

PostScript is a page description language (PDL) developed by Adobe Systems. It is primarily used for printing documents on laser printers, but it can be adapted to produce images on other types of devices. PostScript has long been a standard in desktop publishing.

All major printer manufacturers make printers that contain, or can be loaded with, PostScript software, and PostScript software is also available on all major operating system platforms. A PostScript file can usually be identified by its “.ps” suffix.

Users can convert PostScript files to the Adobe Portable Document Format (PDF).
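
One way to do that conversion is with Ghostscript's pdfwrite device. The sketch below is only an illustration: the function names pdf_name_for and ps_to_pdf are made up for this example, and it assumes the gs executable is on your PATH.

```shell
#!/bin/sh
# Hypothetical helper: derive the output name, e.g. report.ps -> report.pdf
pdf_name_for() {
    printf '%s\n' "${1%.ps}.pdf"
}

# Hypothetical helper: convert one PostScript file to PDF
# using Ghostscript's pdfwrite device.
ps_to_pdf() {
    if ! command -v gs >/dev/null 2>&1; then
        echo "gs not found on PATH" >&2
        return 1
    fi
    gs -sDEVICE=pdfwrite -o "$(pdf_name_for "$1")" "$1"
}

# Usage: ps_to_pdf report.ps   (writes report.pdf next to the input)
```

Ghostscript also ships a ps2pdf wrapper script that does roughly the same thing in one step.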

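PostScript is often called an object-oriented language in the graphics sense of the term, meaning that it treats images, including fonts, as collections of geometrical objects rather than as bit maps. This is unrelated to object-oriented programming; as a programming language, PostScript is stack-based.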

The principal advantage of object-oriented (vector) graphics over bit-mapped graphics is that vector images take advantage of high-resolution output devices, whereas bit-mapped images do not. A PostScript drawing looks much better when printed on a 600-dpi printer than on a 300-dpi printer. A bit-mapped image looks the same on both printers.
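
To make "geometrical objects" a bit more concrete, here is a small sketch that writes a tiny PostScript program to a file. The file name hello.ps and the coordinates are arbitrary choices for this example. The page is described with a font, a text string, and a path; the coordinates are in points (1/72 inch) rather than device pixels, which is why the same file can be rendered at any resolution.

```shell
#!/bin/sh
# Write a minimal PostScript program to hello.ps (file name is an
# arbitrary choice for this example). The quoted 'EOF' keeps the shell
# from expanding anything inside the here-document.
cat > hello.ps <<'EOF'
%!PS
% Select a font, scale it to 24 points, and draw a line of text.
/Helvetica findfont 24 scalefont setfont
72 700 moveto
(Hello, PostScript) show
% Describe a 200 x 50 point rectangle as a path and outline it.
newpath
72 600 moveto
200 0 rlineto
0 50 rlineto
-200 0 rlineto
closepath
stroke
showpage
EOF

echo "wrote hello.ps"
```

Nothing in the file says how many dots the rectangle occupies; the interpreter works that out for whatever device it is rendering to.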

 

Ghostscript is an interpreter for PostScript and Portable Document Format (PDF) files. Ghostscript can read a PostScript or PDF file and display the results on the screen or convert them into a form you can print on a non-PostScript printer.
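
As a rough sketch of the "convert" case, the script below renders each page to a PNG image at a chosen resolution using the png16m device. The function names points_to_pixels and render_pages and the page-%03d.png output pattern are choices made for this example, and it assumes gs is on your PATH.

```shell
#!/bin/sh
# Hypothetical helper: convert a length in PostScript points (1/72 inch)
# to device pixels at a given resolution in dpi (integer arithmetic).
points_to_pixels() {
    echo $(( $1 * $2 / 72 ))
}

# Hypothetical helper: render every page of a PostScript or PDF file
# to PNG at the requested resolution (default 300 dpi).
render_pages() {
    input="$1"
    dpi="${2:-300}"
    if ! command -v gs >/dev/null 2>&1; then
        echo "gs not found on PATH" >&2
        return 1
    fi
    # Ghostscript expands %03d to the page number: page-001.png, page-002.png, ...
    gs -sDEVICE=png16m -r"$dpi" -o 'page-%03d.png' "$input"
}

# A US Letter page is 612 x 792 points; at 300 dpi that is 2550 x 3300 pixels.
points_to_pixels 612 300
points_to_pixels 792 300
```

Calling render_pages input.pdf 600 would produce images with twice as many pixels in each direction from the same input file.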

Text extraction from a PDF file with Ghostscript

Windows:
gswin64c -sDEVICE=txtwrite -o output.txt input.pdf
Linux:
gs -sDEVICE=txtwrite -o output.txt input.pdf
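
If you run the same script on both systems, a small wrapper can pick the executable name for you. This is only a sketch: gs_binary and extract_text are made-up names, and it assumes that a Windows shell such as Git Bash or MSYS2 reports a uname starting with MINGW or MSYS.

```shell
#!/bin/sh
# Hypothetical helper: choose the Ghostscript executable name.
# Takes an optional system name (defaults to `uname -s`).
# Assumption: Git Bash / MSYS2 on Windows report MINGW* or MSYS*.
gs_binary() {
    case "${1:-$(uname -s)}" in
        MINGW*|MSYS*) echo gswin64c ;;
        *)            echo gs ;;
    esac
}

# Hypothetical helper: extract text from a PDF with the txtwrite device.
# Second argument is the output file (defaults to output.txt).
extract_text() {
    gs_bin="$(gs_binary)"
    if ! command -v "$gs_bin" >/dev/null 2>&1; then
        echo "Ghostscript ($gs_bin) not found on PATH" >&2
        return 1
    fi
    "$gs_bin" -sDEVICE=txtwrite -o "${2:-output.txt}" "$1"
}

# Usage: extract_text input.pdf output.txt
```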
