Package org.apache.pdfbox.util
Class PDFStreamEngine
- java.lang.Object
-
- org.apache.pdfbox.util.PDFStreamEngine
-
- Direct Known Subclasses:
PageDrawer, PDFImageWriter, PDFMarkedContentExtractor, PDFTextStripper, Type3StreamParser
public class PDFStreamEngine extends java.lang.Object
This class runs through a PDF content stream, executes the operators it encounters, and provides a callback interface for clients that want to act on the stream's contents. See the PDFTextStripper class for an example of how to use this class.
- Version:
- $Revision: 1.38 $
- Author:
- Ben Litchfield
-
-
Constructor Summary
Constructor Description
PDFStreamEngine()
Constructor.
PDFStreamEngine(java.util.Properties properties)
Constructor with engine properties.
-
Method Summary
Modifier and Type Method Description
java.util.Map<java.lang.String,PDColorSpace>
getColorSpaces()
PDPage
getCurrentPage()
Get the current page that is being processed.
java.util.Map<java.lang.String,PDFont>
getFonts()
java.util.Stack<PDGraphicsState>
getGraphicsStack()
PDGraphicsState
getGraphicsState()
java.util.Map<java.lang.String,PDExtendedGraphicsState>
getGraphicsStates()
PDResources
getResources()
Matrix
getTextLineMatrix()
Matrix
getTextMatrix()
int
getTotalCharCnt()
Get the total number of characters in the doc (including ones that could not be mapped).
int
getValidCharCnt()
Get the total number of valid characters in the doc that could be decoded in processEncodedText().
java.util.Map<java.lang.String,PDXObject>
getXObjects()
protected java.lang.String
inspectFontEncoding(java.lang.String str)
A method provided as an event interface to allow a subclass to perform some specific functionality on the string encoded by a glyph.
boolean
isForceParsing()
Indicates if force parsing is activated.
void
processEncodedText(byte[] string)
Process encoded text from the PDF Stream.
void
processOperator(java.lang.String operation, java.util.List<COSBase> arguments)
This is used to handle an operation.
protected void
processOperator(PDFOperator operator, java.util.List<COSBase> arguments)
This is used to handle an operation.
void
processStream(PDPage aPage, PDResources resources, COSStream cosStream)
This will process the contents of the stream.
void
processSubStream(PDPage aPage, PDResources resources, COSStream cosStream)
Process a sub stream of the current stream.
protected void
processTextPosition(TextPosition text)
A method provided as an event interface to allow a subclass to perform some specific functionality when text needs to be processed.
void
registerOperatorProcessor(java.lang.String operator, OperatorProcessor op)
Register a custom operator processor with the engine.
void
resetEngine()
This method must be called between processing documents.
void
setColorSpaces(java.util.Map<java.lang.String,PDColorSpace> value)
void
setFonts(java.util.Map<java.lang.String,PDFont> value)
void
setForceParsing(boolean forceParsingValue)
Enable/Disable force parsing.
void
setGraphicsStack(java.util.Stack<PDGraphicsState> value)
void
setGraphicsState(PDGraphicsState value)
void
setGraphicsStates(java.util.Map<java.lang.String,PDExtendedGraphicsState> value)
void
setTextLineMatrix(Matrix value)
void
setTextMatrix(Matrix value)
-
-
-
Constructor Detail
-
PDFStreamEngine
public PDFStreamEngine()
Constructor.
-
PDFStreamEngine
public PDFStreamEngine(java.util.Properties properties) throws java.io.IOException
Constructor with engine properties. The property keys are all PDF operators, the values are class names used to execute those operators. An empty value means that the operator will be silently ignored.
- Parameters:
properties
- The engine properties.
- Throws:
java.io.IOException
- If there is an error setting the engine properties.
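The operator-to-class mapping described above can be pictured with a small self-contained model. The sketch below does not use PDFBox: `Handler`, `EchoHandler`, and `OperatorTable` are hypothetical names invented for illustration, and the real engine may resolve and instantiate its processors differently. It only shows the idea of keys as operators, values as class names, and empty values as silently ignored operators.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

// Hypothetical stand-in for an operator processor; not a PDFBox type.
interface Handler {
    String handle(String operator);
}

// Hypothetical handler that just reports which operator it was given.
class EchoHandler implements Handler {
    public String handle(String operator) {
        return "handled " + operator;
    }
}

// Illustrative model: builds an operator table from engine-style properties.
class OperatorTable {
    private final Map<String, Handler> handlers = new HashMap<String, Handler>();
    private final Set<String> ignored = new HashSet<String>();

    OperatorTable(Properties properties) {
        for (String operator : properties.stringPropertyNames()) {
            String className = properties.getProperty(operator).trim();
            if (className.isEmpty()) {
                // Empty value: remember the operator so it is skipped silently.
                ignored.add(operator);
                continue;
            }
            try {
                // Non-empty value: treat it as a class name and instantiate it.
                Object instance = Class.forName(className)
                        .getDeclaredConstructor().newInstance();
                handlers.put(operator, (Handler) instance);
            } catch (Exception e) {
                // The documented constructor declares IOException for setup errors;
                // this sketch uses an unchecked exception to stay short.
                throw new IllegalArgumentException(
                        "Cannot load handler for operator " + operator, e);
            }
        }
    }

    // True when the operator was configured with an empty value.
    boolean isIgnored(String operator) {
        return ignored.contains(operator);
    }

    // Returns the handler's result, or null when no handler is registered.
    String dispatch(String operator) {
        Handler handler = handlers.get(operator);
        return handler == null ? null : handler.handle(operator);
    }
}
```

In this model, a table built from the entries `Tj` mapped to the `EchoHandler` class name and `BMC` mapped to an empty string would dispatch `Tj` to the handler and report `BMC` as ignored.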
-
-
Method Detail
-
isForceParsing
public boolean isForceParsing()
Indicates if force parsing is activated.
- Returns:
- true if force parsing is active
-
setForceParsing
public void setForceParsing(boolean forceParsingValue)
Enable/Disable force parsing.
- Parameters:
forceParsingValue
- true activates force parsing
-
registerOperatorProcessor
public void registerOperatorProcessor(java.lang.String operator, OperatorProcessor op)
Register a custom operator processor with the engine.
- Parameters:
operator
- The operator as a string.
op
- Processor instance.
-
resetEngine
public void resetEngine()
This method must be called between processing documents. The PDFStreamEngine caches document information between pages, and this call releases that cache. It only needs to be called when starting a new document.
-
processStream
public void processStream(PDPage aPage, PDResources resources, COSStream cosStream) throws java.io.IOException
This will process the contents of the stream.
- Parameters:
aPage
- The page.
resources
- The location to retrieve resources.
cosStream
- the Stream to execute.
- Throws:
java.io.IOException
- if there is an error accessing the stream.
-
processSubStream
public void processSubStream(PDPage aPage, PDResources resources, COSStream cosStream) throws java.io.IOException
Process a sub stream of the current stream.
- Parameters:
aPage
- The page used for drawing.
resources
- The resources used when processing the stream.
cosStream
- The stream to process.
- Throws:
java.io.IOException
- If there is an exception while processing the stream.
-
processTextPosition
protected void processTextPosition(TextPosition text)
A method provided as an event interface to allow a subclass to perform some specific functionality when text needs to be processed.
- Parameters:
text
- The text to be processed.
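The event-interface idea can be sketched without PDFBox as a base class that calls a protected hook and a subclass that overrides it. Everything in the block below is hypothetical: `MiniEngine`, `CollectingEngine`, and `processText` are stand-ins that use plain strings in place of TextPosition, and the sketch assumes the base hook does nothing by default.

```java
import java.util.List;

// Hypothetical miniature engine; it only imitates the callback shape.
class MiniEngine {
    // Walks the given text chunks and reports each one through the hook.
    void process(List<String> chunks) {
        for (String chunk : chunks) {
            processText(chunk);
        }
    }

    // Event hook: does nothing by default, subclasses override it.
    protected void processText(String text) {
    }
}

// Subclass that reacts to the event by collecting the text it is handed.
class CollectingEngine extends MiniEngine {
    private final StringBuilder collected = new StringBuilder();

    @Override
    protected void processText(String text) {
        collected.append(text);
    }

    String getCollected() {
        return collected.toString();
    }
}
```

The engine owns the traversal and the subclass only decides what to do with each piece of text, which is the same division of labor the description above implies for subclasses such as PDFTextStripper.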
-
inspectFontEncoding
protected java.lang.String inspectFontEncoding(java.lang.String str)
A method provided as an event interface to allow a subclass to perform some specific functionality on the string encoded by a glyph.
- Parameters:
str
- The string to be processed.
-
processEncodedText
public void processEncodedText(byte[] string) throws java.io.IOException
Process encoded text from the PDF Stream. You should override this method if you want to perform an action when encoded text is being processed.
- Parameters:
string
- The encoded text
- Throws:
java.io.IOException
- If there is an error processing the string
-
processOperator
public void processOperator(java.lang.String operation, java.util.List<COSBase> arguments) throws java.io.IOException
This is used to handle an operation.
- Parameters:
operation
- The operation to perform.
arguments
- The list of arguments.
- Throws:
java.io.IOException
- If there is an error processing the operation.
-
processOperator
protected void processOperator(PDFOperator operator, java.util.List<COSBase> arguments) throws java.io.IOException
This is used to handle an operation.
- Parameters:
operator
- The operation to perform.
arguments
- The list of arguments.
- Throws:
java.io.IOException
- If there is an error processing the operation.
-
getColorSpaces
public java.util.Map<java.lang.String,PDColorSpace> getColorSpaces()
- Returns:
- Returns the colorSpaces.
-
getXObjects
public java.util.Map<java.lang.String,PDXObject> getXObjects()
- Returns:
- Returns the XObjects.
-
setColorSpaces
public void setColorSpaces(java.util.Map<java.lang.String,PDColorSpace> value)
- Parameters:
value
- The colorSpaces to set.
-
getFonts
public java.util.Map<java.lang.String,PDFont> getFonts()
- Returns:
- Returns the fonts.
-
setFonts
public void setFonts(java.util.Map<java.lang.String,PDFont> value)
- Parameters:
value
- The fonts to set.
-
getGraphicsStack
public java.util.Stack<PDGraphicsState> getGraphicsStack()
- Returns:
- Returns the graphicsStack.
-
setGraphicsStack
public void setGraphicsStack(java.util.Stack<PDGraphicsState> value)
- Parameters:
value
- The graphicsStack to set.
-
getGraphicsState
public PDGraphicsState getGraphicsState()
- Returns:
- Returns the graphicsState.
-
setGraphicsState
public void setGraphicsState(PDGraphicsState value)
- Parameters:
value
- The graphicsState to set.
-
getGraphicsStates
public java.util.Map<java.lang.String,PDExtendedGraphicsState> getGraphicsStates()
- Returns:
- Returns the graphicsStates.
-
setGraphicsStates
public void setGraphicsStates(java.util.Map<java.lang.String,PDExtendedGraphicsState> value)
- Parameters:
value
- The graphicsStates to set.
-
getTextLineMatrix
public Matrix getTextLineMatrix()
- Returns:
- Returns the textLineMatrix.
-
setTextLineMatrix
public void setTextLineMatrix(Matrix value)
- Parameters:
value
- The textLineMatrix to set.
-
getTextMatrix
public Matrix getTextMatrix()
- Returns:
- Returns the textMatrix.
-
setTextMatrix
public void setTextMatrix(Matrix value)
- Parameters:
value
- The textMatrix to set.
-
getResources
public PDResources getResources()
- Returns:
- Returns the resources.
-
getCurrentPage
public PDPage getCurrentPage()
Get the current page that is being processed.
- Returns:
- The page being processed.
-
getValidCharCnt
public int getValidCharCnt()
Get the total number of valid characters in the doc that could be decoded in processEncodedText().
- Returns:
- The number of valid characters.
-
getTotalCharCnt
public int getTotalCharCnt()
Get the total number of characters in the doc (including ones that could not be mapped).
- Returns:
- The number of characters.
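One way the two counters might be combined is as a decoded fraction, which gives a rough signal of how much of a document's text could be mapped. The helper below is a hypothetical sketch, not part of PDFBox: `CharStats` and `decodedFraction` are invented names, and the arguments stand for the values returned by getValidCharCnt() and getTotalCharCnt().

```java
// Hypothetical helper; not a PDFBox class.
class CharStats {
    // Fraction of characters that could be decoded.
    // Returns 1.0 when no characters were counted, an arbitrary choice for this sketch.
    static double decodedFraction(int validCharCnt, int totalCharCnt) {
        if (totalCharCnt <= 0) {
            return 1.0;
        }
        // Cast before dividing to avoid integer division.
        return (double) validCharCnt / totalCharCnt;
    }
}
```

Treating an empty document as fully decoded avoids a division by zero; a caller that prefers to flag empty documents could check the total count separately.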
-
-