class HocrTransform(*, hocr_filename: str | Path, dpi: float)
A class for converting documents from the hOCR format.

classmethod HocrTransform.baseline(element: Element) → tuple
Returns a tuple containing the baseline slope and intercept.

classmethod HocrTransform.element_coordinates(element: Element) → Rect
Returns a tuple containing the coordinates of the bounding box around an element.

HocrTransform.pt_from_pixel(pxl) → Rect

ocrmypdf.helpers.safe_symlink(input_file: PathLike, soft_link_name: PathLike)
Create a symbolic link at soft_link_name, which references input_file. Think of this as copying input_file to soft_link_name with less overhead. On Windows, file copy is used since symlinks may require administrator privileges. An existing link at the destination is removed.

ocrmypdf.helpers.
Remove all log handlers, usually used in a child process.

ocrmypdf.helpers.page_number(input_file: PathLike) → int
Get one-based page number implied by filename (000002.pdf -> 2).

ocrmypdf.helpers.monotonic(seq: Sequence) → bool
Does this sequence increase monotonically?

ocrmypdf.helpers.is_iterable_notstr(thing: Any) → bool
Is this an iterable type, other than a string?

ocrmypdf.helpers.is_file_writable(test_file: PathLike) → bool
Intentionally racy test if target is writable. We intend to write to the output file if and only if we succeed and can replace it atomically.

ocrmypdf.helpers.clamp(n, smallest, largest)
Clamps the value of n to between smallest and largest.

ocrmypdf.helpers.check_pdf(input_file: Path) → bool
Check if a PDF complies with the PDF specification. Checks for proper formatting and proper linearization.

class ocrmypdf.helpers.Resolution
The number of pixels per inch in each 2D direction. Resolution objects are considered "equal" for == purposes if they are equal to a reasonable tolerance.

exception ocrmypdf.helpers.NeverRaise
An exception that is never raised.

exception ocrmypdf.exceptions.
exit_code = 9
message = 'Error occurred while parsing a Tesseract configuration file'

exception ocrmypdf.exceptions.SubprocessOutputError
A subprocess returned an unexpected error.
exit_code = 7
exception ocrmypdf.exceptions.
exit_code = 6

exception ocrmypdf.exceptions.
exit_code = 2
message = 'Failed to merge PDF image layer with OCR layer\n\nUsually this happens because the input PDF file is malformed and\nocrmypdf cannot correct the problem on its own.\n\nTry using\n ocrmypdf --pdf-renderer sandwich \n'

exception ocrmypdf.exceptions.OutputFileAccessError
Cannot access the intended output file path.
exit_code = 5

exception ocrmypdf.exceptions.MissingDependencyError
A third-party dependency is missing.
exit_code = 3

exception ocrmypdf.exceptions.
exit_code = 2

exception ocrmypdf.exceptions.ExitCodeException
An exception which should return an exit code with sys.exit().
exit_code = 15
message = ''

class ocrmypdf.exceptions.
already_done_ocr = 6
bad_args = 1
child_process_error = 7
ctrl_c = 130
encrypted_pdf = 8
file_access_error = 5
input_file = 2
invalid_config = 9
invalid_output_pdf = 4
missing_dependency = 3
ok = 0
other_error = 15
pdfa_conversion_failed = 10

exception ocrmypdf.exceptions.
exit_code = 8
message = "Input PDF is encrypted. The encryption must be removed to\nperform OCR.\n\nFor information about this PDF's security use\n qpdf --show-encryption infilename\n\nYou can remove the encryption using\n qpdf --decrypt ] infilename\n"

exception ocrmypdf.exceptions.
Missing information about input image DPI.

exception ocrmypdf.exceptions.
Invalid arguments on the command line or API.
exit_code = 1

class PdfContext(options: Namespace, work_folder: Path, origin: Path, pdfinfo: PdfInfo, plugin_manager)
Holds the context for a particular run of the pipeline. The context can generate a Path for an intermediate file involved in processing. The path will be in a temporary folder that is common for all processing.

PdfContext.options: Namespace
The specified options for processing this PDF.

PdfContext.pdfinfo: PdfInfo
Detailed data for this PDF.

PdfContext.plugin_manager
PluginManager for processing the current PDF.
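The exit-code values and the exit-code-carrying exceptions listed above can be illustrated with a small standalone sketch. It does not import OCRmyPDF and is not the library's implementation: the class names SketchExitCode and SketchExitCodeException, the two subclasses, and the function run_and_get_exit_code are hypothetical names chosen for this example, and mapping KeyboardInterrupt to ctrl_c is an assumption. Only the numeric values are taken from the listing.

```python
from enum import IntEnum


class SketchExitCode(IntEnum):
    """Standalone copy of the exit-code values listed on this page."""
    ok = 0
    bad_args = 1
    input_file = 2
    missing_dependency = 3
    invalid_output_pdf = 4
    file_access_error = 5
    already_done_ocr = 6
    child_process_error = 7
    encrypted_pdf = 8
    invalid_config = 9
    pdfa_conversion_failed = 10
    other_error = 15
    ctrl_c = 130


class SketchExitCodeException(Exception):
    """Hypothetical stand-in for an exception that carries an exit code."""
    exit_code = SketchExitCode.other_error
    message = ""

    def __str__(self):
        # Use the constructor argument if one was given, else the class message.
        return super().__str__() or self.message


class SketchMissingDependencyError(SketchExitCodeException):
    exit_code = SketchExitCode.missing_dependency


class SketchOutputFileAccessError(SketchExitCodeException):
    exit_code = SketchExitCode.file_access_error


def run_and_get_exit_code(task):
    """Run task() and translate the outcome into an integer exit code."""
    try:
        task()
    except SketchExitCodeException as exc:
        return int(exc.exit_code)
    except KeyboardInterrupt:
        # Assumption: Ctrl+C is reported with the ctrl_c value.
        return int(SketchExitCode.ctrl_c)
    except Exception:
        # Anything unrecognized falls back to other_error.
        return int(SketchExitCode.other_error)
    return int(SketchExitCode.ok)


def failing_task():
    raise SketchMissingDependencyError("a required program was not found")


if __name__ == "__main__":
    print(run_and_get_exit_code(failing_task))
    print(run_and_get_exit_code(lambda: None))
```

A command-line wrapper could pass the returned integer to sys.exit(), which matches the stated purpose of an exception "which should return an exit code with sys.exit()".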
class PageContext(pdf_context: PdfContext, pageno)
Must be picklable, so stores only intrinsic/simple data elements or those capable of serializing themselves via __getstate__. The context can generate a Path for a file that is part of processing this page. The path will be based in a common temporary folder and have a prefix based on the page number.

This page summarizes the rest of the public API. It should mainly be of interest to plugin developers.
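The picklability requirement described for PageContext can be illustrated with a minimal sketch. The class below is hypothetical and is not OCRmyPDF's PageContext: the name SketchPageContext, the method page_path, the six-digit prefix format, and the lock attribute are all assumptions made for the example. It shows how __getstate__ can drop a member that cannot be pickled so the object can still be sent to a worker process.

```python
import pickle
import threading
from pathlib import Path


class SketchPageContext:
    """Hypothetical per-page context that survives pickling."""

    def __init__(self, work_folder: Path, pageno: int):
        self.work_folder = work_folder
        self.pageno = pageno
        # A lock cannot be pickled, so it must be excluded from the state.
        self._lock = threading.Lock()

    def page_path(self, name: str) -> Path:
        # Assumed naming scheme: common folder plus a page-number prefix.
        return self.work_folder / f"{self.pageno:06d}_{name}"

    def __getstate__(self):
        # Copy the instance dict and remove the unpicklable member.
        state = self.__dict__.copy()
        del state["_lock"]
        return state

    def __setstate__(self, state):
        # Restore the simple data and create a fresh lock on this side.
        self.__dict__.update(state)
        self._lock = threading.Lock()


if __name__ == "__main__":
    ctx = SketchPageContext(Path("work"), 2)
    clone = pickle.loads(pickle.dumps(ctx))
    print(clone.page_path("ocr.txt"))
```

Keeping only simple data in the pickled state means a child process receives everything it needs to locate its files, while process-local resources are rebuilt on arrival.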