tesseract 4.1.1
cjkpitch.h
// File: cjkpitch.h
// Description: Code to determine fixed pitchness and the pitch if fixed,
//              for CJK text.
// Copyright 2011 Google Inc. All Rights Reserved.
// Author: takenaka@google.com (Hiroshi Takenaka)
// Created: Mon Jun 27 12:48:35 JST 2011
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
// http://www.apache.org/licenses/LICENSE-2.0
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//

#ifndef CJKPITCH_H_
#define CJKPITCH_H_

#include "blobbox.h"

// Function to test the "fixed-pitchness" of the input text and to estimate
// character pitch parameters for it, based on a CJK fixed-pitch layout
// model.
//
// This function assumes that fixed-pitch CJK text has the following
// characteristics:
//
// - Most glyphs are designed to fit within the same sized square
//   (imaginary body). They are also aligned to the center of their
//   imaginary bodies.
// - The imaginary body is always a regular rectangle.
// - There may be some extra space between character bodies
//   (tracking).
// - There may be some extra space after punctuation.
// - The text is *not* space-delimited. Thus spaces are rare.
// - A character may consist of multiple unconnected blobs.
//
// The function works in two passes. In pass 1, it looks for "good"
// blobs that have the same pitch on both sides and look like complete
// CJK characters. It then estimates the character pitch for every row,
// based on those good blobs. If not enough good blobs can be found for
// a row, the pitch is estimated from other rows with similar character
// height instead.
//
// Pass 2 is an iterative process to fit the blobs into fixed-pitch
// character cells. Once the character pitch has been estimated, blobs
// that are almost as large as the pitch can be considered to be
// complete characters. And once some blobs are known to be complete
// characters, the regions occupied by their neighbors can be
// estimated. And so on.
//
// The process is repeated until all ambiguities are resolved. The
// function then makes the final decision about the fixed-pitchness of
// each row and computes pitch and spacing parameters.
//
// (If a row is considered to be proportional, pitch_decision for the
// row is set to PITCH_CORR_PROP and the later phase
// (i.e. Textord::to_spacing()) should determine its spacing
// parameters.)
//
// This function doesn't provide all the information required by
// fixed_pitch_words() and the rows need to be processed with
// make_prop_words() even if they are fixed pitched.
void compute_fixed_pitch_cjk(ICOORD page_tr,               // top right
                             TO_BLOCK_LIST *port_blocks);  // input list

#endif  // CJKPITCH_H_
Definitions referenced in this file:
compute_fixed_pitch_cjk(ICOORD page_tr, TO_BLOCK_LIST *port_blocks): cjkpitch.cpp:1040
ICOORD (integer coordinate): points.h:32
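
The pass 1 idea described in the header comment (find "good" blobs whose pitch is the same on both sides and which look like complete characters, then estimate the row pitch from them) can be illustrated with a small standalone sketch. This is not Tesseract's implementation in cjkpitch.cpp: the `ToyBlob` type, the `EstimateRowPitch` function, the "roughly square" test, the 10% tolerance, and the use of a median are all assumptions chosen for illustration only.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for a blob bounding box. This is NOT Tesseract's
// BLOBNBOX; it only carries what the sketch needs.
struct ToyBlob {
  int left, right, bottom, top;
  int width() const { return right - left; }
  int height() const { return top - bottom; }
  double center_x() const { return (left + right) / 2.0; }
};

// Toy version of the pass 1 idea: a blob is "good" if the center-to-center
// distance to its left and right neighbors is about the same, and if it is
// roughly square (assumed here as a proxy for "looks like a complete CJK
// character"). Returns the median pitch of the good blobs, or 0.0 if fewer
// than min_good were found (the caller would then fall back to other rows).
// Assumes `row` is sorted left to right.
double EstimateRowPitch(const std::vector<ToyBlob>& row,
                        double tolerance = 0.1,
                        std::size_t min_good = 2) {
  std::vector<double> pitches;
  for (std::size_t i = 1; i + 1 < row.size(); ++i) {
    const ToyBlob& b = row[i];
    const double left_pitch = b.center_x() - row[i - 1].center_x();
    const double right_pitch = row[i + 1].center_x() - b.center_x();
    const double mean = (left_pitch + right_pitch) / 2.0;
    if (mean <= 0.0) continue;  // Overlapping or unsorted input; skip.
    const bool same_pitch =
        std::fabs(left_pitch - right_pitch) <= tolerance * mean;
    const bool squarish =
        std::abs(b.width() - b.height()) <=
        tolerance * std::max(b.width(), b.height());
    if (same_pitch && squarish) pitches.push_back(mean);
  }
  if (pitches.size() < min_good) return 0.0;
  // Median of the collected pitches (upper median for even counts).
  const std::size_t mid = pitches.size() / 2;
  std::nth_element(pitches.begin(), pitches.begin() + mid, pitches.end());
  return pitches[mid];
}
```

Feeding the sketch a row of five 20x20 boxes whose left edges are 24 pixels apart gives three interior "good" blobs, each with a pitch of 24. A row with too few blobs returns 0.0, which mirrors the comment's fallback case where the pitch has to come from other rows of similar character height.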