App development specializing in hair salons | 株式会社カッチブー

OCR Code

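The code below is get_sorted_lines, a function that takes the bounding boxes of individual characters, groups them into lines, and returns the list of lines, with the characters in each line sorted from left to right.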

Let's split the code into a few processing steps and explain the purpose of each.

The function definition, which pulls full_text_annotation out of the response and initializes an empty list named bounds:

def get_sorted_lines(response):
    document = response.full_text_annotation
    bounds = []

A nested loop over the full_text_annotation of the input response, iterating over each page, block, paragraph, word, and symbol:

    for page in document.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    for symbol in word.symbols:

For each symbol, get the coordinates of its top-left vertex and its text, then append them to the bounds list together with the bounding box itself:

                        x = symbol.bounding_box.vertices[0].x
                        y = symbol.bounding_box.vertices[0].y
                        text = symbol.text
                        bounds.append([x, y, text, symbol.bounding_box])
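To see what ends up in bounds without calling the API, the extraction step can be tried against a hand-built stand-in for the response. This is only a sketch: make_symbol and make_document are hypothetical helpers invented for this example (they are not part of any client library), and it assumes upright text so that vertices[0] is the top-left corner.

```python
from types import SimpleNamespace


def make_symbol(text, x, y, w=10, h=10):
    """Hypothetical helper: build an object shaped like an OCR symbol."""
    vertices = [
        SimpleNamespace(x=x, y=y),          # top-left
        SimpleNamespace(x=x + w, y=y),      # top-right
        SimpleNamespace(x=x + w, y=y + h),  # bottom-right
        SimpleNamespace(x=x, y=y + h),      # bottom-left
    ]
    return SimpleNamespace(text=text, bounding_box=SimpleNamespace(vertices=vertices))


def make_document(symbols):
    """Hypothetical helper: wrap symbols in one page/block/paragraph/word."""
    word = SimpleNamespace(symbols=symbols)
    paragraph = SimpleNamespace(words=[word])
    block = SimpleNamespace(paragraphs=[paragraph])
    page = SimpleNamespace(blocks=[block])
    return SimpleNamespace(pages=[page])


document = make_document([make_symbol("A", 5, 12), make_symbol("B", 20, 14)])

# Same traversal as in the article: one entry per symbol.
bounds = []
for page in document.pages:
    for block in page.blocks:
        for paragraph in block.paragraphs:
            for word in paragraph.words:
                for symbol in word.symbols:
                    top_left = symbol.bounding_box.vertices[0]
                    bounds.append([top_left.x, top_left.y, symbol.text, symbol.bounding_box])

print([b[:3] for b in bounds])  # [[5, 12, 'A'], [20, 14, 'B']]
```

Each entry is a four-element list: x, y, the character, and the original bounding box, which is kept in case the caller needs the full geometry later.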

Sort the bounds list by y coordinate:

    bounds.sort(key=lambda x: x[1])
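As a small illustration with made-up data, sorting by the second element orders the symbols from top to bottom but says nothing about their left-to-right order, which is why each line is sorted by x again later.

```python
# Made-up entries in the same [x, y, text, bounding_box] shape;
# the bounding box is left as None because it is not needed here.
bounds = [
    [40, 50, "c", None],
    [10, 52, "a", None],
    [25, 8, "x", None],
]

bounds.sort(key=lambda b: b[1])  # sort by y only

print([b[2] for b in bounds])  # ['x', 'c', 'a']
```

Here "c" comes before "a" even though it sits further to the right, because its y value happens to be two units smaller.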

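Initialize the variables used to walk through the bounding boxes and split them into lines (threshold is the largest y difference at which two consecutive symbols still count as being on the same line):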

    old_y = -1
    line = []
    lines = []
    threshold = 30

Iterate over the bounding boxes and split them into lines:

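    for bound in bounds:
        y = bound[1]
        if old_y == -1:
            # First symbol overall: it starts the first line.
            old_y = y
        elif old_y - threshold <= y <= old_y + threshold:
            # Within the threshold of the previous symbol: same line.
            old_y = y
        else:
            # Too far from the previous symbol: close the current line
            # (sorted left to right) and start a new one.
            line.sort(key=lambda b: b[0])
            lines.append(line)
            line = []
            # Track this symbol's y so the next symbol is compared against it.
            old_y = y
        line.append(bound)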
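One detail worth noticing is that each y value is compared against the previous symbol's y, not against the first symbol of the line. The sketch below, using made-up y values, shows the consequence: a line can drift further than the threshold overall as long as each step stays within it.

```python
threshold = 30
ys = [10, 35, 60, 120]  # made-up, already sorted y coordinates

# Compare each y with the one immediately before it.
results = []
for prev, cur in zip(ys, ys[1:]):
    same_line = prev - threshold <= cur <= prev + threshold
    results.append(same_line)
    print(prev, cur, same_line)

# 10 35 True
# 35 60 True
# 60 120 False
```

So 10 and 60 land on the same line even though they are 50 apart. Whether that is desirable depends on the images: it tolerates slightly slanted text, but a threshold that is too large relative to the line spacing can merge neighboring lines.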

Handle the last line, then return the list of sorted lines:

    line.sort(key=lambda x: x[0])
    lines.append(line)
    return lines

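In short, this function uses the text annotation in response to split the list of bounding boxes into lines and returns those lines ordered from top to bottom, with each line sorted from left to right.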
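The return value is a list of lines, each of which is a list of [x, y, text, bounding_box] entries. As a sketch of how a caller might turn that into plain strings, here is an example using hand-written sample data in the same shape (the values are invented, not real OCR output):

```python
# Sample data shaped like the return value of get_sorted_lines.
lines = [
    [[10, 50, "H", None], [25, 52, "i", None]],
    [[10, 120, "O", None], [24, 118, "K", None]],
]

# Join the text (index 2) of every entry in each line.
texts = ["".join(bound[2] for bound in line) for line in lines]

print(texts)  # ['Hi', 'OK']
```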

Let's rewrite this so that it is easier to follow, with each step pulled out into its own function.

def extract_bounds(document):
    bounds = []
    for page in document.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    for symbol in word.symbols:
                        x = symbol.bounding_box.vertices[0].x
                        y = symbol.bounding_box.vertices[0].y
                        text = symbol.text
                        bounds.append([x, y, text, symbol.bounding_box])
    return bounds


def sort_bounds_by_y(bounds):
    return sorted(bounds, key=lambda x: x[1])


def is_in_same_line(y1, y2, threshold=30):
    return y1 - threshold <= y2 <= y1 + threshold


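def split_bounds_into_lines(bounds):
    old_y = -1
    line = []
    lines = []

    for bound in bounds:
        y = bound[1]

        if old_y == -1:
            # First symbol overall: it starts the first line.
            old_y = y
        elif is_in_same_line(old_y, y):
            old_y = y
        else:
            # Close the current line (sorted left to right) and start a new one.
            line.sort(key=lambda b: b[0])
            lines.append(line)
            line = []
            # Track this symbol's y so the next symbol is compared against it.
            old_y = y

        line.append(bound)

    # Flush the last line.
    line.sort(key=lambda b: b[0])
    lines.append(line)
    return lines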


def get_sorted_lines(response):
    document = response.full_text_annotation
    bounds = extract_bounds(document)
    sorted_bounds = sort_bounds_by_y(bounds)
    lines = split_bounds_into_lines(sorted_bounds)
    return lines

This refactoring splits the code into the following functions.

extract_bounds: extracts the list of bounding boxes from the document.

sort_bounds_by_y: sorts the bounding boxes by y coordinate.

is_in_same_line: decides whether two y coordinates belong to the same line.

split_bounds_into_lines: splits the sorted bounding boxes into lines.
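One practical benefit of the split is that small pieces such as is_in_same_line can be checked on their own. A minimal sketch, with the function repeated from the listing above so that the snippet runs standalone:

```python
def is_in_same_line(y1, y2, threshold=30):
    # Same definition as in the refactored listing.
    return y1 - threshold <= y2 <= y1 + threshold


# The comparison is inclusive on both sides of the threshold.
assert is_in_same_line(100, 130)
assert is_in_same_line(100, 70)
assert not is_in_same_line(100, 131)

# A smaller threshold can be passed explicitly.
assert is_in_same_line(100, 105, threshold=5)
assert not is_in_same_line(100, 106, threshold=5)
```

Note that split_bounds_into_lines calls is_in_same_line without a threshold argument, so as written the default of 30 always applies there.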
