Japanese characters that resemble each other

From Poké Sources

Here are some Japanese characters, as well as some translingual characters often used in Japan, that look a lot like each other. This information may be useful when doing Japanese OCR.

Katakana that resemble each other

The following katakana were sometimes confused by my OCR program:

ン (n)
ソ (so)
Longest stroke goes in a different direction.

シ (shi)
ツ (tsu)
Longest stroke goes in a different direction. Also, the two short strokes (the 'eyes') are different.

ク (ku)
ケ (ke)
Lowest stroke connects differently.

グ (gu)
ゲ (ge)
Lowest stroke connects differently.

Katakana, hiragana, and kanji that resemble each other

The following katakana, hiragana, and kanji were sometimes confused by my OCR program:

katakana: タ (ta)
kanji: 夕 (yu)
Diagonal line in the middle differs.

katakana: カ (ka)
kanji: 力 (chikara or riki)
The kanji is bigger.

katakana: カ (ka)
hiragana: か (ka)
katakana: ガ (ga)
hiragana: が (ga)
The dakuten (little lines on the right) differ, and so do the bends right underneath them.

katakana: ム (mu)
kanji: 厶 (pronunciation varies)
Very similar. The kanji is slightly bigger, and its strokes are thicker. The kanji is a hyōgaiji, an uncommon type of kanji. My OCR program mixed these two up quite a lot.

kanji: 間 (ma)
kanji: 聞 (bun)
Inner parts differ.
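Mix-ups between a kanji and a katakana can sometimes be repaired by looking at the neighbouring characters. The sketch below is only an illustration, not what any OCR program actually does. It assumes the look-alike pairs are 夕/タ, 力/カ and 厶/ム, and it treats such a kanji as a misread katakana whenever it sits directly next to a katakana. The name fix_by_context is made up for this example.

```python
# Assumed look-alike pairs, kanji -> katakana. Extend or trim as needed.
KANJI_TO_KATAKANA = {
    "\u5915": "\u30bf",  # 夕 (kanji) -> タ (katakana ta)
    "\u529b": "\u30ab",  # 力 (kanji) -> カ (katakana ka)
    "\u53b6": "\u30e0",  # 厶 (kanji) -> ム (katakana mu)
}


def is_katakana(ch):
    """True for katakana letters (U+30A1..U+30FA) and the chōonpu (U+30FC)."""
    return "\u30a1" <= ch <= "\u30fa" or ch == "\u30fc"


def fix_by_context(text):
    """Replace a look-alike kanji with its katakana twin when a katakana is adjacent."""
    chars = list(text)
    for i, ch in enumerate(text):
        if ch in KANJI_TO_KATAKANA:
            prev_ok = i > 0 and is_katakana(text[i - 1])
            next_ok = i + 1 < len(text) and is_katakana(text[i + 1])
            if prev_ok or next_ok:
                chars[i] = KANJI_TO_KATAKANA[ch]
    return "".join(chars)


# A kanji 力 inside a katakana word is replaced; 力 inside a kanji word is kept.
print(fix_by_context("ピ\u529bチュウ"))
print(fix_by_context("協\u529bする"))
```

This is a heuristic: a real kanji that happens to stand next to a katakana word would be changed too, so the result still needs a human check.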

Vertical lines

Japanese symbol: ー (chōonpu)
Japanese kanji: 一 (ichi)
Translingual symbol: — (dash)
Translingual symbol: - (hyphen-minus)

The kanji 一 (ichi) means one. The hyphen-minus symbol can be found on all standard Western keyboards, and is used in subtraction.
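Because these four characters look almost identical on screen, it can help to print their code points when checking OCR output. A minimal sketch using Python's standard unicodedata module follows. It assumes the "dash" above is the em dash (U+2014); the helper name describe is made up for this example.

```python
import unicodedata

# chōonpu, kanji ichi, em dash (assumed), hyphen-minus
LINE_LIKE = ["\u30fc", "\u4e00", "\u2014", "\u002d"]


def describe(ch):
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in LINE_LIKE:
    print(ch, describe(ch))
```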

The OCR software I used, ABBYY FineReader, had a nasty habit of mixing up the chōonpu (ー) and the dash (—), resulting in this:

ポケットモンスター
was translated by GT as “Pokemon”, while
ポケットモンスタ—
was translated by GT as “Pokemon Star”.
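This kind of error can be cleaned up after OCR, before the text goes to a translator. The sketch below is one possible approach, not something built into ABBYY FineReader: any dash-like character that directly follows a katakana is turned into a chōonpu. The list of dash-like characters and the name fix_choonpu are assumptions made for this example.

```python
import re

# Dash-like characters that might be emitted instead of the chōonpu (assumed set):
# em dash, en dash, horizontal bar, minus sign, hyphen-minus.
DASH_LIKE = "\u2014\u2013\u2015\u2212\u002d"

# One or more dash-like characters directly after a katakana letter
# (U+30A1..U+30FA) or after a chōonpu (U+30FC).
_PATTERN = re.compile(r"(?<=[\u30a1-\u30fa\u30fc])[" + re.escape(DASH_LIKE) + r"]+")


def fix_choonpu(text):
    """Replace dash-like characters that follow katakana with the chōonpu."""
    return _PATTERN.sub(lambda m: "\u30fc" * len(m.group()), text)


# The misread word from the example above, ending in an em dash.
print(fix_choonpu("ポケットモンスタ\u2014"))
```

Dashes that do not follow a katakana, such as the one in "3-2", are left alone. A real hyphen placed right after a katakana word would be converted as well, so this is a heuristic rather than a safe fix.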

Here is a full list of all vertical line characters: https://en.wiktionary.org/wiki/Appendix:Variations_of_%22-%22

Also, the kanji 二 (ni) and 三 (san), meaning two and three, were sometimes erroneously recognized as two or three consecutive chōonpu, because Japanese text in books is usually written vertically.
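A chōonpu lengthens the sound of the kana before it, so a run of two or three chōonpu with no kana in front of it is a good candidate for a misread 二 or 三. The following sketch only flags such runs and suggests a replacement; it does not change the text. The function name suspicious_runs and the rule itself are assumptions made for this example.

```python
import re

# A run of exactly two or three chōonpu (U+30FC), not part of a longer run.
_RUN = re.compile("(?<!\u30fc)\u30fc{2,3}(?!\u30fc)")
# Hiragana and katakana letters.
_KANA = re.compile("[\u3041-\u3096\u30a1-\u30fa]")
# Suggested kanji for a run of two or three: 二 (ni) or 三 (san).
_SUGGEST = {2: "\u4e8c", 3: "\u4e09"}


def suspicious_runs(text):
    """Return (index, run, suggestion) for chōonpu runs not preceded by kana."""
    hits = []
    for m in _RUN.finditer(text):
        prev = text[m.start() - 1] if m.start() > 0 else ""
        if not _KANA.fullmatch(prev):
            hits.append((m.start(), m.group(), _SUGGEST[len(m.group())]))
    return hits


# Two chōonpu at the start of a string are flagged.
print(suspicious_runs("\u30fc\u30fc匹"))
# A drawn-out vowel after a kana is not flagged.
print(suspicious_runs("え\u30fc\u30fc\u30fc"))
```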

Ten

十 (ju)
+ (plus)

The kanji 十 means "ten", or 10.
+ is the universal plus sign.
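Context usually tells these two apart: a 十 between Arabic digits is most likely a plus sign, and a plus sign between kanji numerals is most likely 十. Here is a minimal sketch of that idea. The function name fix_ten_and_plus and the list of kanji numerals are assumptions made for this example.

```python
import re

# Kanji numerals used as context (assumed list).
KANJI_NUMERALS = "〇一二三四五六七八九十百千万"

# 十 with a digit on both sides, e.g. "3十4".
_TEN_BETWEEN_DIGITS = re.compile(r"(?<=\d)十(?=\d)")
# ASCII or full-width plus with a kanji numeral on both sides.
_PLUS_BETWEEN_KANJI = re.compile(
    rf"(?<=[{KANJI_NUMERALS}])[+＋](?=[{KANJI_NUMERALS}])"
)


def fix_ten_and_plus(text):
    """Swap 十 and + where the surrounding characters make the other one more likely."""
    text = _TEN_BETWEEN_DIGITS.sub("+", text)
    return _PLUS_BETWEEN_KANJI.sub("十", text)


print(fix_ten_and_plus("3\u53414"))           # 十 between digits
print(fix_ten_and_plus("\u4e8c+\u4e09"))      # + between 二 and 三
```

A genuine sum written with kanji numerals would be changed by the second rule too, so the output is a suggestion to check rather than a guaranteed correction.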

Big/small kana

Big/small katakana:
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |

Of course, every good OCR program should know how to differentiate between big and small characters.
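When checking OCR output, small kana can be identified from their Unicode names, which contain the word SMALL. The sketch below uses Python's standard unicodedata module to detect a small kana and look up its full-size counterpart. The function names are made up for this example, and halfwidth kana are not handled.

```python
import unicodedata

_PREFIXES = ("KATAKANA LETTER SMALL ", "HIRAGANA LETTER SMALL ")


def is_small_kana(ch):
    """True if the Unicode name marks the character as a small kana."""
    return unicodedata.name(ch, "").startswith(_PREFIXES)


def to_full_size(ch):
    """Return the full-size kana for a small kana; other characters are returned unchanged."""
    if not is_small_kana(ch):
        return ch
    full_name = unicodedata.name(ch).replace("SMALL ", "", 1)
    try:
        return unicodedata.lookup(full_name)
    except KeyError:
        # No full-size character with that name; leave the input as it is.
        return ch


# Small tsu (U+30C3) and small ka (U+30F5) mapped to their full-size forms.
print(to_full_size("\u30c3"), to_full_size("\u30f5"))
```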

These big/small katakana resemble each other (see above):
ㇱ (small shi) and ッ (small tsu)
ㇰ (small ku) and ヶ (small ke)

These big/small katakana have kanji that resemble them (see above):
ヵ (katakana, small) and 力 (kanji, big)
ㇺ (katakana, small) and 厶 (kanji, bigger... slightly)