Batch conversion of Simplified Chinese articles to Traditional Chinese

Converting between Simplified and Traditional Chinese is a common requirement, for example for multilingual websites, multilingual user manuals, and input methods. I recently noticed that some visitors to this site read Traditional Chinese, so I decided to offer a Traditional Chinese version.

On Linux, the conversion works as follows:

1. Install opencc

```shell
sudo apt-get install opencc
```

OpenCC (Open Chinese Convert) is an open-source project; many thanks to its maintainers and contributors for their work.

2. Example of single file conversion:

```
➜ ~ cat simplified.txt
静夜思
床前明月光,疑是地上霜。
举头望明月,低头思故乡。
➜ ~ opencc -i simplified.txt -o traditional.txt -c s2t.json
➜ ~ cat traditional.txt
靜夜思
牀前明月光,疑是地上霜。
舉頭望明月,低頭思故鄉。
```

That's all it takes: 《静夜思》 (Quiet Night Thoughts) is now converted from Simplified to Traditional Chinese.
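If you convert single files often, the command above can be wrapped in a small function so the input, output, and configuration become parameters. This is a minimal sketch: `convert_file` is a hypothetical name, and it only adds two sanity checks before calling opencc with the same `-i`, `-o`, and `-c` flags as the example.

```shell
#!/bin/sh
# Hypothetical helper: convert one file with OpenCC.
# Usage: convert_file INPUT OUTPUT [CONFIG]   (CONFIG defaults to s2t.json)
convert_file() (
  input=$1 output=$2 config=${3:-s2t.json}
  if [ ! -f "$input" ]; then
    echo "convert_file: no such file: $input" >&2
    exit 1
  fi
  # Keep the source intact: require a separate output path.
  if [ "$input" = "$output" ]; then
    echo "convert_file: output must differ from input" >&2
    exit 1
  fi
  opencc -i "$input" -o "$output" -c "$config"
)
```

`convert_file simplified.txt traditional.txt` reproduces the example above, and `convert_file simplified.txt taiwan.txt s2tw.json` selects a different configuration.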

Parameters:

-i input file

-o output file

-c configuration file, which selects the conversion scheme. The configurations that ship with OpenCC are:

  • s2t.json: Simplified Chinese to Traditional Chinese
  • t2s.json: Traditional Chinese to Simplified Chinese
  • s2tw.json: Simplified Chinese to Traditional Chinese (Taiwan Standard)
  • tw2s.json: Traditional Chinese (Taiwan Standard) to Simplified Chinese
  • s2hk.json: Simplified Chinese to Traditional Chinese (Hong Kong variant)
  • hk2s.json: Traditional Chinese (Hong Kong variant) to Simplified Chinese
  • s2twp.json: Simplified Chinese to Traditional Chinese (Taiwan Standard) with Taiwanese idiom
  • tw2sp.json: Traditional Chinese (Taiwan Standard) to Simplified Chinese with Mainland Chinese idiom
  • t2tw.json: Traditional Chinese (OpenCC Standard) to Taiwan Standard
  • hk2t.json: Traditional Chinese (Hong Kong variant) to Traditional Chinese (OpenCC Standard)
  • t2hk.json: Traditional Chinese (OpenCC Standard) to Hong Kong variant
  • t2jp.json: Traditional Chinese Characters (Kyūjitai) to New Japanese Kanji (Shinjitai)
  • jp2t.json: New Japanese Kanji (Shinjitai) to Traditional Chinese Characters (Kyūjitai)
  • tw2t.json: Traditional Chinese (Taiwan Standard) to Traditional Chinese (OpenCC Standard)
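If your site names its language versions with tags such as zh-TW or zh-HK, a small lookup can pick the matching configuration from the list above. This is a sketch under assumptions: the tag names and the `config_for` function are hypothetical conventions, not part of OpenCC. Here zh-TW maps to `s2tw.json`; `s2twp.json` would additionally convert vocabulary to Taiwanese usage.

```shell
#!/bin/sh
# Hypothetical helper: map a locale tag to one of the OpenCC configurations.
# The tags are our own naming convention, not something OpenCC defines.
config_for() (
  tag=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case $tag in
    zh-tw)         echo s2tw.json ;;  # Taiwan Standard (s2twp.json also converts idioms)
    zh-hk)         echo s2hk.json ;;  # Hong Kong variant
    zh-hant)       echo s2t.json ;;   # OpenCC standard Traditional
    zh-cn|zh-hans) echo t2s.json ;;   # Traditional back to Simplified
    *)
      echo "config_for: unknown locale tag: $1" >&2
      exit 1
      ;;
  esac
)
```

For example: `opencc -i post.md -o post.zh-tw.md -c "$(config_for zh-TW)"`.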
Run opencc --help for the full list of options:

```
➜ ~ opencc --help

Open Chinese Convert (OpenCC) Command Line Tool
Author: Carbo Kuo <byvoid@byvoid.com>
Bug Report: http://github.com/BYVoid/OpenCC/issues

Usage:

opencc [--noflush <bool>] [-i <file>] [-o <file>] [-c <file>] [--]
        [--version] [-h]

Options:

--noflush <bool>
    Disable flush for every line

-i <file>, --input <file>
    Read original text from <file>.

-o <file>, --output <file>
    Write converted text to <file>.

-c <file>, --config <file>
    Configuration file

--, --ignore_rest
    Ignores the rest of the labeled arguments following this flag.

--version
    Displays version information and exits.

-h, --help
    Displays usage information and exits.


Open Chinese Convert (OpenCC) Command Line Tool
```
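The usage line marks `-i` and `-o` as optional. Assuming opencc then falls back to standard input and standard output, it can also sit in a pipeline, which is handy for converting a snippet without creating files. The `s2t` wrapper below is a hypothetical name for illustration.

```shell
#!/bin/sh
# Assumption: with -i and -o omitted, opencc reads standard input and
# writes standard output.
# Hypothetical wrapper; CONFIG defaults to s2t.json.
s2t() {
  opencc -c "${1:-s2t.json}"
}
```

For example, `printf '%s\n' '举头望明月' | s2t` should print 舉頭望明月, matching the file example above, and `s2t s2hk.json < simplified.txt > hongkong.txt` would write the Hong Kong variant.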

3. Multi-file batch conversion

To convert many files at once, combine opencc with the fd command. Taking a batch of Markdown files as an example:

```shell
fd -e md -x opencc -i {} -o {/.}.zh-tw.md -c s2t.json
```
  • fd -e md finds files with the .md extension.

  • -x/--exec runs the command that follows once for each search result.

  • opencc -i {} -o {/.}.zh-tw.md -c s2t.json converts each file from Simplified to Traditional Chinese. {} and {/.} are fd placeholders: for an input file aaa.md, the output is aaa.zh-tw.md. I name the files this way because my blog system's multilingual rules require it; adapt the output name to your needs with the placeholders below:

    • {}: A placeholder token that will be replaced with the path of the search result (documents/images/party.jpg).
    • {.}: Like {}, but without the file extension (documents/images/party).
    • {/}: A placeholder that will be replaced by the basename of the search result (party.jpg).
    • {//}: The parent of the discovered path (documents/images).
    • {/.}: The basename, with the extension removed (party).
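Two details of the one-liner are worth knowing. Because {/.} drops the directory, every output file is written to the directory you run fd from, which suits a flat folder of posts but not nested ones. And since fd -e md also matches the generated *.zh-tw.md files, a second run would create names like aaa.zh-tw.zh-tw.md. The sketch below swaps in POSIX find for fd (useful where fd is not installed), writes each output next to its source (the {.} form), and skips files already named *.zh-tw.md. The function name `batch_s2t` is hypothetical.

```shell
#!/bin/sh
# Hypothetical alternative to the fd one-liner, using POSIX find.
# Usage: batch_s2t [DIR] [CONFIG]   (defaults: "." and s2t.json)
# Every NAME.md under DIR becomes NAME.zh-tw.md in the same directory,
# like fd's {.}.zh-tw.md rather than {/.}.zh-tw.md. Files already named
# *.zh-tw.md are skipped, so a re-run does not convert its own output.
batch_s2t() (
  dir=${1:-.}
  config=${2:-s2t.json}
  find "$dir" -type f -name '*.md' ! -name '*.zh-tw.md' -exec sh -c '
    config=$1
    shift
    for src do
      opencc -i "$src" -o "${src%.md}.zh-tw.md" -c "$config" || exit 1
    done
  ' sh "$config" {} +
)
```

Usage: `batch_s2t content/posts` or `batch_s2t content/posts s2twp.json`. With `-exec ... {} +`, find hands many paths to one `sh` process, and quoting `"$src"` keeps file names with spaces intact.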

To convert other files, only the search pattern and the output file name need to change. With opencc and fd, I converted thousands of articles from Simplified to Traditional Chinese in just over ten minutes, and switching to another character standard only requires a different configuration file.
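To produce several variants from the same source, you can loop over configurations. A minimal sketch, assuming the .zh-tw.md style of naming used above; the function name `convert_variants` and the zh-hk suffix are hypothetical, and the configurations come from the list of `-c` options.

```shell
#!/bin/sh
# Hypothetical helper: write several Traditional variants of one
# Simplified Chinese Markdown file, e.g. post.md -> post.zh-tw.md, post.zh-hk.md.
# Each entry is "suffix:config".
convert_variants() (
  src=$1
  for pair in zh-tw:s2tw.json zh-hk:s2hk.json; do
    suffix=${pair%%:*}   # text before the colon
    config=${pair#*:}    # text after the colon
    opencc -i "$src" -o "${src%.md}.$suffix.md" -c "$config" || exit 1
  done
)
```

Adding another pair, such as `zh-hant:s2t.json`, adds another output file without touching the loop.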

Lastmod: Monday, August 28, 2023
