
Chinese search support – 中文搜索​支持

Insiders adds experimental Chinese language support to the built-in search plugin. Given the large number of Chinese users, this feature has long been requested.

After the United States and Germany, China is the third-largest country of origin of Material for MkDocs users. For a long time, the built-in search plugin could not properly segment Chinese text. This was mainly because lunr-languages, which the plugin uses for search tokenization and stemming, lacked support for it. The latest Insiders release finally brings Chinese language support to the built-in search plugin.

Material for MkDocs終於​支持​中文​了!文本​被​正確​分割​並且​更​容易​找到。

(Material for MkDocs finally supports Chinese! Text is correctly segmented and easier to find.)

This article explains how to set up Chinese language support for the built-in search plugin in a few minutes.

Configuration

Chinese language support in Material for MkDocs is provided by jieba, an excellent Chinese text segmentation library. If jieba is installed, the built-in search plugin automatically detects Chinese characters and runs them through the segmenter. Install jieba with:

```shell
pip install jieba
```
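The plugin's source is not part of this article, so the following is only a minimal Python sketch of the idea. It rests on two assumptions. First, "detecting Chinese characters" is taken to mean matching runs of CJK Unified Ideographs (U+4E00 to U+9FFF). Second, word boundaries are marked with the zero-width space U+200B, as described in the next step.

The helper name `segment_chinese` is hypothetical. `jieba.cut` is jieba's actual segmentation function. A custom segmenter can also be passed in, which keeps the sketch usable without jieba installed.

```python
import re
from typing import Callable, Iterable, Optional

# Assumption: "Chinese" means CJK Unified Ideographs (U+4E00 to U+9FFF).
# Real-world text can also contain extension blocks and CJK punctuation.
CHINESE_RUN = re.compile(r"[\u4e00-\u9fff]+")

# Zero-width space: separates words without changing how text renders
ZWSP = "\u200b"


def segment_chinese(
    text: str,
    cut: Optional[Callable[[str], Iterable[str]]] = None,
) -> str:
    """Hypothetical helper: put zero-width spaces between Chinese words.

    Text outside runs of Chinese characters is returned unchanged.
    """
    if cut is None:
        import jieba  # imported lazily, only when no segmenter is given

        cut = jieba.cut

    def split_run(match: "re.Match[str]") -> str:
        # Segment one contiguous run of Chinese characters and re-join it
        return ZWSP.join(word for word in cut(match.group(0)) if word)

    return CHINESE_RUN.sub(split_run, text)


# Usage with jieba installed (jieba decides where the boundaries fall):
# print(segment_chinese("Material for MkDocs終於支持中文了!"))
```

Only the Chinese run is handed to the segmenter. Latin words such as "Material for MkDocs" pass through untouched, and the inserted U+200B characters are invisible when the text is rendered.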

The next step is only required if you have set the separator option in mkdocs.yml. Segmented text is split with zero-width space characters (U+200B). These characters are invisible, so the text renders exactly the same in the search modal. Adjust mkdocs.yml so that the separator includes the \u200b character:

```yaml
plugins:
  - search:
      separator: '[\s\u200b\-]'
```
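The splitting step can be mimicked in Python to show why the separator must include \u200b. In the browser, lunr applies the separator as a JavaScript regular expression. This sketch only assumes that Python's `re` module, which also understands the `\u200b` escape, splits text the same way. The `tokenize` helper is hypothetical.

```python
import re

ZWSP = "\u200b"

# The separator from mkdocs.yml, and the same pattern without \u200b
WITH_ZWSP = r"[\s\u200b\-]"
WITHOUT_ZWSP = r"[\s\-]"


def tokenize(text: str, separator: str) -> list[str]:
    """Hypothetical helper: split text on the separator, dropping empty tokens."""
    return [token for token in re.split(separator, text) if token]


# Text as it looks after segmentation: words joined by zero-width spaces
segmented = f"文本{ZWSP}被{ZWSP}正確{ZWSP}分割"

print(tokenize(segmented, WITH_ZWSP))          # ['文本', '被', '正確', '分割']
print(len(tokenize(segmented, WITHOUT_ZWSP)))  # 1: the whole run stays one token
```

Neither Python's nor JavaScript's `\s` matches U+200B. A custom separator that omits it would therefore leave each segmented run as one long, hard-to-find token.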

That's all that is necessary.

Usage

If you followed the configuration steps above, Chinese words are now tokenized with jieba. Try searching for 支持 to see how it integrates with the built-in search plugin.
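A toy exact-match inverted index in Python can illustrate why segmentation makes 支持 findable. This is not how lunr builds its index, since lunr also stems terms and supports prefix and wildcard queries. The sketch only assumes the same separator pattern as the configuration above and compares a segmented and an unsegmented copy of the demo sentence. The `build_index` helper and the document ids are hypothetical.

```python
import re
from collections import defaultdict

# Same pattern as the separator configured in mkdocs.yml
SEPARATOR = re.compile(r"[\s\u200b\-]")


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Hypothetical helper: map each token to the ids of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in SEPARATOR.split(text):
            if token:  # skip empty strings produced by adjacent separators
                index[token].add(doc_id)
    return dict(index)


docs = {
    "segmented": "Material for MkDocs終於\u200b支持\u200b中文\u200b了!",
    "unsegmented": "Material for MkDocs終於支持中文了!",
}
index = build_index(docs)

print(index.get("支持", set()))  # {'segmented'}
```

Only the segmented document contains 支持 as a standalone token. In the unsegmented copy, the word is buried inside the single token `MkDocs終於支持中文了!`, which an exact-match lookup cannot find.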


Note that this is an experimental feature, and I, @squidfunk, am not proficient in Chinese (yet?). If you find a bug or think something can be improved, please open an issue.