corpora
corpora UK [ˈkɔ:pərə] US [ˈkɔrpərə]
n. the main body of anything; a complete collection, especially a body of texts
Noun plural: corpora (singular: corpus)
1. NLTK corpora documents often come pre-tagged for parts of speech, but you can certainly add your own tags to untagged documents.
2. Tokenization matters a lot for random text collections; in fairness to NLTK, its bundled corpora have been packaged for easy and accurate tokenization with WSTokenizer().
3. One fairly simple thing you are likely to do with linguistic corpora is analyze frequencies of various events within them, and make probability predictions based on these known frequencies.
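The example sentences mention three routine corpus operations: tagging untagged text, tokenizing it, and turning event frequencies into probability estimates. The following is a minimal, standard-library-only Python sketch of all three; it does not use NLTK. It assumes a plain whitespace split (which is what the name `WSTokenizer` suggests, though the old NLTK class is not reproduced here) plus lowercasing, a tiny hypothetical lexicon `TAG_LEXICON` standing in for a trained tagger, and relative frequencies as maximum-likelihood probability estimates. All function names are illustrative.

```python
from collections import Counter

# Hypothetical toy lexicon, for illustration only. A practical tagger would
# learn word-to-tag mappings from a pre-tagged corpus instead.
TAG_LEXICON = {
    "the": "DET",
    "cat": "NOUN",
    "dog": "NOUN",
    "mat": "NOUN",
    "sat": "VERB",
    "saw": "VERB",
    "on": "ADP",
}


def tokenize(text: str) -> list[str]:
    """Split on runs of whitespace, then lowercase each token."""
    return [token.lower() for token in text.split()]


def tag(tokens: list[str], default: str = "UNK") -> list[tuple[str, str]]:
    """Pair each token with its lexicon tag; unknown words get `default`."""
    return [(token, TAG_LEXICON.get(token, default)) for token in tokens]


def relative_frequencies(events: list[str]) -> dict[str, float]:
    """Maximum-likelihood estimate: P(e) = count(e) / total number of events."""
    counts = Counter(events)
    total = sum(counts.values())
    return {event: count / total for event, count in counts.items()}


def most_probable(probs: dict[str, float]) -> str:
    """Predict the single most likely event from the estimated probabilities."""
    return max(probs, key=probs.get)


if __name__ == "__main__":
    text = "The cat sat on the mat\nThe dog saw the cat"
    tokens = tokenize(text)  # 11 tokens
    tagged = tag(tokens)
    word_probs = relative_frequencies(tokens)
    tag_probs = relative_frequencies([t for _, t in tagged])
    print(tagged[:3])                   # [('the', 'DET'), ('cat', 'NOUN'), ('sat', 'VERB')]
    print(round(word_probs["the"], 3))  # 0.364  (4 of 11 tokens)
    print(round(tag_probs["VERB"], 3))  # 0.182  (2 of 11 tags)
    print(most_probable(word_probs))    # the
```

Because these are plain relative frequencies, any word or tag never seen in the corpus gets probability zero; real systems usually add smoothing before using such estimates for prediction.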