Python 机器学习实践:测试驱动的开发方法

by Matthew Kirk
January 2018
Content level: Intermediate to advanced
211 pages
8h 31m
Chinese
China Machine Press
Content preview from Python 机器学习实践:测试驱动的开发方法
The Seam of Our Part-of-Speech Tagger: CorpusParser
The seam of a part-of-speech tagger is how you feed it data. The most important
point is that the tagger can only learn from a data source that contains the
proper information. First we need to make some assumptions about how we want it
to work. We want to store each transition as a word and tag pair in a
two-element array, and then wrap that array in a simple class called
CorpusParser::TagWord. Our initial test looks like ...
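
The preview cuts off before the test itself, so the following is only a minimal sketch of the idea described above, not the author's code. It assumes the corpus is plain text made of whitespace-separated `word/tag` tokens; the `parse` method, the module-level `TagWord` name (standing in for `CorpusParser::TagWord`), and the sample sentence are all hypothetical.

```python
from collections import namedtuple

# Hypothetical stand-in for CorpusParser::TagWord: a word paired with its tag.
TagWord = namedtuple("TagWord", ["word", "tag"])


class CorpusParser:
    """Sketch of a parser for 'word/tag' tokens (the format is an assumption)."""

    def parse(self, text):
        """Yield one TagWord per whitespace-separated 'word/tag' token."""
        for token in text.split():
            # Split on the last slash so words that contain '/' stay intact.
            word, sep, tag = token.rpartition("/")
            if not sep or not word or not tag:
                raise ValueError(f"malformed token: {token!r}")
            yield TagWord(word, tag)


def test_parse_wraps_each_pair_in_tag_word():
    # An initial test in the spirit of the text: each pair becomes a TagWord.
    result = list(CorpusParser().parse("the/at dog/nn barks/vbz"))
    assert result == [
        TagWord("the", "at"),
        TagWord("dog", "nn"),
        TagWord("barks", "vbz"),
    ]
```

Writing the test first pins down the shape of the data (a sequence of word and tag pairs) before any tagging model exists, which is the assumption-making step the paragraph describes.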

Publisher Resources

ISBN: 9787111581666