Python 机器学习实践:测试驱动的开发方法

by Matthew Kirk
January 2018
Content level: Intermediate to advanced
211 pages
8h 31m
Chinese
China Machine Press
Our code should now work. Note, however, that the text still contains a Unicode space (a non-breaking space), which is represented as \u00a0.
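To illustrate why that leftover character matters, here is a minimal sketch. The sample string and variable names are hypothetical and not taken from the book. It shows that \u00a0 is a different character from an ASCII space, so splitting on " " does not separate the words around it, and that one possible fix is to replace it before tokenizing.

```python
# Hypothetical sample text; "\u00a0" is a non-breaking space (NBSP).
text = u"machine\u00a0learning is fun"

# Splitting on an ASCII space leaves the NBSP inside the first token.
tokens_ascii = text.split(" ")

# One possible cleanup (an assumption, not the book's method):
# replace NBSP with a plain space before splitting.
cleaned = text.replace(u"\u00a0", u" ")
tokens_clean = cleaned.split(" ")

print(tokens_ascii)
print(tokens_clean)
```

Whether to normalize the character or leave it in place depends on how the tokenizer is meant to treat whitespace.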
Now we have a new problem, though, which is that the data does not add up to 1. We will introduce a new function, normalize, which takes a hash of values and applies the function x/sum(x) to all values. Note that I used Fraction, which increases the reliability of calculations and doesn't do floating-point arithmetic until needed:
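from fractions import Fraction

class Tokenizer:
    # tokenize

    @classmethod
    def normalize(cls, dist):
        # Divide each value by the total so the results sum to 1.
        sum_values = sum(dist.values())
        # items() works on Python 2 and 3; iteritems() is Python 2 only.
        return {k: Fraction(v, sum_values) for k, v in dist.items()}
...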
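As a rough illustration of the point about Fraction, the sketch below applies the same x/sum(x) idea in a standalone function and compares it with plain floating-point accumulation. The token counts and the standalone normalize helper are assumptions made for this example and are not part of the book's Tokenizer class.

```python
from fractions import Fraction

def normalize(dist):
    """Standalone sketch of the x / sum(x) idea using exact fractions."""
    total = sum(dist.values())
    return {k: Fraction(v, total) for k, v in dist.items()}

# Hypothetical token counts: ten tokens, each seen once.
counts = {"w%d" % i: 1 for i in range(10)}

exact = normalize(counts)
exact_total = sum(exact.values())  # exact rational arithmetic

# The same proportions accumulated as floats can drift slightly from 1.0.
float_total = 0.0
for v in counts.values():
    float_total += v / 10.0

# Convert to float only at the point where a float is needed.
as_float = float(exact["w0"])

print(exact_total == 1, float_total, as_float)
```

Because Fraction accepts only integer (or rational) numerators and denominators here, this sketch assumes the values in the dictionary are integer counts.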

ISBN: 9787111581666