Errata for Mining the Social Web

This errata list records errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.

Each entry lists the version and location in the book, a description, the submitter, and the date submitted.
Ex 2-6

Hi, Example 2-6 displays code for microformats_mapquest_geo.py and suggests that the example URL should be http://local.mapquest.com/franklin-tn. However, MapQuest Local is no longer supported, and they suggest that we now use MapQuest Vibe (mqvibe.mapquest.com).

Femi Anthony  Jul 10, 2012 
Example 2-9
code section

The purpose of this example is to demonstrate parsing restaurant review information as delineated by the hReview tag. However, the suggested URL, http://www.yelp.com/biz/bangkok-golden-fort-washington-2, no longer makes use of such a tag, and no data is produced.

Femi Anthony  Jul 12, 2012 
Printed Page 108
~middle

When running the code, I am unsure where connections['values'] is defined.

Thank you.

Andrew M. Neiderer  Apr 02, 2016 
Printed Page 5
Example 1-3. Retrieving Twitter search trends

It seems that the last correction on this section was for API 1.0, but, at least for me, it no longer works even for that version without authentication. Here's what I had to do in order to get this example working:

First, I created an account on Twitter and registered an application at https://dev.twitter.com/apps/new.

Then, I used
>>> consumer_key, consumer_secret = "4papHqXEJLsqVkkq4zuUhO", "bgaopWSAIP7x1245a60kMXeQ8jNIo0BZZLl2aNKd2k"
to store my application's authentication codes (which of course are not these), as given on the app's page.

After that, I authorized the app to use my user account with
>>> twitter.oauth_dance("My App Name", consumer_key, consumer_secret, "token.txt")
. "token.txt" is the file name I chose to store the retrieved ouath data.

I then called
>>> oauth_token, oauth_secret = read_token_file("token.txt")
to load the recently stored data.

At this point, I created an API object as described by the last workaround:
>>> twitter_api = twitter.Twitter(domain="api.twitter.com", api_version='1', auth=twitter.OAuth(oauth_token, oauth_secret, consumer_key, consumer_secret))
but with the added authentication.

Then I again followed the workaround and did:
>>> world_trends = twitter_api.trends._(1) # Using 1 for global location
>>> [trend for trend in world_trends()[0]['trends']]

And everything worked fine. But it should be noted that this method uses the deprecated API 1.0, and using the newest version (1.1) requires little modification:

>>> twitter_api = twitter.Twitter(domain="api.twitter.com", api_version="1.1", auth=twitter.OAuth(oauth_token, oauth_secret, consumer_key, consumer_secret))
>>> world_trends = twitter_api.trends.places._(1) # Here's the major difference
>>> [trend for trend in world_trends()[0]['trends']]

André Sá de Mello  Jan 11, 2013
PDF Page 5, 7
Example 1-4, Example 1-6

I think there was some issue regarding the Twitter API changing. A similar issue appears to affect this section as well. The result of the code on my system is:

twitter.api.TwitterHTTPError: Twitter sent status 404 for URL: 1.1/search.json using parameters: (q=SNL&rpp=100&page=1)
details: {"errors":[{"message":"Sorry, that page does not exist","code":34}]}

Opening the link in the paragraph that followed works. After some searching around, I signed up for an API key and used OAuth when instantiating the Twitter object. I then had to drop a couple of the params being passed, as apparently they are no longer used either. My ending code is as follows:


=====
from twitter import *
from vars import * #private vars for auth
import json

twitter_api = Twitter(auth=OAuth(oauth_token, oauth_secret, consumer_key, consumer_secret))

search_results = twitter_api.search.tweets(q="SNL", count=2)
=====

I presume that this change may also be affecting the iteration on page 7, Example 1-6, which now fails with "TypeError: string indices must be integers".

I'd like some clarification on how to parse this data given the change in the search code.
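
My best guess so far, assuming the 1.1 response nests the tweets under a 'statuses' key, is to iterate over that instead of the old top-level results:

statuses = search_results['statuses']  # the tweets appear to live here in 1.1
print [ status['text'] for status in statuses ]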

Thanks!

jktravis  Apr 28, 2013 
PDF Page 9
Last Paragraph

All of the word tokens in the example are in lower case. Yet, when I run the code I get some results that are in upper or mixed case. I believe there's a missing call to lower(), perhaps as early as in Example 1-7's list comprehension. I suspect the third line of the Example 1-7 listing was intended as:

words += [ w.lower() for w in t.split() ]
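
As a quick illustration of the difference (the tweet strings here are made up):

words = []
for t in ["Obama Debate", "obama debate tonight"]:
    words += [ w.lower() for w in t.split() ]
print words  # ['obama', 'debate', 'obama', 'debate', 'tonight']

Without the lower() call, "Obama" and "obama" would be counted as two different tokens.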

Peter Haglich  Sep 20, 2012 
PDF Page 31
Example 2-6

MapQuest Local seems to have changed their URL format from

http://local.mapquest.com/franklin-tn

to

http://local.mapquest.com/us/tn/franklin/

Peter Haglich  Sep 21, 2012 
PDF Page 31
Example 2-6

MapQuest Local doesn't seem to embed geo microformat data any more.


$ python microformats__mapquest_geo.py http://local.mapquest.com/us/tn/franklin/
No location found

Peter Haglich  Sep 21, 2012 
PDF Page 103
Example 1-11

Last executable line of program is:
sorted(nx.degree(g))
This should produce the degree of each node, as shown in the book:
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1......
However, in order to get that output, the line should be:
sorted(nx.degree(g).values())
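
A small check of the difference, using a toy graph of my own and assuming the NetworkX 1.x behaviour where nx.degree returns a dict of node -> degree:

import networkx as nx
g = nx.Graph()
g.add_edges_from([('a', 'b'), ('b', 'c')])
print sorted(nx.degree(g))           # ['a', 'b', 'c'] -- the sorted node labels
print sorted(nx.degree(g).values())  # [1, 1, 2] -- the sorted degrees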

abayomi king  May 22, 2012 
PDF Page 111
Example 1-12

The value of n1 for one tweet caused the following error:
UnicodeEncodeError: 'charmap' codec can't encode character u'\u201c' in position 14: character maps to <undefined>
File "C:\Users\me\Desktop\python programs\collective intell book\ex1.3.py", line 89, in <module>
print n1, n2, g[n1][n2]['tweet_id']
File "C:\Python27\Lib\encodings\cp850.py", line 12, in encode
return codecs.charmap_encode(input,errors,encoding_map)

The value of n1 that caused this error was seen on mousing over a breakpoint as: u'@NfamousKaye :\u201c@RollingStone'
The n2 that corresponds is: HotFemaleRapper

I don't know enough of the regular expression notation to fix it, but removing the last + sign in:
re.compile(r"(RT|via)((?:\b\W*@\w+)+)")
prevents the error from occurring.
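
For what it's worth, my guess is that the failure is really in the print statement itself: the Windows console codepage (cp850) can't represent the curly quote u'\u201c'. Encoding the strings explicitly before printing presumably avoids the error without touching the regular expression:

print n1.encode('cp850', 'replace'), n2.encode('cp850', 'replace'), g[n1][n2]['tweet_id']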

abayomi king  May 22, 2012 
Printed Page 126
Example 5-4

I can't seem to run the script for Example 5-4.

I've had some problems installing couchdb, so I accept this may be a configuration issue, but I'm writing this to you just in case...

I ran the Example 5-3 script (latest from GitHub):

python the_tweet__harvest_timeline.py user 16 envagency

This creates the DB; the curl result is below:

curl http://127.0.0.1:5984/tweets-user-timeline-envagency
{"db_name":"tweets-user-timeline-envagency","doc_count":1624,"doc_del_count":0,"update_seq":1624,"purge_seq":0,"compact_running":false,"disk_size":4518001,"data_size":4459942,"instance_start_time":"1335271904150644","disk_format_version":6,"committed_update_seq":1624}

If I then run the Example 5-4 script (latest from GitHub), I get an error.

python the_tweet__count_entities_in_tweets.py tweets-user-timeline-envagency

(I note that you can supply a second parameter, FREQ_THRESHOLD, but I believe a default value is used if it's not supplied.) Anyway, the error is:

Traceback (most recent call last):
File "the_tweet__count_entities_in_tweets.py", line 85, in <module>
db.view('index/entity_count_by_doc', group=True)],
File "/usr/lib/pymodules/python2.6/couchdb/client.py", line 871, in __iter__
for row in self.rows:
File "/usr/lib/pymodules/python2.6/couchdb/client.py", line 893, in rows
self._fetch()
File "/usr/lib/pymodules/python2.6/couchdb/client.py", line 881, in _fetch
data = self.view._exec(self.options)
File "/usr/lib/pymodules/python2.6/couchdb/client.py", line 766, in _exec
resp, data = self.resource.get(**self._encode_options(options))
File "/usr/lib/pymodules/python2.6/couchdb/client.py", line 978, in get
return self._request('GET', path, headers=headers, **params)
File "/usr/lib/pymodules/python2.6/couchdb/client.py", line 1035, in _request
raise ServerError((status_code, error))
couchdb.client.ServerError: (500, ('EXIT', '{{badmatch,[]},\n [{couch_query_servers,new_process,3},\n {couch_query_servers,lang_proc,3},\n {couch_query_servers,handle_call,3},\n {gen_server,handle_msg,5},\n {proc_lib,init_p_do_apply,3}]}'))

If I go to Futon at

http://127.0.0.1:5984/_utils/database.html?tweets-user-timeline-envagency

I see that the View drop-down now offers Index > entity_count_by_doc (so this is created by the Example 5-4 script), but if I try the view from within Futon I get the same error:

<127.0.0.1>

Error: EXIT

{{badmatch,[]},
[{couch_query_servers,new_process,3},
{couch_query_servers,lang_proc,3},
{couch_query_servers,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]}

Any idea what I haven't got configured correctly?

I'm running Apache CouchDB 1.2.0 on Ubuntu 10.04
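
One thing I haven't been able to rule out is whether the Python view server is registered with CouchDB at all; since the views are written in Python, my understanding is that local.ini needs an entry along these lines (the couchpy path is a guess for my machine), followed by a restart:

[query_servers]
python = /usr/bin/couchpy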

Many thanks

Anonymous  Apr 24, 2012 
PDF, Other Digital Version Page 176
2nd paragraph

The Jaccard distance implementation given as:

len(X.union(Y)) - len(X.intersection(Y)))/float(len(X.union(Y))

should be:

(len(X.union(Y)) - len(X.intersection(Y)))/float(len(X.union(Y)))
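
A quick check of the corrected expression (the sets here are my own example):

X, Y = set(['a', 'b', 'c']), set(['b', 'c', 'd'])
print (len(X.union(Y)) - len(X.intersection(Y)))/float(len(X.union(Y)))  # 0.5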


Zhitong He  Feb 19, 2012