Errata for Mining the Social Web


The errata list is a list of errors and their corrections that were found after the product was released. If the error was corrected in a later version or reprint, the date of the correction will be displayed in the column titled "Date Corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.

Color key: Serious technical mistake | Minor technical mistake | Language or formatting error | Typo | Question | Note | Update

Version | Location | Description | Submitted By | Date submitted | Date corrected
PDF
Page 131
Example 5-5

The result is wrong when "screen_name" includes an uppercase character.
CouchDB stores the data with lowercase characters, while the Redis data also includes uppercase, so a user's friend is not recognized.

The fix is to append .lower() on line 23:
friend_screen_names.append((json.loads(r.get(getRedisIdByUserId(friend_id, 'info.json')))['screen_name']).lower())
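
For illustration only (the screen names below are made up, not the book's data), a minimal sketch of why that .lower() call matters:

couch_screen_names = ['socialwebmining', 'ptwobrussell']   # CouchDB side: stored lowercased
redis_screen_names = ['SocialWebMining', 'ptwobrussell']   # Redis side: original casing preserved

# Without normalization, 'SocialWebMining' never matches 'socialwebmining'.
matched = [name for name in redis_screen_names if name.lower() in couch_screen_names]
print matched   # ['SocialWebMining', 'ptwobrussell']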

Note from the Author or Editor:
Good catch. Thank you!

Alessio Dovico Lupo  Nov 08, 2011 
Other Digital Version
Page 5
Example 1.3

Example 1.3 has a confirmed erratum, but even following that leads to an error, apparently relating initially to stdin and finishing with a message suggesting that things are out of date:
"code":68,"message":"This endpoint is deprecated and should no longer be used"

Note from the Author or Editor:
Update #2

We need to add a warning to Page 4 or 5 somewhere around or before Example 1-3 that says the following:

"There may be a lag between the latest version of the <code>twitter</code> package as reflected in its source tree on GitHub and what is available from <a href="http://pypi.python.org/pypi/twitter/">PyPi</a>, which <code>easy_install</code> references for its installation. You can install the latest version of <code>twitter</code> (or any other GitHub hosted project) using <code>pip</code>, an alternative to <code>easy_install</code>. First, install <code>pip</code> via <code>easy_install pip</code> and then use <code>pip</code> to install from GitHub like so: <code>pip install -e git+git://github.com/sixohsix/twitter.git#egg=twitter-latest</code>.


We need to change the code on page 5 to reflect the change in the Twitter Trends API that caused this to be an issue like so:

>>> import twitter
>>> twitter_api = twitter.Twitter(domain="api.twitter.com", api_version='1')
>>> WORLD_WOE_ID = 1 # The Yahoo! Where On Earth ID for the entire world
>>> world_trends = twitter_api.trends._(WORLD_WOE_ID) # get back a callable
>>> [ trend for trend in world_trends()[0]['trends'] ] # call the callable and iterate through the trends returned

We need to make a few changes to the prose on page 5 to reflect the change to the Twitter Trends API:

Change the "Twitter's Trends API" link target in the first paragraph to <https://dev.twitter.com/docs/api/1/get/trends/%3Awoeid>.

Make the following change to the second sentence of the paragraph following Example 1-3: "For example, twitter_api.trends() initiates an HTTP call to GET http://api.twitter.com/1/trends.json, which you could type into your web browser to get the same set of results." ==> "For example, world_trends() initiates an HTTP call to GET http://api.twitter.com/1/trends/1.json, which you could type into your web browser to get the same set of results."

I think that gets everything back to where it needs to be. I'd appreciate the option to review these changes before you take them live so that we can make sure it all lands correctly.
************

Update #1:

See <https://github.com/sixohsix/twitter/issues/56> for a workaround for Example 1.3. A follow-up to this ticket with updates to the prose will follow as Update #2.

************
Original Response:

It's true that there is now a new problem with this example (and some other scripts later in the book that use the /trends API) because Twitter has once again changed their Trends API endpoint. See <https://dev.twitter.com/blog/changing-trends-api> for details on that fairly recent change. Unfortunately, as far as I can tell, the twitter Python library used by the book doesn't lend itself all that well to the new kind of API call that would be required to get trends, i.e. a call of the form <http://api.twitter.com/1/trends/1.json>. In the meanwhile, urllib2 could be used to initiate the call to the trends API. I'll have to think on this and figure out how to best handle the overall situation given Twitter's increasing movement toward this API endpoint scheme. I'll also work with the author of the twitter Python library on something that might work as a workaround in the meanwhile.
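
A rough sketch (not a final fix for the book) of the urllib2 workaround mentioned above: hit the new endpoint directly and parse the JSON response. It assumes the trends endpoint can still be reached without authentication, as it could at the time.

import json
import urllib2

url = 'http://api.twitter.com/1/trends/1.json'  # WOE ID 1 means "worldwide"
response = urllib2.urlopen(url)
trends = json.loads(response.read())
print [trend['name'] for trend in trends[0]['trends']]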

Shirley  Oct 05, 2011 
PDF
Page 213
Example 7.4 and Table 7.3

In Example 7.4, the call to the tf_idf function should pass corpus.values() instead of just corpus:
wrong: score = tf_idf(term, corpus[doc], corpus)
fix: score = tf_idf(term, corpus[doc], corpus.values())

In Table 7.3, some values for tf-idf are wrong due to this mistake. For example, tf-idf(mr.) in corpus['a'] should be equal to 0.2209 and not 0.1053.
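
For intuition only, a tiny self-contained stand-in (this is not the book's tf_idf from Chapter 7; the formula and corpus here are simplified) showing why the third argument needs to be the document texts rather than the dictionary itself:

from math import log

def tf(term, doc):
    return doc.split().count(term) / float(len(doc.split()))

def idf(term, docs):
    matches = len([True for d in docs if term in d.split()])
    return 1.0 if matches == 0 else log(float(len(docs)) / matches)

def tf_idf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

corpus = {'a': 'mr. green killed colonel mustard',
          'b': 'professor plum has a green plant'}

# corpus.values() hands idf the document texts; passing the dict itself
# would make idf iterate over the keys 'a' and 'b' and miscount matches.
print tf_idf('mustard', corpus['a'], corpus.values())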

Note from the Author or Editor:
Nice catch! Thanks so much for noticing this mistake and reporting it. The code and manuscript are now updated and will be reflected in future print versions and ebook updates soon.

Renato Silva das Neves  Aug 11, 2011 
Printed
Page 125
2nd line from the end of code sample

In the code, the line "wait_period=2" should be removed (2nd line from the end).
This line has already been removed from the sample code file.

Note from the Author or Editor:
Thank you. Next printings/ebook updates will reflect this fix.

Uzi  Jun 14, 2011 
PDF
Page 45
Example 3-2. Message flow from Example 3-1

Two instances of 2001 should be 2009 in the example.
(Posted previously with wrong page number, apologies. The page is labelled 45)

Note from the Author or Editor:
On the first and third line of page 45, change "Dec 2001" to "Dec 2009" in both locations.

On page 42, in example 3-1, change "Dec 2001" to "Dec 2009" on the 5th line.

On page 43, in example 3-1, change "Dec 2001" to "Dec 2009" in what looks like the 16th line.

Computermacgyver  Jun 10, 2011 
Printed, PDF, Other Digital Version
Page 48
first paragraph

The files enron.mbox.gz and enron.mbox.json.gz should be hyperlinks to rather large files that I've pre-converted for readers. Somehow, these hyperlinks got left out during final editing. The files are located at http://zaffra.com/static/matthew/enron.mbox.gz and http://zaffra.com/static/matthew/enron.mbox.json.gz but I'd prefer that we move them to an oreilly.com destination so that they're guaranteed to be available.

Matthew Russell  Jun 04, 2011 
Printed, PDF
Page 5
Example 1-3. Retrieving Twitter search trends

When I try this in my browser (http://search.twitter.com/trends.json) I get:

The page you were looking for doesn't exist.
You may have mistyped the address or the page may have moved.

So this call is no longer valid:

>>> trends = twitter_search.trends()

Note from the Author or Editor:
Unfortunately, this particular issue is a result of ongoing Twitter API changes. In particular, the "trends" API has been moved from search.twitter.com to api.twitter.com as documented at <http://groups.google.com/group/twitter-api-announce/browse_thread/thread/6f734611ac57e281/2f39498beeaa25bf?show_docid=2f39498beeaa25bf> as "[Soon] The trends endpoints on search.twitter.com are being turned off as they exist on api.twitter.com instead"

So, this isn't errata in the sense that it is an error that always existed, but it is errata in the sense that something in the book is now broken.

To address this particular issue, Example 1-3 should instead read:


>>> import twitter
>>> twitter_api = twitter.Twitter(domain="api.twitter.com", api_version='1')
>>> trends = twitter_api.trends()
>>> [ trend['name'] for trend in trends['trends'] ]
[u'#youdeservetobesingle', u'#palmsunday', u'OJD', u'#gratwe', u'Jessica Jung', u'SwiftMusicVideos', u'T\u0336I\u0336D\u0336A\u0336K\u0336 \u0336L\u0336U\u0336L\u0336U\u0336S\u0336', u'Hosanna', u'LEE TAEMIN', u'Ferrer']

However, the final line with the data in it can remain the same as in the print since the results of the trends are not relevant.

There are a few other changes that are now also necessary on page 5 where Example 1-3 is shown:

The sentence "Without further ado, let?s find out what people are talking about by inspecting the trends available to us through Twitter?s search API." that precedes Example 1-3 is also now incorrect and should now read "Without further ado, let?s find out what people are talking about by inspecting the trends available to us through Twitter?s trends API." The hyperlink in the existing sentence to "Twitter's search API" should now point to <http://dev.twitter.com/doc/get/trends>.

The caption for Example 1-3 should now read "Retrieving Twitter trends" instead of "Retrieving Twitter search trends".

The sentence "For example, twit ter_search.trends() initiates an HTTP call to GET http://search.twitter.com/trends .json, which you could type into your web browser to get the same set of results." should now read "For example, twit ter_api.trends() initiates an HTTP call to GET http://api.twitter.com/1/trends .json, which you could type into your web browser to get the same set of results."

Finally, Example 1-4 now also needs to change to reflect the change we made to Example 1-3. We need to prepend this line to the listing to define the twitter_search variable:

>>> twitter_search = twitter.Twitter(domain="search.twitter.com")
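
For reference, a sketch of how that prepended line would sit at the top of Example 1-4 (the search call shown here is only an illustration of the twitter package's usage, not the book's exact listing):

>>> twitter_search = twitter.Twitter(domain="search.twitter.com")
>>> search_results = twitter_search.search(q="SNL", rpp=100)  # illustrative query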

Please let me know when this is updated so that I can do a review on the page since these changes are somewhat extensive.

And unfortunately, I'm sure that there's example code in Mining the Social Web and 21 Recipes for Mining Twitter that will also now break because of this change. I'll make a note to try and track it down, file "errata" on myself and fix it ASAP (sometime in the next week).

FunkyM0nk3y  Apr 16, 2011  Apr 21, 2011
PDF
Page 12
Code Example 1-11

>>> sorted(nx.degree(g))

should be:

>>> sorted(nx.degree(g).values())

Note from the Author or Editor:
This is due to a versioning problem. As of the latest version of NetworkX (v1.6), you do indeed need to add .values() as you describe. (An earlier version did not require that to be the case.)
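
A quick self-contained check of the behavior described above (the graph edges are arbitrary, and the dict display order may vary):

>>> import networkx as nx
>>> g = nx.DiGraph()
>>> g.add_edge('@user_a', '@user_b')
>>> g.add_edge('@user_a', '@user_c')
>>> nx.degree(g)   # a dict under NetworkX 1.6
{'@user_a': 2, '@user_b': 1, '@user_c': 1}
>>> sorted(nx.degree(g).values())
[1, 1, 2]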

Darryl Amatsetam  Mar 31, 2011 
PDF
Page 11
1st paragraph after tip at top of page

"By convention, Twitter usernames being with an @ symbol" should be "By convention, Twitter usernames begin with an @ symbol".

Note from the Author or Editor:
Good catch. This must have been missed during copyedit.

Keith McDonald  Mar 22, 2011  Apr 21, 2011
PDF
Page 15
The example file introduction__retweet_visualization.py and page 14 example 1-12

When querying Twitter for data I often get usernames back that contain non-ASCII characters. This poses a problem when running the example code - specifically when writing to the output dot file.

I have found that this can be fixed by changing the line:
f.write('''strict digraph {
%s
}''' % (';\n'.join(dot), ))

to the following, which ensures we are using UTF-8 encoding:
f.write('''strict digraph {
%s
}''' % (';\n'.join(dot).encode('utf-8'), ))

I am using Python2.6 on a Ubuntu installation.

I don't know if it is only a problem because of my locale (I am living in Denmark) or if it is a general problem - but I thought you might want to take it into consideration.

Note from the Author or Editor:
Note that this particular example is not an "in-print error" - it's an issue with sample code that isn't in print, so no update of the PDF/book is needed.

This particular example was noted at https://github.com/ptwobrussell/Mining-the-Social-Web/wiki/Confirmed-Bugs-and-Errata and the example code was fixed in this commit - https://github.com/ptwobrussell/Mining-the-Social-Web/commit/a1118d89a6ca4df61d148b9afa45a631320a5a99

It's entirely possible that there may be some other examples with glitches related to Unicode, and I'll fix them and check them into the repository as promptly as they are reported or I discover them.

Søren Blond Daugaard  Feb 02, 2011