www.it-ebooks.info
Learn how to turn > The Rock (1996) ... ...
These bits of meta /> %s: %s' % \ (p['image']['url'], p['id'], p['displayName'])] HTML(''.join(html))
4.2. Exploring the Google+ API
Sample results are displayed in Figure 4-2 and provide the “quick fix” that we’re looking for in our search for the particular Tim O’Reilly of O’Reilly Media.
Figure 4-2. Rendering Google+ avatars as images allows you to quickly scan the search results to disambiguate the person you are looking for

Although there’s a multiplicity of things we could do with the People API, our focus in this chapter is on an analysis of the textual content in accounts, so let’s turn our attention to the task of retrieving activities associated with this account. As you’re about to find out, Google+ activities are the linchpin of Google+ content, containing a variety of rich content associated with the account and providing logical pivots to other platform objects such as comments. To get some activities, we’ll need to tweak the design pattern we applied for searching for people, as illustrated in Example 4-3.

Example 4-3. Fetching recent activities for a particular Google+ user

import httplib2
import json
import apiclient.discovery

USER_ID = '107033731246200681024' # Tim O'Reilly
Chapter 4: Mining Google+: Computing Document Similarity, Extracting Collocations, and More
# XXX: Re-enter your API_KEY from https://code.google.com/apis/console
# if not currently set
# API_KEY = ''
service = apiclient.discovery.build('plus', 'v1', http=httplib2.Http(),
                                    developerKey=API_KEY)

activity_feed = service.activities().list(
  userId=USER_ID,
  collection='public',
  maxResults='100' # Max allowed per API
).execute()

print json.dumps(activity_feed, indent=1)
Sample results for the first item in the results (activity_feed['items'][0]) follow and illustrate the basic nature of a Google+ activity:

{
 "kind": "plus#activity",
 "provider": {
  "title": "Google+"
 },
 "title": "This is the best piece about privacy that I've read in a ...",
 "url": "https://plus.google.com/107033731246200681024/posts/78UeZ1jdRsQ",
 "object": {
  "resharers": {
   "totalItems": 191,
   "selfLink": "https://www.googleapis.com/plus/v1/activities/z125xvy..."
  },
  "attachments": [
   {
    "content": "Many governments (including our own, here in the US) ...",
    "url": "http://www.zdziarski.com/blog/?p=2155",
    "displayName": "On Expectation of Privacy | Jonathan Zdziarski's Domain",
    "objectType": "article"
   }
  ],
  "url": "https://plus.google.com/107033731246200681024/posts/78UeZ1jdRsQ",
  "content": "This is the best piece about privacy that I&#39;ve read ...",
  "plusoners": {
   "totalItems": 356,
   "selfLink": "https://www.googleapis.com/plus/v1/activities/z125xvyid..."
  },
  "replies": {
   "totalItems": 48,
   "selfLink": "https://www.googleapis.com/plus/v1/activities/z125xvyid..."
  },
  "objectType": "note"
 },
 "updated": "2013-04-25T14:46:16.908Z",
 "actor": {
  "url": "https://plus.google.com/107033731246200681024",
  "image": {
   "url": "https://lh4.googleusercontent.com/-J8nmMwIhpiA/AAAAAAAAAAI/A..."
  },
  "displayName": "Tim O'Reilly",
  "id": "107033731246200681024"
 },
 "access": {
  "items": [
   {
    "type": "public"
   }
  ],
  "kind": "plus#acl",
  "description": "Public"
 },
 "verb": "post",
 "etag": "\"WIBkkymG3C8dXBjiaEVMpCLNTTs/d-ppAzuVZpXrW_YeLXc5ctstsCM\"",
 "published": "2013-04-25T14:46:16.908Z",
 "id": "z125xvyidpqjdtol423gcxizetybvpydh"
}
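The fields in a result like this one can be pulled apart offline. What follows is a minimal sketch, not the book's own listing: it assumes Python 3 (for html.unescape) rather than the Python 2 used in the chapter's examples, and summarize_activity is a hypothetical helper name. It reads the actor, verb, and object type from an activity and reduces the object's content field to plain text with a naive regular expression, which is only a rough stand-in for a real HTML parser.

```python
import html
import re


def summarize_activity(activity):
    """Return ((actor, verb, object type), plain-text content) for one activity."""
    obj = activity.get('object', {})
    triple = (
        activity.get('actor', {}).get('displayName'),
        activity.get('verb'),
        obj.get('objectType'),
    )
    # Naive cleanup: replace anything that looks like a tag with a space,
    # then decode HTML entities such as &#39; and collapse whitespace.
    text = html.unescape(re.sub(r'<[^>]+>', ' ', obj.get('content', '')))
    text = ' '.join(text.split())
    return triple, text


# Abbreviated copy of the sample activity, keeping only the fields used here
sample = {
    'actor': {'displayName': "Tim O'Reilly"},
    'verb': 'post',
    'object': {
        'objectType': 'note',
        'content': 'This is the best piece about privacy that I&#39;ve read ...',
    },
}

triple, text = summarize_activity(sample)
print(triple)
print(text)
```

The same function could be applied to every entry in activity_feed['items'] once a feed has been fetched.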
Each activity object follows a three-tuple pattern of the form (actor, verb, object). In this post, the tuple (Tim O’Reilly, post, note) tells us that this particular item in the results is a note, which is essentially just a status update with some textual content. A closer look at the result reveals that the content is something that Tim O’Reilly feels strongly about, as indicated by the title “This is the best piece about privacy that I’ve read in a long time!”, and hints that the note is active, as evidenced by the number of reshares and comments.

If you reviewed the output carefully, you may have noticed that the content field for the activity contains HTML markup, as evidenced by the HTML entity I&#39;ve that appears. In general, you should assume that the textual

content="text/html; charset=UTF-8"/> %s """ blog_)

# Get the collection stats (collstats) on a collection
# named "mbox"

print json.dumps(db.command("collstats", "mbox"), indent=1)

# Use the db.command method to issue a "text" command
# on collection "mbox" with parameters, remembering that
# we need to use json_util to handle serialization of our JSON

print json.dumps(db.command("text", "mbox",
                            search="raptor",
                            limit=1),
                 indent=1, default=json_util.default)
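The search argument passed to the "text" command above is an ordinary string, so richer queries can be composed before the call. The sketch below assumes the search-string conventions of MongoDB's text search: space-separated terms match any of the terms, a phrase is wrapped in double quotes, and a leading hyphen excludes a term. The helper build_text_search and the example phrase and excluded term are hypothetical and are not taken from the Enron analysis.

```python
def build_text_search(any_of=(), phrases=(), exclude=()):
    """Compose a search string for a MongoDB text query.

    any_of  -- terms of which at least one should match
    phrases -- exact phrases, emitted inside double quotes
    exclude -- terms that must not appear, emitted with a leading hyphen
    """
    parts = list(any_of)
    parts += ['"%s"' % phrase for phrase in phrases]
    parts += ['-%s' % term for term in exclude]
    return ' '.join(parts)


# Illustrative values only; "special purpose" and "dinosaur" are made up
query = build_text_search(any_of=['raptor'],
                          phrases=['special purpose'],
                          exclude=['dinosaur'])
print(query)
```

The resulting string could then be passed as the search parameter in place of the literal "raptor", for example db.command("text", "mbox", search=query, limit=1).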
MongoDB’s full-text search capabilities are quite powerful, and you should review the text search documentation to appreciate what is possible. You can search for any term out of a list of terms, search for specific phrases, and prohibit the appearance of certain terms in search results. All fields are initially weighted the same, but you can also weight fields differently to tune the ranking of the results that come back from a search. In our Enron corpus, for example, if you were searching for an email address, you might want to weight the To: and From: fields more heavily than the Cc: or Bcc: fields to improve the ranking of returned results. If you were searching for keywords, you might want to weight the appearance of terms in the subject of the message more heavily than their appearance in the content of the message.

In the context of Enron, raptors were financial devices that were used to hide hundreds of millions of dollars in debt from an accounting standpoint. Following are truncated sample query results for the infamous word raptor, produced by running a text query in the MongoDB shell:

> db.mbox.runCommand("text", {"search" : "raptor"})
{
  "queryDebugString" : "raptor||||||",
  "language" : "english",
  "results" : [
    {
      "score" : 2.0938471502590676,
      "obj" : {
        "_id" : ObjectId("51a983dfe391e8ff964c63a7"),
        "Content-Transfer-Encoding" : "7bit",
        "From" : "[email protected]",
        "X-Folder" : "\\SSHACKL (Non-Privileged)\\Shackleton, Sara\\Inbox",
        "Cc" : [
          "[email protected]"
        ],
6.3. Analyzing the Enron Corpus
        "X-bcc" : "",
        "X-Origin" : "Shackleton-S",
        "Bcc" : [
          "[email protected]"
        ],
        "X-cc" : "'[email protected]'",
        "To" : [
          "[email protected]",
          "[email protected]",
          "[email protected]",
          "[email protected]",
          "[email protected]"
        ],
        "parts" : [
          {
            "content" : "Maricela, attached is a draft of one of the...",
            "contentType" : "text/plain"
          }
        ],
        "X-FileName" : "SSHACKL (Non-Privileged).pst",
        "Mime-Version" : "1.0",
        "X-From" : "Ephross, Joel ",
        "Date" : ISODate("2001-09-21T12:25:21Z"),
        "X-To" : "Trevino, Maricela