
David Harry 2013/01/07

Google and the Social Graph; more data to feed the beast


Did you know there are graphs beyond the link graph at Google? Yes indeed, my friend, it is true. One of the more interesting and notable ones, of course, is the social graph. Over the last few years there’s been an (understandable) interest from SEOs in how Google views social media/networking and how it plays into the greater scheme of things.

To that end, Google was awarded a few patents last year (links at the end) that I thought might be interesting to dig into, to see if we can get a sense of the mindset over there.

It was an interesting journey, in that I haven’t put a whole lot of stock in the “social as a ranking factor” school of thought. While that hasn’t entirely changed, I can now better see how Google might be using social signals in the larger context of search.

Let’s get into it, shall we?

What are they looking at?

Google seems to want to look at what people are posting on social channels, including text, links, images and videos. They are also looking for connections between users. Some examples given include:

an explicit acquaintance relationship (e.g., designation as friends, colleagues, fans, blog feed followers, etc.),
an implicit acquaintance relationship (e.g., friends in common, messages sent between users, viewing another user’s profile page, etc.),
a common group membership (e.g., membership in a group related to a particular interest, membership in a group related to a particular geographic area, etc.),
participation in a common activity (e.g., users posting messages to the same forum, users playing an online game together, etc.), etc.

When trying to establish relationships from the links between users, they note that:

Users that share a link may be likely to have common interests and may be likely to post content to their profile pages related to similar topics.
Links for a user with a profile page containing content known to pertain to a particular subject of interest can indicate that the profile pages of other linked-to users are likely to also contain content that pertains to the particular subject of interest.
Links among users of a social network can be used to propagate classifications (e.g., advertisement-related content, illegal content, inappropriate content for minors, etc.) for content that has already been identified as pertaining to a particular subject of interest to other content for which a classification is unknown.
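The third point, carrying known classifications across links to content whose classification is unknown, is essentially label propagation. Here is a minimal sketch of one way it could work, assuming a simple majority vote among already-labeled neighbors. That rule is my choice for illustration; the patents don’t spell out this exact method, and the users, links and labels are all hypothetical.

```python
from collections import Counter

def propagate_labels(graph, seeds, max_rounds=10):
    """Give each unlabeled user the most common label among its already-labeled
    neighbors, repeating until nothing changes. `graph` maps a user to the set
    of users they are linked to; `seeds` maps users to known labels."""
    labels = dict(seeds)
    for _ in range(max_rounds):
        updates = {}
        for user, neighbors in graph.items():
            if user in seeds:
                continue  # known classifications stay fixed
            votes = Counter(labels[n] for n in neighbors if n in labels)
            if votes:
                updates[user] = votes.most_common(1)[0][0]
        if all(labels.get(u) == lab for u, lab in updates.items()):
            break  # converged: no label changed this round
        labels.update(updates)
    return labels

# Hypothetical network: two tight clusters joined by the carol-dave link.
graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob", "dave"},
    "dave": {"carol", "erin", "frank"},
    "erin": {"dave", "frank"},
    "frank": {"dave", "erin"},
}
seeds = {"alice": "gaming", "erin": "cooking"}
print(propagate_labels(graph, seeds))
```

Ties between labels are broken arbitrarily in this sketch; a real system would presumably weight each link by connection strength rather than counting every neighbor equally.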

Connections may also be established on the basis of a common set of interests. The more common content shared between ‘friends’ on a network, the tighter the implicit relationship can be considered. This can then be used to classify the users. Keep in mind that even when Google does ‘personalization’, it isn’t on a user-by-user basis, but actually on a set of users classified as topically related (by their search activities). This seems to be a large part of the approach when considering social elements as well.
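The “more shared content, tighter relationship” idea can be made concrete with a simple overlap measure. This sketch uses Jaccard similarity over the items two users have shared; that metric is my stand-in, not something the patents name, and the function name and URLs are placeholders.

```python
def relationship_tightness(shared_a, shared_b):
    """Jaccard similarity of two users' shared items (links, images, videos):
    the size of the intersection divided by the size of the union, 0 to 1."""
    a, b = set(shared_a), set(shared_b)
    if not a and not b:
        return 0.0  # nothing shared at all: no evidence of a relationship
    return len(a & b) / len(a | b)

# Hypothetical shared links for two users.
alice = ["example.com/mmo-review", "example.com/gpu-benchmarks", "example.com/recipe"]
bob = ["example.com/mmo-review", "example.com/gpu-benchmarks", "example.com/patch-notes"]
print(relationship_tightness(alice, bob))  # 2 in common / 4 total → 0.5
```

Jaccard is used here because it doesn’t reward prolific sharers just for volume: two users who each share hundreds of links but overlap on only a few still score low.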

Some of the patents also discuss the differences between explicit and implicit connections:

Explicit: designation as friends, colleagues, fans, blog feed followers, etc.
Implicit: friends in common, messages sent between users, viewing another user’s profile page, etc.
Common groups: membership in a group related to a particular interest, membership in a group related to a particular geographic area, etc.
Common activities: users posting messages to the same forum, users playing an online game together, etc.

The strength and topicality of the connections can then be scored. The main goal here seems to be delivering advertising, but it can also play into the social elements of personalized search.
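To illustrate how relationship signals of different kinds might be rolled into a single strength score, here is a sketch with made-up weights. The four relationship types come from the patents; the weights, the function name and the saturating 1 - e^(-x) combination are assumptions for illustration only.

```python
import math

# Illustrative weights only: the patents list these relationship types but
# publish no weights, so these numbers are invented for the example.
WEIGHTS = {
    "explicit": 1.0,         # friends, colleagues, fans, feed followers
    "implicit": 0.5,         # friends in common, messages, profile views
    "common_group": 0.3,     # shared interest or geographic groups
    "common_activity": 0.4,  # same forum, same online game
}

def connection_strength(signals):
    """Map counts of each relationship signal to a score between 0 and 1.
    1 - exp(-x) saturates, so each extra signal adds less than the last."""
    weighted = sum(WEIGHTS[kind] * count for kind, count in signals.items())
    return 1 - math.exp(-weighted)

# One explicit link (friends) plus two implicit ones (e.g. messages sent).
print(round(connection_strength({"explicit": 1, "implicit": 2}), 3))  # → 0.865
```

The saturating curve is a design choice: it keeps a pair of users who message each other constantly from drowning out everything else, which a plain weighted sum would allow.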

This is what we’ve commonly called the ‘social graph’, and Google even offered a Social Graph API once upon a time. One of the patents touches on a definition:

“The system can also include a social graph linking module configured to determine a social network graph for at least a portion of the social network from the information received by the interface, the graph including a plurality of nodes connected by links, each node corresponding to a user that is registered with the social network and that has a profile page on the social network. The system can further include a score seeding component that identifies first nodes from the plurality of nodes as including content associated with a particular subject of interest and that seeds the identified first nodes with first scores that indicate profile pages for the identified first nodes are positively identified as including content associated with the particular subject of interest.”

Also of interest is one of the advantages noted:

“Content can be detected on a social network with greater efficiency. Instead of relying upon manual review of the pages of a social network, pages that likely contain content can be quickly located based upon links between users of the social network. A greater amount of content can be located on a social network in less time than under traditional manual review. Additionally, detection of content on the social network using links between users permits for a high degree of accuracy. Furthermore, detecting content based on links between users of a social network can have greater accuracy and efficiency than other automated techniques, such as content-based detection techniques.”

This is interesting in that they seem more interested in the commonly shared content, and the links between the users sharing it, than in the actual content itself. It’s almost a ‘TrustRank’-type concept applied to social. In fact, parts of the patents also discuss identifying users who share copyrighted material, which again leans toward TrustRank-type concepts.
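The score-seeding passage, together with the TrustRank comparison, suggests something like seeded propagation: start from users positively identified as carrying a given kind of content and let scores flow along links to their connections. Below is a minimal sketch in the spirit of TrustRank (a PageRank biased toward a seed set). That is my analogy, not the patents’ stated algorithm, and the damping value, round count and user names are assumptions.

```python
def propagate_scores(graph, seeds, damping=0.85, rounds=50):
    """Seeded, TrustRank-style propagation. `graph` maps each user to a list of
    users they link to (every target must also be a key); `seeds` maps the
    positively identified users to seed scores (which must sum to more than 0).
    Each round a user keeps (1 - damping) of its normalized seed score and
    receives damping * score / out_degree from every user linking to it."""
    total = sum(seeds.values())
    seed_dist = {user: seeds.get(user, 0.0) / total for user in graph}
    scores = dict(seed_dist)
    for _ in range(rounds):
        new = {user: (1 - damping) * seed_dist[user] for user in graph}
        for user, links in graph.items():
            for target in links:
                new[target] += damping * scores[user] / len(links)
        scores = new
    # Users with no outgoing links simply leak their score in this sketch.
    return scores

# Hypothetical graph: "flagged" was found sharing the content of interest.
graph = {
    "flagged": ["friend"],
    "friend": ["friend_of_friend"],
    "friend_of_friend": [],
    "stranger": ["friend_of_friend"],
}
scores = propagate_scores(graph, {"flagged": 1.0})
for user, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{user:18} {score:.4f}")
```

Scores decay with each hop from the seed, and a user with no path from a seed (here "stranger") stays at zero, which is the property that makes this style of propagation useful for locating likely pages without reviewing their content.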

[Image: Google’s view of the social graph]

Going beyond known data

Another core interest in social data, beyond the data points Google already had, seems to be information such as:

the user’s age or age range,
educational level or range,
income level or range,
language preferences,
marital status,
geographic location (e.g., the city, state and country, street address, zip code, area code),
cultural background or preferences,
whether the user is a member of one or more defined groups (e.g., organizations, companies, associations, clubs, committees, and the like),
psychographic information (e.g., personality trait information, or other personality descriptive information).

These can be derived from aspects of the user profile, or expressly provided by the user. And of course, given how limited their data was prior to social networking, we can see what a boon this kind of deeper data could be for Google.

Building trust and authority

What makes this interesting is that they can adapt trust concepts to the social graph, not only to assign a trust score to users/entities but possibly also to assign a sense of authority or even influence, something we’ve seen before in various methods of social graph analysis.

Back in 2008 I wrote: Social networks are open for profiling

That post covered one of Google’s early sets of patents on social profiling (filed in 2007), certainly a forerunner of these ones. Again, the ultimate goal in those was targeting influencer types and advertising.

The case could be made that this type of trust/authority data could also be used for natural and personalized search results and ranking at some point. Surely it’s already crept into the latter (personalized search). And I’d venture a guess that it can be useful in dealing with spam, as spammers often won’t have a strong social signal to accompany their other spammy efforts.

The Advertising Angle

And of course, we have to consider Google’s core interest in identifying the social graph: advertising. We all know that Google is big on tailoring advertising to contextual content, and this seems no different:

“… a social network may desire to provide advertising that is related to the content on a user’s profile page. However, without a designation (e.g., content tag, content classification, etc.) associated with customized content added by users, the social network may not be able to accurately provide such content-related advertising.”

But this isn’t limited to advertising ON the social network; it extends to search results as well. In one of the patents related to ad serving, they talk about possibly using social graph information alongside other known signals to display targeted advertisements to (logged-in) users.

“The user profile performance information may be user specific or aggregated. User specific information is the performance information for a particular individual user’s profile. Aggregated user profile information is information for a defined user profile, aggregated (e.g., averaged) from all individual users who user profiles match the defined user profile. The difference is illustrated as follows. A particular user, say John Q. Searcher, will have a user profile describing his interests in, for example {sports, baseball, football}. Associated with Mr. Searcher’s user profile may be specific performance information, including Mr. Searcher’s click-through rate, click through count, and conversion rate and count for sports related advertisements, for baseball related advertisements, and for football related advertisements. These would be examples of user profile specific performance information.”

They even discuss aggregating CTR across groups of profiles:

“Now, assume that there are several thousands users, each with their own user profiles which happen to include one or more of {sports, baseball, football} as interests. Seven different aggregated user profiles can be defined (from the combinations of these 3 interests) and for each user profile, aggregate performance information can be calculated. For example, an average click through rate for the user profile {sports, football} can be calculated from those profiles that include both of these interests. Thus, an advertiser (or the system operator) can specify how to adjust the advertisers’ offer prices based on either specific or aggregate profiles.”
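The seven-profile arithmetic in that passage is easy to reproduce. The sketch below enumerates every non-empty combination of the three interests (2^3 - 1 = 7) and averages per-user CTR over the users whose profiles include all interests in that combination, following the patent’s description. The data layout, function name and sample numbers are hypothetical.

```python
from itertools import combinations

INTERESTS = ("sports", "baseball", "football")

def aggregate_ctr(users, interests=INTERESTS):
    """For each non-empty combination of interests, average the click-through
    rate of every user whose profile includes all interests in that
    combination. `users` holds (interests, clicks, impressions) tuples, and
    impressions must be positive."""
    result = {}
    for size in range(1, len(interests) + 1):
        for combo in combinations(interests, size):
            ctrs = [clicks / imps for profile, clicks, imps in users
                    if set(combo) <= profile]
            if ctrs:
                result[combo] = sum(ctrs) / len(ctrs)
    return result

# Hypothetical users: (profile interests, ad clicks, ad impressions).
users = [
    ({"sports", "football"}, 3, 100),
    ({"sports", "baseball"}, 1, 50),
    ({"sports", "baseball", "football"}, 5, 100),
]
for combo, ctr in aggregate_ctr(users).items():
    print(combo, f"{ctr:.3f}")
```

With these numbers the {sports, football} profile averages the first and third users. An advertiser could then, as the patent puts it, adjust offer prices for a segment whose aggregate CTR runs above or below the rest.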

As you would imagine, classifying users by relations and content types would be of particular interest to most advertisers. We’re used to seeing this with AdSense for content; this just takes it to the next logical level via the social graph. In fact, we can also surmise that this could extend to the core vanilla search on Google as well. More user data lends itself to tighter targeting in general. This certainly gives some deeper insight into why there’s such a big push at Google to get Google Plus into the mainstream.

All roads lead to Google Plus

In reading these patents, I do get a sense of just how important Google Plus, and its success, is to Google. Consider what they seem to be looking at:

User connections
Geographic connections (and Local)
Companies (in common)
Communities

By looking not only at the content being shared (and re-shared) but also at the other factors above, Google can indeed make some interesting social connections and gain some detailed data on those using the network. Sure, they can (and do) also look at other networks, but that pales in comparison to the detailed data they can get from those on Google Plus.

Again, the data they can get from the social network is hugely valuable in delivering more tailored search results and, of course, advertising. For marketers, it means we need to be ever more vigilant in establishing the demographic and psychographic information of our target groups. While I am still not sold on Google using social signals for non-personalized rankings, there do seem to be plenty of data points they now have access to that weren’t there before.

It is certainly something to watch…

Google patents covered in this post: