Thursday, 3 March 2011

What Losing a Socialbots Competition Taught Me About Humans & Social Networking

When social network users think of robots on their networks, anything from fake Twitter pages spamming links to Chat/AIM bots could come to mind. While 'social bots' may so far be, at best, still brushing up on their social skills and, at worst, outright malware, it may not always be like this. A recent Gartner study predicted that by 2015, 10% of users within an individual's social network will be robots. This doesn't necessarily mean that 10% of a user's network will secretly be automated pages, but that brands may leverage automation to provide basic interaction (basic question & answer, news aggregation & promotion, customer service) without direct personal staffing.

While 'social bots' may be quite some ways from the de facto choice for basic communication, current technologies seem to be leaning toward options to create an automated solution. CoTweet and other group Twitter tools help to manage messages based on topic and priority (a step towards prioritizing interaction), while API developers are already creating interactive solutions that mimic basic user functionality (both for useful and malicious goals). While the most common example of an automated page a user might interact with presently is a suspicious model photo attached to a Twitter account promoting odd links, this could change soon enough.

With a sales pitch like that, how could anyone say no...

Personally, I found out more about 'social bots' through entering the Web Ecology Project's Socialbot 2011 contest. After first reading about the contest on Twitter, I helped to field a team of other interested media folks. In total, 6 teams entered and 3 fielded entries, with a spectrum of contestants coming from computer science & programming, development & media. The range of skills interested in the contest shows the areas of development that are going to drive technology like this forward.

The Contest
The network diagram above shows the initial 500 users and their connections between each other before competition efforts

The competition itself hoped to expand what we know about social robotics through the development of one or more automated Twitter pages. Each team had two weeks to develop a profile or series of profiles which, once activated, would run uninterrupted for 2 weeks (with the exception of a halfway point improvement day). Within these two weeks, the page would try to interact with 500 pre-selected target users, randomly sampled by the contest holders. Teams couldn't tell users they were in a contest, but could indicate that their lead page was a robot.

Teams scored points by driving user interaction. Scoring consisted of:
  • 1 point - per each user following a team's lead Twitter page
  • 3 points - per each user RT'ing a team's content, '@' replying to the page or mentioning them
Teams would be allowed to restart their page 3 times after being reported as spam, before finally losing the ability to score. Other teams weren't allowed to mention that a competitor page was in a contest or report them as spam.

At the end of the two weeks, the team with the most points wins $500 and the 'Socialbots' Cup.

The Strategy

Our team's strategy (Team MS-UK / Team A in the final standings) operated on three core principles:
  • The socialbot would use content to drive interaction
  • Twitter's TOS would be respected as much as possible, with every part of the plan judged against whether it avoided spamming users, so that targets reporting spam pages would act as a reward for us and a barrier for competitors
  • Wherever possible, the page would utilize realism to get as valid an interaction as possible
Based on these principles, we decided upon making a single page crafted against the clustered interests of the 500 users. To begin, network analysis started with the data provided by the Web Ecology Project and moved on to analyzing publicly available data from page names, connectivity ratios and tweeted content.

The provided clusters created 9 distinct areas of users, each with different interests, locations and ways of using Twitter. Prominent interest groups included: animal lovers, UK/EU residents, sports lovers, social media experts, metropolitan career women, hunting enthusiasts & other suspected robots. Users judged to be under 18 were automatically excluded from any activity by the robot. Surprisingly, the robot segment contained at least 60 automatic pages, creating a situation where a robot was programmed to engage with other robots.

Within each segment, users were prioritized by authority within the group (e.g. the number of other segment users following them), with those that were most authoritative set as influencers.
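For anyone curious how that ranking might look in code, here is a minimal Python sketch of the idea. It is not the contest code (which was a .Net application); the input shapes (`segments` as a user-to-segment mapping and `follows` as follower/followed pairs) and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def rank_by_segment_authority(segments, follows):
    """Rank each segment's users by how many same-segment users follow them.

    segments: dict mapping user -> segment name (hypothetical input shape)
    follows:  iterable of (follower, followed) pairs
    """
    authority = defaultdict(int)
    for follower, followed in set(follows):  # set() drops duplicate pairs
        same_segment = (
            follower != followed
            and follower in segments
            and followed in segments
            and segments[follower] == segments[followed]
        )
        if same_segment:
            authority[followed] += 1

    ranked = defaultdict(list)
    for user, segment in segments.items():
        ranked[segment].append((user, authority[user]))
    for segment in ranked:
        # Highest authority first; ties broken alphabetically for stable output.
        ranked[segment].sort(key=lambda pair: (-pair[1], pair[0]))
    return dict(ranked)
```

The first few entries of each ranked list would then be the segment's 'influencers'. Where to draw that cut-off is a judgment call that this sketch leaves open.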

The Persona
Our automated page ('@sarahbalham') aimed to look like a real twentysomething's page, with elements of her persona emerging throughout it (e.g. a wooded background). Though I always worried that a graphic designer using a web-generated icon as an avatar was a bit of a giveaway.
Based on the data gathered from cluster analysis and influencer content, our entry persona was set as:
  • a 24 year old female graphic designer (decided as it was an age/gender mix which fit conversation with multiple segments and a career which allowed for a range of content to be discussed)
  • an expat from the Southern US (Georgia) living in London UK (chosen as it related well to the hunting/metropolitan/EU & UK segments, as well as mirroring the author's current experience, which allowed for veracity in tweeted content)
  • interested in design, fashion, art, the outdoors & music (allowing for content which would reach across segments without alienating non-engaged users)
  • a proud owner of a small but loved dog named 'Russel' (a way to engage both animal owners and hunters through the love of a pet)
  • name-wise, a combination of a London Borough & a neurological structure led to us calling 'her' "Sarah Balham"

The Program

User segmentation allowed our program to follow a set group of users each day, meaning that users were added gradually over the course of the first week. The program avoided mass following & 'churn' (unfollowing those who wouldn't follow back and then following them again shortly after), instead adding a range of 2-5 'target followers' and 2-5 non-target ancillary followers (those not in the 500, but around the group) at random times during the day. The program followed the lifecycle of a normal Twitter user, sleeping for a random period between 11pm and 7am local time, with any responses or actions waiting until 'she' woke up.
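To make the follow routine a bit more concrete, here is a rough Python sketch of a daily follow plan. It is an illustration rather than the original .Net code: the function names are made up, and it simplifies the sleep cycle to a fixed 11pm to 7am window, whereas the real program picked a random sleep period inside that range.

```python
import random
from datetime import date, datetime, time, timedelta


def is_awake(now, bed=time(23, 0), wake=time(7, 0)):
    """True outside the overnight sleep window (assumed fixed at 23:00-07:00)."""
    t = now.time()
    return not (t >= bed or t < wake)


def plan_daily_follows(day, targets, ancillary, rng=random):
    """Pick 2-5 target and 2-5 ancillary accounts, each with a random waking time."""
    picks = (
        rng.sample(targets, min(len(targets), rng.randint(2, 5)))
        + rng.sample(ancillary, min(len(ancillary), rng.randint(2, 5)))
    )
    start = datetime.combine(day, time(7, 0))
    awake_minutes = 16 * 60  # 07:00 up to, but not including, 23:00
    schedule = [
        (start + timedelta(minutes=rng.randrange(awake_minutes)), user)
        for user in picks
    ]
    return sorted(schedule)  # earliest follow first
```

Calling `plan_daily_follows(date(2011, 2, 1), targets, ancillary)` returns a time-ordered list of `(when, user)` pairs, which a scheduler could then work through during the day.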

Based on the segmentation, the program behind the page operated on four core content functions:

Scripted tweets about Mexican food in London reached across multiple segments, engaging users from back in the US, as well as UK residents looking for tips/offering advice

Content & Segment must align
Given that users weren't to be directly spammed, the notification of their account being followed served as the main point on which they would decide to engage with the account. Because of this, 'Sarah' would tweet content relevant to the segment before beginning to follow the users, hoping to present the most pertinent content to the target users.

 Retweeting from influencer users allowed SarahBalham to talk about topical issues in a relevant manner, without the risk of parsing news sites 'herself'. If another user has posted it to Twitter, it is safer to assume that it can be discussed.

The best content comes from a mixture of internal and external sources

The program pulled 'her' linked stories from influential users as well as prominent blogs. Content categories were decided at three intervals during the day, and a mixture of blog stories (e.g. for Social Media, pulling stories from Mashable at random and tweeting about them) and influential tweets (e.g. retweeting a key user's tweet if it had a linked source) related to the segment was then scheduled to tweet at key times.

Stringing scripted days of tweets together gave a greater relevance to lifestyle tweets. In some, the account took the day off to wander around London, whereas in others she struggled through a hard workday after commuting problems.

Life narrative is a key part of Twitter for both affinity & as a break to linked content
'Sarah' had a set of 20 different days she could live out. A day would be randomly chosen upon the program 'awakening', which would then consist of 4-7 tweets playing out during the day. These were written to create an engaging story around Sarah, as well as provoking conversation (general questions were asked of the entire Twitter audience) for those paying attention. Tweets would vary based on whether the day was a weekday or weekend and would talk about non-worrying, common situations (i.e. bad commute, tips to stop the dog chewing on the rug). 
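The day-picking logic described above is simple enough to sketch in a few lines of Python. This is only an illustration of the idea: the `scripts` structure (a list of dicts with a `kind` and a list of `tweets`) is a hypothetical stand-in for whatever the database actually held.

```python
import random
from datetime import date


def choose_scripted_day(today, scripts, rng=random):
    """Pick one scripted day at random that matches weekday vs. weekend.

    scripts: list of dicts like {"kind": "weekday" or "weekend", "tweets": [...]}
             (a hypothetical shape used for this sketch)
    """
    # date.weekday() returns 5 for Saturday and 6 for Sunday.
    kind = "weekend" if today.weekday() >= 5 else "weekday"
    candidates = [script for script in scripts if script["kind"] == kind]
    return rng.choice(candidates)
```

Each chosen script's 4-7 tweets would then be spread across the waking hours in the same way as any other scheduled content.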

While Follow Friday & other group listings were powerful options to drive possible engagement, they had to be used sparingly to preserve believability, as well as to stay true to the overall strategy

General interaction within the community segments can non-invasively build up prominence
The program was tasked with never spamming users, but it did make use of organic weekly occasions to mention influencers within segments. Occasions such as 'Follow Friday' allowed the program to choose influential users and reach out to them subtly by listing them in these groups. While these were used sparingly, the specific segments led to using manufactured days such as 'Woof Wednesday' for animal lovers and 'Media Mondays' for those within advertising/design. These community tweets were considered as one of the ways to reach out to users uninvited (with retweets being the other).

On top of these principles, the program was set to respond within 15-35 minutes (at random) to thank users for Retweets of 'her' own content.

At a functional level, the program was a .Net desktop application hosted on a virtual web server. All scripts, downloaded content, user lists, segments and directions on daily activity were stored in a MySQL database behind the application. The program operated on 15 minute intervals for content decisions/interactions and a one minute interval to post scheduled content. No functionality existed to respond articulately to users, due to both time and the assumption that this wouldn't occur often enough to justify attempting it in the 2 week development time. The source code is available (through the MIT open source license) from the Web Ecology Project here.
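As a rough illustration of that timing model, here is a small Python sketch of a scheduler that is ticked once a minute, makes 'decisions' every 15 minutes, and queues thank-you tweets 15-35 minutes after a retweet is noticed. It is not the original .Net code: the class, the `fetch_retweeters` callback and the thank-you wording are all hypothetical, and posting is simulated by appending to a list rather than calling Twitter.

```python
import heapq
import random
from datetime import datetime, timedelta


class BotScheduler:
    def __init__(self, rng=random):
        self.queue = []   # min-heap of (due_time, sequence, text)
        self.seq = 0      # tie-breaker so equal due times never compare texts
        self.rng = rng
        self.posted = []  # stands in for real posting in this sketch

    def schedule(self, due, text):
        heapq.heappush(self.queue, (due, self.seq, text))
        self.seq += 1

    def queue_thanks(self, now, user):
        # Thank retweeters after a random 15-35 minute delay.
        delay = timedelta(minutes=self.rng.randint(15, 35))
        self.schedule(now + delay, f"@{user} thanks for the RT!")

    def tick(self, now, fetch_retweeters=None):
        """Call once per minute with the current time."""
        # Every 15 minutes: content decisions and interactions.
        if now.minute % 15 == 0 and fetch_retweeters is not None:
            for user in fetch_retweeters(now):
                self.queue_thanks(now, user)
        # Every minute: post anything that has come due.
        while self.queue and self.queue[0][0] <= now:
            _, _, text = heapq.heappop(self.queue)
            self.posted.append((now, text))
```

Separating the slow decision loop from the fast posting loop means scheduled tweets go out close to their planned time, while the heavier work (checking for retweets, choosing content) only happens four times an hour.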

Author's Note: As the coder, I'd ask you to be kind if you download it, as two weeks working only in the evenings & weekends was a quick turnaround to schedule/plan/develop/test & deploy a socialbot.

Competitor Strategy

While our strategy was very much around a singular point, both of the other teams fielding a full entry utilized a swarm method (i.e. multiple pages/bots supporting a main bot).

Team EMP / 'Team C'

A New Zealand entry, the team created a main bot named 'James Titus' who lived in Christchurch and really loved his pet cat (in fact he really loved cats). The team wrote a brilliant blog post outlining their experience here, but in summary:
  • Team EMP's bot utilized a swarm to test for follow backs. Each sub-bot would test to see if a user would follow it and forward on amenable users to the main bot.
  • However, the main bot followed all 500 target users immediately.
  • Within Week 1, the bot posted content related to random messages and pictures of cats scraped from Flickr, which syndicated through the created blog 'Kitteh Fashion'
  • Within Week 2, the bot swapped strategies, asking users a list of random questions to motivate a response.
    • If users responded or mentioned the page, a random response was tweeted back (i.e. '@user sweet') which drove further engagement.
    • The page also posted #FF listings and created #WTF ('Wednesday to Follow') as group listings to drive interest

Team Growth20 / 'Team B'
A US-based entry, the team created a female ninja persona (ninjzz), looking for friends on Twitter. The team's persona developed a bit into the second week and an increased amount of activity occurred after the halfway point.
  • The main bot seemed to monitor the target network and repeat tweets it observed. 
    • Some of these tweets were in general, while others were directed at target users
  • Further, it also did #FF group listings

Interestingly, this team was the only one to deploy countermeasures against the other teams, as it started pages such as @botcops, which (ironically) was a bot messaging the target users and notifying them that the competitors' entries were robots.

The Competition
At the start of the competition, without knowing the competitor strategies, we considered that our strength would be believability and avoiding being reported as spam, but our weakness might be a lack of frequent scoring opportunities. This seemed to be confirmed when the competition began and we saw the competitors' entries. Team EMP, the winner and leader throughout the competition, rapidly gained followers and launched off to an amazing start. Over the course of week 1, we began to see a bit of growth as we added followers, but still lacked many responses from our growing network.

As day 7 approached, the midway point and only opportunity to update code, we had closed the gap with the leaders, possibly indicating our deep engagement strategy would pay off. At the halfway point, we increased the rate of messaging put out by 'Sarah', but stayed relatively consistent, thinking our slow growth would carry on.

In contrast, the other teams deployed some noticeable adjustments, with Team Grow20's countermeasures launching at the same time as Team EMP turned on their engagement strategy. As shown on the graph above, once EMP started asking questions of its network, their lead became increasingly hard to beat, leaving our only chance for victory in their network reporting the page for spam. As the competition came to an end, the users messaged by EMP weren't attempting to ignore or shut down the page, but instead were conversing with it. In addition, Team Grow20's countermeasures had slowed our scoring, leaving us in a vulnerable third as the competition closed.

At the completion of the contest, the scores reflected the power of proactive communication over aiming for a believable page:
  • Team EMP: 701 Points (107 Mutual Follows, 198 Responses)
  • Team Grow20: 183 Points (99 Mutual Follows, 28 Responses)
  • Team MS-UK: 170 Points (119 Mutual Follows, 17 Responses)
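As a quick sanity check, the totals line up with the scoring rules from earlier in the post (1 point per follow, 3 points per response). A few lines of Python are enough to confirm it; the function and variable names here are just for illustration.

```python
FOLLOW_POINTS = 1    # per user following the team's lead page
RESPONSE_POINTS = 3  # per retweet, '@' reply or mention


def score(follows, responses):
    return FOLLOW_POINTS * follows + RESPONSE_POINTS * responses


# (mutual follows, responses) as reported in the final standings
results = {
    "Team EMP": (107, 198),
    "Team Grow20": (99, 28),
    "Team MS-UK": (119, 17),
}

for team, (follows, responses) in results.items():
    print(team, score(follows, responses))
# prints:
# Team EMP 701
# Team Grow20 183
# Team MS-UK 170
```

The breakdown makes the gap obvious: we actually had the most follows, but responses were worth three times as much and EMP collected more than ten times as many of them.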

Despite the different scores of the three teams, the final network structure shows how well each team shaped a network around them. The structure shows that, regardless of approach, each bot was able to ingratiate itself into the target network, forging ties with about 1/5th of the possible network. While it can be argued that this 1/5th was made up of either bots or the very open users within the target, the rate of growth occurring over 14 days is quite intriguing.

So what did I learn?

Regardless of 'Sarah's' performance, the socialbots contest provided a great opportunity to learn more about not just social robotics, but the wider area of social networking. A few of the key lessons I took from the experience:
  • Users on Twitter aren't as aggressive towards intrusion as one might assume; it seems the self-policing userbase has yet to fully activate
A lot has been made of the self-policing power of social networks. As users, myself included, have encountered bots before on the network, I assumed that Twitter's spam reporting function would play a much larger role in the contest than it did. While I assumed that a large part of the target userbase would avoid all three bots (which 4/5ths of the target segment seemed to do), it's quite surprising that the majority of these users chose to passively avoid the intrusion rather than actively report any follows or directed tweets.

While Twitter is arguably much more casual in networking than sites such as Facebook, the avoidance of automated pages, even when they are accused by other pages of being a robot, seems to skew towards ignoring the presence over policing the network.
  • User influence scores might have a way to go before they become reasonably exact

While skewing towards believability didn't help our page win the competition, I was quite surprised how well 'Sarahbalham' performed as an influential user. At the end of the competition, the page was checked on both Klout & Peerindex, and it seems that our strategy of tweeting status updates and content resonated with the algorithms for both.

Peerindex's score was a bit lower for 'Sarah', but indicated that it was 80% sure she wasn't a bot, the exact score it's also given my personal page.

While it's funny to laugh about the automated bot becoming influential, it has an interesting implication for paid tweets. As users sign up to tweet for cash and sites flaunt their user influence to possible advertisers, these numbers become much more profitable. Running a swarm of specialized pages, each with an artificially cultivated community around them, becomes an interesting opportunity for the enterprising (if unethical) developer if paid-for tweet communication becomes more widespread.

  • Driving reactionary activity is much easier than soliciting responses
While EMP showed how easy it was to elicit a response when asked, one response from EMP's JamesTitus bot shows the possible shortcomings to robotic conversation.

One of the reasons I was initially interested in socialbots was a set of personal and professional projects done on Twitter in the area of automated page response. After helping to create @AskLG3DTV (an informational bot answering questions about 3D TVs with a video) and @RPStweet (a rock paper scissors game processing '@' reply tweets and answering them with a game choice), I was surprised to see how counterintuitive sending '@' replies is for some users. While users were surprisingly happy to respond to seemingly unrelated questions (as aptly proven by Team EMP's bot), it seems that either instructing users to reply (in the case of the two above examples) or attempting to motivate unprompted '@' messages (as in the case of 'SarahBalham') requires much more trust or work.

  • Robots may have a way to go on social networking, but there are more out there already than you think 
While the Gartner study sets 2015 as the age of developed and identifiable robots, 2011 seems to be developing into the age of rudimentary robotic presences. As our research into the target 500 users illustrated, over 80 pages were estimated to be automated. While most of these were nothing more than autofollow scripts (though this wasn't as prevalent as one would assume), RSS feeds or content farms, it shows that social robotics is already working in the network.
  • Social robotics has more to do with brands and marketing than you first think
With the growing prevalence of aggregating content for websites and agencies/companies using Twitter popularity/network action to accept interns/new grads, social robotics should be watched with interest. How long until a developed 'botnet' of coordinated automated pages unleashes a manufactured controversy that spreads around a social network? While bots mostly push spam links currently, the power of automated pages with a developed network reach pushing an agenda is easily within reach.

At a more general (and ethical) level, it may seem that the opportunities for brands in social robotics are limited. Brand presences are carefully managed and a genuine tone in reacting is key for social media. However, as business activities increase across the social space and the level of resource required increases, segmenting responses and automating basic interaction will become more and more necessary. While social robotics may currently pose more of a risk to network health, it seems that in the long run, it may be required (in some capacity) for many small to medium (and possibly large) scale companies as their social activity expands.

While the competition didn't turn out as well as I would have hoped, I gained a ton of useful insight and had a great time doing it. Congratulations to the winning team and thanks to the Web Ecology Project for hosting. If anything, I'll leave the last word to my Frankenstein-like Twitter creation:

1 comment:

  1. Great write-up! I think James M Titus' lack of clear marketing agenda let him slip under the radar for being reported - despite his heavy use - and encouraged conversation. If you look like you're selling stuff, you'll get people's backs up, but if your bot just wants to talk, dozens will chat with you!

    Team EMP