Unearthing the DNA of a disruptive event: SAM

By Michelle Ferguson

James Neufeld has long recognized the potential of user-generated content (UGC). 

Before launching SAM, an Edmonton-based tech startup, Neufeld worked in broadcast journalism, where he was part of a team that conceived and designed the world’s first social integration platform for TV. This system allowed television presenters to integrate tweets seamlessly into their reports.

“Watching the rise of platforms like Twitter was always a really interesting thing to everybody sitting in a newsroom,” says Neufeld. “Time and time again, Twitter proved itself to be a major source of information for breaking news events and updates from on the ground sources.”

Given the platform’s popularity, however, the sheer volume of tweets and the noise surrounding them made this high-quality content difficult to identify.

“We all knew it was there, but it [took] a lot of luck and a fair amount of work… to find [it],” says Neufeld.

SAM makes this process effortless. Unlike manual search tools like TweetDeck, SAM is not limited to the knowledge or input of an operator; it uses artificial intelligence (AI) and machine learning to autonomously sift through millions of tweets daily and identify disruptive events within minutes of their occurrence.

The system thinks like a journalist: once it detects an anomaly (e.g. a mudslide, power outage or shooting), it triangulates location data, identifies corroborating tweets and scores the authenticity of each user before triggering an alert.
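
The detect, corroborate, score and alert sequence described above can be illustrated with a small Python sketch. It is not SAM's implementation: the `Tweet` fields, the `detect_events` function, the thresholds and the plain keyword match are all assumptions made for illustration, and the place name and per-user credibility score are assumed to have been computed already.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Tweet:
    user: str
    text: str
    place: str          # hypothetical: place name already extracted from the tweet
    minute: int         # minutes since an arbitrary reference time
    credibility: float  # hypothetical per-user score between 0 and 1


def detect_events(tweets, keywords, window=10, min_users=3, min_score=1.5):
    """Return (keyword, place) pairs reported by enough credible users within `window` minutes."""
    # Group tweets by the keyword they mention and the place they refer to.
    groups = defaultdict(list)
    for t in tweets:
        lowered = t.text.lower()
        for kw in keywords:
            if kw in lowered:
                groups[(kw, t.place)].append(t)

    alerts = []
    for key, group in groups.items():
        for anchor in group:
            # Tweets posted within `window` minutes after the anchor tweet.
            in_window = [t for t in group if 0 <= t.minute - anchor.minute <= window]
            # Count each user once so a single account cannot corroborate itself.
            best = {}
            for t in in_window:
                best[t.user] = max(best.get(t.user, 0.0), t.credibility)
            if len(best) >= min_users and sum(best.values()) >= min_score:
                alerts.append(key)
                break
    return alerts
```

A plain keyword match like this one cannot separate a literal report from a figure of speech, which is the gap the custom language models described in this article are meant to close.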

To ensure SAM can tell the difference between a tweet describing LeBron’s performance (‘LeBron is on fire’) and one reporting an actual blaze (‘The U of A is on fire’), the data science and engineering teams, led by founding member and chief technical officer Sean Solbak, had to create a custom AI stack capable of understanding the slang, hashtags and colloquialisms used on social media to describe crisis events.

 Sean Solbak, CTO - SAM

“Natural Language Processing is an important component of the SAM AI stack. But most of the available models are trained off long form English literature: books, novels or large articles,” explains Solbak. “With Twitter we’re forced to make decisions off three to ten words. No off-the-shelf systems came close to understanding the value hidden in hashtags, @handles and regional local geo references that our system craves.”
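
To make Solbak's point concrete, here is a minimal Python sketch of a tokenizer that keeps hashtags and @handles as whole tokens, where a tokenizer built for long-form prose might drop or split them. The pattern and the `tokenize` and `split_features` helpers are illustrative assumptions, not part of SAM's stack, and the handle in the usage example is made up.

```python
import re

# A hashtag or @handle is matched as one token; otherwise match a plain word,
# optionally with an apostrophe suffix such as "it's".
TOKEN_RE = re.compile(r"[#@]\w+|\w+(?:'\w+)?")


def tokenize(text):
    """Split a short post into lowercase tokens, keeping '#' and '@' prefixes."""
    return [tok.lower() for tok in TOKEN_RE.findall(text)]


def split_features(text):
    """Separate hashtags, handles and ordinary words so each can be weighted on its own."""
    tokens = tokenize(text)
    return {
        "hashtags": [t for t in tokens if t.startswith("#")],
        "handles": [t for t in tokens if t.startswith("@")],
        "words": [t for t in tokens if t[0] not in "#@"],
    }
```

For example, `split_features("Power is out downtown #yeg @some_reporter")` puts `#yeg` under `hashtags` and `@some_reporter` under `handles`, leaving four ordinary words. With only three to ten words to work with, a regional tag like `#yeg` may be the only location signal in the post.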

The SAM real-time knowledge engine (aka SAM Alerts) launched in beta this February. Initially, the system functioned as a social newsgathering platform geared towards media. SAM’s workflow tool made UGC teams 10x more efficient in identifying and verifying social content and was widely adopted within large newsrooms like the AP, Bloomberg, Reuters and the New York Times. Even with this massive improvement in efficiency, the SAM team would often receive the same feedback: customers wanted access to information faster while requiring less manpower.

The team started dabbling with automation, tapping into the social media content its users had curated over the years to understand the DNA of breaking news content on social media networks (keywords, location descriptors such as #yeg, slang and the unique ways eyewitnesses describe a major event). SAM’s database is the largest data set of verified user-generated content, giving the team a unique vantage point on how news events take shape in real time.

Launching a breaking news alert tool has opened market opportunities beyond news and journalism, says Neufeld. SAM now works with security teams, emergency responders, banks, analysts, NGOs and other entities that depend on the delivery of fast, accurate data. (SAM typically detects events an hour before major media.)

With the launch of the new product, SAM has been rapidly growing its footprint within the U.K. The international office serves as the base for SAM’s sales and data teams, while Edmonton purposefully remains the startup’s technical hub.

“The University of Alberta and the startup community, led by organizations like Startup Edmonton, is producing fantastic talent that we definitely take full advantage of,” Neufeld says. “It’s extremely competitive globally to find good AI and engineering talent. There aren’t that many cities in the world you can go to that deliver what Edmonton can.”


Glossary:

Natural Language Processing

Natural language processing (NLP) is the ability of a computer program to understand human language as it is spoken and written. NLP is a component of artificial intelligence (AI).

Stack

A tech stack is the combination of programming languages, tools, and frameworks that developers use to build applications.