Data is the world’s most potent and flourishing man-made resource.
Even the most mundane and routine tasks spew large volumes of data as a by-product. At first glance, the bland residue left behind as people and organizations go about their operations may look ordinary.
Plot twist: these heaps of data have turned out to be remarkably predictive.
They’ve set off a gold rush of sorts, with data scientists digging for insightful gems. Does crime increase after a pop concert? Do vegetarians have fewer heart attacks? Does your email address reveal your personality type? Does online dating result in successful relationships? Does a hurricane in Thailand cause a huge dip on Wall Street?
We’ve entered a golden age of predictive discoveries, thanks to big data. The number-crunching frenzy has taken the world by storm. In a short time, we’ve produced an unbelievable volume of colorful, valuable, and sometimes bizarre insights.
In this post, we pick the brain of Ranga Eunny, the Data Analytics Sultan at Xtract.io.
Why have businesses suddenly developed such an interest in data sources beyond their internal databases and the data they have traditionally relied on?
Ranga: The amount of digital data in the world is set to grow from around 4.4 zettabytes in 2013 to 44 zettabytes by 2020. That is a tenfold jump. This data is not locked away in secret vaults or held by a privileged few. It is everywhere: on the internet, in devices, in sensors, and within everyone’s reach.
There is a lot of activity on the internet. An entire new economy has emerged around online commerce. The internet is now a primary space for transactions, interactions, and reactions. Governments are putting their data online in the name of open data. The open source movement leverages the power of collaboration to standardize, share, and utilize.
This standardization has enabled information generation and consumption by removing boundaries and walls. Even ships and aircraft leave digital footprints that anyone can access. It is an era of abundant data. Naturally, there is all-round interest in using and repurposing it.
You mentioned repurposing data. Could you elaborate on that?
Ranga: Take trading. Traditionally, traders use data such as historical time series, fundamentals, and market estimates to model expected returns. But there is a well-documented study that used New York weather data from 1927 to 1989 and associated market returns with cloud cover.
Similarly, high temperatures were found to be associated with low returns, and low temperatures with high returns. The explanation hinges on mood: changing weather affects people’s mood, which in turn affects their judgment and analytical abilities.
This is a case of alternative data: a dataset that goes beyond traditional data. What was meteorological data has been repurposed as a financial analysis input. Of course, this is an academic study, but hedge fund managers today are using data beyond traditional market inputs and indicators to try to beat their benchmarks.
In fact, if you were to plot alternative data adoption among hedge fund managers on the Gartner hype cycle, it has already moved beyond the peak of inflated expectations.
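To make the weather example concrete, here is a minimal Python sketch of the kind of association test behind it: a Pearson correlation between daily cloud cover and market returns. The `pearson` helper and the six observations are hypothetical, invented for illustration; they are not the study’s data, and the study’s exact statistical method may differ.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Unnormalized covariance and spread terms; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (sd_x * sd_y)


# Hypothetical daily observations (illustrative only, NOT the study's data):
# fraction of sky covered by cloud, and the index's daily return in percent.
cloud_cover = [0.1, 0.8, 0.3, 0.9, 0.2, 0.7]
returns_pct = [0.6, -0.2, 0.4, -0.5, 0.5, 0.0]

r = pearson(cloud_cover, returns_pct)
print(f"cloud cover vs. returns: r = {r:.2f}")
```

A negative coefficient means cloudier days line up with lower returns in this toy sample. Correlation only measures association; on its own it does not show that the weather moves the market.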
Alternative data is a relatively new term in the business world. Can you explain it and point out a few sources of alternative data?
Ranga: The term is specific to the financial markets and refers to any dataset that may be useful to investors but lies outside the traditional realm of data references or frameworks. Basically, it is any data that can help evaluate a company in the absence of traditional growth metrics like revenue or assets.
Satellite data, social media, customer reviews, and credit card transaction data are all part of the alternative data hedge fund managers use. The usual examples are satellite images of parking lots, or of the color and spread of crops to gauge farm health. Likewise, the number, type, and frequency of job listings can tell a lot about employee count, and thereby about performance and growth.
Gone are the days of tips, newspaper reports, price movements, and gut instinct. The data is all out there on the internet. It is all about the models that can be built and the wherewithal to aggregate this information. The advent of data science as a profession and the expansion of computing power, cloud infrastructure, and sharp web data aggregation tools have all enabled hedge fund managers to look beyond the traditional.
Of all the data sources you mentioned, web data is gaining the most prominence. How can businesses tap into this vast resource?
Ranga: The primary sources of alternative data are credit and debit card transactions, web traffic, geolocation, and satellite data. Most of these datasets are rich in valuable content and always come from a first-party or second-party source. This is the data most quants and fundamental analysts seek. But it comes at a cost and is usually very specific to certain geographies and demographics.
Besides, data protection laws can prevent the use or storage of personal data. The web, on the other hand, is a relatively open, public space. As a source, it offers social media, public datasets, online retail, business websites, macroeconomic data, new product variants and pricing, hospitality bookings, opinions, polls, reviews, and much more.
Using crawled e-commerce product data, financial firms can notice fluctuations in a product’s price or stock levels over time, which can offer further insight into the current state of a company they are keeping an eye on. Sentiment analysis techniques reveal shifts in public opinion about a company or its products, which can often help predict its future performance.
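As a rough illustration of the e-commerce idea, the Python sketch below compares consecutive crawls of the same products and flags large price moves and stock-outs. The record layout `(product_id, crawl_date, price, in_stock)`, the `detect_events` function, the 10% threshold, and the sample rows are all assumptions made for this example, not any particular vendor’s schema or pipeline.

```python
from collections import defaultdict
from datetime import date


def detect_events(rows, threshold=0.10):
    """Flag large price moves and stock-outs between consecutive crawls.

    rows: iterable of (product_id, crawl_date, price, in_stock) tuples (hypothetical layout).
    threshold: minimum absolute fractional price change to report (0.10 = 10%).
    Returns (product_id, crawl_date, event_type, pct_change_or_None) tuples.
    """
    by_product = defaultdict(list)
    for pid, day, price, in_stock in rows:
        by_product[pid].append((day, price, in_stock))

    events = []
    for pid, series in sorted(by_product.items()):
        series.sort(key=lambda snap: snap[0])  # chronological order per product
        for (d0, p0, s0), (d1, p1, s1) in zip(series, series[1:]):
            if p0 > 0:  # skip relative change when the earlier price is missing/zero
                pct = (p1 - p0) / p0
                if abs(pct) >= threshold:
                    events.append((pid, d1, "price", round(pct, 4)))
            if s0 and not s1:  # product went from in stock to out of stock
                events.append((pid, d1, "stock_out", None))
    return events


# Hypothetical crawl snapshots, deliberately listed out of date order.
snapshots = [
    ("sku-1", date(2020, 1, 8), 14.99, True),
    ("sku-1", date(2020, 1, 1), 19.99, True),
    ("sku-2", date(2020, 1, 1), 5.00, True),
    ("sku-2", date(2020, 1, 8), 5.10, False),
]

for event in detect_events(snapshots):
    print(event)
```

Sorting each product’s snapshots by date before comparing neighbors keeps the comparison correct even when crawl results arrive out of order.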
Jobs data reveals hiring patterns, which can say a lot about expansions and product development.
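A simple hiring signal can be sketched in a few lines of Python: count a company’s job postings per month for one job category and report the month-over-month change. The `(month, category)` record format, the `monthly_change` helper, and the sample postings are hypothetical, chosen only to illustrate the idea.

```python
from collections import Counter

# Hypothetical scraped job postings: (posting month as "YYYY-MM", job category).
postings = [
    ("2020-01", "engineering"), ("2020-01", "sales"),
    ("2020-02", "engineering"), ("2020-02", "engineering"),
    ("2020-02", "engineering"), ("2020-02", "sales"),
    ("2020-03", "sales"),
]


def monthly_change(rows, category):
    """Month-over-month change in the number of postings for one category.

    Months that appear anywhere in the data but have no postings in this
    category count as zero, so a hiring freeze shows up as a drop.
    """
    counts = Counter(month for month, cat in rows if cat == category)
    months = sorted({month for month, _ in rows})  # "YYYY-MM" strings sort chronologically
    return {cur: counts[cur] - counts[prev] for prev, cur in zip(months, months[1:])}


print(monthly_change(postings, "engineering"))  # {'2020-02': 2, '2020-03': -3}
```

A jump in engineering postings followed by a sharp drop, as in this toy data, is the kind of pattern an analyst might read as a burst of product development and then a slowdown.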
Whoa! There’s so much you can do with web data! What stops businesses from tapping into this wealth of web data?
Ranga: For one, there is the question of application. Hedge funds and investment firms were driven by the need to go beyond their traditional data sources to generate alpha and get more accurate, faster, or more granular insights into business performance. Similarly, credit rating firms sought data beyond the usual to predict the creditworthiness of individuals and businesses, especially those outside the banking fold.
These firms had a clear need that drove them to adopt alternative data. Other businesses must be able to see the competitive advantage alternative data gives them before they will push for adoption. If you consider the examples I quoted, not all categories of business benefit from existing alternative data. Services, retail, travel and tourism, hospitality, and automotive are perhaps some of the industries with wide and deep footprints on the web.
Some industries, like oil and gas, rely on market news as well as satellite data such as tanker movements or the launch of new ocean tankers. E-commerce platforms find the web the best source of competitive intelligence, given that it is fairly easy to identify platform users and potential customers. There are also some inherent challenges with alternative data. First, data aggregation is not cheap.
Second, companies built on data science invest a large part of their funding in hiring the best data scientists and in AI and big data infrastructure. They would rather not go far in experimenting with or exploring various datasets to build models; the expectation is that the market has the right data. The disconnect happens when the market cannot supply the right data or the right volumes. After all, business analytics is more random and chaotic than traditional applications of data science like computational biology or astrophysics.
Therefore, there is work to do on both sides. Vendors must be able to speak the language of engineers and data scientists and be ready to support their hypotheses, and companies must engage with vendors to create the right ecosystem for data aggregation.
Liked what you read? Ranga’s Twitter feed is gripping; he constantly shares exciting insights from his research. His enthusiasm is tangible and contagious. Feel free to chat with him on Twitter and LinkedIn.