This article was first published by the Global Association of Risk Professionals, Risk Intelligence, on September 21, 2018. Coauthor: Sanjay Fuloria
How readily available data sets and crowdsourcing can promote problem-solving and policy solutions
While reading a post on Reddit, the social news aggregation site, we were amazed to find a link to download government data. The data covered parking tickets issued by the Chicago Police Department over a decade. It was anonymized but included the reason for each ticket and demographic details of the violators.
The local government published this data in comma-separated values (CSV) format to invite input from researchers and intellectuals to help with policy formulation. Such input could improve the performance of the police department. Researchers could, for example, examine whether there is gender or racial bias in the issuing of parking tickets.
The Indian government should adopt this approach at the Center, as should state governments and local civic bodies. Anonymity must be maintained, but beyond that, there is no reason why such data should not be made available to the public at large for analysis. While the government has started data.gov.in, the amount of data available there is quite limited.
Beyond availability, there is also the need for accuracy. We compared the actual seasonal rainfall data for the state of Telangana on two government websites, www.imd.gov.in and http://www.tsdps.telangana.gov.in/.
They were not the same. As per IMD, the actual rainfall to date was 665.8 millimeters as of September 5, 2018, which is 6% in excess of the 50-year Long Period Average (LPA). The other website showed rainfall of 584.8 millimeters as of September 7, 2018, which is a 7% deficit.
IMD publishes data weekly, whereas the Telangana data is daily. Which to believe? It looks like the two agencies don’t talk to each other. No wonder the forecasting models used by these government agencies are far from accurate.
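One way to read the two figures is to back out the normal rainfall (the LPA to date) that each reported departure implies. The sketch below is illustrative only. It assumes the departure is computed as (actual - normal) / normal, that the published percentages are rounded to whole numbers, and it ignores the two-day gap between the reporting dates. The function name `implied_normal` is ours, not either agency's.

```python
def implied_normal(actual_mm: float, departure_pct: float) -> float:
    """Back out the normal rainfall implied by an actual total and a % departure.

    Assumes departure_pct = (actual - normal) / normal * 100.
    """
    return actual_mm / (1 + departure_pct / 100)


# Figures quoted in the article; a deficit is entered as a negative departure.
imd_actual, imd_departure = 665.8, 6.0
ts_actual, ts_departure = 584.8, -7.0

imd_normal = implied_normal(imd_actual, imd_departure)
ts_normal = implied_normal(ts_actual, ts_departure)
gap_mm = imd_actual - ts_actual

print(f"Implied normal (IMD): {imd_normal:.1f} mm")
print(f"Implied normal (Telangana site): {ts_normal:.1f} mm")
print(f"Gap in reported actual rainfall: {gap_mm:.1f} mm")
```

Under these assumptions, both implied normals come out at roughly 628 mm. If that holds, the two sites would agree on the baseline and differ mainly on the measured rainfall, by about 81 mm. Because the percentages are rounded, this is an indication, not a firm conclusion.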
Making data available would widen the scope for better inputs into public policy. The areas could be as diverse as traffic management, crime control, hospital queuing, and school admissions. The government could even hold contests, with prizes for the best policy recommendation or for the best machine learning algorithm to solve a particular problem.
Downloadability
The government could take a cue from the likes of Kaggle, where such contests are the norm. On Kaggle, many companies provide their data free of charge so that participants can help solve their problems. The only requirement is that the data be available in an easy-to-download format.
The census data available through www.censusindia.gov.in is very difficult to download. The website itself is hard to navigate, and the data sets are spread across different files. A better alternative would be to make the data available in one file, at the village level. Such granularity is needed to analyze and make sense of the data.
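If the tables were published as several CSV files with a shared header, combining them into one file would take only a few lines. The following is a minimal sketch, assuming every part-file has identical columns. The `merge_csv_files` helper and the file and folder names are hypothetical, not something the census website provides.

```python
import csv
from pathlib import Path
from typing import Iterable


def merge_csv_files(parts: Iterable[Path], out_path: Path) -> int:
    """Concatenate CSV files that share one header; return data rows written."""
    rows_written = 0
    header = None
    with out_path.open("w", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        for part in parts:
            with part.open(newline="", encoding="utf-8") as in_file:
                reader = csv.reader(in_file)
                part_header = next(reader, None)
                if part_header is None:
                    continue  # skip empty files
                if header is None:
                    # The first non-empty file defines the header, written once.
                    header = part_header
                    writer.writerow(header)
                elif part_header != header:
                    raise ValueError(f"{part} has a different header")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written


# Hypothetical usage, with part-files in a local folder named census_parts:
# merge_csv_files(sorted(Path("census_parts").glob("*.csv")), Path("census_all.csv"))
```

The header check is deliberate: silently stacking files whose columns differ would produce a single file that looks clean but mixes up fields.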
Census data, if easily downloadable, could prompt a great deal of analysis. Much of it would be superficial, but some would be meaningful and could be used by the government to inform its policy choices.
As James Surowiecki says in The Wisdom of Crowds, “A diverse collection of independently deciding individuals make better predictions than individuals or even experts.” The wisdom of crowds can be leveraged.
While financial data is made available by the government, data on other socially relevant fields can be hard to come by. With so many open-source tools available for handling data, it has become relatively easy to make sense of it. Moreover, there are many massive open online courses (MOOCs) available to anyone interested in learning data handling.
With the democratization of education, it is high time the government thought about democratizing data. Data privacy is important, but as long as the details don’t identify an individual, releasing the data should not be an issue.
Let a hundred flowers blossom.