Roobaroo: The Power of Democratized Data

Tuesday, September 25, 2018

This article was first published by the Global Association of Risk Professionals (GARP) in Risk Intelligence on September 21, 2018. Coauthor: Sanjay Fuloria

https://www.garp.org/#!/risk-intelligence/technology/data/a1Z1W000004B9QmUAK

How readily available data sets and crowdsourcing can promote problem-solving and policy solutions

While reading a post on the Reddit social news aggregation site, we were amazed by a link to download government data. The data pertained to parking tickets issued by the Chicago Police Department over a decade. The data was anonymized but had all the details regarding the reasons for the tickets and demographic details of the violators.

The local government put this data up in comma-separated values (CSV) format to invite input from researchers and intellectuals to help with policy formulation, with the aim of improving the performance of the police department. Researchers could use it to answer questions, for example, about gender or racial bias in issuing parking tickets.
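To make the idea concrete, here is a minimal sketch of the kind of first-pass question a researcher might ask of such a CSV file. The rows, the column names ("violation", "demographic_group") and the helper function are all invented for illustration; they are not taken from the actual Chicago data set, whose layout we do not reproduce here.

```python
import csv
import io
from collections import Counter

# Placeholder rows made up for illustration only; NOT real Chicago data.
# The column names are assumptions as well.
SAMPLE_CSV = """violation,demographic_group
EXPIRED METER,A
EXPIRED METER,A
NO CITY STICKER,A
NO CITY STICKER,B
"""


def ticket_share_by(rows, column):
    """Return each value's share of all tickets for the given column."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: count / total for value, count in counts.items()}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
shares = ticket_share_by(rows, "demographic_group")

for group, share in sorted(shares.items()):
    print(f"{group}: {share:.0%} of tickets")
```

A share of tickets by itself would not establish bias. It would need to be compared against each group's share of drivers or residents, which is exactly the sort of follow-up analysis that open data makes possible.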

This approach should be used by the Indian government at the Center, and by state governments and local civic bodies. Anonymity is important to maintain, but other than that, there is no reason why this data should not be made available to the public at large for analysis. While the government has started data.gov.in, the amount of data available there is pretty limited.

Apart from availability, there is also the need for accuracy. We compared the actual seasonal rainfall data for the state of Telangana on two government websites, www.imd.gov.in and http://www.tsdps.telangana.gov.in/.

They were not the same. As per IMD, the actual rainfall to date was 665.8 millimeters as of September 5, 2018, which is 6% in excess of the 50-year Long Period Average (LPA). The other website showed rainfall of 584.8 millimeters as of September 7, 2018, which is a 7% deficit.

IMD publishes data weekly, whereas the Telangana data is daily. Which to believe? It looks like the two agencies don’t talk to each other. No wonder the forecasting models used by these government agencies are far from accurate.
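The arithmetic behind the two rainfall figures can be checked with a short sketch. It assumes that each percentage is a departure from the long period average, so that actual = normal × (1 + departure), and it uses only the numbers quoted above. The function name is our own, and because the published percentages are rounded, the results are approximate.

```python
# Figures quoted in the article; percentages are rounded, so results are approximate.
imd_actual_mm = 665.8      # IMD, as of September 5, 2018
imd_departure = 0.06       # 6% excess over the LPA
tsdps_actual_mm = 584.8    # Telangana website, as of September 7, 2018
tsdps_departure = -0.07    # 7% deficit relative to the LPA


def implied_normal(actual_mm: float, departure: float) -> float:
    """Back out the normal rainfall, assuming actual = normal * (1 + departure)."""
    return actual_mm / (1.0 + departure)


imd_normal = implied_normal(imd_actual_mm, imd_departure)
tsdps_normal = implied_normal(tsdps_actual_mm, tsdps_departure)
gap_mm = imd_actual_mm - tsdps_actual_mm

print(f"Implied normal (IMD):       {imd_normal:.1f} mm")
print(f"Implied normal (Telangana): {tsdps_normal:.1f} mm")
print(f"Gap in reported actual rainfall: {gap_mm:.1f} mm")
```

Under these assumptions, both sources imply a normal of roughly 628 to 629 millimeters, which would suggest that the disagreement lies in the measured rainfall (about 81 millimeters apart) rather than in the baseline. Since the two readings are two days apart and the percentages are rounded, this is indicative only.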

Making data available would widen the scope for better inputs to shape public policy. The areas could be as diverse as traffic management, crime control, queuing in hospitals, school admissions, etc. The government could in fact hold contests, with prizes given for the best policy recommendation or for the best machine learning algorithm to solve a particular problem.

Downloadability

The government could take a cue from the likes of Kaggle, where such contests are the norm. On Kaggle, a lot of companies provide their data free of charge so that participants can help solve their problems. The only requirement is the availability of data in an easy-to-download format.

The census data available through www.censusindia.gov.in is very difficult to download. The navigation of the website is itself a bit challenging. The data sets are distributed into different files. The best alternative would be to make it available in one file and at the village level. Such granularity is needed for analysis and to make sense of data.

Census data, if easily downloadable, could lead to a lot of analysis. Much of it would be superficial, but some would definitely be meaningful and could be used by the government to inform its policy choices.

As James Surowiecki says in The Wisdom of Crowds, “A diverse collection of independently deciding individuals make better predictions than individuals or even experts.” The wisdom of crowds can be leveraged.

While financial data is made available by the government, data on other socially relevant fields can be hard to come by. With so many open source tools available for handling data, it has become much easier to make sense of it. Moreover, there are many MOOCs (Massive Open Online Courses) available to anyone interested in learning data handling.

With the democratization of education, it is high time the government thinks of democratizing data. We are not saying data privacy is not important, but as long as the details don’t identify an individual, it should not be an issue. Let a hundred flowers blossom.
