Data privacy is another major concern, and one that grows in proportion to the power of Big Data. For electronic health records, strict laws govern what can and cannot be done. For other data, regulations, particularly in the United States, are less forceful. However, there is great public fear about the inappropriate use of personal data, particularly through the linking of data from multiple sources. Managing privacy is both a technical and a sociological problem, and it must be addressed jointly from both perspectives if the promise of Big Data is to be realized.
Take, for example, the data gleaned from location-based services. These new architectures require a user to share his or her location with the service provider, which raises obvious privacy concerns. Hiding the user’s identity without hiding the location does not properly address those concerns.
An attacker or a (potentially malicious) location-based server can infer the identity of the query source from its location information. For example, a user’s location can be tracked through several stationary connection points (e.g., cell towers). Over time, the user leaves a metaphorical trail of bread crumbs that leads to a certain residence or office location and can thereby be used to determine the user’s identity.
Several other types of private information, such as health issues (e.g., presence in a cancer treatment center) or religious preferences (e.g., presence in a church), can also be revealed just by observing an anonymous user’s movement and usage patterns over time.
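The re-identification attack described above can be sketched in a few lines. The data, pseudonyms, and tower names below are hypothetical; the point is only that a pseudonymous location trail concentrates at predictable places, such as a home address at night:

```python
from collections import Counter

# Hypothetical anonymized pings: (pseudonym, hour_of_day, cell_tower_id).
# The identity is hidden, but the location trail is not.
pings = [
    ("user_42", 23, "tower_A"), ("user_42", 2, "tower_A"),
    ("user_42", 3, "tower_A"), ("user_42", 14, "tower_B"),
    ("user_42", 9, "tower_C"), ("user_42", 1, "tower_A"),
]

def infer_home_tower(pings, pseudonym):
    """Return the tower most often seen between 22:00 and 06:00.

    The dominant nighttime location usually corresponds to a residence,
    which public address records can then link to a real identity.
    """
    night = [tower for p, hour, tower in pings
             if p == pseudonym and (hour >= 22 or hour < 6)]
    return Counter(night).most_common(1)[0][0] if night else None

print(infer_home_tower(pings, "user_42"))
```

Nothing here requires breaking any anonymization scheme; the inference falls out of the movement pattern alone, which is why hiding identity without hiding location is insufficient.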
Furthermore, with the platforms currently in use, it is more difficult to hide a user’s location than to hide his or her identity. This is a consequence of how location-based services interact with the user: the user’s location is needed for successful data access or data collection, but the user’s identity is not.
Many additional challenging research problems remain, such as defining the ability to share private data while limiting disclosure and still ensuring sufficient utility in the shared data. The existing methodology of differential privacy is an important step in the right direction, but it unfortunately degrades the data too severely to be useful in most practical cases.
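The privacy/utility trade-off behind that claim can be illustrated with the Laplace mechanism, the standard building block of differential privacy. This is a minimal sketch for a single count query (function names and parameters are my own, not from the text): the noise scale is sensitivity/&epsilon;, so stronger privacy (smaller &epsilon;) means noisier, less useful answers.

```python
import math
import random

def private_count(true_count, epsilon, sensitivity=1.0):
    """Release a count with Laplace noise calibrated for epsilon-DP.

    Adding or removing one individual's record changes a count by at
    most 1 (the sensitivity), so Laplace noise with scale
    sensitivity / epsilon makes this single query
    epsilon-differentially private.
    """
    # Sample Laplace(0, scale) via the inverse-CDF method.
    scale = sensitivity / epsilon
    u = random.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

random.seed(0)
# Weak privacy (large epsilon): answer stays close to the true count.
print(private_count(1000, epsilon=10.0))
# Strong privacy (small epsilon): answer may be off by tens or more.
print(private_count(1000, epsilon=0.1))
```

For one-off counts the noise is tolerable, but each additional query spends more of the privacy budget, which is why repeated analytics over the same data degrade so quickly.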
Real-world data are not static; they grow and change over time, rendering the prevailing techniques almost useless, since little useful content remains available for future analytics. This requires a rethinking of how security for information sharing is defined in Big Data use cases. Many online services today require us to share private information (think of Facebook applications), but beyond record-level access control we do not understand what it means to share data, how the shared data can be linked, and how to give users fine-grained control over this sharing.
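The "record-level access control" the text refers to can be sketched as follows (the schema and principal names are hypothetical). Each record is either fully visible to a principal or fully hidden, with no way to express anything finer:

```python
# Hypothetical records, each tagged with the set of principals
# allowed to read it; access is all-or-nothing per record.
records = [
    {"id": 1, "data": "blood pressure: 120/80", "readers": {"alice", "dr_smith"}},
    {"id": 2, "data": "location: 40.7, -74.0",  "readers": {"alice"}},
]

def readable_by(records, principal):
    """Return the records this principal may read, whole or not at all."""
    return [r for r in records if principal in r["readers"]]

# dr_smith sees record 1 only. There is no way to express policies such
# as "share a coarsened version" or "share but forbid linking with
# other data sets" -- the open problems the text describes.
print([r["id"] for r in readable_by(records, "dr_smith")])
```

Fine-grained sharing control would need a richer policy language than a per-record reader set, which is precisely the gap the paragraph above identifies.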
Those issues will have to be worked out to preserve user security while still providing the most robust data set for Big Data analytics.
Taken from: Big Data Analytics: Turning Big Data into Big Money