The introduction of new technologies, or the adaptation of existing ones,
within an organisation can meet resistance at every layer of the organisation,
from management resisting the introduction to operational staff resisting the
uptake. There are many reasons why management and operational staff resist
change in an organisation, and many approaches to mitigating that resistance
(Davidson, 2009), so the focus here is on those specific to data mining.
Privacy
Data mining about individuals inevitably makes use of large amounts of
personal data (Busovky, 2011). This brings with it concerns about data privacy
and about the high-profile data breaches reported in the media (BBC, 2009).
Wilder and Soat (Wilder et al, 2001) cite the example of N2H2, a Seattle-based
company that provides internet content-filtering software to schools and that
planned to sell anonymised, aggregated data collected through that software.
“N2H2 began marketing the data, called Class Clicks, that its filtering
tools collected on the website usage trends of elementary and high school
students. The data contained no names or personal information and complied
with the new federal Children’s Online Privacy Protection Act. Yet N2H2’s new
line of business brought such loud howls of protest from online privacy
advocates that the company scrapped the effort”
A fictitious example
is given by Wang and Liu (Wang et al, 2011) to illustrate the real privacy
concerns that could exist when mining a medical database.
“released mining output can also be leveraged to uncover some
combinations of symptoms that are so special that only rare people match them”
“which qualifies as a severe threat to individuals privacy”
Many countries have legislation to protect individuals and to ensure that
organisations put safeguards and controls in place to protect personal data.
The main act in the United Kingdom is the Data Protection Act 1998, which
covers many areas of data protection. Specific to privacy, the seventh
principle of the Act applies:
“Appropriate
technical and organisational measures shall be taken against unauthorised or
unlawful processing of personal data and against accidental loss or destruction
of, or damage to, personal data” (Information Commissioner’s
Office A, 2012).
There are techniques
to prevent unauthorised disclosure of personal data through data mining:
Anonymisation
Privacy can be protected by anonymising data. However, simply removing
customer reference numbers or names is not always sufficient in itself, as
discussed by Vaidya et al (2005 p.8): “just because the individual
is not identifiable in the data is not sufficient; joining the data with other
sources must not enable identification”.
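The joining risk described above can be illustrated with a minimal sketch. All
of the records, names and field names below are invented for illustration, and
the `link` function is a hypothetical helper, not a method taken from the
cited sources. It shows how an extract with names removed can still be matched
to named individuals when a second dataset shares the same attributes.

```python
# Hypothetical data: an extract with names removed, and a separate
# public list that still carries names. All values are invented.
deidentified = [
    {"postcode": "AB1 2CD", "birth_year": 1975, "sex": "F", "purchase": "insulin"},
    {"postcode": "AB1 2CD", "birth_year": 1982, "sex": "M", "purchase": "aspirin"},
    {"postcode": "EF3 4GH", "birth_year": 1975, "sex": "F", "purchase": "aspirin"},
]
public_list = [
    {"name": "Alice Example", "postcode": "AB1 2CD", "birth_year": 1975, "sex": "F"},
    {"name": "Bob Example", "postcode": "AB1 2CD", "birth_year": 1982, "sex": "M"},
]

# Attributes present in both datasets (an assumption for this sketch).
SHARED_ATTRIBUTES = ("postcode", "birth_year", "sex")


def link(records, external, keys=SHARED_ATTRIBUTES):
    """Join the two datasets on the shared attributes, keeping unique matches."""
    index = {}
    for person in external:
        key = tuple(person[k] for k in keys)
        index.setdefault(key, []).append(person["name"])

    matches = []
    for row in records:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # exactly one candidate, so the row is re-identified
            matches.append((names[0], row["purchase"]))
    return matches


reidentified = link(deidentified, public_list)
```

In this invented example two of the three rows are matched to a name even
though the extract itself contains no names.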
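An established approach to ensuring that data is truly anonymised is
“k-anonymity”, a process that groups individuals together within the data so
that each record cannot be distinguished from those of at least k-1 other
individuals on its potentially identifying attributes
(Vaidya et al, 2005 p.8).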
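As a minimal sketch of the grouping idea, the check below counts how many
records share each combination of potentially identifying attributes and
reports whether every group contains at least k records. The function names,
the sample records and the choice to generalise postcodes to their district
and birth years to their decade are all assumptions made for illustration,
not a procedure taken from Vaidya et al.

```python
from collections import Counter


def smallest_group(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same identifying values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values(), default=0)


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of identifying values occurs at least k times."""
    return smallest_group(rows, quasi_identifiers) >= k


def generalise(row):
    """Coarsen a row: keep the postcode district and round birth year to a decade."""
    return {
        **row,
        "postcode": row["postcode"].split()[0],
        "birth_year": row["birth_year"] // 10 * 10,
    }


# Hypothetical records, invented for this sketch.
rows = [
    {"postcode": "AB1 2CD", "birth_year": 1975, "condition": "asthma"},
    {"postcode": "AB1 3EF", "birth_year": 1978, "condition": "diabetes"},
    {"postcode": "AB1 9ZZ", "birth_year": 1971, "condition": "asthma"},
    {"postcode": "GH4 1AA", "birth_year": 1983, "condition": "flu"},
    {"postcode": "GH4 2BB", "birth_year": 1989, "condition": "asthma"},
]
QUASI_IDENTIFIERS = ("postcode", "birth_year")

raw_ok = is_k_anonymous(rows, QUASI_IDENTIFIERS, k=2)
grouped = [generalise(row) for row in rows]
grouped_ok = is_k_anonymous(grouped, QUASI_IDENTIFIERS, k=2)
```

Here the raw rows fail the check for k=2 because every row is unique, while
the generalised rows pass it. The generalisation trades detail for privacy,
which is the design choice any real scheme would need to weigh.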
Suppression can also be introduced to hide groups or data with small, easily
identified sample sizes. This requires footnotes and an accompanying narrative
explaining that suppression has been applied, to prevent misunderstanding of
any summarised data (Vaidya et al, 2005 p.8).
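A minimal sketch of suppression in a summary table follows. The threshold of
5, the `*` marker, the footnote wording and the sample data are assumptions
chosen for illustration. An organisation would set its own threshold and
wording.

```python
from collections import Counter


def summarise_with_suppression(values, threshold=5, marker="*"):
    """Count each group, replacing counts below the threshold with a marker.

    Returns the table and a footnote explaining any suppression, so that
    readers of the summary are not misled by the missing figures.
    """
    counts = Counter(values)
    table = {}
    for group, n in sorted(counts.items()):
        table[group] = n if n >= threshold else marker

    suppressed = any(value == marker for value in table.values())
    footnote = (
        f"{marker} count below {threshold} suppressed to prevent identification"
        if suppressed
        else ""
    )
    return table, footnote


# Hypothetical data, invented for this sketch.
conditions = ["asthma"] * 12 + ["diabetes"] * 7 + ["rare syndrome"] * 2
table, footnote = summarise_with_suppression(conditions)
```

If a grand total is published alongside such a table, a single suppressed
count can be recovered by subtraction, so totals may also need to be rounded
or withheld.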
Clearly defined use of data
Another way to address privacy concerns is to explain clearly to data
subjects, at the point of data collection, what the data will be used for and
how they will benefit.
This is evidenced by the success of the Tesco Clubcard scheme, the changed
perception of it amongst its customers, and the separation of its mailings
from what was previously seen as “dumb” junk mail.
“research consistently suggests that customers perceive the quarterly
mailing from Tesco clubcard not as ‘junk mail’, but as personal mail” (Humby
et al, 2004 p.116).
An example of poor understanding between the data subject and the
organisation carrying out the data mining process is the case of pharmacies in
the US that were selling data gathered from prescriptions to pharmaceutical
companies to be data mined. The pharmaceutical companies were then using that
data to target marketing and sales towards specific doctors, based on the
prescriptions they had written (Silverman, 2008). The data subjects in this
case (the doctors), represented by the American College of Physicians, have
opposed the use of this data for marketing (Walker, 2011).
However, the example also describes the use of the data for other purposes:
“direct safety messages to doctors, to track disease progression, to aid
law enforcement, to implement risk-mitigation programs, and to do
post-marketing surveillance required by the FDA” (Walker, 2011)
Where the data subject benefits from, and consents to, the organisation's use
of data mining, resistance to the data being mined is less likely.