Pauls Place: Dissertation series - Data mining overview

Sunday 19 May 2013

Dissertation series - Data mining overview

My master's dissertation specialised in data mining, and I will be publishing some of the content I assembled during my research for others to use.

Data mining is a discipline within IT that involves the manipulation of data through algorithms to extract undiscovered patterns/correlations (Dunham, 2003 p.3), often in large and diverse databases (Thuraisingham, 1999 p.2). “Data mining uses classification algorithms and learns from the related or linked data” (Thiruvadi and Patel, 2011 p.711).

Gouda and Hassaan (2011 p.179) give a straightforward example of data mining:

“Let’s say the database records the books bought by each customer over a period of time. The discovered patterns are the sequences of books most frequently bought by the customers. An example could be that, “70% of the people who buy introduction to visual basic and introduction to C++ also buy introduction to Perl within a month.” Stores can use these patterns for promotions, shelf placement etc.”
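The "70%" figure in the quote above is what is usually called the confidence of a rule: of the customers who bought the first set of books, the share who also bought the other. As a minimal sketch of that arithmetic, assuming made-up purchase data and a hypothetical helper named rule_confidence (neither comes from Gouda and Hassaan), and ignoring the ordering and "within a month" time window that true sequential pattern mining would take into account:

```python
# Hypothetical purchase data: one set of book titles per customer.
purchases = [
    {"Visual Basic", "C++", "Perl"},
    {"Visual Basic", "C++", "Perl"},
    {"Visual Basic", "C++"},
    {"C++", "Perl"},
    {"Visual Basic"},
]


def rule_confidence(baskets, antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    # Keep only the customers who bought every book in the antecedent.
    matching = [basket for basket in baskets if antecedent <= basket]
    if not matching:
        return 0.0
    # Of those, count the customers who also bought the consequent.
    also_bought = sum(1 for basket in matching if consequent <= basket)
    return also_bought / len(matching)


confidence = rule_confidence(purchases, {"Visual Basic", "C++"}, {"Perl"})
print(f"{confidence:.0%}")  # prints "67%"
```

Here three customers bought both Visual Basic and C++, and two of them also bought Perl, so the rule holds for two thirds of them.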

Data mining is distinctly different to traditional data manipulation (i.e. a data analyst working with SQL and reporting tools), where analysts have to undertake a large amount of manual work (Daniel, 2004 p. xi) and usually have something specific that they are looking for (Dunham, 2003 p. 3), so certain patterns go undiscovered. Data mining instead takes a more automated and open-minded approach to identifying patterns (Hanna, 2004 p. 132).
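To illustrate the "open-minded" side of that contrast, here is a small sketch that does not start from a specific question. Instead of asking about one pair of items, it counts every pair that occurs together and keeps those above a support threshold. The basket data, the function name frequent_pairs and the 50% threshold are all assumptions made for the illustration; real data mining tools use far more scalable algorithms than this brute-force count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of items per transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk"},
]


def frequent_pairs(transactions, min_support):
    """Return every item pair found in at least `min_support` of the transactions."""
    counts = Counter()
    for basket in transactions:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    threshold = min_support * len(transactions)
    return {pair: n for pair, n in counts.items() if n >= threshold}


# No pair is named in advance; the data decides which ones are reported.
print(frequent_pairs(baskets, min_support=0.5))
```

A traditional SQL query would typically be written for one pair the analyst already suspects; the loop above reports whichever pairs turn out to be common.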

Data mining can also be referred to as Knowledge Discovery in Databases (Baker, 2011), machine learning (Alpaydin, 2004 p.2) and data archaeology (Thuraisingham, 1999 p.2).

The application of data mining can be seen as an additional way of gaining further value from data already collected and held, to “enhance the value of existing information resources” (Thearling, 2010). A common theme in the area of data mining is the plethora of data stored by organisations, which are “data rich” (Al-atta, 2011), and their inability to do anything with it (Kushima et al, 2011 p.215).

3 comments:

paul, 21 May 2013 at 18:41
Yes, I am taking small chunks out of my literature review on data mining, along with selected portions that made use of SSAS, and will be publishing them over the coming weeks.
