This isn't really a blog; it's more of a holding page for my domain (it seems a shame not to have a page). If I know you, then add me on either LinkedIn or Facebook (links are on the right); however, if I don't know you, then I won't add you!

Sunday 19 May 2013

Dissertation series - Data mining overview

My master's dissertation specialised in data mining, and I will be publishing some of the content that I assembled as part of my research for others to use.


Data mining is a discipline within IT that involves the manipulation of data through algorithms to extract undiscovered patterns/correlations (Dunham, 2003 p.3), often in large and diverse databases (Thuraisingham, 1999 p.2).  “Data mining uses classification algorithms and learns from the related or linked data” (Thiruvadi and Patel, 2011 p.711).
Gouda and Hassaan (2011 p.179) give a straightforward example of data mining:
“Let’s say the database records the books bought by each customer over a period of time.  The discovered patterns are the sequences of books most frequently bought by the customers.  An example could be that, “70% of the people who buy introduction to visual basic and introduction to C++ also buy introduction to Perl within a month.”  Stores can use these patterns for promotions, shelf placement etc.”
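The 70% figure in the quoted example corresponds to what is commonly called the confidence of a rule: of the customers who bought the first two books, the share who also bought the third. Here is a minimal sketch in Python, assuming a made-up `purchases` dictionary and a hypothetical `rule_confidence` helper. Neither comes from the cited paper, and the one-month time window is ignored for simplicity.

```python
def rule_confidence(purchases, antecedent, consequent):
    """Share of customers who bought every antecedent title and also the consequent."""
    antecedent = set(antecedent)
    # Customers whose purchases include all of the antecedent titles.
    matching = [books for books in purchases.values() if antecedent <= books]
    if not matching:
        return 0.0
    also_bought = sum(1 for books in matching if consequent in books)
    return also_bought / len(matching)


# Invented data for illustration only.
purchases = {
    "alice": {"Intro to Visual Basic", "Intro to C++", "Intro to Perl"},
    "bob": {"Intro to Visual Basic", "Intro to C++", "Intro to Perl"},
    "carol": {"Intro to Visual Basic", "Intro to C++"},
    "dave": {"Intro to C++", "Intro to Perl"},
}

confidence = rule_confidence(
    purchases, ["Intro to Visual Basic", "Intro to C++"], "Intro to Perl"
)
print(f"{confidence:.0%}")  # prints 67%
```

Three of the four invented customers bought both Visual Basic and C++, and two of those three also bought Perl, hence roughly 67%.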
Data mining is distinctly different to traditional data manipulation (i.e. the use of SQL, a data analyst and reporting tools), where data analysts have to undertake a large amount of manual work (Daniel, 2004 p. xi) and usually have something specific that they are looking for (Dunham, 2003 p. 3), so patterns they did not think to look for go undiscovered. Data mining instead takes a more automated and open-minded approach to the identification of patterns (Hanna, 2004 p. 132).
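One way to picture the difference: a targeted query checks a single pattern the analyst already suspects, whereas a mining-style pass enumerates candidate patterns and reports whichever ones clear a threshold. The sketch below is a brute-force illustration only, with invented data and hypothetical names (`mine_rules`, `min_support`, `min_confidence`); it is not one of the algorithms described in the cited sources, and real tools use far more efficient searches.

```python
from itertools import combinations


def mine_rules(baskets, min_support=2, min_confidence=0.6):
    """Brute-force search for rules of the form {a, b} -> c."""
    items = sorted(set().union(*baskets))
    rules = []
    for pair in combinations(items, 2):
        # Baskets containing both items of the candidate pair.
        with_pair = [b for b in baskets if set(pair) <= b]
        if len(with_pair) < min_support:
            continue  # too few baskets to say anything useful
        for item in items:
            if item in pair:
                continue
            confidence = sum(item in b for b in with_pair) / len(with_pair)
            if confidence >= min_confidence:
                rules.append((pair, item, confidence))
    return rules


# Invented baskets for illustration only.
baskets = [
    {"VB", "C++", "Perl"},
    {"VB", "C++", "Perl"},
    {"VB", "C++"},
    {"C++", "Perl"},
    {"SQL", "Perl"},
]

rules = mine_rules(baskets)
for pair, item, confidence in rules:
    print(f"{pair} -> {item}: {confidence:.0%}")
```

Nobody has to tell the function which books to look at; every pair is tried, which is the "open-minded" part. The cost is that the number of candidates grows quickly with the number of items.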
Data mining can also be referred to as Knowledge Discovery in Databases (Baker, 2011), machine learning (Alpaydin, 2004 p.2) and data archaeology (Thuraisingham, 1999 p.2).
The application of data mining can be seen as an additional way of gaining further value from data already collected and held, to “enhance the value of existing information resources” (Thearling, 2010). A common theme in the area of data mining is the plethora of data stored by organisations, which are “data rich” (Al-atta, 2011), and their inability to do anything with it (Kushima et al., 2011 p.215).

1 comment:

  1. Yes, I am taking small chunks out of my literature review on data mining, along with select portions that made use of SSAS, and will be publishing them over the coming weeks.