Herself’s Artificial Intelligence

Humans, meet your replacements.

Data Mining

Most data mining is done against private corporate databases or online government databases, or with web bots, spiders and scrapers. RSS has made mining the web trivial with Perl. I’m told it is trivial with PHP too; I’m still experimenting with that.
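To give a feel for why RSS makes this easy, here is a minimal sketch in Python (not the Perl or PHP mentioned above), using only the standard library. The feed text, the example.com URLs and the function name extract_items are made up for illustration; a real scraper would fetch the feed over the network and cope with malformed XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, embedded so the sketch runs without network access.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""


def extract_items(rss_text):
    """Return a list of (title, link) pairs from an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    items = []
    # Every <item> element is one entry in the feed.
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        items.append((title.strip(), link.strip()))
    return items


items = extract_items(SAMPLE_RSS)
for title, link in items:
    print(title, link)
```

Because the feed is already structured, the "scraping" is just a walk over the item elements, with no HTML guesswork involved.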

Currently, with the help of computers, most fields of science, government and business are collecting data faster than they can comb through it. Some agencies hold what would be hundreds of years’ worth of data if humans had to parse it. So we need artificially intelligent data mining to sort the data, develop useful, informative rules about it, and/or put it into formats that are useful to us humans.

This task of artificial intelligence is often filed under ‘machine learning’. Sometimes a set of rules is used. The rules may be written by an expert (domain knowledge) or discovered through machine learning using statistics.
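The difference between an expert rule and a learned rule can be sketched in a few lines of Python. This is only an illustration: the toy weather data, the attribute names and both functions are hypothetical. The learner counts which class is most common for each attribute value and keeps the single attribute that makes the fewest mistakes, in the spirit of the simple "one rule" approach rather than any particular tool.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: features -> whether to play outside.
ROWS = [
    ({"outlook": "sunny", "windy": "no"}, "yes"),
    ({"outlook": "sunny", "windy": "yes"}, "yes"),
    ({"outlook": "rainy", "windy": "no"}, "no"),
    ({"outlook": "rainy", "windy": "yes"}, "no"),
    ({"outlook": "overcast", "windy": "no"}, "yes"),
    ({"outlook": "overcast", "windy": "yes"}, "yes"),
]


def expert_rule(features):
    """A rule written by hand from domain knowledge."""
    return "no" if features["outlook"] == "rainy" else "yes"


def learn_one_rule(rows):
    """Pick the single attribute whose value -> majority-class rule errs least."""
    best = None
    for attr in sorted(rows[0][0]):
        # Count class labels for each value of this attribute.
        counts = defaultdict(Counter)
        for features, label in rows:
            counts[features[attr]][label] += 1
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(1 for features, label in rows if rule[features[attr]] != label)
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


attribute, rule, errors = learn_one_rule(ROWS)
print(attribute, rule, errors)
```

On this toy data the statistics arrive at the same rule the expert wrote down, which is the happy case; real data is rarely so tidy.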

Problems with this type of machine learning include coming up with an insane number of rules that are far too specific (overfitting) and using example data that skews the learning. Other problems: when do you decide you have the best set of rules? At what point is your algorithm good enough? Do you want all possible outcomes, or only specific ones? For example, do you need 40 categories of healthy plant, or only descriptions and diseases for unhealthy ones?
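Overfitting is easier to see than to describe. The sketch below, in Python, compares a rule set that memorizes every training example exactly with one broad rule, and scores both on examples that were held back. The plant records, the "unknown" fallback and the function names are all invented for illustration, and the data is deliberately rigged to make the point.

```python
# Hypothetical plant records: (leaf_color, height_cm, status).
TRAIN = [
    ("green", 30, "healthy"),
    ("green", 42, "healthy"),
    ("yellow", 28, "diseased"),
    ("brown", 35, "diseased"),
]
HELD_OUT = [
    ("green", 33, "healthy"),
    ("yellow", 40, "diseased"),
    ("green", 29, "healthy"),
]


def memorizing_rules(rows):
    """One rule per training example: far too specific."""
    table = {(color, height): label for color, height, label in rows}
    return lambda color, height: table.get((color, height), "unknown")


def general_rule(color, height):
    """One broad rule that ignores height entirely."""
    return "healthy" if color == "green" else "diseased"


def accuracy(predict, rows):
    hits = sum(1 for color, height, label in rows if predict(color, height) == label)
    return hits / len(rows)


memorized = memorizing_rules(TRAIN)
print("memorized:", accuracy(memorized, TRAIN), accuracy(memorized, HELD_OUT))
print("general:  ", accuracy(general_rule, TRAIN), accuracy(general_rule, HELD_OUT))
```

Both rule sets look perfect on the training data. Only the held-out examples show that the memorized rules learned nothing that carries over, which is one practical answer to "is my algorithm good enough?".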

Data mining has four main styles of sorting through data. Classification: classes are given up front and future data is sorted into one of them. Association: associations between data items are sought. Clustering: data is sorted into clusters, usually using various traits as vectors. Prediction: some specific piece of information, usually numeric, is to be output.
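As one example of these styles, here is a rough Python sketch of clustering with traits as vectors, using a bare-bones k-means loop. The points, the starting centroids and the function names are assumptions chosen for illustration; a real implementation would pick starting centroids more carefully and stop once nothing changes.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two trait vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, centroids, iterations=10):
    """Group points around the given starting centroids (iterations >= 1)."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: squared_distance(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                mean = tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        centroids = new_centroids
    return centroids, clusters


# Hypothetical two-trait measurements forming two obvious groups.
POINTS = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = k_means(POINTS, [(1.0, 1.0), (8.0, 8.0)])
print(clusters)
```

Note that no class labels are supplied anywhere. That is the difference from classification, where the classes are handed to the algorithm in advance.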

Data for data mining is often stored in ARFF (Attribute-Relation File Format).
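An ARFF file is plain text: a header naming the relation and its attributes, then a data section with one comma-separated row per example. The Python sketch below shows a small made-up file and a deliberately minimal reader. The plant data and the parse_arff function are hypothetical, and the reader assumes simple unquoted values with no spaces in names, which real ARFF files do not guarantee.

```python
# Hypothetical example file contents.
SAMPLE_ARFF = """% Hypothetical example data set
@relation plants

@attribute leaf_color {green,yellow,brown}
@attribute height_cm numeric
@attribute status {healthy,diseased}

@data
green,30,healthy
yellow,28,diseased
brown,35,diseased
"""


def parse_arff(text):
    """Return (relation, attributes, rows); all values are kept as strings."""
    relation = None
    attributes = []  # list of (name, type_string) pairs
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            names = [name for name, _ in attributes]
            rows.append(dict(zip(names, values)))
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, type_string = line.split(None, 2)
            attributes.append((name, type_string))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(SAMPLE_ARFF)
print(relation, attributes)
print(rows[0])
```

The header is what makes the format handy for machine learning: the list of allowed values for each nominal attribute travels with the data.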

More information:
Applications of Machine Learning and Rule Induction
Machine Learning ( Theory )
UT ML Group: Text Data Mining ( several papers here )
UCI Machine Learning Repository has over 160 data sets for you to use to test and develop your AI.

See also:
Electronic cop solves crimes
Finding new diseases for known cures

Tags: topics in artificial intelligence
