Automatic generation of relational attributes: An application to product returns
Samorani, M., Ahmed, F. and Zaïane, O.R.
2016 IEEE International Conference on Big Data (Big Data), 5-8 Dec. 2016
Although statistical and machine learning methods require the input data to be in a tabular format, in real-world applications data are often stored across several tables in a relational database. Building a single mining table from a relational database is a critical pre-processing step for any classification method, because including the right attributes may dramatically boost the accuracy of the classifier. We propose a methodology and implement a software program, Dataconda, to automatically mine a relational database. The user selects a class attribute contained in a table of the database, and the procedure builds and selects predictors by exploring the whole database and aggregating information, without any user intervention. For example, our procedure may find that the best predictor for “product return” is the proportion of products returned by the same customer in the past, even if the user has not built any such attribute. Our procedure produces more expressive attributes than existing methods. Our experiments on the ISMS Durable Goods Datasets, a publicly available data set of product returns in retailing, suggest that our method allows new knowledge to emerge.
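To make the example attribute in the abstract concrete, the following is a minimal sketch of how "the proportion of products returned by the same customer in the past" could be computed by hand for each purchase. It is not Dataconda's implementation. The table layout, the column order, the function name `past_return_rate`, and the sample rows are all assumptions made for illustration; none of them come from the paper or from the ISMS data.

```python
from collections import defaultdict

# Hypothetical rows of a "purchases" table:
# (purchase_id, customer_id, day, returned).
# The values are invented for illustration only.
purchases = [
    (1, "A", 1, True),
    (2, "A", 2, False),
    (3, "B", 2, False),
    (4, "A", 5, True),
    (5, "B", 6, True),
    (6, "A", 9, False),
]


def past_return_rate(rows):
    """Map each purchase_id to the share of the same customer's
    earlier purchases that were returned.

    A purchase with no earlier history gets None. Ordering is by
    (day, purchase_id), so same-day purchases are ranked by id.
    """
    seen = defaultdict(int)      # earlier purchases per customer
    returned = defaultdict(int)  # earlier returns per customer
    feature = {}
    for pid, cust, day, was_returned in sorted(rows, key=lambda r: (r[2], r[0])):
        # Use only purchases that precede this one, so the attribute
        # never includes the outcome it is meant to predict.
        feature[pid] = returned[cust] / seen[cust] if seen[cust] else None
        seen[cust] += 1
        returned[cust] += int(was_returned)
    return feature


features = past_return_rate(purchases)
for pid in sorted(features):
    print(pid, features[pid])
```

The sketch only counts purchases that occurred before the one being scored. That is a design choice of this illustration: it keeps the return status of the current purchase, which plays the role of the class attribute, out of its own predictor.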