The Insurance Company (TIC) Benchmark

Original Problem Task Description

 

(This was the original text at the CoIL Challenge 2000 website. The Challenge is closed now.)

Direct mailings to a company’s potential customers – “junk mail” to many – can be a very effective way for the company to market a product or a service. However, as we all know, much of this junk mail is really of no interest to the people who receive it. Most of it ends up thrown away, not only wasting the money that the company spent on it, but also filling up landfill waste sites or needing to be recycled.

If the company had a better understanding of who its potential customers were, it would know more accurately whom to send mailings to, so some of this waste and expense could be reduced. Therefore, following a successful CoIL competition last year (see Synergy Issue 1, Winter 1999), CoIL has just announced a new competition challenge for 2000:

Can you predict who would be interested in buying a caravan insurance policy and give an explanation why?  

The competition consists of two tasks: a prediction task and a description task.

Participants need to provide a solution for both tasks. For each task, only one winner will be chosen.

We want you to predict whether a customer is interested in a caravan insurance policy from other data about the customer. Information about customers consists of 86 variables and includes product usage data and socio-demographic data derived from zip area codes. The data was supplied by the Dutch data mining company Sentient Machine Research and is based on a real-world business problem. The training set contains over 5000 descriptions of customers, including whether or not they have a caravan insurance policy. A test set contains 4000 customers of whom only the organisers know whether they have a caravan insurance policy.

For the prediction task, the underlying problem is to find the subset of customers with a probability of having a caravan insurance policy above some boundary probability. The known policyholders can then be removed and the rest receive a mailing. The boundary depends on the costs and benefits, such as the cost of mailing and the benefit of selling insurance policies. To approximate this problem, we want you to find the set of 800 customers in the test set that contains the most caravan policy owners. For each solution submitted, the number of actual policyholders will be counted, and this gives the score of the solution. Only the indexes of the selected records need to be sent in, assuming that the first record has index number 1 (e.g. 1,7,24,…,3980,4000). Please also mention the technique or algorithm used. The candidate winner for the prediction task will need to motivate his or her approach in a short paper (right after the closing of the deadline and before the CoIL Symposium).
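The selection and scoring rule described above can be sketched in a few lines of Python. This is only an illustration, not the organisers' scoring code: the function names are hypothetical, the scores are assumed to come from whatever model a participant has trained, and the break-even formula assumes a simple expected-value model (mail a customer when probability times benefit exceeds the mailing cost) that the original text does not spell out.

```python
from typing import Sequence


def breakeven_probability(mailing_cost: float, sale_benefit: float) -> float:
    """Boundary probability under an ASSUMED expected-value model:
    mailing pays off when p * sale_benefit > mailing_cost."""
    if sale_benefit <= 0:
        raise ValueError("sale_benefit must be positive")
    return mailing_cost / sale_benefit


def select_top_k(scores: Sequence[float], k: int = 800) -> list[int]:
    """Return the 1-based indexes of the k highest-scoring records, ascending."""
    if not 0 <= k <= len(scores):
        raise ValueError("k must be between 0 and the number of records")
    # Rank 0-based positions by descending score; sorted() is stable,
    # so tied scores keep their original file order.
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    # Convert to 1-based indexes, since the first record has index number 1.
    return sorted(i + 1 for i in ranked[:k])


def count_policyholders(selected: Sequence[int], labels: Sequence[int]) -> int:
    """Score a submission: how many selected records are actual policyholders.
    labels[i] is 1 if record i + 1 holds a caravan policy, else 0."""
    return sum(labels[i - 1] for i in selected)


# Toy data: five records instead of 4000, and k = 2 instead of 800.
toy_scores = [0.10, 0.80, 0.05, 0.60, 0.30]
toy_labels = [0, 1, 0, 0, 1]
chosen = select_top_k(toy_scores, k=2)
print(chosen)                                    # [2, 4]
print(count_policyholders(chosen, toy_labels))   # 1
```

Fixing the selection size at 800 removes the need to estimate costs and benefits at all: only the ranking of customers matters, not whether the scores are well-calibrated probabilities.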

The purpose of the description task is to give clear insight into why customers have a caravan insurance policy and how these customers differ from other customers. Descriptions can be based on regression equations, decision trees, neural network weights, linguistic descriptions, evolutionary programs, graphical representations or any other form. The descriptions and accompanying interpretation must be comprehensible, useful and actionable for a marketing professional with no prior knowledge of computational learning technology. Since the value of a description is inherently subjective, submitted descriptions will be evaluated by the jury and an expert in insurance marketing.

Peter van der Putten (putten@liacs.nl)