[Problem Solving & Discussion] Data selection during data preparation
Today I came across this passage in Applying Data Mining Techniques Using SAS Enterprise Miner (2-25):
"The data used to build the model often does not represent the true target population. For example, in credit scoring, information is collected on all applicants. Some are rejected based on the current criterion. The eventual outcome (good/bad) for the rejected applicants is not known. If a prediction model is built using only the accepted applicants, the results may be distorted when used to score future applicants. Undercoverage of the population continues when a new model is built using data from accepted applicants of the current model. Credit-scoring models have proven useful despite this limitation.
Reject inference refers to attempts to include the rejected applicants in the analysis. There are several ad hoc approaches, all of which are of questionable value (Hand 1997). The best approach (from a data analysis standpoint) is to acquire outcome data on the rejected applicants by either extending credit to some of them or by purchasing follow-up information on the ones who were given credit by other companies."
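This struck a chord with me. Yesterday my boss asked me about how the customer contact-preference model is applied: "Once the target customer group has been selected, channel execution matches customers against the model's results, and the execution results are then fed back to optimize the model, forming a closed loop."
This is where the problem appears. Customers who respond well to a given channel get contacted again and again, forming a virtuous cycle, while customers with a low response rate never even get the chance to be contacted. The result is a virtuous cycle alongside a vicious cycle, which pushes the two groups further apart.
How can this problem be solved?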
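One possible direction, borrowing from the quoted suggestion of extending credit to some rejected applicants, is to reserve a small, randomly chosen share of each campaign's contacts for customers the model scores low, so that outcome data keeps arriving for them. What follows is only a minimal Python sketch under assumptions that are not in the source: the `select_contacts` function, the `explore_frac` parameter, and the score dictionary are all hypothetical names, and the 20% share is an arbitrary illustration.

```python
import random


def select_contacts(scores, budget, explore_frac=0.1, seed=None):
    """Split a contact budget between top-scored customers and a random sample.

    scores: dict mapping customer id -> model-predicted response score
            (hypothetical input format).
    budget: total number of customers to contact in this campaign.
    explore_frac: share of the budget reserved for randomly chosen
                  customers outside the top-scored group (assumed parameter).
    Returns (exploit, explore): two lists of customer ids.
    """
    rng = random.Random(seed)
    budget = min(budget, len(scores))
    n_explore = int(round(budget * explore_frac))
    n_exploit = budget - n_explore

    # Rank customers from highest to lowest predicted response.
    ranked = sorted(scores, key=scores.get, reverse=True)
    exploit = ranked[:n_exploit]

    # Sample uniformly from everyone the model would otherwise skip.
    rest = ranked[n_exploit:]
    explore = rng.sample(rest, min(n_explore, len(rest)))
    return exploit, explore


# Made-up scores for 100 customers, purely for illustration.
scores = {f"c{i}": i / 100 for i in range(100)}
exploit, explore = select_contacts(scores, budget=10, explore_frac=0.2, seed=0)
print("top-scored contacts:", exploit)
print("random contacts:", explore)
```

Responses from the randomly contacted group are not filtered by the current model, so they could serve as a less biased sample when the model is rebuilt. How large that share should be is a business trade-off that this sketch does not try to answer.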