FP-Growth false warning "non-binominal attributes detected"

Tripartio Member Posts: 33 Contributor II
Hello,

In the latest version of RapidMiner, 9.10.1, I noticed a false warning on FP-Growth that was not there before. Here is a sample process that illustrates the problem:


<?xml version="1.0" encoding="UTF-8"?><process version="9.10.001">
<context>
<input/>
<output/>
<macros/>
</context>

<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="1234"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">


</operator>
<operator activated="true" class="blending:pivot" compatibility="9.10.001" expanded="true" height="82" name="Pivot" width="90" x="179" y="34">
<parameter key="group_by_attributes" value="Invoice"/>

<list key="aggregation_attributes">
<parameter key="Order" value="count"/>
</list>
<parameter key="use_default_aggregation" value="false"/>
<parameter key="default_aggregation_function" value="first"/>
</operator>

<parameter key="attribute_filter_type" value="all"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
<parameter key="replace_what" value="count\(Order\)_"/>
<parameter key="replace_by" value=""/>
</operator>

<parameter key="attribute_name" value="Invoice"/>
<parameter key="target_role" value="id"/>
<list key="set_additional_roles"/>
</operator>

<parameter key="return_preprocessing_model" value="false"/>
<parameter key="create_view" value="false"/>
<parameter key="attribute_filter_type" value="all"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
<parameter key="default" value="zero"/>
<list key="columns"/>
</operator>
<operator activated="true" class="numerical_to_binominal" compatibility="9.10.001" expanded="true" height="82" name="Numerical to Binominal" width="90" x="715" y="136">
<parameter key="attribute_filter_type" value="all"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="numeric"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="real"/>
<parameter key="block_type" value="value_series"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_series_end"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
<parameter key="min" value="0.0"/>
<parameter key="max" value="0.0"/>
</operator>

<parameter key="attribute_filter_type" value="value_type"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="binominal"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
</operator>


<parameter key="item_separators" value="|"/>
<parameter key="use_quotes" value="false"/>
<parameter key="quotes_character" value="&quot;"/>
<parameter key="escape_character" value="\"/>
<parameter key="trim_item_names" value="true"/>
<parameter key="min_requirement" value="support"/>
<parameter key="min_support" value="0.05"/>
<parameter key="min_frequency" value="100"/>
<parameter key="min_items_per_itemset" value="1"/>
<parameter key="max_items_per_itemset" value="0"/>
<parameter key="max_number_of_itemsets" value="1000000"/>
<parameter key="find_min_number_of_itemsets" value="true"/>
<parameter key="min_number_of_itemsets" value="100"/>
<parameter key="max_number_of_retries" value="15"/>
<parameter key="requirement_decrease_factor" value="0.9"/>
<enumeration key="must_contain_list"/>
</operator>

</process>
</operator>
</process>
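To make the data preparation concrete, here is a minimal plain-Python sketch of what the chain Pivot (count per invoice), missing counts filled with 0, and Numerical to Binominal with min = max = 0 produces. It is an illustration, not RapidMiner's implementation: the order rows and the `product` field are hypothetical (the Pivot's column grouping attribute is not visible in the post), and the function names are made up. Numerical to Binominal is modelled as "values inside [min, max] become false, all others true", so a count of 0 maps to false and any positive count to true.

```python
from collections import Counter

# Hypothetical order rows (invoice, product); the real column grouping
# attribute of the Pivot operator is not shown in the post.
ORDERS = [
    ("INV-1", "apple"),
    ("INV-1", "bread"),
    ("INV-1", "apple"),
    ("INV-2", "bread"),
    ("INV-3", "milk"),
]


def pivot_counts(rows):
    """Count orders per (invoice, product), one row per invoice."""
    products = sorted({product for _, product in rows})
    invoices = sorted({invoice for invoice, _ in rows})
    counts = Counter(rows)
    # Counter returns 0 for combinations that never occur,
    # so missing counts end up as 0.
    return {inv: {p: counts[(inv, p)] for p in products} for inv in invoices}


def numerical_to_binominal(table, lo=0.0, hi=0.0):
    """Values inside [lo, hi] become False, all other values True."""
    return {
        inv: {p: not (lo <= v <= hi) for p, v in row.items()}
        for inv, row in table.items()
    }


binominal = numerical_to_binominal(pivot_counts(ORDERS))
for invoice, items in binominal.items():
    print(invoice, items)
```

Each invoice ends up with one true/false column per product, which is the input FP-Growth expects, while the invoice itself stays outside the item columns (here as the dictionary key, in the process as the attribute with the ID role).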


Even though I select only binominal attributes, I still get a warning that a "non-binominal attribute" was detected.


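After some testing, the problem seems to be that the attribute with the ID role (Invoice) triggers this warning. That is, the FP-Growth operator detects that the ID is not binominal and therefore raises the warning. However, the spurious warning does not seem to affect the correct operation of the FP-Growth operator in 9.10.1; it runs fine despite the warning.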

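When I set Select Attributes to "include special attributes" (so that the binominal filter is also applied to the special ID attribute, which removes it), the FP-Growth warning disappears.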
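Why that setting removes the ID can be modelled in a few lines. This is a hypothetical sketch of the behaviour described in the post, not RapidMiner's code: the `Attribute` class, the function name, and the assumption that Invoice is an integer attribute are illustrative. Special attributes (any role other than regular) bypass the type filter unless `include_special_attributes` is set; once it is set, the non-binominal ID fails the filter and is dropped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    name: str
    value_type: str        # e.g. "binominal", "integer"
    role: str = "regular"  # special roles such as "id" are anything but "regular"


def select_by_value_type(attributes, value_type, include_special_attributes=False):
    """Keep attributes of the given type; special attributes pass through
    untouched unless include_special_attributes is True (hypothetical model)."""
    kept = []
    for attr in attributes:
        if attr.role != "regular" and not include_special_attributes:
            kept.append(attr)  # special attribute: filter not applied
        elif attr.value_type == value_type:
            kept.append(attr)
    return kept


example_set = [
    Attribute("Invoice", "integer", role="id"),  # assumed type of the ID
    Attribute("apple", "binominal"),
    Attribute("bread", "binominal"),
]

print([a.name for a in select_by_value_type(example_set, "binominal")])
# ['Invoice', 'apple', 'bread']
print([a.name for a in select_by_value_type(example_set, "binominal", True)])
# ['apple', 'bread']
```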



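So this seems to be a false warning that does not otherwise affect the operator's correct operation. Could someone please confirm that this is indeed a bug, that is, that I am not misunderstanding how the operator is supposed to work? Is this the right place to report such bugs?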

Best Answer

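  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,286 RM Data Scientist
    Solution Accepted
    @Tripartio
    I can open a ticket for it, but it will take a while to fix.
    Best,
    Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany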

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,286 RM Data Scientist
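    Hi,
    It's complaining about the ID. Apparently the metadata check here also checks special attributes, which is not really what you want. But these metadata warnings are only hints and are often ignored.

    BR,
    Martin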
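  • Tripartio Member Posts: 33 Contributor II
    @mschmitz Yes, but earlier versions of RapidMiner never complained about this. I teach RapidMiner, and these false warnings confuse students a lot. So rather than simply ignoring it, I would prefer that the false alarm be removed. Is this the right place to formally submit a bug report?
  • Tripartio Member Posts: 33 Contributor II
    Thanks, @mschmitz. As long as it is reported for a fix in the next version of RapidMiner (maybe 9.10.002?), that's fine for now. Note that it should be flagged as a regression (i.e., something that worked fine before was broken in some update); perhaps that information will help the bug fixers identify the problem more easily.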
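Martin's explanation, that the metadata check also inspects special attributes, can be illustrated with a small hypothetical check. This is not FP-Growth's actual implementation; the `Attribute` class, the `non_binominal` function, its `check_special` flag, and the ID's integer type are assumptions for illustration. With `check_special=True` it reproduces the behaviour reported for 9.10.1 (the ID is flagged); restricting the check to regular attributes gives the result users expect (no warning).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    name: str
    value_type: str
    role: str = "regular"


def non_binominal(attributes, check_special=True):
    """Names of attributes that would trigger a 'non-binominal' warning.
    check_special=True also inspects special attributes such as the ID."""
    return [
        a.name
        for a in attributes
        if a.value_type != "binominal" and (check_special or a.role == "regular")
    ]


example_set = [
    Attribute("Invoice", "integer", role="id"),  # assumed type of the ID
    Attribute("apple", "binominal"),
    Attribute("bread", "binominal"),
]

print(non_binominal(example_set))                       # ['Invoice']
print(non_binominal(example_set, check_special=False))  # []
```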