"Filter text on regex."

mavi16ab Member Posts: 13 Contributor I
edited May 2019 in Help

I want to find all text snippets containing one or several words via regex. If I select Filter Examples, set it to "Expression", and provide it with finds(Text, "(?i)\blootbox|micro\b"), it doesn't work, although it is syntactically correct.

If I remove |micro, it returns only the snippets that contain lootbox. Why does it not return the examples that contain either one of them? If I use RapidMiner's regex checker on some dummy data, it matches both of them; it just doesn't work with "Filter Examples".

Kindly help!
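
One thing worth checking in the pattern itself is how the alternation binds. The sketch below uses Python's `re` module as a stand-in for the regex engine behind Filter Examples (an assumption; the sample texts are made up). It shows that `\blootbox|micro\b` is read as "`\blootbox` or `micro\b`", so each word boundary applies to only one of the two words, while a grouped version applies both boundaries to both words.

```python
import re

# Assumption: Python's re stands in for RapidMiner's regex engine here.
# Ungrouped: parsed as (\blootbox) OR (micro\b).
loose = re.compile(r"(?i)\blootbox|micro\b")
# Grouped: both word boundaries apply to both alternatives.
strict = re.compile(r"(?i)\b(?:lootbox|micro)\b")

# Hypothetical sample texts, not taken from the original data set.
samples = [
    "I opened a lootbox",
    "Micro transactions",
    "lootboxes everywhere",
    "submicro scale",
]

for text in samples:
    print(f"{text!r}: loose={bool(loose.search(text))}, strict={bool(strict.search(text))}")
```

In this sketch the ungrouped pattern also fires on "lootboxes everywhere" and "submicro scale", while the grouped one accepts only the two whole-word cases.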

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,
    Try to use the following expression: finds(Text, ".*lootbox.*|.*micro.*")
    This will match all texts which contain either one of those strings surrounded by arbitrary other stuff. The process below shows a simple example.
    Hope this helps,
    Ingo
    <?xml version="1.0" encoding="UTF-8"?><process version="9.2.000">

  • mavi16ab Member Posts: 13 Contributor I
    edited March 2019
    Appreciate the input, but sadly this regex matches anything that contains those letters. Say I have the word microsoft: your regex would trigger on that, but I'm only looking for an exact match :-)
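
The objection can be reproduced with a small check. This is a minimal sketch that assumes `finds` behaves like a substring search (modelled here with Python's `re.search`); the helper name `contains_match` and the sample texts are hypothetical.

```python
import re

# The pattern from the first answer, unchanged.
pattern = re.compile(r".*lootbox.*|.*micro.*")

def contains_match(text):
    """Return True if the pattern is found anywhere in text (assumed `finds` semantics)."""
    return pattern.search(text) is not None

# Hypothetical sample texts.
for text in ["micro payments", "a microsoft product", "nothing relevant"]:
    print(f"{text!r}: {contains_match(text)}")
```

Because nothing in the pattern constrains the characters next to `micro`, "a microsoft product" is accepted along with "micro payments".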
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    "I'm only looking for an exact match :-)"

    Well, this expression actually IS an exact match ;-)

    So I assume you would like to only match if there is a non-word character before and after? Is that what you mean? In this case, the correct expression is finds([Text],".*\\W+lootbox\\W+.*|.*\\W+micro\\W+.*") - process below.

    Please note however that in this case you would also no longer find plurals easily, so for example "lootboxes" would not trigger this any longer.

    Cheers,
    Ingo
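
The trade-offs of the `\W+` expression can be explored outside RapidMiner. The sketch below again uses Python's `re.search` as a stand-in for `finds` (an assumption) and writes the pattern with single backslashes, since Python raw strings do not need the doubling used in the expression above. The `boundary` pattern is an alternative added here for comparison, not something proposed in the thread, and the sample texts are made up.

```python
import re

# Pattern from the answer above, with single backslashes in a raw string.
nonword = re.compile(r".*\W+lootbox\W+.*|.*\W+micro\W+.*")
# Hypothetical alternative using word boundaries instead of \W+.
boundary = re.compile(r"\b(?:lootbox|micro)\b")

# Hypothetical sample texts.
samples = [
    "buy a lootbox now",    # word in the middle of the text
    "a microsoft product",  # longer word containing "micro"
    "lootboxes are bad",    # plural
    "lootbox is fun",       # word at the very start
    "I like micro",         # word at the very end
]

for text in samples:
    print(f"{text!r}: nonword={bool(nonword.search(text))}, boundary={bool(boundary.search(text))}")
```

In this sketch both patterns reject "microsoft" and "lootboxes", but the `\W+` version also misses a word at the very start or end of the text, because it requires at least one non-word character on each side. A `\b` boundary does not consume a character, so it handles those two cases. Whether `\\b` behaves the same way inside a RapidMiner expression string has not been verified here.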

    <?xml version="1.0" encoding="UTF-8"?><process version="9.2.000">