How to clean tweets from hashtags and @

baran (Member, Posts: 5, Contributor II)
edited November 2018 in Help
Hi everybody,
I have tried for 3 days to remove hashtags and @ from tweets, but I couldn't. Can anybody help?

Answers

  • IngoRM (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor, Posts: 1,751, RM Founder)

    Hi,

    Do you mean just getting rid of the symbols "@" and "#", or do you also want to remove what follows them, e.g. should "@ingomierswa" and "#datascience" be removed completely?

    Both are easy to do with the "Replace" operator and a simple regular expression. Below is a small sample process showing how this is done.

    Hope this helps,

    Ingo

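    [Sample process XML not preserved; only the following fragments survive.]
    Example text (beginning truncated): @ingomierswa on #datascience - end of tweet.
    Regular expression removing only the symbols: @|#
    Regular expression removing the symbols and the letters that follow: @[a-zA-Z]*|#[a-zA-Z]*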
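
As a rough illustration outside RapidMiner, the two regular expressions above can be tried with Python's `re.sub`. This is only a sketch: Python is my choice for the example (the thread itself uses the "Replace" operator), and the words "Talk by" at the start of the sample tweet are made up, since only the tail of the original example text is known.

```python
import re

# Hypothetical sample: only the part from "@ingomierswa" onward comes from the thread.
tweet = "Talk by @ingomierswa on #datascience - end of tweet."

# Variant 1: strip only the symbols and keep the words that follow them.
symbols_only = re.sub(r"@|#", "", tweet)

# Variant 2: strip each symbol together with the ASCII letters that follow it.
whole_tags = re.sub(r"@[a-zA-Z]*|#[a-zA-Z]*", "", tweet)

print(symbols_only)
print(whole_tags)
```

The second variant leaves the surrounding spaces in place, so the result contains double spaces where the mention and hashtag used to be.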

  • baran (Member, Posts: 5, Contributor II)
    Yes, exactly. Thank you! I will try it tomorrow and then edit this post.
  • Hyram (Member, Posts: 39, Contributor II)
    Hi @IngoRM. This worked, thank you, but I'm left with characters other than letters: the regex removes the letters after the # or @ but not other characters. For example, I had @g_smug, and it removed only @g and stopped at the underscore. Any suggestions?

    Thanks
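
The behaviour described here can be reproduced in a few lines of Python (my choice for illustration; the thread uses RapidMiner's "Replace" operator). The class `[a-zA-Z]` contains neither the underscore nor digits, so the match ends at the first such character. Widening the class to `\w` is one possible fix, shown here as an assumption rather than as the answer given in the thread.

```python
import re

handle = "@g_smug"

# Letters-only class: the match ends at the underscore, leaving the tail behind.
letters_only = re.sub(r"@[a-zA-Z]*", "", handle)

# \w also covers digits and the underscore, so the whole handle is consumed.
word_chars = re.sub(r"@\w*", "", handle)

print(repr(letters_only))
print(repr(word_chars))
```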
  • kayman (Member, Posts: 662, Unicorn)

    Extend your regex a bit, like this:

    \b(@|#)[^\. \s, ]+

    It looks a bit ugly, but it basically means: find any 'word' that starts with either @ or #, and select everything up to the next whitespace, dot, or comma. Replace this with nothing and it's gone.
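
A minimal Python sketch of this idea follows. Two things in it are my own assumptions, not part of the answer above. First, the helper name `clean_tweet` and the sample strings are hypothetical. Second, the sketch drops the leading `\b`: in Python's `re` module, `\b` needs a word character on one side, so the pattern as written does not match a tag that follows a space or starts the text. I would expect the Java regex engine behind RapidMiner to behave the same way, but that is not verified here.

```python
import re

# Pattern exactly as posted in the thread.
POSTED = re.compile(r"\b(@|#)[^\. \s, ]+")

# Adapted pattern (assumption): no \b, same "stop at whitespace, dot or comma" rule.
ADAPTED = re.compile(r"[@#][^\s.,]+")


def clean_tweet(text: str) -> str:
    """Remove @mentions and #hashtags, then collapse leftover whitespace."""
    without_tags = ADAPTED.sub("", text)
    return re.sub(r"\s+", " ", without_tags).strip()


# With \b, a tag preceded by a space is left untouched in Python.
unchanged = POSTED.sub("", "hi @g_smug")

# Without \b, the whole handle is removed, underscore included.
cleaned = clean_tweet("Nice one @g_smug and #ml rocks")

print(unchanged)
print(cleaned)
```

Dropping `\b` has a cost: the adapted pattern also matches an @ inside a word, so an e-mail address such as `user@example.com` would be reduced to `user.com`. Whether that matters depends on the tweets being cleaned.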
