"Issues Importing CSV"

jbartot Member Posts: 4 Contributor I
edited May 2019 in Help
Hi,

I am trying to import a CSV that is about 25 MB in size. RM really struggles to process the file. It maxes out the processors for about 10 minutes before it finally gives up and runs out of memory. I explicitly set the Java heap size on launch and can see that the OS is giving RM the 2 GB of memory I specified. I tried this on a smaller file (one fifth the size) and got similar behavior.

I have tried importing both to a repository and to the workspace, and I get the same behavior either way. The data itself is 500 x 12,000 (bag-of-words document vectors). Even assuming each feature value takes up 8 bytes (for doubles), it doesn't make sense that this is such a struggle.
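
As a rough back-of-the-envelope check of that estimate, here is a minimal sketch. It assumes dense storage at 8 bytes per value and ignores any per-object or parsing overhead the importer may add. The function name `dense_size_bytes` and the constants are only illustrative and are not part of RM.

```python
# Rough dense-storage estimate for a 500 x 12,000 matrix of doubles.
# Assumption: every cell is stored as a plain 8-byte double, no overhead.
ROWS = 500
COLS = 12_000
BYTES_PER_DOUBLE = 8
HEAP_BYTES = 2 * 1024**3  # the 2 GB heap mentioned above


def dense_size_bytes(rows, cols, bytes_per_value=BYTES_PER_DOUBLE):
    """Return the raw size in bytes of a dense rows x cols numeric matrix."""
    return rows * cols * bytes_per_value


size = dense_size_bytes(ROWS, COLS)
print(f"raw size: {size / 1024**2:.1f} MiB")      # raw size: 45.8 MiB
print(f"share of heap: {size / HEAP_BYTES:.1%}")  # share of heap: 2.2%
```

Under these assumptions the raw values need only a few percent of the heap, so whatever exhausts the memory is presumably overhead beyond the plain numeric data.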

Any ideas? Am I thinking about this right?

Any help would be appreciated.

Jay

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    Well, that shouldn't happen. Is the data confidential? If not, would it be possible to send it to me? Then I will include it in our checks.

    Greetings,
    Sebastian
  • jbartot Member Posts: 4 Contributor I
    Happy to share the data. Given its size, where should I post it to?

    Thanks

    Jay
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    Please send me an email. If you compress the data, it should fit in my mail account.

    Greetings,
    Sebastian