"Search for a phrase in multiple PDFs via a list of URLs"

carl Member Posts: 30 Guru
edited June 2019 in Help

Is it possible to input a list of URLs (which contain PDFs), then search for a phrase in all the PDFs, and return a table with the URL path and the searched-for text? I'd like to do this without downloading all the PDFs.

This is what I have so far. It runs, but it doesn't create an attribute.

1 - Read Excel (with URLs)

2 - Loop Examples

2a - Extract Macro

2b - Open File

2c - Read Document

2d - Extract Information

3 - Select Attributes
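
For readers who want to see the intent of steps 1 to 2c outside RapidMiner, here is a minimal Python sketch, not RapidMiner process XML. It assumes the URLs are already in a list, reads each PDF into memory rather than saving it to disk, and hands the bytes to a text extractor. The names `fetch_bytes`, `read_documents`, and `extract_text` are hypothetical; `extract_text` stands in for whatever PDF library is available and is passed in by the caller.

```python
import io
import urllib.request
from typing import Callable, Iterable, List, Tuple


def fetch_bytes(url: str, timeout: float = 30.0) -> bytes:
    """Read the resource at `url` into memory; nothing is written to disk."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def read_documents(
    urls: Iterable[str],
    extract_text: Callable[[io.BytesIO], str],
    fetch: Callable[[str], bytes] = fetch_bytes,
) -> List[Tuple[str, str]]:
    """Return one (url, text) pair per URL.

    Roughly mirrors Loop Examples > Open File > Read Document.
    `extract_text` is a caller-supplied function (an assumption of this
    sketch) that turns an in-memory PDF stream into plain text.
    """
    documents = []
    for url in urls:
        pdf_stream = io.BytesIO(fetch(url))  # in-memory file object
        documents.append((url, extract_text(pdf_stream)))
    return documents
```

The `fetch` parameter is injectable so the loop can be exercised without network access, for example with a dictionary lookup in place of a real HTTP request.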





<macros/>

<parameter key="url" value="%{GetURL}"/>

<operator activated="true" class="select_attributes" compatibility="7.3.000" expanded="true" height="82" name="Select Attributes" width="90" x="447" y="34">












Best Answer

  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    Solution Accepted

    Here you go. You needed a Documents to Data operator to change your PDF text into an ExampleSet.





    <macros/>

    <parameter key="url" value="%{GetURL}"/>

    <operator activated="true" class="select_attributes" compatibility="7.3.000" expanded="true" height="82" name="Select Attributes" width="90" x="715" y="34">
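
To illustrate the idea behind the accepted answer (document text has to become table rows before any attribute can appear), here is a small Python sketch, not RapidMiner XML. It assumes the PDF text has already been extracted into `(url, text)` pairs and builds one row per match with the URL and the matched text. The function name `documents_to_rows`, the `context_chars` parameter, and the case-insensitive matching are choices made for this sketch, not something stated in the thread.

```python
import re
from typing import Dict, Iterable, List, Tuple


def documents_to_rows(
    documents: Iterable[Tuple[str, str]],
    phrase: str,
    context_chars: int = 40,
) -> List[Dict[str, str]]:
    """Build one row per phrase match: source URL plus matched text with context."""
    # re.escape treats the phrase literally; IGNORECASE is an assumption.
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    rows = []
    for url, text in documents:
        for match in pattern.finditer(text):
            # Clamp the context window to the bounds of the text.
            start = max(match.start() - context_chars, 0)
            end = min(match.end() + context_chars, len(text))
            rows.append({"url": url, "match": text[start:end]})
    return rows
```

Documents without a match contribute no rows, so the result only lists URLs where the phrase actually occurs.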













