Labelling training cases in polynominal text clasification task
Each case in my dataset contains multiple sentences as shown below.
"Criterion 4Writing General – language and grammar and referencing.UnacceptableSentence structure and grammar inadequate for clarity and/or incomplete referencing of sourced material.AcceptableSentence structure and grammar adequate, but errors cause distraction and/or errors in referencing.GoodSentence structure and grammar adequate, with minor errors that do not distract reader from the main message.Very GoodSentence structures and grammar are good with correct referencing of all sourced material.ExcellentEmploys words with fluency for ease of reading. Writing and references are essentially error free."
I would like to classify the cases according to their main focus. My labels are ["Information Literacy", Written Communication", Digital Literacy"...] 8 in total.
When developing the training set some cases clearly relate to one area such as Information Literacy... In those instances my training data looks like this:
ID, Text, Lable
01 "string", "Information Literacy"
However, some cases relate to multiple labels.
My question is how should these cases be documented in the training set?
Hope that makes sense.
Tagged:
0
Best Answer
-
rfuentealba Moderator, RapidMiner Certified Analyst, Member, University ProfessorPosts:568UnicornHello,
Let's use a simpler case here to make an example.
Label | Text
weather | This will be a cold winter
food | a few sandwiches for me
weather | It's raining today
food | Give me some coffee
sports | Michael Jordan is the greatest basketball player ever
The result for this one should be:
weather, food | Today it was cold, I made coffee and sandwiches.
Right?
所以我所做的lve this was to train three different models (8, in your case). One that can recognize weather from not-weather, other that can recognize food from not-food, and a third one that can recognize sports from not-sports.
You can make use of Multiply, Macros, and a few other things to train multiple models and then apply these models iteratively.
It's not the most elegant solution and maybe@Telcontar120has another one. I'll try to find an example to share with you, ok?
All the best,
Rodrigo.
6
Answers