Existing Data Sets and Training

Hi all,

I am researching and developing a chatbot-based Intelligent Tutoring System (ITS). The data I plan to use is a series of questions from students and answers from tutors. I’m quite new to Botpress, so I was looking for some advice on using existing datasets to train the NLU rather than, say, manually writing out intents for each question.

Any general advice would be much appreciated. In particular, I was wondering what format the data should be in so that it can be integrated with the bot, and how I could start using that data for training (any tutorials or example code would be great).

Thank you :slightly_smiling_face:

Hi @kirkwood,

I suggest you create one Q&A and one intent yourself, then locate the resulting files on your system. All data is found under /data/bots/botID. For example, the questions and answers are located under /data/bots/botID/qna.

All files are in JSON format, and each file shows the properties it requires. Q&As have an import feature: once your questions are formatted correctly, they can be added with a button click. Intents may be formatted and added directly to the folder. Here’s an example:

{
  "name": "intent",
  "filename": "intent.json",
  "contexts": ["global"],
  "slots": [],
  "utterances": {
    "en": [
      "first utterance",
      "second utterance",
      "etc..."
    ]
  }
}
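
If your dataset is already grouped by topic, files of this shape could be generated with a short script. The following is only a rough sketch: the function names (slugify, build_intent, write_intents) and the grouped-dictionary input are my own assumptions, not part of Botpress, and the output folder is left for you to supply. The fields simply mirror the example above, so check them against an intent created by your own Botpress version first.

```python
import json
import re
from pathlib import Path


def slugify(label: str) -> str:
    """Turn a free-form label into a lowercase, file-safe intent name."""
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")


def build_intent(label: str, utterances: list[str], lang: str = "en") -> dict:
    """Build one intent dict with the same fields as the example above."""
    name = slugify(label)
    # Drop blank entries and exact duplicates while keeping the original order.
    cleaned = list(dict.fromkeys(u.strip() for u in utterances if u.strip()))
    return {
        "name": name,
        "filename": f"{name}.json",
        "contexts": ["global"],
        "slots": [],
        "utterances": {lang: cleaned},
    }


def write_intents(grouped: dict[str, list[str]], out_dir: Path) -> list[Path]:
    """Write one JSON file per intent into out_dir (a folder you choose)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for label, utterances in grouped.items():
        intent = build_intent(label, utterances)
        path = out_dir / intent["filename"]
        path.write_text(
            json.dumps(intent, indent=2, ensure_ascii=False), encoding="utf-8"
        )
        written.append(path)
    return written
```

For example, write_intents({"Adding fractions": ["How do I add fractions?"]}, Path("generated_intents")) writes generated_intents/adding-fractions.json, which you can then copy to wherever your bot keeps its intent files.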

I would advise against mass-importing user-generated questions and answers, since their intents may overlap, which will result in poor NLU performance.
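
One way to spot the most obvious overlap before importing is to look for utterances that appear under more than one intent. This is a minimal sketch under my own assumptions: normalize and find_shared_utterances are hypothetical helper names, the input is a plain dictionary of intent name to utterances, and it only catches utterances that are identical after lowercasing and stripping punctuation. Paraphrases with the same meaning will not be detected.

```python
import re
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase the text and keep only word tokens, joined by single spaces."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def find_shared_utterances(grouped: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each normalized utterance to the intents that contain it,
    keeping only utterances found under more than one intent."""
    owners = defaultdict(set)
    for intent, utterances in grouped.items():
        for utterance in utterances:
            key = normalize(utterance)
            if key:  # skip entries that are empty after normalization
                owners[key].add(intent)
    return {key: sorted(names) for key, names in owners.items() if len(names) > 1}
```

Any utterance this reports should be moved to a single intent, or the two intents merged, before training.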

Good luck and happy building :robot:


Hi @Maxime_Joannette,

Thank you very much for the advice. I’ll be careful with how I import the dataset, and I’ll ask if I have any more queries.

Rhys
