Importing hundreds of QnAs

Has anyone tried importing hundreds of questions and answers into a bot? Each question may also have 10+ alternate questions. I ask because I’ve been experimenting with exactly this and have noticed some extended NLU training times. Training works fine with 50-100 questions, but when I try the main dataset (~700 questions) it begins to choke. After briefly inspecting the code, my guess is that it’s because training uses an SVM classifier, which gets very resource intensive as the dataset grows. Just curious if anyone else has experienced this.

Reference: https://github.com/botpress/botpress/blob/master/src/bp/ml/svm.ts#L28
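
To put rough numbers on the SVM hunch: a kernel SVM works off pairwise similarities between utterances (and every alternate question counts as an utterance), so memory and time grow roughly quadratically with the dataset. A back-of-the-envelope sketch, assuming one dense float64 kernel matrix (an assumption about the internals, not something read out of svm.ts):

// Illustrative cost model only, not Botpress code. Assumes a dense float64
// kernel matrix over all utterances, i.e. O(n^2) memory in the utterance count.
function kernelMatrixBytes(questions: number, alternatesPerQuestion: number): number {
  const utterances = questions * (1 + alternatesPerQuestion)
  return utterances * utterances * 8 // 8 bytes per float64 entry
}

const toMb = (bytes: number): string => (bytes / (1024 * 1024)).toFixed(0)

console.log(toMb(kernelMatrixBytes(100, 10))) // ~9 MB   -> trains comfortably
console.log(toMb(kernelMatrixBytes(700, 10))) // ~452 MB -> starts to choke

Which lines up with ~50-100 questions being fine and ~700 struggling, before even counting the serialized model that gets persisted.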

I guess it reaches a hard limit:

botpress_1  | 16:25:18.238 Mod[nlu] Error training NLU model [Error, The size of the file data/bots/test-bot/models/en/question__intent-l1.bin is over the 100mb limit]
botpress_1  | STACK TRACE
botpress_1  | Error: The size of the file data/bots/test-bot/models/en/question__intent-l1.bin is over the 100mb limit
botpress_1  |     at ScopedGhostService.<anonymous> (/snapshot/build-linux/out/bp/core/services/ghost/service.js:0:0)
botpress_1  |     at Generator.next (<anonymous>)
botpress_1  |     at __awaiter (/snapshot/build-linux/out/bp/core/services/ghost/service.js:0:0)
botpress_1  |     at Promise._execute (/botpress/modules/.cache/module__f21b22cdb91304a860852bb6a0972f4eed394f43a241bc9a33d90818ff44b497/node_production_modules/bluebird/js/release/debuggability.js:427:9)
botpress_1  |     at Promise._resolveFromExecutor (/botpress/modules/.cache/module__f21b22cdb91304a860852bb6a0972f4eed394f43a241bc9a33d90818ff44b497/node_production_modules/bluebird/js/release/promise.js:518:18)
botpress_1  |     at new Promise (/botpress/modules/.cache/module__f21b22cdb91304a860852bb6a0972f4eed394f43a241bc9a33d90818ff44b497/node_production_modules/bluebird/js/release/promise.js:103:10)
botpress_1  |     at __awaiter (/snapshot/build-linux/out/bp/core/services/ghost/service.js:0:0)
botpress_1  |     at ScopedGhostService.upsertFile (/snapshot/build-linux/out/bp/core/services/ghost/service.js:0:0)
botpress_1  |     at Storage._persistModel (/botpress/modules/.cache/module__f21b22cdb91304a860852bb6a0972f4eed394f43a241bc9a33d90818ff44b497/dist/backend/storage.js:236:26)
botpress_1  |     at Promise.map.model (/botpress/modules/.cache/module__f21b22cdb91304a860852bb6a0972f4eed394f43a241bc9a33d90818ff44b497/dist/backend/storage.js:253:45)
botpress_1  |     at tryCatcher (/botpress/modules/.cache/module__f21b22cdb91304a860852bb6a0972f4eed394f43a241bc9a33d90818ff44b497/node_production_modules/bluebird/js/release/util.js:16:23)
botpress_1  |     at MappingPromiseArray._promiseFulfilled (/botpress/modules/.cache/module__f21b22cdb91304a860852bb6a0972f4eed394f43a241bc9a33d90818ff44b497/node_production_modules/bluebird/js/release/map.js:68:38)
botpress_1  |     at MappingPromiseArray.PromiseArray._iterate (/botpress/modules/.cache/module__f21b22cdb91304a860852bb6a0972f4eed394f43a241bc9a33d90818ff44b497/node_production_modules/bluebird/js/release/promise_array.js:115:31)
botpress_1  |     at MappingPromiseArray.init (/botpress/modules/.cache/module__f21b22cdb91304a860852bb6a0972f4eed394f43a241bc9a33d90818ff44b497/node_production_modules/bluebird/js/release/promise_array.js:79:10)
botpress_1  |     at MappingPromiseArray._asyncInit (/botpress/modules/.cache/module__f21b22cdb91304a860852bb6a0972f4eed394f43a241bc9a33d90818ff44b497/node_production_modules/bluebird/js/release/map.js:37:10)
botpress_1  |     at _drainQueueStep (/botpress/modules/.cache/module__f21b22cdb91304a860852bb6a0972f4eed394f43a241bc9a33d90818ff44b497/node_production_modules/bluebird/js/release/async.js:97:12)
botpress_1  |     at _drainQueue (/botpress/modules/.cache/module__f21b22cdb91304a860852bb6a0972f4eed394f43a241bc9a33d90818ff44b497/node_production_modules/bluebird/js/release/async.js:86:9)
botpress_1  |     at Async._drainQueues (/botpress/modules/.cache/module__f21b22cdb91304a860852bb6a0972f4eed394f43a241bc9a33d90818ff44b497/node_production_modules/bluebird/js/release/async.js:102:5)
botpress_1  |     at Immediate.Async.drainQueues [as _onImmediate] (/botpress/modules/.cache/module__f21b22cdb91304a860852bb6a0972f4eed394f43a241bc9a33d90818ff44b497/node_production_modules/bluebird/js/release/async.js:15:14)
botpress_1  |     at runCallback (timers.js:696:18)
botpress_1  |     at tryOnImmediate (timers.js:667:5)
botpress_1  |     at processImmediate (timers.js:649:5)
botpress_1  |     at process.topLevelDomainCallback (domain.js:121:23)

I haven’t tried with a large number of QnAs, but the last time I added multiple questions in different categories/contexts, the bot wasn’t able to switch context and respond correctly. This made the QnA module almost unusable. Do you face the same problem?

I’ve typically kept everything in the same category, but I can see where that would be frustrating. I’ve noticed other cases where it has difficulty picking up on the context. I know they mention using intents, but as you said, that isn’t really built for mass QnAs, which I feel are commonplace in many areas where bots are used.

@shahamit @maybeno I’ve faced the same issue. It seems to be the same across intents and QnAs (since QnAs are treated as intents themselves). As you can see in this thread, I faced the same issue after adding multiple intents in a row. The solution is to allow the bot to complete its training before adding more intents/QnAs (AKA patience). Another option is editing the intents/QnAs in the backend (using a code editor), adding everything at once, and letting the bot process it in one go (instead of using the UI/frontend).
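
Roughly, what editing in the backend can look like is a small script that writes one JSON file per QnA item straight into the bot’s data folder, so a single training pass picks everything up. The schema below is what a 12.x bot of mine produced; export one QnA from your own bot first and match its exact layout, since field names can differ between versions:

// Hypothetical bulk-import script. Paths and the item schema are assumptions
// based on a Botpress 12.x bot -- copy the layout of a QnA file your own bot
// has already saved before running anything like this.
import * as fs from 'fs'
import * as path from 'path'

interface QnaRow {
  question: string
  alternates: string[]
  answer: string
}

const botQnaDir = 'data/bots/test-bot/qna' // adjust to your bot id

const rows: QnaRow[] = [
  {
    question: 'How do I reset my password?',
    alternates: ['Forgot my password', 'I lost my password'],
    answer: 'Use the "Forgot password" link on the login page.'
  }
  // ...the remaining ~700 rows, e.g. parsed from a CSV export
]

rows.forEach((row, i) => {
  const id = `imported_${i}`
  const item = {
    id,
    data: {
      action: 'text',
      contexts: ['global'],
      enabled: true,
      questions: { en: [row.question, ...row.alternates] },
      answers: { en: [row.answer] }
    }
  }
  fs.writeFileSync(path.join(botQnaDir, `${id}.json`), JSON.stringify(item, undefined, 2))
})

Restart the server afterwards so the whole batch is trained in one pass instead of once per item saved through the UI.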

@av_botpress - Thanks for the reply. I assume you are referring to the issue with the bulk upload of intents/QnAs and not the context switching problem when having different intents/QnAs in different contexts. The latter one still remains a problem, I guess.
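
For the context switching part, what has worked for me is setting the active NLU contexts explicitly from the flow (via a custom action) before the QnA layer answers, so only the QnAs of the current category compete. A minimal sketch; the session field name and its shape are my assumption from reading the builtin appendContext action shipped with 12.x, so verify against your version before relying on it:

// Hypothetical Botpress custom action body (actions are called with bp, event, args).
// Assumption: the dialog session tracks active NLU contexts under
// event.state.session.nluContexts as [{ context, ttl }] -- check the builtin
// appendContext action in your installed version to confirm.
function switchQnaContext(event: any, args: { contexts: string; ttl?: number }) {
  const contexts = args.contexts.split(',').map(c => c.trim())
  event.state.session.nluContexts = contexts.map(context => ({
    context,           // must match the category/context set on the QnA items
    ttl: args.ttl ?? 5 // number of dialog turns the context stays active
  }))
}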

Thanks for the insight @av_botpress. I tried this on 12.1.5 and 12.1.6; I wonder if the issue was worse before? Importing large amounts is still a problem even if you wait: the current implementation times out because the model becomes too large to process when you add that many questions.

@maybeno The issue should be partially fixed with the release of 12.2.0. I will be tackling training time performance quite soon, and I just don’t happen to have a bot this large on hand. Would you mind sharing your data with me in private so I can get a decent testing candidate? :slight_smile: