Issues with Swedish model file for an offline language server

Hi!

I’m running Botpress via Docker on Ubuntu 20.04.3 LTS, and it works great for the standard languages that are readily available from the metadata JSON file.

I now want to see if I can add support for Swedish. After reading through forum posts about adding languages that aren’t natively supported, I found that most answers point to either fastText or BPEmb.

I’ve read through the forum posts below, but none of the proposed solutions has worked so far.
Adding Hungarian language failed
Custom languages
NLU doesn’t work

I’ve tried combining the bin file from fastText with the model file from BPEmb, as well as using both the bin and the model file from BPEmb only, and every combination results in the same error message from the language server.

I then tried creating a model file with fastBPE and YouTokenToMe, which I had seen mentioned, but got the same error about the model file format. I don’t expect any help with these tools; I’m only mentioning them for reference.

Since my knowledge of the inner workings of the NLU is very limited: is there any blueprint for how the model file needs to be formatted?
I also tried loading the same files on a Windows installation and got the same message.

I would greatly appreciate any help or nudge in the right direction to get Swedish support up and running on my language server.

Apologies if I haven’t supplied enough information, or if I should have added this as a comment on an existing thread.
I will be happy to append any needed information to this thread upon request.

Thanks!
/Jocke

I did look at this thread. I will respond later with a full example.

Hey Daehli,

Thanks! I really appreciate you taking the time to help.

OK, I figured it out.

Model

The model needs to be downloaded from the https://bpemb.h-its.org website. Select the language you are interested in (in your case, Swedish). Several vocabulary sizes are available, but the bigger the vocabulary, the heavier the model is to load.

$ wget https://bpemb.h-its.org/sv/sv.wiki.bpe.vs25000.model
$ mv sv.wiki.bpe.vs25000.model  /path_TO_YOUR_LANGUAGE_SERVER/bp.sv.bpe.model
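The download URL above follows a simple pattern, so trying another vocabulary size only changes one field. A minimal sketch, assuming the `<lang>/<lang>.wiki.bpe.vs<size>.model` pattern of the Swedish 25000 example also holds for the other sizes listed on the BPEmb site; the helper names `bpemb_model_url` and `bp_model_name` are hypothetical and not part of Botpress or BPEmb.

```shell
# Hypothetical helpers (illustrative names, not part of Botpress or BPEmb).
# Assumption: the sv / vs25000 URL pattern above generalizes to other
# vocabulary sizes and languages.

# Print the BPEmb model URL for a language code ($1) and vocabulary size ($2).
bpemb_model_url() {
  printf 'https://bpemb.h-its.org/%s/%s.wiki.bpe.vs%s.model\n' "$1" "$1" "$2"
}

# Print the file name used for the BPE model in the mv command above.
bp_model_name() {
  printf 'bp.%s.bpe.model\n' "$1"
}

# Usage (downloads straight to the target name):
#   wget "$(bpemb_model_url sv 25000)" \
#     -O "/path_TO_YOUR_LANGUAGE_SERVER/$(bp_model_name sv)"
```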

Bin

The bin file comes from the "Word vectors for 157 languages" page on the fastText website. You need these specific vectors because Botpress checks the version of this file.

The wiki word vectors are not yet compatible with Botpress.

From the list on the "Word vectors for 157 languages" page, select the bin file for your language.

$ wget https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.sv.300.bin.gz
$ gunzip cc.sv.300.bin.gz
$ mv cc.sv.300.bin /path_TO_YOUR_LANGUAGE_SERVER/bp.sv.300.bin
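Because the language server is started with a fixed `--dim`, it can help to confirm the dimension stored in the .bin file before copying it. A minimal sketch, assuming the fastText binary header layout (int32 magic number 793712314, int32 format version, then the model arguments starting with `dim`, all in the machine's native byte order); `fasttext_bin_dim` is a hypothetical helper, and `od` is used so nothing beyond coreutils is needed.

```shell
# Hypothetical helper: print the vector dimension stored in a fastText .bin.
# Assumed header layout (fastText binary format, native byte order):
#   int32 magic (793712314), int32 version, int32 dim, ...
fasttext_bin_dim() {
  _ft_file=$1
  # Read the first int32; anything other than the fastText magic number means
  # this is not a fastText model (e.g. a word2vec-format .bin).
  _ft_magic=$(od -An -t d4 -N 4 "$_ft_file" | tr -d ' \n')
  if [ "$_ft_magic" != "793712314" ]; then
    echo "not a fastText .bin: $_ft_file" >&2
    return 1
  fi
  # Skip magic + version (8 bytes) and print the dim field.
  od -An -t d4 -j 8 -N 4 "$_ft_file" | tr -d ' \n'
  echo
}

# Usage: the printed value should match the --dim passed to `bp lang`
#   fasttext_bin_dim /path_TO_YOUR_LANGUAGE_SERVER/bp.sv.300.bin
```

Only the first 12 bytes are read, so the check is instant even on the multi-gigabyte crawl vectors.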

Language directory

My language directory looks like this.

$ pwd
/HOME/botpress/language
$ ls
bp.sv.300.bin   bp.sv.bpe.model
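The listing above follows a naming convention: `bp.<lang>.<dim>.bin` for the fastText vectors and `bp.<lang>.bpe.model` for the BPEmb model. A small sketch, assuming that convention, that checks both files are present and non-empty for a language before starting the server; `check_lang_dir` is a hypothetical name.

```shell
# Hypothetical helper: verify that a language directory contains both files
# the offline language server needs for one language, assuming the naming
# convention shown above: bp.<lang>.<dim>.bin and bp.<lang>.bpe.model
check_lang_dir() {
  _dir=$1 _lang=$2 _dim=$3
  _ok=0
  for _f in "bp.$_lang.$_dim.bin" "bp.$_lang.bpe.model"; do
    # -s: file exists and has a size greater than zero
    if [ ! -s "$_dir/$_f" ]; then
      echo "missing or empty: $_dir/$_f" >&2
      _ok=1
    fi
  done
  return "$_ok"
}

# Usage:
#   check_lang_dir /HOME/botpress/language sv 300 && echo "sv files present"
```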

I’m using docker-compose with a language server.

version: '3'

services:
  botpress:
    image: botpress/server
    command: /botpress/bp
    expose:
      - 3000
    environment:
      - DATABASE_URL=postgres://postgres:secretpw@postgres:5435/botpress_db
      - REDIS_URL=redis://redis:6379?password=redisPassword
      - BP_MODULE_NLU_DUCKLINGURL=http://botpress_lang:8000
      - BP_MODULE_NLU_LANGUAGESOURCES=[{"endpoint":"http://botpress_lang:3100"}]
      - CLUSTER_ENABLED=true
      - BP_PRODUCTION=true
      - BPFS_STORAGE=database
    depends_on:
      - botpress_lang
      - postgres
      - redis
    volumes:
      - ./botpress/data:/botpress/data
    ports:
      - "3000:3000"
      
  botpress_lang:
    image: botpress-lang
    command: bash -c "./duckling -p 8000 & ./bp lang --offline --dim 300 --langDir /botpress/lang --port 3100"
    expose:
      - 3100
      - 8000
    volumes:
      - ./botpress/language:/botpress/lang

  postgres:
    image: postgres:11.2-alpine
    expose:
      - 5435
    environment:
      PGPORT: 5435
      POSTGRES_DB: botpress_db
      POSTGRES_PASSWORD: secretpw
      POSTGRES_USER: postgres
    volumes:
      - pgdata:/var/lib/postgresql/data

  redis:
    image: redis:5.0.5-alpine
    expose:
      - 6379
    command: redis-server --requirepass redisPassword
    volumes:
      - redisdata:/data

  nginx:
    image: daehli-botpress-nginx
    ports:
      - 80:80
    command: nginx -g 'daemon off';
    depends_on:
      - botpress
    volumes:
      - ${PWD}/etc/nginx/nginx.conf:/etc/nginx/nginx.conf

volumes:
  pgdata:
  redisdata:

The important part is the botpress_lang service: the --dim value (300) must match the dimension of the Swedish .bin file I selected.
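Since the language server runs with a single `--dim 300`, a quick check that every fastText file name in the mounted directory carries that same dimension can save a restart. A minimal sketch, assuming the `bp.<lang>.<dim>.bin` naming from the directory listing; `check_dim_matches` is a hypothetical name, and it only inspects file names, not file contents.

```shell
# Hypothetical helper: flag fastText files in the language directory whose
# dimension (taken from the bp.<lang>.<dim>.bin file name) differs from the
# value passed to `bp lang --dim`.
check_dim_matches() {
  _dir=$1 _want=$2
  _bad=0
  for _f in "$_dir"/bp.*.bin; do
    [ -e "$_f" ] || continue   # glob matched nothing: no .bin files at all
    _name=${_f##*/}            # e.g. bp.sv.300.bin
    _dim=${_name%.bin}         # e.g. bp.sv.300
    _dim=${_dim##*.}           # e.g. 300
    if [ "$_dim" != "$_want" ]; then
      echo "$_name has dim $_dim, but the server runs with --dim $_want" >&2
      _bad=1
    fi
  done
  return "$_bad"
}

# Usage, matching the compose file above:
#   check_dim_matches ./botpress/language 300
```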

Thanks for the help @Daehli, I was now able to load the language files.

Something I noticed in the “Bottom Panel” under “Language Understanding” was this message:

Failed at running NLU as no model was trained

Is this normal behavior when using a language that isn’t natively supported?

Did you download the model from https://bpemb.h-its.org? If so, could you describe the steps you followed, with some screenshots?

It will be easier for me to debug 🙂

I copied and pasted the commands you supplied (much appreciated :D) and replaced the paths with my own.

The only thing I changed is the docker-compose file: I removed Redis, since I was unable to start Botpress in that setup without a license.

Docker-compose file

version: '3'

services:
  botpress:
    image: botpress/server:latest
    container_name: botpress_server
    command: /botpress/bp
    restart: always
    expose:
      - 3000
    ports:
      - 3000:3000
    environment:
      - DATABASE_URL=postgres://postgres:<PW>@postgres:5435/botpress_db
      - BP_MODULE_NLU_DUCKLINGURL=http://botpress_lang:8000
      - BP_MODULE_NLU_LANGUAGESOURCES=[{"endpoint":"http://botpress_lang:3100"}]
      - EXTERNAL_URL=http://botpress_server:3000
      - BPFS_STORAGE=database
      - BP_PRODUCTION=true
    depends_on:
      - botpress_lang
      - postgres
    volumes:
      - /opt/chatbot/botpress/data:/botpress/data

  botpress_lang:
    image: botpress/server:latest
    container_name: botpress_lang
    command: bash -c "./duckling -p 8000 & ./bp lang --offline --dim 300 --langDir /botpress/lang --port 3100"
    restart: always
    expose:
      - 3100
      - 8000
    volumes:
      - /opt/chatbot/botpress/lang:/botpress/lang

  postgres:
    image: postgres:11.2-alpine
    restart: always
    expose:
      - 5435
    container_name: postgres
    environment:
      PGPORT: 5435
      POSTGRES_DB: botpress_db
      POSTGRES_PASSWORD: <PW>
      POSTGRES_USER: postgres
    volumes:
      - pgdata:/var/lib/postgresql/data

  nginx:
    image: nginx:latest
    restart: always
    container_name: nginx
    ports:
      - 80:80
      - 443:443
    command: nginx -g 'daemon off';
    depends_on:
      - botpress
    volumes:
      - /opt/chatbot/nginx/nginx.conf:/etc/nginx/conf.d/nginx.conf
      - /opt/chatbot/certificates:/opt/certificates

volumes:
    pgdata:

Logs

Are there any relevant logs inside Botpress that would help? I set Dialog to Debug; the results are below.
This looks fine to me, and from a user-experience perspective it isn’t an issue either.

2021-11-23 16:15:52 debug 2021-11-23T15:15:52.253Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] init new context { currentNode: 'entry', currentFlow: 'main.flow.json' } { botId: 'kundtjanst' }
2021-11-23 16:15:52 debug 2021-11-23T15:15:52.257Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] render element "#!builtin_text-mIgXnm" { botId: 'kundtjanst' }
2021-11-23 16:15:52 debug 2021-11-23T15:15:52.259Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] render element "#!builtin_text-tBSBEP" { botId: 'kundtjanst' }
2021-11-23 16:15:52 debug 2021-11-23T15:15:52.262Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] render element "#!builtin_text-dnTXm0" { botId: 'kundtjanst' }
2021-11-23 16:15:52 debug 2021-11-23T15:15:52.264Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] eval transition "always" to [node-intents] { botId: 'kundtjanst' }
2021-11-23 16:15:52 debug 2021-11-23T15:15:52.265Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] transit (main.flow.json) [entry] -> [node-intents] { botId: 'kundtjanst' }
2021-11-23 16:15:52 debug 2021-11-23T15:15:52.267Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] waiting until next event { botId: 'kundtjanst' }
2021-11-23 16:15:56 debug 2021-11-23T15:15:56.367Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] eval transition "event.nlu.intent.name === 'oppettider'" to [oppettider.flow.json] { botId: 'kundtjanst' }
2021-11-23 16:15:56 debug 2021-11-23T15:15:56.368Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] transit (main.flow.json) [node-intents] >> (oppettider.flow.json) [node-7eeb-copy] { botId: 'kundtjanst' }
2021-11-23 16:15:56 debug 2021-11-23T15:15:56.369Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] eval transition "event.nlu.slots.departments.value === 'Kultur och fritid'" to [node-93e4] { botId: 'kundtjanst' }
2021-11-23 16:15:56 debug 2021-11-23T15:15:56.370Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] transit (oppettider.flow.json) [node-7eeb-copy] -> [node-93e4] { botId: 'kundtjanst' }
2021-11-23 16:15:56 debug 2021-11-23T15:15:56.372Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] render element "#!builtin_text-kkHYlQ" { botId: 'kundtjanst' }
2021-11-23 16:15:56 debug 2021-11-23T15:15:56.374Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] eval transition "always" to [more-q.flow.json] { botId: 'kundtjanst' }
2021-11-23 16:15:56 debug 2021-11-23T15:15:56.375Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] transit (oppettider.flow.json) [node-93e4] >> (more-q.flow.json) [tobbe-test] { botId: 'kundtjanst' }
2021-11-23 16:15:56 debug 2021-11-23T15:15:56.376Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] render element "#!builtin_text-vyUwOC" { botId: 'kundtjanst' }
2021-11-23 16:15:56 debug 2021-11-23T15:15:56.381Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] eval transition "always" to [main.flow.json#node-intents] { botId: 'kundtjanst' }
2021-11-23 16:15:56 debug 2021-11-23T15:15:56.382Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] transit (more-q.flow.json) [tobbe-test] >> (main.flow.json) [node-intents] { botId: 'kundtjanst' }
2021-11-23 16:15:56 debug 2021-11-23T15:15:56.383Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] waiting until next event { botId: 'kundtjanst' }
2021-11-23 16:16:09 debug 2021-11-23T15:16:09.075Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] eval transition "event.nlu.intent.name === 'lediga_jobb'" to [lediga_jobb.flow.json] { botId: 'kundtjanst' }
2021-11-23 16:16:09 debug 2021-11-23T15:16:09.076Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] transit (main.flow.json) [node-intents] >> (lediga_jobb.flow.json) [entry] { botId: 'kundtjanst' }
2021-11-23 16:16:09 debug 2021-11-23T15:16:09.077Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] render element "#!builtin_text-CDhXjR" { botId: 'kundtjanst' }
2021-11-23 16:16:09 debug 2021-11-23T15:16:09.079Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] render element "#!builtin_text-dbnDc3" { botId: 'kundtjanst' }
2021-11-23 16:16:09 debug 2021-11-23T15:16:09.082Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] render element "#!builtin_text-O-HPY4" { botId: 'kundtjanst' }
2021-11-23 16:16:09 debug 2021-11-23T15:16:09.087Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] eval transition "always" to [node-f3e2] { botId: 'kundtjanst' }
2021-11-23 16:16:09 debug 2021-11-23T15:16:09.088Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] transit (lediga_jobb.flow.json) [entry] -> [node-f3e2] { botId: 'kundtjanst' }
2021-11-23 16:16:09 debug 2021-11-23T15:16:09.089Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] render element "#!builtin_text-EktEyJ" { botId: 'kundtjanst' }
2021-11-23 16:16:09 debug 2021-11-23T15:16:09.094Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] render element "#!builtin_text-0zpEkF" { botId: 'kundtjanst' }
2021-11-23 16:16:09 debug 2021-11-23T15:16:09.097Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] eval transition "always" to [more-q.flow.json] { botId: 'kundtjanst' }
2021-11-23 16:16:09 debug 2021-11-23T15:16:09.098Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] transit (lediga_jobb.flow.json) [node-f3e2] >> (more-q.flow.json) [tobbe-test] { botId: 'kundtjanst' }
2021-11-23 16:16:09 debug 2021-11-23T15:16:09.100Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] render element "#!builtin_text-vyUwOC" { botId: 'kundtjanst' }
2021-11-23 16:16:09 debug 2021-11-23T15:16:09.105Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] eval transition "always" to [main.flow.json#node-intents] { botId: 'kundtjanst' }
2021-11-23 16:16:09 debug 2021-11-23T15:16:09.106Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] transit (more-q.flow.json) [tobbe-test] >> (main.flow.json) [node-intents] { botId: 'kundtjanst' }
2021-11-23 16:16:09 debug 2021-11-23T15:16:09.106Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] waiting until next event { botId: 'kundtjanst' }
2021-11-23 16:16:16 debug 2021-11-23T15:16:16.190Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] eval transition "always" to [error.flow.json] { botId: 'kundtjanst' }
2021-11-23 16:16:16 debug 2021-11-23T15:16:16.203Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] transit (main.flow.json) [node-intents] >> (error.flow.json) [entry] { botId: 'kundtjanst' }
2021-11-23 16:16:16 debug 2021-11-23T15:16:16.205Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] render element "#!builtin_text-76NpjC" { botId: 'kundtjanst' }
2021-11-23 16:16:16 debug 2021-11-23T15:16:16.208Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] render element "#!builtin_text-X_aTGQ" { botId: 'kundtjanst' }
2021-11-23 16:16:16 debug 2021-11-23T15:16:16.228Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] render element "#!builtin_text-0zpEkF" { botId: 'kundtjanst' }
2021-11-23 16:16:16 debug 2021-11-23T15:16:16.231Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] eval transition "always" to [main.flow.json#node-intents] { botId: 'kundtjanst' }
2021-11-23 16:16:16 debug 2021-11-23T15:16:16.232Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] transit (error.flow.json) [entry] >> (main.flow.json) [node-intents] { botId: 'kundtjanst' }
2021-11-23 16:16:16 debug 2021-11-23T15:16:16.234Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] waiting until next event { botId: 'kundtjanst' }
2021-11-23 16:16:27 debug 2021-11-23T15:16:27.509Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] eval transition "event.nlu.intent.name === 'lediga_jobb'" to [lediga_jobb.flow.json] { botId: 'kundtjanst' }
2021-11-23 16:16:27 debug 2021-11-23T15:16:27.510Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] transit (main.flow.json) [node-intents] >> (lediga_jobb.flow.json) [entry] { botId: 'kundtjanst' }
2021-11-23 16:16:27 debug 2021-11-23T15:16:27.511Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] render element "#!builtin_text-CDhXjR" { botId: 'kundtjanst' }
2021-11-23 16:16:27 debug 2021-11-23T15:16:27.513Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] render element "#!builtin_text-dbnDc3" { botId: 'kundtjanst' }
2021-11-23 16:16:27 debug 2021-11-23T15:16:27.516Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] render element "#!builtin_text-O-HPY4" { botId: 'kundtjanst' }
2021-11-23 16:16:27 debug 2021-11-23T15:16:27.521Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] eval transition "always" to [node-f3e2] { botId: 'kundtjanst' }
2021-11-23 16:16:27 debug 2021-11-23T15:16:27.522Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] transit (lediga_jobb.flow.json) [entry] -> [node-f3e2] { botId: 'kundtjanst' }
2021-11-23 16:16:27 debug 2021-11-23T15:16:27.523Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] render element "#!builtin_text-EktEyJ" { botId: 'kundtjanst' }
2021-11-23 16:16:27 debug 2021-11-23T15:16:27.527Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] render element "#!builtin_text-0zpEkF" { botId: 'kundtjanst' }
2021-11-23 16:16:27 debug 2021-11-23T15:16:27.530Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] eval transition "always" to [more-q.flow.json] { botId: 'kundtjanst' }
2021-11-23 16:16:27 debug 2021-11-23T15:16:27.530Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] transit (lediga_jobb.flow.json) [node-f3e2] >> (more-q.flow.json) [tobbe-test] { botId: 'kundtjanst' }
2021-11-23 16:16:27 debug 2021-11-23T15:16:27.532Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] render element "#!builtin_text-vyUwOC" { botId: 'kundtjanst' }
2021-11-23 16:16:27 debug 2021-11-23T15:16:27.537Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] eval transition "always" to [main.flow.json#node-intents] { botId: 'kundtjanst' }
2021-11-23 16:16:27 debug 2021-11-23T15:16:27.538Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] transit (more-q.flow.json) [tobbe-test] >> (main.flow.json) [node-intents] { botId: 'kundtjanst' }
2021-11-23 16:16:27 debug 2021-11-23T15:16:27.539Z bp:dialog (kundtjanst) [005601fd-d05d-49d4-a287-bcc6b95be4bc] waiting until next event { botId: 'kundtjanst' }
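To read a debug log like this at a glance, it can help to count which intents the transitions out of node-intents matched and how often the conversation fell back to error.flow.json. A rough sketch that only greps the bp:dialog debug lines in the format shown here, saved to a file; `summarize_dialog_log` is a hypothetical name, and the patterns assume this exact message format.

```shell
# Hypothetical helper: summarize a saved bp:dialog debug log.
# Counts intent-based transitions and transits into error.flow.json.
summarize_dialog_log() {
  _log=$1
  # Lines like: eval transition "event.nlu.intent.name === 'oppettider'" ...
  grep -o "event\.nlu\.intent\.name === '[^']*'" "$_log" \
    | sed "s/.*=== '\([^']*\)'/intent \1/" \
    | sort | uniq -c
  # Lines like: transit (main.flow.json) [node-intents] >> (error.flow.json) [entry]
  _errors=$(grep -c '>> (error\.flow\.json)' "$_log")
  echo "fallback to error.flow.json: $_errors"
}

# Usage: summarize_dialog_log dialog-debug.log
```

On the log above, this would report lediga_jobb twice, oppettider once, and one fallback to error.flow.json (the message at 16:16:16).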

Misunderstood

When I deliberately type nonsense to the bot, the message does not show up under Misunderstood, even though the transition to error.flow.json is triggered.
I’ve been using the explicit action to force these messages to show up, so that I can add the sentences to the appropriate intents.

Export

There’s no sensitive data in the bot, so I can provide an exported version if that would help.

Hello joakimbergros,

The problem you’re seeing is not related to using a language that isn’t natively supported: I get the same message with the English model. I need to find out why we receive this message, but don’t worry about this one.