Custom languages

Is it possible to use this BPE?
https://nlp.h-its.org/bpemb/

I was not able to do it.

Hi Popo,

I don't know about those specific ones, but you can easily use the fastText ones.

You will need to run your own language server by following this guide.

Just rename the .bin file to the expected format.
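Here is a minimal Python sketch of that rename step. It assumes you downloaded one of fastText's pre-trained `.bin` files (the Common Crawl ones are named like `cc.el.300.bin`) and that the target name follows the `<domain>.<lang>.<dim>.bin` pattern (e.g. `bp.en.300.bin`) explained further down in this thread. The function name `install_fasttext_bin` and the paths are hypothetical; this is not a Botpress tool.

```python
import shutil
from pathlib import Path


def install_fasttext_bin(src, embeddings_dir, lang: str, dim: int = 300,
                         domain: str = "bp") -> Path:
    """Move a fastText .bin file into embeddings_dir as <domain>.<lang>.<dim>.bin.

    Hypothetical helper: the naming pattern comes from this thread, and `dim`
    must match the real vector size of the file (300 for fastText's
    pre-trained vectors).
    """
    src = Path(src)
    if src.suffix != ".bin":
        raise ValueError(f"expected a fastText .bin file, got {src.name}")
    target_dir = Path(embeddings_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    dest = target_dir / f"{domain}.{lang}.{dim}.bin"
    # shutil.move renames in place when possible and falls back to
    # copy-then-delete across filesystems.
    shutil.move(str(src), str(dest))
    return dest
```

For example, `install_fasttext_bin(Path("cc.el.300.bin"), Path("/path/to/embeddings"), "el")` would move the file to `/path/to/embeddings/bp.el.300.bin`. It moves rather than copies to avoid duplicating a multi-gigabyte file; swap in `shutil.copy2` if you want to keep the original.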

Hope this helps, and happy bot building!
JB

Hello,

Does that mean we can use other fastText languages not currently supported natively by Botpress, e.g. Greek?
In that case, since the guide you are referring to mentions two files necessary per language (a .model file and a .bin file for the actual data), what can one use as the .model file?

I searched the available documentation but unfortunately couldn’t find any definitive reference to custom language integration.

Thank you very much in advance!

Hi Gilgamesh,

They are not natively supported. However, all fastText languages work well in Botpress; I've seen multiple users implement them.

As for the technical aspects of making it work, I'm not technical myself. Maybe @EFF can be of better help.

Stay Safe,
JB

Hi @Gilgames13, you can indeed add support for other languages to your language server.

First, find your Botpress embeddings directory. It should be displayed when you start your language server.

Here's what I have locally:
[Screenshot: embeddings directory shown at language server startup]

Then, as you pointed out, you will need to add some files there (both .model and .bin are required).

The .bin files are fastText models and the .model files are byte pair encoding (hence BPE) models, so you'll need one for Greek as well.

You'll notice that the files are named with a specific pattern, and that pattern is required. Let's take English, for instance:

bp.en.300.bin ==> fastText file
bp : domain (in this case nothing specific); change it if you trained your embeddings on a specific source and want to use that domain
en : language code
300 : length of the word vectors (number of dimensions)
.bin : extension

bp.en.bpe.model ==> byte pair encoding
Nothing specific here; just make sure the domain and language code match those of the fastText model.
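To make the convention concrete, here is a small Python sketch that splits a fastText file name into its parts and derives the BPE model name that has to sit next to it. The regular expression encodes only the pattern described in this post; `parse_fasttext_name`, `bpe_model_name` and `EmbeddingName` are hypothetical names, not part of Botpress.

```python
import re
from typing import NamedTuple

# <domain>.<lang>.<dim>.bin, e.g. bp.en.300.bin (pattern from this thread)
FASTTEXT_RE = re.compile(r"^(?P<domain>[^.]+)\.(?P<lang>[^.]+)\.(?P<dim>\d+)\.bin$")


class EmbeddingName(NamedTuple):
    domain: str
    lang: str
    dim: int


def parse_fasttext_name(filename: str) -> EmbeddingName:
    """Split a fastText file name into domain, language code and dimensions."""
    m = FASTTEXT_RE.match(filename)
    if m is None:
        raise ValueError(f"{filename!r} does not follow <domain>.<lang>.<dim>.bin")
    return EmbeddingName(m["domain"], m["lang"], int(m["dim"]))


def bpe_model_name(name: EmbeddingName) -> str:
    """The BPE model must share the domain and language code of the .bin file."""
    return f"{name.domain}.{name.lang}.bpe.model"
```

For example, `parse_fasttext_name("bp.en.300.bin")` gives `EmbeddingName(domain='bp', lang='en', dim=300)`, and `bpe_model_name` of that result is `bp.en.bpe.model`. Note that the dimension is not part of the .model name, which is why only the domain and language code have to match.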

If you place those 2 files and restart your language server, you should be good to go.
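Before restarting, it may help to confirm that every language has both files. Here is a minimal Python sketch, assuming the embeddings directory contains files named exactly as described above (`<domain>.<lang>.<dim>.bin` and `<domain>.<lang>.bpe.model`); `check_embeddings_dir` is a hypothetical helper, not a Botpress command.

```python
import re
from pathlib import Path

BIN_RE = re.compile(r"^([^.]+)\.([^.]+)\.\d+\.bin$")      # fastText vectors
MODEL_RE = re.compile(r"^([^.]+)\.([^.]+)\.bpe\.model$")  # BPE model


def check_embeddings_dir(embeddings_dir):
    """Return (complete, missing).

    complete: sorted (domain, lang) pairs that have both a .bin and a .bpe.model.
    missing:  {(domain, lang): "bpe.model" or "bin"} for pairs with only one file.
    """
    bins, models = set(), set()
    for path in Path(embeddings_dir).iterdir():
        if not path.is_file():
            continue
        if m := BIN_RE.match(path.name):
            bins.add(m.groups())
        elif m := MODEL_RE.match(path.name):
            models.add(m.groups())
    complete = sorted(bins & models)
    missing = {key: "bpe.model" for key in bins - models}
    missing.update({key: "bin" for key in models - bins})
    return complete, missing
```

A non-empty `missing` dict points at a language that lacks one of its two required files; for instance, with only `bp.el.300.bin` present for Greek you would get `{('bp', 'el'): 'bpe.model'}`.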

Let me know how it plays out

Hello EFF and thanks for replying back! :slight_smile:

Well it works… kind-of!
The language appears to be recognised, but Botpress cannot detect it correctly in actual usage. Debugging always shows "Detected language: n/a" regardless of the input, and no actual event (e.g. a Q&A) is ever triggered :confused:

Do you know why that might happen?

Once again thank you very much for your answer!

Replying to my own question, for anyone interested: I was probably so eager to try out the new configuration that I did not wait for the language server to fully initialise and load all embeddings before starting Botpress. For some reason, Botpress failed to identify the language in all of my chats when it was started too soon.

When I restarted both processes and waited for the language server to fully initialise ALL languages before starting Botpress, the detection worked just fine. :slight_smile:
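One way to avoid that race is to script the startup order: poll until the language server reports every language you need, and only then launch Botpress. Below is a minimal Python sketch of the polling part. `get_loaded_languages` in the usage note is a hypothetical function you would write yourself; how the language server exposes its loaded languages (logs or an HTTP endpoint) depends on its version and is not covered in this thread.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 600.0,
               interval: float = 5.0) -> bool:
    """Call check() every `interval` seconds.

    Returns True as soon as check() returns True, or False once `timeout`
    seconds have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if check():
                return True
        except OSError:
            # e.g. connection refused while the server is still starting;
            # urllib's URLError is a subclass of OSError.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Usage would look like `if wait_until(lambda: {"en", "el"} <= get_loaded_languages()):` followed by whatever command you use to start Botpress. The generous default timeout reflects that loading several multi-gigabyte embedding files can take a while.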

Thanks for the insight, hopefully this will help fellow developers from around the world!

Hi, did you actually add the Greek language to your bot?