Curiosity is one of the most important developer's assets. By being curious and experimenting things, you make sure you're constantly learning and upgrading your skills. Few weeks ago, I had to implement internationalization for an app I was working on. I chose to use Polyglot because of its simplicity. It's also created by Airbnb, a company that has many great open sources libraries.

When I used Polyglot, I wondered what was going on under the hood. So I decided to do a little bit of reverse-engineering by diving into the source code. And here I am, sharing with you, what I've learnt along the way.

You'll find many code snippets below. Most of them come from the source code of Polyglot itself. The goal of this article is to give you more explanations about that code, its purpose and its use.

Recap about the library

But first, let's do a recap about the library. Basically, Polyglot handles the internationalization of your app including interpolation and pluralization. You give it a set of translated phrases and you retrieve the ones you want using a special function called t:

const polyglot = new Polyglot({
  phrases: {
    hello: 'Hello',
    morning: 'Morning',
    auth: {
      login: 'Login',
      register: 'Register',
    },
  },
})

polyglot.t('hello') // Hello
polyglot.t('morning') // Morning
polyglot.t('auth.login') // Login

Notice how Polyglot handles nested objects.

If you want to add more phrases to the Polyglot instance, you can use the extend method:

polyglot.extend({
  bye: 'Bye!',
})

Polyglot supports interpolation. Roughly speaking, it consists in replacing some placeholders in your phrase by a real value:

polyglot.extend({
  welcome: 'Welcome %{name}',
})

polyglot.t('welcome', { name: 'Thomas' }) // Welcome Thomas

If you don't provide one of the interpolation value, Polyglot will return your placeholder as-is. It can be useful if you want to do partial interpolation:

polyglot.extend({
  user: 'My name is %{name}. I love %{hobby}!',
})

polyglot.t('user', { name: 'Thomas' }) // My name is Thomas. I love %{hobby}

You can also provide your own interpolation syntax:

const polyglot = new Polyglot({ interpolation: { prefix: '{{', suffix: '}}' } })

polyglot.extend({
  city: 'You live in {{city}}',
})

polyglot.t('city', { city: 'London' }) // You live in London

Finally, Polyglot handles pluralization, that is to say, express a sentence in the plural form. To make it work properly, Polyglot needs a locale (which is en by default):

polyglot.locale() // 'en'
polyglot.extend({
  thing: 'There is %{smart_count} thing |||| There are %{smart_count} things',
})

polyglot.t('thing', { smart_count: 1 }) // There is 1 thing

Giving a number as a second argument also works:

polyglot.t('thing', 4) // There are 4 things

If you need to set the locale, you have two choices:

  • Call the locale method with its corresponding code
  • Provide the locale during the instanciation
polyglot.locale('fr')
// OR
const polyglot = new Polyglot({ locale: 'fr' })

What's great is that Polyglot supports many locales, dozens of locales!

And then the less-used features:

  • unset to remove a key from Polyglot's phrases
  • clear to remove all the phrases from the Polyglot instance
  • replace to replace the current phrases with others
  • has to check if a translation exists for a given key
polyglot.unset('hello')
polyglot.unset({
  hello: 'Hello',
  auth: {
    register: 'Register',
  },
})

polyglot.replace({
  hello: 'Hey!',
  bye: 'Bye-bye',
})

polyglot.clear()

polyglot.has('hello') // true

You may think that the translation itself is easy to do. In fact, it's just returning a value of an object based on a key. But what about nested key objects? The interpolation feature? Or the pluralization? And handling the pluralization for all the locales? ?

Don't worry, we'll figure it out right away.

Translate a simple phrase

Let's start with the simplest feature: translate a simple phrase. It's as easy as getting a key from an object. Internally, when you create an instance of Polyglot, an empty phrases object is created. You can then extend it by providing some phrases in the options object:

function Polyglot(options) {
  var opts = options || {}
  this.phrases = {}
  this.extend(opts.phrases || {})
  // ...
}

We instanciate Polyglot like this:

const polyglot = new Polyglot() // phrases -> {}
const polyglot = new Polyglot({ phrases: { hello: 'Hello' } }) // extend({hello: 'Hello'}) is called

Polyglot makes use of what we call short-circuit evaluation. This complex term means JavaScript takes advantage of logical operators such as || (OR) to evaluate just what's necessary. Thus, if options is undefined or null, JavaScript will assign {} to opts as the first operand evaluates to false. But if options is set, it won't even look at the rest of the expression and will assign options to opts.

Polyglot uses extend to map a given key to a given phrase in its internal phrases object. We'll come back to it in a few moments.

Other things happen when you instanciate Polyglot including this:

function Polyglot(options) {
  var opts = options || {}
  // ...
  this.currentLocale = opts.locale || 'en'
  // ...
  this.tokenRegex = constructTokenRegex(opts.interpolation)
}

The currentLocale is set to a given locale or en and a mysterious tokenRegex is created. These properties are important for the interpolation and pluralization.

Then, when we'll call t with a given key, Polyglot will search its associated value in the internal phrases object. Once found, it will apply to the phrase some transformations (interpolation and pluralization) and return it:

Polyglot.prototype.t = function(key, options) {
  var phrase, result
  var opts = options == null ? {} : options
  if (typeof this.phrases[key] === 'string') {
    phrase = this.phrases[key]
  }
  // ...
  if (typeof phrase === 'string') {
    result = transformPhrase(phrase, opts, this.currentLocale, this.tokenRegex)
  }
  return result
}

"What an odd way to declare a method", you might say. Why not simply use the class keyword?

Because JavaScript is constantly evolving. At the time this library was written, there were no such thing as classes (as well as const and let), it was only introduced in 2015. So we had to create classes by using constructor functions.

But, functions being functions, it's not efficient to put all methods inside a constructor function. That would mean that every function inside the constructor function would be created at each instanciation!

That's why there is a prototype property inside every function: by putting a method in the prototype property, you share it across all instances of your function. Check out this resource to learn more about prototypes.

By the way, classes in JavaScript are also constructor functions. Go ahead and define a class, you'll see its type is Function. Interesting, isn't it?

Extend

The most attentives of you will think "But what if our phrases object has inner objects? I don't see how t handles the nested objects? It only transforms the phrase if the type of the phrase is a string, right?"

That's a legit question. As said in the recap, Polyglot handles nested phrases objects. Under the hood, It uses the extend method to add new phrases to the instance. This method processes recursively all the keys in the object passed as an argument and put them at the root level of the internal phrases object. Each nested key is concatenated with the one that's one level above it using the dot notation:

var forEach = require('for-each')

// ...

Polyglot.prototype.extend = function(morePhrases, prefix) {
  forEach(
    morePhrases,
    function(phrase, key) {
      var prefixedKey = prefix ? prefix + '.' + key : key
      if (typeof phrase === 'object') {
        this.extend(phrase, prefixedKey)
      } else {
        this.phrases[prefixedKey] = phrase
      }
    },
    this
  )
}

Note: Polyglot uses a package called for-each to iterate over both objects and arrays. It acts as a polyfill so that Polyglot doesn't require added methods like Array.prototype.forEach. If you wonder what is the third this argument passed to forEach, it allows to specify what is this in the callback function (the second argument). In our case, this refers to an instance of Polyglot.

Let's see what happens if you call extend like this:

polyglot.extend({
  hello: 'Hello',
  auth: {
    login: 'Login',
    register: 'Register',
  },
})
  • For the hello key, prefix will be undefined so prefixedKey = hello. The corresponding phrase (Hello) is a string, so we map Hello to hello in the internal phrases.

  • For auth, there is still no prefix, so prefixedKey = auth. However, phrase is an object, so we'll recursively call extend with phrase and prefixedKey: this.extend({ login: 'Login', register: 'Register' }, 'auth').

  • For login, there is a prefix which is auth, so prefixedKey = auth.login. Here phrase is a string so we map Login to auth.login in the internal phrases.

  • It's exactly the same behavior for register.

That way, you make sure you traverse all your object and map every key to a string value in the flattened phrases object. After calling extend, your phrases look like this:

{
  hello: 'Hello',
  auth.login: 'Login',
  auth.register: 'Register'
}

Note that extend actually extends the phrases object and doesn't replace it. The only things that can be replaced are the conflicting keys. For example if you call extend with an object which has a hello key and that key already exists in phrases, the old key will be replaced with the new one.

Interpolation

Let's come back to transformPhrase, shall we? You may remember that we use it in the t function to transform the phrase we get via our key:

if (typeof phrase === 'string') {
  result = transformPhrase(phrase, opts, this.currentLocale, this.tokenRegex)
}

We'll dive into its code to understand how one can build the interpolation feature.

Here, the interpolation feature consists in replacing a generic word by another one in an object. What's great is that String.prototype.replace does most of the work for us. Indeed, according to MDN, The replace method returns a new string with some or all matches of a pattern replaced by a replacement. Then, you just need to define a regular expression to capture what you need to replace and fetch the corresponding word in the options object.

var has = require('has')

// ...

var replace = String.prototype.replace

// ...

var defaultTokenRegex = /%\{(.*?)\}/g

function transformPhrase(phrase, substitutions, locale, tokenRegex) {
  // ...

  var result = phrase
  var interpolationRegex = tokenRegex || defaultTokenRegex

  var options =
    typeof substitutions === 'number'
      ? { smart_count: substitutions }
      : substitutions

  result = replace.call(result, interpolationRegex, function(
    expression,
    argument
  ) {
    if (!has(options, argument) || options[argument] == null) {
      return expression
    }
    return replace.call(options[argument], dollarRegex, dollarBillsYall)
  })

  return result
}

This is how interpolation is done under the hood. Not less, not more. But there are things you may be unfamiliar with like call or String.prototype.replace.

Three essential things are done before calling transformPhrase:

  • We require the has package which is a shortcut for Object.prototype.hasOwnProperty.call.
  • We cache the String.prototype.replace method.
  • We create the defaultRegexToken that will be used to match the generic words in our phrases. Basically, this regex means "match anything that is included inside %{}". Note that this regular expression is lazy (.*?) instead of greedy (.*). In other words, it means this regexp will capture the smallest group possible.

That being said, we can learn how transformPhrase behaves.

It takes four parameters: phrase, substitutions, locale and tokenRegex. We won't need locale for now as it's needed for pluralization. Then, we assign to interpolationRegex the regex that will be used for the replacements. It can be a custom regex (remember the mysterious tokenRegex in the constructor?) or the defaultTokenRegex explained above.

Beforehand, we assign to options the value of substitutions. The typeof check is done for pluralization. You don't have to care about it for now.

Then, we actually replace the phrase using the regular expression. As the replace function is cached, we need to use the call method to specify on which string we want to invoke the replace function, that is to say, result. If using call confuses you, it's the same as:

result = result.replace(interpolationRegex, function(expression, argument) {
  if (!has(options, argument) || options[argument] == null) {
    return expression
  }
  return replace.call(options[argument], dollarRegex, dollarBillsYall)
})

The second argument of replace can either have the form of a string or a function. Here we use the function because we need to retrieve the value associated to the eventual generic word in the options. This function will be run on every match with the following arguments:

  • expression: refers to the match itself, %{name} for example.
  • argument: refers to the captured group ((.*?)), that is to say everything inside the interpolation syntax. By default, it is everything inside %{}.

If the captured string is a property of the options object passed to t and is different than null, then we return its associated value otherwise we return the expression itself. The returned value will be used as a replacement for the match.

Let's apply it on an example to fully understand it:

polyglot.extend({
  user: 'Hi, your name is %{name}. You are a %{job} and you like %{hobby}.',
})

polyglot.t('user', {
  name: 'Thomas',
  hobby: 'Traveling',
})

We assume we haven't provided any custom interpolation syntax to Polyglot. Then, interpolationRegex is equal to /%\{(.*?)\}/g.

We call replace on the phrase 'Hi, your name is %{name}. You are a %{job} and you like %{hobby}.'. We'll have three matches: %{name}, %{job} and %{hobby}. For every match, we run replace's callback function:

  1. expression = %{name}, argument = name. Is name in the options object? Yes, then return the associated value: Thomas.
  2. expression = %{job}, argument = job. Is job in the options object? No, return the expression: %{job}.
  3. expression = %{hobby}, argument = hobby. Is hobby in the options object? Yes, then return the associated value: Traveling.

The result of the translation is: Hi, your name is Thomas. You are a %{job} and you like Traveling. That's not rocket science after all! ?

Custom interpolation syntax

Now that we saw how to implement interpolation, let's see how to customize the interpolation syntax. Indeed, Polyglot allows you to customize the prefix and the suffix so that you can use {{name}} or |name| instead of %{name}.

"Easy", you might think. "Just change the regex!". And you would be right, that's what Polyglot does. It uses a custom regex called tokenRegex that is built when you create an instance of Polyglot:

function Polyglot(options) {
  var opts = options || {}
  // ...
  this.tokenRegex = constructTokenRegex(opts.interpolation)
}

This tokenRegex is then passed to transformPhrase when you call t on a key. It is assigned to interpolationRegex if it holds a value:

function transformPhrase(phrase, substitutions, locale, tokenRegex) {
  // ...
  var interpolationRegex = tokenRegex || defaultTokenRegex
  // ...
}

constructTokenRegex is a fairly simple function. Its purpose is to return a new regex based on the given prefix and suffix:

var delimiter = '||||'

// ...

function escape(token) {
  return token.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

function constructTokenRegex(opts) {
  var prefix = (opts && opts.prefix) || '%{'
  var suffix = (opts && opts.suffix) || '}'

  if (prefix === delimiter || suffix === delimiter) {
    throw new RangeError(
      '"' + delimiter + '" token is reserved for pluralization'
    )
  }

  return new RegExp(escape(prefix) + '(.*?)' + escape(suffix), 'g')
}

There are two things to consider though:

  • It's not possible to choose a prefix or a suffix that is equal to |||| as it's used for pluralization.
  • We must escape the prefix and the suffix. But why is that? Well, you'll probably use special characters such as {, or [ or maybe *. However, these symbols are special regex characters so we need to escape them with a backslash. That's the responsibility of the escape function. It will replace the regex symbol by a backslash followed by the matched symbol (corresponds to $&).

As an example, constructTokenRegex({ prefix: '[[', suffix: ']]' }) returns /\[\[(.*?)\]\]/g (and not /[[(.*?)]]/g ⚠️).

Smart count and plural groups

Now the second part of transformPhrase: pluralization. That one can be tough to build. Indeed, you have to make it possible for Polyglot to choose a phrase among others both based on a number and a locale. But a language can be radically different from another, plural forms included. For example, did you know that there are no plural forms in Chinese but there are six in Arabic? Or that in French, zero is singular while it's plural in English? To make pluralization happen, we need to take account of all these rules.

Note: If you're interested, you can find all plural rules here.

Roughly speaking, here's what Polyglot does:

  1. Reference all possible rules and map them to the corresponding locales.
  2. When you translate a phrase which need to be pluralized (with the smart_count option), split it in multiple phrases based on a delimiter. Thus, you get an array of phrases (more precisely, all the plural forms of the phrase).
  3. Retrieve the rule associated to the locale given to Polyglot. This rule takes a number as a parameter and returns another number indicating which plural form to choose.
  4. Returns the correct phrase using the number returned by the rule (which acts as an array index).

We are going to detail this step by step.

First, here is an extract of all the rules and their corresponding locales:

var russianPluralGroups = function(n) {
  var lastTwo = n % 100
  var end = lastTwo % 10
  if (lastTwo !== 11 && end === 1) {
    return 0
  }
  if (2 <= end && end <= 4 && !(lastTwo >= 12 && lastTwo <= 14)) {
    return 1
  }
  return 2
}

var pluralTypes = {
  arabic: function(n) {
    // http://www.arabeyes.org/Plural_Forms
    if (n < 3) {
      return n
    }
    var lastTwo = n % 100
    if (lastTwo >= 3 && lastTwo <= 10) return 3
    return lastTwo >= 11 ? 4 : 5
  },
  bosnian_serbian: russianPluralGroups,
  chinese: function() {
    return 0
  },
  croatian: russianPluralGroups,
  french: function(n) {
    return n > 1 ? 1 : 0
  },
  german: function(n) {
    return n !== 1 ? 1 : 0
  },
  // ...
}

var pluralTypeToLanguages = {
  arabic: ['ar'],
  bosnian_serbian: ['bs-Latn-BA', 'bs-Cyrl-BA', 'srl-RS', 'sr-RS'],
  chinese: [
    'id',
    'id-ID',
    'ja',
    'ko',
    'ko-KR',
    'lo',
    'ms',
    'th',
    'th-TH',
    'zh',
  ],
  croatian: ['hr', 'hr-HR'],
  german: [
    'fa',
    'da',
    'de',
    'en',
    'es',
    'fi',
    'el',
    'he',
    'hi-IN',
    'hu',
    'hu-HU',
    'it',
    'nl',
    'no',
    'pt',
    'sv',
    'tr',
  ],
  french: ['fr', 'tl', 'pt-br'],
  // ...
}

You can see for example that for a german rule (which includes english), we return the plural form if the number is different than one, otherwise we return the singular form.

Now we can focus on the pluralization part of transformPhrase. Here we will take account of the locale parameter. The currentLocale value of Polyglot is passed to transformPhrase when calling t:

if (typeof phrase === 'string') {
  result = transformPhrase(phrase, opts, this.currentLocale, this.tokenRegex)
}

To make pluralization happen, you need to add a smart_count in your options (substitutions object). Note that you can also pass a number instead of an options object. Polyglot will take account of that shortcut and transform it back to an options object with a smart_count property.

var trim = require('string.prototype.trim')

// ...

var split = String.prototype.split

// ...

var delimiter = '||||'

// ...

function transformPhrase(phrase, substitutions, locale, tokenRegex) {
  // ...

  if (substitutions == null) {
    return phrase
  }

  var result = phrase

  var options =
    typeof substitutions === 'number'
      ? { smart_count: substitutions }
      : substitutions

  if (options.smart_count != null && result) {
    var texts = split.call(result, delimiter)
    result = trim(
      texts[pluralTypeIndex(locale || 'en', options.smart_count)] || texts[0]
    )
  }

  // Interpolation code...

  return result
}

Step two. If we do have a smart_count option, we split the phrase in multiple parts thanks to the delimiter (||||). As we are caching split, we need to invoke the function on result thanks to the call method. For example:

var phrase = 'I have one thing |||| I have many things'
phrases.split('||||') // ['I have one thing ', ' I have many things']

Then, step 3. We need to retrieve the rule associated to our locale. This happens thanks to the pluralTypeIndex function.

Basically, pluralTypeIndex takes a locale and a count and it invokes one of the functions defined in pluralTypes with count as a parameter. It can be arabic(3) for example.

function pluralTypeIndex(locale, count) {
  return pluralTypes[pluralTypeName(locale)](count)
}

Nevertheless, we need to know beforehand to which language rule our locale refers to. That's what pluralTypeName does.

function langToTypeMap(mapping) {
  var ret = {}
  forEach(mapping, function(langs, type) {
    forEach(langs, function(lang) {
      ret[lang] = type
    })
  })
  return ret
}

function pluralTypeName(locale) {
  var langToPluralType = langToTypeMap(pluralTypeToLanguages)
  return (
    langToPluralType[locale] ||
    langToPluralType[split.call(locale, /-/, 1)[0]] ||
    langToPluralType.en
  )
}

After building the map that associates a locale to the correct plural rule, we lookup in this map for the value of the given locale.

You have to deal with one edge case here. Indeed, the locale can sometimes be composed and missing in the corresponding map. In that case, we first try to return the rule associated to the first part of the locale. Otherwise we return the one associated to en (german).

If you struggle to see what the langToPluralType map looks like, here is an extract:

{
  ar: 'arabic',
  'bs-Latn-BA': 'bosnian_serbian',
  'bs-Cyrl-BA': 'bosnian_serbian',
  'srl-RS': 'bosnian_serbian',
  'sr-RS': 'bosnian_serbian',
  id: 'chinese',
  'id-ID': 'chinese',
  // ...
  'th-TH': 'chinese',
  zh: 'chinese',
  hr: 'croatian',
  'hr-HR': 'croatian',
  fa: 'german',
  da: 'german',
  de: 'german',
  en: 'german',
  // ...
}

Let's say we have fa as a locale. Invoking pluralTypeName will lookup in the map above if fa is mapped to a value. It turns out it is, so it will return german.

However, in the case of en-US, it doesn't correspond to anything in the map, so it will split this locale based on -, and will retrieve the first part of it: en. Thus, we have a locale to look for in the map.

Finally, in pluralTypeIndex, we can invoke the correct language rule to the function with the count. The result of that function will be trimed as there may be some whitespaces between the end (or beginning) of the phrase and ||||.

Note: As we are thinking in terms of indexes, it implies that your phrases should be delimited in an ascending order and that you should be exhaustive on the possible plural forms your phrase may take. Otherwise Polyglot may just return the first phrase.

Let's recap pluralization on an example:

const polyglot = new Polyglot()
polyglot.extend({
  thing: 'There is %{smart_count} thing |||| There are %{smart_count} things',
})

polyglot.t('thing', { smart_count: 1 })

We haven't provided a locale to polyglot, so it's en by default.

  1. Is there a smart_count on the options or is it a number? Yes, let's know which phrase to return.
  2. Split the phrase in multiple phrases: ['There is %{smart_count} thing ', ' There are %{smart_count} things']
  3. After building the langToPluralType map, we retrieve the plural rule associated to en, that is to say german.
  4. Let's invoke the german function with the correct count: german(1)
  5. The german rule checks the following: n !== 1 ? 1 : 0. Here, 1 is not different from 1, then return 0.
  6. Select the first phrase (index 0) from the multiple phrases and trim it: There is %{smart_count} thing.
  7. Interpolation: replace smart_count in the phrase: There is 1 thing.

Another example:

const polyglot = new Polyglot({ locale: 'fr' })
polyglot.extend({
  thing: "Il n'y a rien |||| Il y a plein de choses",
})

polyglot.t('thing', 4)

Here the locale is fr.

  1. Is there a smart_count on the options or is it a number? Yes, let's know which phrase to return.
  2. Split the phrase in multiple phrases: ["Il n'y a rien ", " Il y a plein de choses"]
  3. After building the langToPluralType map, we retrieve the plural rule associated to fr, that is to say french.
  4. Let's invoke the french function with the correct count: french(4)
  5. The french rule checks the following: n > 1 ? 1 : 0. Here, 4 is greater than 1, then return 1.
  6. Select the second phrase (index 1) from the multiple phrases and trim it: Il y a plein de choses.

Other features

Are you still with me? Great. The next methods are pretty easy to understand, they act more as utilities instead as a feature on their own.

Unset

unset is quite similar to extend code-wise. It takes a parameter that can either be a string or an object. If it's a string, delete the key from the internal phrases object. If it's an object, iterate over the keys of the object passed and remove them if the associated value if it's not an object. Otherwise, apply the unset function on the associated object value:

Polyglot.prototype.unset = function(morePhrases, prefix) {
  if (typeof morePhrases === 'string') {
    delete this.phrases[morePhrases]
  } else {
    forEach(
      morePhrases,
      function(phrase, key) {
        var prefixedKey = prefix ? prefix + '.' + key : key
        if (typeof phrase === 'object') {
          this.unset(phrase, prefixedKey)
        } else {
          delete this.phrases[prefixedKey]
        }
      },
      this
    )
  }
}

Clear

As you have an internal phrases object that contains all your keys and phrases, clear is fairly simple. Just empty the object:

Polyglot.prototype.clear = function() {
  this.phrases = {}
}

Replace

replace allows you to replace your current set of phrases by another one. So you just need to clear the current phrases and extend it with the other set:

Polyglot.prototype.replace = function(newPhrases) {
  this.clear()
  this.extend(newPhrases)
}

Has

As we already require an has package that checks for a key in an object, we just have to apply the has function on the internal phrases with a given key:

Polyglot.prototype.has = function(key) {
  return has(this.phrases, key)
}

Locale

The locale method has two behaviors:

  • If a new locale is provided, set the internal currentLocale to it.
  • Otherwise return the current locale.
Polyglot.prototype.locale = function(newLocale) {
  if (newLocale) this.currentLocale = newLocale
  return this.currentLocale
}

What I've learnt

We're done for the code! Reverse-engineering a library is a great way to upgrade your skills. By diving into the code source of Polyglot, I had the opportunity to enhance my knowledge of:

  • How to handle the internationalization in an application
  • Prototypes, the prototype chain and closures
  • Regular expressions and their use in String.prototype.replace
  • Using call to apply a function to another object
  • The plural rules specific to each language

I'm convinced that the most effective way to learn is by learning from real-world examples and by practicing. Theory is great to a certain extent. I highly encourage you to do more of reverse-engineering and I hope that this article gave you the will to do so.