In today’s ramble, I discuss the subject of internationalisation in software development, how we currently approach it, and what I think we can do to improve it.
Internationalisation: A Primer
Internationalisation (abbreviated: i18n) describes the process of creating software designed for an international user base, so that software can easily be adapted for various languages and locales without updating the code for each. In an increasingly global Internet, this is an incredibly important practice - making your software suitable for everyone is a huge step, but a very worthwhile one!
Internationalisation requires a developer to think of solutions for many problems, including:
- Translating the software
- Handling multiple text directions, both for display and input
- Formatting measures correctly, such as with dates, times and currencies
This was a topic I needed to approach for my main project, Kord Extensions. Fortunately, this is not a new problem space by any means, and there are plenty of existing solutions!
Getting Started
In this post, I’ll be discussing this topic in relation to programming on the JVM, and specifically using the Kotlin programming language - as this is where the bulk of my experience lies. However, much of my experience is relevant to other languages as well, as standardised approaches to internationalisation have materialised.
Before you can internationalise your project, you’ll need to think about a few things:
- How to store your translations
- What message format to use
- Where to source formatting information and pre-translated strings
- How to actually get your strings translated
- How to implement your translations
Each topic will briefly be covered below.
Translation Storage
The first thing to consider is how to organise, store, and load your translations. There are a few different approaches to this, but most are extensions of the same basic approach:
- Organise translations into bundles of files, with locales represented within a singular file or split into multiple files representing a single locale each
- Refer to translations using keys, which may either be descriptive static names (e.g. commands.about.name) or use the default English translation (e.g. /about)
- Load translations into memory at runtime, either all at once or dynamically based on which keys are needed
There are a few common strategies that implement this approach, along with easily accessible tooling.
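As a rough illustration of that basic approach, here is a minimal sketch in Kotlin. It isn't taken from any particular library: the bundle contents, the commands.help.* keys and the lookup function are all hypothetical, and the "files" are in-memory strings so that the example stays self-contained.

```kotlin
import java.io.StringReader
import java.util.Locale
import java.util.Properties

// Hypothetical in-memory "files", one per locale. A real project would read
// these from disk or from the classpath instead.
val bundleSources: Map<Locale, String> = mapOf(
    Locale.ENGLISH to "commands.help.name=help\ncommands.help.description=Explains how the bot works\n",
    Locale.forLanguageTag("es") to "commands.help.name=ayuda\n"
)

// Load everything up front; loading lazily per locale would work just as well.
val bundles: Map<Locale, Properties> = bundleSources.mapValues { (_, text) ->
    Properties().apply { load(StringReader(text)) }
}

// Try the requested locale, then the fallback, then give up and return the key.
fun lookup(key: String, locale: Locale, fallback: Locale = Locale.ENGLISH): String {
    for (candidate in listOf(locale, fallback)) {
        val value = bundles[candidate]?.getProperty(key)
        if (value != null) return value
    }
    return key
}
```

Returning the key when nothing matches is only one possible design choice, but it does make missing translations easy to spot in the running software.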
GNU gettext
GNU gettext works as follows:
- Translations (referred to as “messages”) are stored in PO files (human-readable source files) and MO files (machine-readable binary files), with separate files for each supported language
- Translation keys (referred to as “message IDs”) contain the default, untranslated string content, and may refer to one or multiple target translations
- Libraries are provided for (and in some cases bundled with) most major programming languages
In my opinion, GNU gettext falls short in multiple ways:
- Working with translations requires the use of multiple command-line tools, which may include xgettext, msgmerge and msgfmt depending on the specific project
- Translations are keyed using the default, untranslated string content, which means you have to update the key (and all usages of it) whenever the default translation changes
- The PO file format isn’t particularly friendly for non-developers, containing a lot of metadata about each translation, including which language-specific string formatting syntax to use
Java Resource Bundles
Java’s standard internationalisation approach works as follows:
- Translations are stored in Java resource bundles, a collection of similarly named .properties files in the UTF-8 encoding, with a separate file for each supported locale
- Translations are keyed using a descriptive key, most often dot-delimited - for example, command.about.name
- Almost everything you need to work with these resource bundles is provided by the Java standard library, but no string formatting syntax is mandated or used by default, and there is no built-in support for plurals
In my opinion, Java resource bundles serve as a decent storage medium for apps running on the JVM, but fall short in several ways:
- As with most things in Java, a lot of boilerplate code is required to make this approach reasonably developer-friendly, or to load translations from multiple classpaths (e.g. when writing plugins)
- With no support for plurals or any mandated string formatting approach, resource bundles require the use of additional tooling, and a lot of application-specific documentation to explain how the implementation works
Message Formatting
If your chosen approach above doesn’t mandate a string formatting approach, you’ll need to decide on a format that best suits your software and development methodology.
All useful formats will implement some form of support for placeholders, which are tokens present in your translation strings that will be replaced with some data from your software. These may take the form of ordinal placeholders (indexed using sequential numbers), named placeholders (indexed using names), or both. If you need to pick, prefer named placeholders, as the names will be useful to your translators.
One of the most common solutions to this problem is the ICU message format, which is part of the Unicode Consortium’s suite of tools. It provides a number of advantages over developer-focused language-specific string formatting approaches, including both named and ordinal placeholders, a conditional syntax and support for plurals.
Personally, I consider ICU message format to be the current gold standard, and I’m keeping an eye on the progress of version 2.
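To show what placeholders and number-dependent text look like in practice, here's a small sketch. To keep it runnable without extra dependencies, it uses the JDK's own java.text.MessageFormat rather than ICU4J, so it only covers ordinal placeholders and the older "choice" syntax. The greet and describeFiles functions and their strings are made up for this example.

```kotlin
import java.text.MessageFormat
import java.util.Locale

// Ordinal placeholders: {0} and {1} are filled in from the argument array, in order.
fun greet(user: String, bot: String): String =
    MessageFormat("Hello, {0}! Welcome to {1}.", Locale.UK)
        .format(arrayOf(user, bot))

// The "choice" sub-format picks a variant based on a number. It only understands
// numeric ranges, so it can't express the plural rules of many languages.
fun describeFiles(count: Int): String =
    MessageFormat(
        "{0,choice,0#No files|1#One file|1<{0,number,integer} files} found.",
        Locale.UK
    ).format(arrayOf(count))
```

In ICU message format, the second string could instead be written with a named placeholder and real plural categories, along the lines of {count, plural, one {# file} other {# files}}, which is a large part of why it's friendlier for translators.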
Sourcing Translations
While there is no true replacement for a team of paid, professional translators, this isn’t really practical for the vast majority of software projects. There are several common approaches, which I’ll explain below.
Machine Translation
Machine translation involves using tools such as DeepL or Google Translate to automatically translate strings from a language you’re familiar with, to one you aren’t.
While this approach can be quick and cheap, I can’t recommend it for these reasons:
- Most translation tools are only designed (and able) to give you a rough translation, lacking contextual information, vernacular usage, and common slang, often resulting in output that looks unprofessional and unlike language actually spoken natively by adults
- Some translation tools provide results that are “too good to be true”, prioritising output that is grammatically correct and easy to read, but isn’t accurate to the meaning of the original string, which often results in severe confusion for your users
Crowdsourcing
Many projects (especially in the open-source world) use crowdsourced translation tools such as Crowdin, Weblate, and POEditor, collating translations provided by their community, and shipping them with their software.
If your project has an active community, this can be an excellent way of getting your strings translated. However, this relies strongly on volunteer effort, and you’ll need to validate and moderate any suggestions to make sure you aren’t the target of joke or otherwise malicious submissions.
Implementing Translations
Once you’ve covered everything else, it’s finally time to write some code and get things implemented. You’ll usually do this by utilising one of the many internationalisation libraries available, and many of them are excellent options.
In the next section, we’ll talk about how some of these libraries are designed, and later on, we’ll have a look at how we can improve things.
Current APIs
The most common approach taken by internationalisation libraries is a combination of high-level tools to load your translations, and low-level APIs for working with them afterwards.
For the most part, translation bundles and keys are simply referred to using strings, with languages either represented using standardised strings or locale objects. This is a simple approach with minimal complexity and cognitive overhead.
Example GNU gettext usage in Kotlin
Kotlin
import gnu.gettext.GettextResource
import java.util.ResourceBundle

// Use the system locale for this example
val resourceBundle = ResourceBundle.getBundle("bundle-name")

fun i18n(key: String) =
    GettextResource.gettext(resourceBundle, key)

val commandName = i18n("/about")
A more complex example using ICU4J might look something like this:
Example resource bundle usage with ICU4J formatting in Kotlin
Kotlin
import com.ibm.icu.text.MessageFormat
import java.util.Locale
import java.util.MissingResourceException
import java.util.ResourceBundle

fun translate(
    key: String,
    locale: Locale,
    placeholders: Array<Any?> = arrayOf()
): String? {
    val resourceBundle = ResourceBundle.getBundle("bundle-name", locale)

    val string = try {
        resourceBundle.getString(key)
    } catch (_: MissingResourceException) {
        // No translation found, so fall back to the key itself
        return key
    }

    // This is the standard "empty translation" string
    if (string == "∅∅∅") {
        return null
    }

    val formatter = MessageFormat(string, locale)

    return formatter.format(placeholders)
}

val commandName = translate("command.about.name", Locale("en", "US"))
This approach is quite workable, and covers much of what you’ll need to handle translations in your software. However, there are still some drawbacks to this approach:
- There’s no way to pass a translation context around, making it difficult to account for many advanced use-cases
- Translation keys are just strings, meaning there’s no way to validate they’re correct at build time and there’s minimal IDE support for things like autocomplete
- A stringly-typed API can make it difficult or impossible to design a nice API around variadic arguments, which will most commonly be strings
Ideally, we can account for all common and uncommon use-cases, and come up with something that should work for everyone.
A Real Example
My primary project, Kord Extensions, was designed with internationalisation in mind, and it’s been this way for years. The approach taken was very similar to the more complex example above, but I’ve recently redesigned this system, and I believe we can do much better!
I’ll be referring to the Discord chat platform below. If you’re not familiar with Discord, please refer to the beginners’ guide for help with terminology.
Kord Extensions is a Discord bot framework, and that problem space makes internationalisation pretty complicated. Servers may set a preferred language, users can configure their language in the Discord client, and some communities may also need to use a language that Discord doesn’t support.
A Basic Example
Let’s say we’re working with English (UK), Spanish (Spain) and Toki Pona. We’d like to create a /help command, which explains to users how our bot works. This command should be named help, ayuda and sona respectively.
In this basic example, the more complex example above should cover our needs. We can tell users to provide a bundle name, set their command name/description to a string representing the translation key, and handle the translation logic internally.
Kotlin
publicSlashCommand {
    bundle = "kordex.strings"
    name = "extensions.help.commandName"
    description = "extensions.help.commandDescription"

    // ...
}
Adding Complexity
In our next example, let’s refer to the Kord Extensions Mappings module. The major functionality isn’t relevant here - what is important is that this module needs to register numerous commands that are very similar in functionality, aside from a few strings presented to the user.
For example, if we support Mojang, Quilt and Yarn mappings, we may want to have an information command for each. We could do the following, with a separate key created for each type:
Kotlin
command.mojang.info.description=Information about Mojang mappings
command.quilt.info.description=Information about Quilt mappings
command.yarn.info.description=Information about Yarn mappings
This doesn’t really make sense, given Mojang, Quilt and Yarn are names and shouldn’t be translated. Instead, we could use one single translation key, suitable for all of these commands:
Kotlin
command.generated.info.description=Information about {mappings} mappings
This is all well and good, but recall our previous example - we ask developers to provide the translation key as the command description. However, in this case, that’s not enough - there’s a placeholder in our translation string! Additionally, we have a lot of these similar commands, and it doesn’t make sense to define them one-by-one.
One solution would be to expand the command registration API, but that makes for a lot of clutter:
Kotlin
fun slashCommand(mappings: String) {
    publicSlashCommand {
        bundle = "kordex.func-mappings"
        name = mappings
        description = "command.generated.description"
        descriptionPlaceholder("mappings" to mappings)

        publicSubCommand {
            bundle = "kordex.func-mappings"
            name = "command.generated.info.name"
            description = "command.generated.info.description"
            descriptionPlaceholder("mappings" to mappings)

            // ...
        }

        // ...
    }
}
This problem would only get worse with every translatable field that’s added, resulting in a ton of clutter and code that isn’t particularly easy to understand. Instead, let’s consider an entirely different approach.
An Object-Oriented Approach
While it is occasionally useful to refer to translation keys and bundles using plain strings, I think we could do better by replacing them with objects. This is the new paradigm I’ve been working on for Kord Extensions (v2.3.0 and later), and I think it makes for a much nicer API.
Generated Types
Firstly, instead of relying on developers to correctly copy or type out translation keys, I decided to generate a structure of objects based on the developer’s translation bundle. This is an immediate improvement, providing concrete types that an IDE can help you with.
Kotlin
public object MappingsTranslations {
    public val bundle: Bundle = Bundle("kordex.func-mappings")

    public object Command {
        public object Generated {
            public val description: Key = Key("command.generated.description")
                .withBundle(MappingsTranslations.bundle)

            public object Info {
                public val description: Key = Key("command.generated.info.description")
                    .withBundle(MappingsTranslations.bundle)

                public val name: Key = Key("command.generated.info.name")
                    .withBundle(MappingsTranslations.bundle)
            }
        }
    }
}

// ...

publicSlashCommand {
    name = MappingsTranslations.Command.Generated.Info.name
    // ...
}
The example above also hints at the next part of the solution…
Bundle and Key Objects
Instead of simply accepting a string in the translation API, I shifted my approach to an API that works with immutable rich types. This makes things a lot simpler, and also makes for a much more powerful API.
Bundle objects are nothing special - a simple data class wrapping the bundle name was plenty.
Key objects, on the other hand, are immutable data classes wrapping a translation key (at minimum) along with other data:
- The corresponding translation bundle
- The locale to use for the translation
- A set of preset placeholders, plus a way to configure whether they appear before or after other placeholders
- A configuration variable that dictates whether translation keys nested in placeholders should be automatically translated
These objects provide many API functions which allow users to make new copies of the Key object while adding/removing data, and they also provide direct access to the translation system.
Kotlin
// Old
val translations: TranslationsProvider = getKoin().get()
val message = translations.translate("messages.about", locale, arrayOf(botName))

// New
val message = Translations.Messages.about
    .withLocale(locale)
    .translate(botName)
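To make the shape of these objects a little more concrete, here's a heavily simplified sketch of how an immutable Key type along these lines could be written. It is not the actual Kord Extensions implementation: the catalogue parameter, the plain-text placeholder replacement and the exact signatures are assumptions made purely for illustration, and the real types carry more data than this.

```kotlin
import java.util.Locale

// Hypothetical, heavily simplified stand-ins. These are NOT the real Kord Extensions classes.
data class Bundle(val name: String)

data class Key(
    val key: String,
    val bundle: Bundle? = null,
    val locale: Locale? = null,
    val placeholders: Map<String, Any?> = emptyMap()
) {
    // Each "with" function returns a modified copy; the original is never changed.
    fun withBundle(bundle: Bundle): Key = copy(bundle = bundle)
    fun withLocale(locale: Locale): Key = copy(locale = locale)
    fun withNamedPlaceholders(vararg pairs: Pair<String, Any?>): Key =
        copy(placeholders = placeholders + pairs)

    // Assumption: translations arrive as a locale -> (key -> string) map, and
    // placeholders are filled by plain text replacement rather than ICU formatting.
    fun translate(catalogue: Map<Locale, Map<String, String>>): String {
        val template = catalogue[locale ?: Locale.ENGLISH]?.get(key) ?: key
        return placeholders.entries.fold(template) { text, (name, value) ->
            text.replace("{$name}", value.toString())
        }
    }
}

// Plain strings have to be converted explicitly.
fun String.toKey(): Key = Key(this)
```

Because every function returns a new copy, a shared Key can be handed around freely and specialised at each call site without affecting anyone else who holds a reference to it.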
Returning to the previous Mappings module example, this approach allows us to solve our problem in a much more sensible manner:
Kotlin
fun slashCommand(mappings: String) {
    publicSlashCommand {
        name = mappings
        description = MappingsTranslations.Command.Generated.description
            .withNamedPlaceholders("mappings" to mappings)

        publicSubCommand {
            name = MappingsTranslations.Command.Generated.Info.name
            description = MappingsTranslations.Command.Generated.Info.description
                .withNamedPlaceholders("mappings" to mappings)

            // ...
        }

        // ...
    }
}
While this is a relatively contrived example, an API like this makes room for many complex use-cases, and is a lot easier to understand at a glance, even when you don’t know how the API itself works in depth. It also reminds developers that they should translate their strings, because they need to explicitly convert plain strings into Key objects:
Kotlin
publicSlashCommand {
    description = "Information about Mojang mappings".toKey()
}
For more in-depth information on this approach, please refer to the Kord Extensions documentation, where I’ve explained things in detail.
The Next Step
While I’m happy with what I’ve created, I’m not so bold as to claim that my approach is perfect, nor assume that I’m the first developer to have come up with it. However, I’m genuinely curious why this approach doesn’t seem very common. I can’t imagine the object allocation overhead is anything to worry about on the JVM, though of course there are many other runtimes and languages that I wasn’t able to account for.
I do plan to extract this API from Kord Extensions before v2.3.0 releases, and make it its own library. Regardless, if you have any feedback on the idea, or you’re reading this from the future and using this in your own projects, please do get in touch - either in the comments, the Kord Extensions Discord server, or my socials!