Internationalization & Content in Multiple Languages

One of our big focuses this year is the internationalization of our content, allowing it to be understood natively in multiple languages and appeal to a wider audience. There are many factors to consider when creating internationalized content, from choosing a URL strategy to ensuring that your SEO is protected from duplicate content issues. If you do it right, you’ll be able to provide an excellent service to your user base and still keep all the search engines on board.

Choosing your URL routes for internationalized content

Let us first examine the choices for URL strategies. There is no single right method for setting up locale-specific URLs, but if you don’t have a logical method for housing your content, you will quickly confuse your users. The content needs to remain easily navigable for all parties. If you dive in and start creating content throughout your site without a clear plan, you can end up with a huge mess.

The main URL options that are commonly seen are as follows:

  • Creating a top level subdirectory per language locale (e.g., /cn/my/internationalized/content/)
  • Creating a low level subdirectory for each content branch (e.g., /my/content/cn/)
  • Creating a subdomain per language locale (e.g., //cn.mydomain.com/my/internationalized/content)
  • Passing a variable through to a dynamic content page (e.g., /my/content/page.php?locale=zh_cn). Not recommended.
  • Creating an alternately named file within each folder for the alternate language content (e.g., /my/content/page-zh_cn.php). Not recommended.

While all of the above options will work (as will some others not mentioned here), it is strongly recommended that you avoid the last two. Aside from being aesthetically unpleasant, they give your users no clear, logical pattern to follow when navigating internationalized content, and they lack the SEO benefits provided by the other URL structures.

Using a subdomain per locale will work, but it creates a host of issues that you will need to overcome. It is a good fit if your internationalized site is going to be very different from your main content site, as it allows you to completely separate analytics. However, you can run into serious cross-domain issues, and it will not give your primary domain the same SEO boost that a top level subfolder will.

The best way to enhance your domain authority for the content is to keep your internationalized content on the same domain as the rest of your content (not a subdomain), and use the folder structure to separate the internationalized content. Whether you use a top level or low level folder to do so will depend on the amount of information you are planning to present in alternate languages, and how many languages you will be presenting that information in.

Top Level Locale Distinctions – Best for Site Wide Translations

Using a top level folder to separate your internationalized content allows you to create fully translated paths to house the content, making it an even more tailored experience for your users. For example, on our www.internationalstudentinsurance.com website, we are currently developing two alternate language versions of the site: one in Chinese and one in Spanish. To do so, we have specified locale-specific indicators at the top level and translated the paths to all of the content.

For example, the resources for students on an F1 visa are structured as follows:

  • English (default): /f1student/
  • Spanish: /espanol/f1estudiante/
  • Chinese: /cn/f1qianzheng/

By doing so, we maintain the primary URL structure for the site and let the URL add to the SEO value of the translated content as a standalone resource for speakers of that language. Each language section grows at the pace the translators set, and where translations are not yet available, we link to the English content. If we were using path variables or session values to choose the locale, we could end up with a lot of unexpected page-not-found errors as people attempted to access language content that does not yet exist.

Low Level Locale Distinctions – Best for Small/Isolated Sections of Translated Content in Multiple Languages

If you’re not planning to provide translation of the entire site, or if you have a specific section that you wish to provide many translations for, using the locale at the lowest navigation level works best. For example, on our www.internationalstudent.com website, we are now offering translated versions of the featured school displays. These can be chosen by each school and present their content in the languages they select to appeal to their targeted audience most directly.

For example, the SUNY Brockport school profile provides content in multiple languages, though the main site is not translated. Their options are as follows:

You will note that the primary URL remains the same; only the final subdirectory denotes the locale. The primary page layout remains the same and the main navigation items are not translated, so the language is declared not at the body level but at the article level. For the Arabic translation, the direction (dir) attribute is also set at the article level. Where content cannot be translated due to its dynamic nature (e.g., Twitter and Facebook feeds), it is declared as left-to-right English to ensure that the browser displays it properly. This is a particular challenge because that content is updated frequently.
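
As a rough sketch of the article-level approach described above, the markup might look something like the following. The element structure, the class name, and the sample text are illustrative assumptions, not the actual markup of the school profile page.

```html
<!DOCTYPE html>
<!-- The page shell stays in English because the navigation is not translated -->
<html lang="en-US">
<head>
  <meta charset="utf-8">
  <title>School profile (hypothetical example)</title>
</head>
<body>
  <nav>
    <a href="/">Home</a>
  </nav>

  <!-- Language and direction are declared on the translated article only -->
  <article lang="ar" dir="rtl">
    <h1>مرحبا</h1>

    <!-- Dynamic third-party content that cannot be translated is
         explicitly declared as English, left to right -->
    <div class="social-feed" lang="en" dir="ltr">
      Latest posts from the school's social feed
    </div>
  </article>
</body>
</html>
```

Declaring lang and dir on the inner feed container overrides the values inherited from the article, so the browser lays out the embedded English text left to right.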

Remember, when serving translated content, it’s important to keep all the content up to date, not just your default language content!

No matter which method you choose, it’s important that you generate the proper links between your translated content to notify the search engines that the content is a translation. This is a crucial step in avoiding the issue of duplicate content.

Internationalization and Meta Data – Linking Your Translated Content

The most important thing to remember when avoiding duplicate content flags is to let the search engines know how your pages are related. Alternate hreflang link tags are an essential part of telling the search engine that you have content specifically tailored for a given language and how that content should be prioritized.

Each translated page should include alternate tags pointing to every available language version, as well as the “x-default” option, which tells the search engine which page to display if it cannot match the user’s locale. If we look at the examples above, the meta data provided in all three pages shows the same alternate language information:

<link
    hreflang="x-default"
    rel="alternate"
    href="http://www.internationalstudentinsurance.com/f1student/"/>
<link
    hreflang="en"
    rel="alternate"
    href="http://www.internationalstudentinsurance.com/f1student/"/>
<link
    hreflang="es"
    rel="alternate"
    href="http://www.internationalstudentinsurance.com/espanol/f1estudiante/"/>
<link
    hreflang="zh-CN"
    rel="alternate"
    href="http://www.internationalstudentinsurance.com/cn/f1qianzheng/"/>

The opening html tag sets the language of the page itself. Language tags use a hyphen rather than an underscore: <html lang="en-US">

Direct access to alternate language versions of the current page

It is also very important to give users the ability to navigate between the translations of a page, so that a user who arrives at the “wrong” language can find the one that suits them best. Automatically redirecting users to a different page based on GeoIP data is not recommended. There are many schools of thought on the best way to present the links between pages, but the name of the language, written in that language, appears to be the most consistent method. Some sites use flags rather than language names, but this can lead to confusion and controversy.
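
A language switcher along those lines could be sketched as below. The paths are the ones used in the hreflang example earlier in this post; the nav structure and the aria-label are assumptions made for illustration.

```html
<!-- Each link is labelled with the language's own name -->
<nav aria-label="Language">
  <ul>
    <li><a href="/f1student/" lang="en" hreflang="en">English</a></li>
    <li><a href="/espanol/f1estudiante/" lang="es" hreflang="es">Español</a></li>
    <li><a href="/cn/f1qianzheng/" lang="zh-CN" hreflang="zh-CN">中文</a></li>
  </ul>
</nav>
```

The lang attribute describes the language of the link text itself, while hreflang describes the language of the page the link points to. They happen to match here, but they serve different purposes.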

Additionally, pay special attention to building your sitemap: nest the translated pages within the main page entry, as described in the Google Webmaster documentation.

Adjusting Page Layout to Suit the Language

Left-to-Right Languages

When focusing on internationalization, ensure your character encoding is set to UTF-8!
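
A minimal sketch of declaring the encoding in the page itself is shown below. The page content is made up for illustration, and the file must also actually be saved and served as UTF-8 for the declaration to be accurate.

```html
<!DOCTYPE html>
<html lang="es">
<head>
  <!-- Declare the encoding before the title or any other text -->
  <meta charset="utf-8">
  <title>Ejemplo</title>
</head>
<body>
  <p>¿Tiene preguntas? ¡Estamos aquí para ayudarle!</p>
</body>
</html>
```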

If your content is being translated from one Latin-based or Germanic language to another, there is not a lot you will need to do to the page layout for it to render well. You should review your images and swap them out for ones that speak more directly to your target audience for the page, especially if the images contain text that needs to be translated as well. Doing this is one more step in showing that your content is truly tailored to the audience and not just an auto-translation of a page. If the search engines think that your internationalized page is just an auto-translation, it will not have the same value as a page that has had focused attention from a translator.

Right-to-Left Languages

Presenting content in languages that read right to left, such as Arabic, can present unique challenges. You will need to ensure not only that the language is declared on the page, but also that you’ve added the appropriate dir="rtl" attribute to the highest-level appropriate element. If your navigation is still in English, you won’t want to set it at the page level, but on the top level block element that surrounds your translated content.

You will also need to choose fonts that present the language properly, and this can be very challenging. There are not a lot of options available for non-western languages outside of costly paid web font services. Google Fonts does have a beta version of some Arabic font options, but at the time of writing they are not yet well suited for production use.

In addition to the font family, setting the font sizes properly is very important. Avoid inline font size declarations as much as possible, so that you can alter the display at the stylesheet level when you change the font family. It has been our experience that it is necessary to boost the font size when presenting content in Arabic.

It is also worth noting that side-specific declarations such as padding-left, padding-right, float: left, and float: right are tied to a physical side and will not flip on their own when you change the direction, so look closely at every place you use them.
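
The direction and sizing advice above can be sketched roughly as follows. The class names, the image file name, and the 1.2em boost are assumptions chosen for illustration; the right values depend on your fonts and layout. The padding-inline-start declaration is a CSS logical property, a technique this post does not otherwise cover, shown here as one possible alternative to maintaining explicit overrides.

```html
<style>
  /* Font size lives in the stylesheet, not inline, so it can be
     adjusted together with the font family */
  .translated:lang(ar) {
    font-size: 1.2em; /* assumed boost for Arabic; tune by eye */
  }

  /* Physical properties do not flip with the direction,
     so they need an explicit right-to-left override */
  .translated .thumb {
    float: left;
    margin-right: 1em;
  }
  .translated[dir="rtl"] .thumb {
    float: right;
    margin-right: 0;
    margin-left: 1em;
  }

  /* Logical properties follow the direction automatically:
     this pads the left side in LTR and the right side in RTL */
  .translated .callout {
    padding-inline-start: 1em;
  }
</style>

<!-- Direction is set on the block that wraps the translated content -->
<article class="translated" lang="ar" dir="rtl">
  <img class="thumb" src="example.jpg" alt="">
  <p class="callout">مرحبا</p>
</article>
```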

Logographic Languages

Presenting content in Chinese or other logographic languages can also present unique challenges and opportunities for page layout and structure. The language presents in a more condensed format and is uniquely suited to a more columnar/blocky structure. Sometimes a phrase that can take a full inch of text in English can be presented as a single line of characters in Chinese, so the layout that works well in English or Spanish looks drastically different when presented in Chinese.

For example, compare the Spanish version of this FAQ page to the Chinese version:

[Image: example of internationalization for es locale]
[Image: example of internationalization for zh_cn locale]

As with the right-to-left languages, font choice for logographic languages is very important, as is avoiding inline font-size alterations. The ability to set the font sizes at the stylesheet level will save you a lot of trouble in the long run. Using em values rather than px values for the fonts provides a much smoother transition from western to non-western character displays.
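
To illustrate why em values help, here is a small sketch. The class name and the specific sizes are assumptions, not values taken from our sites. Because the child sizes are relative, one per-language rule on the container rescales everything inside it.

```html
<style>
  /* Relative sizes: each value scales from its parent */
  .faq    { font-size: 1em; }
  .faq h2 { font-size: 1.5em; }
  .faq p  { font-size: 1em; line-height: 1.5; }

  /* A single per-language adjustment; the h2 and p sizes above
     scale with it. :lang(zh) also matches lang="zh-CN". */
  .faq:lang(zh) {
    font-size: 1.125em;
    line-height: 1.7;
  }
</style>

<section class="faq" lang="zh-CN">
  <h2>常见问题</h2>
  <p>问题</p>
</section>
```

With px values, every heading and paragraph rule would need its own per-language override instead.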

When presenting internationalized content, it’s important to present that content in the form the user is accustomed to reading. Keeping your user happy goes a long way to keeping the search engines happy. If you focus on the user, the SEO will follow.

Have a question we didn’t get to? Check out the extensive Google FAQ on internationalization.

 
