Creating multi-language web sites
From WhyNotWiki
This page is about a problem that I, and probably many others, have faced: how to create a web site that can display its content in multiple languages.
How to store/organize the content
This is the most difficult of the issues related to multi-language web sites. There are many possible solutions, each with its own drawbacks, so whichever one you choose will involve careful consideration and some compromise.
http://lists.evolt.org/archive/Week-of-Mon-20041018/165389.html.
You have both the page content and all the labels that go around it: navigation, instructions ("Search"), etc. For the labels, you'd usually have a dictionary for each language, so navigation label 1234 is "Search" in English and "Rechercher" in French. You may also need finer-grained language variants, such as EN_GB and EN_US.
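A minimal sketch of per-language label dictionaries, written in Python for illustration. The label IDs, the `en_US` override table and the fallback order (regional variant, then base language, then a site default) are assumptions for the example, not something the quoted post prescribes. A regional variant only needs to list the labels that differ from its base language.

```python
from __future__ import annotations

# Hypothetical label dictionaries, keyed by stable label ID.
# Regional variants list only the labels that differ from the base language.
LABELS: dict[str, dict[int, str]] = {
    "en": {1234: "Search", 1235: "Colour"},
    "en_US": {1235: "Color"},
    "fr": {1234: "Rechercher", 1235: "Couleur"},
}
DEFAULT_LOCALE = "en"


def lookup_chain(locale: str) -> list[str]:
    """Locales to try in order, e.g. 'en_US' -> ['en_US', 'en']."""
    chain = [locale]
    base = locale.split("_", 1)[0]
    if base != locale:
        chain.append(base)
    if DEFAULT_LOCALE not in chain:
        chain.append(DEFAULT_LOCALE)
    return chain


def label(label_id: int, locale: str) -> str:
    """Return the label text, falling back from variant to base language to default."""
    for loc in lookup_chain(locale):
        text = LABELS.get(loc, {}).get(label_id)
        if text is not None:
            return text
    raise KeyError(f"label {label_id} is missing in every fallback locale")


print(label(1234, "fr"))     # Rechercher
print(label(1235, "en_US"))  # Color
print(label(1235, "en_GB"))  # Colour (no en_GB table, so it falls back to en)
```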
Don't forget images too!
Take a look at how (for example) ZenCart does it.
For the page content, you'll need to store manually translated content in each of your languages for every content asset: each page with all its data items.
As has previously been mentioned, machine translation really, really doesn't work if you want to effectively communicate. Not only are there the issues of machine translators not understanding what you mean, but also the problem of nuance and idiom. Even between versions of English, there are significant risks of misunderstanding - I forget the phrase, but I've worked on a project where the exact same words meant *diametrically opposite* things in US and UK English.
...
Also, if you're working in different languages, you'll need to play the character-set game. Very few languages display properly in ASCII. Even among single-byte, left-to-right languages, you've got a whole range of fun accents and non-ASCII characters. Google "ISO 8859" for more...
To be honest, many CMSs (particularly OSS ones used in multiple countries) will handle this for you out of the box. Otherwise, it's a Hard Thing to code from scratch.
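To make the character-set point concrete, this sketch checks whether a few sample strings can be encoded in ASCII, two parts of ISO 8859, and UTF-8. The sample strings are arbitrary examples. French fits in ISO 8859-1 but not in 8859-2, Polish is the reverse, and only UTF-8 covers all three.

```python
# Arbitrary sample strings containing accented letters.
SAMPLES = {
    "French": "Rechercher à côté",
    "German": "Größe",
    "Polish": "Łódź",
}
ENCODINGS = ("ascii", "iso-8859-1", "iso-8859-2", "utf-8")


def encodable(text: str, encoding: str) -> bool:
    """True if every character of `text` can be represented in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


for name, text in SAMPLES.items():
    supported = [enc for enc in ENCODINGS if encodable(text, enc)]
    print(f"{name:7} {text!r}: {', '.join(supported)}")
```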
http://www.stylusinc.com/Common/whitepapers/WhitePapers/Multi-Language%20Support.pdf.
Dynamic Content Generation
Although this is a very complicated way to organize a site for multiple languages, it could be an option if you have only two languages to support, or even three on a fast server. [...]
In this method, all the text of the site is stored in a database. Every page carries a variable (a session variable or a query-string parameter) that identifies the language in which the site is to be displayed. Based on that, the content is pulled from the tables for the chosen language and displayed.
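A minimal sketch of the database lookup, using Python's built-in `sqlite3` module. For brevity it uses a single table keyed by page and language rather than separate tables per language; the table name `page_content`, the `lang` query-string parameter and the fallback to English are all hypothetical.

```python
import sqlite3
from urllib.parse import parse_qs, urlparse

DEFAULT_LANG = "en"

# In-memory database standing in for the site's real one (hypothetical schema).
content_db = sqlite3.connect(":memory:")
content_db.execute(
    "CREATE TABLE page_content (page TEXT, lang TEXT, body TEXT, PRIMARY KEY (page, lang))"
)
content_db.executemany(
    "INSERT INTO page_content VALUES (?, ?, ?)",
    [("home", "en", "Welcome"), ("home", "de", "Willkommen"), ("home", "fr", "Bienvenue")],
)


def lang_from_url(url: str) -> str:
    """Read the language from a ?lang=xx query-string parameter."""
    values = parse_qs(urlparse(url).query).get("lang")
    return values[0] if values else DEFAULT_LANG


def get_content(page: str, lang: str) -> str:
    """Fetch a page's text in `lang`, falling back to the default language."""
    for candidate in (lang, DEFAULT_LANG):
        row = content_db.execute(
            "SELECT body FROM page_content WHERE page = ? AND lang = ?",
            (page, candidate),
        ).fetchone()
        if row is not None:
            return row[0]
    raise KeyError(f"no content for page {page!r}")


print(get_content("home", lang_from_url("/index?lang=de")))  # Willkommen
```

Every page view runs at least one query per text block, which is where the performance concerns listed below come from.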
You might now be wondering: what about graphics? You have two choices. If your site uses very few graphics, you could store them in the database itself as BLOB fields. Alternatively, create a new table with the following structure:
Name | English | German | French
With this structure, you give each image a name and store in the database only the relative paths to the language-specific versions of that image. When rendering the page, get the path for the chosen language and load the image from the file system.
Messages can be stored in the database in a similar format, except that instead of "Name" you use a unique ID for each message. The message can then be referenced from whichever pages of the site need it. Alternatively, you could declare an array containing all the messages and include it in every page. Take care to keep message numbers constant once assigned: if the messages get reshuffled, it could be a tedious task to redo all the messages on the site.
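A sketch of the image table described above, again with `sqlite3`. SQL placeholders can bind values but not column names, so the language code is mapped to a column through a fixed whitelist before it goes into the query. The table name `images` and the paths are made up for the example.

```python
import sqlite3

# Whitelist mapping language codes to the image table's columns.
# Column names cannot be bound as SQL parameters, so never build them from raw input.
LANG_COLUMNS = {"en": "english", "de": "german", "fr": "french"}
DEFAULT_COLUMN = "english"

image_db = sqlite3.connect(":memory:")
image_db.execute(
    "CREATE TABLE images (name TEXT PRIMARY KEY, english TEXT, german TEXT, french TEXT)"
)
image_db.execute(
    "INSERT INTO images VALUES (?, ?, ?, ?)",
    ("logo", "images/en/logo.gif", "images/de/logo.gif", "images/fr/logo.gif"),
)


def image_path(name: str, lang: str) -> str:
    """Return the relative path of the named image for `lang`."""
    column = LANG_COLUMNS.get(lang, DEFAULT_COLUMN)  # unknown codes use English
    row = image_db.execute(
        f"SELECT {column} FROM images WHERE name = ?", (name,)
    ).fetchone()
    if row is None or row[0] is None:
        raise KeyError(f"no {column} image named {name!r}")
    return row[0]


print(image_path("logo", "de"))  # images/de/logo.gif
```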
This method has many disadvantages. A few significant ones are:
Performance can degrade if the site has a large amount of content.
Editing the site requires you either to edit the content directly in the tables or to provide an admin panel for editing the content of each page on the site!
The load on the database can become too high, which further lowers performance.
...
Selective Replication
Of the three methods we discuss in this article, this is the most efficient. Although it is difficult to set up the first time, it requires less maintenance than the other two methods. This method is used by many major websites, including Microsoft, for multi-language support.
In Selective Replication, the main site has no content or images whatsoever. The images sit in separate folders named EN, GR, ES, etc., one per language. All the files in each of these directories have the same names: the English logo file is logo.gif, and so is the logo file for every other language.
The content (messages, JavaScript alerts, etc.) can be stored in one of two ways. One is to store each message as a separate text file; the other is to put the messages in an array that is included in every ASP file, and to look up the message to display in that array. Each language has its own array, which resides in that language's directory, so which array gets included depends on the language the user has chosen.
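A sketch of how the per-language directories might be resolved, assuming a hypothetical site root and a `messages.json` file in each directory standing in for the per-language message array (the article itself uses ASP include files). Because every directory holds files with the same names, only the directory part of the path changes; unknown codes fall back to `EN`, so a bad value cannot point outside the known directories.

```python
import json
from pathlib import Path

# Directory names as in the article; each holds files with identical names.
LANGUAGE_DIRS = {"EN", "GR", "ES"}
DEFAULT_DIR = "EN"


def language_dir(site_root: Path, lang: str) -> Path:
    """Map a language code to its directory, falling back to the default."""
    code = lang.upper()
    if code not in LANGUAGE_DIRS:
        code = DEFAULT_DIR
    return site_root / code


def asset_path(site_root: Path, lang: str, filename: str) -> Path:
    """Same filename in every language directory, e.g. EN/logo.gif, ES/logo.gif."""
    return language_dir(site_root, lang) / filename


def load_messages(site_root: Path, lang: str) -> dict:
    """Load the per-language message table (hypothetical messages.json)."""
    with open(asset_path(site_root, lang, "messages.json"), encoding="utf-8") as f:
        return json.load(f)


print(asset_path(Path("site"), "es", "logo.gif"))  # site/ES/logo.gif on POSIX systems
```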
...
Conclusion
...
It would be a good idea to make re-usability priority one when designing the site. The more code, graphics and content you can make reusable across all the sites, the less of a headache maintenance and bug-fixing will be.
...
http://www.sitepoint.com/forums/showthread.php?t=374225#post2693547.
Normally you would refer to all on-screen text (error messages, menu items, form prompts, etc.) using variables, and then include the variables from an external file. Which language file is included depends on the language selected.
...
IMHO, the easiest and most correct way to translate the content of an already live site is:
1. Set up gettext (or the PEAR::i18n library) for the current language.
2. Convert all existing output to the gettext (i18n) format.
After this step you have a working site in one language, but ready for translation.
3. Extract all gettext strings into a file for the new language.
4. Translate all terms in this file.
After this step you will have a working site in two languages.
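The gettext steps above might look like this with Python's standard `gettext` module. The domain name `messages` and the `locale` directory are assumptions. With `fallback=True`, a missing translation catalogue yields the original strings, which is exactly the one-language state after step 2. For steps 3 and 4, a tool such as GNU `xgettext` can extract the `_()` strings into a `.po` file, and `msgfmt` compiles the translated file to `locale/es/LC_MESSAGES/messages.mo`.

```python
import gettext

# Assumed layout: locale/<lang>/LC_MESSAGES/messages.mo, compiled with msgfmt.
LOCALE_DIR = "locale"
DOMAIN = "messages"


def translator(lang: str):
    """Return a gettext function for `lang`.

    With fallback=True, a missing catalogue gives NullTranslations, which
    returns every message unchanged (the one-language site after step 2).
    """
    translation = gettext.translation(
        DOMAIN, localedir=LOCALE_DIR, languages=[lang], fallback=True
    )
    return translation.gettext


def render_search_button(_) -> str:
    # Step 2: every user-visible string goes through _() instead of being hard-coded.
    return f"<button>{_('Search')}</button>"


_ = translator("es")
print(render_search_button(_))  # Spanish once an es catalogue exists, "Search" until then
```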
---
Icheb gave you the right solution for navigation and templates. If you're not using a template system but you have an "ideal" CSS design, you can change only the stylesheets.
After all that, you will have only one problem: how do you internationalize the information in the database? There are several ways. The easiest and fastest is to create a translated clone of each existing table. For example, if you already have a table named articles, then after adding Spanish you will have two tables: articles_en and articles_es. All you need to change in the code is to append the current language code to the table names in your SQL queries.
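A sketch of the table-per-language approach with `sqlite3`. Table names cannot be passed as query parameters, so appending the language code is only safe when the code is checked against a fixed list first; the `articles` schema here is hypothetical.

```python
import sqlite3

SUPPORTED_LANGS = ("en", "es")


def articles_table(lang: str) -> str:
    """Build a table name like articles_es from a whitelisted language code.

    Table names cannot be bound as SQL parameters, so the code is checked
    against a fixed list instead of being pasted into the query unchecked.
    """
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported language: {lang!r}")
    return f"articles_{lang}"


db = sqlite3.connect(":memory:")
for code in SUPPORTED_LANGS:
    # One clone of the hypothetical articles table per language.
    db.execute(f"CREATE TABLE {articles_table(code)} (id INTEGER PRIMARY KEY, title TEXT)")
db.execute("INSERT INTO articles_en (id, title) VALUES (?, ?)", (1, "Hello"))
db.execute("INSERT INTO articles_es (id, title) VALUES (?, ?)", (1, "Hola"))


def article_title(article_id: int, lang: str) -> str:
    row = db.execute(
        f"SELECT title FROM {articles_table(lang)} WHERE id = ?", (article_id,)
    ).fetchone()
    if row is None:
        raise KeyError(article_id)
    return row[0]


print(article_title(1, "es"))  # Hola
```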
How to let the user select the language
http://lists.evolt.org/archive/Week-of-Mon-20041018/165389.html.
Next you need to work out which information to present to the user, and how. Browsers are pretty good at telling servers what the user's preferred language is. Right now, I'm in French-speaking Switzerland, working in a client environment. My client-provided PC has its browser preferences set to request French by default. My laptop (on which I'm typing) is set to UK English. When I visit my own website, the interface (i.e. all the labels, help text and so on) is in French on one machine and in English on the other, automagically.
If I had multilingual content too, that would also localise.
You might want to think about the costs & benefits of allowing users to set a preference without mucking about in the browser. That could be a simple cookie thing, or if you have user registration, then an explicit user preference.
But there's a fun problem specific to the interface, particularly with tightly defined, non-liquid layouts designed around specific lengths of navigation text: the same information takes up a different amount of space in different languages. German in particular tends to be significantly longer than the English equivalent.
HTTP Accept-Language header
This relies on your users having their user agent (browser) set to send the appropriate Accept-Language header with each request to your web site. I'm guessing it's a pretty safe assumption that they'll have their preferred language configured there properly (my web browser, for example, already had it set to en-us, and I never had to set it explicitly).
The Debian page "Debian web site in different languages" explains how to change the preferred language in various browsers.
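A sketch of reading the header on the server side. It follows the `Accept-Language` syntax from the HTTP specification (comma-separated language ranges with optional `q` weights), but the matching rule is a deliberately simple one: try the full tag, then its primary subtag, and otherwise return a default. The supported-language list is hypothetical.

```python
from __future__ import annotations


def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Split an Accept-Language value into (tag, q) pairs, best first.

    Ranges with q=0 mean "not acceptable" and are dropped. sort() is
    stable, so ranges with equal q keep their order from the header.
    """
    ranges = []
    for item in header.split(","):
        tag, *params = [part.strip() for part in item.split(";")]
        if not tag:
            continue
        q = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:
            ranges.append((tag.lower(), q))
    ranges.sort(key=lambda r: r[1], reverse=True)
    return ranges


def best_language(header: str, supported: list[str], default: str) -> str:
    """Pick the first supported language: exact tag, then primary subtag."""
    for tag, _ in parse_accept_language(header):
        if tag == "*":
            return default
        if tag in supported:
            return tag
        primary = tag.split("-", 1)[0]
        if primary in supported:
            return primary
    return default


header = "fr-CH, fr;q=0.9, en;q=0.8, de;q=0.7, *;q=0.5"
print(parse_accept_language(header))
print(best_language(header, ["en", "fr", "de"], "en"))  # fr
```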
http://www.w3.org/International/questions/qa-accept-lang-locales.
The HTTP Accept-Language header was originally only intended to specify the user's language. However, since many applications need to know the locale of the user, common practice has used Accept-Language to determine this information. It is not a good idea to use the HTTP Accept-Language header alone to determine the locale of the user. If you use Accept-Language exclusively, you may handcuff the user into a set of choices not to his liking.
For a first contact, using the Accept-Language value to infer regional settings may be a good starting point, but be sure to let users change the language as needed and specify their cultural settings more precisely if necessary. Store the results in a database or a cookie for later visits.
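A sketch of that precedence: an explicit account preference wins, then a cookie saved on an earlier visit, then a guess from Accept-Language, then a default. The cookie name `lang`, the one-year lifetime and the supported list are assumptions; the header parsing is a compact version that respects `q` weights and keeps header order for ties.

```python
from http.cookies import SimpleCookie
from typing import Optional

SUPPORTED = ("en", "fr", "de")
DEFAULT = "en"
COOKIE_NAME = "lang"            # hypothetical cookie name
ONE_YEAR = 60 * 60 * 24 * 365   # assumed cookie lifetime, in seconds


def from_accept_language(header: str) -> Optional[str]:
    """Highest-q supported primary language in the header, or None."""
    candidates = []
    for position, item in enumerate(header.split(",")):
        tag, _, params = item.partition(";")
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        primary = tag.strip().lower().split("-", 1)[0]
        if q > 0 and primary in SUPPORTED:
            candidates.append((-q, position, primary))  # min() = highest q, earliest
    return min(candidates)[2] if candidates else None


def choose_language(cookie_header: str, accept_language: str,
                    user_pref: Optional[str] = None) -> str:
    # 1. An explicit preference stored with the user's account wins.
    if user_pref in SUPPORTED:
        return user_pref
    # 2. Otherwise, a choice remembered in a cookie from an earlier visit.
    cookie = SimpleCookie(cookie_header or "")
    if COOKIE_NAME in cookie and cookie[COOKIE_NAME].value in SUPPORTED:
        return cookie[COOKIE_NAME].value
    # 3. First contact: infer from Accept-Language, else use the default.
    return from_accept_language(accept_language) or DEFAULT


def language_cookie(lang: str) -> str:
    """Set-Cookie header value that remembers the chosen language."""
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = lang
    cookie[COOKIE_NAME]["max-age"] = ONE_YEAR
    cookie[COOKIE_NAME]["path"] = "/"
    return cookie[COOKIE_NAME].OutputString()
```

When the user picks a language explicitly, send `language_cookie(choice)` in a `Set-Cookie` header so the next visit skips the Accept-Language guess.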
Web-based selection
This is probably the most common way I've seen it done. You can show a bunch of flags and let the user click on the flag corresponding to her language, or provide a similar user interface for selection.
Once the user selects a language, they are taken to the version of the site in that language. (The site could either store the language name in a session variable and change the content dynamically while keeping the same URL, or redirect the user to a different URL, possibly even a static copy of the site for the desired language...)
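For the redirect option, one common pattern (assumed here, not taken from the source) is to put the language code in the first path segment, so each flag links to the same page under a different prefix. This sketch rewrites a path to another language's prefix; it ignores query strings and fragments for brevity, and the supported list is hypothetical.

```python
from __future__ import annotations

SUPPORTED = ("en", "de", "fr")


def switch_language_url(path: str, lang: str) -> str:
    """Rewrite /<old-lang>/rest to /<lang>/rest, adding a prefix if none exists."""
    if lang not in SUPPORTED:
        raise ValueError(f"unsupported language: {lang!r}")
    segments = path.lstrip("/").split("/")
    if segments[0] in SUPPORTED:
        segments = segments[1:]  # drop the current language prefix
    return "/" + "/".join([lang] + segments)


def language_links(path: str) -> dict[str, str]:
    """One link per language, e.g. for a row of flags on the current page."""
    return {lang: switch_language_url(path, lang) for lang in SUPPORTED}


print(language_links("/en/products/list"))
```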
