Developing a locale file for OpenOffice.
Javier SOLA - www.khmeros.info - Last edited 05/04/2005
The written culture of a country is not only defined by its language. Many other conventions are applied to written texts.
As an example, in the US dates are written with the name or number of the month before the day of the month, the month is capitalized and the day of the month is an ordinal (May 2nd, 1960), while in Spain the day is written first, the month is not capitalized and the word “de” has to be placed between the day and the month and between the month and the year (2 de mayo de 1960). In other countries the word “day” has to precede the day number, and in others the year is written first. Some others use the Buddhist or another calendar, changing dates altogether... and these are only a few examples.
The same applies to numbers (the period is used as a decimal separator in some countries and the comma in others), currency formats (number of decimals, currency identifier placed before or after, etc.), measurement units and some other pieces of data.
Sorting order, word order, alphabetic order, collation order, or whatever you prefer to call it (all equivalent terms), is also script and culture dependent. Cultures that use the same script sometimes have different rules. In English the sorting order starts with “a b c d e f…” while in Spanish it starts with “a b c ch d e f…”. Accents and diacritics are also classified differently in different languages.
All this data is usually referred to in the computer world as LOCALE data. Locale data refers either to the general use of a language (independently of the country or region in which it is used) or to the specific use of a language in a given country or region (the conventions for the Spanish language in Spain are different from the ones for the Spanish language as used in Chile).
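To make the point about collation concrete, here is a minimal sketch of a sort key that treats “ch” (and “ll”) as single letters, following the traditional Spanish order quoted above. The alphabet list and the function name spanish_key are assumptions made for this illustration; OpenOffice does not sort this way internally.

```python
# Traditional Spanish alphabet, with "ch" and "ll" counted as single letters.
# Illustrative only: this list is an assumption, not data taken from OpenOffice.
ALPHABET = ["a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l",
            "ll", "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u", "v", "w",
            "x", "y", "z"]
RANK = {letter: position for position, letter in enumerate(ALPHABET)}


def spanish_key(word):
    """Turn a word into a list of letter ranks, reading "ch" and "ll" as one letter."""
    word = word.lower()
    key = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if len(pair) == 2 and pair in RANK:
            key.append(RANK[pair])
            i += 2
        else:
            # Characters outside the alphabet sort after every known letter.
            key.append(RANK.get(word[i], len(ALPHABET)))
            i += 1
    return key


words = ["dama", "chico", "cuna", "casa"]
print(sorted(words))                   # plain code-point order
print(sorted(words, key=spanish_key))  # "chico" now comes after "cuna"
```

With the plain sort, “chico” lands between “casa” and “cuna”; with the key above it moves after every other word starting with “c”.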
Programs that are localized to many languages tend to place all the data related to a language or to a country (region) in a file called a LOCALE for that culture.
OpenOffice requires that you place all the cultural data for your language/region in a file that has a format specific to OpenOffice. This file is an XML file in plain text (UTF-8 if your language requires it). It can be edited with any plain text editor.
By now, you should know your locale name (in this case languageCode.xml or languageCode_regionCode.xml). If you don't, you can look up the codes in these lists: language codes (ISO 639-2) and country codes (ISO 3166-1). The locale name in OpenOffice always uses both the language code and the country code (language code in small letters and country code in capital letters), separated by an underscore (not a hyphen), and with the .xml extension. Some examples of names are: km_KH.xml, es_CL.xml, en_US.xml, etc. Please look into the directory that contains all the locale files presently included in OpenOffice:
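The naming rule can be sketched in a couple of lines. The function name locale_file_name is made up for this illustration; it only applies the convention described above (lower-case language code, underscore, upper-case country code, .xml extension).

```python
def locale_file_name(language_code, country_code):
    """Build a locale file name such as km_KH.xml from the two codes."""
    return f"{language_code.lower()}_{country_code.upper()}.xml"


print(locale_file_name("km", "KH"))  # km_KH.xml
print(locale_file_name("ES", "cl"))  # es_CL.xml
```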
http://l10n.openoffice.org/source/browse/l10n/i18npool/source/localedata/data/
If there is a file for your culture already, you might want to check it to see if any corrections are needed. If they are, you should file an issue with OpenOffice related to mistakes in the file.
If there is no file… then follow the instructions below on how to create and submit a locale.
To try to make it easy for you, we have created a template file that originates in the present en_US.xml file. It includes most of the information that will be needed; other information is inherited from the en_US.xml (English in the United States) file. The information that we have not included in our template file (that we have included as inherited) is not important for first level localization work. If you localize the template.xml file to your own culture you will have everything that you need.
We will now go through this file point by point to see what needs to be changed:
<?xml version="1.0" encoding="UTF-8"?>
<LC_INFO> <Language> <LangID>en</LangID> <DefaultName>English</DefaultName> </Language> <Country> <CountryID>US</CountryID> <DefaultName>United States</DefaultName> </Country> </LC_INFO>
<LC_CTYPE unoid="generic"> <Separators>
<ThousandSeparator>,</ThousandSeparator> <DecimalSeparator>.</DecimalSeparator> <ListSeparator>;</ListSeparator>
NNNNMMMM DD, YYYY
Because OpenOffice inserts the <LongDateDayOfWeekSeparator> (here ", ") automatically after the day of the week, this format will be rendered in this template locale as something like:
Sunday, March 12, 2023
<DateSeparator>/</DateSeparator> <TimeSeparator>:</TimeSeparator> <Time100SecSeparator>.</Time100SecSeparator> <LongDateDayOfWeekSeparator>, </LongDateDayOfWeekSeparator> <LongDateDaySeparator>, </LongDateDaySeparator> <LongDateMonthSeparator> </LongDateMonthSeparator> <LongDateYearSeparator> </LongDateYearSeparator> </Separators>
<Markers> <QuotationStart>‘</QuotationStart> <QuotationEnd>’</QuotationEnd> <DoubleQuotationStart>“</DoubleQuotationStart> <DoubleQuotationEnd>”</DoubleQuotationEnd> </Markers>
<TimeAM>AM</TimeAM> <TimePM>PM</TimePM>
<MeasurementSystem>metric</MeasurementSystem> </LC_CTYPE>
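As a rough illustration of what <ThousandSeparator> and <DecimalSeparator> control, the sketch below formats the same number with two different pairs of separators. The function format_number is hypothetical and is not how OpenOffice formats numbers internally; it only shows the visible effect of the two values.

```python
def format_number(value, thousand_separator, decimal_separator, places=2):
    """Format a number with the given separators (illustrative helper only)."""
    text = f"{value:,.{places}f}"  # Python's default: "," for thousands, "." for decimals
    # Swap through a temporary marker so the two replacements cannot collide.
    text = text.replace(",", "\x00").replace(".", decimal_separator)
    return text.replace("\x00", thousand_separator)


print(format_number(1234567.891, ",", "."))  # 1,234,567.89  (template values)
print(format_number(1234567.891, ".", ","))  # 1.234.567,89  (comma as decimal separator)
```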
http://www.microsoft.com/globaldev/reference/lcid-all.mspx
If your language does not appear in this list, say so on the OpenOffice Localization list and a number will be assigned to you.
<LC_FORMAT replaceFrom="[CURRENCY]" replaceTo="[$$-409]">
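The replaceTo value can be put together mechanically. The sketch below assumes that the number after the hyphen is the hexadecimal form of the number taken from the list above (409 for US English, which is 1033 in decimal) and that the character after the first "$" is the currency symbol; the function name currency_replacement is made up for this example.

```python
def currency_replacement(symbol, lcid):
    """Build a replaceTo value such as [$$-409] from a currency symbol and a locale number."""
    return f"[${symbol}-{lcid:X}]"


print(currency_replacement("$", 0x409))  # [$$-409]
print(currency_replacement("$", 1033))   # same value, given in decimal
```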
Dates come next. OpenOffice uses a large number of date formats, each with a different amount of information and a different layout. In the example the data is structured in the usual US format, with the month before the day of the month and then the year (I wonder who ever came up with this order). You will most probably have to change the order, but it is a good idea to keep the separators and the amount of information in each format. The formats are built around this table of “placeholder” letters:
• Era: G
• Year: Y
• Month: M (when within a date)
• Day: D
• DayOfWeek: N
• DayOfWeek: A ? (probably in old specifications, not used now)
• Quarter: Q
• Hour: H
• Minutes: M (when used within a time structure)
• Seconds: S
• 1/100 of sec.: 00
The number of times a placeholder letter is repeated indicates the number of characters to be used, but the number of characters does not always correspond exactly to the number of repetitions. For example: ‘D’ means day of the month, with one or two characters as needed (2 for day 2, 12 for day 12), while ‘DD’ means that two digits must always be used (day 5 must be 05). M is a one or two digit month, MM a two digit month, MMM a three letter month (short month name) and MMMM a long month name (long and short month names for your language are defined further down in the locale).
Of the following block you only have to localize what is in the <FormatCode> lines; don’t touch the other lines. Do not change the format for entries numbered 32 and 33: they correspond to data in a specific ISO format (year first). Also, pay special attention to using the same number of letters for each piece of data in dates 21 and 47 (change only the order of the elements if needed).
Spaces inside the dates are significant. If you include a space, it will be included in the printed date. In some cases you will see that the day of the week is attached to the name of the month (no spaces between). This is because here OpenOffice automatically includes between them the <LongDateDayOfWeekSeparator> that you defined above (comma + space for US English).
Inside the date format, you can include strings with text before, between or after the placeholder letters, such as in D "de" MMMM "de" YYYY for Spanish. This is because dates in Spanish should be printed with these words in the middle: 2 de mayo de 1960. Note that there are spaces outside the quotes, which are significant. If they had been inside the quotes, they would also be taken into account. What you should never do is put spaces both inside and outside, because they would then be duplicated in the final date.
Inside some of the date formats, you will see “AM/PM”. Do not translate these. They are placeholders for the words for AM and PM that you have defined in <TimeAM> and <TimePM>.
The placeholder M is used in two different situations. When used within a date, it means "month", but when used within a time structure, it means "minute". Note that it might be used with both meanings within the same format.
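The rules above can be tried out with a small sketch. It is a simplified reading of the placeholder letters, not OpenOffice's own formatter: it handles only D, M, Y, N and quoted text, always treats M as a month, and takes abbreviations to be the first three letters of the English names (a real locale defines them separately). The names render, SPANISH_MONTHS and DAY_OF_WEEK_SEPARATOR are made up for this example.

```python
import re
from datetime import date

# English names as in the template locale; illustrative data for this sketch.
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
SPANISH_MONTHS = ["enero", "febrero", "marzo", "abril", "mayo", "junio", "julio",
                  "agosto", "septiembre", "octubre", "noviembre", "diciembre"]
DAY_OF_WEEK_SEPARATOR = ", "  # value of <LongDateDayOfWeekSeparator> in the template

# A token is a quoted string, a run of one placeholder letter, or any other character.
TOKEN = re.compile(r'"[^"]*"|Y+|M+|D+|N+|.')


def render(format_code, day, months=MONTHS):
    """Render a date-only format code (simplified: M always means month)."""
    parts = []
    for token in TOKEN.findall(format_code):
        size = len(token)
        if size >= 2 and token[0] == '"' and token[-1] == '"':
            parts.append(token[1:-1])  # literal text, quotes removed
        elif token[0] == "Y":
            parts.append(f"{day.year:04d}" if size > 2 else f"{day.year % 100:02d}")
        elif token[0] == "M":
            name = months[day.month - 1]
            parts.append([str(day.month), f"{day.month:02d}", name[:3]][size - 1]
                         if size <= 3 else name)
        elif token[0] == "D":
            parts.append(str(day.day) if size == 1 else f"{day.day:02d}")
        elif token[0] == "N":
            name = DAYS[day.weekday()]
            if size >= 4:
                parts.append(name + DAY_OF_WEEK_SEPARATOR)  # separator added automatically
            else:
                parts.append(name if size == 3 else name[:3])
        else:
            parts.append(token)  # spaces and punctuation are kept as they are
    return "".join(parts)


print(render("M/D/YY", date(1960, 5, 2)))                                 # 5/2/60
print(render('D "de" MMMM "de" YYYY', date(1960, 5, 2), SPANISH_MONTHS))  # 2 de mayo de 1960
```

Note how the spaces outside the quotes end up in the output once each, which is the behaviour the text above describes.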
<FormatElement msgid="DateFormatskey1" default="true" type="short" usage="DATE" formatindex="18"> <FormatCode>M/D/YY</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="DateFormatskey2" default="false" type="medium" usage="DATE" formatindex="28"> <FormatCode>NN DD/MMM YY</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="DateFormatskey3" default="false" type="medium" usage="DATE" formatindex="34"> <FormatCode>MM/YY</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="DateFormatskey4" default="false" type="medium" usage="DATE" formatindex="35"> <FormatCode>MMM DD</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="DateFormatskey5" default="false" type="medium" usage="DATE" formatindex="36"> <FormatCode>MMMM</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="DateFormatskey6" default="false" type="medium" usage="DATE" formatindex="37"> <FormatCode>QQ YY</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="DateFormatskey7" default="false" type="medium" usage="DATE" formatindex="21"> <FormatCode>MM/DD/YYYY</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="DateFormatskey8" default="true" type="medium" usage="DATE" formatindex="20"> <FormatCode>MM/DD/YY</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="DateFormatskey9" default="true" type="long" usage="DATE" formatindex="19"> <FormatCode>NNNNMMMM DD, YYYY</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="DateFormatskey10" default="false" type="long" usage="DATE" formatindex="22"> <FormatCode>MMM D, YY</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="DateFormatskey11" default="false" type="long" usage="DATE" formatindex="23"> <FormatCode>MMM D, YYYY</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="DateFormatskey12" default="false" type="long" usage="DATE" formatindex="25"> <FormatCode>MMMM D, YYYY</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="DateFormatskey13" default="false" type="long" usage="DATE" formatindex="27"> <FormatCode>NN, MMM D, YY</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="DateFormatskey14" default="false" type="long" usage="DATE" formatindex="29"> <FormatCode>NN, MMMM D, YYYY</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="DateFormatskey15" default="false" type="long" usage="DATE" formatindex="30"> <FormatCode>NNNNMMMM D, YYYY</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="DateFormatskey16" default="false" type="long" usage="DATE" formatindex="24"> <FormatCode>D. MMM. YYYY</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="DateFormatskey17" default="false" type="long" usage="DATE" formatindex="26"> <FormatCode>D. MMMM YYYY</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="DateFormatskey18" default="false" type="short" usage="DATE" formatindex="31"> <FormatCode>MM-DD</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="DateFormatskey19" default="false" type="medium" usage="DATE" formatindex="32"> <FormatCode>YY-MM-DD</FormatCode> <DefaultName>ISO 8601</DefaultName> </FormatElement>
<FormatElement msgid="DateFormatskey20" default="false" type="medium" usage="DATE" formatindex="33"> <FormatCode>YYYY-MM-DD</FormatCode> <DefaultName>ISO 8601</DefaultName> </FormatElement>
<FormatElement msgid="DateFormatskey21" default="false" type="medium" usage="DATE" formatindex="38"> <FormatCode>WW</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="TimeFormatskey1" default="false" type="short" usage="TIME" formatindex="39"> <FormatCode>HH:MM</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="TimeFormatskey2" default="false" type="medium" usage="TIME" formatindex="40"> <FormatCode>HH:MM:SS</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="TimeFormatskey3" default="true" type="short" usage="TIME" formatindex="41"> <FormatCode>HH:MM AM/PM</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="TimeFormatskey4" default="true" type="medium" usage="TIME" formatindex="42"> <FormatCode>HH:MM:SS AM/PM</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="TimeFormatskey5" default="false" type="medium" usage="TIME" formatindex="43"> <FormatCode>[HH]:MM:SS</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="TimeFormatskey6" default="false" type="short" usage="TIME" formatindex="44"> <FormatCode>MM:SS.00</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="TimeFormatskey7" default="false" type="medium" usage="TIME" formatindex="45"> <FormatCode>[HH]:MM:SS.00</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="DateTimeFormatskey1" default="true" type="medium" usage="DATE_TIME" formatindex="46"> <FormatCode>MM/DD/YY HH:MM AM/PM</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="DateTimeFormatskey2" default="false" type="medium" usage="DATE_TIME" formatindex="47"> <FormatCode>MM/DD/YYYY HH:MM:SS</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="FixedFormatskey1" default="true" type="medium" usage="FIXED_NUMBER" formatindex="0"> <FormatCode>General</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="FixedFormatskey2" default="true" type="short" usage="FIXED_NUMBER" formatindex="1"> <FormatCode>0</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="FixedFormatskey3" default="false" type="medium" usage="FIXED_NUMBER" formatindex="2"> <FormatCode>0.00</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="FixedFormatskey4" default="false" type="short" usage="FIXED_NUMBER" formatindex="3"> <FormatCode>#,##0</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="FixedFormatskey5" default="false" type="medium" usage="FIXED_NUMBER" formatindex="4"> <FormatCode>#,##0.00</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="FixedFormatskey6" default="false" type="medium" usage="FIXED_NUMBER" formatindex="5"> <FormatCode>#,###.00</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="CurrencyFormatskey1" default="true" type="short" usage="CURRENCY" formatindex="12"> <FormatCode>[CURRENCY]#,##0;-[CURRENCY]#,##0</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="CurrencyFormatskey2" default="false" type="medium" usage="CURRENCY" formatindex="13"> <FormatCode>[CURRENCY]#,##0.00;-[CURRENCY]#,##0.00</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="CurrencyFormatskey3" default="false" type="medium" usage="CURRENCY" formatindex="14"> <FormatCode>[CURRENCY]#,##0;[RED]-[CURRENCY]#,##0</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="CurrencyFormatskey4" default="true" type="medium" usage="CURRENCY" formatindex="15"> <FormatCode>[CURRENCY]#,##0.00;[RED]-[CURRENCY]#,##0.00</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="CurrencyFormatskey5" default="false" type="medium" usage="CURRENCY" formatindex="16"> <FormatCode>#,##0.00 CCC</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="CurrencyFormatskey6" default="false" type="medium" usage="CURRENCY" formatindex="17"> <FormatCode>[CURRENCY]#,##0.--;[RED]-[CURRENCY]#,##0.--</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="PercentFormatskey1" default="true" type="short" usage="PERCENT_NUMBER" formatindex="8"> <FormatCode>0%</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="PercentFormatskey2" default="true" type="long" usage="PERCENT_NUMBER" formatindex="9"> <FormatCode>0.00%</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="ScientificFormatskey1" default="true" type="medium" usage="SCIENTIFIC_NUMBER" formatindex="6"> <FormatCode>0.00E+000</FormatCode> <DefaultName></DefaultName> </FormatElement>
<FormatElement msgid="ScientificFormatskey2" default="false" type="medium" usage="SCIENTIFIC_NUMBER" formatindex="7"> <FormatCode>0.00E+00</FormatCode> <DefaultName></DefaultName> </FormatElement>
</LC_FORMAT>
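Since entries 32 and 33 must keep their ISO formats, it can help to check them after editing. The following is a minimal sketch using Python's standard XML parser on an <LC_FORMAT> fragment; the function name changed_iso_formats and the sample fragment are made up for this illustration and are not part of any OpenOffice tooling.

```python
import xml.etree.ElementTree as ET

# Format codes that the text above says must stay untouched, keyed by formatindex.
ISO_CODES = {"32": "YY-MM-DD", "33": "YYYY-MM-DD"}

# Shortened sample of an <LC_FORMAT> section, for illustration only.
SAMPLE = """
<LC_FORMAT replaceFrom="[CURRENCY]" replaceTo="[$$-409]">
  <FormatElement msgid="DateFormatskey19" default="false" type="medium" usage="DATE" formatindex="32">
    <FormatCode>YY-MM-DD</FormatCode> <DefaultName>ISO 8601</DefaultName>
  </FormatElement>
  <FormatElement msgid="DateFormatskey20" default="false" type="medium" usage="DATE" formatindex="33">
    <FormatCode>YYYY-MM-DD</FormatCode> <DefaultName>ISO 8601</DefaultName>
  </FormatElement>
</LC_FORMAT>
"""


def changed_iso_formats(lc_format_xml):
    """Return (formatindex, code) pairs for ISO entries whose code was modified."""
    root = ET.fromstring(lc_format_xml)
    problems = []
    for element in root.iter("FormatElement"):
        index = element.get("formatindex")
        code = element.findtext("FormatCode")
        if index in ISO_CODES and code != ISO_CODES[index]:
            problems.append((index, code))
    return problems


print(changed_iso_formats(SAMPLE))  # [] means both ISO entries are intact
```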
<LC_COLLATION> <Collator unoid="alphanumeric" default="true"/> <CollationOptions> <TransliterationModules>IGNORE_CASE</TransliterationModules> </CollationOptions> </LC_COLLATION>
<LC_SEARCH> <SearchOptions> <TransliterationModules>IGNORE_CASE</TransliterationModules> </SearchOptions> </LC_SEARCH>
<LC_INDEX> <IndexKey unoid="alphanumeric" default="true" phonetic="false">A-Z</IndexKey>
and click on view. This will show you a file that has the latest version of this list… but unfortunately, the entries are not numbered. If you find the script for your language there, you have to start counting from the beginning (BasicLatin is number 0) and work out the correct number for your <UnicodeScript>. If your script is in Unicode, but you still do not find it listed here, you should write to the L10N@openoffice.apache.org list mentioning this and asking what should be done.
<UnicodeScript>0</UnicodeScript> <UnicodeScript>1</UnicodeScript>
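Counting by hand is error-prone, so the numbering can be done with a few lines of code. The sketch below assumes you have copied the script names, one per line and in their original order, into a string; the short listing shown is only an illustrative excerpt, not the real file, and script_numbers is a made-up helper name.

```python
# Illustrative excerpt only: paste the real list, in order, in place of these names.
LISTING = """
BasicLatin
Latin1Supplement
LatinExtendedA
LatinExtendedB
"""


def script_numbers(listing):
    """Map each script name to its position in the list, counting from 0."""
    names = [line.strip() for line in listing.splitlines() if line.strip()]
    return {name: number for number, name in enumerate(names)}


numbers = script_numbers(LISTING)
print(numbers["BasicLatin"])        # 0
print(numbers["Latin1Supplement"])  # 1
```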
<FollowPageWord>p.</FollowPageWord> <FollowPageWord>pp.</FollowPageWord> </LC_INDEX>
Use the right capitalization. In English the names of the months are written with a capital first letter; in other languages they are written entirely in small letters.
<LC_CALENDAR> <Calendar unoid="gregorian" default="true"> <DaysOfWeek> <Day> <DayID>sun</DayID> <DefaultAbbrvName>Sun</DefaultAbbrvName> <DefaultFullName>Sunday</DefaultFullName> </Day> <Day> <DayID>mon</DayID> <DefaultAbbrvName>Mon</DefaultAbbrvName> <DefaultFullName>Monday</DefaultFullName> </Day> <Day> <DayID>tue</DayID> <DefaultAbbrvName>Tue</DefaultAbbrvName> <DefaultFullName>Tuesday</DefaultFullName> </Day> <Day> <DayID>wed</DayID> <DefaultAbbrvName>Wed</DefaultAbbrvName> <DefaultFullName>Wednesday</DefaultFullName> </Day> <Day> <DayID>thu</DayID> <DefaultAbbrvName>Thu</DefaultAbbrvName> <DefaultFullName>Thursday</DefaultFullName> </Day> <Day> <DayID>fri</DayID> <DefaultAbbrvName>Fri</DefaultAbbrvName> <DefaultFullName>Friday</DefaultFullName> </Day> <Day> <DayID>sat</DayID> <DefaultAbbrvName>Sat</DefaultAbbrvName> <DefaultFullName>Saturday</DefaultFullName> </Day> </DaysOfWeek> <MonthsOfYear> <Month> <MonthID>jan</MonthID> <DefaultAbbrvName>Jan</DefaultAbbrvName> <DefaultFullName>January</DefaultFullName> </Month> <Month> <MonthID>feb</MonthID> <DefaultAbbrvName>Feb</DefaultAbbrvName> <DefaultFullName>February</DefaultFullName> </Month> <Month> <MonthID>mar</MonthID> <DefaultAbbrvName>Mar</DefaultAbbrvName> <DefaultFullName>March</DefaultFullName> </Month> <Month> <MonthID>apr</MonthID> <DefaultAbbrvName>Apr</DefaultAbbrvName> <DefaultFullName>April</DefaultFullName> </Month> <Month> <MonthID>may</MonthID> <DefaultAbbrvName>May</DefaultAbbrvName> <DefaultFullName>May</DefaultFullName> </Month> <Month> <MonthID>jun</MonthID> <DefaultAbbrvName>Jun</DefaultAbbrvName> <DefaultFullName>June</DefaultFullName> </Month> <Month> <MonthID>jul</MonthID> <DefaultAbbrvName>Jul</DefaultAbbrvName> <DefaultFullName>July</DefaultFullName> </Month> <Month> <MonthID>aug</MonthID> <DefaultAbbrvName>Aug</DefaultAbbrvName> <DefaultFullName>August</DefaultFullName> </Month> <Month> <MonthID>sep</MonthID> <DefaultAbbrvName>Sep</DefaultAbbrvName> 
<DefaultFullName>September</DefaultFullName> </Month> <Month> <MonthID>oct</MonthID> <DefaultAbbrvName>Oct</DefaultAbbrvName> <DefaultFullName>October</DefaultFullName> </Month> <Month> <MonthID>nov</MonthID> <DefaultAbbrvName>Nov</DefaultAbbrvName> <DefaultFullName>November</DefaultFullName> </Month> <Month> <MonthID>dec</MonthID> <DefaultAbbrvName>Dec</DefaultAbbrvName> <DefaultFullName>December</DefaultFullName> </Month> </MonthsOfYear>
<Eras> <Era> <EraID>bc</EraID> <DefaultAbbrvName>BC</DefaultAbbrvName> <DefaultFullName>Before Christ</DefaultFullName> </Era> <Era> <EraID>ad</EraID> <DefaultAbbrvName>AD</DefaultAbbrvName> <DefaultFullName>Anno Domini</DefaultFullName> </Era> </Eras>
<StartDayOfWeek><DayID>sun</DayID></StartDayOfWeek>
<MinimalDaysInFirstWeek>1</MinimalDaysInFirstWeek> </Calendar> </LC_CALENDAR>
http://nsdsa.phdnswc.navy.mil/mspecs/docs/styleman2000/chapter_txt-17.html#17t6
The currency symbol should be in your own script.
Currency codes (<BankSymbol>), <CurrencyName> and <DecimalPlaces> come from the ISO 4217 list, which can be found here. If your currency is a new one and is not in the list, you should try to find the data by yourself in your country, because if you go to the maintainer of the standard (BSI Global), they will make you PAY for the data.
<LC_CURRENCY> <Currency default="true" usedInCompatibleFormatCodes="true"> <CurrencyID>dollar</CurrencyID> <CurrencySymbol>$</CurrencySymbol> <BankSymbol>USD</BankSymbol> <CurrencyName>US Dollar</CurrencyName> <DecimalPlaces>2</DecimalPlaces> </Currency> </LC_CURRENCY>
<LC_TRANSLITERATION> <Transliteration unoid="LOWERCASE_UPPERCASE"/> <Transliteration unoid="UPPERCASE_LOWERCASE"/> <Transliteration unoid="IGNORE_CASE"/> </LC_TRANSLITERATION>
<LC_MISC> <ReservedWords> <trueWord>true</trueWord> <falseWord>false</falseWord>
<quarter1Word>1st quarter</quarter1Word> <quarter2Word>2nd quarter</quarter2Word> <quarter3Word>3rd quarter</quarter3Word> <quarter4Word>4th quarter</quarter4Word>
<aboveWord>above</aboveWord> <belowWord>below</belowWord>
<quarter1Abbreviation>Q1</quarter1Abbreviation> <quarter2Abbreviation>Q2</quarter2Abbreviation> <quarter3Abbreviation>Q3</quarter3Abbreviation> <quarter4Abbreviation>Q4</quarter4Abbreviation> </ReservedWords> </LC_MISC>
<LC_NumberingLevel>
<LC_OutLineNumberingLevel>
</Locale>
And you are finished. Save your file, check it a couple of times and then submit it as an ENHANCEMENT issue against the Localization (L10n) project. To submit an issue you first need to log in to the OpenOffice website, then hit File Issue on the left-hand menu, proceed on the next page and click on the component l10n on the following one; you are then ready to file it. Select version current, subcomponent code, type ENHANCEMENT, Summary Locale file for language…., and hit Submit. The system will ask you if you want to attach a file and what type. Attach the file that you have been working on, submit it… and you are done.
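Part of "checking it a couple of times" can be automated. The sketch below uses Python's standard XML parser to confirm that a locale file is well-formed and to list which of the sections covered in this walkthrough are absent. The function name missing_sections is made up for this example, and the check is much weaker than validating against locale.dtd.

```python
import xml.etree.ElementTree as ET

# Top-level sections that appear in the template discussed above.
SECTIONS = ["LC_INFO", "LC_CTYPE", "LC_FORMAT", "LC_COLLATION", "LC_SEARCH",
            "LC_INDEX", "LC_CALENDAR", "LC_CURRENCY", "LC_TRANSLITERATION",
            "LC_MISC", "LC_NumberingLevel", "LC_OutLineNumberingLevel"]


def missing_sections(xml_text):
    """Parse the locale text and return the expected sections that are absent.

    ET.fromstring raises xml.etree.ElementTree.ParseError if the text is not
    well-formed XML (for example, a tag that was never closed).
    """
    root = ET.fromstring(xml_text)
    present = {child.tag for child in root}
    return [name for name in SECTIONS if name not in present]


# Tiny stand-in for a real locale file, for illustration only.
print(missing_sections("<Locale><LC_INFO/><LC_CTYPE/></Locale>"))
```

For a file on disk, read it first (for example with open(path, "rb").read()) and pass the contents to the function.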
If you would - nevertheless - like to prepare a more developed locale, please look at the following documents:
http://l10n.openoffice.org/i18n_framework/HowToAddLocaleInI18n.html
http://l10n.openoffice.org/source/browse/l10n/i18npool/source/localedata/data/locale.dtd
http://l10n.openoffice.org/i18n_framework/LocaleData.html
http://api.openoffice.org/docs/common/ref/com/sun/star/i18n/NumberFormatIndex.html