General Programming Concepts: Writing and Debugging Programs

NLS Sample Program

This sample program fragment, foo.c, illustrates internationalization through code set independent programming.

Message Source File for foo

A sample message source file for the foo utility is given here. For illustration, this catalog defines only one set and three messages. A typical catalog contains many more messages.

The following is the message source file for foo, foo.msg.

$quote "
$set MS_FOO
CANTOPEN        "foo: cannot open %s\n"
BYTECNT         "number of bytes: %d\n"
CHARCNT         "number of characters: %d\n"

Creation of Message Header File for foo

To generate the run-time catalog, use the runcat command as follows:

runcat foo foo.msg

This generates the run-time catalog, foo.cat, and the header file foo_msg.h shown below. The set mnemonic is MS_FOO and the message mnemonics are CANTOPEN, BYTECNT, and CHARCNT. These mnemonics are used in the programs that follow.

/*
** The header file: foo_msg.h is as follows:
*/
#ifndef _H_FOO_MSG 
#define _H_FOO_MSG 
#include <limits.h>
#include <nl_types.h>
#define MF_FOO "foo.cat"
 
/* The following was generated from foo.msg. */
 
 
/* definitions for set MS_FOO */
#define MS_FOO 1
 
#define CANTOPEN 1
#define BYTECNT  2
#define CHARCNT  3
 
#endif 
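
As a minimal sketch of how these mnemonics are typically consumed, the fragment below wraps catgets so that a built-in default string is returned when the catalog or the message cannot be found. The helper names foo_message and report_cannot_open are hypothetical and do not appear in the original program; the mnemonics are repeated here only so that the sketch compiles on its own.

```c
#include <nl_types.h>
#include <stdio.h>

/* Mnemonics as they appear in foo_msg.h, repeated so the sketch
 * compiles without the generated header. */
#define MF_FOO   "foo.cat"
#define MS_FOO   1
#define CANTOPEN 1

/* Hypothetical helper: returns the catalog text for msg_id in set MS_FOO,
 * or fallback when the catalog or the message is unavailable. */
const char *foo_message(nl_catd catd, int msg_id, const char *fallback)
{
    return catgets(catd, MS_FOO, msg_id, fallback);
}

/* Hypothetical helper: reports an open failure through the catalog. */
void report_cannot_open(nl_catd catd, const char *path)
{
    fprintf(stderr,
            foo_message(catd, CANTOPEN, "foo: cannot open %s\n"), path);
}
```

Because catgets falls back to the default string, the program still produces readable messages when catopen fails and returns (nl_catd)-1.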

Single Path Code Set Independent Version

The term single source single path refers to a single application with one code path that processes both single-byte and multibyte code sets. The single source single path method eliminates all #ifdef directives for internationalization. All characters are handled the same way whether they are members of single-byte or multibyte code sets.

Single source single path is desirable but may degrade performance. Thus, it is not recommended for all programs. There may be some programs that do not suffer any performance degradation when they are fully internationalized; in those cases, use the single source single path method.

The following fully internationalized version of the foo utility supports all code sets through single source single path, code-set independent programming:

/*
 * COMPONENT_NAME: 
 *
 * FUNCTIONS: foo
 *
 * The following code shows how to count the number of bytes and
 * the number of characters in a text file.
 *
 * This example is for illustration purposes only. Performance
 * improvements may still be possible.
 *
 */
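#include        <stdio.h>
#include        <ctype.h>
#include        <fcntl.h>       /* open */
#include        <locale.h>
#include        <stdlib.h>
#include        <string.h>      /* memmove */
#include        <unistd.h>      /* read, close */
#include        "foo_msg.h"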
#define MSGSTR(Num,Str) catgets(catd,MS_FOO,Num,Str)
/*
 * NAME: foo 
 * 
 * FUNCTION: Counts the number of characters in a file.
 *
 */  
int main(int argc, char **argv)
{
    int     bytesread,   /* number of bytes read */
        bytesprocessed; 
    int     leftover;
    int     i;
    int     mbcnt;           /* number of bytes in a character */
    int     f;               /* File descriptor */
    int mb_cur_max;
    int    bytect;           /* name changed from charct... */
    int    charct;           /* for real character count */
    char   *curp, *cure;    /* current and end pointers into 
                               ** buffer */
    char         buf[BUFSIZ+1];
    nl_catd      catd;
    wchar_t    wc;
    /* Obtain the current locale */
    (void) setlocale(LC_ALL,"");
    /* after setting the locale, open the message catalog */
    catd = catopen(MF_FOO,NL_CAT_LOCALE);
    /* Parse the arguments if any */
    /* 
    ** Obtain the maximum number of bytes in a character in the
    ** current locale.
    */
    mb_cur_max = MB_CUR_MAX;
    i = 1;
    /* Open the specified file and issue error messages if any */
    f = open(argv[i],0);
    if(f<0){
        fprintf(stderr,MSGSTR(CANTOPEN,              /*MSG*/
            "foo: cannot open %s\n"), argv[i]);      /*MSG*/
            exit(2);
    }
    /* Initialize the variables for the count */
    bytect = 0;
    charct = 0;
    /* Start count of bytes and characters  */
    leftover = 0;
    
    for(;;) {
        bytesread = read(f,buf+leftover, BUFSIZ-leftover);
        /* issue any error messages here, if needed */
        if(bytesread <= 0)
             break;
        buf[leftover+bytesread] = '\0'; 
                /* Protect partial reads */
        bytect += bytesread;
        curp=buf;
        cure = buf + bytesread+leftover;
        leftover=0;      /* No more leftover */
        for(; curp<cure ;){
            /* Convert to wide character; never look past cure */
            mbcnt= mbtowc(&wc, curp, cure - curp);
            if(mbcnt <= 0){
                if(mbcnt == 0 || cure - curp >= mb_cur_max){
                    /* Null byte or invalid sequence:
                    ** count it as a one-byte character */
                    mbcnt = 1;
                }else{
                    /* Character may be split across reads:
                    ** needs more data */
                    leftover= cure - curp;
                    memmove(buf, curp, leftover);
                    break; 
                }
            }
            curp +=mbcnt;
            charct++;
        }
    }
    /* print number of chars and bytes */
    fprintf(stderr,MSGSTR(BYTECNT, "number of bytes: %d\n"),
            bytect);
    fprintf(stderr,MSGSTR(CHARCNT, "number of characters: %d\n"), 
            charct);
    close(f);
    catclose(catd);
    exit(0);
}
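
The program above handles a character split across two reads by copying the leftover bytes to the front of the buffer. As an alternative sketch, not the method used in the original, the restartable function mbrtowc can carry the partial character in an mbstate_t object between calls. The helper name count_chars_chunk is hypothetical, and the sketch assumes that a null character occupies one byte.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: counts the characters in buf[0..len), carrying any
 * partial character over to the next call through *state. */
size_t count_chars_chunk(const char *buf, size_t len, mbstate_t *state)
{
    size_t count = 0;

    while (len > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, buf, len, state);

        if (n == (size_t)-2)        /* incomplete: all len bytes were consumed */
            break;
        if (n == (size_t)-1) {      /* invalid sequence: skip one byte, reset */
            memset(state, 0, sizeof *state);
            n = 1;
        } else if (n == 0) {        /* null character, assumed to be one byte */
            n = 1;
        }
        buf += n;
        len -= n;
        count++;                    /* one complete character was consumed */
    }
    return count;
}
```

A caller would zero an mbstate_t once, then pass each block returned by read to count_chars_chunk with the same state object and add up the results. A character that straddles two blocks is counted in the call that completes it.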

Dual-Path Version Optimized for Single-Byte Code Sets

The term single source dual path refers to two paths in a single application where one of the paths is chosen at run time depending on the current locale setting, which indicates whether the code set in use is single-byte or multibyte.

If a program can retain its performance and not increase its executable file size too much, the single source dual path method is the preferred choice. You should evaluate the increase in the executable file size on a per command or utility basis.

In the single source dual path method, the MB_CUR_MAX macro, which specifies the maximum number of bytes in a multibyte character in the current locale, is used to determine at run time whether to take the single-byte or the multibyte processing path. Use a boolean flag to indicate the path to be chosen, for example:

int mbcodeset ;
/* After setlocale(LC_ALL,"") is done, determine the path to
** be chosen.
*/
if(MB_CUR_MAX == 1)
        mbcodeset = 0;
else    mbcodeset = 1;

This way, the current code set is checked to see if it is a multibyte code set and if so, the flag mbcodeset is set appropriately. Testing this flag has less performance impact than testing the MB_CUR_MAX macro several times.

if(mbcodeset){
        /* Multibyte code sets (also supports single-byte
        ** code sets )
        */
        /* Use multibyte or wide character processing
        functions */
}else{
        /* single-byte code sets */
        /* Process accordingly */
}

This approach is appropriate if internationalization affects a small proportion of a module. Excessive tests for providing dual paths may degrade performance. Place the test at a level where it is evaluated rarely, for example once before a loop and not once per character.
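
As a minimal sketch of placing the test at a coarse level, the function below evaluates MB_CUR_MAX once per buffer and then commits to one path. The name count_chars_dual is hypothetical, and the sketch assumes that the buffer ends on a character boundary.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: counts characters in buf[0..len). The code set is
 * tested once per call, not once per character. */
size_t count_chars_dual(const char *buf, size_t len)
{
    int mbcodeset = (MB_CUR_MAX > 1);   /* one test, outside the loop */
    size_t count = 0;
    size_t i = 0;

    if (!mbcodeset)
        return len;                     /* single-byte: one character per byte */

    (void) mblen(NULL, 0);              /* reset the conversion state */
    while (i < len) {
        int n = mblen(buf + i, len - i);
        if (n <= 0)
            n = 1;                      /* null byte or invalid sequence */
        i += (size_t)n;
        count++;
    }
    return count;
}
```

In a single-byte locale the function returns immediately without calling any multibyte routine, which is the saving the dual path method aims for.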

The following version of the foo utility produces one object, yet at run time the appropriate path is chosen based on the code set to optimize performance for that code set. It distinguishes only between single-byte and multibyte code sets.

/*
 * COMPONENT_NAME: 
 *
 * FUNCTIONS: foo
 *
 * The following code shows how to count the number of bytes and
 * the number of characters in a text file.
 *
 * This example is for illustration purposes only. Performance
 * improvements may still be possible.
 *
 */
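#include        <stdio.h>
#include        <ctype.h>
#include        <fcntl.h>       /* open */
#include        <locale.h>
#include        <stdlib.h>
#include        <string.h>      /* memmove */
#include        <unistd.h>      /* read, close */
#include        "foo_msg.h"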
#define MSGSTR(Num,Str) catgets(catd,MS_FOO,Num,Str)
/*
 * NAME: foo 
 *
 * FUNCTION: Counts the number of characters in a file.
 *
 */  
int main(int argc, char **argv)
{
    int bytesread,  /* number of bytes read */
        bytesprocessed; 
    int   leftover;
    int   i;
    int   mbcnt;   /* number of bytes in a character */
    int   f;       /* File descriptor */
    int   mb_cur_max;
    int    bytect;             /* name changed from charct... */
    int    charct;             /* for real character count */
    char   *curp, *cure;       /* current and end pointers into buffer */
    char        buf[BUFSIZ+1];
    nl_catd     catd;
    wchar_t     wc;
    /* flag to indicate if current code set is a  
    ** multibyte code set 
    */
    int     multibytecodeset;
    /* Obtain the current locale */
    (void) setlocale(LC_ALL,"");
    /* after setting the locale, open the message catalog */
    catd = catopen(MF_FOO,NL_CAT_LOCALE);
    /* Parse the arguments if any */
    /* 
    ** Obtain the maximum number of bytes in a character in the
    ** current locale.
    */
    mb_cur_max = MB_CUR_MAX;
    if(mb_cur_max >1)
        multibytecodeset = 1;
    else
        multibytecodeset = 0;
    i = 1;
    /* Open the specified file and issue error messages if any */
    f = open(argv[i],0);
    if(f<0){
        fprintf(stderr,MSGSTR(CANTOPEN,              /*MSG*/
            "foo: cannot open %s\n"), argv[i]);      /*MSG*/
            exit(2);
    }
    /* Initialize the variables for the count */
    bytect = 0;
    charct = 0;
    /* Start count of bytes and characters  */
    leftover = 0;
    
    if(multibytecodeset){ 
        /* Full internationalization */
        /* Handles supported multibyte code sets */
        for(;;) {
            bytesread = read(f,buf+leftover, 
                    BUFSIZ-leftover);
            /* issue any error messages here, if needed */
            if(bytesread <= 0)
                break;
    
            buf[leftover+bytesread] = '\0'; 
                    /* Protect partial reads */
            bytect += bytesread;
            curp=buf;
            cure = buf + bytesread+leftover;
            leftover=0; /* No more leftover */
    
            for(; curp<cure ;){
                /* Convert to wide character; never look past cure */
                mbcnt= mbtowc(&wc, curp, cure - curp);
                if(mbcnt <= 0){
                    if(mbcnt == 0 || cure - curp >= mb_cur_max){
                        /* Null byte or invalid sequence:
                        ** count it as a one-byte character */
                        mbcnt = 1;
                    }else{
                        /* Character may be split across reads:
                        ** needs more data */
                        leftover= cure - curp;
                        memmove(buf, curp, leftover);
                        break; 
                    }
                }
                curp +=mbcnt;
                charct++;
            }
        }
    }else {
        /* Code specific to single-byte code sets that
        ** avoids conversion to widechars and thus optimizes
        ** performance for single-byte code sets.
        */
        for(;;) {
            bytesread = read(f,buf, BUFSIZ);
            /* issue any error messages here, if needed */
            if(bytesread <= 0)
                break;

            bytect += bytesread;
            charct += bytesread;
        }
    }

    /* print number of chars and bytes */
    fprintf(stderr,MSGSTR(BYTECNT, "number of bytes: %d\n"),
            bytect);
    fprintf(stderr,MSGSTR(CHARCNT, "number of characters: %d\n"),
            charct);
    close(f);
    catclose(catd);
    exit(0);
}
