This simple program fragment counts the number of Western words, the number of Chinese characters, and the number of two-byte-width punctuation marks read from stdin. It writes the three counts, in this order, to stdout.
A Western word is defined to be any string of ASCII characters (plus a few Latin-1 codes) that is delimited by ASCII blank space or by a Chinese character. A Chinese character is defined to be any Big-5 code, and a two-byte-width punctuation mark is whatever the S-code function s_punct reports as one. One can subtract the third number from the second to get a closer estimate of how many Chinese words are in a document.
For example, given the input

This is a測驗,and there
is　a全形 space.

the UNIX wc thinks there are 6 words; note that there is a two-byte-width space between "is" and "a" on the second line. But zc thinks there are 8 Western words and 6 Chinese characters, among which 2 are two-byte-width punctuation marks.
A full-sized zc should check more command-line arguments and may accept more than one input file, but the basic operations are all here.
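#include <stdio.h>
#include "s_code.h"
#define IN  1  /* inside a Western word */
#define OUT 0  /* outside a Western word */
int main(void) {
    int c, np, nw, nz, state;
    state = OUT;
    nw = nz = np = 0;  /* Western words, Chinese characters, punctuation marks */
    while ((c = s_getchar(BIG5)) != EOF) {
        if (s_wd(c) == 2) {
            /* two-byte-width code: counts as a Chinese character */
            ++nz;
            if (s_punct(c)) ++np;
            /* a Chinese character also ends any Western word in progress */
            state = OUT;
        }
        else {
            if (s_space(c))
                state = OUT;
            else if (state == OUT) {
                /* first non-blank character after a separator starts a word */
                state = IN;
                ++nw;
            }
        }
    }
    printf("\t %d %d %d\n", nw, nz, np);
    return 0;
}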
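The same state machine can be tried without the S-code library. What follows is only a minimal sketch under stated assumptions, not the library's behaviour: the names zc_count, is_big5_lead and is_big5_punct are hypothetical, any byte in 0x81..0xFE followed by another byte is taken as a two-byte Big-5 code, and a deliberately narrow range 0xA140..0xA17E stands in for whatever s_punct really accepts. The "few Latin-1 codes" mentioned above are not handled.

```c
#include <stddef.h>

/* Counters in the order zc prints them. */
struct zc_counts {
    long words;   /* Western words */
    long zchars;  /* two-byte characters, punctuation included */
    long puncts;  /* two-byte punctuation marks */
};

/* Assumption: a Big-5 lead byte lies in 0x81..0xFE. */
static int is_big5_lead(unsigned char b)
{
    return b >= 0x81 && b <= 0xFE;
}

/* Assumption: 0xA140..0xA17E is an illustrative stand-in for s_punct. */
static int is_big5_punct(unsigned int code)
{
    return code >= 0xA140 && code <= 0xA17E;
}

static int is_ascii_space(unsigned char b)
{
    return b == ' ' || b == '\t' || b == '\n' ||
           b == '\r' || b == '\f' || b == '\v';
}

/* Count over a byte buffer instead of stdin so the logic is easy to test. */
struct zc_counts zc_count(const unsigned char *buf, size_t len)
{
    struct zc_counts n = {0, 0, 0};
    int in_word = 0;
    size_t i = 0;

    while (i < len) {
        unsigned char b = buf[i];
        if (is_big5_lead(b) && i + 1 < len) {
            /* two-byte code: a Chinese character that also ends a word */
            unsigned int code = ((unsigned int)b << 8) | buf[i + 1];
            ++n.zchars;
            if (is_big5_punct(code))
                ++n.puncts;
            in_word = 0;
            i += 2;
        } else {
            /* single byte (a dangling lead byte at the end lands here too) */
            if (is_ascii_space(b)) {
                in_word = 0;
            } else if (!in_word) {
                in_word = 1;
                ++n.words;
            }
            i += 1;
        }
    }
    return n;
}
```

Fed the example input encoded in Big-5, with the full-width comma as 0xA141 and the full-width space as 0xA140, this sketch reports 8, 6 and 2. Subtracting the third count from the second then gives 6 - 2 = 4 Chinese characters that are not punctuation.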
Created: Jan 14, 1996
Last Revised: Jan 14, 1996
© Copyright 1996 Wei-Chang Shann