Who is behind HanziCraft?

HanziCraft is solely developed & maintained by Niel de la Rouviere. He blogs as Confused Laowai, shares online Mandarin resources via Social Mandarin, brings language-learning blogs together at Polyglot Link, and, as a crazed linguaphile, learns other languages too.

Where does the data come from?

HanziCraft would not exist without the massive amounts of work already done by other people & researchers.

How was this site created?

HanziCraft is built on HanziJS, an open-source Node.js module developed to help programmers work with Chinese characters and words. The two projects are closely linked, since both are developed by the same person.

Characters in the decomposition section appear as blocks or question marks. How can I fix this?

If you have display problems, such as "blocks" or question marks appearing, please download & install the HanaZono font.

How does HanziCraft determine the example words?

The example words are based on the Leiden Weibo Corpus, a vocabulary frequency list compiled from over 4 million Weibo posts.

When you enter a character, HanziCraft looks up every dictionary word that contains that character. It then matches this list against the Leiden frequency data.
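The lookup-and-match step can be sketched roughly as follows. This is a minimal illustration, not HanziCraft's or HanziJS's actual code: the function name `find_example_words` and its parameters are hypothetical, the dictionary is assumed to be a plain list of words, and the frequency data is assumed to be a mapping from word to absolute count.

```python
from typing import Dict, List


def find_example_words(
    character: str,
    dictionary_words: List[str],
    frequency_list: Dict[str, int],
) -> Dict[str, int]:
    """Hypothetical helper: return the dictionary words that contain
    `character` and also appear in the frequency list, mapped to their
    absolute frequency counts."""
    # Step 1: collect every dictionary word containing the character.
    candidates = [word for word in dictionary_words if character in word]
    # Step 2: keep only candidates present in the frequency data,
    # together with their absolute counts.
    return {word: frequency_list[word] for word in candidates if word in frequency_list}


# Toy data with invented counts (not taken from the Leiden Weibo Corpus).
dictionary = ["酱油", "果酱", "炸酱面", "油条"]
frequencies = {"酱油": 500, "果酱": 120, "油条": 300}
print(find_example_words("酱", dictionary, frequencies))
```

Here 炸酱面 is dropped because it has no frequency entry, and 油条 is dropped because it does not contain 酱.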

The matched words are then grouped into frequency bands relative to one another, based on how their absolute frequency counts are distributed.

So when you see "high" or "medium" frequency words, the label is not absolute across all words. It is relative to the other words that contain the same character.

This ensures that even uncommon characters still get example words. A good example is 酱, which ranks only 3002nd in the Junda Character Frequency Data, yet appears in a few useful words such as 酱油 (soy sauce).

If frequency were absolute, however, 酱油 would be drowned out by far more frequent words.
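The difference between relative and absolute banding can be sketched as below. The FAQ does not say exactly how the bands are computed, so this assumes a simple split of the ranked words into thirds. `band_relative`, `band_absolute`, the thresholds, and all the counts are hypothetical, chosen only to illustrate the idea.

```python
from typing import Dict


def band_relative(counts: Dict[str, int]) -> Dict[str, str]:
    """Hypothetical relative banding: rank the words for one character by
    count and split the ranking into roughly equal thirds."""
    # Most frequent first; ties keep their input order (sorted is stable).
    ranked = sorted(counts, key=lambda word: counts[word], reverse=True)
    n = len(ranked)
    bands = {}
    for i, word in enumerate(ranked):
        if i < n / 3:
            bands[word] = "high"
        elif i < 2 * n / 3:
            bands[word] = "medium"
        else:
            bands[word] = "low"
    return bands


def band_absolute(counts: Dict[str, int], high_min: int, medium_min: int) -> Dict[str, str]:
    """For contrast: fixed, vocabulary-wide thresholds (hypothetical values)."""
    return {
        word: "high" if count >= high_min else "medium" if count >= medium_min else "low"
        for word, count in counts.items()
    }


# Invented counts for words containing 酱 (not real corpus data).
jiang_words = {"酱油": 500, "果酱": 120, "炸酱面": 40}
print(band_relative(jiang_words))                 # 酱油 comes out "high"
print(band_absolute(jiang_words, 10_000, 1_000))  # everything comes out "low"
```

With relative banding, 酱油 is labelled "high" because it is the most frequent word among those containing 酱. With vocabulary-wide thresholds, every 酱 word falls into "low".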