The Mozilla Foundation has a policy that any software in our CVS repository, or in the builds we distribute, must be available under our MPL/LGPL/GPL tri-licence or a compatible licence (e.g. the BSD licence or an MPL/LGPL dual licence). The reason for this policy is to present a simple story to people who want to use or distribute our code. We want to be able to say: “Happy with the MPL? OK – use what you like.”
Many localisation teams come to us asking if they can include a dictionary with their localisation of Thunderbird. However, a lot of the free software dictionaries out there, including many of those used by the OpenOffice project, are available under the GPL or LGPL alone.
[Sidenote: why on earth do people use the LGPL for data? It doesn’t make any sense. For a start, a straight reading prohibits modification of the data: section 2a) says “The modified work must itself be a software library”. They should have said “Library”, not “software library”.]
What can be done? Compiling a dictionary is a lengthy process, and it’s work that no-one wants to have to repeat.
So here’s an idea. You create a bit of code which checks text against the dictionary. This code is covered by the same licence as the dictionary. You then feed it a large number of documents in the relevant language. Words which fail the spell check are discarded; words which pass are added to an output file. Eventually, the output file is a new dictionary which you created, and which you can license however you choose. After all, one can’t claim that the licence of spell-checking software infects documents it spell-checks.
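To make the idea concrete, here is a minimal sketch of the filtering step in Python. It only speaks to the "is it technically possible?" question, not the legal or moral ones. Everything named here is my own assumption for illustration: `harvest_words`, `write_word_list`, the word-splitting regex, and above all `is_correct`, which is a hypothetical stand-in for whatever code actually checks a word against the existing dictionary.

```python
import re
from typing import Callable, Iterable, Set

# Assumption: a "word" is a run of letters, optionally joined by internal
# apostrophes (e.g. "don't"). Real languages may need a different rule.
WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def harvest_words(documents: Iterable[str],
                  is_correct: Callable[[str], bool]) -> Set[str]:
    """Collect every word from the documents that passes the spell check.

    `is_correct` is a hypothetical callback wrapping the existing
    dictionary; it returns True if the word is spelt correctly.
    """
    accepted: Set[str] = set()
    rejected: Set[str] = set()
    for text in documents:
        for word in WORD_RE.findall(text):
            # Skip words we have already classified.
            if word in accepted or word in rejected:
                continue
            if is_correct(word):
                accepted.add(word)   # passes: goes into the new word list
            else:
                rejected.add(word)   # fails: discarded
    return accepted


def write_word_list(words: Iterable[str], path: str) -> None:
    """Write the harvested words to a file, one per line, sorted."""
    with open(path, "w", encoding="utf-8") as out:
        for word in sorted(words):
            out.write(word + "\n")


if __name__ == "__main__":
    # Toy stand-in for the real spell checker, for demonstration only.
    toy_dictionary = {"the", "cat", "sat", "don't"}
    docs = ["The cat szat.", "don't sit, cat"]
    words = harvest_words(docs, lambda w: w.lower() in toy_dictionary)
    write_word_list(words, "wordlist.txt")
```

Note that this produces a flat word list. It can only ever contain words that happen to appear in the documents you feed it, and it recovers none of the affix or compounding rules a real dictionary may carry, which bears directly on whether the result is a useful dictionary.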
The questions are: Is it technically possible? If so, does it produce a useful dictionary? If so, is it legal? If so, is it moral?