Association for the Promotion of Languages via Lexicography and Open Data

State of the art

Origins of our project

Open data is becoming a major concern in our society: the idea is that anyone can share knowledge or information with anyone else. In France, state authorities are slowly starting to release databases on geography, traffic, road works, general information on how these authorities operate, etc.

However, very little scientific data is available other than for online consultation: it may be free of charge, but it remains tied to the websites that provide it. This bears no comparison with what can be found in the English-speaking parts of the world.

The lack of free lexicographic resources (dictionaries, language tools for translation or localization, etc.) is obvious. It is the direct result of a lack of initial data, such as word lists with grammatical information that would identify each word uniquely and enable reliable analysis (leaving no possible ambiguity between words).

Having multilingual data with large word lists available as Open Data would be an undeniable advantage for research (human sciences or simple translation lookups), for the preservation of local cultural heritage, and for private companies (which could then focus on their core goals instead of losing time looking for such data or, worse, creating it), as well as for teaching, or even for anyone with a personal project.

Many countries or groups that support the preservation of language heritage with limited means could then benefit from specialist expertise they might not otherwise have access to.

Brief state of the art of lexicographic and linguistic resources and tools

The links on the right show the details of the internet searches we ran over a whole year. Our research is not exhaustive. Let's not forget, for example, that something online can disappear or change address from one day to the next. This makes such resources hard to find and their long-term visibility uncertain. For example, have you ever noticed how many links on some websites are simply dead?

We could keep giving examples, but this small "state of affairs" seems quite enlightening: the data currently available is very incomplete and "spread out", which our project wishes to set right. For example, automatic translators can give a raw translation, but do not let the user choose between two translations in case of polysemy. To make that choice, a dictionary is necessary, because it lists the different meanings of a word. Even so, a dictionary doesn't always give the context and almost never gives the etymology of a word.
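To make the polysemy problem concrete, here is a minimal sketch of what a word-list entry carrying grammatical information, context and senses could look like. It is only an illustration: the class and field names (Entry, Sense, context, translations) are hypothetical and do not describe an actual APLLOD format, and the sample data is limited to one well-known French example.

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    """One meaning of a word, with the context in which it applies."""
    gloss: str
    context: str
    translations: dict[str, str] = field(default_factory=dict)


@dataclass
class Entry:
    """A word-list entry: lemma, grammatical information and senses.

    Field names are hypothetical, chosen for this sketch only.
    """
    lemma: str
    language: str
    part_of_speech: str
    etymology: str = ""  # left empty here; to be filled in by a linguist
    senses: list[Sense] = field(default_factory=list)

    def translate(self, target_language: str, context: str) -> list[str]:
        """Return the translations of the senses matching the given context."""
        return [
            sense.translations[target_language]
            for sense in self.senses
            if sense.context == context and target_language in sense.translations
        ]


# Illustrative data only: the French noun "avocat" is polysemous.
avocat = Entry(
    lemma="avocat",
    language="fr",
    part_of_speech="noun",
    senses=[
        Sense(gloss="legal professional", context="law",
              translations={"en": "lawyer"}),
        Sense(gloss="fruit of the avocado tree", context="food",
              translations={"en": "avocado"}),
    ],
)

print(avocat.translate("en", "law"))
print(avocat.translate("en", "food"))
```

With the context stored next to each sense, a tool can pick "lawyer" or "avocado" instead of returning a single raw translation.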

Moreover, our project aspires to more rigor, especially regarding how data is listed and updated. Unfortunately, we realized that even well-known organizations neglect some details in their online resources, details which, in our view, are still very important. The APLLOD project seeks to benefit from the participative nature of the Internet to ensure the ongoing progress of its content, thanks to the attention and regular checks of our team of translators and linguists, who will be responsible for the quality of our site.

In the end, we plan on sharing data usable by as many people as possible. To that end, we will work on providing users with an efficient and intuitive platform. We also plan on using reusable formats, suited to many applications, so that working with open resources also means being able to choose the information one needs.