Thank you for your questions.
RE: project and volume of credits
This is part of a larger project that is trying to build better displays and provide more effective search options for people looking for film and video in library catalogs. Library catalogs have trouble providing the kind of functionality that people have come to expect from e-commerce sites like Amazon. Part of the problem is that the form of the data in library records was designed in the late 1960s. These records have lots of free-text statements like "directed by Steven Spielberg" and we want to map these to a form that a computer can manipulate as data. Our main goal with this part of the project is to compile a set of correct answers for machine learning and evaluation. By training the computer to be sufficiently accurate on a known subset of records, we hope to then be able to apply that code to a larger set of records rather than having people interpret the text. I have tried to explain this in more detail at http://olac-annotator.org/#/about
There are actually several thousand credits needing annotation on our site, including many in English; the Thai file alone has several hundred. However, far fewer unique translations are needed than that number suggests, because terms like "kamkap kansadǣng" appear many times. We still need multiple variations on the same type of credit for the machine-learning aspect of the project.
I was not intending for anyone on this forum to do all or even very many translations. Even one or two would be helpful, and we have some ongoing volunteers who try to do five or ten per week. Someone had suggested to me that your group might be a good place to find people willing to help with the translation part of the project, but it might not actually be a good fit. I do not have any funds to pay professional translators, which is why we decided to try crowdsourcing. Perhaps you (or someone on the forum) might have a suggestion for a better place to recruit volunteers.
Many of these translations don't require very advanced knowledge of the language in question. The problem is that we can't predict in advance which ones will be difficult. Some are grammatically incorrect or have spelling errors that make them hard to interpret; some lack context; some use unusual constructions. Another challenge is that film, like many fields, has specialized vocabulary, which has caused problems even for some of the native speakers of various languages who have been helping.
Unfortunately, we were not able to import the original non-Roman script text into our website, even when it was available. The romanization systems that are supposed to be used in these records can be found at http://www.loc.gov/catdir/cpso/roman.html, but you are likely to encounter errors or old data (e.g., U.S. libraries switched from Wade-Giles to Pinyin for Chinese, but some records never got updated).