Module:Language/data/ISO 639-3/make: Difference between revisions
en>Legoktm (Replace Module:No globals with require( "strict" )) |
m (1 revision imported) |
Latest revision as of 09:10, 18 December 2022
This is a crude tool that reads a local copy of an iso-639-3_Name_Index_YYYYMMDD.tab file from sil.org and extracts the information necessary to create the data table held by Module:Language/data/ISO_639-3.
Usage
To use this tool:
- open a blank sandbox page and paste this {{#invoke:}} into it at the top line: {{#invoke:Language/data/ISO 639-3/make|ISO_639_3_extract|file-date=YYYYMMDD}}
- where YYYYMMDD is year, month, day from the .tab filename (used to place a file-date comment in Module:Language/data/ISO_639-3)
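- go to the sil.org download page (http://www-01.sil.org/iso639-3/download.asp, as given in the module comment) and download the Complete Code Tables Set UTF-8 version zip file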
- unzip the iso-639-3_Name_Index_YYYYMMDD.tab and open the file with a plain-text editor
- copy the data from the editor and paste it into the sandbox page below the {{#invoke:}}
- click Show preview
- wait
- get result
There is some crude error checking that will insert an error message in the output. No guarantees that such messaging will be helpful. Search for the word 'error' in the tool's output.
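The extraction step can be tried outside MediaWiki. The following is a minimal sketch in plain Lua, not the module's own code: it assumes the standard `string.gmatch` in place of `mw.ustring.gmatch`, uses the same pattern as the module, and collects names in a table instead of building output strings. The helper name `extract` and the sample text are hypothetical; the sample rows are the tab-separated examples from the module comment.

```lua
-- Hypothetical stand-alone sketch; the module itself reads the sandbox page
-- content and uses mw.ustring.gmatch.
local function extract (content)
	local order = {};	-- codes in first-seen order
	local names = {};	-- code -> list of names
	-- same pattern as the module: three letters, tab, name, tab, rest of line
	for code, name in string.gmatch (content, '%f[%a](%a%a%a)\t([^\t]+)\t[^\n]+\n') do
		if not names[code] then
			names[code] = {};
			table.insert (order, code);
		end
		table.insert (names[code], name);	-- repeated codes accumulate additional names
	end
	return order, names;
end

-- header row plus the example rows from the module comment
local sample = 'Id\tPrint_Name\tInverted_Name\n' ..
	'aaq\tEastern Abnaki\tAbnaki, Eastern\n' ..
	'rar\tCook Islands Maori\tMaori, Cook Islands\n' ..
	'rar\tRarotongan\tRarotongan\n';

local order, names = extract (sample);
for _, code in ipairs (order) do
	print (code, table.concat (names[code], '; '));
end
```

The header row is skipped because none of its column names is a three-letter word followed directly by a tab.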
require('strict');
local p = {};
--[=[------------------------< I S O _ 6 3 9 _ 3 _ E X T R A C T >---------------------------------------------
{{#invoke:Language/data/ISO 639-3/make|ISO_639_3_extract|file-date=20170217}}
Reads a local copy of iso-639-3_Name_Index_YYYYMMDD.tab (where YYYYMMDD is the release date). Download that file
in zip form from http://www-01.sil.org/iso639-3/download.asp (use the UTF-8 zip).
useful lines in the file have the form:
<id>\t<name>\t<inverted name>\n
where:
<id> is the three-character ISO 639-3 language code
<name> is the language 'name'
<inverted name> is the language name in 'last-name, first-name(s)' form; this part is ignored
like this:
aaq Eastern Abnaki Abnaki, Eastern
when a language code has more than one name, the code is repeated for each additional name:
rar Cook Islands Maori Maori, Cook Islands
rar Rarotongan Rarotongan
]=]
function p.ISO_639_3_extract (frame)
	local page = mw.title.getCurrentTitle();									-- get a page object for this page
	local content = page:getContent();											-- get unparsed content
	local lang_table = {};														-- languages go here
	local file_date = 'File-Date: ' .. (frame.args["file-date"] or '');		-- set the file date line from |file-date=; empty when the parameter is omitted

	for code, name in mw.ustring.gmatch (content, '%f[%a](%a%a%a)\t([^\t]+)\t[^\n]+\n') do	-- get code and 'forward' name
		if code then
			if string.find (lang_table[#lang_table] or '', '^%[\"' .. code) then			-- if this is an additional name for code ('or' empty string for first time when lang_table[#lang_table] is nil)
				lang_table[#lang_table] = mw.ustring.gsub (lang_table[#lang_table], '}$', '');	-- remove trailing brace from previous name
				lang_table[#lang_table] = lang_table[#lang_table] .. ', \"' .. name .. '\"}';	-- add this name with new brace
			else
				table.insert (lang_table, "[\"" .. code .. "\"] = {\"" .. name .. "\"}");	-- make new table entry
			end
		else
			table.insert (lang_table, "[\"error\"] = {\"" .. tostring (name) .. "\"}");	-- code should never be nil, but inserting an error entry in the final output can be helpful
		end
	end
																				-- make pretty output
	return "<br /><pre>-- " .. file_date .. "<br />return {<br />\t" .. table.concat (lang_table, ',<br />\t') .. "<br />\t}<br />" .. "</pre>";
end
return p;
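For orientation, the text this tool emits is Lua source for a table keyed by language code, where each value is a list of one or more names. A minimal sketch of how such a table might be read follows. The inline `data` table stands in for the generated data module and holds only the examples from the comment above; the helper `first_name` is hypothetical. On-wiki, a data module of this kind would typically be loaded with `mw.loadData` rather than defined inline.

```lua
-- Inline stand-in for the generated data; entries are the examples from the module comment.
local data = {
	["aaq"] = {"Eastern Abnaki"},
	["rar"] = {"Cook Islands Maori", "Rarotongan"},
};

-- Hypothetical helper: return the first listed name for a code, or nil when the code is unknown.
local function first_name (code)
	local names = data[code];
	return names and names[1];
end

print (first_name ('rar'));
print (first_name ('zzz'));
```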