<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
  <Document>
    <name><![CDATA[Wordlists of languages of Indonesia and the Philippines]]></name>
    <description><![CDATA[Recordings and transcription of a large wordlist (c. 1,100 items) for many languages of Indonesia and the Philippines. Additional textual data was collected for a subset of languages.

Made as part of the OCSEAN (OCeanic and South East Asian Navigators; MSCA-RISE-2019, Project Number 873207) project. The initial workshop was held in Uppsala, Sweden, from 20 June to 15 July 2022.]]></description>
    <ExtendedData>
      <Data name="ghap_url"><![CDATA[https://test-ghap.tlcmap.org/publicdatasets/1315]]></Data>
    </ExtendedData>
    <Style id="TLCMapStyle">
      <IconStyle>
        <scale>1</scale>
        <Icon>
          <href>https://tlcmap.org/img/mapicons/dotorangepip1.png</href>
        </Icon>
      </IconStyle>
    </Style>
    <Placemark>
      <Point>
        <coordinates>102.164,11.2255</coordinates>
      </Point>
      <name><![CDATA[Amanatun wordlist]]></name>
      <styleUrl>#TLCMapStyle</styleUrl>
      <description><![CDATA[A wordlist of 1228 items in the Amanatun variety/dialect of Uab Meto. The list was recorded with Alfred Snae in three sessions: evening 04/07/2022, evening 05/07/2022, and afternoon 16/07/2022. Due to the 20-minute limit of the recording device, there are eight recordings in total:

1. 1-220 (recorded 04/07/2022, by I Made Netra)
2. 221-233 (recorded 04/07/2022, by I Made Netra)
3. 234-394 (recorded 05/07/2022, by I Made Netra)
4. 395-439 (recorded 16/07/2022, by Owen Edwards)
5. 440-695 (recorded 16/07/2022, by Owen Edwards)
6. 696-884 (recorded 16/07/2022, by Owen Edwards)
7. 885-1076 (recorded 16/07/2022, by Owen Edwards)
8. 1077-1228 (recorded 16/07/2022, by Owen Edwards)

OCSEAN-AOZ_20220715-WORDLIST.wav contains all eight recordings concatenated together in a single file.

OCSEAN-AOZ_20220715-WORDLIST_TEXTGRID.txt contains a text-grid file made in Praat. This contains two tiers: the top tier with the number of the item, and the bottom tier with a broad phonetic transcription made by Owen Edwards. Different responses to the same prompt are differentiated alphabetically (e.g. 1a, 1b).

OCSEAN-AOZ_20220715-WORDLIST.txt is a text file containing the cleaned-up version of all the Uab Meto data collected, with the orthographic transcriptions made by Alfred Snae and the broad phonetic transcriptions made by Owen Edwards.

In Owen Edwards’ transcriptions, three phonetic heights are distinguished among the mid vowels: mid-low [ɛ] and [ɔ], mid-high [e] and [o], and slightly higher [ɪ] and [ʊ]. All these mid vowels are transcribed by Alfred Snae (the consultant) as <e> and <o>. In final syllables, mid-high [e o] and slightly higher [ɪ ʊ] are historically high vowels which have lowered after a (historically) penultimate mid-low vowel. In penultimate syllables, mid-high [e o] are mid vowels which have raised before a (historically) high vowel; e.g. *okiʔ *[ʔɔkiʔ] ‘wave’ > [ʔokeʔ], *mepu *[mɛpu] > [mepo]. All vowel-initial words begin with a glottal stop. Long vowels analysable as sequences of two identical vowels are transcribed with two vowel symbols. Stress regularly falls on the penultimate vowel of a word.

OCSEAN-AOZ_20220715-WORDLIST.pdf contains the first 393 items as filled in by hand by I Made Netra in IPA.

OCSEAN-AOZ_20220715-WORDLIST_TYPED.pdf contains a typed-up version of the wordlist, made by Alfred Snae in orthographic transcription. Meto items are given in brackets after the Indonesian prompt.
			<p><a href='https://test-ghap.tlcmap.org/search?id=tc440b'>TLCMap</a></p>
			<p><a href='https://test-ghap.tlcmap.org/publicdatasets/1315'>TLCMap Layer</a></p>]]></description>
      <TimeSpan>
        <begin>2022-07-16</begin>
        <end>2022-07-16</end>
      </TimeSpan>
      <ExtendedData>
        <Data name="ID">
          <value><![CDATA[OCSEAN-AOZ_20220704]]></value>
        </Data>
        <Data name="Languages">
          <value><![CDATA[Uab Meto - aoz]]></value>
        </Data>
        <Data name="Countries">
          <value><![CDATA[Indonesia - ID]]></value>
        </Data>
        <Data name="Publisher">
          <value><![CDATA[Owen Edwards]]></value>
        </Data>
        <Data name="Contact">
          <value><![CDATA[admin@paradisec.org.au]]></value>
        </Data>
        <Data name="License">
          <value><![CDATA[Open (subject to agreeing to PDSC access conditions)]]></value>
        </Data>
        <Data name="Rights">
          <value><![CDATA[Open (subject to agreeing to PDSC access conditions)]]></value>
        </Data>
      </ExtendedData>
    </Placemark>
    <Placemark>
      <Point>
        <coordinates>102.164,11.2255</coordinates>
      </Point>
      <name><![CDATA[Ba'a Wordlist]]></name>
      <styleUrl>#TLCMapStyle</styleUrl>
      <description><![CDATA[A wordlist of 1228 items recorded in Ba'a, as spoken in Oelunggu village, Rote.

The wordlist was recorded in 23 sessions, roughly corresponding to semantic domains. Metadata of the speaker is recorded on the first recording.

1. 1-45; The physical world
2. 46-85; The physical world
3. 86-197; People, Animals
4. 198-249; Animals
5. 250-274; Animals
6. 275-407; The Body
7. 408-440; The Body
8. 441-554; Food and drink, Clothing and grooming, The house
9. 555-669; Agriculture and vegetation
10. 670-722; Basic actions
11. 723-785; Motion
12. 786-810; Possession
13. 811-835; Spatial relations
14. 838-884; Spatial relations
15. 885-932; Quantity
16. 933-970; Time
17. 971-1032; Sense perception
18. 1033-1076; Emotions and values
19. 1077-1125; Cognition
20. 1126-1156; Speech and language
21. 1157-1175; Social and political relations
22. 1176-1200; Warfare and hunting
23. 1201-1228; Law/Religion and belief

Each recording is accompanied by a video (.mp4), an audio file extracted from the video (.wav), and a backup recording made with a separate device. Items 1-45 and 670-722 also have accompanying time-aligned ELAN transcriptions.

LLG_20220708-WORDLIST contains a scan of the wordlist written by James Ngginak in orthography. Numbers correspond to the item number in the wordlist.

LLG_20220708-WORDLIST_1TO147_670TO722.pdf contains a scan of items 1-147 and 670-722 transcribed by Zuvyati Tlonaen.

Name of speaker: James Ngginak.
Age: 34.
Education: Magister.
Occupation: teacher.
Languages: Ba'a Oelunggu, Indonesian, Kupang Malay, English.
Place of birth: Rote Ba'a Oelunggu.
Father's and mother's place of birth: Ba'a Oelunggu.
Father's and mother's languages: Ba'a Oelunggu.
Collector: Zuviyati Tlonaen.
Place of recording: Uppsala University.
			<p><a href='https://test-ghap.tlcmap.org/search?id=tc440c'>TLCMap</a></p>
			<p><a href='https://test-ghap.tlcmap.org/publicdatasets/1315'>TLCMap Layer</a></p>]]></description>
      <TimeSpan>
        <begin>2022-07-08</begin>
        <end>2022-07-08</end>
      </TimeSpan>
      <ExtendedData>
        <Data name="ID">
          <value><![CDATA[OCSEAN-LLG_20220703]]></value>
        </Data>
        <Data name="Languages">
          <value><![CDATA[Lole - llg]]></value>
        </Data>
        <Data name="Countries">
          <value><![CDATA[Indonesia - ID]]></value>
        </Data>
        <Data name="Publisher">
          <value><![CDATA[Owen Edwards]]></value>
        </Data>
        <Data name="Contact">
          <value><![CDATA[admin@paradisec.org.au]]></value>
        </Data>
        <Data name="License">
          <value><![CDATA[Open (subject to agreeing to PDSC access conditions)]]></value>
        </Data>
        <Data name="Rights">
          <value><![CDATA[Open (subject to agreeing to PDSC access conditions)]]></value>
        </Data>
      </ExtendedData>
    </Placemark>
    <Placemark>
      <Point>
        <coordinates>102.164,11.2255</coordinates>
      </Point>
      <name><![CDATA[Dela wordlist]]></name>
      <styleUrl>#TLCMapStyle</styleUrl>
      <description><![CDATA[A wordlist of 999 items for Dela, a language of Indonesia. The recordings are divided into five sections based on item numbers. Each section includes an MP4, a WAV, a backup WAV, and an EAF file. The transcription is based on Dela orthography, except that ɓ and ɗ in all positions are written as b' and d'.

Made as part of the OCSEAN (OCeanic and South East Asian Navigators; MSCA-RISE-2019, Project Number 873207) project. The initial workshop was held in Uppsala, Sweden, from 20 June to 15 July 2022.
			<p><a href='https://test-ghap.tlcmap.org/search?id=tc440d'>TLCMap</a></p>
			<p><a href='https://test-ghap.tlcmap.org/publicdatasets/1315'>TLCMap Layer</a></p>]]></description>
      <TimeSpan>
        <begin>2022-07-08</begin>
        <end>2022-07-08</end>
      </TimeSpan>
      <ExtendedData>
        <Data name="ID">
          <value><![CDATA[OCSEAN-ROW_20220708]]></value>
        </Data>
        <Data name="Languages">
          <value><![CDATA[Dela-Oenale - row]]></value>
        </Data>
        <Data name="Countries">
          <value><![CDATA[Indonesia - ID]]></value>
        </Data>
        <Data name="Publisher">
          <value><![CDATA[Owen Edwards]]></value>
        </Data>
        <Data name="Contact">
          <value><![CDATA[admin@paradisec.org.au]]></value>
        </Data>
        <Data name="License">
          <value><![CDATA[Open (subject to agreeing to PDSC access conditions)]]></value>
        </Data>
        <Data name="Rights">
          <value><![CDATA[Open (subject to agreeing to PDSC access conditions)]]></value>
        </Data>
      </ExtendedData>
    </Placemark>
    <Placemark>
      <Point>
        <coordinates>124.588,-8.294</coordinates>
      </Point>
      <name><![CDATA[Abui wordlist]]></name>
      <styleUrl>#TLCMapStyle</styleUrl>
      <description><![CDATA[A wordlist of 445 items in Abui. The list was recorded in five sessions, identified by the wordlist item numbers at the end of each file name. The first four sessions were recorded on 03/07/2022.

1. items 1-50
2. items 51-100
3. items 101-150
4. items 151-200
5. items 201-300

AOZ_20220715-WORDLIST.pdf contains items 1-445 as filled in by hand by Cindy Copas. This contains an orthographic transcription. Long vowels are not differentiated from short vowels, and the uvular plosive /q/ is not differentiated from the velar plosive /k/.

AOZ_20220715-WORDLIST.txt contains a typed-up version of items 1-300. The transcription is phonemic, following standard IPA practices.

The .eaf files contain time-aligned phonemic transcriptions for use in ELAN.

			<p><a href='https://test-ghap.tlcmap.org/search?id=tc440e'>TLCMap</a></p>
			<p><a href='https://test-ghap.tlcmap.org/publicdatasets/1315'>TLCMap Layer</a></p>]]></description>
      <TimeSpan>
        <begin>2022-07-15</begin>
        <end>2022-07-15</end>
      </TimeSpan>
      <ExtendedData>
        <Data name="ID">
          <value><![CDATA[OCSEAN-ABZ_20220715]]></value>
        </Data>
        <Data name="Languages">
          <value><![CDATA[Abui - abz]]></value>
        </Data>
        <Data name="Countries">
          <value><![CDATA[Indonesia - ID]]></value>
        </Data>
        <Data name="Publisher">
          <value><![CDATA[Owen Edwards]]></value>
        </Data>
        <Data name="Contact">
          <value><![CDATA[admin@paradisec.org.au]]></value>
        </Data>
        <Data name="License">
          <value><![CDATA[Open (subject to agreeing to PDSC access conditions)]]></value>
        </Data>
        <Data name="Rights">
          <value><![CDATA[Open (subject to agreeing to PDSC access conditions)]]></value>
        </Data>
      </ExtendedData>
    </Placemark>
    <Placemark>
      <Point>
        <coordinates>117.2812,3.90215</coordinates>
      </Point>
      <name><![CDATA[Kupang Malay wordlist]]></name>
      <styleUrl>#TLCMapStyle</styleUrl>
      <description><![CDATA[A 147-item wordlist in Kupang Malay. The list was filled in by June Jacob in one day at Uppsala University, Sweden.
			<p><a href='https://test-ghap.tlcmap.org/search?id=tc440f'>TLCMap</a></p>
			<p><a href='https://test-ghap.tlcmap.org/publicdatasets/1315'>TLCMap Layer</a></p>]]></description>
      <TimeSpan>
        <begin>2022-07-05</begin>
        <end>2022-07-05</end>
      </TimeSpan>
      <ExtendedData>
        <Data name="ID">
          <value><![CDATA[OCSEAN-MKN_20220705]]></value>
        </Data>
        <Data name="Languages">
          <value><![CDATA[Malay, Kupang - mkn]]></value>
        </Data>
        <Data name="Countries">
          <value><![CDATA[Indonesia - ID]]></value>
        </Data>
        <Data name="Publisher">
          <value><![CDATA[Owen Edwards]]></value>
        </Data>
        <Data name="Contact">
          <value><![CDATA[admin@paradisec.org.au]]></value>
        </Data>
        <Data name="License">
          <value><![CDATA[Open (subject to agreeing to PDSC access conditions)]]></value>
        </Data>
        <Data name="Rights">
          <value><![CDATA[Open (subject to agreeing to PDSC access conditions)]]></value>
        </Data>
      </ExtendedData>
    </Placemark>
    <Placemark>
      <Point>
        <coordinates>117.2812,3.90215</coordinates>
      </Point>
      <name><![CDATA[Balinese wordlist]]></name>
      <styleUrl>#TLCMapStyle</styleUrl>
      <description><![CDATA[A 1,228-item wordlist in Balinese. The first six recordings were organised by semantic domain; the final four were not.

1. 1-85; The physical world (consultant I Ketut Artawa, recorded 03/07/2022)
2. 86-147; People (consultant I Ketut Artawa, recorded 05/07/2022)
3. 148-249; Animals (consultant I Ketut Artawa, recorded 05/07/2022)
4. 250-439; The Body (consultant Ni Luh Nyoman Seri Malini, recorded 14/07/2022)
5. 440-457; Food and drink (consultant Ni Luh Nyoman Seri Malini, recorded 22/07/2022)
6. 458-501; Food and drink (consultant Ni Luh Nyoman Seri Malini, recorded 14/07/2022, no video recording)
7. 502-751 (consultant Ni Luh Nyoman Seri Malini, recorded 17/08/2022)
8. 752-991 (consultant Ni Luh Nyoman Seri Malini, recorded 17/08/2022) 
9. 992-1122 (consultant Ni Luh Nyoman Seri Malini, recorded 17/08/2022)
10. 1123-1228 (consultant Ni Luh Nyoman Seri Malini, recorded 17/08/2022)

OCSEAN-BAN_20220714-WORDLIST.pdf contains a handwritten filled-in version of the entire wordlist with words transcribed in IPA, except that <y> (which sometimes looks like <ɣ>) represents a palatal glide [j]. Both <ɟ> and <j> represent a voiced palatal affricate/stop [ɟ]. <c> represents a voiceless palatal affricate/stop.

			<p><a href='https://test-ghap.tlcmap.org/search?id=tc4410'>TLCMap</a></p>
			<p><a href='https://test-ghap.tlcmap.org/publicdatasets/1315'>TLCMap Layer</a></p>]]></description>
      <TimeSpan>
        <begin>2022-07-31</begin>
        <end>2022-07-31</end>
      </TimeSpan>
      <ExtendedData>
        <Data name="ID">
          <value><![CDATA[OCSEAN-BAN_20220817]]></value>
        </Data>
        <Data name="Languages">
          <value><![CDATA[Bali - ban]]></value>
        </Data>
        <Data name="Countries">
          <value><![CDATA[Indonesia - ID]]></value>
        </Data>
        <Data name="Publisher">
          <value><![CDATA[Owen Edwards]]></value>
        </Data>
        <Data name="Contact">
          <value><![CDATA[admin@paradisec.org.au]]></value>
        </Data>
        <Data name="License">
          <value><![CDATA[Open (subject to agreeing to PDSC access conditions)]]></value>
        </Data>
        <Data name="Rights">
          <value><![CDATA[Open (subject to agreeing to PDSC access conditions)]]></value>
        </Data>
      </ExtendedData>
    </Placemark>
    <Placemark>
      <Point>
        <coordinates>117.2812,3.90215</coordinates>
      </Point>
      <name><![CDATA[Balinese text]]></name>
      <styleUrl>#TLCMapStyle</styleUrl>
      <description><![CDATA[A short text in Balinese in which I Gusti Ngurah Parthama introduces himself and explains what he is doing.
			<p><a href='https://test-ghap.tlcmap.org/search?id=tc4411'>TLCMap</a></p>
			<p><a href='https://test-ghap.tlcmap.org/publicdatasets/1315'>TLCMap Layer</a></p>]]></description>
      <TimeSpan>
        <begin>2022-07-06</begin>
        <end>2022-07-06</end>
      </TimeSpan>
      <ExtendedData>
        <Data name="ID">
          <value><![CDATA[OCSEAN-BAN_20220706]]></value>
        </Data>
        <Data name="Languages">
          <value><![CDATA[Bali - ban]]></value>
        </Data>
        <Data name="Countries">
          <value><![CDATA[Indonesia - ID]]></value>
        </Data>
        <Data name="Publisher">
          <value><![CDATA[Owen Edwards]]></value>
        </Data>
        <Data name="Contact">
          <value><![CDATA[admin@paradisec.org.au]]></value>
        </Data>
        <Data name="License">
          <value><![CDATA[Open (subject to agreeing to PDSC access conditions)]]></value>
        </Data>
        <Data name="Rights">
          <value><![CDATA[Open (subject to agreeing to PDSC access conditions)]]></value>
        </Data>
      </ExtendedData>
    </Placemark>
    <Placemark>
      <Point>
        <coordinates>117.2812,3.90215</coordinates>
      </Point>
      <name><![CDATA[Enggano Wordlist]]></name>
      <styleUrl>#TLCMapStyle</styleUrl>
      <description><![CDATA[A wordlist of 1228 items in Enggano.

The wordlist was recorded in 21 sessions, according to semantic domains, as outlined below. 

All 1228 items were recorded except items 86 to 147, for which the recording is missing.

1. 1-85; The physical world (recorded 05/07/2022)
2. 86-147; People (recording missing)
3. 148-249; Animals
4. 250-439; The Body
5. 440-501; Food and drink
6. 502-521; Clothing and grooming
7. 522-554; The house
8. 555-669; Agriculture and vegetation
9. 670-722; Basic actions
10. 723-785; Motion
11. 786-810; Possession
12. 811-884; Spatial relations
13. 885-932; Quantity
14. 933-970; Time
15. 971-1032; Sense perception
16. 1033-1075; Emotions and values
17. 1076-1125; Cognition
18. 1126-1156; Speech and language
19. 1157-1175; Social and political relations
20. 1176-1200; Warfare and hunting
21. 1201-1228; Law/Religion and belief

OCSEAN-ENO_20220712-WORDLIST.pdf contains a scan of the transcription. Transcriptions in black pen were probably made by Engga Sangian Zakaria and are in orthography. This follows Indonesian orthography with the following exceptions:

< ' > = [ʔ] (apostrophe represents glottal stop)
< ė > = [ə] (e with dot represents mid-central vowel)
< u̇ > = [ɨ] (u with dot represents high-central vowel)
< ˉ > = [ ~ ] (macron represents nasalized vowel)
both <y> and < j > appear to represent [ j ] (palatal glide)

Words which were given as identical in Indonesian and Enggano are transcribed with standard Indonesian orthography. Transcription in blue pen appears to be by someone else, and uses a mix of IPA and orthographic transcription.

Items 1-981 have been transcribed and are present in the PDF and the text document.
			<p><a href='https://test-ghap.tlcmap.org/search?id=tc4412'>TLCMap</a></p>
			<p><a href='https://test-ghap.tlcmap.org/publicdatasets/1315'>TLCMap Layer</a></p>]]></description>
      <TimeSpan>
        <begin>2022-07-12</begin>
        <end>2022-07-12</end>
      </TimeSpan>
      <ExtendedData>
        <Data name="ID">
          <value><![CDATA[OCSEAN-ENO_20220712]]></value>
        </Data>
        <Data name="Languages">
          <value><![CDATA[Enggano - eno]]></value>
        </Data>
        <Data name="Countries">
          <value><![CDATA[Indonesia - ID]]></value>
        </Data>
        <Data name="Publisher">
          <value><![CDATA[Owen Edwards]]></value>
        </Data>
        <Data name="Contact">
          <value><![CDATA[admin@paradisec.org.au]]></value>
        </Data>
        <Data name="License">
          <value><![CDATA[Open (subject to agreeing to PDSC access conditions)]]></value>
        </Data>
        <Data name="Rights">
          <value><![CDATA[Open (subject to agreeing to PDSC access conditions)]]></value>
        </Data>
      </ExtendedData>
    </Placemark>
  </Document>
</kml>
