Meeting Note #3 14.03.2019

ugurcanarikan edited this page Mar 21, 2019 · 2 revisions

Location: Bogazici University Computer Engineering Building

Date/Time: 14.03.2019 / 12:00

Attendees:

  • Suzan Üsküdarlı
  • Onur Güngör
  • Uğurcan Arıkan

1. Preparation Before Meeting

  • 1.1. Research SentencePiece
  • 1.2. Research BERT pretraining

2. Agenda

  • 2.1. Pretraining BERT after the Turkish vocabulary has been created

3. Discussion

  • 3.1. The memory issue during BERT pretraining, caused by the size of the corpus, was discussed
  • 3.2. Creating the pretraining data and pretraining BERT were discussed
  • 3.3. The current status of the project was discussed

4. Outcomes

  • 4.1. BERT pretraining
    • 4.1.1. Because of its massive size, the corpus will be split into 10 smaller chunks before pretraining
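
The split agreed on in 4.1.1 could be sketched roughly as below. This is only an illustration, not the script used in the project: the function name `split_corpus`, the `chunk_NN.txt` file naming, and the assumption that documents in the corpus are separated by blank lines are all hypothetical. The corpus is streamed line by line so it never has to fit in memory, and a chunk is closed only at a blank line so that no document is cut in half.

```python
import os


def split_corpus(corpus_path, out_dir, num_chunks=10):
    """Stream corpus_path into at most num_chunks files of similar size.

    Assumption: documents are separated by blank lines, so a chunk is
    only closed right after a blank line.
    """
    os.makedirs(out_dir, exist_ok=True)
    # Approximate number of bytes each chunk should hold.
    target = os.path.getsize(corpus_path) / num_chunks
    paths = []
    index, written = 0, 0
    out = None
    with open(corpus_path, "rb") as src:
        for line in src:
            if out is None:
                # Hypothetical naming scheme: chunk_00.txt, chunk_01.txt, ...
                path = os.path.join(out_dir, f"chunk_{index:02d}.txt")
                out = open(path, "wb")
                paths.append(path)
            out.write(line)
            written += len(line)
            at_boundary = line.strip() == b""
            # Start a new chunk once the target size is reached, but only
            # at a document boundary; the last chunk takes whatever is left.
            if at_boundary and written >= target and index < num_chunks - 1:
                out.close()
                out = None
                index += 1
                written = 0
    if out is not None:
        out.close()
    return paths
```

Calling `split_corpus("corpus.txt", "chunks")` with a hypothetical corpus path would write the chunk files under `chunks/` and return their paths. The chunks are only approximately equal in size, and fewer than `num_chunks` files may be produced when individual documents are large.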

5. TO-DO list

Deadline: 21.03 12:00 Assignee: Uğurcan Arıkan

  • 5.1. Split corpus into smaller pieces

Deadline: 21.03 12:00 Assignee: Uğurcan Arıkan

  • 5.2. Create pretraining data for BERT

Deadline: 21.03 12:00 Assignee: Uğurcan Arıkan

  • 5.3. Run BERT pretraining on the created pretraining data