Xiaofen Xing, Bolun Cai, Yinhu Zhao, Shuzhen Li, Zhiwei He, Weiquan Fan
We propose a novel hierarchical recall model fusing multiple modalities (audio, video and text) for bipolar disorder classification, where patients with different mania levels are recalled layer by layer. To address the complex data distribution of the challenge, the proposed framework combines multiple models, modalities and layers to perform domain adaptation for each patient and hard-sample mining for special patients. Experimental results show that our framework achieves competitive performance, with an Unweighted Average Recall (UAR) of 59.26% on the test set and 64.29% on the development set.
If you use this code in your research, please cite:
@inproceedings{avec2018hrf,
  author    = {Xing, Xiaofen and Cai, Bolun and Zhao, Yinhu and Li, Shuzhen and He, Zhiwei and Fan, Weiquan},
  title     = {Multi-modality Hierarchical Recall Based on GBDTs for Bipolar Disorder Classification},
  booktitle = {Proceedings of the 2018 on Audio/Visual Emotion Challenge and Workshop},
  series    = {AVEC'18},
  location  = {Seoul, Republic of Korea},
  pages     = {31--37},
  numpages  = {7},
  year      = {2018},
}
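For intuition, the layer-by-layer recall described in the abstract can be pictured as a cascade of binary classifiers: each layer recalls one mania level, and the remaining samples fall through to the next layer. The sketch below is a minimal illustration with scikit-learn's `GradientBoostingClassifier`, not the actual challenge pipeline (which fuses multiple models and modalities in the scripts under `./code`).

```python
# Illustrative cascade: each layer is a binary GBDT that recalls one
# mania level; unrecalled samples fall through to the next layer.
# This is a sketch of the idea only, not the repository's pipeline.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def fit_cascade(X, y, levels=(1, 2, 3)):
    layers, mask = [], np.ones(len(y), dtype=bool)
    for level in levels[:-1]:                   # last level is the fallback
        clf = GradientBoostingClassifier()
        clf.fit(X[mask], (y[mask] == level).astype(int))
        layers.append((level, clf))
        mask[mask] = clf.predict(X[mask]) == 0  # keep unrecalled samples
    return layers, levels[-1]

def predict_cascade(X, layers, fallback):
    pred = np.full(len(X), fallback)
    mask = np.ones(len(X), dtype=bool)
    for level, clf in layers:
        hit = np.zeros(len(X), dtype=bool)
        hit[mask] = clf.predict(X[mask]) == 1   # samples this layer recalls
        pred[hit] = level
        mask &= ~hit
    return pred
```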
pip install -r requirements.txt
cp -r <path-to LLDs_audio_eGeMAPS> ./data/LLDs_audio_eGeMAPS
cp -r <path-to LLDs_audio_opensmile_MFCCs> ./data/LLDs_audio_opensmile_MFCCs
cp -r <path-to VAD_turns> ./data/VAD_turns
cp -r <path-to LLDs_video_openFace>/*.csv ./data/AU
cp <path-to recordings_video>/*/*.mp4 ./data/video
sh ./code/features_extraction.sh
sh ./code/model_generation.sh
sh ./code/model_test.sh
Note: the predictions are written in sample order, i.e., dev_001, dev_002, ..., dev_060 in `./result/predictions_dev.csv`, and test_001, test_002, ..., test_054 in `./result/predictions_test.csv`.
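For reference, UAR is simply the macro-averaged (unweighted) recall over the classes. A minimal scoring sketch, assuming the prediction file holds one label per row in the order above, and assuming a hypothetical `labels_dev.csv` in the same order:

```python
import pandas as pd
from sklearn.metrics import recall_score

# Both files are assumed to hold one label per row, in the dev_001 ...
# dev_060 order described above; labels_dev.csv is hypothetical.
pred = pd.read_csv('./result/predictions_dev.csv', header=None).iloc[:, 0]
true = pd.read_csv('labels_dev.csv', header=None).iloc[:, 0]

# UAR = Unweighted Average Recall = macro-averaged recall.
print('UAR: %.4f' % recall_score(true, pred, average='macro'))
```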
`features_extraction_video_body_action.py` extracts body action features for each video, which takes more than one day to run. Therefore, we have prepared these features in `./features/body_action_features.csv`; you can also run the script yourself.
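For example, the precomputed features can be loaded directly (the exact column layout is whatever the extraction script emits):

```python
import pandas as pd

# Load the precomputed body-action features shipped with the repository;
# regenerate with features_extraction_video_body_action.py if needed.
body_feats = pd.read_csv('./features/body_action_features.csv')
print(body_feats.shape)
```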
2. Emotion extracted by Face++
Emotions are extracted from the raw official videos with the Face++ toolkit, and we have prepared the entire extracted emotion data in `./data/emotion`. If emotion features are required for new data, please apply for a Face++ account and acquire an API key and secret. Sample code for obtaining emotion features is available as:
python faceplusplus_emotion.py -k <key> -s <secret>
Note: this code extracts emotions for each video, which takes more than one day. In addition, because the Face++ API has been updated recently, the extracted features may differ slightly from ours. Therefore, we have prepared these features in `./data/emotion`.
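For orientation, a minimal sketch of the per-frame call, following the public Face++ Detect v3 API (as noted above, the API has been updated, so response fields may differ):

```python
import cv2
import requests

API_URL = 'https://api-us.faceplusplus.com/facepp/v3/detect'

def frame_emotion(frame, key, secret):
    """Query Face++ Detect for the emotion scores of one video frame."""
    ok, buf = cv2.imencode('.jpg', frame)        # JPEG-encode the frame
    resp = requests.post(
        API_URL,
        data={'api_key': key,
              'api_secret': secret,
              'return_attributes': 'emotion'},
        files={'image_file': ('frame.jpg', buf.tobytes())},
    )
    faces = resp.json().get('faces', [])
    # Each detected face carries scores for anger, disgust, fear,
    # happiness, neutral, sadness and surprise.
    return faces[0]['attributes']['emotion'] if faces else None
```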
3. Topic extracted by GCP
The Turkish transcripts (in `./turkish_text_json`) with word time offsets are generated by Automatic Speech Recognition (https://cloud.google.com/speech/), and the English transcripts (in `./translation_orgin`) are translated by Neural Machine Translation (https://cloud.google.com/translate/). The corrected English transcripts with topic segmentation are as follows:
| Topic | Descriptor |
|-------|------------|
| #1 | Describe why you come here |
| #2 | Depict Van Gogh’s Depression |
| #3 | Describe the worst memory |
| #4 | Count 1-30 |
| #5 | Count 1-30 again (often faster) |
| #6 | Depict Dengel’s Home Sweet Home |
| #7 | Describe the best memory |
A sentence that does not belong to any of these 7 topics is labeled as Topic #0.
A concatenated table (`7topics_data.csv`) can be generated by:
python concat_all_transcripts.py
Accurate topic timestamps can then be obtained directly from `7topics_data.csv`.
- `time_7topics.csv` -- seven topic timestamps obtained from `7topics_data.csv`.
- `time_3topics.csv`, `times_3topics_video.csv` -- three topic timestamps obtained from `7topics_data.csv` according to the rule: Topics 1-3 form the first part, Topics 4-5 the second part, and Topics 6-7 the third part. An illustrative sketch follows this list.
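As an illustration, the timestamp extraction and the 7-to-3 grouping might look like the sketch below; the column names `topic`, `start` and `end` are assumptions, so check the output of `concat_all_transcripts.py` for the actual schema:

```python
import pandas as pd

# Assumed schema: one word per row with columns 'topic', 'start', 'end'.
df = pd.read_csv('7topics_data.csv')

# Seven-topic timestamps: first and last word time offset per topic.
seven = (df[df['topic'] > 0]
         .groupby('topic')
         .agg(start=('start', 'min'), end=('end', 'max')))

# Three-part grouping: Topics 1-3 -> part 1, 4-5 -> part 2, 6-7 -> part 3.
parts = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3}
three = (df.assign(part=df['topic'].map(parts))
         .dropna(subset=['part'])
         .groupby('part')
         .agg(start=('start', 'min'), end=('end', 'max')))
```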
If topic features are required for new data, we provide a script for Automatic Speech Recognition of Turkish audio and a script for Neural Machine Translation of Turkish text. A Google Cloud Platform (GCP) account is needed.
- `translation_ASR.sh` -- the script for Automatic Speech Recognition of Turkish audio.
- `translation_check.sh` -- the script for Neural Machine Translation of Turkish text.
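For new data, these steps amount to two standard GCP calls. A minimal sketch with the official `google-cloud-speech` and `google-cloud-translate` Python client libraries (the repository's scripts are shell wrappers and may invoke the REST API instead):

```python
from google.cloud import speech, translate_v2

def transcribe_turkish(gcs_uri):
    """Turkish ASR with word time offsets via Cloud Speech-to-Text."""
    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        language_code='tr-TR',
        enable_word_time_offsets=True,    # per-word timestamps
    )
    audio = speech.RecognitionAudio(uri=gcs_uri)
    return client.long_running_recognize(config=config, audio=audio).result()

def translate_to_english(turkish_text):
    """Turkish -> English via Cloud Translation."""
    client = translate_v2.Client()
    return client.translate(turkish_text,
                            source_language='tr',
                            target_language='en')['translatedText']
```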