Skip to content

Linking Theories and Methods in Cognitive Sciences via Joint Embedding of the Scientific Literature: The Example of Cognitive Control

Notifications You must be signed in to change notification settings

morteza/CogText

Repository files navigation

Linking Theories and Methods of Cognitive Control

This is the official repository for the paper Linking Theories and Methods in Cognitive Sciences via Joint Embedding of the Scientific Literature: The Example of Cognitive Control.

We performed automated text analyses on a large body of scientific texts (385705 scientific abstracts) and created a joint representation of cognitive control tasks and constructs.

Abstracts were first mapped into an embedding space using GPT-3 and Top2Vec models. Document embeddings were then used to identify a task-construct graph embedding that grounds constructs on tasks and supports nuanced meaning of the constructs by taking advantage of constrained random walks in the graph.

Setup

We recommend Conda/Mamba and DVC to set up a clean environment and download the data. You can create and activate the cogtext environment and automatically download the required data from CogText dataset on HuggingFace by running:

mamba env create --file environment.yml  # or use `conda`
mamba activate cogtext                   # activate the environment
dvc pull                                 # download the data

Notebooks

The main entry point of the project is the notebooks/ folder.

Note that Jupyter notebooks contain relative paths and are supposed to be run from the root of the project.

  • 1 Data Collection (2023) uses the EFO ontology to search PubMed, aggregates abstracts as a single dataset, and stores the results in a compressed CSV file. If you already downloaded the CogText dataset, you can skip this step. Simply copy your downloaded file to data/pubmed/abstracts_2023.csv.gz.

  • 2 Descriptive Statistics computes some basic statistics such as the number of tasks and constructs, co-occurrences, articles per each task or construct, etc. This notebook requires the data/pubmed/abstracts_2023.csv.gz file.

  • 3 Document Embedding uses GPT-3 Embedding API (Ada) to transform the raw abstracts to vectorized embeddings.

  • 4 Topic Embedding projects embeddings into a more interpretable topic space. The topic embedding uses UMAP and HDBSCAN to calculate the topic weights (as in Top2Vec).

  • 5 Hypernomy visualizes construct hypernomy: inconsistent definitions of cognitive constructs across cognitive fields.

  • 6 Hypergraph Visualization plots the task-construct hypergraph.

  • 7 Link Prediction predicts the edges of the task-constructs hypergraph and learns Metapath2vec embedding of the graph nodes.

Acknowledgements

This research was supported by the Luxembourg National Research Fund (ATTRACT/2016/ID/11242114/DIGILEARN and INTER Mobility/2017-2/ID/11765868/ULALA).

Citation

To cite the paper use the following entry:

@misc{cogtext2022,
  author = {Morteza Ansarinia and
            Paul Schrater and
            Pedro Cardoso-Leite},
  title = {Linking Theories and Methods in Cognitive Sciences via Joint Embedding of the Scientific Literature: The Example of Cognitive Control},
  year = {2022},
  url = {https://arxiv.org/abs/2203.11016}
}

About

Linking Theories and Methods in Cognitive Sciences via Joint Embedding of the Scientific Literature: The Example of Cognitive Control

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published