[FIX]: Remove unused CSV files from vocab_csv/ #259

bact opened this issue Mar 22, 2025 · 2 comments

bact commented Mar 22, 2025

In code/vocab_csv, a number of CSV files are not mentioned in the RDF/HTML generation code.

  1. Some of them are probably part of work in progress and will be included soon
  2. Some of them belong to outdated workflows or concepts that have moved to other CSV files

We would like to keep (1) and may want to remove (2) to tidy up the codebase and avoid possible confusion.

  • If needed, files in (2) will still be accessible from the tagged releases (e.g. dpv-2.1)
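A minimal sketch of how a removed file could be read back from a tagged release. It assumes the tag is literally named dpv-2.1 (as in the example above) and that the file lived under code/vocab_csv/ at that tag; show_csv_at_tag is a hypothetical helper name, not part of the DPV tooling.

```shell
#!/bin/bash
# Sketch: print a CSV file as it existed at a given tag.
# Assumptions: run from the repository root; the tag name (e.g. dpv-2.1) and
# the code/vocab_csv/ path are taken from this issue and may differ in practice.
# show_csv_at_tag is a hypothetical helper name.

show_csv_at_tag() {
    local tag="$1" file_name="$2"
    # "git show <rev>:<path>" prints the blob stored at that path in that revision.
    git show "${tag}:code/vocab_csv/${file_name}"
}
```

For example, `show_csv_at_tag dpv-2.1 legal-uk.csv > /tmp/legal-uk.csv` would save the old copy without checking out the tag.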

Files that need to be reviewed

Listed below are the CSV files that are not mentioned in the RDF/HTML generation code, with notes:

| Filename | Notes |
|---|---|
| Bias.csv | Last commit 6 months ago |
| DE_glossary.csv | Last commit last year; Has only a header; Maybe used for future translations |
| EntityControl.csv | Last commit 10 months ago |
| Mapping_ODRL.csv | Last commit last month |
| Requirement.csv | Last commit last year |
| RiskSource.csv | Last commit last month; Has comment "Proposed for v2.2" |
| Standards_ISO.csv | Last commit last year |
| UseCase.csv | Last commit 9 months ago |
| concepts.csv | Last commit 9 months ago |
| legal-memberships.csv | Empty; Replaced by location_memberships.csv? |
| legal-uk.csv | Replaced by legal-gb.csv? |
| legal_Authorities.csv | Replaced by legal-(countrycode).csv? |
| legal_EU_Adequacy.csv | Replaced by legal-eu.csv? |
| legal_EU_EEA.csv | Replaced by location_memberships.csv? |
| legal_Laws.csv | Replaced by legal-(countrycode).csv? |
| legal_Locations.csv | Replaced by location.csv? |
| legal_properties.csv | Replaced by location_properties? |
| tech-data.csv | Last commit last year |
| tech-ops.csv | Last commit last year |
| tech-provision-properties.csv | Last commit last year |
| tech-security.csv | Last commit 11 months ago |
| tech-surveillance.csv | Last commit last year |
  • The legal* files have likely been replaced by other files and could be removed
  • The rest could still be in use

bact commented Mar 22, 2025

A script to find CSV files that are not mentioned in the codebase:

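#!/bin/bash
# Run this script from the "code" directory.
# Lists CSV files in vocab_csv/ whose names appear in no .py or .sh file.

CSV_DIR="vocab_csv"

unused_files=()

# Read find's output NUL-delimited so file names containing spaces are not split.
while IFS= read -r -d '' csv_file; do
    file_name=$(basename "$csv_file")

    # -F matches the name literally (so "." is not a regex wildcard);
    # -q suppresses output and stops at the first match.
    if ! grep -rqF --include="*.py" --include="*.sh" -- "$file_name" .; then
        unused_files+=("$file_name")
    fi
done < <(find "$CSV_DIR" -type f -name "*.csv" -print0)

if [ ${#unused_files[@]} -eq 0 ]; then
    echo "All .csv files in '$CSV_DIR' are mentioned in the codebase."
else
    # Sort line by line instead of relying on word splitting.
    sorted_unused_files=()
    while IFS= read -r name; do
        sorted_unused_files+=("$name")
    done < <(printf "%s\n" "${unused_files[@]}" | sort)

    echo "The following .csv files in '$CSV_DIR' are not mentioned in the codebase:"
    for file in "${sorted_unused_files[@]}"; do
        echo "$file"
    done

    echo
    echo "From the list above, these files are empty or have only one line:"
    for file in "${sorted_unused_files[@]}"; do
        # Assumes the CSV files sit directly in $CSV_DIR, not in subdirectories.
        file_path="$CSV_DIR/$file"
        if [ ! -s "$file_path" ] || [ "$(wc -l < "$file_path")" -le 1 ]; then
            echo "$file"
        fi
    done
fi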

@coolharsh55

Hi @bact, thanks. Some of these files are present because they are part of DPV 1.0 or 2.0, so we usually keep them around in case fixes are needed or we want to see the source of changed extensions. Some others are proposed work items, so they won't be included in the RDF/HTML generation scripts. Below I've made a note on how to resolve each file, but there is no issue with keeping them in the folder, as they are helpful to look stuff up now and then. In the future, once we have resolved the proposed items, deleting all files in vocab_csv and then downloading and extracting all CSVs again should fix this.

  • Bias.csv -- can be deleted
  • DE_glossary.csv -- needed for multilingual translations
  • EntityControl.csv -- can be deleted
  • Mapping_ODRL.csv -- needed for ODRL-DPV mappings
  • Requirement.csv -- source for requirements
  • RiskSource.csv -- from RISK extension
  • Standards_ISO.csv -- proposed modelling of ISO standards
  • UseCase.csv -- source for use-cases
  • concepts.csv -- can be deleted
  • legal-memberships.csv -- can be deleted
  • legal-uk.csv -- can be deleted
  • legal_Authorities.csv -- can be deleted
  • legal_EU_Adequacy.csv -- can be deleted
  • legal_EU_EEA.csv -- can be deleted
  • legal_Laws.csv -- can be deleted
  • legal_Locations.csv -- can be deleted
  • legal_properties.csv -- can be deleted
  • tech-data.csv -- can be deleted
  • tech-ops.csv -- can be deleted
  • tech-provision-properties.csv -- can be deleted
  • tech-security.csv -- can be deleted
  • tech-surveillance.csv -- can be deleted
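
The deletions in the list above could be applied with something like the following sketch. It assumes it is run from the "code" directory and defaults to a dry run; remove_marked_csvs, TO_DELETE, and DRY_RUN are hypothetical names, not part of the DPV tooling.

```shell
#!/bin/bash
# Sketch: remove the CSV files marked "can be deleted" / "delete" in the list above.
# Assumptions: run from the "code" directory; remove_marked_csvs, TO_DELETE and
# DRY_RUN are hypothetical names introduced for this example.

TO_DELETE=(
    Bias.csv EntityControl.csv concepts.csv
    legal-memberships.csv legal-uk.csv legal_Authorities.csv
    legal_EU_Adequacy.csv legal_EU_EEA.csv legal_Laws.csv
    legal_Locations.csv legal_properties.csv
    tech-data.csv tech-ops.csv tech-provision-properties.csv
    tech-security.csv tech-surveillance.csv
)

remove_marked_csvs() {
    local csv_dir="${1:-vocab_csv}" file_name path
    for file_name in "${TO_DELETE[@]}"; do
        path="$csv_dir/$file_name"
        [ -e "$path" ] || continue      # skip files that are already gone
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "would remove $path"   # default: only report, change nothing
        else
            rm -- "$path" && echo "removed $path"
        fi
    done
}
```

Calling `remove_marked_csvs` prints what would be removed; `DRY_RUN=0 remove_marked_csvs` performs the deletion. In a Git checkout, `git rm` could be used in place of `rm` so the removals are staged directly.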

@coolharsh55 coolharsh55 added code and removed legal labels Mar 22, 2025
@coolharsh55 coolharsh55 added this to the dpv v2.2 milestone Mar 22, 2025