From 166763fe409ab7a6fcc89eaafbf8bd0e8ea83558 Mon Sep 17 00:00:00 2001
From: elronbandel
Date: Sun, 19 Jan 2025 14:22:55 +0200
Subject: [PATCH 1/5] Add documentation for Settings and Constants management

Signed-off-by: elronbandel
---
 docs/docs/settings.rst       | 206 +++++++++++++++++++++++++++++++++++
 docs/docs/tutorials.rst      |   1 +
 src/unitxt/settings_utils.py |  82 ++++++++++++++
 3 files changed, 289 insertions(+)
 create mode 100644 docs/docs/settings.rst

diff --git a/docs/docs/settings.rst b/docs/docs/settings.rst
new file mode 100644
index 0000000000..d707be1d33
--- /dev/null
+++ b/docs/docs/settings.rst
@@ -0,0 +1,206 @@
+.. _settings:
+
+=====================================
+Library Settings and Constants
+=====================================
+
+This guide explains the rationale behind the :class:`Settings ` and :class:`Constants ` system, how to extend and configure them, and how to use them effectively in your application.
+
+Rationale
+=========
+
+Managing application-wide configuration and constants can be challenging, especially in larger systems. The :class:`Settings ` and :class:`Constants ` classes provide a centralized, thread-safe, and type-safe way to manage these configurations.
+
+- **Settings**: Designed for mutable configurations that can be customized dynamically, with optional type enforcement and environment variable overrides.
+- **Constants**: Designed for immutable values that remain consistent throughout the application lifecycle.
+
+By centralizing these configurations, you can:
+- Ensure consistency across your application.
+- Simplify debugging and testing.
+- Enable dynamic configuration using environment variables or runtime contexts.
+
+Adding New Settings
+===================
+
+To add a new setting, follow these steps:
+
+1. Open the :class:`Settings ` initialization block in the :class:`settings_utils ` module.
+2. Add a new setting key with a tuple of `(type, default_value)` to enforce its type and provide a default value.
+
+.. code-block:: python
+
+    settings.new_feature_enabled = (bool, False)  # Adding a new boolean setting.
+
+Guidelines:
+- Use a clear and descriptive name for the setting.
+- Always specify the type as one of `int`, `float`, or `bool`.
+
+Adding New Constants
+====================
+
+To add a new constant:
+
+1. Open the :class:`Constants ` initialization block in the :class:`settings_utils ` module.
+2. Assign a new constant key with its value.
+
+.. code-block:: python
+
+    constants.new_constant = "new_value"  # Adding a new constant.
+
+Guidelines:
+- Constants should represent fixed, immutable values.
+- Use clear and descriptive names that indicate their purpose.
+
+Using Settings Context
+======================
+
+The :class:`Settings ` class provides a `context` manager to temporarily override settings within a specific block of code. After exiting the block, the settings revert to their original values.
+
+Example:
+
+.. code-block:: python
+
+    from unitxt import settings
+
+    print(settings.default_verbosity)  # Output: "info"
+
+    with settings.context(default_verbosity="debug"):
+        print(settings.default_verbosity)  # Output: "debug"
+
+    print(settings.default_verbosity)  # Output: "info"
+
+This feature is useful for scenarios like testing or running specific tasks with modified configurations.
+
+List of Settings
+================
+
+Below is the list of available settings, their types, and default values:
+
+.. list-table::
+   :header-rows: 1
+
+   * - Setting
+     - Type
+     - Default Value
+   * - allow_unverified_code
+     - bool
+     - False
+   * - use_only_local_catalogs
+     - bool
+     - False
+   * - global_loader_limit
+     - int
+     - None
+   * - num_resamples_for_instance_metrics
+     - int
+     - 1000
+   * - num_resamples_for_global_metrics
+     - int
+     - 100
+   * - max_log_message_size
+     - int
+     - 100000
+   * - catalogs
+     - None
+     - None
+   * - artifactories
+     - None
+     - None
+   * - default_recipe
+     - str
+     - "dataset_recipe"
+   * - default_verbosity
+     - str
+     - "info"
+   * - use_eager_execution
+     - bool
+     - False
+   * - remote_metrics
+     - list
+     - []
+   * - test_card_disable
+     - bool
+     - False
+   * - test_metric_disable
+     - bool
+     - False
+   * - metrics_master_key_token
+     - None
+     - None
+   * - seed
+     - int
+     - 42
+   * - skip_artifacts_prepare_and_verify
+     - bool
+     - False
+   * - data_classification_policy
+     - None
+     - None
+   * - mock_inference_mode
+     - bool
+     - False
+   * - disable_hf_datasets_cache
+     - bool
+     - True
+   * - loader_cache_size
+     - int
+     - 1
+   * - task_data_as_text
+     - bool
+     - True
+   * - default_provider
+     - str
+     - "watsonx"
+   * - default_format
+     - None
+     - None
+
+List of Constants
+=================
+
+Below is the list of available constants and their values:
+
+.. list-table::
+   :header-rows: 1
+
+   * - Constant
+     - Value
+   * - dataset_file
+     - Path to `dataset.py`.
+   * - metric_file
+     - Path to `metric.py`.
+   * - local_catalog_path
+     - Path to the local catalog directory.
+   * - package_dir
+     - Directory of the installed package.
+   * - default_catalog_path
+     - Default catalog directory path.
+   * - dataset_url
+     - URL for dataset resources.
+   * - metric_url
+     - URL for metric resources.
+   * - version
+     - Current version of the application.
+   * - catalog_hierarchy_sep
+     - Separator for catalog hierarchy levels.
+   * - env_local_catalogs_paths_sep
+     - Separator for local catalog paths in environment variables.
+   * - non_registered_files
+     - List of files excluded from registration.
+   * - codebase_url
+     - URL of the codebase repository.
+   * - website_url
+     - Official website URL.
+   * - inference_stream
+     - Name of the inference stream constant.
+   * - instance_stream
+     - Name of the instance stream constant.
+   * - image_tag
+     - Default image tag for operations.
+   * - demos_pool_field
+     - Field name for demos pool.
+
+Conclusion
+==========
+
+The `Settings` and `Constants` system provides a robust and flexible way to manage your application's configuration and constants. By following the guidelines above, you can extend and use these classes effectively in your application.
\ No newline at end of file
diff --git a/docs/docs/tutorials.rst b/docs/docs/tutorials.rst
index 83115ee06f..28c19f677c 100644
--- a/docs/docs/tutorials.rst
+++ b/docs/docs/tutorials.rst
@@ -31,4 +31,5 @@ Tutorials ✨
    tags_and_descriptions
    types_and_serializers
    contributors_guide
+   settings
 
diff --git a/src/unitxt/settings_utils.py b/src/unitxt/settings_utils.py
index 75a3bd641c..89cd66f04e 100644
--- a/src/unitxt/settings_utils.py
+++ b/src/unitxt/settings_utils.py
@@ -1,3 +1,85 @@
+"""Library Settings and Constants.
+
+This module provides a mechanism for managing application-wide configuration and immutable constants. It includes the `Settings` and `Constants` classes, which are implemented as singletons so that a single shared instance is used across the application. Additionally, it defines utility functions to access these objects and configure application behavior.
+
+### Key Components:
+
+1. **Settings Class**:
+   - A singleton class for managing mutable configuration settings.
+   - Supports type enforcement for settings to ensure correct usage.
+   - Allows dynamic modification of settings using a context manager for temporary changes.
+   - Retrieves environment variable overrides for settings, enabling external customization.
+
+   #### Available Settings:
+   - `allow_unverified_code` (bool, default: False): Whether to allow unverified code execution.
+   - `use_only_local_catalogs` (bool, default: False): Restrict operations to local catalogs only.
+   - `global_loader_limit` (int, default: None): Limit for global data loaders.
+   - `num_resamples_for_instance_metrics` (int, default: 1000): Number of resamples for instance-level metrics.
+   - `num_resamples_for_global_metrics` (int, default: 100): Number of resamples for global metrics.
+   - `max_log_message_size` (int, default: 100000): Maximum size of log messages.
+   - `catalogs` (default: None): List of catalog configurations.
+   - `artifactories` (default: None): Artifact storage configurations.
+   - `default_recipe` (str, default: "dataset_recipe"): Default recipe for dataset operations.
+   - `default_verbosity` (str, default: "info"): Default verbosity level for logging.
+   - `use_eager_execution` (bool, default: False): Enable eager execution for tasks.
+   - `remote_metrics` (list, default: []): List of remote metrics configurations.
+   - `test_card_disable` (bool, default: False): Disable test cards if set to True.
+   - `test_metric_disable` (bool, default: False): Disable test metrics if set to True.
+   - `metrics_master_key_token` (default: None): Master token for metrics.
+   - `seed` (int, default: 42): Default seed for random operations.
+   - `skip_artifacts_prepare_and_verify` (bool, default: False): Skip artifact preparation and verification.
+   - `data_classification_policy` (default: None): Policy for data classification.
+   - `mock_inference_mode` (bool, default: False): Enable mock inference mode.
+   - `disable_hf_datasets_cache` (bool, default: True): Disable caching for Hugging Face datasets.
+   - `loader_cache_size` (int, default: 1): Cache size for data loaders.
+   - `task_data_as_text` (bool, default: True): Represent task data as text.
+   - `default_provider` (str, default: "watsonx"): Default service provider.
+   - `default_format` (default: None): Default format for data processing.
+
+   #### Usage:
+   - Access settings using the `get_settings()` function.
+   - Modify settings temporarily using the `context` method:
+     ```python
+     settings = get_settings()
+     with settings.context(default_verbosity="debug"):
+         pass  # Code within this block uses "debug" verbosity.
+     ```
+
+2. **Constants Class**:
+   - A singleton class for managing immutable constants used across the application.
+   - Constants cannot be modified once set.
+   - Provides centralized access to paths, URLs, and other fixed application parameters.
+
+   #### Available Constants:
+   - `dataset_file`: Path to the dataset file.
+   - `metric_file`: Path to the metric file.
+   - `local_catalog_path`: Path to the local catalog directory.
+   - `package_dir`: Directory of the installed package.
+   - `default_catalog_path`: Default catalog directory path.
+   - `dataset_url`: URL for dataset resources.
+   - `metric_url`: URL for metric resources.
+   - `version`: Current version of the application.
+   - `catalog_hierarchy_sep`: Separator for catalog hierarchy levels.
+   - `env_local_catalogs_paths_sep`: Separator for local catalog paths in environment variables.
+   - `non_registered_files`: List of files excluded from registration.
+   - `codebase_url`: URL of the codebase repository.
+   - `website_url`: Official website URL.
+   - `inference_stream`: Name of the inference stream constant.
+   - `instance_stream`: Name of the instance stream constant.
+   - `image_tag`: Default image tag for operations.
+   - `demos_pool_field`: Field name for demos pool.
+
+   #### Usage:
+   - Access constants using the `get_constants()` function:
+     ```python
+     constants = get_constants()
+     print(constants.dataset_file)
+     ```
+
+3. **Helper Functions**:
+   - `get_settings()`: Returns the singleton `Settings` instance.
+   - `get_constants()`: Returns the singleton `Constants` instance.
+"""
 import importlib.metadata
 import importlib.util
 import os
From be39365d910398dc9d5c0abc48e689ad3bef04b7 Mon Sep 17 00:00:00 2001
From: elronbandel
Date: Sun, 19 Jan 2025 14:29:15 +0200
Subject: [PATCH 2/5] Enhance documentation for Settings and Constants with usage examples and environment variable details

Signed-off-by: elronbandel
---
 docs/docs/settings.rst | 45 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/docs/docs/settings.rst b/docs/docs/settings.rst
index d707be1d33..c2bf208426 100644
--- a/docs/docs/settings.rst
+++ b/docs/docs/settings.rst
@@ -6,6 +6,26 @@ Library Settings and Constants
 
 This guide explains the rationale behind the :class:`Settings ` and :class:`Constants ` system, how to extend and configure them, and how to use them effectively in your application.
 
+All the settings can be easily accessed with:
+
+.. code-block:: python
+
+    import unitxt
+
+    print(unitxt.settings.default_verbosity)  # Output: "info"
+
+All the settings can be easily modified with:
+
+.. code-block:: python
+
+    settings.default_verbosity = "debug"
+
+Or through environment variables:
+
+.. code-block:: bash
+
+    export UNITXT_DEFAULT_VERBOSITY="debug"
+    
 Rationale
 =========
 
@@ -82,78 +102,103 @@ Below is the list of available settings, their types, and default values:
    * - Setting
      - Type
      - Default Value
+     - Environment Variable
    * - allow_unverified_code
      - bool
      - False
+     - UNITXT_ALLOW_UNVERIFIED_CODE
    * - use_only_local_catalogs
      - bool
      - False
+     - UNITXT_USE_ONLY_LOCAL_CATALOGS
    * - global_loader_limit
      - int
      - None
+     - UNITXT_GLOBAL_LOADER_LIMIT
    * - num_resamples_for_instance_metrics
      - int
      - 1000
+     - UNITXT_NUM_RESAMPLES_FOR_INSTANCE_METRICS
    * - num_resamples_for_global_metrics
      - int
      - 100
+     - UNITXT_NUM_RESAMPLES_FOR_GLOBAL_METRICS
    * - max_log_message_size
      - int
      - 100000
+     - UNITXT_MAX_LOG_MESSAGE_SIZE
    * - catalogs
      - None
      - None
+     - UNITXT_CATALOGS
    * - artifactories
      - None
      - None
+     - UNITXT_ARTIFACTORIES
    * - default_recipe
      - str
      - "dataset_recipe"
+     - UNITXT_DEFAULT_RECIPE
    * - default_verbosity
      - str
      - "info"
+     - UNITXT_DEFAULT_VERBOSITY
    * - use_eager_execution
      - bool
      - False
+     - UNITXT_USE_EAGER_EXECUTION
    * - remote_metrics
      - list
      - []
+     - UNITXT_REMOTE_METRICS
    * - test_card_disable
      - bool
      - False
+     - UNITXT_TEST_CARD_DISABLE
    * - test_metric_disable
      - bool
      - False
+     - UNITXT_TEST_METRIC_DISABLE
    * - metrics_master_key_token
      - None
      - None
+     - UNITXT_METRICS_MASTER_KEY_TOKEN
    * - seed
      - int
      - 42
+     - UNITXT_SEED
    * - skip_artifacts_prepare_and_verify
      - bool
      - False
+     - UNITXT_SKIP_ARTIFACTS_PREPARE_AND_VERIFY
    * - data_classification_policy
      - None
      - None
+     - UNITXT_DATA_CLASSIFICATION_POLICY
    * - mock_inference_mode
      - bool
      - False
+     - UNITXT_MOCK_INFERENCE_MODE
    * - disable_hf_datasets_cache
      - bool
      - True
+     - UNITXT_DISABLE_HF_DATASETS_CACHE
    * - loader_cache_size
      - int
      - 1
+     - UNITXT_LOADER_CACHE_SIZE
    * - task_data_as_text
      - bool
      - True
+     - UNITXT_TASK_DATA_AS_TEXT
    * - default_provider
      - str
      - "watsonx"
+     - UNITXT_DEFAULT_PROVIDER
    * - default_format
      - None
      - None
+     - UNITXT_DEFAULT_FORMAT
 
 List of Constants
 =================
From 59d425dfbf0002b72aed92e109887c557208c078 Mon Sep 17 00:00:00 2001
From: elronbandel
Date: Sun, 19 Jan 2025 14:34:45 +0200
Subject: [PATCH 3/5] Update documentation for settings to include environment variable names and descriptions

Signed-off-by: elronbandel
---
 docs/docs/settings.rst | 31 ++++++++++++++++++++++++++++---
 1 file changed, 28 insertions(+), 3 deletions(-)

diff --git a/docs/docs/settings.rst b/docs/docs/settings.rst
index c2bf208426..e21a4c30a6 100644
--- a/docs/docs/settings.rst
+++ b/docs/docs/settings.rst
@@ -18,14 +18,14 @@ All the settings can be easily modified with:
 
 .. code-block:: python
 
-    settings.default_verbosity = "debug"
+    unitxt.settings.default_verbosity = "debug"
 
 Or through environment variables:
 
 .. code-block:: bash
 
     export UNITXT_DEFAULT_VERBOSITY="debug"
-    
+
 Rationale
 =========
 
@@ -94,7 +94,7 @@ This feature is useful for scenarios like testing or running specific tasks with
 List of Settings
 ================
 
-Below is the list of available settings, their types, and default values:
+Below is the list of available settings, their types, default values, corresponding environment variable names, and descriptions:
 
 .. list-table::
    :header-rows: 1
@@ -103,102 +103,127 @@ Below is the list of available settings, their types, and default values:
      - Type
      - Default Value
      - Environment Variable
+     - Description
    * - allow_unverified_code
      - bool
      - False
      - UNITXT_ALLOW_UNVERIFIED_CODE
+     - Enables or disables execution of unverified code.
    * - use_only_local_catalogs
      - bool
      - False
      - UNITXT_USE_ONLY_LOCAL_CATALOGS
+     - Restricts operations to use only local catalogs.
    * - global_loader_limit
      - int
      - None
      - UNITXT_GLOBAL_LOADER_LIMIT
+     - Sets a limit on the number of global data loaders.
    * - num_resamples_for_instance_metrics
      - int
      - 1000
      - UNITXT_NUM_RESAMPLES_FOR_INSTANCE_METRICS
+     - Number of resamples used for calculating instance-level metrics.
    * - num_resamples_for_global_metrics
      - int
      - 100
      - UNITXT_NUM_RESAMPLES_FOR_GLOBAL_METRICS
+     - Number of resamples used for calculating global metrics.
    * - max_log_message_size
      - int
      - 100000
      - UNITXT_MAX_LOG_MESSAGE_SIZE
+     - Maximum size allowed for log messages.
    * - catalogs
      - None
      - None
      - UNITXT_CATALOGS
+     - Specifies the catalogs configuration.
    * - artifactories
      - None
      - None
      - UNITXT_ARTIFACTORIES
+     - Defines the artifact storage configuration.
    * - default_recipe
      - str
      - "dataset_recipe"
      - UNITXT_DEFAULT_RECIPE
+     - Specifies the default recipe for datasets.
    * - default_verbosity
      - str
      - "info"
      - UNITXT_DEFAULT_VERBOSITY
+     - Sets the default verbosity level for logging.
    * - use_eager_execution
      - bool
      - False
      - UNITXT_USE_EAGER_EXECUTION
+     - Enables eager execution for tasks.
    * - remote_metrics
      - list
      - []
      - UNITXT_REMOTE_METRICS
+     - Defines a list of configurations for remote metrics.
    * - test_card_disable
      - bool
      - False
      - UNITXT_TEST_CARD_DISABLE
+     - Disables the use of test cards when enabled.
    * - test_metric_disable
      - bool
      - False
      - UNITXT_TEST_METRIC_DISABLE
+     - Disables the use of test metrics when enabled.
    * - metrics_master_key_token
      - None
      - None
      - UNITXT_METRICS_MASTER_KEY_TOKEN
+     - Specifies the master token for accessing metrics.
    * - seed
      - int
      - 42
      - UNITXT_SEED
+     - Default seed value for random operations.
    * - skip_artifacts_prepare_and_verify
      - bool
      - False
      - UNITXT_SKIP_ARTIFACTS_PREPARE_AND_VERIFY
+     - Skips preparation and verification of artifacts.
    * - data_classification_policy
      - None
      - None
      - UNITXT_DATA_CLASSIFICATION_POLICY
+     - Specifies the policy for data classification.
    * - mock_inference_mode
      - bool
      - False
      - UNITXT_MOCK_INFERENCE_MODE
+     - Enables mock inference mode for testing.
    * - disable_hf_datasets_cache
      - bool
      - True
      - UNITXT_DISABLE_HF_DATASETS_CACHE
+     - Disables caching for Hugging Face datasets.
    * - loader_cache_size
      - int
      - 1
      - UNITXT_LOADER_CACHE_SIZE
+     - Sets the cache size for data loaders.
    * - task_data_as_text
      - bool
      - True
      - UNITXT_TASK_DATA_AS_TEXT
+     - Enables representation of task data as plain text.
    * - default_provider
      - str
      - "watsonx"
      - UNITXT_DEFAULT_PROVIDER
+     - Specifies the default provider for tasks.
    * - default_format
      - None
      - None
      - UNITXT_DEFAULT_FORMAT
+     - Defines the default format for data processing.
 
 List of Constants
 =================
From 44b8cd1758c3b5be98d72caeeef4de78ec09410a Mon Sep 17 00:00:00 2001
From: Elron Bandel
Date: Sun, 26 Jan 2025 11:07:39 +0200
Subject: [PATCH 4/5] Update docs/docs/settings.rst

Co-authored-by: Yoav Katz <68273864+yoavkatz@users.noreply.github.com>
---
 docs/docs/settings.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/docs/settings.rst b/docs/docs/settings.rst
index e21a4c30a6..e66d42111c 100644
--- a/docs/docs/settings.rst
+++ b/docs/docs/settings.rst
@@ -113,7 +113,7 @@ Below is the list of available settings, their types, default values, correspond
      - bool
      - False
      - UNITXT_USE_ONLY_LOCAL_CATALOGS
-     - Restricts operations to use only local catalogs.
+     - Restricts loading of artifacts to only use local catalogs on local filesystems (and not remote GitHub repos).
    * - global_loader_limit
      - int
      - None
From 612caa4c0d0658527b8ad180afdeca8bf1feec13 Mon Sep 17 00:00:00 2001
From: Elron Bandel
Date: Sun, 26 Jan 2025 11:08:36 +0200
Subject: [PATCH 5/5] Update docs/docs/settings.rst

Co-authored-by: Yoav Katz <68273864+yoavkatz@users.noreply.github.com>
---
 docs/docs/settings.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/docs/settings.rst b/docs/docs/settings.rst
index e66d42111c..b21c521199 100644
--- a/docs/docs/settings.rst
+++ b/docs/docs/settings.rst
@@ -108,7 +108,7 @@ Below is the list of available settings, their types, default values, correspond
      - bool
      - False
      - UNITXT_ALLOW_UNVERIFIED_CODE
-     - Enables or disables execution of unverified code.
+     - Enables or disables execution of unverified code. Unverified code includes executable code from HF datasets and calls to ExecuteExpressions or other operators that run user code. Keeping it disabled ensures that only trusted code is executed.
    * - use_only_local_catalogs
      - bool
      - False
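
The patch series above documents three runtime behaviors for `Settings`: declaring a setting as a `(type, default_value)` tuple restricted to `int`, `float`, or `bool`; overriding a setting through a `UNITXT_<NAME>` environment variable; and reverting temporary overrides made with `context`. It also states that constants cannot be modified once set. The following is a minimal, self-contained sketch of those behaviors in plain Python, written only to make them concrete. It is not the `unitxt.settings_utils` implementation: the class names `SketchSettings` and `SketchConstants`, the `_cast` helper, the accepted boolean spellings, and the choice that an environment variable outranks a `context` override are all assumptions made for illustration.

```python
import os
from contextlib import contextmanager

_MISSING = object()  # sentinel: the setting did not exist before the context


class SketchSettings:
    """Hypothetical singleton settings store (illustration only, not unitxt's class).

    Assumed rules, taken from the documentation in this patch series:
    * ``settings.name = (type, default)`` declares a typed setting; type is int, float or bool.
    * ``UNITXT_<NAME>`` in the environment overrides the stored value on read.
    * ``context(**overrides)`` applies temporary values and restores the old ones on exit.
    """

    _instance = None
    _allowed_types = {int, float, bool}

    def __new__(cls):
        # Singleton: every call returns the same shared instance.
        if cls._instance is None:
            instance = super().__new__(cls)
            # Bypass our own __setattr__ when creating the internal dictionaries.
            object.__setattr__(instance, "_values", {})
            object.__setattr__(instance, "_types", {})
            cls._instance = instance
        return cls._instance

    def __setattr__(self, key, value):
        # A (type, default) tuple declares the setting's type and its default value.
        if isinstance(value, tuple) and len(value) == 2 and isinstance(value[0], type):
            value_type, value = value
            if value_type not in self._allowed_types:
                raise ValueError(f"{key!r}: type must be int, float or bool, got {value_type}")
            self._types[key] = value_type
        # Later assignments are checked against the declared type (None is allowed).
        expected = self._types.get(key)
        if expected is not None and value is not None and not isinstance(value, expected):
            raise TypeError(f"{key!r} must be {expected.__name__}, got {type(value).__name__}")
        self._values[key] = value

    def __getattr__(self, key):
        # Only reached when normal lookup fails, i.e. for setting names.
        if key not in self._values:
            raise AttributeError(key)
        raw = os.environ.get(f"UNITXT_{key.upper()}")
        if raw is not None:
            return self._cast(key, raw)  # environment variable wins (assumption)
        return self._values[key]

    def _cast(self, key, raw):
        # Hypothetical helper: convert an environment string to the declared type.
        value_type = self._types.get(key)
        if value_type is bool:
            return raw.strip().lower() in {"1", "true", "yes"}
        if value_type in (int, float):
            return value_type(raw)
        return raw  # untyped settings keep the raw string

    @contextmanager
    def context(self, **overrides):
        saved = {key: self._values.get(key, _MISSING) for key in overrides}
        try:
            for key, value in overrides.items():
                setattr(self, key, value)  # goes through the type check above
            yield self
        finally:
            # Restore previous values; drop settings that did not exist before.
            for key, old in saved.items():
                if old is _MISSING:
                    self._values.pop(key, None)
                else:
                    self._values[key] = old


class SketchConstants:
    """Hypothetical write-once store: a constant cannot be reassigned once set."""

    def __setattr__(self, key, value):
        if key in self.__dict__:
            raise AttributeError(f"Constant {key!r} is already set")
        super().__setattr__(key, value)


if __name__ == "__main__":
    settings = SketchSettings()
    settings.seed = (int, 42)             # typed: later writes must be int
    settings.default_verbosity = "info"   # untyped setting
    with settings.context(default_verbosity="debug"):
        print(settings.default_verbosity)  # the temporary override is visible here
    print(settings.default_verbosity)      # the stored value is back after the block
```

Keeping the declared types in a registry separate from the values is what lets a later plain assignment, a `context` override, or an environment string be checked or cast against the type given in the original tuple. The documentation above does not say whether an environment variable takes precedence over a `context` override; this sketch simply chooses that it does.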