Commit

20210507
sun1638650145 committed May 7, 2022
1 parent 74ff75a commit e11c3a5
Showing 2 changed files with 77 additions and 20 deletions.
58 changes: 46 additions & 12 deletions API.md
@@ -4692,15 +4692,15 @@ encoding.ids

# 17.transformers

- | Version | Description | Notes | M1 support |
- | ----- | ------------------- | ------------------------------------------------------------ | ------ |
- | 4.6.1 | SOTA natural language processing library. | 1. The default cache path is ~/.cache/huggingface/transformers. 2. Some features depend on the sentencepiece module. ||
+ | Version | Description | Notes | M1 support |
+ | ------ | ------------------- | ------------------------------------------------------------ | ------ |
+ | 4.18.0 | SOTA natural language processing library. | 1. The default cache path is ~/.cache/huggingface/transformers. 2. Some features depend on the sentencepiece module. ||
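
As a sketch of the cache note above: the cache location can be redirected either globally or per call (the `TRANSFORMERS_CACHE` variable and the `cache_dir` argument are standard transformers options; the path below is hypothetical).

```python
import os

# Assumption: redirect the default cache (~/.cache/huggingface/transformers).
os.environ['TRANSFORMERS_CACHE'] = '/tmp/huggingface_cache'  # Hypothetical path.

from transformers import AlbertTokenizer

# cache_dir overrides the cache location for this call only.
tokenizer = AlbertTokenizer.from_pretrained(pretrained_model_name_or_path='albert-base-v2',
                                            cache_dir='/tmp/huggingface_cache')
```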

## 17.1.AlbertTokenizer

### 17.1.1.\__call__()

- Tokenize (preprocess) one or more pieces of data for Albert.|{`input_ids`, (`token_type_ids`), (`attention_mask`)}
+ Tokenize (preprocess) one or more pieces of data for Albert.|`transformers.tokenization_utils_base.BatchEncoding{'input_ids': tf.Tensor, 'token_type_ids': tf.Tensor, 'attention_mask': tf.Tensor}`

```python
from transformers import AlbertTokenizer
@@ -4714,8 +4714,10 @@ encoder = tokenizer(text=x, # list of str|Text(s) to preprocess.
                    truncation=False, # bool (optional)|False|Whether to truncate to the maximum length.
                    max_length=128, # int (optional)|None|Maximum length for padding and truncation.
                    return_tensors='tf', # {'tf', 'pt', 'np'} (optional)|None|Type of tensors to return.
-                   return_token_type_ids=False, # bool (optional)|False|Whether to return token type IDs.
-                   return_attention_mask=False) # bool (optional)|False|Whether to return the attention mask.
+                   return_token_type_ids=True, # bool (optional)|False|Whether to return token type IDs.
+                   return_attention_mask=True) # bool (optional)|False|Whether to return the attention mask.

input_ids, attention_mask, token_type_ids = encoder['input_ids'], encoder['attention_mask'], encoder['token_type_ids']
```

### 17.1.2.from_pretrained()
@@ -4729,7 +4731,7 @@ tokenizer = AlbertTokenizer.from_pretrained(pretrained_model_name_or_path='alber
                                            do_lower_case=True) # bool (optional)|True|Whether to convert all text to lowercase.
```

- ## 17.2.BertTokenizer()
+ ## 17.2.BertTokenizer

### 17.2.1.\__call__()

@@ -4747,13 +4749,15 @@ encoder = tokenizer(text=x, # list of str|Text(s) to preprocess.
                    truncation=False, # bool (optional)|False|Whether to truncate to the maximum length.
                    max_length=128, # int (optional)|None|Maximum length for padding and truncation.
                    return_tensors='tf', # {'tf', 'pt', 'np'} (optional)|None|Type of tensors to return.
-                   return_token_type_ids=False, # bool (optional)|False|Whether to return token type IDs.
-                   return_attention_mask=False) # bool (optional)|False|Whether to return the attention mask.
+                   return_token_type_ids=True, # bool (optional)|False|Whether to return token type IDs.
+                   return_attention_mask=True) # bool (optional)|False|Whether to return the attention mask.

input_ids, attention_mask, token_type_ids = encoder['input_ids'], encoder['attention_mask'], encoder['token_type_ids']
```

### 17.2.2.from_pretrained()

Instantiate a pretrained Bert tokenizer.|`transformers.models.bert.tokenization_bert.BertTokenizer`

```python
from transformers import BertTokenizer
@@ -4776,7 +4780,22 @@ config = RobertaConfig.from_pretrained(pretrained_model_name_or_path='roberta-ba

## 17.4.TFAlbertModel

- ### 17.4.1.from_pretrained()
+ ### 17.4.1.\_\_call\_\_()

Call the Albert model.|`transformers.modeling_tf_outputs.TFBaseModelOutputWithPooling`

```python
from transformers import TFAlbertModel

model = TFAlbertModel.from_pretrained(pretrained_model_name_or_path='albert-base-v2', trainable=True)
outputs = model(input_ids=input_ids,
                attention_mask=attention_mask,
                token_type_ids=token_type_ids)

sequence_output, pooled_output = outputs.last_hidden_state, outputs.pooler_output
```
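
A note on the two outputs above: `last_hidden_state` has shape `(batch_size, sequence_length, hidden_size)`, while `pooler_output` has shape `(batch_size, hidden_size)` and summarizes the first token, which is the usual input to a classification head.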

### 17.4.2.from_pretrained()

Instantiate a pretrained Albert model.|`transformers.models.albert.modeling_tf_albert.TFAlbertModel`

@@ -4789,7 +4808,22 @@ model = TFAlbertModel.from_pretrained(pretrained_model_name_or_path='albert-base

## 17.5.TFBertModel

- ### 17.5.1.from_pretrained()
+ ### 17.5.1.\_\_call\_\_()

Call the Bert model.|`transformers.modeling_tf_outputs.TFBaseModelOutputWithPoolingAndCrossAttentions`

```python
from transformers import TFBertModel

model = TFBertModel.from_pretrained(pretrained_model_name_or_path='bert-base-uncased', trainable=True)
outputs = model(input_ids=input_ids,
                attention_mask=attention_mask,
                token_type_ids=token_type_ids)

sequence_output, pooled_output = outputs.last_hidden_state, outputs.pooler_output
```
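
Tying 17.2 and 17.5 together, a minimal end-to-end sketch (the sample text and the 2-class `Dense` head are hypothetical; weights download on first use):

```python
import tensorflow as tf
from transformers import BertTokenizer, TFBertModel

tokenizer = BertTokenizer.from_pretrained(pretrained_model_name_or_path='bert-base-uncased')
model = TFBertModel.from_pretrained(pretrained_model_name_or_path='bert-base-uncased')

# Preprocess a hypothetical batch of text.
encoder = tokenizer(text=['hello world'],
                    padding=True,
                    return_tensors='tf',
                    return_token_type_ids=True,
                    return_attention_mask=True)
outputs = model(input_ids=encoder['input_ids'],
                attention_mask=encoder['attention_mask'],
                token_type_ids=encoder['token_type_ids'])

# Feed the pooled [CLS] representation into the hypothetical softmax head.
probs = tf.keras.layers.Dense(units=2, activation='softmax')(outputs.pooler_output)
```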

### 17.5.2.from_pretrained()

Instantiate a pretrained Bert model.|`transformers.models.bert.modeling_tf_bert.TFBertModel`

39 changes: 31 additions & 8 deletions TensorFlow.md
@@ -2978,15 +2978,38 @@ function func() {
let c = tf.tidy(func); // nameOrFn: string or Function|The input function.
```

- # 3.tensorflow_datasets
+ # 3.tensorflow_addons

| Version | Description | Notes | M1 support |
| ------ | --------------------- | ---- | ------ |
| 0.16.1 | Extra utilities for TensorFlow. | - ||

## 3.1.optimizers

| Version | Description | Notes |
| ---- | -------------------------- | ---- |
| - | Additional optimizers conforming to the Keras API. | - |

### 3.1.1.AdamW()

Instantiate an `Adam` optimizer with weight decay.

```python
from tensorflow_addons.optimizers import AdamW

optimizer = AdamW(weight_decay=4e-3, # float|Weight decay.
                  learning_rate=0.001) # float|0.001|Learning rate.
```
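
A brief usage sketch (the one-layer model and the loss are hypothetical; any Keras model accepts the optimizer the same way):

```python
import tensorflow as tf
from tensorflow_addons.optimizers import AdamW

# Hypothetical model: a single softmax layer.
model = tf.keras.Sequential([tf.keras.layers.Dense(units=10, activation='softmax')])
model.compile(optimizer=AdamW(weight_decay=4e-3, learning_rate=0.001),
              loss='sparse_categorical_crossentropy')
```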

# 4.tensorflow_datasets

| Version | Description | Notes | M1 support |
| ----- | ----------------------- | ------------------------------------------------------------ | ------ |
| 4.3.0 | TensorFlow's official datasets. | 1. The default cache path is ~/tensorflow_datasets. 2. Use a proxy depending on your network conditions. ||

- ## 3.1.features
+ ## 4.1.features

- ### 3.1.1.ClassLabel
+ ### 4.1.1.ClassLabel

Instantiate a `ClassLabel` to build a mapping between integers and labels.

@@ -2996,7 +3019,7 @@ import tensorflow_datasets as tfds
class_label = tfds.features.ClassLabel(names=['cat', 'dog', 'bird']) # list of str|List of label strings.
```

- #### 3.1.1.1.int2str()
+ #### 4.1.1.1.int2str()

Convert an integer to its label string.|str

@@ -3007,7 +3030,7 @@ class_label = tfds.features.ClassLabel(names=['cat', 'dog', 'bird'])
label = class_label.int2str(int_value=1) # int|Integer index of the label.
```

- ## 3.2.load()
+ ## 4.2.load()

Load a dataset.|`dict of tf.data.Datasets`

@@ -3020,13 +3043,13 @@ ds_train, ds_test = tfds.load(name='mnist', # str|Registered name of the dataset.
                              as_supervised=True) # bool (optional)|False|Whether to return labels.
```
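
With `as_supervised=True`, each element of the returned dataset is an `(input, label)` pair, as in this sketch (the explicit `split` argument is one standard way to get the two splits):

```python
import tensorflow_datasets as tfds

ds_train, ds_test = tfds.load(name='mnist', split=['train', 'test'], as_supervised=True)
for image, label in ds_train.take(1):
    print(image.shape, label.numpy())  # (28, 28, 1) and a scalar label for MNIST.
```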

- # 4.tensorflow_hub
+ # 5.tensorflow_hub

| Version | Description | Notes | M1 support |
| ------ | ----------------------- | ------------------------------------------------------------ | ------ |
| 0.12.0 | TensorFlow's official model repository. | 1. It is recommended to use the environment variable `TFHUB_CACHE_DIR` to specify where models are saved. 2. [TensorFlow Hub mirror for China](https://hub.tensorflow.google.cn/) ||
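
A sketch of the cache note in the table (`TFHUB_CACHE_DIR` is read when a model is first downloaded; the path below is hypothetical):

```python
import os

# Assumption: models downloaded by tensorflow_hub are stored here from now on.
os.environ['TFHUB_CACHE_DIR'] = '/tmp/tfhub_cache'  # Hypothetical path.

import tensorflow_hub as hub
```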

- ## 4.1.KerasLayer()
+ ## 5.1.KerasLayer()

Wrap a model as a Keras layer.|`tensorflow_hub.keras_layer.KerasLayer`

@@ -3040,7 +3063,7 @@ layer = KerasLayer(handle='https://hub.tensorflow.google.cn/google/efficientnet/
                   dtype='float32') # tensorflow.python.framework.dtypes.DType|'float32'|Expected data type.
```
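
A follow-up sketch of using the wrapped layer inside a Keras model (`layer` is the `KerasLayer` from above; the classification head is hypothetical):

```python
import tensorflow as tf

# The wrapped hub layer behaves like any other Keras layer.
model = tf.keras.Sequential([layer,
                             tf.keras.layers.Dense(units=10, activation='softmax')])
```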

- ## 4.2.load()
+ ## 5.2.load()

Load a model.|`tensorflow.python.training.tracking.tracking.AutoTrackable`
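
A minimal usage sketch, assuming a hypothetical model URL on the mirror listed above:

```python
import tensorflow_hub as hub

# Returns an AutoTrackable object restored from the SavedModel.
model = hub.load(handle='https://hub.tensorflow.google.cn/google/imagenet/mobilenet_v2_100_224/classification/4')  # Hypothetical handle.
```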

