Commit

20210507
sun1638650145 committed May 7, 2022
1 parent 74ff75a commit e11c3a5
Showing 2 changed files with 77 additions and 20 deletions.
58 changes: 46 additions & 12 deletions API.md
@@ -4692,15 +4692,15 @@ encoding.ids

# 17.transformers

- | Version | Description | Notes | M1 support |
- | ----- | ------------------- | ------------------------------------------------------------ | ------ |
- | 4.6.1 | SOTA natural language processing library. | 1. The default cache path is ~/.cache/huggingface/transformers. 2. Some features depend on the sentencepiece module. ||
+ | Version | Description | Notes | M1 support |
+ | ------ | ------------------- | ------------------------------------------------------------ | ------ |
+ | 4.18.0 | SOTA natural language processing library. | 1. The default cache path is ~/.cache/huggingface/transformers. 2. Some features depend on the sentencepiece module. ||
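
As a sketch of the cache note above: the cache location can be redirected either globally or per call (the `TRANSFORMERS_CACHE` variable and the `cache_dir` argument are standard transformers options; the path below is hypothetical).

```python
import os

# Assumption: redirect the default cache (~/.cache/huggingface/transformers).
os.environ['TRANSFORMERS_CACHE'] = '/tmp/huggingface_cache'  # Hypothetical path.

from transformers import AlbertTokenizer

# cache_dir overrides the cache location for this call only.
tokenizer = AlbertTokenizer.from_pretrained(pretrained_model_name_or_path='albert-base-v2',
                                            cache_dir='/tmp/huggingface_cache')
```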

## 17.1.AlbertTokenizer

### 17.1.1.\__call__()

- Tokenize (preprocess) one or more pieces of data for Albert.|{`input_ids`, (`token_type_ids`), (`attention_mask`)}
+ Tokenize (preprocess) one or more pieces of data for Albert.|`transformers.tokenization_utils_base.BatchEncoding{'input_ids': tf.Tensor, 'token_type_ids': tf.Tensor, 'attention_mask': tf.Tensor}`

```python
from transformers import AlbertTokenizer
@@ -4714,8 +4714,10 @@ encoder = tokenizer(text=x, # list of str|Text(s) to preprocess.
                    truncation=False, # bool (optional)|False|Whether to truncate to the maximum length.
                    max_length=128, # int (optional)|None|Maximum length for padding and truncation.
                    return_tensors='tf', # {'tf', 'pt', 'np'} (optional)|None|Type of tensors to return.
-                   return_token_type_ids=False, # bool (optional)|False|Whether to return token type IDs.
-                   return_attention_mask=False) # bool (optional)|False|Whether to return the attention mask.
+                   return_token_type_ids=True, # bool (optional)|False|Whether to return token type IDs.
+                   return_attention_mask=True) # bool (optional)|False|Whether to return the attention mask.

input_ids, attention_mask, token_type_ids = encoder['input_ids'], encoder['attention_mask'], encoder['token_type_ids']
```

### 17.1.2.from_pretrained()
@@ -4729,7 +4731,7 @@ tokenizer = AlbertTokenizer.from_pretrained(pretrained_model_name_or_path='alber
                                            do_lower_case=True) # bool (optional)|True|Whether to convert all text to lowercase.
```

- ## 17.2.BertTokenizer()
+ ## 17.2.BertTokenizer

### 17.2.1.\__call__()

@@ -4747,13 +4749,15 @@ encoder = tokenizer(text=x, # list of str|Text(s) to preprocess.
                    truncation=False, # bool (optional)|False|Whether to truncate to the maximum length.
                    max_length=128, # int (optional)|None|Maximum length for padding and truncation.
                    return_tensors='tf', # {'tf', 'pt', 'np'} (optional)|None|Type of tensors to return.
-                   return_token_type_ids=False, # bool (optional)|False|Whether to return token type IDs.
-                   return_attention_mask=False) # bool (optional)|False|Whether to return the attention mask.
+                   return_token_type_ids=True, # bool (optional)|False|Whether to return token type IDs.
+                   return_attention_mask=True) # bool (optional)|False|Whether to return the attention mask.

input_ids, attention_mask, token_type_ids = encoder['input_ids'], encoder['attention_mask'], encoder['token_type_ids']
```

### 17.2.2.from_pretrained()

Instantiate a pretrained Bert tokenizer.|`transformers.models.bert.tokenization_bert.BertTokenizer`

```python
from transformers import BertTokenizer
@@ -4776,7 +4780,22 @@ config = RobertaConfig.from_pretrained(pretrained_model_name_or_path='roberta-ba

## 17.4.TFAlbertModel

- ### 17.4.1.from_pretrained()
+ ### 17.4.1.\_\_call\_\_()

Call the Albert model.|`transformers.modeling_tf_outputs.TFBaseModelOutputWithPooling`

```python
from transformers import TFAlbertModel

model = TFAlbertModel.from_pretrained(pretrained_model_name_or_path='albert-base-v2', trainable=True)
outputs = model(input_ids=input_ids,
                attention_mask=attention_mask,
                token_type_ids=token_type_ids)

sequence_output, pooled_output = outputs.last_hidden_state, outputs.pooler_output
```
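
A note on the two outputs above: `last_hidden_state` has shape `(batch_size, sequence_length, hidden_size)`, while `pooler_output` has shape `(batch_size, hidden_size)` and summarizes the first token, which is the usual input to a classification head.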

### 17.4.2.from_pretrained()

Instantiate a pretrained Albert model.|`transformers.models.albert.modeling_tf_albert.TFAlbertModel`

@@ -4789,7 +4808,22 @@ model = TFAlbertModel.from_pretrained(pretrained_model_name_or_path='albert-base

## 17.5.TFBertModel

- ### 17.5.1.from_pretrained()
+ ### 17.5.1.\_\_call\_\_()

Call the Bert model.|`transformers.modeling_tf_outputs.TFBaseModelOutputWithPoolingAndCrossAttentions`

```python
from transformers import TFBertModel

model = TFBertModel.from_pretrained(pretrained_model_name_or_path='bert-base-uncased', trainable=True)
outputs = model(input_ids=input_ids,
                attention_mask=attention_mask,
                token_type_ids=token_type_ids)

sequence_output, pooled_output = outputs.last_hidden_state, outputs.pooler_output
```
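
Tying 17.2 and 17.5 together, a minimal end-to-end sketch (the sample text and the 2-class `Dense` head are hypothetical; weights download on first use):

```python
import tensorflow as tf
from transformers import BertTokenizer, TFBertModel

tokenizer = BertTokenizer.from_pretrained(pretrained_model_name_or_path='bert-base-uncased')
model = TFBertModel.from_pretrained(pretrained_model_name_or_path='bert-base-uncased')

# Preprocess a hypothetical batch of text.
encoder = tokenizer(text=['hello world'],
                    padding=True,
                    return_tensors='tf',
                    return_token_type_ids=True,
                    return_attention_mask=True)
outputs = model(input_ids=encoder['input_ids'],
                attention_mask=encoder['attention_mask'],
                token_type_ids=encoder['token_type_ids'])

# Feed the pooled [CLS] representation into the hypothetical softmax head.
probs = tf.keras.layers.Dense(units=2, activation='softmax')(outputs.pooler_output)
```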

### 17.5.2.from_pretrained()

Instantiate a pretrained Bert model.|`transformers.models.bert.modeling_tf_bert.TFBertModel`

39 changes: 31 additions & 8 deletions TensorFlow.md
@@ -2978,15 +2978,38 @@ function func() {
let c = tf.tidy(func); // nameOrFn: string or Function|The input function.
```

- # 3.tensorflow_datasets
+ # 3.tensorflow_addons

| Version | Description | Notes | M1 support |
| ------ | --------------------- | ---- | ------ |
| 0.16.1 | Extra utilities for TensorFlow. | - ||

## 3.1.optimizers

| Version | Description | Notes |
| ---- | -------------------------- | ---- |
| - | Additional optimizers conforming to the Keras API. | - |

### 3.1.1.AdamW()

Instantiate an `Adam` optimizer with weight decay.

```python
from tensorflow_addons.optimizers import AdamW

optimizer = AdamW(weight_decay=4e-3, # float|Weight decay.
                  learning_rate=0.001) # float|0.001|Learning rate.
```
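
A brief usage sketch (the one-layer model and the loss are hypothetical; any Keras model accepts the optimizer the same way):

```python
import tensorflow as tf
from tensorflow_addons.optimizers import AdamW

# Hypothetical model: a single softmax layer.
model = tf.keras.Sequential([tf.keras.layers.Dense(units=10, activation='softmax')])
model.compile(optimizer=AdamW(weight_decay=4e-3, learning_rate=0.001),
              loss='sparse_categorical_crossentropy')
```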

# 4.tensorflow_datasets

| Version | Description | Notes | M1 support |
| ----- | ----------------------- | ------------------------------------------------------------ | ------ |
| 4.3.0 | TensorFlow's official datasets. | 1. The default cache path is ~/tensorflow_datasets. 2. Use a proxy depending on your network conditions. ||

- ## 3.1.features
+ ## 4.1.features

- ### 3.1.1.ClassLabel
+ ### 4.1.1.ClassLabel

Instantiate a `ClassLabel` to build a mapping between integers and labels.

@@ -2996,7 +3019,7 @@ import tensorflow_datasets as tfds
class_label = tfds.features.ClassLabel(names=['cat', 'dog', 'bird']) # list of str|List of label strings.
```

- #### 3.1.1.1.int2str()
+ #### 4.1.1.1.int2str()

Convert an integer to its label string.|str

@@ -3007,7 +3030,7 @@ class_label = tfds.features.ClassLabel(names=['cat', 'dog', 'bird'])
label = class_label.int2str(int_value=1) # int|Integer index of the label.
```

- ## 3.2.load()
+ ## 4.2.load()

Load a dataset.|`dict of tf.data.Datasets`

@@ -3020,13 +3043,13 @@ ds_train, ds_test = tfds.load(name='mnist', # str|Registered name of the dataset.
                              as_supervised=True) # bool (optional)|False|Whether to return labels.
```
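
With `as_supervised=True`, each element of the returned dataset is an `(input, label)` pair, as in this sketch (the explicit `split` argument is one standard way to get the two splits):

```python
import tensorflow_datasets as tfds

ds_train, ds_test = tfds.load(name='mnist', split=['train', 'test'], as_supervised=True)
for image, label in ds_train.take(1):
    print(image.shape, label.numpy())  # (28, 28, 1) and a scalar label for MNIST.
```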

- # 4.tensorflow_hub
+ # 5.tensorflow_hub

| Version | Description | Notes | M1 support |
| ------ | ----------------------- | ------------------------------------------------------------ | ------ |
| 0.12.0 | TensorFlow's official model repository. | 1. It is recommended to use the environment variable `TFHUB_CACHE_DIR` to specify where models are saved. 2. [TensorFlow Hub mirror for China](https://hub.tensorflow.google.cn/) ||
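
A sketch of the cache note in the table (`TFHUB_CACHE_DIR` is read when a model is first downloaded; the path below is hypothetical):

```python
import os

# Assumption: models downloaded by tensorflow_hub are stored here from now on.
os.environ['TFHUB_CACHE_DIR'] = '/tmp/tfhub_cache'  # Hypothetical path.

import tensorflow_hub as hub
```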

- ## 4.1.KerasLayer()
+ ## 5.1.KerasLayer()

Wrap a model as a Keras layer.|`tensorflow_hub.keras_layer.KerasLayer`

@@ -3040,7 +3063,7 @@ layer = KerasLayer(handle='https://hub.tensorflow.google.cn/google/efficientnet/
                   dtype='float32') # tensorflow.python.framework.dtypes.DType|'float32'|Expected data type.
```
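
A follow-up sketch of using the wrapped layer inside a Keras model (`layer` is the `KerasLayer` from above; the classification head is hypothetical):

```python
import tensorflow as tf

# The wrapped hub layer behaves like any other Keras layer.
model = tf.keras.Sequential([layer,
                             tf.keras.layers.Dense(units=10, activation='softmax')])
```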

- ## 4.2.load()
+ ## 5.2.load()

Load a model.|`tensorflow.python.training.tracking.tracking.AutoTrackable`
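
A minimal usage sketch, assuming a hypothetical model URL on the mirror listed above:

```python
import tensorflow_hub as hub

# Returns an AutoTrackable object restored from the SavedModel.
model = hub.load(handle='https://hub.tensorflow.google.cn/google/imagenet/mobilenet_v2_100_224/classification/4')  # Hypothetical handle.
```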

