Step 2: error loading optimizer.pt when running "/get_train_lora_grads.sh" #4
Comments
Same error, have you solved it? |
Hi, what transformers version are you using? I updated the requirement file to specify |
I am getting the same error despite using the same transformers version! |
Same error using |
Hi, I realized that you would have to use fsdp to get the I am sure there is a workaround to get key-value based optimization states from index-value based optimization states, and one can probably reuse functions from optimizer.state_dict() in huggingface. |
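The workaround described above (going from index-keyed to name-keyed optimizer states) can be sketched roughly as follows. This is only an illustration, not the repository's code: it assumes the saved file follows the layout of PyTorch's `Optimizer.state_dict()` (integer keys under `"state"`, integer indices under each group's `"params"`), and the function name `remap_state_to_names`, the toy dictionary, and the toy parameter names are all made up for the example. The caller has to supply the parameter names per group in the same order the parameters were passed to the optimizer.

```python
def remap_state_to_names(saved_state_dict, group_param_names):
    """Return {param_name: per-parameter state} from an index-keyed state dict.

    saved_state_dict is assumed to look like torch's Optimizer.state_dict():
      {"state": {int_index: {...}},
       "param_groups": [{"params": [int_index, ...], ...}, ...]}
    group_param_names[i] lists the names of group i's parameters, in the
    order they were given to the optimizer (the caller must guarantee this).
    """
    named_state = {}
    groups = saved_state_dict["param_groups"]
    for group, names in zip(groups, group_param_names):
        if len(group["params"]) != len(names):
            raise ValueError("name list does not match the saved parameter group")
        for idx, name in zip(group["params"], names):
            # Parameters that never received a gradient have no state entry.
            if idx in saved_state_dict["state"]:
                named_state[name] = saved_state_dict["state"][idx]
    return named_state


# Hand-written toy input (placeholder values, not real training output).
toy_state_dict = {
    "state": {
        0: {"exp_avg": [0.1, 0.2], "exp_avg_sq": [0.01, 0.04]},
        1: {"exp_avg": [0.3], "exp_avg_sq": [0.09]},
    },
    "param_groups": [{"weight_decay": 0.0, "params": [0, 1]}],
}
# Hypothetical parameter names, one list per parameter group.
toy_names = [["layers.0.lora_A.weight", "layers.0.lora_B.weight"]]

named = remap_state_to_names(toy_state_dict, toy_names)
print(sorted(named))
```

The hard part in practice is the ordering assumption: the name lists must reproduce exactly how the trainer grouped and ordered the parameters when it built the optimizer, otherwise the states get attached to the wrong names.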
Hi, thank you for your solution. I added the arguments, and there is a new error:
|
It seems to be a flatten issue, could you provide the script and code you ran? |
Thank you for your response! I run the
|
@xiamengzhou Hi, I got the same error (KeyError: base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight), while I am loading optimizer.pt instead of optimizer.bin. Is there a way to solve this? |
I encountered the same error. When I tried running it without the --fsdp 'full_shard auto_wrap' --fsdp_config llama_finetune settings, I received optimizer.pt. Then, after modifying the code in get_info.py from optimizer.bin to optimizer.pt, I encountered a "KeyError" related to 'base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight'. Has anyone found a solution to this issue? |
@Tantor-D @RrankPyramid Could you check what the keys are like in your |
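For anyone who wants to check the keys, a small helper along these lines may help. It is a sketch under assumptions: `summarize_optimizer_keys` is a hypothetical name, and the toy dictionary only mimics the layout of a saved PyTorch optimizer state. With a real checkpoint you would first load the file, for example with `torch.load(path, map_location="cpu")`, and pass the result in.

```python
def summarize_optimizer_keys(saved_state_dict, preview=5):
    """Report how the per-parameter states in a saved optimizer are keyed."""
    keys = list(saved_state_dict["state"].keys())
    return {
        "num_entries": len(keys),
        # Integer keys mean the file is index-based; string keys mean name-based.
        "key_types": sorted({type(k).__name__ for k in keys}),
        "first_keys": keys[:preview],
    }


# Toy stand-in for a loaded optimizer.pt whose state is keyed 0..255
# (placeholder values, not real optimizer output).
toy_state_dict = {
    "state": {i: {"exp_avg": None, "exp_avg_sq": None} for i in range(256)},
    "param_groups": [{"params": list(range(256))}],
}

summary = summarize_optimizer_keys(toy_state_dict)
print(summary)
```

If `key_types` comes back as integers only, a lookup by parameter name such as `base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight` will raise a KeyError, which matches the error reported in this thread.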
@xiamengzhou
However, the
I will add |
Hi @Tantor-D, have you found a solution yet? After I add I have a new error:
|
@tengerye I solved the error by adding Here is the changed version.
training_args="$base_training_args \
--fsdp 'full_shard auto_wrap' \
--fsdp_config llama_finetune \
--model_name_or_path $model_path \
--output_dir $output_dir \
--percentage $percentage \
--data_seed $data_seed \
--train_files ${train_files[@]} 2>&1 | tee $output_dir/train.log" |
@Tantor-D Thank you so much for your kind reply. My problem came from the wrong version of my environment packages and it has been solved. |
Hi @xiamengzhou, I have another question about the code. After testing it, I found that two rounds of warmup training are needed: first, I have to disable --fsdp 'full_shard auto_wrap' --fsdp_config llama_finetune to finish one round of training and get optimizer1.bin, and then enable --fsdp 'full_shard auto_wrap' --fsdp_config llama_finetune for another round of training to get optimizer2.bin. After that, I have to move optimizer2.bin to optimizer1.bin because of the key problem, i.e. KeyError: 'base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight'. Is there a way to merge the two and get the warmup model with a single round of training? Thanks. |
The problem has been solved. Thanks |
Hi, I used this code and Step 1 ran smoothly, but at Step 2 I encountered the optimizer.bin not found problem. |
I have a very basic workaround for this index-value-based file. There are probably bugs, but so far it seems to work:
import torch
from transformers.optimization import AdamW
from transformers.trainer_pt_utils import get_parameter_names
from transformers.pytorch_utils import ALL_LAYERNORM_LAYERS


def load_adam_state(model, optimizer_state_path):
    # Rebuild the two parameter groups (decay / no decay) in the same order the
    # trainer used, so the saved integer indices line up with parameter names.
    opt_grouped_parameters = [{'weight_decay': 0.0}, {'weight_decay': 0.0}]
    opt_grouped_parameter_names = [None, None]
    decay_parameters = [name for name in get_parameter_names(model, ALL_LAYERNORM_LAYERS) if 'bias' not in name]
    # Group 0: trainable parameters subject to weight decay.
    opt_grouped_parameters[0]['params'], opt_grouped_parameter_names[0] = zip(*[
        (p, n) for n, p in model.named_parameters() if n in decay_parameters and p.requires_grad])
    param_name_to_size_dict = {n: p.size() for n, p in model.named_parameters() if p.requires_grad}
    # Group 1: the remaining trainable parameters, if there are any.
    if len(param_name_to_size_dict) != len(opt_grouped_parameter_names[0]):
        opt_grouped_parameters[1]['params'], opt_grouped_parameter_names[1] = zip(*[
            (p, n) for n, p in model.named_parameters() if n not in decay_parameters and p.requires_grad])
    else:
        opt_grouped_parameters[1]['params'], opt_grouped_parameter_names[1] = [], []

    # Load the index-keyed state into a freshly built optimizer.
    optimizer = AdamW(opt_grouped_parameters)
    optimizer.load_state_dict(torch.load(optimizer_state_path, map_location='cpu'))
    saved_state_dict = optimizer.state_dict()

    # Translate each integer index back to its parameter name.
    param_name_to_saved_state_dict = {}
    for group_idx in range(len(saved_state_dict['param_groups'])):
        group_param_indices = saved_state_dict['param_groups'][group_idx]['params']
        group_param_names = opt_grouped_parameter_names[group_idx]
        for param_idx, param_name in zip(group_param_indices, group_param_names):
            param_size = param_name_to_size_dict[param_name]
            exp_avg = saved_state_dict['state'][param_idx]['exp_avg']
            exp_avg_sq = saved_state_dict['state'][param_idx]['exp_avg_sq']
            # Sanity check: the moment estimate must have the parameter's shape.
            assert exp_avg.size() == param_size
            param_name_to_saved_state_dict[param_name] = {'exp_avg': exp_avg, 'exp_avg_sq': exp_avg_sq}
    return param_name_to_saved_state_dict
|
When I add fsdp to the training script,
contains (envdc) bli303@9b72496437af:~/dc/out/llama2-7b-p0.05-lora-seed3_fsdp/checkpoint-105$ ls |
When loading optimizer.pt, the keys are different:
KeyError: 'base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight'
The entries in the optimizer.pt state are keyed 0~255.