- Embedding.reset_parameters: normal_(std=0.02) - Linear.reset_parameters: kaiming_uniform_ + uniform_ bias - Transformer._init_weights recursively calls each submodule's reset_parameters via apply - Remove the global normal_(0.006) override so that each module uses a more suitable distribution |
||
|---|---|---|
| config | ||
| dataset | ||
| inference | ||
| model | ||
| parallel | ||
| tokenize | ||
| trainer | ||
| __init__.py | ||
| factory.py | ||
| serialization.py | ||
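
The initialization scheme described in the commit message can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: it assumes the modules subclass `torch.nn.Embedding` and `torch.nn.Linear`, and the `TinyTransformer` class, its layer names, and its sizes are hypothetical. The `a=math.sqrt(5)` argument mirrors PyTorch's default `nn.Linear` initialization and is likewise an assumption.

```python
import math

import torch
from torch import nn


class Embedding(nn.Embedding):
    def reset_parameters(self) -> None:
        # Small-variance normal init, as named in the commit message.
        nn.init.normal_(self.weight, mean=0.0, std=0.02)
        # Keep the padding row at zero, as the stock nn.Embedding does.
        if self.padding_idx is not None:
            with torch.no_grad():
                self.weight[self.padding_idx].fill_(0)


class Linear(nn.Linear):
    def reset_parameters(self) -> None:
        # Kaiming-uniform weights; a=sqrt(5) is an assumed choice that
        # matches PyTorch's default for nn.Linear.
        nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if self.bias is not None:
            fan_in = self.weight.size(1)
            bound = 1.0 / math.sqrt(fan_in) if fan_in > 0 else 0.0
            nn.init.uniform_(self.bias, -bound, bound)


class TinyTransformer(nn.Module):
    """Hypothetical stand-in for the repository's Transformer."""

    def __init__(self, vocab_size: int = 100, dim: int = 16) -> None:
        super().__init__()
        self.embed = Embedding(vocab_size, dim)
        self.proj = Linear(dim, dim)
        self.norm = nn.LayerNorm(dim)
        self._init_weights()

    def _init_weights(self) -> None:
        # Module.apply visits every submodule (and self) recursively.
        self.apply(self._reset_module)

    @staticmethod
    def _reset_module(module: nn.Module) -> None:
        # Delegate to each module's own reset_parameters instead of
        # overwriting all weights with one global distribution.
        reset = getattr(module, "reset_parameters", None)
        if callable(reset):
            reset()
```

Delegating to `reset_parameters` keeps each layer's distribution next to the layer definition, so modules such as `nn.LayerNorm` keep their own defaults (weights of one, biases of zero) rather than being overwritten by a blanket normal initialization.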