Last active 1728786012

LLaMA-Factory-2.md
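| Approach               | Full-tuning | Freeze-tuning | LoRA | QLoRA |
| ---------------------- | ----------- | ------------- | ---- | ----- |
| Pre-Training           |             |               |      |       |
| Supervised Fine-Tuning |             |               |      |       |
| Reward Modeling        |             |               |      |       |
| PPO Training           |             |               |      |       |
| DPO Training           |             |               |      |       |
| KTO Training           |             |               |      |       |
| ORPO Training          |             |               |      |       |
| SimPO Training         |             |               |      |       |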