# Deeplearning User Guide

## 1. Project Conventions

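### 1.1 Input Data Format
Save each class of data as an `xlsx` or `xls` file. By default, the loader reads every second column (zero-based indices 1, 3, 5, ...) as features; the contents of the remaining columns (indices 0, 2, 4, ...) are ignored.

Example layout:

| Any value | Feature value | Any value | Feature value |
|---|---|---|---|
| arbitrary value | value | arbitrary value | value |
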
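
The column rule above can be illustrated with plain Python lists. This is only a sketch of the selection logic, not the project's actual loader; `select_feature_columns` is a hypothetical helper name introduced here.

```python
def select_feature_columns(rows):
    """Keep the columns at zero-based indices 1, 3, 5, ... from each row."""
    return [row[1::2] for row in rows]


# Each row alternates: arbitrary value, feature, arbitrary value, feature.
rows = [
    ["t0", 0.12, "t0", 3.4],
    ["t1", 0.15, "t1", 3.1],
]
features = select_feature_columns(rows)
print(features)  # [[0.12, 3.4], [0.15, 3.1]]
```

With a pandas DataFrame `df`, the same selection can be written as `df.iloc[:, 1::2]`.
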
### 1.2 Directory Conventions
Place training data in `Static/`; output results are written to `Result/`.

Recommended layout:

```text
.
├─ Static/
│  └─ 20241009MaterialDiv/
└─ Result/
```

## 2. Migrating the Conda Environment

The environment files are in `conda_env/`:

- `conda_env/environment.portable.yml`: general-purpose migration (recommended)
- `conda_env/environment.lock.txt`: exact lock file (best suited to the same OS and architecture)
- `conda_env/env.yml`: legacy file

### 2.1 Creating the Environment

```bash
# Option 1 (recommended): general-purpose creation
conda env create -f conda_env/environment.portable.yml
conda activate Deeplearning

# Option 2: exact reproduction
conda create -n Deeplearning --file conda_env/environment.lock.txt
conda activate Deeplearning

# Verify
python -V
python -c "import torch; print(torch.__version__)"
```

### 2.2 When an Environment with the Same Name Already Exists

```bash
# Option A: keep the old environment and create the new one under a different name
conda env create -f conda_env/environment.portable.yml -n Deeplearning_v2
conda activate Deeplearning_v2

# Or, using the lock file
conda create -n Deeplearning_v2 --file conda_env/environment.lock.txt
conda activate Deeplearning_v2
```

```bash
# Option B: remove the old environment, then recreate it (use with caution)
conda env remove -n Deeplearning
conda env create -f conda_env/environment.portable.yml
conda activate Deeplearning
```

### 2.3 Re-exporting the Environment

```bash
conda env export -n Deeplearning --no-builds > conda_env/environment.portable.yml
conda list -n Deeplearning --explicit > conda_env/environment.lock.txt
```

## 3. Quick Start

### 3.1 Preparing the Data
1. Name the data directory `date + project name`, for example `20241009MaterialDiv`.
2. Prepare `label_names` (English letters or digits are recommended).
3. Place the data directory in `Static/`.

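
As a small illustration of the naming convention in step 1, the sketch below builds a directory name from a date and a project name. The `YYYYMMDD` date format is inferred from the example `20241009MaterialDiv`, and `make_data_dir_name` is a hypothetical helper, not part of the project.

```python
from datetime import date


def make_data_dir_name(day, project):
    """Build a directory name of the form YYYYMMDD + project name."""
    return day.strftime("%Y%m%d") + project


name = make_data_dir_name(date(2024, 10, 9), "MaterialDiv")
print(name)  # 20241009MaterialDiv
```
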
### 3.2 Data Directory Templates

Single-file mode (one file per label):

```text
Static/
  20241009MaterialDiv/
    Acrlic.xlsx
    Ecoflex.xlsx
    PDMS.xlsx
    PLA.xlsx
    Wood.xlsx
```

Multi-sub-feature mode (one subdirectory per label; each subdirectory may contain multiple files):

```text
Static/
  20241009MaterialDiv/
    Acrlic/
      sample_01.xlsx
      sample_02.xlsx
    Ecoflex/
      sample_01.xlsx
      sample_02.xlsx
    PDMS/
      sample_01.xlsx
      sample_02.xlsx
    PLA/
      sample_01.xlsx
      sample_02.xlsx
    Wood/
      sample_01.xlsx
      sample_02.xlsx
```

Naming rules (important):

- Every entry in `label_names` must exactly match a file name without its extension (single-file mode) or a subfolder name (multi-sub-feature mode).
- The order of `label_names` is the label encoding order; training results and the confusion matrix are displayed in that order.

Example:

```python
label_names = ['Acrlic', 'Ecoflex', 'PDMS', 'PLA', 'Wood']
```

Mapping:

```text
Acrlic  <-> Acrlic.xlsx  or Acrlic/
Ecoflex <-> Ecoflex.xlsx or Ecoflex/
PDMS    <-> PDMS.xlsx    or PDMS/
PLA     <-> PLA.xlsx     or PLA/
Wood    <-> Wood.xlsx    or Wood/
```

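
To make the ordering rule concrete, the sketch below shows how list position could map to an integer code. It assumes zero-based codes that follow the list order, as the rule describes; the encoder actually returned by `divSet` may differ in detail.

```python
label_names = ['Acrlic', 'Ecoflex', 'PDMS', 'PLA', 'Wood']

# Assumed encoding: each label's integer code is its position in the list.
label_to_code = {name: code for code, name in enumerate(label_names)}
print(label_to_code)
# {'Acrlic': 0, 'Ecoflex': 1, 'PDMS': 2, 'PLA': 3, 'Wood': 4}

# Row i / column i of a confusion matrix would then refer to label_names[i].
code_to_label = dict(enumerate(label_names))
print(code_to_label[2])  # PDMS
```

Under this assumption, reordering `label_names` changes both the codes and the row and column order of the reported confusion matrix.
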
### 3.3 Training Example

```python
from Qtorch.Models.Qmlp import Qmlp
from Qfunctions.divSet import divSet
from Qfunctions.loadData import load_data
from Qfunctions.saveToXlsx import save_to_xlsx

project_name = '20241009MaterialDiv'
label_names = ['Acrlic', 'Ecoflex', 'PDMS', 'PLA', 'Wood']

# The data layout is detected automatically:
# - folder/label.xlsx   => single-file mode
# - folder/label/*.xlsx => multi-sub-feature mode
data = load_data(project_name, label_names, fileClass='xlsx')

# Split into training and test sets
X_train, X_test, y_train, y_test, encoder = divSet(
    data=data,
    labels=label_names,
    test_size=0.3
)

# Build the model
model = Qmlp(
    X_train=X_train,
    X_test=X_test,
    y_train=y_train,
    y_test=y_test,
    hidden_layers=[128],
    dropout_rate=0
)

# Train and export the results
pca_2d, pca_3d = model.get_PCA()
model.fit(300)

cm = model.get_cm()
cmn = model.get_cmn()
epoch_data = model.get_epoch_data()

save_to_xlsx(project_name=project_name, file_name='pca_2d', data=pca_2d)
save_to_xlsx(project_name=project_name, file_name='pca_3d', data=pca_3d)
save_to_xlsx(project_name=project_name, file_name='cm', data=cm)
save_to_xlsx(project_name=project_name, file_name='cmn', data=cmn)
save_to_xlsx(project_name=project_name, file_name='acc_and_loss', data=epoch_data)
```

## 4. load_data Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| folder | str | required | Name of the data directory under `Static/` |
| labelNames | list | required | List of class names, used to locate the data and to order the labels |
| fileClass | str | xlsx | Data file extension |

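Auto-detection rules:

- If every `label` has a matching `folder/label/*.xlsx`, the data is treated as multi-sub-feature mode.
- If every `label` has a matching `folder/label.xlsx`, the data is treated as single-file mode.
- If both conditions hold (a same-named file and a same-named subdirectory both exist), an error is raised asking you to keep only one directory structure.
- If neither condition holds, an error is raised asking you to check the directory structure or `label_names`.
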
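
The four rules above can be sketched as a small standalone function. This is a hypothetical re-implementation for illustration only, not the real `load_data`; `detect_mode`, its parameter names, and its error messages are assumptions.

```python
import tempfile
from pathlib import Path


def detect_mode(static_root, folder, label_names, file_class="xlsx"):
    """Return 'single' or 'multi' for a data directory, or raise ValueError.

    Hypothetical helper mirroring the documented rules; not the real load_data.
    """
    base = Path(static_root) / folder
    # Single-file mode: every label has base/<label>.<file_class>.
    single = all((base / f"{label}.{file_class}").is_file() for label in label_names)
    # Multi-sub-feature mode: every label has base/<label>/ with at least one data file.
    multi = all(
        (base / label).is_dir() and any((base / label).glob(f"*.{file_class}"))
        for label in label_names
    )
    if single and multi:
        raise ValueError("Both layouts found; keep only one directory structure.")
    if not single and not multi:
        raise ValueError("No valid layout; check the directory structure or label_names.")
    return "multi" if multi else "single"


# Usage: build a throwaway single-file layout and detect it.
with tempfile.TemporaryDirectory() as root:
    data_dir = Path(root) / "20241009MaterialDiv"
    data_dir.mkdir()
    for label in ["PDMS", "PLA"]:
        (data_dir / f"{label}.xlsx").touch()
    mode = detect_mode(root, "20241009MaterialDiv", ["PDMS", "PLA"])
    print(mode)  # single
```
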
Read path rules:

- Single-file mode: `./Static/folder/labelNames[i].xlsx`
- Multi-sub-feature mode: `./Static/folder/labelNames[i]/*.xlsx`

## 5. FAQ

### 5.1 File Not Found
Check these first:

- Whether each entry in `label_names` has the same name as its file or folder
- Whether the file extension matches `fileClass`

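
Both checks can be automated with a short diagnostic. The sketch below is a hypothetical helper (`diagnose_labels` is not part of the project): it reports each label that has neither a matching file nor a matching subdirectory, and lists entries whose name differs only by letter case or by extension.

```python
import tempfile
from pathlib import Path


def diagnose_labels(data_dir, label_names, file_class="xlsx"):
    """Return {label: problem description} for labels that cannot be found."""
    data_dir = Path(data_dir)
    entries = [p.name for p in data_dir.iterdir()] if data_dir.is_dir() else []
    problems = {}
    for label in label_names:
        if f"{label}.{file_class}" in entries or (data_dir / label).is_dir():
            continue  # a same-named file or subdirectory exists
        # Look for entries that differ only by letter case or by extension.
        near = [e for e in entries if Path(e).stem.lower() == label.lower()]
        problems[label] = f"not found; similar entries: {near}" if near else "not found"
    return problems


# Usage: PDMS has the wrong extension and Wood is missing entirely.
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "PDMS.xls").touch()
    (Path(root) / "PLA.xlsx").touch()
    problems = diagnose_labels(root, ["PDMS", "PLA", "Wood"])

for label, problem in problems.items():
    print(f"{label}: {problem}")
# PDMS: not found; similar entries: ['PDMS.xls']
# Wood: not found
```
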