glowz/data-prepare
10 Commits
1 Branch
0 Tags
main
glowz
40262648c4
Add keywords for multiple categories, optimize data processing logic, and support extracting and filtering paper data from arXiv
2025-07-30 23:05:31 +08:00
01-pre-multi.py
Add keywords for multiple categories, optimize data processing logic, and support extracting and filtering paper data from arXiv
2025-07-30 23:05:31 +08:00
01-pre.py
Add keywords for multiple categories, optimize data processing logic, and support extracting and filtering paper data from arXiv
2025-07-30 23:05:31 +08:00
02-data_select_date_len.py
Add data processing scripts covering raw-data filtering, sampling, and conversion to Alpaca format
2025-06-09 14:39:07 +08:00
03-data_select_random.py
Add data processing scripts covering raw-data filtering, sampling, and conversion to Alpaca format
2025-06-09 14:39:07 +08:00
03-data_select_ratio.py
Add keywords for multiple categories, optimize data processing logic, and support extracting and filtering paper data from arXiv
2025-07-30 23:05:31 +08:00
04-data2swift.py
swift
2025-07-18 18:00:04 +08:00
05-data-csv-swift-pretrain.py
swift
2025-07-18 18:00:04 +08:00
05-data-csv-swift-sft.py
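Update data conversion to extract information from the new format and generate multiple question templates; optimize input and output file paths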
2025-07-19 17:06:10 +08:00
05-data-csv-xtuner.py
Add data processing scripts covering raw-data filtering, sampling, and conversion to Alpaca format
2025-06-09 14:39:07 +08:00
05-data-swfit-pretrain-revise.py
swift
2025-07-18 18:00:04 +08:00
05-data-swfit-sft2multi_type-crawl.py
Add keywords for multiple categories, optimize data processing logic, and support extracting and filtering paper data from arXiv
2025-07-30 23:05:31 +08:00
05-data-swfit-sft2multi_type.py
Add batch retrieval of paper data from arXiv, save the results in JSONL format, and optimize the data processing flow
2025-07-28 06:11:49 +08:00
05-data-swfit-sft2pretrain.py
swift
2025-07-18 18:00:04 +08:00
05-data-swfit-xtuner.py
swift
2025-07-18 18:00:04 +08:00
05-data-xtuner-swfit.py
swift
2025-07-18 18:00:04 +08:00
06-data-swift-compose.py
Add keywords for multiple categories, optimize data processing logic, and support extracting and filtering paper data from arXiv
2025-07-30 23:05:31 +08:00
06-data-xtuner-compose.py
swift
2025-07-18 18:00:04 +08:00
arxiv-metadata-oai-snapshot--swift-26-500-m.jsonl
Add validation analysis script for classification results
2025-07-20 21:04:08 +08:00
arxiv-metadata-oai-snapshot--swift-26-500.json
swift
2025-07-18 18:00:04 +08:00
arxiv-metadata-oai-snapshot--swift-26-500.jsonl
Add validation analysis script for classification results
2025-07-20 21:04:08 +08:00
arxiv-metadata-oai-snapshot--swift-26-m.jsonl
Add validation analysis script for classification results
2025-07-20 21:04:08 +08:00
arxiv-metadata-oai-snapshot--swift-26.json
swift
2025-07-18 18:00:04 +08:00
arxiv-metadata-oai-snapshot--swift-26.jsonl
Add validation analysis script for classification results
2025-07-20 21:04:08 +08:00
arxiv-metadata-oai-snapshot--swift-26.jsonl.txt
swift
2025-07-18 18:00:04 +08:00
arxiv-metadata-oai-snapshot--swift-pretrain-26.jsonl
swift
2025-07-18 18:00:04 +08:00
crawl-arxiv.py
Add batch retrieval of paper data from arXiv, save the results in JSONL format, and optimize the data processing flow
2025-07-28 06:11:49 +08:00
README.md
first commit
2025-06-09 14:21:39 +08:00
val_test.py
Add keywords for multiple categories, optimize data processing logic, and support extracting and filtering paper data from arXiv
2025-07-30 23:05:31 +08:00
README.md
The file is empty.
Description: No description provided
Size: 19 MiB
Languages: Python 100%