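Public Data

慧星云 has collected a set of commonly used public datasets so that you can use them directly in your instances.

1. Logging in

Log in to the workspace with the ssh command, for example: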

ssh -p 14917 root@hz1.dc.houdeyun.cn

2. Location of the public data

The public disk is mounted at /root/public/. You can run the df command to check that the directory is mounted correctly.
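
The mount check described above can be wrapped in a small helper. This is a minimal sketch: the function name check_public_dir is hypothetical (it is not a platform command), and the default path is the /root/public mount point stated above.

```shell
# Hypothetical helper (not provided by the platform): report which
# filesystem backs a directory, defaulting to the public disk mount point.
check_public_dir() {
    dir="${1:-/root/public}"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    # df -h prints the filesystem, its size and usage, and the mount point
    # that contains the given directory.
    df -h "$dir"
}
```

Calling check_public_dir with no argument inspects /root/public. If the "Mounted on" column shows / rather than a path under /root/public, the public disk is probably not mounted.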

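3. Usage

Users have read-only access to the public disk, so extract compressed files into a local directory of your own before using them. For example: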

unzip /root/public/ModelNet/ModelNet10.zip -d /root/ModelNet10
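
The same step can be scripted so that it also handles other archive types. The sketch below makes assumptions: the function name extract_to_local is hypothetical, and the tar branches assume that some datasets may ship as .tar or .tar.gz files, which this page does not state.

```shell
# Hypothetical helper: extract an archive from the read-only public disk
# into a writable directory. Only the .zip case comes from this page;
# the tar cases are assumptions.
extract_to_local() {
    src="$1"
    dest="$2"
    [ -f "$src" ] || { echo "archive not found: $src" >&2; return 1; }
    # Create the destination so nothing is written under /root/public.
    mkdir -p "$dest" || return 1
    case "$src" in
        *.zip)          unzip -q "$src" -d "$dest" ;;
        *.tar.gz|*.tgz) tar -xzf "$src" -C "$dest" ;;
        *.tar)          tar -xf "$src" -C "$dest" ;;
        *) echo "unsupported archive type: $src" >&2; return 1 ;;
    esac
}
```

For example, extract_to_local /root/public/ModelNet/ModelNet10.zip /root/ModelNet10 performs the same extraction as the unzip command above.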

Name	Path in instance	Size	Type	Publisher	Description
Argoverse 2.0 Sensor Dataset	/root/public/datasets/argoverse2.0-sensor	739.02 GiB	Dataset	https://argoverse.github.io	https://argoverse.github.io/user-guide/
Vimeo-90k	/root/public/datasets/Vimeo-90k	81.89 GiB	Dataset	toflow.csail.mit.edu	Vimeo-90k video super-resolution dataset
CULane	/root/public/datasets/CULane	42.45 GiB	Dataset	https://xingangpan.github.io/projects/CULane.html	CULane is a large-scale challenging dataset for academic research on traffic lane detection
TT100K	/root/public/datasets/TT100K	106.77 GiB	Dataset	https://cg.cs.tsinghua.edu.cn/traffic-sign/	Traffic sign detection and recognition dataset
CIFAR-100	/root/public/datasets/cifar-100	161.17 MiB	Dataset	https://www.cs.toronto.edu/~kriz/cifar.html	CIFAR-100 image classification dataset
CUB200-2011	/root/public/datasets/CUB200-2011	1.11 GiB	Dataset	http://www.vision.caltech.edu/datasets/cub_200_2011/	Fine-grained bird classification dataset
ModelNet	/root/public/datasets/ModelNet	2.34 GiB	Dataset	https://modelnet.cs.princeton.edu/	The goal of the Princeton ModelNet project is to provide researchers in computer vision, computer graphics, robotics and cognitive science, with a comprehensive clean collection of 3D CAD models for objects.
S3DIS	/root/public/datasets/S3DIS	14.26 GiB	Dataset	http://buildingparser.stanford.edu/dataset.html	Stanford Large-Scale 3D Indoor Spaces Dataset (S3DIS)
Aishell	/root/public/datasets/Aishell	14.51 GiB	Dataset	http://openslr.org/33/	400 people from different accent areas in China are invited to participate in the recording, which is conducted in a quiet indoor environment using high fidelity microphone and downsampled to 16kHz.
CrowdHuman	/root/public/datasets/CrowdHuman	13.25 GiB	Dataset	https://www.crowdhuman.org/	The CrowdHuman dataset is large, rich-annotated and contains high diversity. CrowdHuman contains 15000, 4370 and 5000 images for training, validation, and testing, respectively.
MsCelebV1	/root/public/datasets/MS-Celeb-1M	154.4 GiB	Dataset	http://research.microsoft.com/en-US/projects/irc/acmmm2016.aspx	Microsoft celebrity dataset
DIV2K	/root/public/datasets/DIV2K	8.45 GiB	Dataset	https://data.vision.ee.ethz.ch/cvl/DIV2K/	DIVerse 2K resolution high quality images as used for the challenges @ NTIRE (CVPR 2017 and CVPR 2018) and @ PIRM (ECCV 2018)
nuScenes	/root/public/datasets/nuScenes	548.35 GiB	Dataset	https://www.nuscenes.org/	nuScenes is a public large-scale dataset for autonomous driving. It enables researchers to study challenging urban driving situations using the full sensor suite of a real self-driving car.
CelebA	/root/public/datasets/CelebA	21.69 GiB	Dataset	http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html	CelebFaces Attributes Dataset (CelebA) is a large-scale face attributes dataset with more than 200K celebrity images, each with 40 attribute annotations.
KITTI	/root/public/datasets/KITTI	132.99 GiB	Dataset	https://www.cvlibs.net/datasets/kitti	KITTI dataset
KITTI_Depth_Completion	/root/public/datasets/KITTI/kitti_depth_completion	92.7 GiB	Dataset	http://www.cvlibs.net/datasets/kitti/eval_depth.php?benchmark=depth_completion	KITTI depth completion dataset
SemanticKITTI	/root/public/datasets/SemanticKITTI	82.83 GiB	Dataset	http://www.semantic-kitti.org/dataset.html#download	SemanticKITTI dataset
MPII Human Pose	/root/public/datasets/mpii_human_pose	11.27 GiB	Dataset	http://human-pose.mpi-inf.mpg.de/#download	MPII Human Pose dataset
MVTec AD	/root/public/datasets/mvtec-ad	4.9 GiB	Dataset	https://www.mvtec.com/company/research/datasets/mvtec-ad	Industrial anomaly detection dataset
ImageNet100	/root/public/datasets/ImageNet100	13.41 GiB	Dataset	image-net.org	100-class ImageNet dataset. Reference: https://github.com/HobbitLong/CMC/blob/master/imagenet100.txt
ImageNet	/root/public/datasets/imagenet-1k	157.56 GiB	Dataset	image-net.org	ImageNet 1000-class classification dataset
SAIL-VOS	/root/public/datasets/SAIL-VOS	173.18 GiB	Dataset	https://sailvos.web.illinois.edu/_site/dataset_readme.html	Semantic amodal instance-level video object segmentation dataset (available in Inner Mongolia Zone A)
MOT17	/root/public/datasets/mot17	5.46 GiB	Dataset	https://motchallenge.net/data/MOT17/	MOT17 Challenge
Cityscapes	/root/public/datasets/cityscapes	11.03 GiB	Dataset	www.cityscapes-dataset.net	Urban street-scene instance/semantic segmentation
GOT10k	/root/public/datasets/GOT10k	71.11 GiB	Dataset	got-10k.aitestunion.com	Large-scale object tracking dataset
MOT20	/root/public/datasets/mot20	4.7 GiB	Dataset	motchallenge.net/data/MOT20/	Pedestrian tracking dataset for dense crowds (multi-object tracking)
CASIAWebFace	/root/public/datasets/CASIAWebFace	4.1 GiB	Dataset	www.cbsr.ia.ac.cn/english/CASIA-WebFace-Database.html	Large-scale face dataset, mainly used for identity verification and face recognition; contains 10,575 subjects and 494,414 images
DOTA v1	/root/public/datasets/DOTA	18.83 GiB	Dataset	captain-whu.github.io/DOTA	Object detection dataset for aerial images
ADEChallengeData2016	/root/public/datasets/ADEChallengeData2016	1.1 GiB	Dataset	sceneparsing.csail.mit.edu	ADE20K scene semantic segmentation dataset
COCO 2017	/root/public/datasets/coco2017	25.19 GiB	Dataset	Microsoft	COCO 2017 detection dataset
CIFAR10	/root/public/datasets/cifar-10	163 MB	Dataset	www.cs.toronto.edu	CIFAR-10 classification dataset
PASCAL VOC2012	/root/public/datasets/voc2012	1.8 GiB	Dataset	host.robots.ox.ac.uk	VOC 2012 detection and semantic segmentation dataset
PASCAL VOC2007	/root/public/datasets/voc2007	837 MB	Dataset	host.robots.ox.ac.uk	VOC 2007 detection and semantic segmentation dataset
RoBERTa pretrained model (Torch)	/root/public/models/RoBERTa-Pretrain-Model	1.06 GiB	Model	Reference: https://docs.qq.com/sheet/DVnpkTnF6VW9UeXdh?tab=BB08J2	RoBERTa pretrained model
Open-source Chinese-English bilingual chat model	/root/public/models/chatglm2-6b	11.63 GiB	Model	https://huggingface.co/THUDM/chatglm2-6b	ChatGLM2-6B is the second-generation version of the open-source Chinese-English bilingual chat model ChatGLM-6B
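
Several entries in the table are hundreds of GiB in size, so it may be worth checking free space before copying or extracting one. The following is a rough sketch: has_space_for is a hypothetical helper name, and it compares the on-disk size of the source with the free space on the destination filesystem.

```shell
# Hypothetical helper: succeed only if the filesystem holding "dest" has
# more free space than "src" currently occupies on disk.
has_space_for() {
    src="$1"
    dest="$2"
    # du -sk: total size of src in KiB; df -Pk: free KiB on dest's filesystem.
    need_kb=$(du -sk "$src" 2>/dev/null | awk '{print $1}')
    free_kb=$(df -Pk "$dest" 2>/dev/null | awk 'NR==2 {print $4}')
    # Either value is empty when src or dest does not exist.
    if [ -z "$need_kb" ] || [ -z "$free_kb" ]; then
        echo "cannot measure: $src or $dest" >&2
        return 1
    fi
    echo "need ${need_kb} KiB, free ${free_kb} KiB"
    [ "$free_kb" -gt "$need_kb" ]
}
```

A call such as has_space_for /root/public/datasets/nuScenes /root checks whether /root can hold a copy of that directory. For compressed files, the extracted data will usually be larger than the archive, so treat a narrow margin as insufficient.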

2024-12-06
