Problem description
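Background: our lab's compute cluster runs the ROCKS cluster management software on CentOS, with TORQUE as the PBS implementation.

Problem: after I submit a job, it stays in the Q state indefinitely, waiting to be scheduled.

[bgb@cluster test]$ qstat
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
25.cluster                0.01-0.00001     bgb                    0 Q default

Forcing it to run:

[bgb@cluster test]$ qrun 25.cluster
pbs_iff: Access from host not allowed, or unknown host MSG=request not authorized from host cluster.local
pbs_iff: Access from host not allowed, or unknown host MSG=request not authorized from host cluster.local
qrun: Unknown Job Id MSG=cannot locate job 25.cluster.local

I looked up pbs_iff. It is said to be related to user authentication and to supply PBS credentials to the PBS server, but opening pbs_iff in vi shows nothing but garbage. I have spent a whole day on this and still cannot tell what the cause is.

I also have a question about PBS queue configuration. In the /opt/troque directory I found a file named pbs.default (default is a queue I defined). Its contents are:

#
# Create and define queue default
#
create queue default
set queue default queue_type = Execution
set queue default keep_completed = 120
set queue default enabled = True
set queue default started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_host_enable = False
set server managers = maui@cluster.hpc.org
set server managers += root@cluster.hpc.org
set server default_queue = default
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server allow_node_submit = True
set server moab_array_compatible = True

This is not the same as the queue configuration reported by the qmgr -c 'p s' command:

[bgb@cluster test]$ qmgr -c 'p s'
#
# Create queues and set their attributes.
#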
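#
# Create and define queue default
#
create queue default
set queue default queue_type = Execution
set queue default acl_host_enable = True
set queue default acl_user_enable = True
set queue default acl_users = bgb
set queue default enabled = True
set queue default started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_host_enable = True
set server acl_hosts = cluster.hpc.org
set server acl_users = root@*
set server default_queue = default
set server log_events = 511
set server mail_from = adm
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server poll_jobs = True
set server mom_job_sync = True
set server auto_node_np = True
set server next_job_number = 26

This differs from the contents of pbs.default, so which configuration is actually in effect? Part of this second configuration was written by me, so it probably contains quite a few mistakes. I would be grateful if the more experienced users here could point them out. Thanks.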
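To see exactly which settings differ between the pbs.default file and the qmgr -c 'p s' listing, the two can be normalized and diffed. The following is only a rough sketch: normalize_qmgr and compare_qmgr are hypothetical helper names (not part of TORQUE), and /tmp/live.conf is an arbitrary scratch path. The qmgr listing is what the running server reports, so lines marked ">" in the usage shown in the comments would be settings present only on the live server.

```shell
#!/bin/sh
# Hypothetical helpers (not TORQUE commands) for comparing two qmgr-style listings.

# Strip comment lines and blank lines, collapse runs of spaces/tabs,
# trim leading/trailing spaces, and sort so that ordering does not matter.
normalize_qmgr() {
    grep -v '^[[:space:]]*#' "$1" \
        | grep -v '^[[:space:]]*$' \
        | tr '\t' ' ' \
        | tr -s ' ' \
        | sed 's/^ //; s/ $//' \
        | sort
}

# Print the settings that differ between listing $1 and listing $2.
# "<" lines appear only in $1, ">" lines appear only in $2.
# Returns diff's exit status (0 = identical, 1 = different).
compare_qmgr() {
    a=$(mktemp) && b=$(mktemp) || return 1
    normalize_qmgr "$1" > "$a"
    normalize_qmgr "$2" > "$b"
    diff "$a" "$b"
    rc=$?
    rm -f "$a" "$b"
    return $rc
}

# Usage (assumed paths, run on the head node):
#   qmgr -c 'p s' > /tmp/live.conf
#   compare_qmgr /opt/troque/pbs.default /tmp/live.conf
```

With the two listings from this post, the output should include the acl_host_enable and acl_hosts lines, which seem worth checking against the "request not authorized from host cluster.local" message.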