3.2. Testing the Condor Roll

  1. First, make sure the condor daemons are running by executing:

    # ps -ef | grep condor

    On the frontend, the output should be similar to the following:

    condor    2623     1  0 Apr19 ?        00:04:26 /opt/condor/sbin/condor_master
    condor    2646  2623  0 Apr19 ?        00:20:25 condor_collector -f
    condor    2647  2623  0 Apr19 ?        00:04:56 condor_negotiator -f
    condor    2649  2623  0 Apr19 ?        00:00:02 condor_schedd -f

    And on the compute nodes, the output should be similar to the following:

    condor   17007     1  0 Apr19 ?        00:01:09 /opt/condor/sbin/condor_master
    condor   17009 17007  0 Apr19 ?        00:00:02 condor_schedd -f
    condor   17010 17007  0 Apr19 ?        00:09:09 condor_startd -f
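
    If the daemons are not running, they can usually be started with the condor init script that the roll installs (the exact script name and location may differ on your installation):

    # /etc/init.d/condor start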
  2. Try a test job submission.

    # su - condor 
    $ cd ~condor/tests
    $ condor_submit subs/hmmpfam3
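
    condor_submit should report how many jobs were queued and which cluster they were assigned to; the output will look roughly like this (one dot per job):

    Submitting job(s)..........
    Logging submit event(s)..........
    10 job(s) submitted to cluster 1.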
  3. Check that the jobs have been submitted by executing:

    $ condor_q

    The output should be similar to:

    -- Submitter: rocks-155.sdsc.edu : <198.202.88.155:47289> : rocks-155.sdsc.edu
     ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
       1.0   condor          4/27 23:02   0+00:00:48 R  0   0.2  hmmpfam data/db100
       1.1   condor          4/27 23:02   0+00:00:46 R  0   0.2  hmmpfam data/db100
       1.2   condor          4/27 23:02   0+00:00:44 R  0   0.2  hmmpfam data/db100
       1.3   condor          4/27 23:02   0+00:00:42 R  0   0.2  hmmpfam data/db100
       1.4   condor          4/27 23:02   0+00:00:38 R  0   0.2  hmmpfam data/db100
       1.5   condor          4/27 23:02   0+00:00:36 R  0   0.2  hmmpfam data/db100
       1.6   condor          4/27 23:02   0+00:00:34 R  0   0.2  hmmpfam data/db100
       1.7   condor          4/27 23:02   0+00:00:40 R  0   0.2  hmmpfam data/db100
       1.8   condor          4/27 23:02   0+00:00:32 I  0   0.2  hmmpfam data/db100
       1.9   condor          4/27 23:02   0+00:00:30 I  0   0.2  hmmpfam data/db100

    An R in the status column (ST) means the job is running; an I means it is idle. The output from the jobs will be in results/.
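
    If a job stays idle longer than expected, condor_q can analyze why it has not been matched to a machine, and condor_rm removes jobs from the queue. For example, for cluster 1 above:

    $ condor_q -analyze 1.8
    $ condor_rm 1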

  4. Once the queue is empty (the above command shows no jobs), you can see the history of job execution with:

    $ condor_history

    To see all the nodes in the condor pool, run:

    $ condor_status

    The output should be similar to:

    Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime
    
    vm1@compute-0 LINUX       INTEL  Unclaimed  Idle       0.000   506  0+00:40:04
    vm2@compute-0 LINUX       INTEL  Unclaimed  Idle       0.000   506  0+00:45:05
    vm3@compute-0 LINUX       INTEL  Unclaimed  Idle       0.000   506  0+00:45:06
    vm4@compute-0 LINUX       INTEL  Unclaimed  Idle       0.000   506  0+00:45:07
    vm1@compute-0 LINUX       INTEL  Unclaimed  Idle       0.000   506  0+00:35:04
    vm2@compute-0 LINUX       INTEL  Unclaimed  Idle       0.000   506  0+00:40:05
    vm3@compute-0 LINUX       INTEL  Unclaimed  Idle       0.000   506  0+00:40:06
    vm4@compute-0 LINUX       INTEL  Unclaimed  Idle       0.000   506  0+00:40:07
    vm1@compute-0 LINUX       INTEL  Unclaimed  Idle       0.000   506  0+00:25:04
    vm2@compute-0 LINUX       INTEL  Unclaimed  Idle       0.000   506  0+00:30:05
    vm3@compute-0 LINUX       INTEL  Unclaimed  Idle       0.000   506  0+00:30:06
    vm4@compute-0 LINUX       INTEL  Unclaimed  Idle       0.000   506  0+00:30:07
    vm1@compute-0 LINUX       INTEL  Unclaimed  Idle       0.000   506  0+00:15:05
    vm2@compute-0 LINUX       INTEL  Unclaimed  Idle       0.000   506  0+00:20:06
    vm3@compute-0 LINUX       INTEL  Unclaimed  Idle       0.000   506  0+00:20:07
    vm4@compute-0 LINUX       INTEL  Unclaimed  Idle       0.000   506  0+00:20:08
    vm1@compute-0 LINUX       INTEL  Unclaimed  Idle       0.000   506  0+00:10:04
    vm2@compute-0 LINUX       INTEL  Unclaimed  Idle       0.000   506  0+00:15:05
    vm3@compute-0 LINUX       INTEL  Unclaimed  Idle       0.000   506  0+00:15:06
    vm4@compute-0 LINUX       INTEL  Unclaimed  Idle       0.000   506  0+00:15:07
    vm1@compute-0 LINUX       INTEL  Unclaimed  Idle       0.000   506  0+00:00:04
    vm2@compute-0 LINUX       INTEL  Unclaimed  Idle       0.000   506  0+00:05:05
    vm3@compute-0 LINUX       INTEL  Unclaimed  Idle       0.000   506  0+00:05:06
    vm4@compute-0 LINUX       INTEL  Unclaimed  Idle       0.000   506  0+00:05:07
    vm1@compute-0 LINUX       INTEL  Owner      Idle       0.860   506  0+00:00:09
    vm2@compute-0 LINUX       INTEL  Unclaimed  Idle       0.000   506  0+00:00:05
    vm3@compute-0 LINUX       INTEL  Unclaimed  Idle       0.000   506  0+00:00:06
    vm4@compute-0 LINUX       INTEL  Unclaimed  Idle       0.000   506  0+00:00:07
    
                         Machines Owner Claimed Unclaimed Matched Preempting
    
             INTEL/LINUX       28     1       0        27       0          0
    
                   Total       28     1       0        27       0          0
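
    While test jobs are running, you can restrict the listing to machines that are currently running jobs; for example:

    $ condor_status -claimed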
  5. The directory ~condor/tests has a few test programs with corresponding job submit files for running test jobs in different condor universes. The test programs are in bin/, and the submit files are in subs/. The output of the jobs, if any, goes to results/. To run these tests as the condor user, simply execute the condor_submit command followed by the desired submit file name from subs/; a sketch of what such a submit file might contain appears after the example. For example:

    $ cd ~/tests
    $ condor_submit subs/hmmpfam3
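
    As a rough sketch, a vanilla-universe submit file such as subs/hmmpfam3 might look like the following. The paths, arguments, and job count here are illustrative and may not match the file's exact contents:

    # illustrative vanilla-universe submit description, not the actual file
    universe   = vanilla
    executable = bin/hmmpfam
    arguments  = data/db100
    output     = results/hmmpfam.$(Process).out
    error      = results/hmmpfam.$(Process).err
    log        = results/hmmpfam.log
    queue 10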

    Note

    The test program tests/bin/simple.mpi and its submit file tests/subs/submit_mpi are provided only as a reference. The current condor binaries do not work with MPI programs compiled with an mpicc version higher than v1.2.4. If you wish to run jobs in the MPI universe, your programs should be compiled with MPI versions 1.2.2 through 1.2.4.
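
    Depending on your MPI installation, you may be able to check which MPI version your mpicc wraps with something like:

    $ mpicc -v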