How to get job working dir in Sun Grid Engine?

Suppose we submit jobs with qsub from the path /path/to/working/dir. Once the jobs are pending, we can get some information about them with qstat, but qstat does not report the working directory.
How can we get it?

If you run qstat -j jobId, the output has a field "sge_o_workdir:" that should contain the info you need. For example, if your job ID is 1234, your command would look like this:
qstat -j 1234
and the output would look like this:
==============================================================
job_number: 1234
exec_file: job_scripts/1234
submission_time: Wed Oct 10 19:00:03 2012
owner: user
uid: 1000
group: group
gid: 1000
sge_o_home: /home/user
sge_o_log_name: user
sge_o_path: /usr/local/packages/sge-root/bin/lx24-amd64:/usr/bin:/bin
sge_o_shell: /bin/sh
sge_o_workdir: /path/to/workDir
sge_o_host: host
account: sge
stderr_path_list: NONE:NONE:/path/to/error/
mail_list:
notify: FALSE
job_name: myJobName
stdout_path_list: NONE:NONE:/path/to/output/
jobshare: 0
hard_queue_list: all.q
env_list:
script_file: /some/script.sh
project:
usage 1: cpu=00:28:22, mem=83.71677 GBs, io=252.35721, vmem=234.090M, maxvmem=256.438M
scheduling info:
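If you only want the directory itself, you can filter for that one field; a minimal sketch, assuming the output format shown above:
qstat -j 1234 | awk '/^sge_o_workdir:/ {print $2}'    # prints /path/to/workDir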

Related

Setting up GitLab CI for NodeJS deployment on an AWS Ubuntu instance

I have an AWS Ubuntu instance with GitLab CE configured. Now I want to configure GitLab CI to deploy my NodeJS app after each commit. I don't have a proper step-by-step solution for this.
My NodeJS app runs in /var/www/mean/my-app on http://myapp.mydomain.com, and the hosting is handled by an Apache proxy:
<VirtualHost *:80>
ServerAdmin anshad@mydomain.com
ServerName gitlab.mydomain.com
ServerAlias www.gitlab.mydomain.com
ServerSignature Off
ProxyPreserveHost On
AllowEncodedSlashes NoDecode
<Location />
Require all granted
ProxyPassReverse http://localhost:8080
ProxyPassReverse http://gitlab.mydomain.com/
</Location>
RewriteEngine on
RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_FILENAME} !-f [OR]
RewriteCond %{REQUEST_URI} ^/uploads/.*
RewriteRule .* http://127.0.0.1:8080%{REQUEST_URI} [P,QSA,NE]
DocumentRoot /home/git/gitlab/public
LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b" common_forwarded
ErrorLog /var/log/apache2/gitlab_error.log
CustomLog /var/log/apache2/gitlab_forwarded.log common_forwarded
CustomLog /var/log/apache2/gitlab_access.log combined env=!dontlog
CustomLog /var/log/apache2/gitlab.log combined
</VirtualHost>
And the app is bootstrapped using the forever module:
forever start app.js
The GitLab config check sudo -u git -H bundle exec rake gitlab:check RAILS_ENV=production gives:
Checking GitLab Shell ...
GitLab Shell version >= 4.0.0 ? ... OK (4.0.0)
Repo base directory exists?
default... yes
Repo storage directories are symlinks?
default... no
Repo paths owned by git:git?
default... yes
Repo paths access is drwxrws---?
default... yes
hooks directories in repos are links: ...
dev / my-app ... ok
Running /home/git/gitlab-shell/bin/check
Check GitLab API access: OK
Access to /home/git/.ssh/authorized_keys: OK
Send ping to redis server: OK
gitlab-shell self-check successful
Checking GitLab Shell ... Finished
Checking Sidekiq ...
Running? ... yes
Number of Sidekiq processes ... 1
Checking Sidekiq ... Finished
Checking Reply by email ...
Reply by email is disabled in config/gitlab.yml
Checking Reply by email ... Finished
Checking LDAP ...
LDAP is disabled in config/gitlab.yml
Checking LDAP ... Finished
Checking GitLab ...
Git configured with autocrlf=input? ... yes
Database config exists? ... yes
All migrations up? ... yes
Database contains orphaned GroupMembers? ... no
GitLab config exists? ... yes
GitLab config outdated? ... no
Log directory writable? ... yes
Tmp directory writable? ... yes
Uploads directory setup correctly? ... yes
Init script exists? ... yes
Init script up-to-date? ... yes
projects have namespace: ...
dev / my-app ... yes
Redis version >= 2.8.0? ... yes
Ruby version >= 2.1.0 ? ... yes (2.3.1)
Your git bin path is "/usr/bin/git"
Git version >= 2.7.3 ? ... yes (2.7.4)
Active users: 1
Checking GitLab ... Finished
I log in to the instance over SSH from my system:
ssh -i API-Key.pem ubuntu@ec2-XX-XX-XXX-XXX.ap-south-1.compute.amazonaws.com
I created the key using the command:
ssh-keygen -t rsa
Runner config in /etc/gitlab-runner/config.toml:
concurrent = 1
check_interval = 0
[[runners]]
name = "Production Runner"
url = "http://gitlab.mydomain.com/ci"
token = "xxxxxxxxxxxxxxxxxxxxxxxxxxx"
executor = "ssh"
[runners.ssh]
user = "ubuntu"
host = "ip-XXX-XX-XX-XXX"
identity_file = "/home/ubuntu/.ssh/id_rsa"
[runners.cache]
Code in .gitlab-ci.yml:
test_async:
  script:
    - npm install
Because of my bad configuration, the runner gives this error:
Running with gitlab-ci-multi-runner 1.7.1 (f896af7)
Using SSH executor...
ERROR: Preparation failed: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
Will be retried in 3s ...
My questions are:
What should the content of the .gitlab-ci.yml file be to deploy the committed code to the application location?
And how do I configure a runner for this? If I have to use an SSH runner, what should its configuration be?
Update:
After providing the .pem file as identity_file, I get the following error:
Running with gitlab-ci-multi-runner 1.7.1 (f896af7)
Using SSH executor...
Running on ip-xxx-xx-xx-xxx via ip-xxx-xx-xx-xxx...
Cloning repository...
Cloning into 'builds/a92f1b91/0/dev/my-app'...
fatal: unable to access 'http://gitlab-ci-token:xxxxxxxxxxxxxxxxxxxx@gitlab.mydomain.com/dev/my-app.git/': The requested URL returned error: 500
ERROR: Build failed: Process exited with: 1. Reason was: ()
Now there is a problem: git clone over HTTP does not work, but SSH cloning works.
Note: both GitLab and the build environment are on the same host (the same AWS instance).
The bug has been reported in GitLab as well (the HTTP clone issue).
In your /etc/gitlab-runner/config.toml
concurrent = 1
check_interval = 0
[[runners]]
name = "Production Runner"
url = "http://gitlab.mydomain.com/ci"
token = "xxxxxxxxxxxxxxxxxxxxxxxxxxx"
executor = "ssh"
[runners.ssh]
user = "ubuntu"
host = "ip-XXX-XX-XX-XXX"
identity_file = "/home/ubuntu/.ssh/id_rsa"
[runners.cache]
You define host, user, and identity_file.
host should be your build host IP (in other words, where you are going to execute your build).
user should be your user on the build host, not on the GitLab host.
You can test how your passwordless SSH works: log in to the GitLab host as root and run
ssh -i /home/ubuntu/.ssh/id_rsa ubuntu@ip-XXX-XX-XX-XXX
If that works and doesn't ask you for a password, all is good.
If it breaks, it means you didn't set up passwordless auth correctly.
The easiest way to set up passwordless public-key auth is to use the command ssh-copy-id.
For example, I want to set up passwordless SSH auth between my GitLab host and my build host.
My build host IP is 192.168.0.42 and its hostname is build.home.
I already have id_rsa and id_rsa.pub generated under /home/ubuntu/.ssh on the GitLab host.
Now let's push our public key from the GitLab host to the build host. The first time, it will ask you for a password.
[root@gitlab ~]# ssh-copy-id -i /home/ubuntu/.ssh/id_rsa.pub ubuntu@build.home
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
ubuntu@build.home's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'ubuntu@build.home'"
and check to make sure that only the key(s) you wanted were added.
Note that in the example above I was pushing the public key to the remote host.
But when connecting to that remote host, I specify my private key.
[root@gitlab ~]# ssh -i /home/ubuntu/.ssh/id_rsa ubuntu@build.home
[ubuntu@build ~]$ hostname
build.home
Try testing your public-key auth between the GitLab host and the remote host, and update your question.
Resources:
https://gitlab.com/gitlab-org/gitlab-ci-multi-runner/blob/master/docs/executors/ssh.md
P.S: I will post my working environment config a bit later today when I get home.
Edit 1:
Here is my config.
My GitLab host is called gitlab.home (192.168.0.41),
and I have another VM called sshbuild.home (192.168.0.43).
Below is how I added the SSH runner.
Step 1. On gitlab.home, install the runner:
yum install gitlab-ci-multi-runner
and register my remote sshbuild.home VM as an SSH runner.
I also need to make sure that passwordless auth works between gitlab.home and sshbuild.home, so:
[root@gitlab gitlab-runner]# ssh-copy-id 192.168.0.43
The authenticity of host '192.168.0.43 (192.168.0.43)' can't be established.
ECDSA key fingerprint is b4:6a:1b:72:d1:7d:1f:34:f7:bb:ef:ad:69:42:11:13.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@192.168.0.43's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh '192.168.0.43'"
and check to make sure that only the key(s) you wanted were added.
[root@gitlab gitlab-runner]# ssh 192.168.0.43
Last login: Fri Nov 18 17:05:06 2016 from 192.168.0.101
[root@sshbuild ~]# exit
Then I disabled my other runner (a shell runner) and made the new SSH runner project-specific, just to make sure that when I commit, the build is executed on the SSH runner.
And I commit, and voila: we have a successful test that was run on the sshbuild.home host.
Here are several links which might help with a better understanding of this topic:
https://about.gitlab.com/2016/07/29/the-basics-of-gitlab-ci/
https://docs.gitlab.com/ce/ci/runners/README.html
http://docs.gitlab.com/runner/commands/README.html
https://docs.gitlab.com/ce/ci/yaml/README.html
P.S: And here is my /etc/gitlab-runner/config.toml file
[root@gitlab gitlab-runner]# cat /etc/gitlab-runner/config.toml
concurrent = 1
check_interval = 0
[[runners]]
name = "sshbuild"
url = "http://gitlab.home/"
token = "2bc1825d8fbde09fd632637c60e9e7"
executor = "ssh"
[runners.ssh]
user = "root"
host = "192.168.0.43"
port = "22"
identity_file = "/root/.ssh/id_rsa"
[runners.cache]
P.S: I get a similar error to yours if I disable HTTP for my repo under Settings in the web interface; however, the error is 403, not 500.
Edit 2:
Now I will cover .gitlab-ci.yml based on a simple HelloWorld project.
In my HelloWorld project I have a file called server.js which, when run with node, simply creates a web server listening on port 3000 that replies with "Hello World" to GET requests.
const http = require('http');

const hostname = '0.0.0.0';
const port = 3000;

const server = http.createServer((req, res) => {
  res.statusCode = 200;
  res.setHeader('Content-Type', 'text/plain');
  res.end('Hello World!\n');
});

server.listen(port, hostname, () => {
  console.log(`Server running at http://${hostname}:${port}/`);
});
My goal is to be able to run a test case against it. In this case I will run a simple
curl localhost:3000 | grep "Hello World"
but I need to put it into a separate script which exits with status 0 on success and non-zero on failure:
cat -n simpletest.sh
1 #!/bin/bash
2
3 cleanup ()
4 {
5 count=`netstat -anp|grep ":3000"|grep LISTEN|awk '{print $NF}'|cut -d\/ -f1|wc -l`
6 if [ $count -ne 0 ]
7 then
8 pid=`netstat -anp|grep ":3000"|grep LISTEN|awk '{print $NF}'|cut -d\/ -f1`;
9 echo "Need to kill PID $pid";
10 kill $pid
11 fi
12 }
13
14 echo "Running simple test"
15 curl localhost:3000|grep "Hello World"
16 if [ $? -eq 0 ]
17 then
18 echo "Test was successfull"
19 echo "Clean up node.js process"
20 cleanup
21 exit 0
22 else
23 echo "Test failed"
24 echo "Clean up node.js process"
25 cleanup
26 exit 1
27 fi
Now let's cover my .gitlab-ci.yml
cat -n .gitlab-ci.yml
 1  test:
 2
 3    before_script:
 4      - echo "Before script"
 5      - hostname
 6      - /bin/bash cleanup.sh
 7
 8    script:
 9      - echo "Main Script"
10      - node server.js &
11      - sleep 3
12      - /bin/bash simpletest.sh
I have a single job called test. In before_script it runs the cleanup.sh script, which simply kills the PID listening on port 3000 in case one is found.
cat -n cleanup.sh
1 #!/bin/bash
2 count=`netstat -anp|grep ":3000"|grep LISTEN|awk '{print $NF}'|cut -d\/ -f1|wc -l`
3 if [ $count -ne 0 ]
4 then
5 pid=`netstat -anp|grep ":3000"|grep LISTEN|awk '{print $NF}'|cut -d\/ -f1`;
6 echo "Need to kill PID $pid";
7 kill $pid
8 fi
9 exit 0
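As an alternative to the netstat parsing above, a more compact way to free the port would be the following; a sketch assuming fuser (from psmisc) is available on the build host:
# Kill whatever is listening on TCP port 3000; ignore the error if nothing is
fuser -k 3000/tcp || true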
And under script: it runs node with my server.js, gives it 3 seconds to start, and then runs the test against it. This test also takes care of killing the node PID after the test is done.
So let's commit and check the status of the build.
And now let's change our server.js to output not "Hello World" but "HelloWorld", so there is no space in between. I expect that my test case will fail since it expects literally "Hello World". And it does fail.
This is the most simplistic CI use case I could come up with.
Now, if you would like to deploy code to another environment based on the status of the test case, you would have to start using stages and environment.
So your .gitlab-ci.yml would turn into something like this (a real working example):
cat -n .gitlab-ci.yml
 1  stages:
 2    - test
 3    - deploy
 4
 5  run_test_case:
 6    stage: test
 7    before_script:
 8      - echo "Before script"
 9      - hostname
10      - /bin/bash cleanup.sh
11
12    script:
13      - echo "Main Script"
14      - node server.js &
15      - sleep 3
16      - /bin/bash simpletest.sh
17
18  deploy_to_production:
19    stage: deploy
20    script:
21      - echo "Run code here to do production deployment"
22    environment:
23      name: production
This will succeed upon git push. On line 21 I simply ran echo, but that could be replaced with a script that pushes the code to your remote staging or production environment, for example along the lines of the sketch below.
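For the original question's setup (the app living in /var/www/mean/my-app and managed by forever), such a script could look roughly like the following; a minimal sketch with an assumed name deploy.sh, assuming the runner's SSH user may write to that path and has forever on its PATH:
#!/bin/bash
# deploy.sh - hypothetical production deploy step for the deploy_to_production job
set -e
APP_DIR=/var/www/mean/my-app                      # app location from the question
rsync -a --delete --exclude .git ./ "$APP_DIR/"   # copy the checked-out revision into place
cd "$APP_DIR"
npm install --production                          # install runtime dependencies
forever stop app.js || true                       # ignore the error if the app is not running yet
forever start app.js                              # start the app again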

Apache Drill cannot connect to Zookeeper

I am trying to configure Apache Drill on my local machine in distributed mode. For this, I have already installed ZooKeeper on my machine using the following configuration in the /opt/zookeeper-3.4.11/conf/zoo.conf configuration file (here, sagar-pc resolves to my wlan0 inet addr):
tickTime = 2000
dataDir = /opt/zookeeper-3.4.11/data
clientPort = 2181
initLimit = 5
syncLimit = 2
server.1=sagar-pc:2888:3888
The ZooKeeper service runs successfully, and after starting Apache Drill it is able to create znodes according to the cluster ID given in the /opt/drill/conf/drill-override.conf file. Also, the status check tells me that:
drillbit is running
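For reference, these are the start and status commands being referred to; a minimal sketch assuming the install paths given above:
/opt/zookeeper-3.4.11/bin/zkServer.sh start     # start ZooKeeper
/opt/drill/bin/drillbit.sh start                # start the drillbit
/opt/drill/bin/drillbit.sh status               # should report "drillbit is running"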
Zookeeper output for Drill:
[zk: sagar-pc:2181(CONNECTED) 2] get /drill/drillbits1
cZxid = 0x4
ctime = Thu Dec 28 17:25:02 IST 2017
mZxid = 0x4
mtime = Thu Dec 28 17:25:02 IST 2017
pZxid = 0x4
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 0
/opt/drill/conf/drill-override.conf file contents:
drill.exec: {
cluster-id: "drillbits1",
zk.connect: "sagar-pc:2181"
}
However, even after following all these steps, when I try to run the bin/drill-conf command in the /opt/drill directory, it gives me the following error:
No active Drillbit endpoint found from ZooKeeper. Check connection
parameters?
And when checking the log file log/drillbit.out, I see a NullPointerException.
Exception in thread "main" java.lang.NullPointerException
at org.apache.drill.exec.coord.zk.ZKClusterCoordinator.update(ZKClusterCoordinator.java:218)
at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:228)
at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:401)
at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:372)
at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:368)
Versions:
Zookeeper - 3.4.11
Apache Drill - 1.12.0
Can anyone help in identifying what I am doing wrong here? I have taken help from these links:
Starting Drill in Distributed Mode - Apache Drill
ZooKeeper Getting Started Guide
@rusk Not sure if this is still an issue for you, but it seems that for any startup failure Drill prints the same message in drillbit.out. It turns out that the actual failure reason is logged in log/drillbit.log. Once I resolved the error that was causing the startup failure (in my case it was a write-permission problem on the udf directory, which was owned/created by a different user at first), the service started okay. Hope this helps.
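A minimal sketch of that kind of check, assuming Drill is installed under /opt/drill as above; the udf path shown here is only a guess, so use whatever directory drillbit.log actually complains about:
# The real failure reason is in drillbit.log, not drillbit.out
tail -n 50 /opt/drill/log/drillbit.log
# Check who owns the udf area and hand it back to the user that runs the drillbit
ls -ld /opt/drill/udf
sudo chown -R "$(whoami)" /opt/drill/udf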

Why does AWS CodeDeploy throw a “No passwd entry for user 'ec2-user'” error despite running everything as root?

Here are the relevant files:
stop.sh
#!/bin/bash
pkill -f node
appspec.yml
version: 0.0
os: linux
files:
  - source: /
    destination: /var/www/
permissions:
  - object: /var/www/
    owner: root
    mode: 777
hooks:
  BeforeInstall:
    - location: scripts/install.sh
      timeout: 300
      runas: root
  AfterInstall:
    - location: scripts/post_install.sh
      timeout: 300
      runas: root
  ApplicationStart:
    - location: scripts/run.sh
      timeout: 300
      runas: root
  ApplicationStop:
    - location: scripts/stop.sh
      timeout: 300
      runas: root
  ValidateService:
    - location: scripts/validate.sh
      timeout: 300
      runas: root
Here are the OS details:
ubuntu@ip-172-31-2-33:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.1 LTS
Release: 14.04
Codename: trusty
I tried running all hooks as the instance user 'ubuntu' as well; the results are the same.
ec2-user is just the default instance user. On Linux, you cannot execute privileged system commands without first issuing sudo.
Fix:
#!/bin/bash
sudo pkill -f node
If you have successfully deployed the same application to the instance before, the ApplicationStop script actually runs from the last successful revision. Could you check whether the application stop script in that last successful revision contains everything that you expect?
If the ApplicationStop script in the last successful revision is not right, you might want to set the --ignore-application-stop-failures option.
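For reference, that flag is passed when creating a deployment from the CLI; a minimal sketch with hypothetical application, deployment group, and bucket names:
aws deploy create-deployment \
  --application-name my-node-app \
  --deployment-group-name my-deployment-group \
  --s3-location bucket=my-deploy-bucket,key=my-app.zip,bundleType=zip \
  --ignore-application-stop-failures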

Mongo replica set can't find primary

I found a tutorial for setting up a Mongo replica set using Docker; my commands were:
Create the network cluster:
sudo docker network create curator-cluster
Create a container named mongo1, map host port 27018 to 27017 inside the container, and set the replica set name to rs0:
sudo docker run \
-p 27018:27017 \
--name mongo1 \
--net curator-cluster \
mongo mongod --replSet rs0
My configuration:
config = {
"_id" : "rs0",
"members" : [{"_id" : 0, "host" : "mongo1:27017"},
{"_id" : 1, "host" : "mongo2:27017"},
{"_id" : 2, "host" : "mongo3:27017"}]
}
Eventually, I created 3 containers
5949826d5bb1 mongo "/entrypoint.sh mongo" 22 hours ago Up 22 hours 0.0.0.0:27020->27017/tcp mongo3
dcf37866dbb6 mongo "/entrypoint.sh mongo" 22 hours ago Up 22 hours 0.0.0.0:27019->27017/tcp mongo2
14202f76089f mongo "/entrypoint.sh mongo" 22 hours ago Up 22 hours 0.0.0.0:27018->27017/tcp mongo1
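For completeness, mongo2 and mongo3 would have been started the same way with their own port mappings; a sketch inferred from the container list above:
sudo docker run -p 27019:27017 --name mongo2 --net curator-cluster mongo mongod --replSet rs0
sudo docker run -p 27020:27017 --name mongo3 --net curator-cluster mongo mongod --replSet rs0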
The result of sudo docker exec -it mongo1 mongo is
MongoDB shell version: 3.2.9
connecting to: test
Server has startup warnings:
2016-09-22T10:24:29.655+0000 I CONTROL [initandlisten]
2016-09-22T10:24:29.655+0000 I CONTROL [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
2016-09-22T10:24:29.655+0000 I CONTROL [initandlisten] ** We suggest setting it to 'never'
2016-09-22T10:24:29.655+0000 I CONTROL [initandlisten]
2016-09-22T10:24:29.655+0000 I CONTROL [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'.
2016-09-22T10:24:29.655+0000 I CONTROL [initandlisten] ** We suggest setting it to 'never'
2016-09-22T10:24:29.655+0000 I CONTROL [initandlisten]
rs0:PRIMARY>
It looks like I have a primary now, and when I tried to insert some documents on a container (say mongo1), the MongoDB instances sync well.
Now I tried to connect to the set's bryan database with this command (note that 10.145.168.151 is my IP):
mongo --host rs0/10.145.168.151:27018,10.145.168.151:27019,10.145.168.151:27020 bryan
my result is
MongoDB shell version: 2.6.9
connecting to: rs0/10.145.168.151:27018,10.145.168.151:27019,10.145.168.151:27020/bryan
2016-09-23T16:46:18.819+0800 starting new replica set monitor for replica set rs0 with seeds 10.145.168.151:27018,10.145.168.151:27019,10.145.168.151:27020
2016-09-23T16:46:18.819+0800 [ReplicaSetMonitorWatcher] starting
2016-09-23T16:46:18.819+0800 changing hosts to rs0/mongo1:27017,mongo2:27017,mongo3:27017 from rs0/10.145.168.151:27018,10.145.168.151:27019,10.145.168.151:27020
2016-09-23T16:46:18.820+0800 getaddrinfo("mongo2") failed: Name or service not known
2016-09-23T16:46:18.821+0800 getaddrinfo("mongo1") failed: Name or service not known
2016-09-23T16:46:18.822+0800 getaddrinfo("mongo3") failed: Name or service not known
2016-09-23T16:46:18.822+0800 Error: connect failed to replica set rs0/10.145.168.151:27018,10.145.168.151:27019,10.145.168.151:27020 at src/mongo/shell/mongo.js:148
exception: connect failed
If I use Node.js (mongoose), I get MongoError: no primary found in replicaset.
I think the problem is getaddrinfo("mongo2") failed: Name or service not known, so my question is how to fix this. Thank you.
Sorry for the late reply, but I ran into this issue with different vnets and missing hostnames. I was under the impression that if I connect using IPs, the cluster would respond using IPs. I was wrong: even if you connect with IPs, the hostnames must be resolvable.
But, if you want to, you can change the configuration from hostnames to IPs (note that this affects all clients):
1) Connect to the mongo CLI.
2) Run cfg = rs.conf(); you will see cfg.members[0].host as "hostname:27017", etc.
3) For each entry, set cfg.members[i].host = "ip(i):27017", for example cfg.members[0].host = "10.0.0.1:27017".
4) Run rs.reconfig(cfg); you should get the response { "ok" : 1 }.
Now you should be able to connect. This has caveats, so make sure to think about the consequences (what if the IPs change, etc.). A consolidated sketch of these steps follows.
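A consolidated sketch of steps 2 to 4, run from the host against the primary through its mapped port; the IP and port mappings here are taken from the question and are assumptions about your setup:
# Rewrite the member hosts from container hostnames to host IP:port pairs
mongo --host 10.145.168.151 --port 27018 --eval '
  var cfg = rs.conf();
  cfg.members[0].host = "10.145.168.151:27018";
  cfg.members[1].host = "10.145.168.151:27019";
  cfg.members[2].host = "10.145.168.151:27020";
  printjson(rs.reconfig(cfg));    // prints { "ok" : 1 } on success
'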

How to monitor the gearmand daemon with Monit?

So the configuration file for monitoring the gearman server is:
set logfile /var/log/monit.log
check process gearmand with pidfile /var/run/gearmand.pid
start program = "sudo gearmand --pid-file=/var/run/gearmand.pid"
stop program = "sudo kill all gearmand"
if failed port 4730 protocol http then restart
from monit.log
[EST Nov 26 19:42:39] info : 'gearmand' start: sudo
[EST Nov 26 19:42:39] error : Error: Could not execute sudo
[EST Nov 26 19:43:09] error : 'gearmand' failed to start
but Monit says that the process failed to start. Does anyone know how to make it work? Thanks in advance.
The log shows Monit could not execute sudo: Monit expects absolute paths for its start and stop programs (and it typically already runs as root, so sudo is unnecessary), and kill all is not a valid command (killall is). A working configuration looks like this:
check process gearman_daemon with pidfile /var/run/gearmand/gearmand.pid
start program = "/bin/bash -c '/usr/sbin/gearmand -d --job-retries 3 --log-file /var/log/gearmand/gearmand.log --pid-file /var/run/gearmand/gearmand.pid --queue-type libsqlite3 --libsqlite3-db /var/tmp/gearman-queue.sqlite3'"
stop program = "/bin/bash -c '/bin/killall gearmand'"
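After editing, you can sanity-check and apply the configuration; a minimal sketch, assuming the snippet lives in your monitrc (or an included file) and the check is named gearman_daemon as above:
monit -t                      # validate the control file syntax
monit reload                  # re-read the configuration
monit start gearman_daemon    # start the check by name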
