Version 16 (modified by ajj, 6 years ago)

UTK Server Admin Notes and History

To Do

  • Check permissions for the major services on each machine. Since the upgrades, various user IDs have changed (e.g. the postfix user ID), so services may no longer have access to the directories they need.

Admin Activity

2018-10-30

  • Starting work for migration plan:
    • Paul Butler to install new boot disk + 4 TB spare from danse1 into danse2 on 5/11/2018
    • In preparation, need to clone /dev/sdb to /dev/sdc
      • First, copy stuff from /extra into a directory under /home
      • Copy danse2 backup material from /backup to directory under /home on danse1
      • Clone /dev/sdb to /dev/sdc via mounts of partitions and rsync
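A minimal sketch of the rsync step, assuming the source and target partitions are already mounted. The mount points /mnt/sdb1 and /mnt/sdc1 and the helper name rsync_clone_command are hypothetical, and the flag set is one common choice for a faithful copy, not necessarily the one used here. The helper only builds the command so it can be reviewed before anything is run.

```python
import shlex


def rsync_clone_command(src_mount, dst_mount, dry_run=True):
    """Build an rsync argv that copies one mounted partition onto another."""
    cmd = [
        "rsync",
        "-aHAX",              # archive mode plus hard links, ACLs and extended attributes
        "--numeric-ids",      # keep raw uid/gid numbers instead of mapping by name
        "--one-file-system",  # do not descend into other mounted filesystems
    ]
    if dry_run:
        cmd.append("--dry-run")  # report what would be copied without writing
    # A trailing slash on the source copies its contents, not the directory itself.
    cmd += [src_mount.rstrip("/") + "/", dst_mount.rstrip("/") + "/"]
    return cmd


if __name__ == "__main__":
    # Hypothetical mount points for a partition on /dev/sdb and its copy on /dev/sdc.
    print(shlex.join(rsync_clone_command("/mnt/sdb1", "/mnt/sdc1")))
```

Defaulting to a dry run is deliberate: the printed command can be checked, then rebuilt with dry_run=False once the source and target are confirmed to be the right way round.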

2017-10-27

  • At Code Camp in Copenhagen …
  • There has almost certainly been admin activity over the past year that probably should have been documented.
  • Adding mail server and mailman for sasview.org and canvas.org: see MailmanConfig

2016-09-07

  • Upgrade of danse2 to Ubuntu 16.04
    • Worked

2016-07-03

  • Updated postfix configuration to only listen on localhost. Had noticed lots of (blocked) attempts to send spam via danse.chem. Hopefully this will sort it out and not break delivery from services running here.
  • Fixed ntpd failure on danse.chem.utk.edu; see update to ticket #274
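The exact postfix setting changed is not recorded above. One common way to restrict postfix to localhost is the inet_interfaces parameter in main.cf (for example inet_interfaces = loopback-only). The sketch below, with hypothetical helper names, checks main.cf-style text for that. It is a simplified reader that ignores continuation lines and $variable expansion.

```python
LOOPBACK_VALUES = {"loopback-only", "localhost", "127.0.0.1", "[::1]"}


def postfix_param(main_cf_text, name):
    """Return the last value assigned to `name` in main.cf-style text, or None."""
    value = None
    for line in main_cf_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, val = stripped.partition("=")
        if sep and key.strip() == name:
            value = val.strip()  # later assignments override earlier ones
    return value


def listens_on_loopback_only(main_cf_text):
    """True if every inet_interfaces entry is a loopback address or keyword."""
    value = postfix_param(main_cf_text, "inet_interfaces")
    if value is None:
        return False  # parameter unset: postfix defaults to all interfaces
    entries = value.replace(",", " ").split()
    return bool(entries) and all(e in LOOPBACK_VALUES for e in entries)


if __name__ == "__main__":
    sample = "myhostname = mail.example.org\ninet_interfaces = loopback-only\n"
    print(listens_on_loopback_only(sample))
```

On the real server the text would come from reading /etc/postfix/main.cf. Services on the same host can still hand mail to postfix via 127.0.0.1, which is why local delivery should keep working.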

2015-11-18

  • Replaced batteries in UPS

2015-01-11

  • Cleaned out trac spam

2015-01-26

  • The first commit in a month to require a build hit the same problem of getting stuck on "archiving artifacts". Stopped and restarted the build manually from Jenkins, which produced a new failure: slave not starting
    • Jenkins would log into the VM and start the slave, but then the "waiting for Phase 2" retries timed out (max of 11, it seems)
    • Rebooting the VM did not fix the problem
    • Looking at the event logs, a warning was being raised regularly on starting a child java process, but could not tell what the problem was
    • Looking at the loaded services, the slave was loaded but not running. Started it, but it stopped immediately
    • Started the slave manually from the command line. It started, but then Windows Firewall popped up saying it was blocking something. Clicked unblock and then everything ran
    • After completing the build, rebooted the machine for good measure.

2014-11-01

  • Ongoing problem with windows builds via jenkins getting stuck on "archiving artifacts"
    • tried to update slave-agent.jnlp and that didn't work
    • renamed C:\Jenkins to C:\jenkins-old
    • deleted slave from jenkins master
    • re-installed java on slave
    • recreated slave with new name on master
    • re-installed slave-agent.jnlp
    • re-connected build jobs to new slave
    • ran builds with success.
    • Hope that updating to the latest slave agent will fix the problem with stuck Windows builds. Needs monitoring
  • Note that danse.chem seems to have a clock that is running fast and ntpd is not correcting it. Needs looking at; see ticket #274

2014-10-30

  • Updated danse and danse2 (apt-get update/upgrade)

2014-09-27

  • Problem with ssh to danse machines after reboot following updates.
    • For danse.chem, the cause seems to have been a problem with fstab (most likely a typo by AJJ), which prevented the reboot after updates.
    • Unable to ssh to danse2.chem even before the update
    • Bill Gurley rebooted the machines on 2014-09-29, which solved the problem.

2014-09-09

  • Installed bumps on danse and danse2 to fix Ubuntu builds
  • Could not install on the WinXP VM due to compiler issues.

2014-08-27

  • Problems with the Windows build not running from Jenkins
    • Corrected the time on danse.chem, and this seems to have fixed the failure of the build to start.
    • The time servers used on Ubuntu 10 (danse.chem) seem to no longer exist. Updated ntp.conf to use the same servers as Ubuntu 12 (danse2.chem)
    • Build now working again
  • updated danse2.chem (apt-get update/upgrade)
  • updated danse.chem (apt-get update/upgrade)
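To compare the time servers configured on two machines, as was done by hand above, a short sketch like the following could list the server and pool entries from ntp.conf text. The helper name ntp_servers and the example hostnames are hypothetical; the actual server names used on danse.chem and danse2.chem are not recorded here.

```python
def ntp_servers(ntp_conf_text):
    """Return the hosts named on `server` and `pool` lines of ntp.conf-style text."""
    servers = []
    for line in ntp_conf_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, then tokenise
        if len(fields) >= 2 and fields[0] in ("server", "pool"):
            servers.append(fields[1])
    return servers


if __name__ == "__main__":
    # Hypothetical configs standing in for the two machines' /etc/ntp.conf files.
    old_conf = "driftfile /var/lib/ntp/ntp.drift\nserver ntp-old.example.org\n"
    new_conf = "server ntp1.example.org iburst\nserver ntp2.example.org iburst\n"
    missing = sorted(set(ntp_servers(new_conf)) - set(ntp_servers(old_conf)))
    print("servers to add to the old config:", missing)
```

This only compares what is configured. Whether ntpd is actually reaching those servers still has to be checked on the machine itself.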