
ServerAdmins.NET

Stuff for Server Admins…

Okay, I’m lazy. I fully admit it. Want proof? Instead of writing up a huge long post articulating something with awesome analogies, I’m only going to talk about one command today.

Fuser.

Why?

fuser is awesome. Not awesome in a “run it and it fixes everything” way, but awesome in a “What in the hell is binding to this port??” kind of way. Two classic scenarios where this is handy…

1. Apache won’t start, “Can’t bind to port ::80” or “Can’t bind to port ::443”, etc. This typically means something else is already tied to that port, and won’t relinquish it…

2. A security scan of your machine shows something funny running on port 6667… You didn’t start this or know what it is.

What to do now? Well you can sift through netstat output, but that’s, well, boring and slightly annoying.

netstat output

[root@vps ~]# netstat -anp |grep 80
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN 3266/httpd
tcp 0 0 127.0.0.1:58725 127.0.0.1:80 TIME_WAIT -
tcp 0 0 10.10.10.10:2078 192.168.1.23:63024 ESTABLISHED 18088/cpdavd - acce
unix 3 [ ] STREAM CONNECTED 49222880 11574/dovecot-auth /var/run/dovecot/login/default
unix 2 [ ] DGRAM 6804658 14078/named

Okay now we see that 3266/httpd is running on 80. Then we do this to find the process..


[root@vps ~]# ps auxwww |grep 3266
nobody 3266 0.0 0.3 65704 3516 ? S 17:05 0:00 /usr/local/apache/bin/httpd -k start -DSSL
root 21665 0.0 0.0 6024 640 pts/2 S+ 18:51 0:00 grep 3266
[root@vps ~]#

Okay there we go…

Now for hacked systems, this could be (and probably is) fully forged for a lot of remote shells. Going back to my previous post at http://serveradmins.net/ssh-on-nonstandard-ports-how-to-not-do-it/ which talks about privileged ports, you could in theory have a trojaned ‘ps’, top, etc. masking the real process. It may *look* like httpd, but be bound to a port like 23425… So don’t trust that too much, but more on that in a second. :)

The fuser approach…


[root@vps ~]# fuser -n tcp 80
80/tcp: 3266 3267 3268 3269 3271 16078 18274
[root@vps ~]#

Oh look at that, a list of all pids bound to that port. Nice, clean, to the point and easily parsable. fuser rocks. :)
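To show what “easily parsable” buys you, here’s a minimal sketch. It assumes the psmisc fuser found on most Linux boxes, which prints the pids on stdout and sends the “80/tcp:” label to stderr. The function names `split_pids` and `port_pids` are made up for this example.

```shell
# split_pids: turn fuser's space-separated pid list into one pid per line.
# Non-numeric tokens (like the "80/tcp:" label) are dropped.
split_pids() {
    for tok in $1; do
        case "$tok" in
            ''|*[!0-9]*) ;;                 # not a plain number, skip it
            *) printf '%s\n' "$tok" ;;
        esac
    done
}

# port_pids: list the pids bound to a TCP port (hypothetical wrapper).
# The label goes to stderr, so we throw that away and keep the pids.
port_pids() {
    split_pids "$(fuser -n tcp "$1" 2>/dev/null)"
}

# Usage (run as root so you can see other users' processes):
#   port_pids 80
```

From there you can feed each pid into whatever you like, one per line.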

Now a bit more about the masked processes… To run those down, here’s a quick tip. Forget ps/top and your other normal utilities, /proc/ is your friend here…

Proc looks like this on a linux box…

[root@vps ~]# cd /proc/
[root@vps proc]# ls -al
total 1
dr-xr-xr-x 78 root root 0 Jan 26 09:58 .
drwxr-xr-x 24 chrismm chrismm 1024 Feb 4 22:06 ..
dr-xr-xr-x 4 root root 0 Feb 5 05:05 1
dr-xr-xr-x 4 root root 0 Feb 5 05:05 11573
dr-xr-xr-x 4 root root 0 Feb 5 05:05 11574
dr-xr-xr-x 4 dovecot dovecot 0 Feb 5 05:05 11575
dr-xr-xr-x 4 dovecot dovecot 0 Feb 5 05:05 11576
...
...

These directories match the pids of the running processes… So if you have something advertising itself as ‘httpd’ on port 23425 and you know it’s pid 3266, you’d just do the following…


[root@vps proc]# cd /proc/3266
[root@vps 3266]# ls -al
total 0
dr-xr-xr-x 4 nobody nobody 0 Feb 5 17:08 .
dr-xr-xr-x 78 root root 0 Jan 26 09:58 ..
-r-------- 1 root root 0 Feb 5 18:56 auxv
-r--r--r-- 1 root root 0 Feb 5 17:08 cmdline
-rw-r--r-- 1 root root 0 Feb 5 18:56 coredump_filter
-r--r--r-- 1 root root 0 Feb 5 18:56 cpuset
lrwxrwxrwx 1 root root 0 Feb 5 18:54 cwd -> /
-r-------- 1 root root 0 Feb 5 18:56 environ
lrwxrwxrwx 1 root root 0 Feb 5 17:10 exe -> /usr/local/apache/bin/httpd
dr-x------ 2 root root 0 Feb 5 18:49 fd
?r--r--r-- 1 root root 0 Feb 5 18:56 io
-r-------- 1 root root 0 Feb 5 18:56 limits
-rw-r--r-- 1 root root 0 Feb 5 18:56 loginuid
-r--r--r-- 1 root root 0 Feb 5 18:54 maps
-rw------- 1 root root 0 Feb 5 18:56 mem
-r--r--r-- 1 root root 0 Feb 5 18:56 mounts
-r-------- 1 root root 0 Feb 5 18:56 mountstats
-r--r--r-- 1 root root 0 Feb 5 18:56 numa_maps
-rw-r--r-- 1 root root 0 Feb 5 18:56 oom_adj
-r--r--r-- 1 root root 0 Feb 5 18:56 oom_score
lrwxrwxrwx 1 root root 0 Feb 5 18:54 root -> /
-r--r--r-- 1 root root 0 Feb 5 18:56 schedstat
-r-------- 1 root root 0 Feb 5 18:56 smaps
-r--r--r-- 1 root root 0 Feb 5 17:08 stat
-r--r--r-- 1 root root 0 Feb 5 17:10 statm
-r--r--r-- 1 root root 0 Feb 5 17:08 status
dr-xr-xr-x 3 nobody nobody 0 Feb 5 18:56 task
-r--r--r-- 1 root root 0 Feb 5 18:56 wchan
[root@vps 3266]#

Bam, there you go. cwd and exe are the things you’re looking for. cwd shows you the process’s current working directory (usually the dir it was spawned from, typically a user’s home directory on a shared hosting machine) and exe shows the full path of the binary actually being executed (usually lame perl/php listeners)… Also the ./fd/ folder is kind of neat, as it shows you all the open file handles tied up by that pid as well.
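If you don’t want to eyeball the whole listing every time, the two interesting links can be pulled out directly. A quick sketch, assuming a Linux /proc and a `readlink` binary; `proc_info` is a made-up name, and the optional second argument only exists so you can point it at a fake directory for testing.

```shell
# proc_info: print the real binary and working directory behind a pid.
# $1 = pid, $2 = proc root (defaults to /proc).
proc_info() {
    pid=$1
    root=${2:-/proc}
    [ -d "$root/$pid" ] || { echo "no such pid: $pid" >&2; return 1; }
    printf 'exe: %s\n' "$(readlink "$root/$pid/exe")"
    printf 'cwd: %s\n' "$(readlink "$root/$pid/cwd")"
}

# Usage (as root, so the links of other users' processes are readable):
#   proc_info 3266
```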

Anyway, /proc/ examination too, is for another day, I just wanted to ramble on about one of my favorite, neat little single use utilities that no one else seems to know about. fuser. Enjoy. =)

I’m feeling a bit lazy tonight, and wanted to get an update here, so for a bit I’ll show you a handy little tool to update your ports tree on FreeBSD. After that, I’ll show you the ugly, old method.

Quick and easy…

Newer versions of FreeBSD come equipped with the ‘portsnap’ utility. This makes it *VERY* simple to update your ports tree.

For your first run, do this…

portsnap fetch && portsnap extract

This is going to grab a snapshot of the current ports tree, and simply extract it over your new tree, replacing *everything* as it goes. You should only run the ‘extract’ command the first time you run portsnap.

After that, you’ll want to run the following for any further updates…


portsnap fetch && portsnap update

Not only is this much quicker, it doesn’t overwrite everything. :)

If you want to use this in a cron’d task, you should use the ‘portsnap cron’ command instead of ‘portsnap fetch’. According to the portsnap man page, it sleeps for a random amount of time between 1 and 3600 seconds and then behaves like ‘fetch’, so it will kick off *sometime* in the next hour. The reasoning for this is for larger serverfarms. If you’re running that in cron on all of them, the random window keeps them all from starting at the same time, loading the BSD servers and abusing your bandwidth. Keep in mind this will only fetch the updates, you still need to update the tree afterwards…

A cron entry for this would look something like the following…


0 3 * * * root /usr/sbin/portsnap cron && /usr/sbin/portsnap update
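If you ever need the same kind of random start delay for your own cron jobs, the idea can be sketched in a few lines of shell. This is only an illustration, not how portsnap implements it; `random_delay` is a made-up helper name and it assumes /dev/urandom and od are available.

```shell
# random_delay: print a random whole number of seconds between 1 and MAX.
# Two random bytes from the kernel give a number from 0 to 65535, so this
# only covers windows up to about 18 hours, and the spread is not perfectly even.
random_delay() {
    max=${1:-3600}
    n=$(od -An -N2 -tu2 /dev/urandom | tr -d '[:space:]')
    echo $(( n % max + 1 ))
}

# Usage in a script called from cron:
#   sleep "$(random_delay 3600)" && /usr/sbin/portsnap fetch
```

Pulling the randomness from the kernel rather than the clock matters here, since a farm of machines with synced clocks would otherwise all pick the same delay.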

So for normal, day to day operation once you’ve initialized your ports tree, the following is what you’ll want to use to fetch and apply updates.


/usr/sbin/portsnap fetch && /usr/sbin/portsnap update

Now, if you don’t have portsnap, you should use the following method to update your ports tree. We’re going to go oldschool with cvsup here.

First of all, let’s find our fastest cvsup mirror…


[root@R34 ~]# cd /usr/ports/sysutils/fastest_cvsup/
[root@R34 /usr/ports/sysutils/fastest_cvsup]# make && make install

This is going to install the ‘fastest_cvsup’ port… Afterwards, for the US locale, you can run the following to find your fastest cvsup mirror…


[root@R34 /usr/ports/sysutils/fastest_cvsup]# fastest_cvsup -c us
>> Querying servers in countries: us
--> Connecting to cvsup.us.freebsd.org [72.233.193.64]...
- server replied: ! Access limit exceeded; try again later
- time taken: 69.51 ms
--> Connecting to cvsup2.us.freebsd.org [130.94.149.166]...
- server replied: OK 17 0 SNAP_16_1h CVSup server ready
- time taken: 27.19 ms
--> Connecting to cvsup3.us.freebsd.org [128.31.0.28]...
- server replied: ! Access denied
- time taken: 31.65 ms
--> Connecting to cvsup4.us.freebsd.org [149.20.64.73]...
- server replied: OK 17 0 SNAP_16_1h CVSup server ready
- time taken: 55.77 ms
--> Connecting to cvsup5.us.freebsd.org [208.83.20.166]...
- server replied: OK 17 0 SNAP_16_1h CVSup server ready
- time taken: 36.99 ms
--> Connecting to cvsup6.us.freebsd.org [64.202.113.190]...
* error: connect: Invalid argument
--> Connecting to cvsup7.us.freebsd.org [64.215.216.140]...
- server replied: OK 17 0 SNAP_16_1h CVSup server ready
- time taken: 26.64 ms
--> Connecting to cvsup8.us.freebsd.org [216.165.129.134]...
- server replied: OK 17 0 SNAP_16_1h CVSup server ready
- time taken: 6.23 ms
--> Connecting to cvsup9.us.freebsd.org [128.205.32.21]...
- server replied: OK 17 0 SNAP_16_1h CVSup server ready
- time taken: 26.28 ms
--> Connecting to cvsup10.us.freebsd.org [69.147.83.48]...
- server replied: OK 17 0 SNAP_16_1h CVSup server ready
- time taken: 54.01 ms
--> Connecting to cvsup11.us.freebsd.org [63.87.62.77]...
- server replied: OK 17 0 SNAP_16_1h CVSup server ready
- time taken: 35.11 ms
--> Connecting to cvsup12.us.freebsd.org [128.205.32.24]...
- server replied: OK 17 0 SNAP_16_1h CVSup server ready
- time taken: 26.86 ms
--> Connecting to cvsup13.us.freebsd.org [128.205.32.24]...
- server replied: OK 17 0 SNAP_16_1h CVSup server ready
- time taken: 26.54 ms
--> Connecting to cvsup14.us.freebsd.org [216.87.78.137]...
- server replied: OK 17 0 SNAP_16_1h CVSup server ready
- time taken: 34.63 ms
--> Connecting to cvsup15.us.freebsd.org [35.9.37.225]...
- server replied: OK 17 0 SNAP_16_1h CVSup server ready
- time taken: 23.49 ms
--> Connecting to cvsup16.us.freebsd.org [128.143.108.35]...
- server replied: OK 17 0 SNAP_16_1h CVSup server ready
- time taken: 23.47 ms
--> Connecting to cvsup17.us.freebsd.org [65.212.71.21]...
- server replied: OK 17 0 SNAP_16_1h CVSup server ready
- time taken: 35.93 ms
--> Connecting to cvsup18.us.freebsd.org [128.205.32.84]...
- server replied: OK 17 0 SNAP_16_1h CVSup server ready
- time taken: 3026.06 ms

>> Speed Daemons:
- 1st: cvsup8.us.freebsd.org 6.23 ms
- 2st: cvsup16.us.freebsd.org 23.47 ms
- 3st: cvsup15.us.freebsd.org 23.49 ms
[root@R34 /usr/ports/sysutils/fastest_cvsup]#

Cvsup8 it is!

So now, let’s get our ports-supfile in place…


cp /usr/share/examples/cvsup/ports-supfile /root/

Now edit /root/ports-supfile and look for the following line…

*default host=CHANGE_THIS.FreeBSD.org

And modify it to read…
*default host=cvsup8.us.freebsd.org
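If you’d rather not open an editor for a one line change, the same edit can be scripted. A minimal sketch; `set_cvsup_host` is a made-up name, and it assumes the supfile has a single line starting with “*default host=” like the stock example file.

```shell
# set_cvsup_host: point the "*default host=" line of a supfile at a mirror.
# $1 = supfile path, $2 = mirror hostname.
# Writes to a temp file first so a failed sed never leaves a half-written supfile.
set_cvsup_host() {
    file=$1
    host=$2
    tmp="$file.tmp.$$"
    sed "s/^\*default host=.*/*default host=$host/" "$file" > "$tmp" && mv "$tmp" "$file"
}

# Usage:
#   set_cvsup_host /root/ports-supfile cvsup8.us.freebsd.org
```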

Now, run the following to get *everything* up to date…


cvsup -g -L 2 /root/ports-supfile

Voila, you have an updated ports tree. :)

Hey there!

Today I wanted to focus on something that’s helping me do my job in a more efficient fashion. At a former workplace, I was responsible for ~200 high capacity webhosting machines, and a host of supporting machines. Back then, I was a huge fan of a management system mostly comprised of SSH Keys and a ton of bash scripts. It worked, quite well for the time, but if I could do it again, I’d go with a slightly more refined approach, which is what we’ll discuss today.

So, let’s get started. The first thing you’ll need is a working perl installation, a few devel libs and a handful of perl modules.

yum install gmp-devel
perl -MCPAN -e 'install Crypt::DH , Math::GMP, Net::SSH::Perl'

This is going to install the GMP math development libraries necessary for Math::GMP to compile. Math::GMP and Crypt::DH are prereqs for Net::SSH::Perl.

So once this is done, we can proceed. :)


#!/usr/local/bin/perl -w

use strict;
use warnings;
require Net::SSH::Perl;

#declare our login vars...

my $user = "root";
my $password = "SEKUREPASSWORD";
my $server = "localhost";

#Setup our SSH Connection...
my $ssh = Net::SSH::Perl->new($server,port=>22,use_pty=>1);

#Initiate our connection to the server...
$ssh->login($user, $password);

# Declare our variable for the request...
my $uptime;

# Run our SSH Command and retrieve the output...
($uptime) = $ssh->cmd("/usr/bin/uptime");

print "\n$uptime\n";

exit 0;

That’s a very basic/barebones SSH Connection script… If you have any questions or problems, please don’t hesitate to post in the comments. :)

Next up, we’ll go over a more complex variant of this script using subroutines and a few other nifty tricks. :)

Hey there!

I’m going to show you a few different ways to install Perl modules in a quick and easy way.

First up, the one liner. :)

perl -MCPAN -e 'install HTML::Template'
CPAN: CPAN::SQLite loaded ok (v0.199)
CPAN: LWP::UserAgent loaded ok (v5.834)
CPAN: Time::HiRes loaded ok (v1.9719)
Fetching with LWP:

http://www.stathy.com/CPAN/authors/01mailrc.txt.gz

CPAN: YAML loaded ok (v0.71)
Fetching with LWP:

http://www.stathy.com/CPAN/modules/02packages.details.txt.gz

Fetching with LWP:

http://www.stathy.com/CPAN/modules/03modlist.data.gz

Database was generated on Thu, 21 Jan 2010 20:40:30 GMT
Updating database file ...

Gathering information from index files ...
Obtaining current state of database ...
Populating database tables ...
.... snipped for brevity....
Running make install
Prepending /home/.cpan/build/HTML-Template-2.9-bALXdn/blib/arch /home/.cpan/build/HTML-Template-2.9-bALXdn/blib/lib to PERL5LIB for 'install'
Installing /usr/local/lib/perl5/site_perl/5.8.8/HTML/Template.pm
Appending installation info to /usr/local/lib/perl5/5.8.8/x86_64-linux/perllocal.pod
SAMTREGAR/HTML-Template-2.9.tar.gz
/usr/bin/make install UNINST=1 OTHERLDFLAGS=-L/usr/lib64 LDFLAGS=-L/usr/lib64 EXTRALIBDIR=/usr/lib64 -- OK
[root@vps ~]#

And there you go, quick and easy.

Now a lot of Perl modules are going to require other modules to be built, in which case, you’ll see something like this…

Writing Makefile for Net::SSH::Perl
---- Unsatisfied dependencies detected during ----
---- TURNSTEP/Net-SSH-Perl-1.34.tar.gz ----
Crypt::DSA [requires]
Convert::PEM [requires]
Crypt::RSA [requires]
Math::Pari [requires]
Crypt::IDEA [requires]
Digest::BubbleBabble [requires]
Crypt::DH [requires]
Math::GMP [requires]
Shall I follow them and prepend them to the queue
of modules we are processing right now? [yes]

Just go ahead and answer “yes” here, and let it continue… cpan *should* be smart enough to grab all of the required sources and build what you need, but sometimes, not so much. :) Perl modules are basically subroutines packaged into nice containers, ready to use. Now some of these require specific programs, libraries or even other perl modules to do what they do best. When a module has a large chain of dependencies and one of those fails, it can bring the whole show to a screeching halt.
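When a dependency chain does fall over, it helps to pull the module names back out of the cpan output so you can install them one at a time. A small sketch, assuming the “Module::Name [requires]” layout shown above; `missing_deps` and the `cpan.log` file name are made up for this example.

```shell
# missing_deps: read cpan output on stdin and print one module name per line
# for every "Some::Module [requires]" entry.
missing_deps() {
    awk '$2 == "[requires]" { print $1 }'
}

# Usage, with the cpan output saved to a (hypothetical) cpan.log:
#   missing_deps < cpan.log
```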

For an example here, I’ll use Net::SSH::Perl, which happens to be what I use for a host of different things.

If you use the above listed one-liner to install it, i.e.,

perl -MCPAN -e 'install Net::SSH::Perl'

You’re going to end up seeing this…


Files=12, Tests=106, 1 wallclock secs ( 0.07 usr 0.04 sys + 0.40 cusr 0.14 csys = 0.65 CPU)
Result: PASS
TURNSTEP/Net-SSH-Perl-1.34.tar.gz
Tests succeeded but 2 dependencies missing (Crypt::DH,Math::GMP)
TURNSTEP/Net-SSH-Perl-1.34.tar.gz
[dependencies] -- NA
Running make install
make test had returned bad status, won't install without force

So, we have a dependency of Net::SSH::Perl that simply isn’t present. So let’s go ahead and get it installed…

On a RH Based system (CentOS/Trustix/RedHat Enterprise Linux), you can do the following…


yum install gmp-devel

On a Debian based distribution (Debian/Ubuntu, etc)

apt-get install libgmp-dev

On FreeBSD, I prefer ports builds personally, so let’s do the following…

cd /usr/ports/math/libgmp4
make && make install

So, now that you’ve got that taken care of, let’s proceed. :)


perl -MCPAN -e 'install Net::SSH::Perl'
...
...
Tests succeeded but one dependency not OK (Crypt::DH)
TURNSTEP/Net-SSH-Perl-1.34.tar.gz
[dependencies] -- NA
Running make install
make test had returned bad status, won't install without force

So, we need to build Crypt::DH… Apparently dependency handling isn’t too bright in this case. :) I’ll save you the trouble of the blow-by-blow here. We need to install Crypt::DH, which depends on Math::BigInt::GMP. So, use your handy one-liner skills and get Math::BigInt::GMP installed, then do the same for Crypt::DH. You should now have a working Net::SSH::Perl installation. :)

You can specify multiple packages in the following way…


perl -MCPAN -e 'install Net::SSH, Term::ReadLine'

The other option, should cpan fail, is to just grab the module package yourself, which is typically a .tar.gz file, and perform the following.


wget http://search.cpan.org/CPAN/authors/id/T/TU/TURNSTEP/Net-SSH-Perl-1.34.tar.gz
tar -xzvf Net-SSH-Perl-1.34.tar.gz
cd ./Net-SSH-Perl-1.34
perl Makefile.PL
make && make install

That’s more or less what cpan is doing, except it will try to sort out requirements and dependencies for you (when it can).

So I hope you’ve learned a bit of something about getting Perl modules installed and running. If you have any questions, feel free to leave a comment. :)

Recently I’ve seen quite a large trend in customers that use alternative SSH ports. I like the idea behind this but as with most things, I don’t consider it a cure all.

Essentially, for those not in the know, when you have a public facing SSH Daemon on the standard port 22, you can just expect brute force attempts. It’s a fact of life. As is people using common usernames and common passwords. We have one issue, simply because we have the other.

Now, I’m fine with moving SSH to a different port, this avoids just about all types of standard SSH brute force scanning (with the exception of someone trolling the ports on your server and doing individual banner checks). My gripe is that people use non-privileged ports for their SSH daemons. This, my friends, is an issue.

Linux has what’s known as “privileged ports”. These are ports 0-1023, and what distinguishes them from the other 64512 ports you can bind to is that the linux kernel simply won’t let you open a listening socket on one of them unless you’re root (or, strictly speaking, hold the CAP_NET_BIND_SERVICE capability). That’s it, nothing else. Any user can bind a socket on a port from 1024 to 65535 without an issue, but in that 0-1023 range, the kernel says NO.
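The cutoff is easy to encode if you want to sanity check a config before rolling it out. A quick sketch, assuming the stock Linux boundary of 1024; `is_privileged_port` is a made-up helper name.

```shell
# is_privileged_port: succeed if PORT is in the root-only range (below 1024).
is_privileged_port() {
    [ "$1" -ge 0 ] && [ "$1" -lt 1024 ]
}

# How many ports are left over for everyone else?
echo $(( 65536 - 1024 ))   # prints 64512

# Usage:
#   is_privileged_port 22   && echo "22 is privileged"
#   is_privileged_port 9999 || echo "9999 is fair game for any user"
```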

Why? Well, security. With 1024 ports only bindable by the user root, we create a bit of a ‘haven’ for public facing services.

I’m a huge fan of analogies, so here’s one to try to illustrate the importance of privileged ports and more or less what they signify.

Let’s say your server is a concert and the ports are the vendors. At the concert, everyone wants to get their schwag, the T-shirts, the CDs, the posters, all the good stuff. Now the *quality* merchandise is inside the gates. The band can say “This stuff is quality and we stand behind it.” These shirts aren’t going to fall apart in three washes, the posters aren’t going to be misprints, and the CDs aren’t labeled with a sharpie. The vendors you buy this stuff from are your privileged ports. You know it’s authentic, good stuff.

Now, you go out to your car and pass the hippies selling hand made t-shirts out of a bag, posters printed at kinkos, and burned rips of the band’s CD. It’s just not as good as the stuff the band stands behind, but anyone can open up shop out there in the parking lot. These are your unprivileged ports.

So with basic services being under port 1024, you can guarantee the authenticity of these services. That’s essentially it.

When server admins start binding SSH to port 9999, this raises an interesting security risk. Any standard user on the machine in question is capable of starting an app and binding to that port. The tricky part here is getting the existing SSHd to drop its tie to the port so the user’s modified SSHd can attach to it. There are various methods of doing this, however I think the easiest way would be to simply synflood the port until the server operator triggers a restart. Given that your average hosting machine has a control panel that *will* still respond even if there’s a synflood on the SSHd port, it’s easy enough to click a few buttons to restart SSH in order to see if that fixes the issue. Quite a few administrators have auto-restarts tied into their monitoring as well, so the server operator may not even know there was an issue with the daemon.

When that restart occurs, the trojaned SSHd is ready for it and grabs ahold of the port. Now, the fun starts. Basically we just have to answer with a standard banner and then force a downgrade of the SSH protocol to v1. After that, all of the authentication can be decrypted quite easily.

Now typically, Man in the Middle attacks through SSH are based around a separate node on the network ARP flooding with false information, hoping to redirect an SSH session *through* a server with some form of hacked SSHd. They’ll then typically act in a passthrough state after downgrading the SSH connection to v1 in order to appear fairly seamless to the end user. In the above example (a hosting machine, a standard user account and a non-priv’d SSHd), you’re not going to be able to do a successful passthrough, as the trojaned daemon is going to be running as a non-priv’d user. You might be able to trick a remote user by throwing them into a sandbox, but that probably won’t last too long. However they *will* have your password, which can be retransmitted pretty quickly.

I’ll see if I can get a test environment set up to replicate this sort of behavior, but in the meantime there’s no shortage of information on implementing a MiTM attack via SSH on google. :) Stay tuned for part 2 where we’ll implement this in a test lab.

In closing, you should know what a privileged port is and you should now know to QUIT RUNNING SSHD ON NON PRIVILEGED PORTS. :)

As you may have noticed, this site was originally myself and several other admins trying to make a buck on what we do best. Just when we’d broken the profitability line, an awesome opportunity to work with one of the big names in hosting automation software popped up.

It wasn’t an easy choice by any stretch. A: Continue with fairly small margins on building my own business better and stronger? Or B: Take the opportunity to learn even more about hosting automation and work with some of the smartest brains in the industry?

I took B.

I learned a lot, running my own operation for those months. There’s no substitute for hard work, never turn down a job (no matter how small), and most importantly, never ever get comfortable.

So, that being said, I’d like to do something with this domain to help give back to the community a bit. In my day to day job, I run into some pretty crazy, one off issues that simply don’t have an answer on the interwebs. If I can document a few of these a week, in a proper, well worded and commented way, then I think that’s the biggest way that I can give back.

So with that being said, welcome to the new serveradmins.NET. :) There are currently no services offered or for sale here anymore, but hopefully I can help you along your way nonetheless. :)

-Chris

Here at serveradmins.NET, we’ve handled quite a few different tasks for customers in the past. Several of these have been automating day to day tasks in large scale hosting operations. The thought behind this is that if you can save a technician even two minutes on a task they do 60 times a day, you’ve just freed up two hours of their day to handle other things. Now if you apply that to an entire tech operation of 20+ employees, you start to see the advantages quite quickly. Two hours per day per employee multiplied by 20 employees is 40 hours of “free” time you just created. That’s a whole work week right there! Chances are that no matter how slick your operations run, there’s always an opportunity to do *something* better and this is where the experienced admin can step in.
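The math is simple enough to plug your own numbers into. A quick sketch using the figures above; the variable names are just for illustration.

```shell
minutes_saved=2      # minutes saved per task
tasks_per_day=60     # how often a tech does that task each day
techs=20             # size of the tech team

hours_per_tech=$(( minutes_saved * tasks_per_day / 60 ))
hours_per_day=$(( hours_per_tech * techs ))

echo "$hours_per_tech hours per tech, $hours_per_day hours across the team, per day"
```

Shell arithmetic is integer only, so swap in awk or bc if your numbers don’t divide evenly.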

This is the difference between an Admin and a true Senior Admin. A Senior admin has been in the industry long enough to see a good way to do things, a bad way to do things and sometimes simply a better way to do things. A true Sr. Admin will be able to look at your operations from the top down, break down the individual components and analyze each one for weaknesses, make and prioritize a list and then act on it.

For example, at a prior client’s site, we were brought in to streamline overall operations and “fix things”. We initially started off by looking at the public facing problems and digging down from there. After a bit of recon, we noticed that server restoration times were abysmal, server load averages were way too high across the board and the failure rate of machines was VERY high. Way above the norm. This not only caused the obvious direct impact of unhappy customers and complaints, but quite a few side effects as well. Support technicians spent an inordinate amount of their time keeping customers happy. Admins spent way too much time watching server restorations. Billing had an insane spike in chargebacks and cancellations. Unaffected customers got caught up in the flood of support/billing requests and had their problem resolution time skyrocket. Loads were greatly increased on the backup servers on average, which meant normal backup operations for the non-broken machines went over into business hours, which caused higher loads via i/o wait on the non-broken machines, etc.

We crafted our plan of attack by looking at the most frequent cause of full server crashes. In this case it was that there was no monitoring on any of the disk arrays in the hosting machines. One drive would fail and the machine would keep operating and then 6-7 months later, the next disk would go, causing a full crash of the server and loss of all data. We audited the hardware, did a large scale sweep for broken arrays and array status and found quite a few alarming issues. At least 9 machines with a single failed disk in the array, 3 machines operating on RAID 0 arrays and several machines with no RAID and ailing hard disks! These were all catastrophes just waiting to happen, so that’s where we started. Disks were replaced and NRPE RAID checking was put into place so we could be informed of drive failures and act on them immediately. One fix for quite a few problems.
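To give a feel for what an NRPE RAID check can look like, here’s a minimal sketch. It assumes Linux software RAID (md) and its /proc/mdstat status file, which may not match the hardware in the story above; a hardware controller would need its vendor’s CLI instead. The `check_md_raid` name is made up, and the return codes follow the usual Nagios plugin convention (0 OK, 2 CRITICAL, 3 UNKNOWN).

```shell
# check_md_raid: flag any md array with a missing member.
# In mdstat a healthy two-disk mirror shows [UU]; a failed member shows as "_".
# $1 = status file to read (defaults to /proc/mdstat).
check_md_raid() {
    mdstat=${1:-/proc/mdstat}
    if [ ! -r "$mdstat" ]; then
        echo "UNKNOWN: cannot read $mdstat"
        return 3
    fi
    if grep -q '\[[U_]*_[U_]*\]' "$mdstat"; then
        echo "CRITICAL: degraded md array found"
        return 2
    fi
    echo "OK: no degraded md arrays"
    return 0
}

# Usage from a plugin script:
#   check_md_raid; exit $?
```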

Where I’m going with that example is that you should always be aware of not only the obvious effects a problem manifests, but all of the other problems that stem out from there. After that single fix, we moved onto server load and capacity guidelines, then onto properly defining what an ‘abusive’ customer was and putting a stop to that, etc.

These were all aspects of the tech side of things. For an operations perspective, let’s look at another problem we tackled. We noticed during the initial recon that there were thousands of suspended accounts on the machines. After asking around a bit, we discovered that old account removal was done by hand by an outsourced support staff that was paid monthly for every server they supported. These accounts had been building up for years, eating up backup server space, live production server space and preventing the re-use of pre-existing machines. With a few small shell scripts, we were able to fully remove cancelled, non-pay etc customers on a nightly basis without any manual intervention. A *very* simple fix that freed up a tech from “pruning” the servers only when they ran out of disk space, allowed thousands of new accounts to be placed on existing hardware dramatically cutting new hardware costs for several months and lowering new equipment costs for the entire timeline of the company. Small fix, big savings from a money standpoint.

Domain registration was another issue at this company. A high volume shared hosting operation was registering every domain by hand, one at a time. Sometimes signups exceeded 700-800 a day! This required several techs to do nothing but sit there and register domains all day, which if you ask me is a pretty thankless job. With a simple modification to their Ubersmith instance, that was fully automated and 2-3 techs were freed up to alleviate workloads from the main support and phone support systems.

When you handle anything on a large scale, the best fixes are usually simple fixes. As I mentioned at the opening of this blog article, if you save a tech 2 hours a day and then scale that across a week, then a month then a year, you can start to see the cost savings this will net you.

We’re in an industry powered by incredibly intelligent people trying to show their skillset. All too many times I’ve seen an admin spend hours and hours of his day crafting a super complex fix to a simple problem as opposed to taking the quick fix and moving on. Sometimes an issue *does* require a very well crafted complex solution, but in my time doing this 95% of the issues that assault a company are very simple workflow or operations problems that tend to compound and build on each other.

I hope you can take something from this and apply it to your operations. Remember an hour a day adds up pretty quickly. :)

In our first installment in this blog, we went through a basic XCache installation with CentOS, Apache and PHP 5. Now, we’re going to look into how to make it work for your personal installation.

Much like every other tutorial out there, if you explicitly follow the instructions you find on the internet, you’re usually going to run into problems. One of the biggest problems I see in systems configurations is someone who “tunes” the machine by copying and pasting configs straight from a forum or blog. Sure, it works, but that doesn’t mean it works *well*. It could also cause many more problems than it solves.

The best way to tune a machine will always be understanding what you’re doing. And that’s what we’re going to try to do here today with XCache. So, let’s take a look at that xcache.ini file we set up in the last post.


[xcache-common]

zend_extension = /usr/lib/php/modules/xcache.so

[xcache.admin]
xcache.admin.enable_auth = On
xcache.admin.user = "admin"
; xcache.admin.pass = md5(a5851d1f3a3cff5a42b3163a7c45e5ae)
xcache.admin.pass = "blahblahblahblahblah"

[xcache]
; ini only settings, all the values here is default unless explained

; select low level shm/allocator scheme implemenation
xcache.shm_scheme = "mmap"
; to disable: xcache.size=0
; to enable : xcache.size=64M etc (any size > 0) and your system mmap allows
xcache.size = 32M
; set to cpu count (cat /proc/cpuinfo |grep -c processor)
xcache.count = 2
; just a hash hints, you can always store count(items) > slots
xcache.slots = 8K
; ttl of the cache item, 0=forever
xcache.ttl = 0
; interval of gc scanning expired items, 0=no scan, other values is in seconds
xcache.gc_interval = 0

; same as aboves but for variable cache
xcache.var_size = 0M
xcache.var_count = 1
xcache.var_slots = 8K
; default ttl
xcache.var_ttl = 0
xcache.var_maxttl = 0
xcache.var_gc_interval = 300

xcache.test = Off
; N/A for /dev/zero
xcache.readonly_protection = Off
; for *nix, xcache.mmap_path is a file path, not directory.
; Use something like "/tmp/xcache" if you want to turn on ReadonlyProtection
; 2 group of php won't share the same /tmp/xcache
; for win32, xcache.mmap_path=anonymous map name, not file path
xcache.mmap_path = "/tmp/xcache"

; leave it blank(disabled) or "/tmp/phpcore/"
; make sure it's writable by php (without checking open_basedir)
xcache.coredump_directory = ""

; per request settings
xcache.cacher = On
xcache.stat = On
xcache.optimizer = Off

[xcache.coverager]
; per request settings
; enable coverage data collecting for xcache.coveragedump_directory and xcache_coverager_start/stop/get/clean() functions (will hurt executing performance)
xcache.coverager = Off

; ini only settings
; make sure it's readable (care open_basedir) by coverage viewer script
; requires xcache.coverager=On
xcache.coveragedump_directory = ""

The first thing we’ll look at is the xcache.count variable in the xcache.ini file. To put it simply, think of every one of these caches as a bucket. In these buckets you have marbles. Now, I’m going to hand you 200 marbles, all of different colors and dump them into your buckets, dispersed evenly. After that, you have to find the blue marble in your bucket(s). Now, if you have 8 buckets, each with a helper to look through it, you’re going to find that blue marble FAST. One bucket and one person searching, yeah, grab a snickers, it’ll be awhile. :)

In this analogy, the buckets are the caches, and the marbles are your PHP scripts (after they’ve been compiled and stored in memory). When your webserver gets a request for blah.php, it hands off to your PHP handler (whatever that may be), which first checks the XCache extension before trying to actually compile and serve the script. Your limit on buckets is the number of processors that can execute instructions at the same time. So, as the xcache.ini says, cat /proc/cpuinfo | grep -c processor will tell you how many xcache queues you can have at once. Personally, I always subtract at least one proc in multicore/hyperthreaded setups, simply because there’s always more going on with your machine than processing PHP/XCache requests. The reality is that you’ll *never* have all of your procs handling PHP requests at the same time. Save some CPU cycles for other stuff. :)
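
As a rough sketch of that rule of thumb (count the processors, keep one back for everything else), something like the following could work. The function name suggest_xcache_count is made up for this example, and the "minus one" is just the personal preference described above, not anything XCache requires.

```shell
# Hypothetical helper: suggest a value for xcache.count.
# Counts "processor" lines in a cpuinfo-style file and subtracts one,
# never going below 1.
suggest_xcache_count() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # grep -c prints the number of matching lines; fall back to 0 if the file is missing
    procs=$(grep -c '^processor' "$cpuinfo" 2>/dev/null || true)
    procs="${procs:-0}"
    if [ "$procs" -gt 1 ]; then
        echo $((procs - 1))
    else
        echo 1
    fi
}

echo "xcache.count = $(suggest_xcache_count)"
```

On a 4-core box this prints xcache.count = 3, which you can then paste into your xcache.ini.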

Next up is the xcache.slots variable. This is basically a big hash table, or index of the items stored in your “buckets” from above. To visualize it, think about it as if you had a string tied to each marble in the above scenario, with a tag that says “blue, green, red”. The bigger the hash table, the more strings you have to tie to marbles. You’ll notice quicker seek times for the precompiled opcode by increasing this variable. Each “bucket” can hold a lot of marbles (opcode) and this is just a way of finding those pieces of opcode even faster. The hash tables don’t take up too much memory, so if you throw 32-64k at it, you’ll probably be good. Play around and see. :)

So, that’s the how and why of your xcache.count and xcache.slots variables. Don’t set them too high, don’t set them too low, set them just right.

Now, your next most important piece is going to be the Size. This is the total amount of ram you provide in the xcache.size variable. The sole purpose of this variable is to allocate a specific amount of memory strictly for storing pre-compiled PHP scripts. For the most part, these don’t take up much space in memory at all (as you’ll see), so throwing a bit of ram at this is a VERY important item.

If your site is live, and you’re serving out PHP, then you’ve already started collecting the necessary statistics to judge if you’ve allocated enough memory here. The two biggest tells are the “Avail” and “OOMs” columns. If you have 0.00M available in all of your queues, then you’re probably seeing increasing numbers in the OOMs column. What this means is that XCache has stored so many compiled PHP scripts in the chunk of memory that you gave it, that it just ran out of room. When this happens, any PHP requested by the webserver that’s not already in the cache is going to be interpreted normally. No Caching. This is bad. :)

In order to tune this variable correctly, throw a bit more memory than the initial 16M we threw at it in our initial config. Go ahead and double it, I’m sure you can justify the 32M of RAM here. :) Restart apache for the changes to take effect, and watch again. Does it fill up? Any OOMs? If not, just let it go and run for a few hours. You’re looking for the sweet spot to where you always have a bit of Available memory, and not hitting any OOMs, that’s when XCache is working at peak efficiency.
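
If you’d rather not hand-edit the file every time you bump the size, here is a minimal sketch of doing it from the shell. The function name set_xcache_size is invented for this example, and it assumes your config lives at /etc/php.d/xcache.ini with an uncommented xcache.size line in it.

```shell
# Hypothetical helper: rewrite the xcache.size line in an ini file.
# Keeps a .bak copy of the original next to it.
set_xcache_size() {
    ini="$1"
    size="$2"
    cp -p "$ini" "$ini.bak" || return 1
    # Only matches lines that start with "xcache.size", so xcache.var_size is left alone
    sed "s/^xcache\.size[[:space:]]*=.*/xcache.size = ${size}/" "$ini.bak" > "$ini"
}

# Usage (assumed paths, run as root):
#   set_xcache_size /etc/php.d/xcache.ini 32M && service httpd restart
```

After the restart, go back to the admin page and watch the Avail and OOMs columns again before deciding whether to go higher.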

So, let’s say you have a HUGE site. Thousands upon thousands of PHP scripts, all of which need to be compiled and cached. You have the option of playing the ante up game, or looking down the config just a bit. :) The two next most important options as I see it are the xcache.ttl and the xcache.gc_interval config options.

The main goal of these two variables is to 1. set an “expiration date” on your compiled code and 2. clean it out of memory after it expires. Both of these options are in seconds and disabled by default (set to 0). In a perfect world, you could fit your entire website, post-compilation, into 32-64M of ram and would never have to flush it. However, this isn’t a perfect world. If you’ve already thrown all the memory that you can safely dedicate at this, then these are the next options you should tune. The tuning here truly depends on how busy your site is. If you have 1000 PHP scripts, room to store 800 in memory and a busy site with fairly dispersed hits, then you might need to up the TTL to something like 60 seconds and the GC frequency to every 4 minutes or so.

When a cached opcode entry is marked as expired, it sits in your queue until the next run of the GC (garbage collector, fwiw). The next time that script is requested, it has to be recompiled and re-inserted into memory. So as you can see, you don’t want to set the TTL too low here, otherwise you’ll see your box more loaded than it should be because it’s re-compiling PHP it shouldn’t be.
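
Since both settings are in seconds, it’s easy to slip on the units. A tiny sketch that turns “60 second TTL, GC every 4 minutes” into the two ini lines might look like this. The function name xcache_expiry_lines is made up here, and the numbers are only the example values from above, not a recommendation for your site.

```shell
# Hypothetical helper: print the ttl/gc lines for xcache.ini.
# $1 = TTL in seconds, $2 = GC interval in minutes (converted to seconds)
xcache_expiry_lines() {
    ttl_seconds="$1"
    gc_minutes="$2"
    printf 'xcache.ttl = %s\n' "$ttl_seconds"
    printf 'xcache.gc_interval = %s\n' "$((gc_minutes * 60))"
}

xcache_expiry_lines 60 4
```

That prints xcache.ttl = 60 and xcache.gc_interval = 240, which replace the two 0 values in the config. Restart apache afterwards, same as with the size change.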

Now, the next item, the xcache.readonly_protection flag, isn’t so much of a performance concern (to a small degree I guess, but it’s negligible, really…). By default this is set to Off. What’s going on here involves the mmap caching for the compiled PHP opcode. If set to Off, XCache can modify files in memory rather than ditching the full opcode and recompiling if it needs to make a change.

The tradeoff here is security. With this set to On, you’re going to end up recompiling opcode for a slight adjustment, as opposed to making the adjustment in place. It also means that should there be some sort of hack or compromise that would otherwise let an attacker modify data in your caches (think someone editing your PHP on the server directly without your permission), they simply can’t.

These updates don’t happen too often, hence the negligible performance hit. So please, enable this unless you really need to disable it. :)

With this in mind, you have the core essentials to XCache down. There are, of course, variables I didn’t discuss here, and as you see, I didn’t just give you a generic config to copy and paste. I’ve given you the information necessary to tune the heck out of your XCache installation.

Go forth and serve… (precompiled opcode. =P)

T-Mobile Hackers trying to sell Network and Customer data…

Wow.  That’s all I can say.  T-Mobile is hacked BIGTIME and from the looks of things, I’d say they knew about it.   Typically in these sorts of situations, the hackers will contact the company they just owned and try to buy their silence.  Now, if the company refuses, then that data gets shopped around.

Now, if there’s no buyer after that, the hacker isn’t just going to walk away, especially on a large scale hack like this.  They’re either going to A: Auction your data off to the highest bidder, or B: Publicly release it for the fame/glory/etc.

I think the biggest thing that concerns me about this, being a T-Mob customer and all, is that even after being contacted by the hackers (assumed at this point), I’ve seen ZERO notice from T-Mobile about this.   Given the scale of their operations, and the control they have over your private data, this is quite concerning.

Let’s think about this for just one second, here’s a bit of info T-Mobile has on you…

  • Full Name
  • Home Address
  • Phone Number
  • Answers to private security questions (commonly reused by people from site to site)
  • Social Security Number
  • Birthdate
  • Credit Card Number
  • Credit Card Expiration Date
  • Credit Card CCV Value (possibly, vendors aren’t supposed to store, but you never know)
  • Billing address if it differs from your Main acct address.

Now, that’s just the billing info alone, and if the hackers do have root access on the machines in that URL above, which contains quite a few billing machines, we can assume they have this data in some form.  Let’s look at the other data they have on you…  Here’s where things start to get even freakier than the simple credit card fraud and identity theft potential of the situation.

  • Your Phone Number
  • Your Phone’s IMEI (numerous repercussions from this)
  • Your Call History (Inbound and Outbound)
  • Radio Tag Number
  • GPS Tag (Yes, your phone has a GPS/Cell location unit.   Yes it can be used to track you without your knowledge)
  • Text Message History (inbound and outbound)
  • Email History
  • Access to your cam-phone pictures (most phones upload and store these online now)

Now, the above list is all stuff that your cell provider logs and tracks.  We know this, it’s public knowledge, etc.  Let’s go ahead and put on the SUPAR BIG tinfoil hat now…   With the advent of government pushes into call logging/tracing/information tracking, we know for a fact that several of the big telcos already record phone calls, fully log all data communications, and have active taps on all of this information.

As more or less of a mental exercise, what do you think the repercussions from this hack look like now?   How far did the intrusion go, and what is the extent of the data turnover?

Spooky, isn’t it.  The even spookier part is that I doubt we’ll ever get a public acknowledgment from T-Mobile regarding this intrusion, any turnover of customer data, nothing.

If you don’t want to end up like T-Mobile, it might be time to look into a proper security audit of your network.
serveradmins.NET

PHP has become the de-facto standard for web applications over the past few years.  Bigger and bigger applications are being developed for it every day, and with the advent of news-networking sites such as digg.com and slashdot.org, heavy traffic can happen with no notice whatsoever.

One of PHP’s downfalls in this sort of situation is that it’s an interpreted language.  This means that every time your webserver gets a request for a page written in PHP, it has to compile that individual page before it can process it and serve the results to the user.  Most of the time, this happens in milliseconds, hardly noticeable at all.  However, under high load situations, several processes are spawned for every web request, memory is consumed, apache handlers are held open, etc.   Things can get ugly pretty quickly.

Enter, XCache.  XCache was developed by the lighttpd team, and is ported over to work with just about any webserver.   What XCache does, and why it’s a beautiful thing, is that it stores a copy of the PHP page post-compilation in memory and then serves that to the next user that requests the same page.  Not only do you gain the benefit of a ‘compile once, serve many times’ application, the page is also being pulled straight from memory.  *WAY* faster than serving it up off of your disks.

In this post, I’m going to show you how to install XCache on CentOS and Apache with PHP 5.  However, the steps port easily to other operating systems.

So let’s get started, shall we?

The current build of XCache is 1.2.2, and can be downloaded from http://xcache.lighttpd.net/pub/Releases/1.2.2/xcache-1.2.2.tar.gz .   So, SSH into your server and do the following

cd /usr/src; wget http://xcache.lighttpd.net/pub/Releases/1.2.2/xcache-1.2.2.tar.gz
tar -xzvf xcache-1.2.2.tar.gz

This is going to unarchive the xcache installation in your /usr/src/ directory.   Next we need to go in there and get it ready to build for our particular PHP installation.

cd xcache-1.2.2

It should be noted here, that you’ll need the PHP-Devel RPM installed.
To install it, just enter the next command. 
yum -y install php-devel

Now, let’s prep and build our xcache install.

phpize --clean && phpize; ./configure --enable-xcache
make && make install

There, Xcache is installed as a PHP extension now!  Let’s go ahead and setup the configuration for it.

cat xcache.ini > /etc/php.d/xcache.ini

Now, open up your favorite editor of choice and let’s make a couple of changes here…

Put a ‘;’ in front of all of the zend_extension lines that don’t already have one.  In PHP’s ini files, this comments out that line so it’s ignored.

Then add in

zend_extension = /usr/lib/php/modules/xcache.so

Now, change the xcache.admin.user line to read

xcache.admin.user = "admin"
xcache.admin.pass ="yourpassword"

The above password needs to be the MD5 hash of your password, not the plain text. You can generate the hash by doing the following from the command line.

md5 -s "YOURPASSWORD"

It’ll look something like this…
Maynard:~ chrismm$ md5 -s "blah"
MD5 ("blah") = 6f1ed002ab5595859014ebf0951522d9
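
One catch: the md5 command shown above is the BSD/Mac OS X one, and a stock CentOS box normally won’t have it. The usual equivalent there is md5sum from coreutils. Here’s a small sketch; the function name md5_of is just something made up for this example.

```shell
# Hypothetical helper: print only the MD5 hex digest of a string.
# printf '%s' is used instead of echo so no trailing newline gets hashed.
md5_of() {
    printf '%s' "$1" | md5sum | awk '{ print $1 }'
}

md5_of "YOURPASSWORD"
```

Paste the 32-character result into xcache.admin.pass. If you use plain echo instead of printf, the newline gets hashed too and the admin login won’t accept your password.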

xcache.size = 16M

xcache.count = 2    ; (this is the number of real processors in your machine.)

Now save your document and type ‘php -v’, you should see xcache mentioned in the output.

[root@omega xcache-1.2.2]# php -v
PHP 5.2.9 (cli) (built: Mar 25 2009 13:18:19)
Copyright (c) 1997-2009 The PHP Group
Zend Engine v2.2.0, Copyright (c) 1998-2009 Zend Technologies
with the ionCube PHP Loader v3.1.34, Copyright (c) 2002-2009, by ionCube Ltd., and
with XCache v1.2.2, Copyright (c) 2005-2007, by mOo
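
If you want to do that check from a script instead of eyeballing it, a minimal sketch could look like the following. The function name xcache_in_banner is invented here, and it only looks for the “with XCache” text shown in the output above.

```shell
# Hypothetical helper: succeeds if the text on stdin mentions XCache
# the way the php -v banner does.
xcache_in_banner() {
    grep -qi 'with xcache'
}

# Usage:
#   php -v | xcache_in_banner && echo "XCache is loaded"
```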

Now, let’s go ahead and setup the web interface for Xcache. This isn’t necessary, but it’s a good tool for tuning your Xcache installation.

cp -pR ./admin/ /var/www/html/    # (or wherever your website is)

This creates a /admin/ folder off of your site. Now you should be able to go there in a web browser, and enter the username/password that you entered into the xcache.ini.

Voila! You’re now caching all of your PHP code. Expect a MUCH lower server load, and much faster page serving. My next post on this blog is going to be about tuning your XCache instance. A base install doesn’t do you much good unless you know how to tweak it!