NONparametrics

geekdom without assumption
NONparametrics

MySQL Data Footprints: How Big are Your Tables?

If you work on the architectural end of things (or if you’re just curious), it can be extremely helpful to see what kind of footprint your data tables actually have in real use. In testing, it can be difficult to visualize the big-picture impact of design decisions such as:

  • How much space should I allocate for a field? (eg, INT vs BIGINT vs TINYINT, TEXT vs VARCHAR(65,535))?
  • How much of my total storage will be taken up by table X ?
  • How much impact does each additional row in table X use?
  • What difference does the engine make in the storage size of a table?

To that end, I wrote a simple view that utilizes the information_schema tables to give you answers to some of these questions. Feel free to post improvements on this in the comments–it’s a pretty rough tool.
Read more

Back to the classics: rsync

If you haven’t read my post about using RAID1 with an external USB drive, don’t bother. I followed the steps as laid out in the guides that I had found, but a couple hours of wrestling with it ended with the realization that booting Maverick on a degraded RAID array with encryption on top just wasn’t going to happen. With a little more consideration, I realized that I would still need a backup solution, since RAID doesn’t mitigate the risk of file corruption or accidental deletion. Aside from that, it would be awfully nice to have snapshot-style history, in case I made an erroneous change and overwrote something important.

Read more

GnuPG, PHP, and Ubuntu

One of our upcoming projects at work is killing our old session management in favor of a RESTful Authentication and Authorization application to serve our Service Oriented Architecture framework (a mouthful, isn’t it?). A key piece of this authentication server relies on a single cookie that contains signed data. Given our bent toward open source–among other things–Gnu Privacy Guard seemed a natural choice for the encryption. We already make use of Mcrypt for storing sensitive information in our database, but that doesn’t seem to have a certificate handling and signing facility.

Unfortunately, there isn’t a maintained php5-gpg package for Ubuntu, so we turned to pecl (php extention community library). Building the module was pretty straightforward once I figured out the names of the Ubuntu packages that it depended on. So in three easy steps, here it is:
Read more

Timezone fun with MySQL and UTC

Since our company has timeclocks and users all over the country (and world, for that matter), we switched pretty early on to using UTC exclusively in the back-end. The views handle spitting out the correct representations respective to the users’ timezone, but the PHP is always thinking in UTC; naturally, that means that MySQL needs to think in UTC as well. MySQL defaults to using system time with no timezone data loaded. My understanding is that they do this on purpose since our wonderful legislators worldwide have a habbit of changing the timezone information (daylight savings, etc.) and there isn’t a real clean way of managing that for MySQL server admins.

Read more

Digital assets for the ultra-mobile

It seems like I always have at least three computers that I am using at a any given time: a laptop (by far the most used since I travel,  have no desktop at home, and even move around the office a lot), a desktop (faster than the laptop for testing and/or staging), and a netbook. As you can imagine, my personal trove of digital possessions is of some magnitude–no less than 280gig with all of old stuff gziped. Naturally a good portion is music and photos. The problem is lugging around this mess of files between the various machines I’m actively using.

A few months ago, the issue of synchronicity (or lack there of) caused me to abandon the desktop altogether in favor of the slower laptop for all things. To be fair, the laptop has more memory and a faster processor, but the big bottle neck is the single, slow spinning disk. If I had a few grand to drop on a new SSD, I might be able to prune some of my files and get the speed, but the fact is that I don’t. Aside from that, I still have the issue of maintaining good backups, so I need to rsync that to a usb drive/network drive so that if the laptop is lost/destroyed, I’m still in good shape. I’m not worried about someone else getting the data, since it’s all LUKS encrypted, but I do want to make sure I still have it.

While walking home from work the other day, I lamenting the fact that only one disk slot in a laptop meant no RAID. It struck me that maybe there was another way. At one point, a machine I used had an eSATA port–although the one I have now doesn’t–and there’s always USB. But in either case, the drive would be disconnected part of the time: either when traveling light or in the absence of AC. Getting back to the apartment, a little Googling found a post on Ubuntu and hotswappable RAID. Someone already thought of this, and lo and behold, it’s been possible in Ubuntu since 9.04!

The instructions listed on the page even mention the LUKs encryption with LVM, so I’m off to the races to try it out. I bought a 7200rpm notebook drive with a 16MB cache for $75 to replace the 5400rpm  8MB one that my laptop shipped with, and am now in the process of moving files onto a temporary storage drive to clear my USB disk of the same size for being the hotplug drive. If I understand it right, I can lose either and boot the other immediately. I could have my laptop disappear, and–without skipping a beat–plug my mirror drive into one the many desktop machines at the office to be working again in minutes. My fingers are crossed.

Lightscribe on Linux!

Being the idealist that I am, naturally linux is my operating system of choice. In fact, I don’t remember the last time I booted something else on one of my machines. Maybe minix, a couple years ago. At any rate, there is a price to pay for jumping off of the M$ bandwagon, and most of the time that comes in hardware support. Our monopolistic software giant likes to encourage hardware vendors to support only a certain, popular OS (for the dis-believing, check out this article on GrokLaw, or at least check out the paper it mentions.). The result is many devices that end up having to be reverse engineered, usb snooped, or other in order to force them into penguine submission.

Laptops are the worst for this, since there are more vendors turning out more new crap…. For example, my HP Pavailion dv6-1230us; the HDMI port is tempermental, and the sound card fails to run at all in Ubuntu 9.04. I’m currently upgrading to the Beta 9.10 with high hopes, but I’ll be sure to let you know. Another feature this machine happened to have is LightScribe–a feature I just assumed would not be usable. On a whim, I googled “ubuntu lightscribe” and was thrilled to discover a page detailing the 5 easy steps to get it running on Ubuntu. It’s a pretty stripped, simple interface, but hey–it works!

You can find the debs or rpms pretty easily on the lightscribe site if you need them for another distro.

Project madness (~management)

If you’re anything like most developers I know, then time management, project management, and organization are not among your most natural strengths. Thank goodness Wikipedia has a list of project/ticketing systems to ease the pain; if only it wasn’t so long that it takes several weeks to read, let alone evaluate.

Caught in the balanace of wanting to get something good and wanting to just get something soon, I followed a recommendation on a forum to try out trac. Trac is an open source, “Integrated Project and SCM Management” system. After digging around in a couple of demos, and reading a few reviews, I decided to jump in and give it a shot. The features that sold me: 1)the entire thing functions like a wiki; there is an explict wiki built in, but then all milestones, tickets, and revisions (see next feature) can be written in wiki formatting and accessed via internal wiki links. 2) Subversion and git integration; you can review the timeline, and see change sets or entire source files right in the browser! 3) plug-in extensibility; I immediately added plugins for finer-grained access control, and account management. I also tried out a time estimation plugin, but decided that it has a ways to go before it is truely useful.

A couple months into it, I’d say it’s been well worth while. The interface has been well done, including the ability to pull ticket reports based on which side of the milestone progress bar you click (very cool), and it’s saved us a lot of headaches by concentrating and connecting our internal documentation with our tickets and our source code. A couple of things I would love to see in the future are out-of-the-box ldap support and the ability to nest tickets. I really can’t complain though, given that it beats out a lot of pricey solutions, and it took next to no time to connect our geographically distributed teams up. Quite a few well-known FOSS projects have adopted it, including Piwik, the WordPress Plugin Directory, and Browsershots (there is a decent list on the trac site).

Interface tuning

Alyssa sent me a really cool article on interface design today. I resonate with this one particularly because simplicity in software and SaaS (Software as a Service) is one of my standard soap-boxes. I believe that a large part of the gap between techies and the real world can be attributed to the complexity that we blast the poor non-techies with–real or perceived. I definitely recommend that you read it, if you have the time.

The article mentions the 80/20 rule of feature use, and that got me to thinking of something I’d seen in the Piwik plug-ins directory: clickheat. I looked it up out of curiosity, and not ten minutes later I had a copy running, logging clicks on my development system. Our system is resource based (having several hundered individually assigned/authorized resources), and each user is presented with a menu tailored to the features that they have access to. That gives us a pretty good edge on keeping things simple, but even so I know that there are well-intentioned features that can get in the way; as such I’m not about to turn down any intelligence available to me.

Clickheat–like it sounds–logs all the clicks on whatever page you add it to, and then does some geometric math to overlay a heat pattern on a copy of the webpage; the result? A graphical representaion of what users are doing on any given page, overlaid on the actual page that clickheat fetches with javascript.

A "thermal" map of clicks overlaid on the page

A "thermal" map of clicks overlaid on the page

To be perfectly candid, it’s not the easiest thing in the world to figure out, but it sure wasn’t bad. You have to do a couple of simple things (like setting a site name in the javascript tag before pasting it into your html, and deciding how to group pages for logging) and the documentation leaves much to be desired, but all in all it was worth a few moments that it took to install. It will be great when this tool matures a little and possibly adds a database back-end (right now it’s all file logging), but in the mean time it will still provide interesting use information to help us reduce the effort for our users to find and access what they really want: that golden 20.

Analytics without ‘the Man’

One of my buddies turned me on to an awesome new tool a few months back. If you’re not familiar with analytics, then you owe it to your geekiness to get in touch. Analytics allow you to see ( with non-personally identifiable information ) how–and how much–your site is being used. It will tell you geography of users, bounce rates, how people find you, and what pages they look at/files they download. This may sound a bit creepy at first, but the practice is already widely in use, and the information can prove invaluable when making decisions on changing and improving your site.

If you already understand the value of analytics, then you likely also know that Google is the modus operandi for the majority of webmins: mostly due to the lack of viable alternatives. Now, I’m all about the innovation, convenience, and especially anti-Microsoftism that Google brings to bear. That being said, they have more information about more people….and “Don’t be evil?” Really?

Enter a new option, Piwik. Piwik offers all of the basics (referrers, search engines with keywords, page counts, etc.), but what’s more, it’s all open source MVC. In other words, anything it lacks, you can add. It has an easy modules and hooks design so that your MVC plug-ins can be displayed in the eye-candy, flash-based graphing widgets–it took me all of five minutes to drop in the GeoIP plugin. The API is very clean for dropping calls into other apps as well. Installation couldn’t be easier, requiring only a database user and php5 with the mysql module.

I’m now using Piwik on pretty much all the pages I host and a few others I have a stake in, or as favors for friends, so I know what you’re looking at =). Don’t worry, I’m just using it to improve things.

Tech for the techies

After a long and ardorous battle to claim the time required for it, I have decided to make use of my long-parked domain, nonparametrics.com. The intent is to migrate all of the techie posts from my personal blog, bryanjeffrey.com, over to this site and allow readers to abstain from the torturous shop talk that has been its mainstay. I’m starting without the flashy style and design, since Google (and most readers) could care less about the look as long as the content is there. At some yet-undetermined time in the future, I may get to that. No promises, but I’m hoping this will help with some of my prior blogging inhibitions, and both will see some action henceforth. Happy surfing.

 Page 1 of 2  1  2 »