Movin’ On Up

June 22, 2007

It’s official.  This blog has moved to http://benjaminste.in

Please update your feed URLs accordingly and I’ll see you there!

Why Computer Programming is Hard II

June 7, 2007

I’ve written in the past about why computer programming is hard.  I think at the time I blamed invisible things like .files and \r\n.

A client sent me the following string of text to include in his application:

“We’ll add it to our database”

What’s wrong with that String?  Can anyone tell?

Here’s a hint.  Compare it to this one:

"We'll add it to our database"

Exactly!  The single quote in we’ll is a curly quote in the first sentence!  I’m guessing the client probably uses Microsoft Outlook and it automagically makes straight quotes into curly quotes.  Usually not such a big deal until you start encoding the string into other character sets and wondering why the world is broken.
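To make the breakage concrete, here is a minimal sketch in Java. The class and method names are made up for illustration, and ISO-8859-1 is just one example of a character set with no slot for the curly quote (U+2019):

```java
import java.nio.charset.StandardCharsets;

final class CurlyQuoteDemo {

    // Encode to ISO-8859-1 and decode again. String.getBytes(Charset)
    // silently replaces characters the charset cannot represent with '?'.
    static String roundTripLatin1(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String curly = "We\u2019ll add it to our database";  // what the client sent
        String straight = "We'll add it to our database";    // what I expected

        System.out.println(curly.equals(straight));                            // false
        System.out.println(curly.getBytes(StandardCharsets.UTF_8).length);    // 30
        System.out.println(straight.getBytes(StandardCharsets.UTF_8).length); // 28
        System.out.println(roundTripLatin1(curly)); // We?ll add it to our database
    }
}
```

The two strings look identical on screen, but one of them takes three bytes for the apostrophe in UTF-8 and turns into a question mark the moment it passes through a narrower character set.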

That is why programming is hard.  That, and refactoring algorithms to take advantage of multicore processors.


Ruby ActiveRecord & Java Hibernate

May 25, 2007

This is the first in probably a bunch of posts about my experience transitioning from a Java mindset to a Ruby one.

So my first experience creating Hello World in Rails (which is really more like a fully functional blog application, complete with comments) really blew me away. The tiny amount of code required to get all your basic database operations was quite fantastic. A lot of the advantages come from the ‘convention over configuration’ and ‘sensible defaults’ mantra that the Ruby guys espouse.

There’s very little technical reason that can’t be done this simply using Java stuffs, so I set out to simplify my data access layer.

In Ruby, the process of creating the database, ORM layer, and CRUD classes is as follows:

1. Create the database, either with DDL or a Rails migration
2. Create the ORM layer by creating an ActiveRecord class, aka a simple mapping file
3. Run script/generate scaffold
4. Ok, that was pretty fast and easy

Ok, so I’m using the usual lightweight Java suspects: Spring & Hibernate. What I’m about to show took some work that you get for free with Rails, but I only had to write it once and I’m pretty pleased with the results.

I’ll start with the end product. Let’s say I want to create a Person class, with fields age and name. Also, let’s assume that I subscribe to the theory of class invariants, which holds that an object is ALWAYS in a consistent state, thus eliminating the need to check for validity all the time. The lack of it is the number one aspect of Ruby (and JavaScript, PHP, etc) that I can’t get 100% behind.

(NB Let’s assume that a person’s name is invariant. If you’re writing code for a neonatal facility, that assumption might not stand, but for our application, let’s assume the person is named.)

public class Person {

    private Integer id; // surrogate key
    private Integer age;
    private String name;

    // can't construct a person with no name
    // this ensures it's not implicitly called ever
    // plus, hibernate likes a (not necessarily public) default constructor
    private Person() {}

    // public constructor
    public Person(final String name, final Integer age) {

        // don't actually assert in production! throw instead
        assert name != null && !name.isEmpty() : "Name can't be null or blank";
        assert age != null && age > 0 : "Age must be positive";

        this.name = name;
        this.age = age;
    }

    // accessors
    public String getName() {
        return name;
    }

    public Integer getAge() {
        return age;
    }

    public Integer getId() {
        return id;
    }

    // no javabean style mutators!

    // irrelevant business logic omitted
}

Ok, so that’s a pretty simple and straightforward class. Seems like a bunch of code, but don’t forget that Eclipse can generate almost all of it for you (Source -> Generate Constructor using Fields & Source -> Generate Getters).
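The comment in the constructor says to throw rather than assert in production. Here is a minimal sketch of what that could look like. StrictPerson is a hypothetical stand-in so the snippet compiles on its own, and it leaves out the id and the private default constructor that Hibernate wants:

```java
final class StrictPerson {

    private String name;
    private Integer age;

    StrictPerson(final String name, final Integer age) {
        // Unlike assert, these checks run whether or not the JVM was started with -ea.
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("Name can't be null or blank");
        }
        if (age == null || age <= 0) {
            throw new IllegalArgumentException("Age must be positive");
        }
        this.name = name;
        this.age = age;
    }

    String getName() {
        return name;
    }

    Integer getAge() {
        return age;
    }

    public static void main(String[] args) {
        StrictPerson ben = new StrictPerson("ben", 28);
        System.out.println(ben.getName() + " is " + ben.getAge());

        try {
            new StrictPerson("", 28);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Assertions are disabled by default at runtime, so an invariant guarded only by assert quietly stops being an invariant once the code ships.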

Next up, we need to map it to the database. That’s easy enough with a Hibernate mapping file.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE hibernate-mapping PUBLIC "-//Hibernate/Hibernate Mapping DTD 3.0//EN"
    "http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd">
<hibernate-mapping>
  <!-- map the class to the table -->
  <class name="com.company.Person" table="person">
    <!-- let mysql (or whatever) handle the autoincrement PK -->
    <id name="id" column="id" access="field">
      <generator class="native"/>
    </id>
    <!-- by default, field names map to column names and type is inferred -->
    <property name="name" access="field"/>
    <property name="age" access="field"/>
  </class>
</hibernate-mapping>

Note the access="field" attribute on each of the properties. Hibernate has a magical ability to set object member variables, even when they are private. That lets me keep my class invariant, but still allow object creation from the database. It just needs a hint so it knows to set the variable directly instead of trying to use a mutator. If you have setters, you don’t need that attribute.
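The magic is plain Java reflection. The sketch below is not Hibernate's actual code, just a rough illustration of the idea with made-up names (Widget, hydrate): create the object through its private no-arg constructor, then write the private field directly.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;

final class FieldAccessDemo {

    // Stand-in for a mapped class: private constructor, private field, no setter.
    static final class Widget {
        private String name;

        private Widget() {}

        String getName() {
            return name;
        }
    }

    // Build a Widget the way a field-access ORM roughly would.
    static Widget hydrate(String nameFromDatabase) {
        try {
            Constructor<Widget> ctor = Widget.class.getDeclaredConstructor();
            ctor.setAccessible(true); // bypass 'private' on the constructor
            Widget widget = ctor.newInstance();

            Field field = Widget.class.getDeclaredField("name");
            field.setAccessible(true); // bypass 'private' on the field
            field.set(widget, nameFromDatabase);
            return widget;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not hydrate Widget", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hydrate("ben").getName()); // ben
    }
}
```

Note that this path skips the public constructor entirely, so none of its validation runs. The invariant holds for rows loaded this way only as long as the data in the table is already valid.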

It’s also worth noting that ActiveRecord syntax has Hibernate beat hands down. It’s actually quite unbelievable. It takes a massive amount of XML to do a simple mapping such as this one that in Rails requires literally no code at all. (Technically, since I have a non-conventional legacy table name, it would require one line: def self.table_name() "person" end).

OTOH, I love having a DTD to validate my mapping document against. Why? Because I have fat fingers and I make a lot of typos. In Rails, this is only apparent at runtime. That is the second piece I have trouble with: failing at runtime because of a typo just seems crazy.

NB I’ll save this for another post, but I’ve been having trouble with more complicated ActiveRecord mappings. Specifically, I can’t quite get multitable inheritance with a discriminator column to work properly. Also, I do a lot of composition in my class designs and all this lazy loading is killing me. Any way to do lazy-load="false"?

OK, on to the Java DAO code. I created a (Java5) generic Dao interface with the basic CRUD operations and a generic implementation. When I say “generic”, I am referring to the class being accessed – not the actual data access layer. It is certainly possible to use the pimpl idiom to abstract away the data access technology so you could plug in Hibernate, iBatis, JPA, JDBC, etc with no client code change. But I like Hibernate and have work to get done 🙂

Oh, and I like Spring a lot too, so I use their HibernateDaoSupport class to handle all the ugly Hibernate & database stuff. Tying the DAO to the implementation a little, but again, I got work to do 😉

Let’s say I don’t want any operations other than CRUD for now. (These aren’t coded – just showing you what I would ideally want by default):

Person findById(Integer id);
List<Person> findAll();
public List<Person> findByExample(Person person, String[] excludeProperty);
Person save(Person person);
void delete(Person person);

Here is my actual interface (yes, you should be coding to interfaces!)

public interface PersonDao extends GenericDao<Person> { }

Good. No code there. That’s perfect.

What about the implementation?

public class PersonHibernateDao extends GenericHibernateDao<Person> implements PersonDao {

    // need a session. Just dependency inject it in the constructor
    public PersonHibernateDao(SessionFactory sessionFactory) {
        super(sessionFactory);
    }
}

Good. No code there either. Let’s test it out (Spring context file omitted)

public static void main(String[] args) {

    // just for testing - this would be injected in production, of course
    String[] xml = { "DatabaseContext.xml" };
    ClassPathXmlApplicationContext ctx = new ClassPathXmlApplicationContext(xml);
    PersonDao dao = (PersonDao) ctx.getBean("PersonDao", PersonDao.class);

    // try persisting a person
    Person p = new Person("ben", 28);
    dao.save(p);

    // look them back up
    System.out.println("Looking up person with id=1");
    Person p1 = dao.findById(1);
    System.out.println("Found person " + p1.getName());

    // let's assume we have more data
    System.out.println("Looking up all people");
    for (Person person : dao.findAll()) {
        System.out.println("FOUND " + person.getName());
    }

    // lookup a person by example - poor man's criteria query
    // (name is excluded from the match, but the constructor rejects a blank one)
    Person example = new Person("ignored", 30);
    System.out.println("Looking up all 30 year olds, ignoring name");
    for (Person person : dao.findByExample(example, new String[] { "name" })) {
        System.out.println("FOUND " + person.getName());
    }
}

Excellent! That all worked like a charm without any actual code!

Of course it took some time to write the GenericDao, which I will present below, but it only had to be written once. Very nice.

// the generic interface
public interface GenericDao<T> {
    T findById(Integer id);
    List<T> findAll();
    List<T> findByExample(T o, String[] excludeProperty);
    T save(T o);
    void delete(T o);
}

And the generic implementation:

import java.lang.reflect.ParameterizedType;
import java.util.List;

import org.apache.log4j.Logger; // assuming log4j
import org.hibernate.Criteria;
import org.hibernate.SessionFactory;
import org.hibernate.criterion.Criterion;
import org.hibernate.criterion.Example;
import org.springframework.orm.hibernate3.support.HibernateDaoSupport;

// slightly tied to spring for convenience, but obviously not necessary
public class GenericHibernateDao<T> extends HibernateDaoSupport implements GenericDao<T> {

    private static final Logger log = Logger.getLogger(GenericHibernateDao.class);
    private final Class<T> persistentClass;

    // reads T from the subclass declaration, e.g. "extends GenericHibernateDao<Person>"
    @SuppressWarnings("unchecked")
    public GenericHibernateDao(final SessionFactory sessionFactory) {
        super.setSessionFactory(sessionFactory);
        this.persistentClass = (Class<T>) ((ParameterizedType) getClass()
                .getGenericSuperclass()).getActualTypeArguments()[0];
    }

    // all the CRUD implementations
    @SuppressWarnings("unchecked")
    public T findById(Integer id) {
        log.debug("Finding by id " + id);
        return (T) super.getHibernateTemplate().get(getPersistentClass(), id);
    }

    public List<T> findAll() {
        log.debug("Finding all");
        return findByCriteria();
    }

    @SuppressWarnings("unchecked")
    public List<T> findByExample(T o, String[] excludeProperty) {
        log.debug("Looking up " + o);
        Criteria crit = getSession().createCriteria(getPersistentClass());
        Example example = Example.create(o);
        for (String exclude : excludeProperty) {
            example.excludeProperty(exclude);
        }
        crit.add(example);
        return crit.list();
    }

    public T save(T o) {
        log.debug("Saving and returning " + o);
        super.getHibernateTemplate().saveOrUpdate(o);
        return o;
    }

    public void delete(T o) {
        log.debug("Deleting " + o);
        super.getHibernateTemplate().delete(o);
    }

    // and the helper methods
    private Class<T> getPersistentClass() {
        return persistentClass;
    }

    @SuppressWarnings("unchecked")
    protected List<T> findByCriteria(Criterion... criterion) {
        Criteria crit = getSession().createCriteria(getPersistentClass());
        for (Criterion c : criterion) {
            crit.add(c);
        }
        return crit.list();
    }
}
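
The one non-obvious line in that class is the constructor's cast through ParameterizedType. Here it is in isolation as a small sketch, with no Hibernate or Spring involved. Base and StringBase are hypothetical names used only for this illustration:

```java
import java.lang.reflect.ParameterizedType;

final class TypeTokenDemo {

    // Base class that discovers its own type argument at construction time.
    abstract static class Base<T> {
        private final Class<T> type;

        @SuppressWarnings("unchecked")
        Base() {
            // getClass() is the concrete subclass; its generic superclass is
            // "Base<String>", which carries the actual type argument.
            ParameterizedType superclass =
                    (ParameterizedType) getClass().getGenericSuperclass();
            this.type = (Class<T>) superclass.getActualTypeArguments()[0];
        }

        Class<T> getType() {
            return type;
        }
    }

    // The subclass pins T to a concrete class, so the compiler records it.
    static final class StringBase extends Base<String> {}

    public static void main(String[] args) {
        System.out.println(new StringBase().getType()); // class java.lang.String
    }
}
```

The trick only works when a direct subclass names a concrete type, as PersonHibernateDao does. Instantiating the generic class directly, or adding another generic layer in between, makes the cast fail at runtime.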

Thanks to this Hibernate forum post and this blog post for the help. And screw wordpress for ruining my code formatting 😉 Sorry about that.


Stop Stop Bush

April 27, 2007

Kids, it’s been 7 years.  He’s not going to be impeached.

Start focusing your effort on what happens NEXT.

Please.


the composite extension is not available

April 26, 2007

My Google homepage has an “Interesting Things For You” widget.

One of the interesting things it lists is a search for “the composite extension is not available”.

I really need to start spicing up my searches!


My Thoughts on Google Web History

April 25, 2007

Last week, Google enabled Personal Web History.
And of course, everyone freaks out about privacy concerns. Oh no! Google knows what I do on the Internet!

OK, so if you haven’t been paying any attention at all, I understand that this might be a surprise. If you’ve been paying a little attention, it’s not that surprising, but maybe a little concerning.

What I’d like to do here is discuss a number of Google’s products and how they relate to collecting personal information and integrating it into their product and their business model.

=== if you’ve been “paying attention,” skip ahead ===

The Product:
Google’s value as a search engine is, in a sense, knowing what you are looking for. How does it know what you are looking for? By analyzing more data than you can shake a stick at! And where does that data come from? Classically, it was by analyzing the link structure of the web, etc, etc. But it’s been 10 years since then, and they must be doing some smarter things. One invaluable asset they own and continue to collect is your clickstream, or what you click on and the sites you visit. If they know that, they know how to tailor their results.

The Business:
Google’s value as an amazingly successful company is their advertising platform. And what’s the value there? What makes it so much better than everyone else? Contextual ads. Text ads that are in context with what you are looking at. And of course the holy grail of contextual advertising is knowing what the user wants to do (say, buy a digital camera) and providing them the best link to accomplish that goal.

What does this all have to do with the Google Web History? Well, it boils down to what John Battelle calls the Database of Intentions. If Google knows and understands my surfing habits, both individually and in aggregate with the rest of the web, they can deliver the best possible results to me.

When I search for “tomcat”, am I searching for the fighter jet, the animal, or the Java application server? They *know* it’s the latter. You better believe I’m not going to get ads for jets. I’m going to get ads that say “Google is hiring Java programmers!”

==== End background information ====

Ok, many of you already know all of that. No big whoop. So where’s the interesting part? How do they get all this data?

I’m going to break this discussion down into two parts.

1. The (relatively) obvious stuff – the things Google knows about you because you straight-up type it into a search box on their web site

2. The stuff that they know about you that you probably didn’t realize they did. That’s the interesting part.

Let’s just review a bunch of their products and see how each one breaks down with respect to those two criteria.

Search:
This one is pretty straightforward. They know everything you search for. Obvious. They also know every link you click on. That’s not very surprising, but not quite as obvious to most people. And they also know when you click it. So basically they know every single web page you go to from their search page. That’s a great start in aggregating data. But there is so much more data to gather from your online activities that doesn’t start at the Google home page!

AdSense:
Next up is AdSense – Google’s ad platform that lets publishers (say, bloggers, geeky humor websites, etc) put ads on their site and get paid when people click. At first glance, it seems like a win-win for everyone. Google gets paid. The publisher gets paid. And the advertiser gets targeted traffic.

But there’s one more subtle thing going on here. Something that was pioneered back in the day by DoubleClick. Google *knows* that I’m reading, say, the New York Times. How? Well, they are the ones showing me the ads. If they serve up an ad, they know who is viewing it. That’s just the way the Internet works*. So even though I didn’t get there through google.com, they still know. They know tons of pages that I go to. Every time you see a Google ad, Google knows you’ve seen it, what page you saw it on, and what time you saw it.

(* Small technical note: How do they know you see it? Google uses a cookie in your browser. You keep the cookie b/c that’s how you stay logged into Gmail and how they know your search preferences and remember your zip code. Everyone uses cookies. If you have a Google cookie, any time you browse a page that has Google content, they know. The real value in the cookie is the federated network. If you want, you can disable “3rd party cookies” in the Preferences in your browser, which is exactly what these are called. Yahoo has good details about the practice).
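
For the programmers in the audience, here is a toy model of that cookie mechanism. It is purely illustrative and says nothing about how Google's systems are actually built. ToyAdServer and its methods are invented names, and the page URL stands in for the Referer header the browser sends along with the ad request:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

final class ToyAdServer {

    // Every page on which each cookie holder has been shown an ad.
    private final Map<String, List<String>> pagesByCookie = new HashMap<>();

    // Called whenever a browser fetches an ad. cookieId is null on a first
    // visit; the return value is the cookie the browser sends back next time.
    String serveAd(String cookieId, String pageUrl) {
        String id = (cookieId != null) ? cookieId : UUID.randomUUID().toString();
        pagesByCookie.computeIfAbsent(id, k -> new ArrayList<>()).add(pageUrl);
        return id;
    }

    List<String> historyFor(String cookieId) {
        return pagesByCookie.getOrDefault(cookieId, new ArrayList<>());
    }

    public static void main(String[] args) {
        ToyAdServer ads = new ToyAdServer();

        // Two unrelated sites embed ads from the same server.
        String cookie = ads.serveAd(null, "http://news.example/front-page");
        ads.serveAd(cookie, "http://blog.example/some-post");

        System.out.println(ads.historyFor(cookie));
    }
}
```

Neither site shares anything with the other. The common ad server is the only party that sees both visits, and the cookie is what lets it stitch them into one history.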

Google Analytics:
So everyone who runs a website wants to know how many people visit. I sure do. The best free way to track this? Google Analytics. Paste 2 lines of code into your web site and you instantly have some of the best analytics & stats for your site, completely free. Tres cool!

Where’s the rub? Why does Google do this? (Yes, I know it’s a great service to their advertising partners aiming to raise their click through rates, ROI, etc). But where’s the hidden value? Google now knows everyone who visits a site that uses Google Analytics. Just like that. I currently run about 10 different web sites. For whatever reasons, only 2 of them run Google ads. But all 10 run Google Analytics. Google knows every single person that comes to any of my sites.

Google Toolbar:
The toolbar is awesome. Popup blocker. Autofill. SMS. Custom search. Auto-Suggest. Completely free. Who wouldn’t want that? It’s such a great product. The catch? They know every site you go to when using the Toolbar, regardless of your search engine. Now you start to see why they push so hard to get the toolbar out there. (Of course it also has something to do with the fact that 9x% of people don’t change their default settings, and if they are the default search box, they get search traffic. If MSN search is the default, they don’t. Which is why they pay so much money to Dell, Mozilla Firefox, Opera, etc to get the toolbar & search box pre-installed.)

Gmail:
This one has a few interesting side effects. First, it all but guarantees that I’m keeping my Google cookie around forever. Who wants to sign-in to check mail every time? I certainly don’t.

When gmail first came out, the privacy advocates were up in arms about Google’s algorithms serving context sensitive ads based on the content of the message. They completely missed the point!

You want to hear the tinfoil hat privacy problems with GMail? Google knows my relationship with every single person in my address book. Oh, and remember when Gmail was invite only? And to prevent spammers from abusing the system, they made you verify your identity with an SMS text message? One account per SMS? Well blam-o! They can now tie you to a phone number!

Google Maps & Local:
Well, they certainly know where I live. “Save this address” was the first thing I did (after drooling over the AJAX & satellite imagery, of course). And thanks to Local search, they know I really like Thai food. And since I click-to-call my cell phone, they can continue to keep tabs on my cell phone number. And they know that I use a Treo and I have Verizon.
You think all those Google Maps mash-ups are free? Think again. There are currently tens of thousands of mash-ups of data with maps all over the web. And you know what? Google gets all that data fed right back into their database. They know who’s looking at it, and who’s clicking on what. It’s just like Analytics & AdSense. More federated networks.

Google Talk:
Physical presence. Google now knows *when* I’m online and whether I’m active or not. 24 hours a day monitoring. And for those of you who don’t really IM, let’s add this online presence to GMail too so we can track it there.

Google Mobile Search:
I think this one is particularly brilliant. Ever surf the web on your cell phone? Google is a good place to start. And what do they do? They realized that most of the internet looks like crap on the cell phone. So they proxy all the web pages through their servers and reformat them to make them look really nice on the tiny screen. For free. I usually go to google, even if I know the URL, just to get things formatted for mobile.
The catch? All your web traffic is funneled through Google again! It’s tough to install a toolbar on a phone browser. But they’ve done it again! They know every page I visit on my phone!

Google Reader:
I read dozens (hundreds?) of feeds in Bloglines, my feed aggregator. I don’t surf the web for news. I don’t store bookmarks and go to cnn.com every morning. I just read my feeds. What’s the problem there? Bloglines (IAC) is in the middle. No cookies. No Google ads. Nothing. Suddenly, Google doesn’t know what I’m reading anymore. Can’t serve me ads. Can’t get at my clickstream.
So what do they do? Build a kickass Feed Reader, make it better than all the competition, and get people switching! Makes a lot of sense now, huh?

Google Desktop:
A search index of all the content on your local hard drive. Awesome. Next step? Synchronize this data across your computers. Of course you want to search from your laptop. And maybe while you’re at the office. How do you do this synchronizing? Easy! You upload the entire index of your hard drive to Google’s servers! Now they have access to all that data! Brilliant!

Google Checkout & Google Finance:
You mean other than getting my credit card number and knowing what’s in my 401K? I don’t see too much benefit to Finance, but I do see a ton of great data coming through Google Checkout. In addition to the obvious cash cow it could (potentially) become for them, they are now getting hard data about who buys what, when, and for what price. Cool.

Orkut & Dodgeball:
Know my friends. Know where I am in the physical world at any given time. Could be useful.

Google Browser Sync:
Want to keep your bookmarks synchronized at work, home, and on the laptop? Make sure you have the same extensions installed in both places? Hell, why not keep all the same tabs open in both places! Guess how this is done? You guessed it – you send every open web page, every bookmark, every extension, right up to Google’s servers. Clever girl!

Google Calendar, Google Docs & Spreadsheets:
To tell you the truth, I see a ton of strategic value in all these products for Google, but I haven’t really gotten my head around any alternative data gathering motivations. I think it’s a brilliant business strategy, and you better believe they will make serious bank with these, especially with branded Apps for your Domain, but I don’t see any other snooping privacy issues other than the glaringly obvious ones.

Free Google WiFi:
Use free Google wifi in California? Of course all the traffic goes through their servers. They are your ISP! They know every site you visit.

Google Web Accelerator:
Not a very well known one – this is a little proxy & caching server to speed up your web surfing experience. The result? All your traffic flows through their servers.

Google Secure Access:
I think this product is no longer available, but for a while you could use Google as a secure proxy server when you were in a public wi-fi hotspot. That way no one could snoop on what you were doing. A pretty nice free service. But again, they are getting every bit of traffic data that you surf.

=======

So what’s my point here? This isn’t conspiracy theory or paranoia or a privacy rant. Google offers more amazing free services than any other company, and I use almost all of them.

What I find frustrating is that people don’t *realize* what goes on with their data and their information. 99% of people don’t understand these concepts. Maybe they care and maybe they don’t. Not for me to judge. But what gets my goat is the sudden outcry! “Oh no! Google has a web history! They know where I surf!” Of COURSE they know where you surf. Now they’re just SHARING it with you! You should have been way more pissed off for the last 8 years when they WEREN’T sharing it.

It sort of reminds me of when Google refused to turn over search records to the DoJ. And all the media spin was about Google having search records stored. The fact that MSN, Yahoo and AOL silently turned over all the data without a fight was completely glossed over in the MSM! Frustrating!

Some other thoughts if you’re still reading:

ISPs: Don’t think for a second that Google is the only company that can and does do this. No one does it nearly as well, but you better believe that your ISP (Comcast, Time Warner, AOL, Cox, etc) knows every single packet that you send and receive. Every single one (though for secure sites starting with https, they only see where the packets go, not what’s in them). They don’t have the same coverage or breadth or application level understanding as Google, but they still get every packet.

Disclaimer:
I don’t work for Google. Lots of this may be considered speculation. I obviously don’t know how they analyze this data. Or what the value of my open Firefox tabs & cell phone number is to them. All I know is the features in their products, a bit about how they work, and what _I_ would do if I had all this data to play with 🙂

Concerns:
So am I concerned? Does this bother me? Do I trust Google? Well, let me first say that it really doesn’t matter what I think. By the very nature of living online, you surrender all of this data to someone. If you switch search engines, you’re giving it to someone else.

To be honest, I’m currently much more concerned with Equifax, with who has copies of my hospital records, and with who makes our voting machines than I am with Google. How often does the government seize computers for forensic evidence? Is that more or less scary to you?

Yes, I would like to see more transparency (see Attention Trust). Yes I would prefer more data portability (Google Web History alone is such a great feature that it has me locked in – I can no longer switch search engines).

Notable Missing Topics

Because I have to actually get some work done, some pieces I wanted to touch on are going to have to wait until next time. Specifically, Google’s Privacy Policy and what that all means. Other interesting related topics include anonymizing search records, deleting (or not deleting) data when the user requests it, releasing data in aggregate, security measures taken to ensure data can not be stolen (including by employees).

Also, I only really covered things that affect consumers – I didn’t get into any discussion about the data that they can gather from AdWords, such as optimal pricing on keywords.

Denouement

Giving up my privacy is a decision that I have made. It’s something I think about often and am cognizant of when surfing the web. Oh, and more importantly, I try to keep my confidential information (mostly finance stuff) confidential.

Oh, and I am careful to use my wife’s computer when listening to Lily Allen, Kennyb’s computer when surfing for pr0n, and my neighbor’s wifi when downloading Heroes 🙂


Time to Upgrade

April 24, 2007

You know it’s time to upgrade when you can’t watch the new Harry Potter and the Order of the Phoenix trailer on your computer.