mgroves

CodeMash 2010

I've arrived at CodeMash 2010 in Sandusky, Ohio.

It's a great time to reflect on how much has changed since last year: what I've learned, the new people I've met, the new kid on the way, everything. Keep an eye on my Twitter feed if you're interested in what I'm up to, and keep an eye on the RSS feed, as I'm going to try to update at least once a day, as I did last year.

On the docket for Wednesday: Ruby Koans with Jim Weirich and Joe O'Brien. Should be interesting, as I don't know a lick of Ruby. After that, Software Craftsmanship? Then maybe check out the game room or the Microsoft room.

What is connascence?

Connascence (con-nass'-sense) is a term that combines ideas from coupling and the single responsibility principle. Connascence means that (1) if you change A, you have to change (or at least check) B to keep the program working correctly, or (2) some change outside of A and B would require both A and B to be changed to keep the program working correctly.

There are many types of connascence; here are a few listed in Chapter 8 of What Every Programmer Should Know About Object-Oriented Design:

Connascence of name. This one's easy: variables have names, so if you change the declaration, you have to change all referring code. These days, that's usually easy to do with refactoring tools like ReSharper, but consider a SqlDataReader. If you change a column from "foo" to "bar", then you have to change your SqlDataReader references from dr["foo"] to dr["bar"], and then if you are mapping to an object, you might even want to change MyObject::Foo to MyObject::Bar.

Connascence of type/class. If a variable has a type (like int), and you decide to change it to string, you (may) need to change code that sets it to an int. C#, a statically typed language, will give you build errors (hopefully) when you come across this situation, but consider a language like PHP or Ruby where, depending on the situation, the code might work just fine or might not work how you intend.

Connascence of meaning. This one is a real pain. Let's say that account number 0 is defined to always be the administrator. If you do that, then your code might have a bunch of if(accountNum == 0) { doAdminStuff(); } all over the place. If you ever changed that, you'd have to make changes almost everywhere.
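One common way to weaken this kind of connascence is to give the magic value a single named home. Here's a minimal sketch of that idea; the names Account, AdministratorAccountNumber, and IsAdministrator are hypothetical and made up for illustration, not taken from Page-Jones' book:

```csharp
using System;

// Hypothetical example: the meaning of "account 0" lives in exactly one place.
public class Account
{
    // Assumption for illustration: 0 is the administrator account.
    public const int AdministratorAccountNumber = 0;

    public int Number { get; private set; }

    public Account(int number)
    {
        Number = number;
    }

    // Callers ask the question instead of comparing against 0 themselves.
    public bool IsAdministrator
    {
        get { return Number == AdministratorAccountNumber; }
    }
}

public static class MeaningDemo
{
    public static void Main()
    {
        var admin = new Account(0);
        var user = new Account(42);
        Console.WriteLine(admin.IsAdministrator); // True
        Console.WriteLine(user.IsAdministrator);  // False
    }
}
```

Callers are still connascent by name with IsAdministrator, but that is a weaker form, and one that refactoring tools can handle for you.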

Connascence of algorithm. This is very similar to connascence of meaning: if you know that the elements in some sort of generic collection (say, List<T>) always iterate in the same order that they were inserted, you could build some piece of code to take advantage of that. However, if List<T> doesn't make any claims to being sorted or deterministic, a change down the road could break all of your code until you change it to a SortedList<K,V>.

Connascence of position. Code must be in the right sequence or the right adjacency in order to work. The only example I could think of for this is method arguments: unless they are named arguments, if you want to switch them, you have to switch them everywhere the method is used. Of course, if they are named arguments, then maybe you are just trading one connascence for another (which might be more easily refactored).
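To make the trade-off concrete, here's a small sketch using C# named arguments (available since C# 4.0). The Describe method and its parameters are hypothetical, invented just for this example:

```csharp
using System;

public static class PositionDemo
{
    // Hypothetical method: both parameters are ints, so swapping them
    // at a call site compiles without any warning.
    public static string Describe(int width, int height)
    {
        return width + "x" + height;
    }

    public static void Main()
    {
        // Positional call: correctness depends on argument order.
        Console.WriteLine(Describe(800, 600));                // 800x600

        // Named arguments: order no longer matters, but the call
        // now depends on the parameter names instead.
        Console.WriteLine(Describe(height: 600, width: 800)); // 800x600
    }
}
```

The second call has swapped connascence of position for connascence of name: renaming a parameter now breaks it, though a rename is usually something a tool can do safely.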

According to Page-Jones, those are all types of "static" connascence, which applies to the code itself. "Dynamic" connascence applies to the execution, or the runtime of the code.

Connascence of execution. This is similar to connascence of position. In some languages, a variable must be initialized before it is used, and that's an example.

Connascence of timing. This one is related to real-time systems, with which I don't have much experience. Page-Jones gives an example of an X-ray machine that must turn off n milliseconds after it is turned on.

Synchronization, get it?  Hey, YOU try to come up with pictures for this stuff.

Connascence of value. The values of an object or variable are constrained. For instance, if you define a rectangle by 4 points, then those 4 points can't be just any points. Page-Jones also gives the all-too-familiar example of (bidirectional) database synchronization, where two databases are required to hold redundant information. Any minor change in either database could make the whole bridge go kablooey.

Connascence of identity. If an object A and object B must both point to the same object C in order to work correctly, or in concert, then A and B have connascence of identity. So, if A is pointing to some database, for instance, B must be pointing to the same database. If they aren't, they might still work but show the wrong data to the user, or they might just biff completely at runtime.

Contranascence. So, this is not the opposite of connascence, but rather a "connascence of difference". If you have an int i; and an int j;, you can't rename j to i without also renaming i to j (or something else). These variables are related by the fact that they are different. Multiple inheritance can cause all sorts of problems here, as can a language without namespaces.

So, what does all this mean? Page-Jones posits that connascence and contranascence are absolutely key to understanding modern software development. Without encapsulation, managing connascence and contranascence becomes incredibly hard. A huge chunk of procedural code would be very difficult to make changes to, as any change could incur one or more of the above listed connascences and break the program. So, there are three guidelines that he lists to reduce that problem in an object-oriented system:

  1. Minimize overall connascence by breaking the system into encapsulated components (duh).
  2. Minimize any remaining connascence that crosses encapsulation boundaries (single responsibility principle).
  3. Maximize connascence within encapsulation boundaries. (I think this is the key)

Basically, keep like things together and keep unlike things apart. And using the types of connascence, "likeness" can now be qualified.

Capcom and Dark Void Zero

So, you probably remember Mega Man 9, Capcom's retro-style Mega Man sequel, and you've probably heard about Mega Man 10 being a similar release. But Capcom is also doing the retro-style thing for a new property: Dark Void Zero. Here's a video about the (fake) backstory:

I'm not sure how I feel about this retro campaign from Capcom. On the one hand, I love the styling, music, and nostalgia, but on the other hand I feel like if they milk it too much, it won't be special anymore. What do you think?

Project Euler: the beginning

Project Euler is an ongoing series of math/programming problems. I'm going to try to go through as many of these as I can this year, post the code for them, and talk about anything interesting that I learn in the process.

You can follow my progress by checking out a CodePlex repository for Project Euler that I've set up. If you have a Project Euler account, you should be able to see my progress on the mgroves Project Euler profile page.

The first problem was pretty easy: "Find the sum of all the multiples of 3 or 5 below 1000." To do this, I just loop through each number from 3 to 999, and take the modulus 3 and 5 of that number. If either modulus is 0, add it to a list. When done, sum up the list:

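    var list = new List<int>();
    int x = 1000; // upper bound (exclusive)
    for (int i = 3; i < x; i++)
    {
        // keep i if it is a multiple of 3 or of 5
        if ((i % 3 == 0) || (i % 5 == 0))
        {
            list.Add(i);
        }
    }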

The second problem is slightly more difficult: Find the sum of all even numbers in the Fibonacci sequence which do not exceed 4,000,000. My code takes a straightforward approach; I'm sure the math majors out there can think of a better way:

    public static IList<int> FibonacciAllEvenValueTermsLessThan(int x)
    {
        var list = new List<int>();
        int term1 = 1;
        int term2 = 2;
        if (term2 < x)
        {
            list.Add(term2);
        }
        while (term2 < x)
        {
            var newterm1 = term2;
            var newterm2 = term1 + term2;
            // only keep even terms that are still below the limit
            if ((newterm2 % 2) == 0 && newterm2 < x)
            {
                list.Add(newterm2);
            }
            term1 = newterm1;
            term2 = newterm2;
        }
        return list;
    }

For both problems, I take the result and use the Linq Sum() extension method, because I'm a total Linq junky these days. I also started out using longs instead of ints, before I realized that, hey, 32-bit numbers go up to around 4 billion (2^32) unsigned, or about 2 billion for a signed int, not 4 million.
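For what it's worth, the first problem could probably collapse into a single Linq query instead of a loop followed by Sum(). This is just a sketch of that alternative, not the code in the repository; SumOfMultiplesBelow is a hypothetical helper name, while Enumerable.Range, Where, and Sum are the standard System.Linq methods:

```csharp
using System;
using System.Linq;

public static class EulerOne
{
    // Hypothetical helper: sums every multiple of 3 or 5 below the limit.
    // Assumes limit >= 1 (Enumerable.Range rejects a negative count).
    public static int SumOfMultiplesBelow(int limit)
    {
        return Enumerable.Range(1, limit - 1)                  // 1 .. limit-1
                         .Where(i => i % 3 == 0 || i % 5 == 0) // multiples of 3 or 5
                         .Sum();
    }

    public static void Main()
    {
        // The problem statement gives 23 as the answer for "below 10".
        Console.WriteLine(SumOfMultiplesBelow(10)); // 23
        Console.WriteLine(SumOfMultiplesBelow(1000));
    }
}
```

The query never builds an intermediate list, and the result fits comfortably in an int.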

PHPExcel Podcast

So, as I've mentioned before, I'm a (very minor) contributor to the PHPExcel project. Well, the coordinator of the project, Maarten Balliauw, was recently featured on an episode of the Connected Show podcast (Episode 22), in which he talks about, among other things, PHP Linq and PHPExcel.

Seems like PHP and Microsoft are creeping closer to each other on the world's Venn diagram, which I find very interesting.

Games I'm playing these days

For Christmas, I received a bunch of gift cards, so I bought some new games.

I bought:

  • New Super Mario Bros Wii
  • A Boy and His Blob (Wii)
  • Batman: Arkham Asylum (Xbox 360)

So far, they are all pretty fun, but I think I've spent the most time with Batman, especially if you include all the times I played through the demo. The most fun parts, for me, are the parts where you get to do stuff that's very uniquely Batman. Specifically, the parts of the game where you are in a room with a handful of armed thugs, and you have to take them all out stealthily. The main strategy is simple: wait until they get far enough away from their friends, and take them out, one by one. You can do this a number of ways: sneak up behind them and do a silent takedown, glide across the room from up on a high perch and kick them, hang upside-down from a perch and string them up, etc. It really feels very Batman-esque, and is the most fun I've ever had playing a stealth game (which I usually find frustrating). This video will give you some idea what I'm talking about:

I haven't played much of A Boy and His Blob, but keep this in mind: it was only $20, and it just came out. I first heard about it on the Retronauts podcast (episode 79). I mostly enjoyed the original one for NES, so I thought I'd pick it up.

As far as New Super Mario Brothers Wii: don't play it with other people (especially people you like). Otherwise, it's a fantastic 2d-style Mario game that's very reminiscent of Super Mario Brothers 3 (more so than any other 2d Mario platformer since).

What Every Programmer Should Know About Object-Oriented Design

I've been reading What Every Programmer Should Know About Object-Oriented Design by Meilir Page-Jones. It's been on my reading list for some time since I first saw it mentioned in a video of Jim Weirich's presentation titled "Grand Unified Theory of Software Design" (about 15:30 in the below video).

UPDATE: I've heard that there are some audio problems in some parts of this video, so maybe check out a video of this same presentation at a different conference (about 18:40 is when the book is mentioned).

The book is mentioned mainly to introduce the term "connascence", as a way to talk about coupling within code. I will write a post about connascence later, but right now I just wanted to give a quick overview of this book, as I haven't finished it yet.

The book is broken into three main parts:

  • Part I - Introduction: what does object-oriented mean, and why should it be used? This was a sort of review chapter for me, as it just covers the basics of encapsulation, inheritance, polymorphism, etc. I would recommend skimming this chapter, as it doesn't really cover anything you haven't covered in an intro to programming class.
  • Part II - Object-Oriented Design Notation: This is a big chunk of the book (5 chapters) which details a visual notation to design and represent objects. This is not a notation that's in wide use, at least not that I'm aware of, so you might consider completely skipping this part of the book, or at least skimming very fast.
  • Part III - The Principles of Object-Oriented Design: This is the real meat of the book, as far as I'm concerned, and the part that I'm currently reading. This part introduces concepts of a well-designed object-oriented program, including encapsulation, connascence, cohesion, encumbrance, domains, polymorphism, and interfaces.

New Year's Resolution?

Well, I've been neglecting this blog for some time now. But starting in 2010, you will see some more activity here.

Why? It's not a New Year's Resolution, per se (because I think those are silly), but it's more to do with the mentoring program I'm involved in at work.

See, Quick Solutions matches up programmers with a mentor that acts as a professional resource, career advisor, etc. Mine has asked me to make some concrete, measurable goals to accomplish over the next year, and here they are:

  • Read books: I will be reading around 2 books related to my profession every quarter (for instance, the first one is What Every Programmer Should Know About Object-Oriented Design by Meilir Page-Jones). As I read these books, I will be blogging about them to a) demonstrate my progress, and b) share what I've learned.
  • I will be taking the PMP exam in February. Not really blog related, but you might see something about it.
  • Contribute in a meaningful way to an open source project. I've already been accepted as a contributor to the PHPExcel project (believe it or not), and I've already contributed a patch to it. You may see some posts about my contributions, and what I've learned from them.
  • I've thought about writing up a whitepaper/case study on the various mobile application marketplaces. There are 3 major marketplaces now for mobile apps, and I'm wondering what the advantages/disadvantages are for each. What is the pricing like, what are the platforms like, what are the marketplace restrictions, etc. I would certainly blog about that.
  • I plan to give at least 3 presentations within the company and/or the local developer community. I've already done an intro to CakePHP presentation for the local PHP meetup group, and I'm scheduled to give it internally again. I'd also like to learn more about db4o (and giving a presentation on it is a great way to learn). I haven't come up with another topic yet. (By the way, if you have a local group you'd like me to speak to about these topics, let me know).
  • I will be attending conferences and seminars at a similar rate to last year, including: CodeMash, Stir Trek, Central Ohio Day of .NET (CODODN), and various Firestarter events and user groups, and what not.
  • I plan to tackle one CodeKata or Project Euler problem every 2 weeks for the entirety of the year. I'll probably do the CodeKatas in C# and the Project Euler problems in some other language (F#, Ruby, PHP, etc). This will represent a large portion of my blogging activity, and will probably be very boring for you non-programmers. Sorry about that.

There are some other things that are part of the plan, but they really are more internal to Quick Solutions, and thus I won't be blogging about them.

I mention all of this in the hopes that doing so will keep me somewhat accountable: if I don't do all these things, then it will look rather silly and pathetic for this post to constantly be on the main page of my blog.

New look and new engine

It's the new year, time for fresh starts, resolutions, and..uh..drinking.

So in the spirit of all 3, welcome to the (4th?) new look of mgroves.com. This time, I'm using a whole new blogging engine. Instead of my custom engine that I wrote years and years ago, I'm using Habari, which is an open source PHP blogging engine. The Habari guys came to Columbus to demo and promote Habari at the PHP Meetup. I liked it and thought I would give it a go.

Additionally, you'll notice a new design. This design was not created by the always talented Jon Plante, but instead it is a Habari theme, with a few minor adjustments. I've never been a fan of using themes--I'm afraid someone else will use the same theme and then it'll be like we wore the same dress to a party, and we all know how embarrassing that is. But I'll give it a go, and if it doesn't work out, I can always cajole Jon for some more free labor.

So anyway, leave a comment, let me know what you think, and let me know of any kinks that I need to iron out.

Code Kata 4 - Data Munging

Here is the fourth of the CodeKata exercises.

Data munging--sounds kinda gross in some...indefinable way. But it's actually some programmer's slang meaning "Mash Until No Good", or if you prefer recursion, it means "Mung Until No Good".

This Kata is mostly in code with a few brief questions. I've chosen...wait for it...ASP.NET with C#. C# because it's what I'm currently best with, and ASP.NET because I find console apps dirty, and a full Windows app to be overkill. I've made the source available on CodePlex. Okay, on with the Kata:

Part One: Weather Data

I initially thought the data was tab-delimited, but no, turns out it's just fixed width. Fixed width data from a 3rd party can be very tricky to write a reliable parser around, but since this is just a learning exercise, I'm going to take a very naive, minimal approach. I created a "WeatherData" entity class to hold the day, max temp, and min temp, and a method to calculate the difference. I don't need the other data, so why waste time trying to parse it? I created a WeatherReader class to read in the text file and output a nice list of entity objects.

When parsing, first of all, I need to ignore the first 8 lines of the text file, because they are not data. Then, I need to read in each line of data one at a time, and use a .Substring call to pull each column out. I put a wrapper around the .Substring because I also wanted to Trim, remove invalid characters (notice the asterisks), etc.
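A wrapper along those lines might look something like the sketch below. The name GetColumnValue comes from later in this post, but the signature, the cleanup rules, the sample line, and the column offsets here are all assumptions for illustration, not the code from the CodePlex repository:

```csharp
using System;

public static class FixedWidth
{
    // Hypothetical wrapper around Substring: pulls one fixed-width column
    // out of a line, strips marker asterisks, and trims the padding.
    public static string GetColumnValue(string line, int start, int length)
    {
        if (start >= line.Length) return string.Empty;          // column missing on a short line
        int available = Math.Min(length, line.Length - start);  // don't run past the end
        return line.Substring(start, available)
                   .Replace("*", string.Empty)
                   .Trim();
    }

    public static void Main()
    {
        // Made-up sample line; the offsets are for illustration only.
        string line = "   9  86    32*";
        Console.WriteLine(GetColumnValue(line, 0, 4)); // 9
        Console.WriteLine(GetColumnValue(line, 4, 4)); // 86
        Console.WriteLine(GetColumnValue(line, 8, 8)); // 32
    }
}
```

Clamping the length matters because a bare Substring call throws when the requested range runs past the end of a short line.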

When reading each line in a loop, I need a stopping place. Again, I took a naive approach and just stopped reading once I reached the "mo" line.

Once I have a nice List of WeatherData, I just use some Linq to search for the day with minimum spread. If you've not used Linq in C#, think of it like SQL for collections of objects, because, that's pretty much what it is.
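Here's roughly what such a query could look like. It's a sketch under assumptions: the WeatherDay class, its property names, and the sample readings are made up for this example and are not the Kata's data or the repository code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity; the property names are assumptions.
public class WeatherDay
{
    public int Day { get; set; }
    public int MaxTemp { get; set; }
    public int MinTemp { get; set; }
    public int Spread { get { return MaxTemp - MinTemp; } }
}

public static class SpreadQuery
{
    public static WeatherDay SmallestSpread(IEnumerable<WeatherDay> days)
    {
        // Reads like SQL: ORDER BY Spread, then take the first row.
        return days.OrderBy(d => d.Spread).First();
    }

    public static void Main()
    {
        // Made-up readings, not the Kata's data file.
        var days = new List<WeatherDay>
        {
            new WeatherDay { Day = 1, MaxTemp = 88, MinTemp = 59 },
            new WeatherDay { Day = 2, MaxTemp = 79, MinTemp = 63 },
            new WeatherDay { Day = 3, MaxTemp = 77, MinTemp = 55 }
        };
        Console.WriteLine(SmallestSpread(days).Day); // 2
    }
}
```

Note that First() throws on an empty sequence, so a real parser would want to handle a file with no data rows.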

Part Two: Soccer League Table

Pretty much the same story here. I created a SoccerTeam entity and a SoccerReader class to read in the text file, parse it, and output a list. The data is a bit different in that it's 5 lines at the top to ignore, there's a big dashed line right smack above Ipswich, and the data ends with a '</pre>' instead of 'mo'.

Part Three: DRY Fusion

So here's the important part: factor out the common functionality of parts 1 and 2 in order to follow the DRY principle.

This can be a tricky exercise because the parts are so deceptively simple. There's obviously some of the same things going on here, but I think an approach that's too aggressive can lead to some Liskov Substitution Principle issues. A very aggressive approach might lead to one class to handle both the Weather and Soccer information, which one might be able to pull off, but there are some downsides to that:

  1. If the weather.dat file format would change, any updated code could break the football.dat processing, and vice versa.
  2. If you want to add another file and entity, like grocery.dat, you will likely have a bunch of if/switch statements to update, and again, you risk breaking the other processors.
  3. Intuitively, Weather data and Soccer data are not really cohesive concepts. Even Baseball data and Soccer data would be a really big stretch, whereas English Premier League and Major League Soccer data could be much more cohesive.

So, instead, I took more of an incremental, systematic approach to refactoring.

I first created a base class: DatReader. I then made WeatherReader and SoccerReader inherit from DatReader. There's one method that was an obvious move to the base class: GetColumnValue, my Substring wrapper. The method was 100% identical in both classes, so that's an easy one. I also had a FileName string field that was identical in both, so that was easy to move too.

Okay, that's the fruit that's close to the ground; anything else would require some actual refactoring:

  1. There were two constructors in each class. One of them worked directly with FileName, so I moved that one and just changed the parameterless constructors in the subclasses to call "base" instead of "this".
  2. The MapText...() and TheText() methods contain logic that is specific to the dat file, so while there is some overlap that is very tempting to refactor, I mostly left those alone, because in the real world, those naive implementations will probably change very, very often.
  3. However, the GetAll...() methods looked similar enough to factor out. In order to do that, the base class is going to need to be generic. Then I needed to add two abstract methods for TheText and MapTextToEntity, the latter of which will return my T generic type, and also be the new name of the MapText...() methods in the subclasses.
  4. WeatherReader should now inherit from DatReader<WeatherDay>, and likewise for SoccerReader.
  5. Now that MapTextToEntity is abstract, the GetAll...() methods don't need to call anything but base class methods, so I refactored that to be just plain GetAll, returning an IList<T>.

Beautiful. It looks pretty darn elegant to me. Each subclass contains exclusively logic specific to the dat file it's trying to read. Any changes to the dat formats will only require changes in the subclasses, and nowhere else.
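To give a feel for the shape of the result, here's a condensed sketch. The names DatReader, WeatherReader, WeatherDay, FileName, GetColumnValue, TheText, MapTextToEntity, and GetAll come from the steps above; every signature, method body, column offset, and the sample file are assumptions made for illustration, and the real source on CodePlex will differ:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public class WeatherDay
{
    public int Day { get; set; }
    public int MaxTemp { get; set; }
    public int MinTemp { get; set; }
}

// Generic base class: everything that is not specific to one .dat layout.
public abstract class DatReader<T>
{
    protected string FileName;

    protected DatReader(string fileName)
    {
        FileName = fileName;
    }

    // Assumed contract: return only the lines that hold data.
    protected abstract IEnumerable<string> TheText();

    // Assumed contract: turn one data line into one entity.
    protected abstract T MapTextToEntity(string line);

    protected string GetColumnValue(string line, int start, int length)
    {
        if (start >= line.Length) return string.Empty;
        int available = Math.Min(length, line.Length - start);
        return line.Substring(start, available).Replace("*", string.Empty).Trim();
    }

    public IList<T> GetAll()
    {
        return TheText().Select(line => MapTextToEntity(line)).ToList();
    }
}

public class WeatherReader : DatReader<WeatherDay>
{
    public WeatherReader(string fileName) : base(fileName) { }

    protected override IEnumerable<string> TheText()
    {
        // Illustrative rules: skip one header line, stop at the "mo" summary line.
        return File.ReadAllLines(FileName)
                   .Skip(1)
                   .TakeWhile(line => !line.TrimStart().StartsWith("mo"));
    }

    protected override WeatherDay MapTextToEntity(string line)
    {
        // Column offsets are made up for this sketch.
        return new WeatherDay
        {
            Day = int.Parse(GetColumnValue(line, 0, 4)),
            MaxTemp = int.Parse(GetColumnValue(line, 4, 6)),
            MinTemp = int.Parse(GetColumnValue(line, 10, 6))
        };
    }
}

public static class DatReaderDemo
{
    public static void Main()
    {
        // Write a tiny made-up file so the sketch runs on its own.
        string path = Path.GetTempFileName();
        File.WriteAllLines(path, new[]
        {
            "  Dy MxT   MnT",
            "   1    88    59",
            "   2    79*   63",
            "  mo  82.9  60.5"
        });

        IList<WeatherDay> days = new WeatherReader(path).GetAll();
        Console.WriteLine(days.Count);      // 2
        Console.WriteLine(days[1].MaxTemp); // 79
        File.Delete(path);
    }
}
```

A SoccerReader would follow the same pattern: inherit from the base class with its own entity type and override only the two abstract methods.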

Kata Questions

To what extent did the design decisions you made when writing the original programs make it easier or harder to factor out common code?

Splitting the code into Entity and EntityReader classes, à la the Repository pattern, made it easy, I think, to identify what logic should go where. Additionally, splitting the functionality of the Reader classes into small, readable methods made it much easier to see where refactoring would be a good idea, and more importantly, where it would not.

Was the way you wrote the second program influenced by writing the first?

Yes. I practically copied and pasted my way through the whole thing. It took only minutes to write, compared to the first, which took an hour or two.

Is factoring out as much common code as possible always a good thing? Did the readability of the programs suffer because of this requirement? How about the maintainability?

1) Yes, except when it's not. 2) A little. "GetAllSoccerGames" is much more usable than "GetAll", but I think the tradeoff is okay, because the class name is still "SoccerReader". 3) Because I've dealt with fixed width text file parsing in the past, I was very cognizant of the maintainability issues, so I think the way that I refactored it would be very easy to maintain, were this a real application.

This is a cool exercise. I feel like my solution is a very good one, and I think I went through this Kata in exactly the way it was intended. If you disagree, please check out my CodeKata source code at CodePlex.