CodeHeadWords

Monday, October 27, 2008

It's Been a While

I've never been good at blogging, really, although I should embrace it, I suppose.

Anyway, I thought I'd put this idea in the blog for kicks.

All of the STL containers have their abstractions and their implementations. The abstraction is conveyed through the iterator - all containers support iteration for accessing, inserting, deleting, etc. Any function that uses iterators should, in theory, generate the same result on all containers, although performance may vary.

The containers also have their concrete implementations. Set will only hold unique instances. Vector is contiguous. List supports quick insertion and removal along its length.
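A rough sketch of that split between abstraction and implementation. The names sum_range and sum_of are made up for this example; they are not part of the STL. The same iterator-based function runs unchanged on a vector, a list and a set, and only the set's uniqueness rule changes the answer.

```cpp
#include <list>
#include <set>
#include <vector>

// Hypothetical helper: adds up whatever lies between two iterators.
// It never asks which container the iterators came from.
template <typename Iterator>
int sum_range(Iterator first, Iterator last)
{
    int total = 0;
    for (; first != last; ++first)
        total += *first;
    return total;
}

// Hypothetical convenience wrapper: sums a whole container of ints.
template <typename Container>
int sum_of(const Container& c)
{
    return sum_range(c.begin(), c.end());
}
```

Fed the values 1, 2, 2, 3, the vector and the list both sum every element, while the set has already thrown away the duplicate 2 before the function ever sees it. The abstraction is identical; the implementation still leaks through.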

Then, there is the grey area, where implementation and abstraction blend. The fact is, if we could embrace only the abstraction of containers, we would only have one: std::container.

It is very rare that any code is completely abstract when it comes to STL containers - and code that fully embraces the abstraction can miss the opportunity to use the strengths of a given container to address a problem domain.

That being said, we must all guard against trying to be too clever.

Thursday, November 15, 2007

Trees

There is no tree in the STL. Specifically, there are no n-ary trees in the STL, which means that in C++, there is no off-the-shelf option for a tree structure in the standard library.

It just so happens I needed one. So I had two options: download one from the web, or roll my own.

I tried the former, in accordance with the axiom: never, ever, roll your own when you can steal one. It is just stupid. Really. Unless you are trying to learn; then it is good.

I think it is Picasso who is credited with the line "Good artists copy, great artists steal." Coding is art. Don't gloss over it - sure, just like art, there are a bunch of hacks who cobble shit together and use a bunch of buzzwords, and try to bamboozle people with a flurry of concepts to prevent them from seeing that what they have in front of them is truly crap. And, just like art, the layman is not sure what art is, or what makes up good art. But trust me on this: coding is art. It is design as pure as graphic design, and an elegant design is beautiful in a way that compels you to use it. The STL is beautiful. Really. Even if you hate C++, if you spend a half hour with it, the STL reveals its elegance. But just like a painting by a master, it looks kind of crappy next to the smooth, shiny, plastic stuff. That is the trick with art: to appreciate it, you really have to look, you have to peel it apart and see how it interacts within the context of its existence.

So, when someone makes something great, you steal it. The problem is, of the two tree implementations I found on the web, neither was robust enough to do what I needed. In that case, I had to make it.

Now, before I continue, realize that implementing a tree structure is not easy at all. A tree, more specifically, an n-ary tree, is a difficult structure to create. The n-ary means each node can have any number of children, and it is, by definition, a recursive structure. It is different from a binary tree, where each node has, at most, two children. Balanced binary trees are what STL implementations typically use to manage the data in std::map, and they have the advantage of very fast lookup times for keys. A tree is also different from a map because, ultimately, the map presents its contents as one flat, sorted sequence of key/value pairs. A tree, by contrast, has multiple levels, and recursing through that structure presents many challenges, especially if you are trying to maintain STL compatibility.

Back to the subject at hand - an n-ary tree is damn useful. It can represent an XML file or a 3D scene graph, both of which I work with a great deal. It offers a high degree of flexibility when organizing information - think of the file system on your computer. It has directories that contain files and other directories, each with a string to label it. This is an n-ary tree.
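To make the shape concrete, here is a bare-bones sketch of an n-ary node. Node, add_child, count_nodes and outline are all names invented for this example. It has no iterators and makes no attempt at STL compatibility, which is exactly the hard part; it only shows the recursion.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical node type: a label plus any number of children.
// (A std::vector of the enclosing, still-incomplete type is allowed since C++17.)
struct Node
{
    std::string label;
    std::vector<Node> children;

    explicit Node(const std::string& l) : label(l) {}

    // Append a child and return a reference to it.
    // The reference is only good until the next add_child on this same parent.
    Node& add_child(const std::string& l)
    {
        children.push_back(Node(l));
        return children.back();
    }
};

// Count this node and everything below it.
std::size_t count_nodes(const Node& n)
{
    std::size_t total = 1;
    for (std::size_t i = 0; i < n.children.size(); ++i)
        total += count_nodes(n.children[i]);
    return total;
}

// Depth-first walk: one label per line, indented two spaces per level.
std::string outline(const Node& n, std::size_t depth = 0)
{
    std::string text(depth * 2, ' ');
    text += n.label;
    text += '\n';
    for (std::size_t i = 0; i < n.children.size(); ++i)
        text += outline(n.children[i], depth + 1);
    return text;
}
```

Every operation has to call itself on each child, which is what "recursive structure" means in practice. Turning that recursion into a flat begin()/end() pair is where things get awkward.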

Now, the tree implementations I found online were just unsuitable. The first one was Kasper Peeters', and it is used in a bunch of open-source stuff. This one was good enough to build a prototype, but fell down when it came to the iterators - it was difficult to create the iterator I wanted. In addition, you had to use two different methods to insert nodes - insert(), for branches at the root, and insert_child() for other nodes. This is because inserting was kind of STL style, and added the new node before end(), which was a sibling iterator.

Long story short, it was unreliable in production. I also looked at Justin Gottschlich's core::tree<>. I was very unimpressed, as he has defined each tree<>::iterator as a whole tree. In my tests, this was just as hard to work with, to the point of being unusable - I felt both trees were unreliable.

To be fair to both of these guys, there may not be enough sample code to understand how to use these trees properly. Gottschlich decided to spend an inappropriate amount of time in his first article discussing how futile it is to build a tree from std::map, which is pretty obvious - he would have been much better served by going over the intricacies of his tree in the context of a relevant use case instead of generating integers down each branch of his sample tree.

Thus, I am writing my own tree.

Wednesday, October 17, 2007

Back to the grind

It's been a while since I have posted. I am hoping to keep this going - my exploration of coding.

Today, we are talking about classes. Object oriented design promotes the class as the end-all, be-all. Java, for example, requires everything to be in a class - there are no free-standing functions. Every function must be assigned to a class, even if it is only a static method.

Obviously, free-standing functions are useful. And in a language like Python, you can have both. So when is the best time to use classes?

For many applications, there is a simple test for when to use classes: does the operation require state data? 'State data' is another way of saying 'global variables'. 'Global variables' is another way of saying 'parameters I don't want to pass in because it is not easy to do so'.

In short, if you are writing a function that would need, say, 12 parameters, that is a good candidate for a class. The more parameters, the more of a pain it is to pass them all in, and the more error prone it is to pass them in correctly.

If you have a function that needs to tie together more than 5 parameters, it probably needs to be in a class or use a class to encapsulate the parameters. Five might even be too many. Sometimes it pays to be lazy.

Not that it is lazy, but object oriented programming is good at making objects that know how to take care of their data - so use it. It is better to work with objects that know how to set their values, can recognize when their values are out of bounds, etc. It frees you to concentrate on developing the higher-level operations that are really the meat of what you are trying to do anyway.
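As a small sketch of an object that takes care of its own data: FrameRange and its members are hypothetical, picked only to have something to validate. Values that would otherwise travel together through every function call are set once, checked once, and then carried around as a unit.

```cpp
#include <stdexcept>

// Hypothetical example: three values that always travel together.
class FrameRange
{
public:
    // Bad combinations are rejected here, once, instead of in every caller.
    FrameRange(int start, int end, double fps)
        : start_(start), end_(end), fps_(fps)
    {
        if (end < start)
            throw std::invalid_argument("end frame is before start frame");
        if (fps <= 0.0)
            throw std::invalid_argument("fps must be positive");
    }

    // Number of frames, counting both the first and the last.
    int frame_count() const { return end_ - start_ + 1; }

    // Length of the range in seconds.
    double seconds() const { return frame_count() / fps_; }

private:
    int start_;
    int end_;
    double fps_;
};
```

A function that takes a FrameRange can assume the range is sane, and the higher-level code that uses it never has to re-check the same three numbers.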

Sunday, June 10, 2007

Ideas for Development

So I was trying to come up with some ideas to guide me in the fast-paced world of production development. Some of these are probably common sense, but I think it will help me to put them down.

As I've thought about it, I have come to one central conclusion that I feel should guide my tool development:

All tools should teach the user how to use them

In a production environment, where there are many people working on different parts of the same project, there is no time for communication. In an ideal world, there would be - people would get together over coffee and talk it all out. But, the world isn't ideal, and so we move forward, stumbling at times, dealing with issues and problems that could have been avoided if everyone were on the same page.

So, I've come to the conclusion that software should be its own guide - it should be its own documentation, not only in the code, but in its usage. This is not easy, however. It takes time to build in documentation, it takes effort to present information to the user that keeps them informed, and time is a scarce resource.

Absent any concrete strategy to implement this idea, I am first going to try and outline some points that can serve as guidelines to shape my development.

Enforce Assumptions with Errors - I have lost count of the times a tool was designed to a spec that made assumptions, but the assumption was just that 'the user will know this'. This never happens - all assumptions should be enforced with errors. As a matter of fact, assumptions should be listed in any spec that is developed. Assumptions should be checked before actions whenever possible.
Communicate all errors with dialogs when possible - this may seem too obvious to state, because in a windowing environment we are used to it. But, as someone with more technical savvy, I have gotten used to reading errors on the command line, and it is easy to forget that most users are not looking there.
Attempt to inform the user how to correct the error - Again, common sense. The challenge with this one is that knowledge of a particular system may not be widespread, and it takes time to reference all the information that the user may need to correct the error.
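A minimal sketch of the first and third guidelines together. The function name and the naming rule it enforces are invented for illustration. The assumption is checked before anything else happens, and the error text says what to do about it. How the message reaches the user (dialog or console) is left to the caller.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical tool step. Assumption: a character name is non-empty and
// contains no spaces. Rather than trusting that "the user will know this",
// check it up front and explain the fix in the error itself.
void check_character_name(const std::string& name)
{
    if (name.empty())
        throw std::invalid_argument(
            "No character name given. Type a name such as 'bob' and try again.");

    if (name.find(' ') != std::string::npos)
        throw std::invalid_argument(
            "Character name '" + name + "' contains a space. "
            "Replace spaces with underscores and try again.");
}
```

The caller can catch std::invalid_argument and pass e.what() to whatever dialog the tool already has.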

On reading it, it is pretty common sense. I sometimes try to put things down, even if they are common sense, because the work environment is so hectic that it is hard to think straight. I am constantly pulled in many directions, to the point where common sense becomes a luxury.

Friday, January 26, 2007

Contracts and You

I am a neophyte at all this programming hoo-ha, but I have heard the term "Contract Based Programming." The idea of the contract is a good one, and I think it can be applied to many levels of software design. You can think of software as a collection of little subcontractors - you tell the application what you want, it delegates to objects and functions, and hopefully you get the desired result. Just like contractors, though, code may not adhere to the contract.

Think about the software you use daily. A word processor, for example, has set some implicit contracts with the user at the highest level of abstraction. One such contract is "the software will not modify the copy of a file on disk without being explicitly told to do so." As far as I know, this contract is not explicitly stated anywhere - it is a kind of common law contract that has evolved with software. No one would expect anything different from a piece of software.

Similarly, cut, copy and paste have certain guarantees in any software worth its salt. Copy guarantees that it will take whatever is selected and place it on the clipboard. Paste does the opposite - it takes the information on the clipboard and does its best to insert it into whatever document is active. Copy will not remove what is selected - that is its guarantee. Cut will.

So software is filled with implicit contracts. I ran full steam into this lately. I upgraded the animation system that our project was running on. I wanted to make some new tools available for the animators, but, in my zeal to make the most Nifty-Neat Thing Evar^(tm), I violated a number of implicit contracts with the animators.

What I failed to realize is that, once a piece of software is rolled out and people get to using it, EVERY feature is drafted into an implicit contract with the user. The name-remapper is expected to work the way people are used to it working. The kicker is, that even if you make it better, you are changing the contract.

This leads me to my newest axiom: change is bad. And it is. The only time change is beneficial is when it is immediately and blatantly obvious that the new change provides a more efficient means of operation (it also may be required for technical reasons, but that is outside the scope of this topic). Consider a widget that loads files. It begins life as a simple text field that people type into to load files. You would get people entering files like so:

c:/the_project/animation/characters/bob/cycles/bob_run_cycle.anim

This is a pain in the ass and prone to error. So, being the enterprising engineer you are, seeking to be adored by hordes of grateful animators/groupies, you add a file browser option.

WHAM! This is better! It is so obviously better - clicking on file folders is way better than manually entering text paths. It is less prone to error, and the artist gets to see what files exist in that path, etc. It is obviously better. This is a change that you could probably roll out, and be welcomed as a savior by many an over-worked animator.

But, like any engineer that possesses even a modicum of passion for his craft, you want to make it even better. So you realize that all the animators are saving their files in similar areas for each character. So you decide to introduce a widget that looks at each character's directories for animation.

WHAM! Wait... What's this? Where are my files going? Are you sure I'll be able to find them again? This change is not obviously better. It is probably better - it filters out information that the animators don't need. But it isn't a standard. These animators work hard - sometimes they need to copy files to other directories, maybe back them up, rename files like deranged monkeys, etc. How do you do these things in this interface?

In this case, the interface is not bad, but it isn't standard. It isn't familiar. In keeping with the tone of this post, it does not adhere to the implicit contract of how files are accessed on the computer. The modern GUI and file browsers have established a contract with the user. To change this contract, or create a new one that replaces it, requires careful groundwork unless your interface is obviously better.

So be mindful of your users. They are usually busy and have enough things frustrating them without software changing underneath them. Even if the change is better.

Tuesday, January 16, 2007

The Right To Privacy

So why should there be private member variables and methods at all?

A very compelling reason is interface. When member-variables are private, the class that owns them is responsible for changing their values. Consider a fraction class:


// Pseudo-code - declarations only, everything deliberately public
class Fraction
{
public:
    Fraction(const double num, const double denom); // construction
    void print() const;   // prints out the fraction as "numerator/denominator"
    double value() const; // get the value of this fraction as a double

    double numerator;
    double denominator;
};

In this instance, the class is merely a tuple binding two numeric values into a convenient package. A typical use might be:


Fraction f(2, 3); //two-thirds
Fraction g(1, 2); //One half

g.print();
// 1/2

Simple enough, I hope. Ideally, there would be operators to handle addition, subtraction, etc. What is significant about this class is that the numerator and denominator are held separately in order to print as a fraction - which seems reasonable. The value of the class is not calculated until requested via the value() method.

Setting the value of this class is easy because the numerator and denominator are doubles, and because all the members of the class are public, we can just set the values directly. This is analogous to a C-style struct or a Python object.


f.numerator = 5;
f.denominator = 4;

f.print(); // Result: 5/4

So, everything is hunky dory. We can make fractions, set their values all day long. Damn, I'm happy.

Then one day, we realize that someone is setting the denominator as follows:


f.denominator = 0.0;  // oh, crap!
f.value(); //divide by zero!

Only they aren't just flat out setting it to 0.0, like an idiot, they are parsing some data from somewhere and, due to a parsing or rounding error, they are getting 0.0 in the denominator.

Now, the response on comp.lang.python would no doubt be "Duh! You are supposed to be a smart programmer! Check to see if your value is zero before you set it!" And they would be right, to a certain extent.

In practice, however, people are often rushed. They may have made a mistake deep inside some code, and didn't realize there was a condition in which this error could occur. Let's say they are setting the values of thousands of fractions in many different source files. They would never realize that the denominator was 0 until the value() function was called. At that point some languages would throw an exception; with C++ doubles on most platforms you would more likely get infinity or NaN back, which is arguably worse, because nothing stops and the bad value quietly spreads.

In such a case, let's also say that you adopt a policy that, if a user attempts to set the denominator to zero, it will instead be set to a very small double value.

But now you have code littered with 'f.denominator = x;' calls. These can't be intercepted because they are setting the value directly. How do you enforce this constraint on the object? You can do a global search and replace, but that is probably as tedious and error-prone as it sounds.

This all goes away with accessor methods and private variables. If I am designing this class, I can ensure that the class takes care of its own data. If I had a setDenominator function, it could decide what to do if someone sets bad data. It could throw an exception, make the data match the closest valid value, or just not change the data at all, depending on what I need it to do within the context of my application.

If I had used a setDenominator function, I would have this throughout my code:


f.setDenominator(2);
//Result: 5/2

I wouldn't have to go back through all the code and make sure that every explicit .denominator assignment was valid. My setDenominator function would probably look like this:

void Fraction::setDenominator(double d)
{
    if (d == 0.0)
        d = 0.000001; // some small value approximating 0
    this->denominator = d;
}

In this way, I have used the interface to allow the object to manage its own data. The object can evaluate the data and take an appropriate course of action. This frees me from worrying about that every time I need to set the denominator of an object and allows me to concentrate more on how I am going to use Fractions in my application.
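Pulled together, a compilable sketch of the private-member version might look like the following. The trailing underscores, the getters and the text() method are assumptions made for this example; text() stands in for print() and returns the string instead of writing it out. The small-value policy is the one described above, and throwing an exception would be an equally reasonable choice.

```cpp
#include <sstream>
#include <string>

class Fraction
{
public:
    // The constructor goes through the setter, so the zero check
    // applies from the very first value.
    Fraction(double num, double denom)
        : numerator_(num), denominator_(1.0)
    {
        setDenominator(denom);
    }

    void setNumerator(double n) { numerator_ = n; }

    // The one and only place a denominator can be changed.
    void setDenominator(double d)
    {
        if (d == 0.0)
            d = 0.000001; // policy: swap zero for a small value
        denominator_ = d;
    }

    double numerator() const { return numerator_; }
    double denominator() const { return denominator_; }

    // The value is still computed only when asked for.
    double value() const { return numerator_ / denominator_; }

    // Formats the fraction as "numerator/denominator".
    std::string text() const
    {
        std::ostringstream out;
        out << numerator_ << '/' << denominator_;
        return out.str();
    }

private:
    double numerator_;
    double denominator_;
};
```

With this in place, a line like f.denominator_ = 0.0; outside the class simply does not compile, so the policy cannot be bypassed by accident.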

I think this reason alone is justification for private variables and members. It's not that I don't trust that anyone using the class is a good programmer - even a junior programmer knows about divide by zero. The fact is, I am a busy programmer. I don't want to nursemaid every little part of a large class hierarchy if I can get the computer to do it for free. It takes a little more up-front work, but it can save a lot of headache down the road.

Friday, January 05, 2007

Python v1.0

So I am learning Python. There are good and bad things about it.

Now don't get me wrong - I dislike Python. Sure, you can do a lot of things with it, and it is hands-down more powerful than MEL and, once Maya 8.5 rolls out, I will probably use it a lot more than MEL. But it has a lot of issues that, while not being insurmountable, are problematic in a programming language.

Bad Things:

Member variables are not private: Which raises the question, why call them classes if they are basically structs? Protecting access to members is one of the key mechanisms that makes OO programming worthwhile, IMO. (I have since learned that a double-underscore prefix, '__member', will 'hide' a variable by mangling its name. I have been told that this does not actually prevent anyone from accessing it.)
Evaluation Of Functions Used As Parameter Defaults On Loading: When a default param value calls a function, it gets the return value of the function at the time the def statement is executed (when the code is loaded into the interpreter), not when the function is actually called. For example, def logTime(time = getCurrentTime()): will set time to whatever time it was when the interpreter evaluated the def, and every later call that omits the argument gets that same stale value. Could this be called "too much closure?" Perhaps, but it most certainly is "annoying".
Self: My god, all the selfing makes me want to actually shoot myself in the foot with the hundred or so objects I could instantiate to do it in C++, as the saying goes. I don't believe any keyword in a programming language should be used in code as much as 'the' is in English.

Python is not all bad. It has some really cool features:

Iterator Generators: Using the 'yield' keyword, these hand back the next item in a sequence each time they are asked. These are very cool as they can be written outside of the containing object.
Reflection/Type Introspection: This is very nice - it is not true that C++ lacks type introspection (it has RTTI, via typeid and dynamic_cast), but it doesn't have a very developed system. Python allows extensive querying of objects.
Libraries: Python has a lot of kickass libraries, from regex to image manipulation.

I think the biggest detriment to Python is the people who evangelize it. What strikes me is an attitude of superiority, as evidenced in the thread about member variables being private. Instead of addressing the reason why it is not in the language, many of the Python faithful seek to justify that it is not needed (I believe they are wrong, and that would need to be the subject of another post). This need to establish Python as 'the best' is annoying and counterproductive.

My first foray into Python has been relatively productive so far.