Patterns

Patterns – Prototypes and Optimization

The hardest interview I ever had: someone told me to go up to a whiteboard and solve a programming problem, in code, optimally, on the first try.

It’s a basic fact of our field that we iterate toward the final product. Not unlike a sculptor bringing the David out of a piece of marble, we go from a rough piece of code toward a refined piece of code in a series of stages.

Rough Outline

We usually start with a piece of pseudocode that generally describes what we want a function to do. This can be written in basically any format (text, outline, pseudocode, diagram, etc.), but it has to detail the basic elements we expect to find in our function.

Specifically, our outline serves these two purposes:

  • Expose functionality for better breakdown
  • Reveal the interface elements that we might want or need

If we skip this step entirely, we tend to spend a lot of time reworking our basic code as we discover features we wanted to include.

Exposed Interface with Dynamic Allocation

Next, we write the first draft of our code. At this stage, we are basically limited to the information provided by the rough outline.

Because we don’t necessarily know what size our final data will be, we dynamically allocate space using malloc() or calloc(). This allows us to quickly modify the size and range of our data as we determine what elements we will include in the final product. This serves the secondary purpose of preserving our outline, as we have not yet optimized away extraneous variables required for simple reading.

Furthermore, because we don’t yet know how many of our parameters will be standardized (through preprocessor definitions or static variables), we want to take as many of them as we can into account. At this stage, our interfaces can be incredibly daunting, because they allow us to tweak EVERY parameter through the function call. We also want to pass pointers into these functions, but we’re not sure whether to make them const, so there’s a risk of silly errors.

Note: At this stage, we might start defining structures to carry these parameters, just to reduce the size of the interfaces.

Refined Interface with Dynamic Allocation

As we move on, we begin to determine the size of our parameters and the scope of our data. While we’re not yet ready to jump to static allocation, we are becoming aware of what that statically allocated data might look like.

We are also ready to restrict the interface, as we have determined which parameters we intend to manipulate and which we will set to default.

There are two approaches to entering this stage:

  • Rewrite your functions to restrict the interface to the core function
  • Write wrapper functions to restrict the interface that the user deals with

Generally speaking, it’s safer and easier to go with the latter.
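To make the wrapper approach concrete, here is a minimal sketch. Every name in it (scale_full, scale, struct scale_params, and the default values) is invented for illustration; the point is only that the wide-open first-draft function survives untouched while the user sees a much smaller interface.

```c
#include <stddef.h>

/* Hypothetical first-draft interface: every knob travels in a structure. */
struct scale_params {
    int factor;     /* multiplier applied to each element */
    int offset;     /* value added after multiplying */
    int clamp_max;  /* upper bound on each result */
};

int scale_full(int *data, size_t count, const struct scale_params *params)
{
    size_t i;

    if (data == NULL || params == NULL)
        return -1;
    for (i = 0; i < count; i++) {
        int value = data[i] * params->factor + params->offset;
        data[i] = (value > params->clamp_max) ? params->clamp_max : value;
    }
    return 0;
}

/* Wrapper: the user only sees the one parameter we decided to keep. */
int scale(int *data, size_t count, int factor)
{
    struct scale_params defaults = { 1, 0, 1000 };  /* assumed defaults */

    defaults.factor = factor;
    return scale_full(data, count, &defaults);
}
```

If a default later turns out to be wrong, only the wrapper changes; the core function and everything that calls the wrapper stay as they are.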

Refined Interface with Static Allocation

Now we’ve reached the point where we basically know what our data will look like. We’ve made the tough calls, determined the final flow of the code, and settled on the size and types of data we’ll employ at each level.

Now we begin replacing dynamic allocation with static allocation. If you’ve structured your interfaces properly, this is about as simple as replacing all -> with . and removing all allocation and free operations.

Note: Don’t forget to clear the values in your static variables, especially if you were relying on calloc() to do that.
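As a rough sketch of what that conversion can look like (struct record, NAME_LENGTH, and both function names are assumptions made up for this example), the pointer becomes a static variable, -> becomes ., and an explicit memset() takes over the zeroing that calloc() used to do:

```c
#include <string.h>

#define NAME_LENGTH 32   /* hypothetical size settled on during refinement */

struct record {
    char name[NAME_LENGTH];
    int  uses;
};

/* Statically allocated replacement for what used to be a calloc()'d pointer. */
static struct record current;

void record_reset(void)
{
    /* calloc() used to zero this for us; now we do it by hand. */
    memset(&current, 0, sizeof(current));
}

int record_use(const char *name)
{
    if (name == NULL)
        return -1;
    /* Was: current->name and current->uses */
    strncpy(current.name, name, NAME_LENGTH - 1);
    current.name[NAME_LENGTH - 1] = '\0';
    current.uses++;
    return current.uses;
}
```

Static storage starts out zeroed when the program loads, but it keeps its old contents between uses, which is why the reset function matters.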

Minimal Interface with Static Allocation

Now we perform the true optimization stage.

Because we’ve finally settled exactly how our code will work, we can begin to restrict the inputs to our functions with const and similar keywords. This restricts our interfaces to the smallest and most restricted they can be, resulting in code that is reliable and easy for the end user to work with.

We also start to clean up our headers, removing any of the “old” function interfaces which are wrapped by the cleaner interfaces. This helps restrict the interfaces and prevents the end user from doing dangerous things.

We also start working with our variables. Depending on the level of optimization required, we start to do things like move things to global scope or reorganize code to reuse static variables (which I generally do not recommend if it makes the code harder to read).

This stage should produce a final set of code that you’re happy to run and maintain, but there is another possible layer of optimization you can employ…

Optional Library Optimization

This is the sort of thing programmers had to do in the early days, to maximize utility of tiny systems.

Here we start reusing variables on the register level, optimizing out log code (and other valuable information), and generally rendering the code impossible to maintain.

Compiling with optimization does much of this naturally.

Generally speaking, this is only recommended for code you are virtually 100% certain you will never have to touch again.

Lesson: Change and uncertainty are the law in the earlier stages of program development. Leave your early code very open for changes, and optimize as the project reaches final definition.

Patterns: Shielding Inputs with const

One of the key worries I have heard from those ill-informed OOP programmers is that C cannot protect inputs you pass into functions. They use private fields and retrieval functions to ensure that the stored value is protected from unwanted modification.

However, this concern is addressed by C’s const keyword.

Taking a const value

There are times when we want to guarantee to all end users that our function will not modify particular values. Perhaps we are reading a stored key value in order to provide an access, but we want to ensure that nothing we do modifies that key.

If our function prototype and related documentation employ pass-by-reference (a common practice for large values) but do not employ the const keyword, the end user has no guarantee that the stored value will be the same after we’re done.

The difference between…


int func ( char * key)

…and…


int func( const char * const key )

…is that the first interface has full control of the key, while the second promises not to change either the pointer or the values at that pointer.

Creating a const value

Often enough, we want to create a value that never changes throughout operation. Perhaps it’s a static variable used to allocate arrays, or a static message we want to print out a number of times. In these cases, we use the const keyword to protect an initialized value from future changes. For example, we can create a constant integer value like this:


const int index = 256;

We can create a constant error message in the same way, but we usually make sure that we preserve both the pointer and the value stored therein:


const char * const error_message = "ERROR: Something done goofed\n";

Note: We have to assign the variable when we declare it, because it’s a constant value that we can never change again.

Keyword const rules

The const keyword prevents anyone from modifying a value. However, when dealing with pointers we actually have two values to preserve: the value of the pointer, and the value at that pointer.

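The keyword const applies to whatever sits immediately to its left; if nothing sits to its left, it applies to whatever sits immediately to its right. That means that the statement…


const char *

…protects the value at that pointer (the pointer itself can still be moved), while the statement…


char * const

…protects the pointer itself (the value it points at can still be changed). To preserve both, we use…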


const char * const
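When in doubt about which half of a pointer a given const protects, the compiler will settle it. In this minimal sketch (the function names count_chars and fill are made up for illustration), the commented-out lines are the ones the compiler would reject:

```c
#include <stddef.h>

/* The pointer may move, but the chars it points at are read-only through it. */
size_t count_chars(const char *text)
{
    size_t length = 0;

    while (*text != '\0') {   /* reading is allowed */
        text++;               /* moving the pointer is allowed */
        length++;
    }
    /* *text = 'x';   would not compile: the value at the pointer is const */
    return length;
}

/* The pointer cannot be re-aimed, but the chars it points at are writable. */
void fill(char * const buffer, size_t size, char value)
{
    size_t i;

    for (i = 0; i < size; i++)
        buffer[i] = value;    /* writing through the pointer is allowed */
    /* buffer = NULL;   would not compile: the pointer itself is const */
}
```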

Dangerous, but useful: Casting with const

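We can add const to a value with a cast (or simply by assigning it to a const-qualified pointer), which gives us a protected view of data that is otherwise writable. Be aware that this only protects the value where the const is visible: if a function’s prototype takes a plain char *, the compiler has to discard the const again at the call (and will warn about it), so the function can still change the value. For example:


int func( char * key ) { return 0; }

char value[] = "penny";

const char * const view = (const char * const) value;   //read-only view of value

int i = func( value );   //func can still modify value; passing view here would draw a warning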

However, we can also cast away the const (which is where it gets dangerous). That means that the program can act as though the value is naturally unprotected. That looks something like this:

int func( const char * const key )
{
    char * writable = (char *) key;   //cast away the const
    writable[0] = 'X';                //undefined behavior if key points at truly const data
    return 0;
}

Generally speaking, these actions are both considered unsafe. They can be extremely useful (for example, to free a const value inside of an allocated structure), but exercise extreme caution.
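Here is a minimal sketch of that one legitimate use: freeing a const member of an allocated structure. The names struct label, label_create, and label_destroy are hypothetical. The cast is only safe because the destructor knows the text came from malloc() in the matching constructor.

```c
#include <stdlib.h>
#include <string.h>

struct label {
    const char *text;   /* read-only to users, but heap-allocated by us */
};

struct label *label_create(const char *text)
{
    struct label *made;
    char *copy;
    size_t length;

    if (text == NULL)
        return NULL;
    length = strlen(text) + 1;
    made = malloc(sizeof(*made));
    copy = malloc(length);
    if (made == NULL || copy == NULL) {
        free(copy);      /* free(NULL) is harmless */
        free(made);
        return NULL;
    }
    memcpy(copy, text, length);
    made->text = copy;   /* adding const is an implicit, safe conversion */
    return made;
}

void label_destroy(struct label *old)
{
    if (old == NULL)
        return;
    /* Cast away const: safe only because label_create() allocated text. */
    free((char *) old->text);
    free(old);
}
```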

Lesson: The const keyword creates a contract in your code ensuring that the compiler protects the integrity of certain values.

Patterns: The If-Else Error Chain

In languages like Java, we have a standardized error-handling paradigm in the try-catch expression. Effectively, this hands error detection off to the runtime: any exception thrown inside the try block is routed to a matching catch block. While we are able to restrict the range of exceptions we catch, this machinery can add to both the memory footprint and the running time of the program.

In C, we have no such paradigm, so we have to use alternative methods to handle errors.

Common Practices

There are any number of ways to deal with errors outside of a rigidly defined paradigm.

Some (like those who operate predominantly in C++) may define supervisor functions that simulate the try-catch expression. This is less than common, because in defining the supervisor function you usually begin to appreciate the range of all possibilities. When you start trying to handle every possible error with one megafunction, you start to appreciate the simplicity of catching errors manually.

The most common practice is to test the output of functions and related operations. Whenever you call an allocation function like malloc() or calloc(), you test the output to ensure that space was properly allocated. When you pass-by-reference into a function, you test the inputs to ensure that they make sense in the context of your program. Methods like these allow us to manually handle both the flow and the error-handling of our code.

However, in most cases we have a “multiple-breakout” pattern of tests. These patterns look something like this:

char * blueberry = (char * ) malloc(50*sizeof(char));
if(blueberry == NULL)
    return -1;
int pancake = 0;
do_thing(blueberry, "there is stuff", &pancake);   //pancake is filled in through its address
if(pancake < 0 || pancake > 534)
    return -2;                                     //blueberry leaks here
do_other_thing(&pancake, time(NULL) );
if(pancake < 65536)
    return -3;                                     //and here
...
return 0;

This pattern runs the risk of terminating before memory is properly freed and parameters are properly reset. The only ways to avoid this terrible condition are to manually plug the cleanup into every error response (terrible) or to use goto (not terrible, but not strictly kosher).
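For comparison, here is a minimal sketch of the goto option. The function measure and its error codes are invented for this example; what matters is that every failure after the allocation jumps to a single cleanup label, so the buffer is freed on every path.

```c
#include <stdlib.h>
#include <string.h>

/* Returns 0 on success, negative on failure. */
int measure(const char *input, int limit)
{
    int code = 0;
    char *scratch = malloc(50 * sizeof(char));

    if (scratch == NULL)
        return -1;               /* nothing allocated yet, plain return is fine */
    if (input == NULL) {
        code = -2;
        goto cleanup;
    }
    strncpy(scratch, input, 49);
    scratch[49] = '\0';          /* strncpy does not always terminate */
    if ((int) strlen(scratch) > limit) {
        code = -3;
        goto cleanup;
    }

cleanup:
    free(scratch);               /* single exit point for everything we own */
    return code;
}
```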

If-Else Chain

There is one method for handling errors that is consistent with another pattern we cover (error-orientation): we build a great if-else chain of tests.

This pattern is confusing to many for two reasons:

  • It fundamentally reorients the code away from the “happy-case” paradigm (in which all error-handling is a branch off the main path) to a “failure case” paradigm (in which the happy-case is the result of every test in the chain failing)
  • All our happy-path code finds itself inside of an if() statement – nothing can be permitted to break the chain

It’s a bit hard to describe this pattern without an example, so bear with me:


#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int copy(const char * const input, int size)
{
    int code = 0;
    char * output = NULL;   //local, so the cleanup below never frees a caller's pointer
    if( input == NULL || size <= 0 )
    {
        code = -1;
    }
    else if ( output = (char *) malloc (size * sizeof(char) ) , output == NULL )
    {
        code = -2;
    }
    else if ( strncpy(output, input, size - 1), output[size - 1] = '\0', 0 )
    {
        //impossible due to comma-spliced 0
    }
    else if ( strncmp(output, input, size - 1) )
    {
        code = -3;
    }
    else if ( printf("%s\n", output) < 0 )
    {
        //printf returns the number of printed characters
        //Will only be less than 0 if write error occurs
        code = -4;
    }
    else
    {
        //could do something on successful case, but can't think of what that would be
    }
    //Normally we would pass output back, but let's just free him here for fun
    //This is where we do all our cleanup
    if(output != NULL)
        free(output);
    return code;
}

As we can see, each step in the error-handling function is treated as a possible error case, each with its own possible outcome. The only way to complete the function successfully is to have every error test fail.

Oh, and because this code is fundamentally modular, it is very easy to add and remove code by adding another else-if statement or removing one.

Lesson: Switching to an if-else chain can improve error awareness and keep all your cleanup in one place, without requiring much additional time to design and code.

Patterns: Creation and Destruction Stack

Virtually 100% of all memory leaks are preventable, assuming the programmer knows where to look. In many cases, the leaks occur because a programmer has failed to destroy what he has created, but it can become more complicated than that.

Creation Stack: The Order Matters

In many cases, we allocate memory as we use it, and we keep it at a simple layer. I’ve nearly lost track of how many calls to calloc() or malloc() I’ve hard-coded in the past few months alone.

This simple allocation is great, because it lets us dynamically generate variables and store them. These variables can be simple (like char or int), or they can be structures of varied complexity.

The most complex structures we allocate require us to allocate within the struct. Consider the following structure:

struct potato
{
    struct potato *next;
    char * string;
    int size;
};

This structure contains not one, but two pointers, and we need these pointers to point to something. Generally speaking, we would want to create a simple constructor that does something like this:

struct potato * make_potato()
{
    struct potato * hi = calloc(1, sizeof(struct potato));
    return hi;
}

This would initialize a potato with null pointers and a size of 0. However, it’s usually more useful for us to fill in some values on construction of a structure like this:

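struct potato * make_potato(int input_size, const char * something)
{
    struct potato * hi = (input_size > 0 && something != NULL) ? calloc(1, sizeof(struct potato)) : NULL;
    if (hi == NULL)
        return NULL;
    hi->size = input_size;
    hi->string = calloc(input_size, sizeof(char));
    if (hi->string == NULL)
    {
        free(hi);   //undo the first allocation before bailing out
        return NULL;
    }
    strncpy( hi->string, something, input_size - 1);   //calloc's zeros keep the string terminated
    return hi;
}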

In this case, we not only allocate a structure, but we allocate within the structure. This can get messy very quickly.

Destruction: Follow it up the stack

The rule to prevent memory leaks here is fairly simple: every destructor should be the inverse of its constructor.

If the constructor is simple, the destructor is equally simple:

int eat_potato(struct potato * hi)
{
    free(hi);
    return 0;
}

However, when the constructor gets more complex, we have to treat it like a stack. That means that we should run every destructor in the inverse order of construction, so that we make sure to get everything freed before all pointers to it are lost.

int eat_potato(struct potato * hi)
{
    if (hi == NULL)
        return -1;
    free(hi->string);   //contents first...
    free(hi);           //...then the structure that held them
    return 0;
}
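The potato also carries a next pointer, so a whole chain of potatoes needs the same treatment. Here is a minimal sketch, with grow_potato and eat_all_potatoes as hypothetical names: each node's contents are freed before the node itself, and the next pointer is saved before the node that holds it disappears.

```c
#include <stdlib.h>
#include <string.h>

struct potato {
    struct potato *next;
    char *string;
    int size;
};

/* Hypothetical helper: builds one potato, undoing partial work on failure. */
struct potato *grow_potato(int input_size, const char *something)
{
    struct potato *hi;

    if (input_size <= 0 || something == NULL)
        return NULL;
    hi = calloc(1, sizeof(*hi));                    /* step 1: the structure */
    if (hi == NULL)
        return NULL;
    hi->string = calloc(input_size, sizeof(char));  /* step 2: its contents */
    if (hi->string == NULL) {
        free(hi);                                   /* undo step 1 */
        return NULL;
    }
    strncpy(hi->string, something, input_size - 1);
    hi->size = input_size;
    return hi;
}

/* Tears down a whole chain and returns how many potatoes were freed. */
int eat_all_potatoes(struct potato *hi)
{
    int eaten = 0;

    while (hi != NULL) {
        struct potato *next = hi->next;  /* save before the node is freed */
        free(hi->string);
        free(hi);
        hi = next;
        eaten++;
    }
    return eaten;
}
```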

Lesson: You build a building from the ground up, and you tear it down from the top. The same goes for structures.

Patterns: Logs are Instant Insight

Has this ever happened to you: a program crashes, and all you get is a black screen or a set of meaningless numbers?

I know it happens to me.

Patterns: Comment Through History

Programmers are not unlike Dory from “Finding Nemo”, in that we have a limited memory. It seems like every time we sit down to code and are struck by a lightning bolt of inspiration, we immediately lose it when we notice something shiny.

That’s an extreme example, sure, but programmers do have a problem of forgetfulness. It’s the nature of the job – we have so many tasks to perform and so many paths to track that we can’t possibly hold on to all our thoughts.

Thank God for comments.

Level 0 Comments: In-line

This practice, so common to college-age programmers, is often lost quickly in the “real world”. However, these comments are perhaps the most useful comments we can have.

After all, which is faster: tracing all the calls and variables in a block of code, or reading a short sentence describing the intended function?

While I generally recommend you not write useless comments (“This printf() function prints a line of text”), there are several key things you can do:

  • Outline your code BEFORE you start writing, then use that outline to walk your way back through it later
  • Explain why you chose one function over another
  • Describe the operation of a library-based function, so you don’t have to keep looking it up
  • Leave TODO markers in your code (vi will highlight these specifically, so they’re easy to find again)
  • Comment out a “bad” line, so that you can see what it did before the fix
  • Leave some tag that’s easy to search for later (like TODO, only without the convenient highlighting)
  • etc.

All of these comments improve readability by restoring some of the mindset you had when you were writing the code in the first place.

Level 1 Comments: Flowerboxes

Less common, but equally important, are flowerbox comments. These comments allow the author of a piece of code to relay more detailed information in a compact, highly-visible format.

There are a number of uses for flowerboxes:

  • Doxygen comments – these not only generate HTML documentation, but they also describe a function’s purpose, arguments, and return types inside of the code itself
    • I cannot recommend Doxygen-style commentary enough
    • Seriously, if you haven’t looked into it before, LOOK IT UP
  • Flow descriptions – these comments describe a higher-level flow for a function or program, allowing the programmer to quickly get a deeper sense of how the program is supposed to work
  • Disclaimers and Formalities – Want the world to know who designed the code, and what it’s for? Flowerboxes at the top of the page get it done
  • Detail an event or conversation relevant to the code – Maybe an offhand quote from a fellow programmer inspired the design of the next segment of code. Recording that information helps future programmers understand not just what the code is doing, but why you chose to do it that way
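For anyone who hasn't seen one, a Doxygen-style flowerbox looks roughly like this. The function count_char is a made-up example; @brief, @param, and @return are standard Doxygen commands.

```c
#include <stddef.h>

/**
 * @brief Counts how many times a character appears in a string.
 *
 * @param text   String to search; must be NUL-terminated.
 * @param target Character to count.
 * @return Number of matches, or -1 if text is NULL.
 */
int count_char(const char *text, char target)
{
    int matches = 0;

    if (text == NULL)
        return -1;
    for (; *text != '\0'; text++) {
        if (*text == target)
            matches++;
    }
    return matches;
}
```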

Level 2 Comments: Logs

Some of my more recent work contains fewer comments than I usually employ. This is because, instead of using inline commentary to describe an event, I print out a string detailing what is supposed to come next.

These are still basically comments, because they serve the purpose of a comment while providing information during runtime. It’s a win-win.
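A minimal sketch of the idea, assuming a homemade LOG macro (not part of any standard library) that writes to stderr. Each log line says what an inline comment would have said, and it keeps saying it at runtime.

```c
#include <stdio.h>

/* Hypothetical logging macro: prints file, line, and a message to stderr. */
#define LOG(message) \
    fprintf(stderr, "[%s:%d] %s\n", __FILE__, __LINE__, (message))

int average(const int *values, int count, int *result)
{
    int i;
    long total = 0;

    LOG("checking inputs");
    if (values == NULL || result == NULL || count <= 0) {
        LOG("bad input, giving up");
        return -1;
    }
    LOG("summing values");
    for (i = 0; i < count; i++)
        total += values[i];
    LOG("storing the average");
    *result = (int) (total / count);
    return 0;
}
```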

Level 3 Comments: Code Segments

Sometimes (usually) we decide to replace whole sections of code with new code. However, when we do a delete-and-replace, we run the risk of damaging functionality with no way to roll back the source.

Using flowerboxes or #if statements, we can make sure that the old code is safely kept away from the final product while allowing us to restore that functionality if the time comes.

Also, it’s interesting to see how the code has evolved over time.
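A small sketch of the #if approach. The function and its "old" version are invented purely to show the layout: the retired code stays in the file, the preprocessor keeps it out of the build, and flipping the 0 to a 1 brings it back.

```c
/* Returns the larger of two values. */
int larger(int a, int b)
{
#if 0
    /* Old version, kept for reference: returned -1 when a == b. */
    if (a > b)
        return a;
    else if (b > a)
        return b;
    return -1;
#else
    return (a > b) ? a : b;
#endif
}
```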

Level 4 Comments: Extra Documentation

Strictly speaking, everything that happens during the development of a piece of code should be documented. All conversations, whiteboard diagrams, emails, flowcharts, and other documents should be retained, if only so you can see what decisions were made.

Lesson: We put comment features into our languages for a reason. Use them liberally to spare everyone a lot of time and effort.

Patterns: Too Much Information

How many times have I seen this pattern in object oriented code?

  • Public Class
  • All members of the class are private
  • All members can be accessed with accessor functions
  • All members can be changed with modifier functions

I hate to break it to y’all, but that’s just a struct. The only thing you’ve done is add a hundred lines to a simple process.

Access Functions: Only What You Need

The purpose of an access function is simple: to provide access to a portion of the data, while hiding the structure of the class.

As such, it only makes sense to provide access functions in two cases:

  • When it’s important to obscure how a structure works (due to complex operations, typedef abstraction, etc.)
  • When it’s important to restrict how much information you reveal

In either of those cases, you only need enough functions to suit the need. If you’re exposing the entire structure to the world, there was no reason to make anything private in the first place.

Modifier Functions: Only what you want to change

If your structure or class contains constants you set only once, why would you create explicit modifier functions for it?

Or, if you always modify a set of values simultaneously, why would you create individual modifier functions?

Think about your design before coding up modifier functions.
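As a sketch of what "only what you need" can look like in C (struct window and both function names are assumptions for this example): one modifier covers the two values that always change together, and one accessor exposes the single derived value callers actually want.

```c
#include <stddef.h>

struct window {
    int width;
    int height;
    int redraws;   /* internal bookkeeping, no accessor or modifier at all */
};

/* One modifier for the two values that always change together. */
int window_resize(struct window *w, int width, int height)
{
    if (w == NULL || width <= 0 || height <= 0)
        return -1;
    w->width = width;
    w->height = height;
    w->redraws++;
    return 0;
}

/* One accessor for the derived value callers actually need. */
int window_area(const struct window *w)
{
    return (w == NULL) ? -1 : w->width * w->height;
}
```

That is two functions instead of the six a get-and-set-everything design would need.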

Constructors: Two Types

There are two reasons to create an explicit constructor function:

  • You have to allocate values inside of the new structure (a char *, for example)
  • You want to fill the new structure with default values of some kind (perhaps by input)

In the first case, you usually call either another constructor or an allocation function inside of the constructor, and don’t worry so much about anything else.

In the second case, you have two options:

  • Take no arguments, and fill in the variables with a general default value
  • Take some arguments, and fill in some variables with specific values

Note that I don’t list among the options “Fill every variable at once”. While not inherently bad, this pattern shows up with alarming frequency in poorly-designed code. I recommend that you consider exactly how many variables you need to fill with a non-default value on initialization.

Destructors: Only One Type

Destructor functions should always follow the same pattern:

  • Destroy any allocated values inside the structure
  • Destroy the structure

If your destructor does anything else (besides, perhaps, print something), you should reevaluate your design.

Lesson: Think before you do. Lots of programmers over-code, resulting in functions they neither want nor need.

Patterns: Names as Documentation

While it’s usually less of a problem in C, in my Java days I saw any number of functions with names like “solve” or “act”. These functions were usually overloaded, so that “solve” meant one thing for integers and a wholly different thing for strings.

Patterns: Return values should mean something

I don’t know how many hundreds of functions I’ve dealt with that have either a void or boolean return type. While a boolean return type at least tells us whether the function ever completed properly, a void return carries absolutely no meaning. Where’s the debugging or error-handling value in that?

Constructors

Constructor functions take a number of arguments and return a pointer to an allocated space. Even so, there is a simple rule for meaningful returns:

Constructors return either a newly allocated pointer or NULL

Of course, if you work with a slightly more complex constructor, you can return the pointer in a parameter. In these cases, you should still make the constructor return something meaningful.

Personally, in these cases, I use a tri-state return type. If the function is successful, I’ll return the size of the allocated space (in bytes). If it has an error, I’ll return a negative value correlating to the type or location of the failure. However, if the function works EXCEPT that the malloc does not successfully allocate any space, I’ll return 0.
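A minimal sketch of that tri-state convention, using a hypothetical make_buffer function: a positive return is the number of bytes allocated, zero means the allocation itself failed, and a negative value identifies which argument check tripped.

```c
#include <stdlib.h>

/* >0 = bytes allocated, 0 = allocation failed, <0 = bad arguments. */
long make_buffer(char **out, long count)
{
    if (out == NULL)
        return -1;               /* nowhere to put the pointer */
    if (count <= 0)
        return -2;               /* nonsensical size */
    *out = calloc((size_t) count, sizeof(char));
    if (*out == NULL)
        return 0;                /* everything worked except the allocation */
    return count * (long) sizeof(char);
}
```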

Simple Access Functions

A simple access function returns some meaningful data value from a structure. In these cases, we return the value that we retrieved.

Anything that can possibly fail (even when it can’t)

If it’s possible for a function to fail, we hand the result back through a parameter, reserve the return value for the status, and use the basic rule:

Return 0 on success and non-zero on error

This basic rule applies generally, across the board. Even when the operation has no chance of failure (a state which is less common as I construct better code), we return the 0 value.
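Here is a small sketch of the rule, with divide as a made-up example. The answer comes back through the quotient parameter, and each negative code points at one specific test.

```c
#include <limits.h>
#include <stddef.h>

/* Returns 0 on success and non-zero on error. */
int divide(int numerator, int denominator, int *quotient)
{
    if (quotient == NULL)
        return -1;      /* nowhere to store the answer */
    if (denominator == 0)
        return -2;      /* division by zero */
    if (numerator == INT_MIN && denominator == -1)
        return -3;      /* result would not fit in an int */
    *quotient = numerator / denominator;
    return 0;
}
```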

Debugging and Logging errors

As systems grow in complexity, the number of locations where an operation could fail increases.

Further, as the size of a function increases, the number of ways it could fail increases.

When we employ meaningful return types, we create a path directly to the area where the problem occurred. So long as we know that function panda failed with an error value of -5, we know where the error was triggered (even if the system is thousands or millions of lines long). Even better, if we designed our return codes around specific tests, we know exactly how the function failed.

This means that, without ever touching the debugger, we have identified the location of our failure and can immediately begin determining the sequence of events that led to failure.

As Martha Stewart would say, “It’s a good thing”.

Patterns: Protect the Iterators!

Some patterns are more obvious than others. This is one of the more obvious patterns, but it’s violated often enough to deserve enunciation.

Iterator Abuse

Just about every program out there uses an iterator for one reason or another. Without them, we can’t build for loops, and more than a few while loops require iterators as well.

They let us perform simple operations on buffers and arrays.

They let us control flow of information.

Fundamentally, an iterator is a control line. It is essential that we maintain the integrity of our controls, because without that control we exponentially increase the complexity of the problem.

Iterator Abuse is the act of violating the integrity of an iterator, which destroys the line of control and makes the program act in complicated or unexpected ways. This abuse is performed in a number of ways:

  • Reusing the iterator before its control function has been completed
  • Modifying the iterator inside of the loop (usually by a non-standardized unit)
  • Passing the iterator into a function without protection
  • Using your only copy of a pointer as an iterator
  • etc.

What can you do?

Iterator abuse is one of the more easily preventable issues in our programs. We have to adhere to a few simple rules:

  1. Only use specially-marked values as iterators (the classic name is “i” or “j”)
  2. Only use iterators for iteration
  3. Iterate each value only ONCE per cycle (no i++; i+=j)
  4. In cases where we want to modify the iterator by a non-standardized unit (for example, by a number of bytes equivalent to a structure), use extreme caution
  5. If iterating inside of a for loop, never modify the iterator
  6. If iterating over a pointer, iterate over a copy of that pointer instead of the original
  7. Either avoid non-standard loop termination (break, etc.) or avoid referencing the iterator outside of the loop
  8. Don’t pass iterators into functions which might change the iterator value
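A minimal sketch of a few of these rules at work (replace_char is a hypothetical example): the iterator is a dedicated copy of the pointer, it is used for nothing but iteration, and it advances exactly once per cycle in the loop header. Declaring the original as char * const lets the compiler enforce that it is never used as the iterator.

```c
#include <stddef.h>

/* Replaces every occurrence of from with to; returns the number replaced. */
int replace_char(char * const text, char from, char to)
{
    char *cursor;        /* the iterator: a copy, used for nothing else */
    int replaced = 0;

    if (text == NULL)
        return -1;
    for (cursor = text; *cursor != '\0'; cursor++) {
        if (*cursor == from) {
            *cursor = to;    /* changes the data, never the iterator */
            replaced++;
        }
    }
    /* text still points at the start of the string. */
    return replaced;
}
```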

Lesson: Iterators are oft-abused values. Remember that their purpose is to establish simple loops, and don’t go crazy with them.
