In the previous lesson on The copy constructor and overloading the assignment operator, you learned about the differences and similarities of the copy constructor and the assignment operator. This lesson is a follow-up to that one.
Shallow copying
Because C++ does not know much about your class, the default copy constructor and default assignment operators it provides use a copying method known as a shallow copy (also known as a memberwise copy). A shallow copy means that C++ copies each member of the class individually using the assignment operator. When classes are simple (e.g., they do not contain any dynamically allocated memory), this works very well.
For example, let’s take a look at our Cents class:
class Cents
{
private:
int m_nCents;
public:
Cents(int nCents=0)
{
m_nCents = nCents;
}
};
When C++ does a shallow copy of this class, it will copy m_nCents using the standard integer assignment operator. Since this is exactly what we’d be doing anyway if we wrote our own copy constructor or overloaded assignment operator, there’s really no reason to write our own version of these functions!
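As a quick check of the claim above, here is a minimal sketch that copies a Cents object using only the compiler-generated functions. GetCents() and DefaultCopyWorksForCents() are hypothetical names added for this illustration; they are not part of the lesson's class.

```cpp
#include <cassert>

class Cents
{
private:
    int m_nCents;
public:
    Cents(int nCents=0)
    {
        m_nCents = nCents;
    }
    // Hypothetical accessor, added only so the sketch can inspect the value
    int GetCents() const { return m_nCents; }
};

// Returns true if the compiler-generated copy operations copy m_nCents
bool DefaultCopyWorksForCents()
{
    Cents cOriginal(5);
    Cents cCopy = cOriginal;   // compiler-generated copy constructor
    Cents cAssigned;
    cAssigned = cOriginal;     // compiler-generated assignment operator

    assert(cCopy.GetCents() == cOriginal.GetCents());
    return cCopy.GetCents() == 5 && cAssigned.GetCents() == 5;
}
```

Because the only member is an int, the memberwise copy is already a complete copy, so nothing needs to be written by hand.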
However, when designing classes that handle dynamically allocated memory, memberwise (shallow) copying can get us in a lot of trouble! This is because the standard pointer assignment operator just copies the address held by the pointer. It does not allocate any memory or copy the contents being pointed to!
Let’s take a look at an example of this:
#include <cstring> // for strlen() and strncpy()
class MyString
{
private:
char *m_pchString;
int m_nLength;
public:
MyString(const char *pchString="")
{
// Find the length of the string
// Plus one character for a terminator
m_nLength = strlen(pchString) + 1;
// Allocate a buffer equal to this length
m_pchString= new char[m_nLength];
// Copy the parameter into our internal buffer
strncpy(m_pchString, pchString, m_nLength);
// Make sure the string is terminated
m_pchString[m_nLength-1] = '\0';
}
~MyString() // destructor
{
// We need to deallocate our buffer
delete[] m_pchString;
// Set m_pchString to null just in case
m_pchString = 0;
}
char* GetString() { return m_pchString; }
int GetLength() { return m_nLength; }
};
The above is a simple string class that allocates memory to hold a string that we pass in. Note that we have not defined a copy constructor or overloaded assignment operator. Consequently, C++ will provide a default copy constructor and default assignment operator that do a shallow copy.
Now, consider the following snippet of code:
MyString cHello("Hello, world!");
{
MyString cCopy = cHello; // use default copy constructor
} // cCopy goes out of scope here
std::cout << cHello.GetString() << std::endl; // undefined behavior: reads freed memory and may crash
While this code looks harmless enough, it contains an insidious problem that leads to undefined behavior and will often crash the program! Can you spot it? Don’t worry if you can’t, it’s rather subtle.
Let’s break down this example line by line:
MyString cHello("Hello, world!");
This line is harmless enough. This calls the MyString constructor, which allocates some memory, sets cHello.m_pchString to point to it, and then copies the string “Hello, world!” into it.
MyString cCopy = cHello; // use default copy constructor
This line seems harmless enough as well, but it’s actually the source of our problem! When this line is evaluated, C++ will use the default copy constructor (because we haven’t provided our own), which does a shallow pointer copy on cHello.m_pchString. Because a shallow pointer copy just copies the address stored in the pointer, the address held by cHello.m_pchString is copied into cCopy.m_pchString. As a result, cCopy.m_pchString and cHello.m_pchString are now both pointing to the same piece of memory!
} // cCopy goes out of scope here
When cCopy goes out of scope, the MyString destructor is called on cCopy. The destructor deletes the dynamically allocated memory that both cCopy.m_pchString and cHello.m_pchString are pointing to! Consequently, by deleting cCopy, we’ve also (inadvertently) affected cHello. Note that the destructor will set cCopy.m_pchString to 0, but cHello.m_pchString will be left pointing to the deleted (invalid) memory!
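std::cout << cHello.GetString() << std::endl; // undefined behavior: reads freed memory and may crash
Now you can see why this goes wrong. We deleted the string that cHello was pointing to, and now we are trying to print the contents of memory that is no longer allocated. This is undefined behavior: the program may crash, print garbage, or appear to work. Worse, when cHello itself goes out of scope, its destructor will try to delete the same memory a second time.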
The root of this problem is the shallow copy done by the copy constructor — doing a shallow copy on pointer values in a copy constructor or overloaded assignment operator is almost always asking for trouble.
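The aliasing described in this section can be observed safely with a small sketch. RawHolder and ShallowCopySharesBuffer() are hypothetical names invented for this illustration. The struct deliberately has no destructor, so the shared buffer is released exactly once, by hand, and the sketch itself stays free of the double-delete problem.

```cpp
#include <cassert>

// Hypothetical struct for illustration: it holds a raw pointer and has no
// destructor, so the shared buffer below is released exactly once, by hand.
struct RawHolder
{
    char *m_pchData;
};

// Returns true if a memberwise copy leaves both objects sharing one buffer
bool ShallowCopySharesBuffer()
{
    RawHolder cFirst;
    cFirst.m_pchData = new char[2];
    cFirst.m_pchData[0] = 'A';
    cFirst.m_pchData[1] = '\0';

    RawHolder cSecond = cFirst;   // memberwise copy: only the address is copied

    bool bSameAddress = (cSecond.m_pchData == cFirst.m_pchData);
    assert(bSameAddress);

    cSecond.m_pchData[0] = 'B';   // writing through one pointer...
    bool bChangeVisible = (cFirst.m_pchData[0] == 'B'); // ...is seen through the other

    delete[] cFirst.m_pchData;    // free the single shared buffer once
    return bSameAddress && bChangeVisible;
}
```

If RawHolder had a destructor that called delete[] on m_pchData, both cFirst and cSecond would try to free the same buffer, which is the situation MyString runs into above.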
Deep copying
The answer to this problem is to do a deep copy on any non-null pointers being copied. A deep copy duplicates the object or variable being pointed to so that the destination (the object being assigned to) receives its own local copy. This way, the destination can do whatever it wants to its local copy and the object that was copied from will not be affected. Doing deep copies requires that we write our own copy constructors and overloaded assignment operators.
Let’s go ahead and show how this is done for our MyString class:
// Copy constructor
MyString::MyString(const MyString& cSource)
{
// because m_nLength is not a pointer, we can shallow copy it
m_nLength = cSource.m_nLength;
// m_pchString is a pointer, so we need to deep copy it if it is non-null
if (cSource.m_pchString)
{
// allocate memory for our copy
m_pchString = new char[m_nLength];
// Copy the string into our newly allocated memory
strncpy(m_pchString, cSource.m_pchString, m_nLength);
}
else
m_pchString = 0;
}
As you can see, this is quite a bit more involved than a simple shallow copy! First, we have to check to make sure cSource even has a string. If it does, then we allocate enough memory to hold a copy of that string. Finally, we have to manually copy the string using strncpy().
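To see the deep-copying constructor in action, here is a minimal sketch that folds it into the class definition and repeats the earlier scope experiment. OriginalSurvivesCopy() is a hypothetical helper written for this illustration, and the class is trimmed to what the sketch needs (it does not define an assignment operator, so it should not be assigned).

```cpp
#include <cassert>
#include <cstring>

class MyString
{
private:
    char *m_pchString;
    int m_nLength;
public:
    MyString(const char *pchString="")
    {
        m_nLength = static_cast<int>(std::strlen(pchString)) + 1;
        m_pchString = new char[m_nLength];
        std::strncpy(m_pchString, pchString, m_nLength);
        m_pchString[m_nLength-1] = '\0';
    }
    // Deep-copying copy constructor
    MyString(const MyString& cSource)
    {
        m_nLength = cSource.m_nLength;
        if (cSource.m_pchString)
        {
            m_pchString = new char[m_nLength];
            std::strncpy(m_pchString, cSource.m_pchString, m_nLength);
        }
        else
            m_pchString = 0;
    }
    ~MyString() { delete[] m_pchString; }
    char* GetString() { return m_pchString; }
};

// Returns true if destroying (and modifying) a copy leaves the original intact
bool OriginalSurvivesCopy()
{
    MyString cHello("Hello, world!");
    bool bDistinct = false;
    {
        MyString cCopy = cHello;            // uses our deep-copying constructor
        bDistinct = (cCopy.GetString() != cHello.GetString());
        assert(bDistinct);                  // each object owns its own buffer
        cCopy.GetString()[0] = 'J';         // changes only the copy
    } // cCopy is destroyed here and frees only its own buffer
    return bDistinct && std::strcmp(cHello.GetString(), "Hello, world!") == 0;
}
```

With the deep copy in place, cHello is still valid after cCopy goes out of scope, so printing it is safe.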
Now let’s do the overloaded assignment operator. The overloaded assignment operator is a tad bit trickier:
// Assignment operator
MyString& MyString::operator=(const MyString& cSource)
{
// check for self-assignment
if (this == &cSource)
return *this;
// first we need to deallocate any value that this string is holding!
delete[] m_pchString;
// because m_nLength is not a pointer, we can shallow copy it
m_nLength = cSource.m_nLength;
// now we need to deep copy m_pchString
if (cSource.m_pchString)
{
// allocate memory for our copy
m_pchString = new char[m_nLength];
// Copy the parameter into the newly allocated memory
strncpy(m_pchString, cSource.m_pchString, m_nLength);
}
else
m_pchString = 0;
return *this;
}
Note that our assignment operator is very similar to our copy constructor, but there are three major differences:
- We added a self-assignment check.
- We return *this so we can chain the assignment operator.
- We need to explicitly deallocate any value that the string is already holding.
When the overloaded assignment operator is called, the item being assigned to may already contain a previous value, which we need to make sure we clean up before we assign memory for new values. For non-dynamically allocated variables (which are a fixed size), we don’t have to bother because the new value just overwrites the old one. However, for dynamically allocated variables, we need to explicitly deallocate any old memory before we allocate any new memory. If we don’t, the code will not crash, but we will have a memory leak that will eat away our free memory every time we do an assignment!
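The three differences can be exercised together in a short sketch. The class below places the lesson's assignment operator inside the class definition. AssignmentDeepCopies() is a hypothetical helper added for this illustration.

```cpp
#include <cassert>
#include <cstring>

class MyString
{
private:
    char *m_pchString;
    int m_nLength;
public:
    MyString(const char *pchString="")
    {
        m_nLength = static_cast<int>(std::strlen(pchString)) + 1;
        m_pchString = new char[m_nLength];
        std::strncpy(m_pchString, pchString, m_nLength);
        m_pchString[m_nLength-1] = '\0';
    }
    MyString(const MyString& cSource)
    {
        m_nLength = cSource.m_nLength;
        m_pchString = new char[m_nLength];
        std::strncpy(m_pchString, cSource.m_pchString, m_nLength);
    }
    MyString& operator=(const MyString& cSource)
    {
        if (this == &cSource)       // self-assignment check
            return *this;
        delete[] m_pchString;       // release the old buffer to avoid a leak
        m_nLength = cSource.m_nLength;
        if (cSource.m_pchString)
        {
            m_pchString = new char[m_nLength];
            std::strncpy(m_pchString, cSource.m_pchString, m_nLength);
        }
        else
            m_pchString = 0;
        return *this;               // allows chained assignment
    }
    ~MyString() { delete[] m_pchString; }
    char* GetString() { return m_pchString; }
};

// Returns true if chained assignment gives each object its own copy
bool AssignmentDeepCopies()
{
    MyString cA("alpha");
    MyString cB("a much longer string");
    MyString cC;
    cC = cB = cA;                   // evaluates as cC = (cB = cA)
    assert(cB.GetString() != cA.GetString());
    cA.GetString()[0] = 'X';        // must not affect cB or cC
    return std::strcmp(cB.GetString(), "alpha") == 0
        && std::strcmp(cC.GetString(), "alpha") == 0;
}
```

Note that cB held a longer string before the assignment. Its old buffer is released by the delete[] inside operator=, which is the cleanup step the paragraph above describes.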
Checking for self-assignment
In our overloaded assignment operators, the first thing we do is check for self-assignment. There are two reasons for this. One is simple efficiency: if we don’t need to make a copy, why make one? The second reason is that not checking for self-assignment when doing a deep copy will cause problems if the class uses dynamically allocated memory. Let’s take a look at an example of this.
Consider the following overloaded assignment operator that does not do a self-assignment check:
// Problematic assignment operator
MyString& MyString::operator=(const MyString& cSource)
{
// Note: No check for self-assignment!
// first we need to deallocate any value that this string is holding!
delete[] m_pchString;
// because m_nLength is not a pointer, we can shallow copy it
m_nLength = cSource.m_nLength;
// now we need to deep copy m_pchString
if (cSource.m_pchString)
{
// allocate memory for our copy
m_pchString = new char[m_nLength];
// Copy the parameter into the newly allocated memory
strncpy(m_pchString, cSource.m_pchString, m_nLength);
}
else
m_pchString = 0;
return *this;
}
What happens when we do the following?
cHello = cHello;
This statement will call our overloaded assignment operator. The this pointer will point to the address of cHello (because it’s the left operand), and cSource will be a reference to cHello (because it’s the right operand). Consequently, m_pchString is the same as cSource.m_pchString.
Now look at the first line of code that would be executed: delete[] m_pchString;.
This line is meant to deallocate any previously allocated memory in cHello so we can copy the new string from the source without a memory leak. However, in this case, when we delete m_pchString, we also delete cSource.m_pchString! We’ve now destroyed our source string, and have lost the information we wanted to copy in the first place. The rest of the code will allocate a new string, then copy the uninitialized garbage in that string to itself. As a final result, you will typically end up with a new string of the correct length that contains garbage characters.
The self-assignment check prevents this from happening.
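Self-assignment rarely appears as literally as cHello = cHello; it more often arrives through a reference or pointer that happens to refer to the same object. The sketch below illustrates that case. IntBuffer and SelfAssignmentKeepsContents() are hypothetical names invented for this example, and the constructor assumes a positive length.

```cpp
#include <cassert>

// IntBuffer is a hypothetical class used only for this sketch
class IntBuffer
{
private:
    int *m_pnData;
    int m_nLength;
public:
    // Assumes nLength is positive
    IntBuffer(int nLength, int nFill)
    {
        m_nLength = nLength;
        m_pnData = new int[m_nLength];
        for (int i = 0; i < m_nLength; ++i)
            m_pnData[i] = nFill;
    }
    IntBuffer(const IntBuffer& cSource)
    {
        m_nLength = cSource.m_nLength;
        m_pnData = new int[m_nLength];
        for (int i = 0; i < m_nLength; ++i)
            m_pnData[i] = cSource.m_pnData[i];
    }
    IntBuffer& operator=(const IntBuffer& cSource)
    {
        // Without this check, the delete[] below would destroy the source too
        if (this == &cSource)
            return *this;
        delete[] m_pnData;
        m_nLength = cSource.m_nLength;
        m_pnData = new int[m_nLength];
        for (int i = 0; i < m_nLength; ++i)
            m_pnData[i] = cSource.m_pnData[i];
        return *this;
    }
    ~IntBuffer() { delete[] m_pnData; }
    int Get(int nIndex) const { return m_pnData[nIndex]; }
};

// Returns true if assigning an object to itself leaves its contents intact
bool SelfAssignmentKeepsContents()
{
    IntBuffer cBuf(3, 7);
    IntBuffer& rAlias = cBuf;   // a second name for the same object
    cBuf = rAlias;              // self-assignment in disguise
    assert(cBuf.Get(0) == 7);
    return cBuf.Get(0) == 7 && cBuf.Get(1) == 7 && cBuf.Get(2) == 7;
}
```

With the check in place, the operator returns early and the buffer is never touched.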
Preventing copying
Sometimes we simply don’t want our classes to be copied at all. A common way to do this (and the standard approach before C++11 introduced = delete) is to add the prototypes for the copy constructor and overloaded operator= to the private section of your class.
class MyString
{
private:
char *m_pchString;
int m_nLength;
MyString(const MyString& cSource);
MyString& operator=(const MyString& cSource);
public:
// Rest of code here
};
In this case, C++ will not automatically create a default copy constructor and default assignment operator, because we’ve told the compiler we’re defining our own functions. Furthermore, any code located outside the class will not be able to access these functions because they’re private.
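One way to confirm that such a class really cannot be copied from outside is to ask the compiler. The sketch below assumes a C++11 or later compiler, since it uses the standard <type_traits> header. NoCopy and NoCopyIsNotCopyable() are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <type_traits> // std::is_copy_constructible, std::is_copy_assignable (C++11)

// NoCopy is a hypothetical class used only for this sketch
class NoCopy
{
private:
    int m_nValue;
    NoCopy(const NoCopy& cSource);            // declared private, never defined
    NoCopy& operator=(const NoCopy& cSource); // declared private, never defined
public:
    NoCopy(int nValue=0) { m_nValue = nValue; }
    int GetValue() const { return m_nValue; }
};

// Returns true if code outside the class cannot copy a NoCopy object
bool NoCopyIsNotCopyable()
{
    NoCopy cOriginal(3);
    assert(cOriginal.GetValue() == 3);
    // NoCopy cCopy = cOriginal; // would not compile: the copy constructor is private
    return !std::is_copy_constructible<NoCopy>::value
        && !std::is_copy_assignable<NoCopy>::value;
}
```

Because the two private functions are declared but never defined, an accidental copy from inside the class (or from a friend) would compile but fail at link time.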
Summary
- The default copy constructor and default assignment operators do shallow copies, which is fine for classes that contain no dynamically allocated variables.
- Classes with dynamically allocated variables need to have a copy constructor and assignment operator that do a deep copy.
- The assignment operator is usually implemented using the same code as the copy constructor, but it checks for self-assignment, returns *this, and deallocates any previously allocated memory before deep copying.
- If you don’t want a class to be copyable, use a private copy constructor and assignment operator prototype in the class header.
I’d like to note that, to avoid problems with inherited classes, one has to make the destructor and assignment operators virtual.
– serg.
“// Problematic copy constructor” ==> “//Problematic assigment operator”?
[ Fixed! Thanks. -Alex ]
In the last example, when we say delete [] m_pchString, the program is likely to crash because of
if (cSource.m_pchString)
or
strncpy(m_pchString, cSource.m_pchString, m_nLength)?
When we delete a pointer using delete [], does it still contain a valid memory address?
In the last example, we omitted the check for self-assignment. This means that if the user does a self-assignment, m_pchString will equal cSource.m_pchString. When we delete m_pchString, we will also delete cSource.m_pchString.
When we delete a pointer using delete or delete[], it does not contain a valid memory address any longer. However, that does not necessarily mean the program will crash if you use the pointer for reading.
Consequently, in the last example, the if statement will succeed because cSource.m_pchString is non-zero. It will then allocate new memory for m_pchString (and cSource.m_pchString). Then it copies cSource.m_pchString, which is that uninitialized new memory (aka. garbage) into m_pchString, which is itself.
The end result of the self-assignment without the self-assignment check is that you’ll get a new string of the correct length, but it will be filled with garbage.
In that case how can the program crash? Also, it’s still counterintuitive for a pointer not to contain a valid memory address but point to some garbage region and be copiable.
The way the above code without the self-assignment check is written, the program won’t crash (it just produces an unexpectedly incorrect result, which in some ways can be worse). I changed the wording of the text to be more accurate in that regard. However, any time you are dealing with a pointer to unallocated memory, you are “living on the edge” so to speak and are just begging for a crash, corrupted memory, or some other bit of nastiness to happen.
I should be slightly more clear about my terminology — when I said “valid memory address” in the above comment, I really meant “allocated memory address”. Also, to be a bit more clear, accessing an unassigned bit of memory won’t necessarily cause a crash. It may or it may not, depending on where the piece of memory is and whether the operating system has protected that memory. Windows, for example, prevents programs from reading and writing to certain memory addresses in order to prevent malicious code from overwriting parts of the operating system. If you try to do that, you will get an access violation.
For example, consider the following program:
Hello Alex -
In the code box after the paragraph that begins with the sentence “Now let’s do the overloaded assignment operator.”, I think line 1 of the code should be:
Not:
Thanks.
[ Good eye. Fixed! Thanks. -Alex ]