17.1 — std::string and std::wstring

The standard library contains many useful classes -- but perhaps the most useful is std::string. std::string (and std::wstring) is a string class that provides many operations to assign, compare, and modify strings. In this chapter, we’ll look into these string classes in depth.

Note: C-style strings will be referred to as “C-style strings”, whereas std::string (and std::wstring) objects will be referred to simply as “strings”.

Motivation for a string class

In a previous lesson, we covered C-style strings, which use char arrays to store a string of characters. If you’ve tried to do anything with C-style strings, you’ll very quickly come to the conclusion that they are a pain to work with, easy to mess up, and hard to debug.

C-style strings have many shortcomings, primarily revolving around the fact that you have to do all the memory management yourself. For example, if you want to assign the string “hello!” into a buffer, you have to first dynamically allocate a buffer of the correct length (with new[]).

Don’t forget to account for an extra character for the null terminator!

Then you have to actually copy the value in (typically with strcpy()).

Hopefully you made your buffer large enough so there’s no buffer overflow!

And of course, because the string is dynamically allocated, you have to remember to deallocate it properly (with delete[]) when you’re done with it.

Don’t forget to use array delete instead of normal delete!
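Put together, the manual steps described above look roughly like the following sketch. The function name manualCStringDemo and the variable names are illustrative assumptions, not part of the original lesson.

```cpp
#include <cstring>   // std::strlen, std::strcpy, std::strcmp, std::size_t

// Hypothetical helper: copies "hello!" into a hand-managed buffer and
// returns true if the buffer ended up holding the expected text.
bool manualCStringDemo()
{
    const char* source = "hello!";

    // 6 visible characters + 1 for the null terminator = 7 chars.
    const std::size_t length = std::strlen(source) + 1;
    char* buffer = new char[length];

    // Copy the characters (including the terminator) into the buffer.
    std::strcpy(buffer, source);

    const bool ok = (std::strcmp(buffer, "hello!") == 0);

    // The buffer came from new[], so it must be released with delete[].
    delete[] buffer;

    return ok;
}
```

Every step here (sizing, copying, releasing) is the programmer's responsibility, and a mistake in any of them will usually compile without complaint.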

Furthermore, many of the intuitive operators that C provides to work with numbers, such as assignment and comparisons, simply don’t work with C-style strings. Sometimes these will appear to work but actually produce incorrect results -- for example, comparing two C-style strings using == will actually do a pointer comparison, not a string comparison. Assigning one C-style string to another using operator= will appear to work at first, but is actually doing a pointer copy (shallow copy), which is not generally what you want. These kinds of things can lead to program crashes that are very hard to find and debug!
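A minimal sketch of the comparison pitfall, assuming two separate arrays that hold the same text. The helper names are hypothetical and exist only to make the three kinds of comparison explicit.

```cpp
#include <cstring>   // std::strcmp
#include <string>

// Compares the two addresses, which is what == does on C-style strings.
bool samePointer(const char* a, const char* b)
{
    return a == b;
}

// Compares the characters, which is what was probably intended.
bool sameText(const char* a, const char* b)
{
    return std::strcmp(a, b) == 0;
}

// With std::string, == compares the characters directly.
bool sameString(const std::string& a, const std::string& b)
{
    return a == b;
}
```

For two distinct arrays that both contain "hello", samePointer() reports false while sameText() and sameString() report true.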

The bottom line is that working with C-style strings requires remembering a lot of nit-picky rules about what is safe/unsafe, memorizing a bunch of functions that have funny names like strcat() and strcmp() instead of using intuitive operators, and doing lots of manual memory management.

Fortunately, C++ and the standard library provide a much better way to deal with strings: the std::string and std::wstring classes. By making use of C++ concepts such as constructors, destructors, and operator overloading, std::string allows you to create and manipulate strings in an intuitive and safe manner! No more memory management, no more weird function names, and a much reduced potential for disaster.
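As a rough illustration of the difference, here is a sketch of the same kind of work done with std::string. The function names buildGreeting and copyIsIndependent are assumptions made up for this example.

```cpp
#include <string>

// Hypothetical helper: builds a greeting without any manual memory management.
std::string buildGreeting(const std::string& name)
{
    std::string greeting = "hello";  // storage is allocated by the constructor
    greeting += ", ";                // the string grows as needed
    greeting += name;
    greeting += "!";
    return greeting;                 // no delete[] required anywhere
}

// Hypothetical helper: shows that assignment copies the characters,
// not just a pointer. Returns true if the original is unaffected.
bool copyIsIndependent()
{
    std::string original = "hello!";
    std::string copy = original;     // deep copy
    copy[0] = 'j';                   // modifies only the copy
    return original == "hello!" && copy == "jello!";
}
```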

Sign me up!

String overview

All string functionality in the standard library lives in the <string> header file. To use it, simply include the string header with #include <string>.

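There are actually 3 different string classes in the string header. The first is a class template named basic_string<>, declared as template<class charT, class traits = char_traits<charT>, class Allocator = allocator<charT>> class basic_string.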

You won’t be working with this class template directly, so don’t worry about what traits or an Allocator are for the time being. The default values will suffice in almost every imaginable case.

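There are two flavors of basic_string<> provided by the standard library: std::string, which is a typedef for basic_string<char>, and std::wstring, which is a typedef for basic_string<wchar_t>.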

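These are the two classes that you will actually use. std::string stores char elements and is typically used for ASCII or UTF-8 text. std::wstring stores wchar_t elements, whose size is implementation-defined: 16 bits on Windows (where it is typically used for UTF-16) and usually 32 bits on Linux and macOS. Since C++11, the string header also provides std::u16string and std::u32string for UTF-16 and UTF-32 text, so you do not need to build your own class on basic_string<> for those encodings.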

Although you will directly use std::string and std::wstring, all of the string functionality is implemented in the basic_string<> class template. std::string and std::wstring have that functionality directly because they are simply type aliases for particular instantiations of basic_string<>; nothing is inherited. Consequently, all of the functions presented will work for both string and wstring. However, because basic_string is a class template, it also means the compiler will produce horrible looking template errors when you do something incorrect with a string or wstring. Don’t be intimidated by these errors; they look far worse than they are!
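The relationship between the three names can be checked at compile time. The sketch below also includes a hypothetical helper, countChar, written against basic_string<> to illustrate that one piece of code can serve both string and wstring.

```cpp
#include <cstddef>       // std::size_t
#include <string>
#include <type_traits>   // std::is_same

// std::string and std::wstring are aliases for instantiations of basic_string.
static_assert(std::is_same<std::string, std::basic_string<char>>::value,
              "std::string is basic_string<char>");
static_assert(std::is_same<std::wstring, std::basic_string<wchar_t>>::value,
              "std::wstring is basic_string<wchar_t>");

// Hypothetical helper: counts occurrences of a character in any basic_string
// that uses the default traits and allocator.
template <class CharT>
std::size_t countChar(const std::basic_string<CharT>& text, CharT target)
{
    std::size_t count = 0;
    for (CharT c : text)
    {
        if (c == target)
            ++count;
    }
    return count;
}
```

Calling countChar(std::string("banana"), 'a') and countChar(std::wstring(L"banana"), L'n') both compile from the same template.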

Here’s a list of all the functions in the string class. Most of these functions have multiple flavors to handle different types of inputs, which we will cover in more depth in the next lessons.

Function                         Effect

Creation and destruction
(constructor)                    Create or copy a string
(destructor)                     Destroy a string

Size and capacity
capacity()                       Returns the number of characters that can be held without reallocation
empty()                          Returns a boolean indicating whether the string is empty
length(), size()                 Returns the number of characters in the string
max_size()                       Returns the maximum string size that can be allocated
reserve()                        Expands the capacity of the string (a request to shrink is non-binding, and is ignored as of C++20)

Element access
[], at()                         Accesses the character at a particular index (at() checks bounds, [] does not)

Modification
=, assign()                      Assigns a new value to the string
+=, append(), push_back()        Concatenates characters to the end of the string
insert()                         Inserts characters at an arbitrary index in the string
clear()                          Deletes all characters in the string
erase()                          Erases characters at an arbitrary index in the string
replace()                        Replaces characters at an arbitrary index with other characters
resize()                         Expands or shrinks the string (truncates or adds characters at the end of the string)
swap()                           Swaps the values of two strings

Input and output
>>, getline()                    Reads values from the input stream into the string
<<                               Writes the string value to the output stream
c_str()                          Returns the contents of the string as a null-terminated C-style string
copy()                           Copies contents (not null-terminated) to a character array
data()                           Returns a pointer to the string's character array (null-terminated as of C++11)

String comparison
==, !=                           Compares whether two strings are equal/unequal (returns bool)
<, <=, >, >=                     Compares whether two strings are less than / greater than each other (returns bool)
compare()                        Three-way comparison of two strings (returns a negative value, 0, or a positive value)

Substrings and concatenation
+                                Concatenates two strings
substr()                         Returns a substring

Searching
find()                           Find index of first occurrence of a character/substring
find_first_of()                  Find index of first character from a set of characters
find_first_not_of()              Find index of first character not from a set of characters
find_last_of()                   Find index of last character from a set of characters
find_last_not_of()               Find index of last character not from a set of characters
rfind()                          Find index of last occurrence of a character/substring

Iterator and allocator support
begin(), end()                   Forward-direction iterator support for beginning/end of string
get_allocator()                  Returns the allocator
rbegin(), rend()                 Reverse-direction iterator support for beginning/end of string
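To give a feel for how a few of these functions fit together before the detailed lessons, here is a short sketch. The helper names (buildWithModifiers, wordAfterFirstSpace, atRejectsIndex) are hypothetical and chosen only for this example.

```cpp
#include <stdexcept>   // std::out_of_range
#include <string>

// Builds "Hello, World!" step by step using modification functions.
std::string buildWithModifiers()
{
    std::string text = "Hello";
    text += " World";        // append:            "Hello World"
    text.insert(5, ",");     // insert at index 5: "Hello, World"
    text.push_back('!');     // append one char:   "Hello, World!"
    return text;
}

// Returns everything after the first space, or an empty string if none exists.
std::string wordAfterFirstSpace(const std::string& text)
{
    const std::string::size_type space = text.find(' ');
    if (space == std::string::npos)   // npos means "not found"
        return "";
    return text.substr(space + 1);    // from space + 1 to the end
}

// Returns true if at() rejects the index by throwing std::out_of_range.
bool atRejectsIndex(const std::string& text, std::string::size_type index)
{
    try
    {
        (void)text.at(index);         // bounds-checked access
        return false;
    }
    catch (const std::out_of_range&)
    {
        return true;
    }
}
```

Unlike at(), operator[] performs no bounds check, so passing it an index past the end of the string is undefined behavior rather than a catchable error.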

While the standard library string classes provide a lot of functionality, there are a few notable omissions:

  • Regular expression support
  • Constructors for creating strings from numbers
  • Capitalization / upper case / lower case functions
  • Case-insensitive comparisons
  • Tokenization / splitting string into array
  • Easy functions for getting the left or right hand portion of string
  • Whitespace trimming
  • Formatting a string sprintf style
  • Conversion from utf-8 to utf-16 or vice-versa

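For most of these, you will have to either write your own functions, use other parts of the standard library (for example, std::to_string() and the <regex> header since C++11, or std::format() since C++20), or convert your string to a C-style string (using c_str()) and use the C functions that offer this functionality.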
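Writing your own helpers for several of these omissions takes only a few lines. The following is a minimal sketch, assuming ASCII text; toLower, trim, and equalsIgnoreCase are hypothetical names, not standard library functions.

```cpp
#include <algorithm>   // std::transform
#include <cctype>      // std::tolower
#include <string>

// Hypothetical helper: returns a lower-cased copy of the text.
// Only reliable for single-byte characters such as ASCII.
std::string toLower(std::string text)
{
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return text;
}

// Hypothetical helper: removes leading and trailing spaces, tabs, and newlines.
std::string trim(const std::string& text)
{
    const char* whitespace = " \t\r\n";
    const std::string::size_type first = text.find_first_not_of(whitespace);
    if (first == std::string::npos)
        return "";   // the text is empty or all whitespace
    const std::string::size_type last = text.find_last_not_of(whitespace);
    return text.substr(first, last - first + 1);
}

// Hypothetical helper: case-insensitive equality built on toLower().
bool equalsIgnoreCase(const std::string& a, const std::string& b)
{
    return toLower(a) == toLower(b);
}
```

The lambda takes an unsigned char because passing a negative char value to std::tolower() is undefined behavior.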

In the next lessons, we will look at the various functions of the string class in more depth. Although we will use string for our examples, everything is equally applicable to wstring.

30 comments to 17.1 — std::string and std::wstring

  • Nice section on strings, they’ve been unbelievably useful for the app I’m working on at the moment. Just one thing I noticed, the Function/Effect tables Size and Capacity section seems to be a little muddled up.

    [ Thanks for the note. The table has been fixed! -Alex ]

  • Pintsize

    Hi Alex… thanks for the tutorials!! 🙂

    Maybe you should add some whitespace so that it’s easily understandable that push_back adds an element at the end. As it is now… all the functions and explanations below are on different lines, which might be confusing.

    mmm… Maybe you could do it in general… It will probably be easier to quickly reference that way.

    Also, how about adding that at() performs a check to see if the element exists? I know you probably explained it in the access section but it might come in handy here for quick reference (again :P)

  • Puneet

    Hello Alex,

    Please help me with the following. I have a string in the following format: “AB CD EF GH IJ “, where the letters are separated by a variable number of spaces. I want to write a program that reads this string and replaces multiple spaces between letters with a single space. The string can be of any length.

    Also, please let me know if there is a built-in function for this?

    • I don’t know if you figured this out yet Puneet but here is one way to do it.

  • Can I manipulate Unicode text files using wstring and fstream? And is there any way to print Unicode characters in the console?

  • Brad

    Great Tutorials!
    Just one quick doubt. You have

    written, when I think you mean

    .

  • octavsly

    For the ones which are as confused as I was

    template<class charT, …………..
    Why class? Shouldn't it be typename?

    class and typename are interchangeable here, as described in section 14.1.

  • octavsly

    To create a template type parameter, use either the keyword typename or class. There is no difference between the two keywords in this context, and you will usually see people use the class keyword.

  • they’ve been unbelievably useful for the app I’m working on at the moment. Just one thing I noticed, the Function/Effect tables Size and Capacity section seems to be a little muddled up.

    • Alex

      Yeah, looks like the table doesn’t scale down correctly at small sizes. I’ll have to fix that when I rewrite the lesson. Thanks for pointing that out.

  • Squalus

    Thanks for tutorial, however I think that some terms (used to explain this code) are incorrect:

    1. There are two specializations of basic_string<> provided by the standard library
                    
    I don’t think they are specializations. If they were specializations, it would mean that they are different template "stencils", like std::vector and std::vector<bool>: even if you don’t see a difference when you use them, std::vector<bool> was "stenciled" from a different template. They are rather just typedef names.

    2. functionality is actually implemented in the basic_string<> class and then inherited by string and wstring                                                                      

    As I wrote they are just aliases for classes, so they do not inherit something, they already have it. 🙂

    Have a nice day.

  • SJ

    Hey Alex,

    Looks like you’re missing the description for push_back() under the "Modification" part of your table, messing up the corresponding descriptions that follow. I believe that is what someone above was alluding to.

    • Alex

      I see. This only happens if the table gets squashed due to word wrapping. I’ve put in a short term fix (disable word wrapping) for now, but I’ll have to reformat the table at a future date to fix “the right way”.

      • Darren

        (Twitter) Bootstrap for responsive web pages. Go to http://www.w3schools.com/ for a very decent tutorial.
        Alternatively http://getskeleton.com/ can be used and is light-weight.

  • Sourabh S. Rawat

    Correction:
    In the table, under the size and capacity portion, it is written reverse() rather than resize() for "Expand or shrink the capacity of the string".

  • Sourabh S. Rawat

    Suppose I resize a string, e.g. string name = "HomeIsHere", with name.resize(2). If I then use [] to access the 5th character, like name[4], it shows ‘e’, but when I use at() to get the 5th character, the program crashes.
    So, my question is: does resizing not delete the characters beyond the bound given to the resize() function?

    • Alex

      It’s unclear to me how this is implemented internally. In all likelihood, when the size is lowered, the string size is being changed but the rest of the buffer is left alone (there’s no reason to clear it). [] doesn’t do any bounds checking, so name[4] is reading past the end of the string; that is undefined behavior, even if it happens to show the old character. at() does do bounds checking, so it throws a std::out_of_range exception (technically you’re doing an out of bounds access even though the old data is still there), and if that exception isn’t caught, your program terminates.
