C++ String to Int

In this post I will compare the following methods for parsing a string into an integer in C++:

  • Manually
  • atoi()
  • strtol()
  • sscanf()
  • std::stoi (C++11 and later)
  • std::istringstream
  • Boost.LexicalCast
  • Boost.LexicalCast with C locale
  • Boost.Spirit.Qi
  • Boost.Coerce

But first, let’s look at the requirements I had for a string-to-int parsing method.

Note: Your requirements may differ from my requirements.

  1. Correct — obviously the most important requirement! If it doesn’t work, it’s useless.
  2. Safe — nobody wants buffer overruns.
  3. Detectable Failure — if the string can’t be parsed, there should be a way of determining this.
  4. Base-10 — the solution shouldn’t magically switch the radix. For example “011” should still parse as base 10, and “0x01” should fail to parse.
  5. Match All — any trailing characters should cause the parsing to fail. For example “123a” should fail to parse.
  6. Fail on Overflow — parsing should fail if the integer overflows. For example “9999999999” should fail to parse on a system where int is 32 bits.
  7. Type Agnostic — the same solution should be able to parse a 16, 32 or 64 bit integer, either signed or unsigned.
  8. Locale Independent — Not an obvious requirement. In my case, “123,000” should fail even if the current locale allows it. For user interaction this requirement would be different. If you control the entire program and never change the locale, this is a non-issue.
  9. Fast — performance may not be important depending on your project’s requirements.
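
Requirements 3 to 6 can be turned into a quick sanity check. The sketch below is not part of the original tests: ParseFn and MeetsBasicRequirements are hypothetical names, and the overflow case assumes a 32-bit int. It only relies on the String2Int signature used throughout this post.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper type: the signature shared by every String2Int
// variant in this post.
typedef bool (*ParseFn)(const std::string&, int&);

// Returns true if `parse` behaves as requirements 3-6 demand on a handful
// of inputs. Assumes int is 32 bits (see requirement 6).
bool MeetsBasicRequirements(ParseFn parse)
{
    int value = 0;

    // Inputs that must parse, with their expected base-10 values.
    if (!parse("123", value) || value != 123) return false;
    if (!parse("-42", value) || value != -42) return false;
    if (!parse("011", value) || value != 11) return false; // not octal

    // Inputs that must be rejected.
    const char* const bad[] = {
        "", "-", "abc", "0x01", "123a", "123,000", "9999999999"
    };
    for (std::size_t i = 0; i < sizeof(bad) / sizeof(bad[0]); ++i)
    {
        if (parse(bad[i], value)) return false;
    }
    return true;
}
```

Passing one of the String2Int functions below as the argument gives a rough yes/no answer; it is a smoke test, not a proof that a parser is correct.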

Manually

#include <string>

// Does this handle all the edge cases? Who knows...
// Better make sure you test it thoroughly if you write it yourself!
bool String2Int(const std::string& str, int& result)
{
	std::string::const_iterator i = str.begin();

	if (i == str.end())
		return false;

	bool negative = false;

	if (*i == '-')
	{
		negative = true;
		++i;

		if (i == str.end())
			return false;
	}

	result = 0;

	for (; i != str.end(); ++i)
	{
		if (*i < '0' || *i > '9')
			return false;

		result *= 10;
		result += *i - '0';
	}

	if (negative)
	{
		result = -result;
	}

	return true;
}

String to int? Easy, just write it yourself. By definition, this solution can handle any requirements you need, but it will take time to implement and test. I have included the naive solution above in the performance tests for comparison purposes.
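
The naive version above silently overflows. As an illustration of how much extra care a hand-written parser needs, here is a minimal sketch of an overflow-checked variant. String2IntChecked is a hypothetical name, this is not the code that was benchmarked, and it assumes C++11 (where integer division truncates toward zero).

```cpp
#include <limits>
#include <string>

// Sketch only: like the manual parser, but rejects values outside the
// range of int instead of overflowing.
bool String2IntChecked(const std::string& str, int& result)
{
    std::string::const_iterator i = str.begin();
    if (i == str.end())
        return false;

    const bool negative = (*i == '-');
    if (negative && ++i == str.end())
        return false; // a lone "-" is not a number

    // Accumulate as a negative number: the negative range of int is at
    // least as large as the positive one, so INT_MIN needs no special case.
    const int min = std::numeric_limits<int>::min();
    int value = 0;

    for (; i != str.end(); ++i)
    {
        if (*i < '0' || *i > '9')
            return false;

        const int digit = *i - '0';
        // Reject if value * 10 - digit would drop below min.
        if (value < (min + digit) / 10)
            return false;
        value = value * 10 - digit;
    }

    if (!negative)
    {
        // -value must fit in int as well.
        if (value < -std::numeric_limits<int>::max())
            return false;
        value = -value;
    }

    result = value; // only written on success
    return true;
}
```

With a 32-bit int this accepts “2147483647” and “-2147483648” and rejects “2147483648” and “-2147483649”.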

int atoi(const char* str)

#include <string>
#include <cstdlib> // std::atoi

bool String2Int(const std::string& str, int& result)
{
    result = std::atoi(str.c_str());
    return true;
}

Part of the C standard library. Has quite a few problems though. The inability to detect failure makes this pretty much useless for real-world programs.

  1. Correct — Part of the C standard library, so should be thoroughly tested.
  2. Safe — Requires strings to be null-terminated. If you’re using std::string::c_str() this won’t be a problem.
  3. Detectable Failure — No.
  4. Base-10 — Yes.
  5. Match All — No.
  6. Fail on Overflow — No.
  7. Type Agnostic — There are various versions such as atol/atoll/atof and templates could be used to use the correct version depending on the type. There are no unsigned versions in the standard library though.
  8. Locale Independent — No.
  9. Fast — See performance comparison.
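
To make the “Detectable Failure” and “Match All” points concrete, here is a tiny sketch (AtoiCannotDistinguish is a hypothetical helper, not something from the standard library) showing that the return value alone cannot separate good input from bad.

```cpp
#include <cstdlib>

// Returns true when atoi gives the same answer for both inputs, i.e. the
// caller cannot tell them apart from the return value alone.
bool AtoiCannotDistinguish(const char* a, const char* b)
{
    return std::atoi(a) == std::atoi(b);
}
```

AtoiCannotDistinguish("0", "abc") and AtoiCannotDistinguish("123", "123a") both return true: unparsable input yields 0, and trailing characters are silently ignored.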

long strtol(const char* str, char** endptr, int base)

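#include <string>
#include <cerrno>  // errno, ERANGE
#include <cstdlib> // std::strtol
#include <climits> // INT_MIN, INT_MAX, LONG_MIN, LONG_MAX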

// Updated with range checks thanks to /u/anttirt
bool String2Int(const std::string& str, int& result)
{
    char* endPtr = 0;
    errno = 0;

    const long longval = std::strtol(str.c_str(), &endPtr, 10);

    if ((longval == LONG_MIN || longval == LONG_MAX) && errno == ERANGE)
    {
        return false;
    }

    if (sizeof(long) > sizeof(int)) // let the optimizer do its job
    {
        if (longval > INT_MAX || longval < INT_MIN) // needed for example on linux x64
            return false;
    }

    if (endPtr == str.c_str()) // no digits were consumed (e.g. an empty string)
        return false;

    result = static_cast<int>(longval);
    return endPtr == str.c_str() + str.size(); // ensure the whole string was parsed
}

Part of the C standard library. Much better than atoi. Probably the best C standard library method.

  1. Correct — Part of the C standard library, so should be thoroughly tested.
  2. Safe — Requires strings to be null-terminated. If you’re using std::string::c_str() this won’t be a problem. Also endptr loses the const-ness of str, which is a minor issue.
  3. Detectable Failure — Yes, can inspect endptr (2nd argument).
  4. Base-10 — Yes.
  5. Match All — Yes, can inspect endptr.
  6. Fail on Overflow — Yes, can inspect errno.
  7. Type Agnostic — There are various versions such as strtoul/strtoull and templates could be used to use the correct version depending on the type. The 64-bit functions require C++11.
  8. Locale Independent — No.
  9. Fast — See performance comparison.
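
The “Type Agnostic” point can be sketched with a template that always parses with std::strtoll (C++11) and then range-checks against the target type. String2Signed is a hypothetical name, and the sketch assumes T is a signed integer type no wider than long long. Unsigned types are left out on purpose: std::strtoul accepts a leading minus sign and wraps the value, so they need extra handling. Like strtol itself, this still accepts leading whitespace and a leading “+”.

```cpp
#include <cerrno>
#include <cstdlib>
#include <limits>
#include <string>

// Sketch: parse with the widest signed strto* function, then range-check
// against T. Assumes T is a signed integer type no wider than long long.
template <typename T>
bool String2Signed(const std::string& str, T& result)
{
    if (str.empty())
        return false;

    char* endPtr = 0;
    errno = 0;
    const long long value = std::strtoll(str.c_str(), &endPtr, 10);

    if (errno == ERANGE) // does not even fit in long long
        return false;
    if (endPtr != str.c_str() + str.size()) // trailing or no digits
        return false;
    if (value < std::numeric_limits<T>::min() ||
        value > std::numeric_limits<T>::max())
        return false;

    result = static_cast<T>(value);
    return true;
}
```

The target type is deduced from the second argument, so the same call works for short, int, long and long long variables.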

int sscanf(const char* s, const char* format, ...)

#include <string>
#include <cstdio>

bool String2Int(const std::string& str, int& result)
{
	return std::sscanf(str.c_str(), "%d", &result) == 1;
}

Part of the C standard library. Worse than strtol in terms of error detection.

  1. Correct — Part of the C standard library, so should be thoroughly tested.
  2. Safe — Requires strings to be null-terminated. If you’re using std::string::c_str() this won’t be a problem. Uses varargs so argument types are not validated.
  3. Detectable Failure — Yes.
  4. Base-10 — Yes.
  5. Match All — No.
  6. Fail on Overflow — No.
  7. Type Agnostic — Can use different format strings for different types and templates could be used to use the correct version depending on the type.
  8. Locale Independent — No.
  9. Fast — See performance comparison.
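
The “Match All” gap can be narrowed with the %n conversion, which records how many characters have been consumed so far. This is a possible workaround rather than the version that was benchmarked; String2IntScanf is a hypothetical name. It still accepts leading whitespace and still does nothing about overflow.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Sketch: use %n to check that the whole string was consumed.
bool String2IntScanf(const std::string& str, int& result)
{
    int value = 0;
    int consumed = 0;

    // %n stores the number of characters read so far and does not count
    // toward sscanf's return value.
    if (std::sscanf(str.c_str(), "%d%n", &value, &consumed) != 1)
        return false;

    if (static_cast<std::size_t>(consumed) != str.size())
        return false; // trailing characters, e.g. "123a"

    result = value;
    return true;
}
```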

std::stoi

#include <string>
#include <stdexcept> // std::invalid_argument, std::out_of_range

bool String2Int(const std::string& str, int& result)
{
	try
	{
		std::size_t lastChar;
		result = std::stoi(str, &lastChar, 10);
		return lastChar == str.size();
	}
	catch (std::invalid_argument&)
	{
		return false;
	}
	catch (std::out_of_range&)
	{
		return false;
	}
}

New function provided in C++11, probably the best native C++ solution if you’re using C++11.

  1. Correct — Part of the C++ Standard Library, so should be thoroughly tested.
  2. Safe — Yes.
  3. Detectable Failure — Yes, throws an exception.
  4. Base-10 — Yes.
  5. Match All — Yes, can inspect pos (2nd argument).
  6. Fail on Overflow — Yes, throws an exception.
  7. Type Agnostic — There are other functions provided (std::stoll/std::stoull, etc). Templates could be used to use the correct version depending on the type.
  8. Locale Independent — No.
  9. Fast — See performance comparison.
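
One way to get the “templates could be used” part is a small overload set that picks the matching std::sto* function. The names Convert and String2Number are hypothetical, and this is only a sketch. Note that std::stoull, like strtoull, accepts a leading minus sign and wraps the value, so the unsigned overload does not reject “-1”.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical overload set: each overload calls the std::sto* function
// that matches the target type.
inline void Convert(const std::string& s, std::size_t* pos, int& out)
{ out = std::stoi(s, pos, 10); }
inline void Convert(const std::string& s, std::size_t* pos, long& out)
{ out = std::stol(s, pos, 10); }
inline void Convert(const std::string& s, std::size_t* pos, long long& out)
{ out = std::stoll(s, pos, 10); }
inline void Convert(const std::string& s, std::size_t* pos, unsigned long long& out)
{ out = std::stoull(s, pos, 10); }

template <typename T>
bool String2Number(const std::string& str, T& result)
{
    try
    {
        std::size_t pos = 0;
        T value = T();
        Convert(str, &pos, value);
        if (pos != str.size())
            return false; // trailing characters
        result = value;
        return true;
    }
    catch (const std::invalid_argument&)
    {
        return false; // no digits at all
    }
    catch (const std::out_of_range&)
    {
        return false; // value does not fit in the target type
    }
}
```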

std::istringstream

#include <string>
#include <sstream>
#include <locale> // std::locale::classic

bool String2Int(const std::string& str, int& result)
{
	std::istringstream ss(str);
	ss.imbue(std::locale::classic());
	ss >> result;
	return !ss.fail() && ss.eof();
}

A native C++ solution. Slow as hell though.

  1. Correct — Part of the C++ Standard Library, so should be thoroughly tested.
  2. Safe — Yes.
  3. Detectable Failure — Yes.
  4. Base-10 — Yes.
  5. Match All — Yes.
  6. Fail on Overflow — Yes.
  7. Type Agnostic — Yes.
  8. Locale Independent — Yes, but only with the imbue() call.
  9. Fast — See performance comparison.
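
Because operator>> is overloaded for every built-in arithmetic type, the “Type Agnostic” point falls out of a one-line template change. A minimal sketch follows; String2Value is a hypothetical name. Two caveats: character-sized types such as signed char are extracted as a single character rather than a number, and leading whitespace is skipped by default.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Sketch: the istringstream approach, templated on the target type.
template <typename T>
bool String2Value(const std::string& str, T& result)
{
    std::istringstream ss(str);
    ss.imbue(std::locale::classic()); // force the "C" locale

    T value = T();
    ss >> value;

    // fail(): nothing parsed or out of range; !eof(): trailing characters.
    if (ss.fail() || !ss.eof())
        return false;

    result = value;
    return true;
}
```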

Boost.LexicalCast

#include <string>
#include <boost/lexical_cast.hpp>

bool String2Int(const std::string& str, int& result)
{
	try
	{
		result = boost::lexical_cast<int>(str);
		return true;
	}
	catch (boost::bad_lexical_cast&)
	{
		return false;
	}
}

A native C++ solution. Matches all requirements except locale independence. It’s also trivial to write the opposite int-to-string function.

  1. Correct — Part of the Boost C++ Library, so should be thoroughly tested.
  2. Safe — Yes.
  3. Detectable Failure — Yes, throws an exception.
  4. Base-10 — Yes.
  5. Match All — Yes.
  6. Fail on Overflow — Yes.
  7. Type Agnostic — Yes.
  8. Locale Independent — No. For example, “123,000” will parse as 123000 on some locales.
  9. Fast — See performance comparison.

Boost.LexicalCast with C locale

#include <string>
#define BOOST_LEXICAL_CAST_ASSUME_C_LOCALE 1
#include <boost/lexical_cast.hpp>

bool String2Int(const std::string& str, int& result)
{
	try
	{
		result = boost::lexical_cast<int>(str);
		return true;
	}
	catch (boost::bad_lexical_cast&)
	{
		return false;
	}
}

Boost.LexicalCast, but with BOOST_LEXICAL_CAST_ASSUME_C_LOCALE defined to always force the C locale. If you are going to use BOOST_LEXICAL_CAST_ASSUME_C_LOCALE, make sure you define it globally, otherwise it may not work.

  1. Correct — Part of the Boost C++ Library, so should be thoroughly tested.
  2. Safe — Yes.
  3. Detectable Failure — Yes, throws an exception.
  4. Base-10 — Yes.
  5. Match All — Yes.
  6. Fail on Overflow — Yes.
  7. Type Agnostic — Yes.
  8. Locale Independent — Yes.
  9. Fast — See performance comparison.

Boost.Spirit.Qi

#include <string>
#include <boost/spirit/include/qi_parse.hpp>
#include <boost/spirit/include/qi_numeric.hpp>

bool String2Int(const std::string& str, int& result)
{
	std::string::const_iterator i = str.begin();
	if (!boost::spirit::qi::parse(i, str.end(), boost::spirit::int_, result))
		return false;
	return i == str.end(); // ensure the whole string was parsed
}

Another native C++ solution, this time using the Boost.Spirit.Qi parsing library. You can use Boost.Spirit.Karma to write the opposite int-to-string function.

  1. Correct — Part of the Boost C++ Library, so should be thoroughly tested.
  2. Safe — Yes.
  3. Detectable Failure — Yes.
  4. Base-10 — Yes.
  5. Match All — Yes.
  6. Fail on Overflow — Yes.
  7. Type Agnostic — Can use different parsing primitives (boost::spirit::int_) for different types; templates could be used to use the correct version depending on the type.
  8. Locale Independent — Yes.
  9. Fast — See performance comparison.

Boost.Coerce

#include <string>
#include <boost/coerce.hpp>

bool String2Int(const std::string& str, int& result)
{
	try
	{
		result = boost::coerce::as<int>(str);
		return true;
	}
	catch (boost::coerce::bad_cast&)
	{
		return false;
	}
}

Boost.Coerce is not an official part of Boost yet. It uses boost::spirit::qi::parse behind the scenes.

  1. Correct — Not yet part of the Boost C++ Library, so may not be thoroughly tested.
  2. Safe — Yes.
  3. Detectable Failure — Yes.
  4. Base-10 — Yes.
  5. Match All — Yes.
  6. Fail on Overflow — Yes.
  7. Type Agnostic — Yes.
  8. Locale Independent — Yes.
  9. Fast — See performance comparison.

Performance Comparison

Built using Visual Studio 2010 with “/Ox /Ob2” (full optimization, inline suitable) with 10,000,000 strings. I also ran the tests again with the locale set to “English” as opposed to “C”.

Method                             “C” locale            “English” locale
                                   μs         Factor     μs         Factor
Manual*                            157263     0.57       -          -
atoi()                             275598     1.00       402978     1.46
strtol()                           409886     1.48       545347     1.98
sscanf                             1030016    3.74       2949973    10.7
std::stoi                          856279     3.10       2708494    9.83
std::istringstream                 11540661   41.87      14348828   52.06
Boost.LexicalCast                  927043     3.36       3088007    11.20
Boost.LexicalCast with C locale    367073     1.33       -          -
Boost.Spirit.Qi                    143444     0.52       -          -
Boost.Coerce                       184420     0.67       -          -

* Manual method doesn’t detect overflow and may have other errors.
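
For anyone who wants to repeat the measurement, here is a rough sketch of a timing loop. It is not the harness used for the numbers above: ParseFn, BenchResult and TimeParser are hypothetical names, and it uses std::chrono from C++11. In the table, the Factor column is each time divided by the atoi() time in the “C” locale.

```cpp
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper type: the String2Int signature used in this post.
typedef bool (*ParseFn)(const std::string&, int&);

struct BenchResult
{
    long long microseconds; // wall-clock time for the whole run
    std::size_t parsed;     // number of inputs that parsed successfully
    long long checksum;     // sum of parsed values, so the work is observable
};

// Sketch: run `parse` over every input once and time the loop.
BenchResult TimeParser(ParseFn parse, const std::vector<std::string>& inputs)
{
    BenchResult r = { 0, 0, 0 };

    const std::chrono::steady_clock::time_point start =
        std::chrono::steady_clock::now();

    for (std::size_t i = 0; i < inputs.size(); ++i)
    {
        int value = 0;
        if (parse(inputs[i], value))
        {
            ++r.parsed;
            r.checksum += value;
        }
    }

    const std::chrono::steady_clock::time_point stop =
        std::chrono::steady_clock::now();

    r.microseconds =
        std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
    return r;
}
```

Returning the checksum gives the caller something to print, which makes it harder for the optimizer to discard the parsing work.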

Conclusion

  • In general, C standard library functions atoi, strtol and sscanf should be avoided as they lack versatile error detection and have safety issues.
  • If you’re using C++11 and don’t want to use 3rd party libraries, std::stoi is a good method if you don’t care about locale independence.
  • Boost.LexicalCast is the most well-known method available, but has problems when the locale is not default. Internally it has special cases for parsing to and from numerical types, so it is much faster than std::istringstream.
  • Boost.Coerce has good usability with only a very minor overhead over Boost.Spirit, but is not yet an official part of Boost. I wish it also provided a non-throwing API that returns a bool, such as “bool(const Source&, Target&)”.
  • For raw performance, Boost.Spirit seems to be the best short of hand-tweaking your own implementation.