Why is it faster to compare strings that match than strings that do not?

We are going to discuss why it is faster to compare strings that match than strings that do not. So let's start this Python article.

Solution 1

Combining my comment and the comment by @khelwood:

TL;DR:
Analysing the bytecode for the two comparisons reveals that both 'time' literals refer to the same object. Therefore, an up-front identity check (at the C level) is the reason for the faster comparison.

Both literals are the same object because, as an implementation detail, CPython interns strings that contain only ‘name characters’ (i.e. ASCII letters, digits and underscores). This is what allows the identity check to succeed.
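To make the ‘name characters’ rule concrete, here is a small sketch. looks_internable is a hypothetical helper that mirrors the character set CPython checks before interning a string constant (ASCII letters, digits and underscore); it is not a public API, and the rule itself is an implementation detail that may change between versions. The second half shows that strings built at runtime are equal but distinct objects until sys.intern maps them to one shared object.

```python
import string
import sys

# Hypothetical helper: mirrors the character set CPython uses when deciding
# whether to intern a string constant (ASCII letters, digits, underscore).
# This illustrates the rule; it is not a CPython API.
NAME_CHARS = frozenset(string.ascii_letters + string.digits + "_")


def looks_internable(s: str) -> bool:
    """Return True if every character of s is a 'name character'."""
    return all(ch in NAME_CHARS for ch in s)


print(looks_internable("time"))         # True
print(looks_internable("hello world"))  # False (contains a space)

# Strings built at runtime are new objects, even when their contents match.
a = "".join(["ti", "me"])
b = "".join(["ti", "me"])
print(a == b, a is b)  # True False on CPython

# sys.intern returns one canonical object for equal strings,
# so after interning the identity check succeeds.
a = sys.intern(a)
b = sys.intern(b)
print(a == b, a is b)  # True True
```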


Bytecode:

import dis

In [24]: dis.dis("'time'=='time'")
  1           0 LOAD_CONST               0 ('time')  # <-- same object (0)
              2 LOAD_CONST               0 ('time')  # <-- same object (0)
              4 COMPARE_OP               2 (==)
              6 RETURN_VALUE

In [25]: dis.dis("'time'=='1234'")
  1           0 LOAD_CONST               0 ('time')  # <-- different object (0)
              2 LOAD_CONST               1 ('1234')  # <-- different object (1)
              4 COMPARE_OP               2 (==)
              6 RETURN_VALUE
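The shared LOAD_CONST 0 can also be checked programmatically by compiling each expression and inspecting the code object's constant table (co_consts). This is a minimal sketch; the exact contents of co_consts and the opcodes printed by dis vary between CPython versions, so it only counts how often each string is stored.

```python
# Compile each comparison as an expression and inspect its constant table.
same = compile("'time'=='time'", "<same>", "eval")
diff = compile("'time'=='1234'", "<diff>", "eval")

# Both operands of the first comparison refer to a single stored constant...
print(same.co_consts.count("time"))  # 1

# ...while the second comparison stores two distinct constants.
print("time" in diff.co_consts, "1234" in diff.co_consts)  # True True
```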

Assignment Timing:

The ‘speed-up’ can also be seen when assignment is used in the timing tests. Assigning (and comparing) two variables bound to the same string is faster than assigning (and comparing) two variables bound to different strings. This further supports the hypothesis that the underlying logic is performing an object (identity) comparison, which is confirmed in the next section.

import timeit

In [26]: timeit.timeit("x='time'; y='time'; x==y", number=1000000)
Out[26]: 0.0745926329982467

In [27]: timeit.timeit("x='time'; y='1234'; x==y", number=1000000)
Out[27]: 0.10328884399496019
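To reproduce the assignment timings on your own machine, a sketch like the following can be used. It calls timeit.repeat and keeps the best run, which is less sensitive to background noise than a single timeit.timeit call. The statements are the two from the session above; the number and repeat defaults are arbitrary choices, and the absolute results (and the size of the gap) will depend on your machine and Python version.

```python
import timeit

# The two statements timed in the IPython session above.
STATEMENTS = {
    "same string": "x='time'; y='time'; x==y",
    "different strings": "x='time'; y='1234'; x==y",
}


def best_times(number: int = 1_000_000, repeat: int = 5) -> dict:
    """Return the fastest of `repeat` runs (in seconds) for each statement."""
    return {
        label: min(timeit.repeat(stmt, number=number, repeat=repeat))
        for label, stmt in STATEMENTS.items()
    }


if __name__ == "__main__":
    for label, seconds in best_times().items():
        print(f"{label:>17}: {seconds:.4f} s")
```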

Python source code:

As helpfully pointed out by @mkrieger1 and @Masklinn in their comments, _PyUnicode_Equal in CPython's unicodeobject.c performs a pointer comparison first and, if the two pointers are equal, returns immediately.

int
_PyUnicode_Equal(PyObject *str1, PyObject *str2)
{
    assert(PyUnicode_CheckExact(str1));
    assert(PyUnicode_CheckExact(str2));
    if (str1 == str2) {                  // <-- Here
        return 1;
    }
    if (PyUnicode_READY(str1) || PyUnicode_READY(str2)) {
        return -1;
    }
    return unicode_compare_eq(str1, str2);
}
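The control flow of _PyUnicode_Equal can be mirrored in pure Python to show why the identity check helps. This is a simplified model, not CPython's implementation: the hypothetical equal_with_shortcut returns the result together with the number of characters it had to examine, and its length check stands in for the cheap pre-checks done before a full comparison (CPython does the real work in C).

```python
from __future__ import annotations


def equal_with_shortcut(s1: str, s2: str) -> tuple[bool, int]:
    """Simplified model of an identity-first string comparison.

    Returns (are_equal, characters_examined). Hypothetical helper for
    illustration only; it is not how CPython is written.
    """
    if s1 is s2:              # same object: equal without reading any data
        return True, 0
    if len(s1) != len(s2):    # cheap pre-check: different lengths cannot match
        return False, 0
    examined = 0
    for c1, c2 in zip(s1, s2):
        examined += 1
        if c1 != c2:          # the first mismatch decides the result
            return False, examined
    return True, examined


x = "time"
print(equal_with_shortcut(x, x))                      # (True, 0)
print(equal_with_shortcut(x, "".join(["ti", "me"])))  # (True, 4)
print(equal_with_shortcut(x, "1234"))                 # (False, 1)
```

In this model, equal but distinct strings are the worst case, because every character has to be examined before the result is known.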

Appendix:

  • Reference answer nicely illustrating how to read the disassembled bytecode output. Courtesy of @Delgan
  • Reference answer which nicely describes CPython’s string interning. Courtesy of @ShadowRanger

Original author of this content: S3DEV

Solution 2

It’s not always faster to compare strings that match. What is faster is comparing strings that share the same id, i.e. that are the same object. Here is a demonstration that identity is indeed the reason for this behavior (as @S3DEV has brilliantly explained):

>>> x = 'toto'
>>> y = 'toto'
>>> z = 'totoo'[:-1]
>>> w = 'abcd'
>>> x == y
True
>>> x == z
True
>>> x == w
False
>>> id(x) == id(y)
True
>>> id(x) == id(z)
False
>>> id(x) == id(w)
False
>>> import timeit
>>> timeit.timeit('x==y', number=100000000, globals={'x': x, 'y': y})
3.893762200000083
>>> timeit.timeit('x==z', number=100000000, globals={'x': x, 'z': z})
4.205321462000029
>>> timeit.timeit('x==w', number=100000000, globals={'x': x, 'w': w})
4.15288594499998

Comparing objects that share the same id is faster. As you can see from the example, x == z is slower than x == y even though both comparisons return True: x and z are different objects (they do not share the same id), so their contents have to be compared, while x and y are the same object.
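The effect of the identity shortcut is much easier to see with long strings, because a full comparison has to scan every character while the identity check does not. This is a sketch assuming CPython; the hypothetical identity_vs_equality function, the string length n and the iteration count are arbitrary choices, and the strings are built at runtime from a variable length so they cannot be merged into a single compile-time constant.

```python
import timeit


def identity_vs_equality(n: int = 1_000_000, number: int = 1_000) -> dict:
    """Time `a == a` (same object) against `a == b` (equal, distinct objects)."""
    a = "x" * n
    b = "x" * n  # built separately: equal contents, different object
    # Sanity check that the setup really gives equal but distinct objects.
    assert a == b and a is not b
    return {
        "same object": timeit.timeit("a == a", number=number, globals={"a": a}),
        "equal copy": timeit.timeit(
            "a == b", number=number, globals={"a": a, "b": b}
        ),
    }


if __name__ == "__main__":
    for label, seconds in identity_vs_equality().items():
        print(f"{label:>11}: {seconds:.6f} s")
```

You should expect the "same object" time to stay roughly constant as n grows, while the "equal copy" time grows with the length of the strings.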

Original author of this content: Riccardo Bucco

Conclusion

So this is all about this tutorial. I hope it helped you. Thank you.


ittutorial team

I am an Information Technology engineer. I have completed my MCA and have 4+ years of experience. I am a web developer with knowledge of multiple back-end platforms like PHP, Node.js and Python, and front-end JavaScript frameworks like Angular, React and Vue.
