How to filter rows containing a string pattern from a Pandas dataframe [duplicate]

We Are Going To Discuss About How to filter rows containing a string pattern from a Pandas dataframe . So lets Start this Python Article.

How to filter rows containing a string pattern from a Pandas dataframe

How to solve How to filter rows containing a string pattern from a Pandas dataframe [duplicate]

In [3]: df[df['ids'].str.contains("ball")] Out[3]: ids vals 0 aball 1 1 bball 2 3 fball 4

How to filter rows containing a string pattern from a Pandas dataframe [duplicate]

In [3]: df[df['ids'].str.contains("ball")] Out[3]: ids vals 0 aball 1 1 bball 2 3 fball 4

Solution 1

In [3]: df[df['ids'].str.contains("ball")]
Out[3]:
     ids  vals
0  aball     1
1  bball     2
3  fball     4

Original Author Amit Of This Content

Solution 2

df[df['ids'].str.contains('ball', na = False)] # valid for (at least) pandas version 0.17.1

Step-by-step explanation (from inner to outer):

  • df['ids'] selects the ids column of the data frame (technically, the object df['ids'] is of type pandas.Series)
  • df['ids'].str allows us to apply vectorized string methods (e.g., lower, contains) to the Series
  • df['ids'].str.contains('ball') checks each element of the Series as to whether the element value has the string ‘ball’ as a substring. The result is a Series of Booleans indicating True or False about the existence of a ‘ball’ substring.
  • df[df['ids'].str.contains('ball')] applies the Boolean ‘mask’ to the dataframe and returns a view containing appropriate records.
  • na = False removes NA / NaN values from consideration; otherwise a ValueError may be returned.

Original Author Jubbles Of This Content

Solution 3

>>> mask = df['ids'].str.contains('ball')    
>>> mask
0     True
1     True
2    False
3     True
Name: ids, dtype: bool

>>> df[mask]
     ids  vals
0  aball     1
1  bball     2
3  fball     4

Original Author user3820991 Of This Content

Solution 4

If you want to set the column you filter on as a new index, you could also consider to use .filter; if you want to keep it as a separate column then str.contains is the way to go.

Let’s say you have

df = pd.DataFrame({'vals': [1, 2, 3, 4, 5], 'ids': [u'aball', u'bball', u'cnut', u'fball', 'ballxyz']})

       ids  vals
0    aball     1
1    bball     2
2     cnut     3
3    fball     4
4  ballxyz     5

and your plan is to filter all rows in which ids contains ball AND set ids as new index, you can do

df.set_index('ids').filter(like='ball', axis=0)

which gives

         vals
ids          
aball       1
bball       2
fball       4
ballxyz     5

But filter also allows you to pass a regex, so you could also filter only those rows where the column entry ends with ball. In this case you use

df.set_index('ids').filter(regex='ball

Note that now the entry with ballxyz is not included as it starts with ball and does not end with it.

If you want to get all entries that start with ball you can simple use

df.set_index('ids').filter(regex='^ball', axis=0)

yielding

         vals
ids          
ballxyz     5

The same works with columns; all you then need to change is the axis=0 part. If you filter based on columns, it would be axis=1.

, axis=0) vals ids aball 1 bball 2 fball 4

Note that now the entry with ballxyz is not included as it starts with ball and does not end with it.

If you want to get all entries that start with ball you can simple use


yielding


The same works with columns; all you then need to change is the axis=0 part. If you filter based on columns, it would be axis=1.

Original Author Cleb Of This Content

Conclusion

So This is all About This Tutorial. Hope This Tutorial Helped You. Thank You.

Also Read,

ittutorial team

I am an Information Technology Engineer. I have Completed my MCA And I have 4 Year Plus Experience, I am a web developer with knowledge of multiple back-end platforms Like PHP, Node.js, Python and frontend JavaScript frameworks Like Angular, React, and Vue.

Leave a Comment