python - Extracting the first day of month of a datetime type column in pandas -


i have following dataframe:

user_id    purchase_date    1        2015-01-23 14:05:21   2        2015-02-05 05:07:30   3        2015-02-18 17:08:51   4        2015-03-21 17:07:30   5        2015-03-11 18:32:56   6        2015-03-03 11:02:30 

and purchase_date datetime64[ns] column. need add new column df[month] contains first day of month of purchase date:

df['month'] 2015-01-01 2015-02-01 2015-02-01 2015-03-01 2015-03-01 2015-03-01 

i'm looking date_format(purchase_date, "%y-%m-01") m in sql. have tried following code:

     df['month']=df['purchase_date'].apply(lambda x : x.replace(day=1)) 

it works somehow returns: 2015-01-01 14:05:21.

simpliest , fastest convert numpy array values , cast:

df['month'] = df['purchase_date'].values.astype('datetime64[m]') print (df)    user_id       purchase_date      month 0        1 2015-01-23 14:05:21 2015-01-01 1        2 2015-02-05 05:07:30 2015-02-01 2        3 2015-02-18 17:08:51 2015-02-01 3        4 2015-03-21 17:07:30 2015-03-01 4        5 2015-03-11 18:32:56 2015-03-01 5        6 2015-03-03 11:02:30 2015-03-01 

another solution floor , pd.offsets.monthbegin(0):

df['month'] = df['purchase_date'].dt.floor('d') - pd.offsets.monthbegin(1) print (df)    user_id       purchase_date      month 0        1 2015-01-23 14:05:21 2015-01-01 1        2 2015-02-05 05:07:30 2015-02-01 2        3 2015-02-18 17:08:51 2015-02-01 3        4 2015-03-21 17:07:30 2015-03-01 4        5 2015-03-11 18:32:56 2015-03-01 5        6 2015-03-03 11:02:30 2015-03-01 

df['month'] = (df['purchase_date'] - pd.offsets.monthbegin(1)).dt.floor('d') print (df)    user_id       purchase_date      month 0        1 2015-01-23 14:05:21 2015-01-01 1        2 2015-02-05 05:07:30 2015-02-01 2        3 2015-02-18 17:08:51 2015-02-01 3        4 2015-03-21 17:07:30 2015-03-01 4        5 2015-03-11 18:32:56 2015-03-01 5        6 2015-03-03 11:02:30 2015-03-01 

last solution create month period to_period:

df['month'] = df['purchase_date'].dt.to_period('m') print (df)    user_id       purchase_date   month 0        1 2015-01-23 14:05:21 2015-01 1        2 2015-02-05 05:07:30 2015-02 2        3 2015-02-18 17:08:51 2015-02 3        4 2015-03-21 17:07:30 2015-03 4        5 2015-03-11 18:32:56 2015-03 5        6 2015-03-03 11:02:30 2015-03 

... , datetimes to_timestamp, bit slowier:

df['month'] = df['purchase_date'].dt.to_period('m').dt.to_timestamp() print (df)    user_id       purchase_date      month 0        1 2015-01-23 14:05:21 2015-01-01 1        2 2015-02-05 05:07:30 2015-02-01 2        3 2015-02-18 17:08:51 2015-02-01 3        4 2015-03-21 17:07:30 2015-03-01 4        5 2015-03-11 18:32:56 2015-03-01 5        6 2015-03-03 11:02:30 2015-03-01 

there many solutions, so:

timings:

rng = pd.date_range('1980-04-03 15:41:12', periods=100000, freq='20h') df = pd.dataframe({'purchase_date': rng})   print (df.head())  in [300]: %timeit df['month1'] = df['purchase_date'].values.astype('datetime64[m]') 100 loops, best of 3: 9.2 ms per loop  in [301]: %timeit df['month2'] = df['purchase_date'].dt.floor('d') - pd.offsets.monthbegin(1) 100 loops, best of 3: 15.9 ms per loop  in [302]: %timeit df['month3'] = (df['purchase_date'] - pd.offsets.monthbegin(1)).dt.floor('d') 100 loops, best of 3: 12.8 ms per loop  in [303]: %timeit df['month4'] = df['purchase_date'].dt.to_period('m').dt.to_timestamp() 1 loop, best of 3: 399 ms per loop  #maxu solution in [304]: %timeit df['month5'] = df['purchase_date'].dt.normalize() - pd.offsets.monthbegin(1) 10 loops, best of 3: 24.9 ms per loop  #maxu solution 2 in [305]: %timeit df['month'] = df['purchase_date'] - pd.offsets.monthbegin(1, normalize=true) 10 loops, best of 3: 28.9 ms per loop  #wen solution in [306]: %timeit df['month6']= pd.to_datetime(df.purchase_date.astype(str).str[0:7]+'-01') 1 loop, best of 3: 214 ms per loop 

Comments

Popular posts from this blog

networking - Vagrant-provisioned VirtualBox VM is not reachable from Ubuntu host -

c# - ASP.NET Core - There is already an object named 'AspNetRoles' in the database -

ruby on rails - ArgumentError: Missing host to link to! Please provide the :host parameter, set default_url_options[:host], or set :only_path to true -