
Processing Time with Pandas DataFrame

Original: https://www.geeksforgeeks.org/processing-time-with-pandas-dataframe/

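Pandas was created with financial modeling in mind, so, as you might expect, it contains a wealth of tools for working with dates and times. Sometimes the dates and times in a dataset come in a format that cannot be used directly for analysis, so we preprocess these values to derive features such as the day, month, year, hour, minute, and second. Let's look at several ways to process dates and times in a Pandas DataFrame.

Dividing date and time into multiple features: Use pd.date_range, which generates a fixed-frequency sequence of dates and times, to create six timestamps. Then use pandas.Series.dt to extract the features.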

Python 3

# Load library
import pandas as pd

# Create an empty DataFrame
df = pd.DataFrame()

# Create six timestamps two hours apart ('2h' = every 2 hours);
# '2/5/2019' is read month-first, i.e. 5 February 2019
df['time'] = pd.date_range('2/5/2019', periods=6, freq='2h')
print(df['time'])  # print the 'time' column (a Series)

# Extract features - year, month, day, hour, and minute
df['year'] = df['time'].dt.year
df['month'] = df['time'].dt.month
df['day'] = df['time'].dt.day
df['hour'] = df['time'].dt.hour
df['minute'] = df['time'].dt.minute

# Show all six rows
print(df.head(6))

Output:

0   2019-02-05 00:00:00
1   2019-02-05 02:00:00
2   2019-02-05 04:00:00
3   2019-02-05 06:00:00
4   2019-02-05 08:00:00
5   2019-02-05 10:00:00
Name: time, dtype: datetime64[ns]

                time  year  month  day  hour  minute
0 2019-02-05 00:00:00  2019      2    5     0       0
1 2019-02-05 02:00:00  2019      2    5     2       0
2 2019-02-05 04:00:00  2019      2    5     4       0
3 2019-02-05 06:00:00  2019      2    5     6       0
4 2019-02-05 08:00:00  2019      2    5     8       0
5 2019-02-05 10:00:00  2019      2    5    10       0
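The text above lists seconds among the features, but in the example every timestamp falls on the hour, so minute and second are always zero. A minimal sketch, assuming only that pandas is installed, that spaces the timestamps 90 seconds apart (an arbitrary spacing chosen for illustration) so that Series.dt.minute and Series.dt.second actually vary:

```python
import pandas as pd

# Six timestamps 90 seconds apart, so the minute and second fields change
times = pd.Series(pd.date_range("2019-02-05 00:00:00", periods=6,
                                freq=pd.Timedelta(seconds=90)))

# Same .dt accessor pattern as above, now including .dt.second
features = pd.DataFrame({
    "time": times,
    "hour": times.dt.hour,
    "minute": times.dt.minute,
    "second": times.dt.second,
})
print(features)
```

The times run 00:00:00, 00:01:30, 00:03:00 and so on, so the second column alternates between 0 and 30.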

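Converting strings to timestamps: We use pd.to_datetime to convert the given strings to datetime format; features can then be extracted from the resulting datetimes using the first method.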

Python 3

# Load libraries
import numpy as np
import pandas as pd

# Create time strings in day-month-year, 12-hour clock format
dt_strings = np.array(['04-03-2019 12:35 PM',
                       '22-06-2017 11:01 AM',
                       '05-09-2009 07:09 PM'])

# Convert each string to a Timestamp. The format must match the strings
# exactly (note the space between %Y and %I); errors="coerce" turns
# strings that cannot be parsed into NaT instead of raising an error
timestamps = [pd.to_datetime(date, format="%d-%m-%Y %I:%M %p",
                             errors="coerce") for date in dt_strings]

print(timestamps)

Output:

[Timestamp('2019-03-04 12:35:00'), Timestamp('2017-06-22 11:01:00'), Timestamp('2009-09-05 19:09:00')]
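pd.to_datetime also accepts a whole array, so the list comprehension is not strictly needed, and wrapping the result in a Series makes the .dt accessor from the first method available. A minimal sketch, assuming the same strings and format as above; the string "31-02-2019 10:00 AM" is a made-up invalid date (there is no 31 February) used only to show what errors="coerce" does:

```python
import numpy as np
import pandas as pd

dt_strings = np.array(['04-03-2019 12:35 PM',
                       '22-06-2017 11:01 AM',
                       '05-09-2009 07:09 PM'])
fmt = "%d-%m-%Y %I:%M %p"

# One vectorized call parses the whole array into a DatetimeIndex;
# wrapping it in a Series gives access to the .dt accessor
parsed = pd.Series(pd.to_datetime(dt_strings, format=fmt, errors="coerce"))

# Extract features exactly as in the first example
features = pd.DataFrame({
    "time": parsed,
    "year": parsed.dt.year,
    "month": parsed.dt.month,
    "day": parsed.dt.day,
    "hour": parsed.dt.hour,
    "minute": parsed.dt.minute,
})
print(features)

# With errors="coerce", an impossible date becomes NaT instead of raising
bad = pd.to_datetime("31-02-2019 10:00 AM", format=fmt, errors="coerce")
print(bad)
```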

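Extracting the day of the week from a given date: We use Series.dt.day_name() to find the name of the day of the week for each date. (Pandas versions before 1.0 exposed this as the Series.dt.weekday_name attribute, which has since been removed.)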

Python 3

# Load library
import pandas as pd

# Create six month-end dates ('M' = month end; pandas 2.2+ prefers the alias 'ME')
dates = pd.Series(pd.date_range('2/5/2019', periods=6, freq='M'))

print(dates)

# Extract the names of the days of the week and print them
print(dates.dt.day_name())

Output:

0   2019-02-28
1   2019-03-31
2   2019-04-30
3   2019-05-31
4   2019-06-30
5   2019-07-31
dtype: datetime64[ns]
0     Thursday
1       Sunday
2      Tuesday
3       Friday
4       Sunday
5    Wednesday
dtype: object
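For analysis, a numeric weekday is often more convenient than a name. A minimal sketch, assuming one week of daily dates starting on Thursday 28 February 2019 (chosen to match the first date above), that uses Series.dt.dayofweek (Monday=0 through Sunday=6) and derives a weekend flag from it; the column names are illustrative:

```python
import pandas as pd

# One week of daily dates, starting on Thursday 2019-02-28
dates = pd.Series(pd.date_range("2019-02-28", periods=7, freq="D"))

# .dt.dayofweek gives the weekday as an integer: Monday=0 ... Sunday=6
weekday_num = dates.dt.dayofweek

# A simple derived feature: True for Saturday (5) and Sunday (6)
is_weekend = weekday_num >= 5

print(pd.DataFrame({"date": dates,
                    "name": dates.dt.day_name(),
                    "dayofweek": weekday_num,
                    "weekend": is_weekend}))
```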

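Extracting data within a date and time range: We can select the rows of a dataset that fall within a particular time range.

Method #1: If the dataset is not indexed by time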

Python 3

# Load library
import pandas as pd

# Create data frame
df = pd.DataFrame()

# Create 1000 hourly datetimes starting at midnight on 1 January 2012
df['date'] = pd.date_range('1/1/2012', periods=1000, freq='h')

print(df.head(5))

# Select rows after 01:00 (exclusive) and up to 11:00 (inclusive)
x = df[(df['date'] > '2012-1-1 01:00:00') &
       (df['date'] <= '2012-1-1 11:00:00')]

print(x)

Output (the first 5 of the 1000 rows, followed by the rows in the given range):

                 date
0 2012-01-01 00:00:00
1 2012-01-01 01:00:00
2 2012-01-01 02:00:00
3 2012-01-01 03:00:00
4 2012-01-01 04:00:00

                 date
2  2012-01-01 02:00:00
3  2012-01-01 03:00:00
4  2012-01-01 04:00:00
5  2012-01-01 05:00:00
6  2012-01-01 06:00:00
7  2012-01-01 07:00:00
8  2012-01-01 08:00:00
9  2012-01-01 09:00:00
10 2012-01-01 10:00:00
11 2012-01-01 11:00:00
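The same selection can be written with Series.between, which builds the boolean mask in one call. A minimal sketch, assuming pandas 1.3 or later, where the inclusive argument takes "both", "neither", "left" or "right"; inclusive="right" reproduces the "> start" and "<= end" comparisons used above:

```python
import pandas as pd

# Same data as Method #1: 1000 hourly datetimes in an ordinary column
df = pd.DataFrame({"date": pd.date_range("2012-01-01", periods=1000, freq="h")})

start = pd.Timestamp("2012-01-01 01:00:00")
end = pd.Timestamp("2012-01-01 11:00:00")

# inclusive="right" excludes start and includes end
mask = df["date"].between(start, end, inclusive="right")
x = df[mask]
print(x)
```

Passing inclusive="both" would also keep the 01:00 row.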

Method #2: If the dataset is indexed by time

Python 3

# Load library
import pandas as pd

# Create data frame
df = pd.DataFrame()

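# Create 1000 hourly datetimes starting at midnight on 1 January 2012
df['date'] = pd.date_range('1/1/2012', periods=1000, freq='h')

# Use the 'date' values as the index (the 'date' column is kept as well)
df = df.set_index(df['date'])

print(df.head(5))

# Select rows between two datetimes; label slicing with .loc includes both endpoints
x = df.loc['2012-1-1 04:00:00':'2012-1-1 12:00:00']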

print(x)

Output (the first 5 of the 1000 rows, followed by the rows in the given range):

                                   date
date                                   
2012-01-01 00:00:00 2012-01-01 00:00:00
2012-01-01 01:00:00 2012-01-01 01:00:00
2012-01-01 02:00:00 2012-01-01 02:00:00
2012-01-01 03:00:00 2012-01-01 03:00:00
2012-01-01 04:00:00 2012-01-01 04:00:00

                                   date
date                                   
2012-01-01 04:00:00 2012-01-01 04:00:00
2012-01-01 05:00:00 2012-01-01 05:00:00
2012-01-01 06:00:00 2012-01-01 06:00:00
2012-01-01 07:00:00 2012-01-01 07:00:00
2012-01-01 08:00:00 2012-01-01 08:00:00
2012-01-01 09:00:00 2012-01-01 09:00:00
2012-01-01 10:00:00 2012-01-01 10:00:00
2012-01-01 11:00:00 2012-01-01 11:00:00
2012-01-01 12:00:00 2012-01-01 12:00:00
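With a DatetimeIndex, .loc also accepts partial date strings, which is a compact way to pull out a whole day or month. A minimal sketch, assuming the same hourly range as above and a hypothetical "reading" column added only so the frame has some data to show: a date-only string selects every row on that day, and a year-month string selects the whole month.

```python
import pandas as pd

# Hourly DatetimeIndex named "date", plus a hypothetical "reading" column
idx = pd.date_range("2012-01-01", periods=1000, freq="h", name="date")
df = pd.DataFrame({"reading": range(1000)}, index=idx)

# Partial string indexing: a date-only label selects all rows on that day
day = df.loc["2012-01-02"]
print(len(day))
print(day.head(3))

# A year-month label selects every row in that month
jan = df.loc["2012-01"]
print(len(jan))
```

The 1000 hours run past the end of January, so the month selection returns all 31 × 24 = 744 January rows.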

