Python - Common pandas Code Snippets [Repost]

Author: AIHGF
Published: April 19, 2022
Category: Python

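Original: 40 Useful Pandas Snippets: Pandas snippets that come in handy in data analysis work - 2022.04.20 (https://medium.com/bitgrit-data-science-publication/40-useful-pandas-snippets-d7833472d12f)

Related:

Python - pandas: reading and writing csv files - AIUAI (https://aiuai.cn/aifarm1236.html)
Python - pandas data processing: Excel operations - AIUAI (https://aiuai.cn/aifarm1972.html)

import pandas as pd
import numpy as np  # used by several snippets below

# read the data
pd.read_csv("data.csv")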

1. Filter columns

Read only the columns you need from the dataset, for example:

pd.read_csv("data.csv", usecols=["date", "price"])

2. Parse dates on read

pd.read_csv("data.csv", parse_dates=["date"])

3. Specify data types

Set column dtypes at read time. Reading a string column with few distinct values as category saves memory.

pd.read_csv("data.csv", dtype={"house_type": "category"})
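
The memory effect can be checked by reading the same column once as plain strings and once as category. This is a minimal sketch on an invented in-memory CSV; only the column name house_type comes from the snippet above, the values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical data: three distinct labels repeated many times.
csv_text = "house_type\n" + "\n".join(["detached", "terraced", "flat"] * 1000)

as_strings = pd.read_csv(io.StringIO(csv_text))
as_category = pd.read_csv(io.StringIO(csv_text), dtype={"house_type": "category"})

# deep=True also counts the string data itself, not only the references to it.
mem_strings = as_strings.memory_usage(deep=True).sum()
mem_category = as_category.memory_usage(deep=True).sum()
print(mem_strings, mem_category)
```

The saving shrinks as the number of distinct values approaches the number of rows, so category is mainly worthwhile for low-cardinality columns.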

4. Set index

Set the index at read time, which is especially useful for time series.

pd.read_csv("data.csv", index_col="date")
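
As a sketch of why a date index helps, the example below combines index_col with parse_dates from section 2 and then selects a whole month by label. The CSV content and the variable names ts and april are assumptions for illustration.

```python
import io

import pandas as pd

# Hypothetical data standing in for data.csv.
csv_text = "date,price\n2022-03-31,9\n2022-04-19,10\n2022-04-20,12\n"

# parse_dates turns the column into datetimes; index_col makes it the index.
ts = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")

# With a DatetimeIndex, a partial date string selects every row in that month.
april = ts.loc["2022-04"]
print(april)
```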

5. Number of rows to read

Read only part of the dataset.

# read the first 100 rows
pd.read_csv("data.csv", nrows=100)

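6. Skip rows

Skip selected rows while reading.

# skip lines 1 and 5 (0-indexed file line numbers; line 0 is the header)
pd.read_csv("data.csv", skiprows=[1, 5])

# skip the first 100 lines of the file (this includes the header line)
pd.read_csv("data.csv", skiprows=100)

# randomly skip about 90% of the data rows, always keeping the header (needs numpy as np)
pd.read_csv("data.csv", skiprows=lambda x: x > 0 and np.random.rand() > 0.1)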
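
The list and callable forms of skiprows can be tried on a small in-memory CSV. This is a minimal sketch with invented data; a deterministic callable is used instead of the random one so the result is reproducible.

```python
import io

import pandas as pd

# Hypothetical file: a header line followed by ids 1..10.
csv_text = "id,value\n" + "\n".join(f"{i},{i * 10}" for i in range(1, 11))

# A list holds 0-indexed line numbers of the file; line 0 is the header.
skip_list = pd.read_csv(io.StringIO(csv_text), skiprows=[1, 5])

# A callable receives each line number and returns True for lines to skip.
# Here: keep the header (line 0) and skip every odd-numbered line.
keep_even = pd.read_csv(io.StringIO(csv_text), skiprows=lambda i: i > 0 and i % 2 == 1)

print(skip_list["id"].tolist())
print(keep_even["id"].tolist())
```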

7. Specify NA values

If missing values are encoded with a placeholder such as ?, declare it at read time so that no conversion is needed later.

pd.read_csv("data.csv", na_values=["?"])
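
A small sketch of the difference this makes, using invented data: without na_values the placeholder forces the whole column to be read as text, while with it the column is numeric and the placeholder becomes NaN.

```python
import io

import pandas as pd

# Hypothetical data with "?" marking a missing price.
csv_text = "city,price\nParis,100\nRome,?\nOslo,250\n"

raw = pd.read_csv(io.StringIO(csv_text))
clean = pd.read_csv(io.StringIO(csv_text), na_values=["?"])

print(raw["price"].tolist())    # text values, because "?" is not a number
print(clean["price"].dtype)     # numeric, with NaN in place of "?"
```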

8. Set boolean values

If a column stores booleans as strings such as yes and no, map them at read time. List the strings as they appear in the file.

pd.read_csv("data.csv", true_values=["yes"], false_values=["no"])
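
A minimal sketch on invented data, assuming a column whose values are all either yes or no; the column then comes back as real booleans.

```python
import io

import pandas as pd

# Hypothetical data: in_stock holds only "yes" or "no".
csv_text = "item,in_stock\npen,yes\nink,no\ncap,yes\n"

flags = pd.read_csv(io.StringIO(csv_text), true_values=["yes"], false_values=["no"])
print(flags["in_stock"].tolist())
```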

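9. Read from multiple files

If the data is split across several files, read each one and concatenate them:

import glob

# sorted() makes the concatenation order deterministic
files = sorted(glob.glob("file_*.csv"))

result = pd.concat([pd.read_csv(f) for f in files], ignore_index=True)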
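
When files are combined it is often useful to remember which file each row came from. The sketch below writes three small files into a temporary directory and tags every row with its file name; the source column and the file contents are assumptions added for illustration.

```python
import glob
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Create hypothetical input files file_0.csv .. file_2.csv.
    for i in range(3):
        pd.DataFrame({"x": [i, i + 10]}).to_csv(
            os.path.join(tmp, f"file_{i}.csv"), index=False
        )

    files = sorted(glob.glob(os.path.join(tmp, "file_*.csv")))

    # assign() adds the originating file name to every row before concatenating.
    result = pd.concat(
        [pd.read_csv(f).assign(source=os.path.basename(f)) for f in files],
        ignore_index=True,
    )

print(result)
```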

10. Copy and paste into DataFrames

Read data that has been copied to the clipboard:

df = pd.read_clipboard()

11. Read tables from PDF files

# pip install tabula-py

from tabula import read_pdf

# read_pdf returns a list of DataFrames, one per table found
dfs = read_pdf('test.pdf', pages='all')

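12. Exploratory data analysis (EDA)

EDA cheat.

To visualize a dataset without writing any plotting code, use pandas-profiling; a single line generates the report:

# pip install pandas-profiling
# (newer releases of this package are published as ydata-profiling)

import pandas_profiling  # importing registers df.profile_report()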

df = pd.read_csv("data.csv")

profile = df.profile_report(title="Pandas Profiling Report")
profile.to_file(output_file="output.html")

13. Filter columns by dtype

See the list of dtypes in the pandas user guide (https://pandas.pydata.org/pandas-docs/stable/user_guide/basics.html#basics-dtypes).

# selecting
df.select_dtypes(include="number")
df.select_dtypes(include=["category", "datetime"])

# excluding
df.select_dtypes(exclude="object")

14. Infer dtype

df.infer_objects().dtypes

15. Downcasting

pd.to_numeric(df.numeric_col, downcast="integer") # smallest signed int dtype
pd.to_numeric(df.numeric_col, downcast="float")  # smallest float dtype
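
As a quick check of what downcasting picks, the sketch below uses two invented Series. The integer case lands on int16 because 300 does not fit into int8.

```python
import pandas as pd

ints = pd.Series([1, 2, 300], dtype="int64")
floats = pd.Series([1.5, 2.5], dtype="float64")

small_int = pd.to_numeric(ints, downcast="integer")    # smallest signed int that fits
small_float = pd.to_numeric(floats, downcast="float")  # smallest float dtype, at least float32

print(small_int.dtype, small_float.dtype)
```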

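16. Manual conversion

If a column contains values that cannot be parsed as numbers, errors="coerce" converts them to NaN instead of raising an error.

You can then use .fillna to replace the resulting NA values with something sensible.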

# apply to whole data frame
df = df.apply(pd.to_numeric, errors="coerce")

# apply to specific columns
pd.to_numeric(df.numeric_column, errors="coerce")

# filling NA values with zero
pd.to_numeric(df.numeric_column, errors="coerce").fillna(0)
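
A minimal sketch of the coerce-then-fill pattern on an invented Series, where "n/a" stands for any value that is not a number:

```python
import pandas as pd

s = pd.Series(["1", "2.5", "n/a", "7"])

# Unparseable entries become NaN instead of raising a ValueError.
nums = pd.to_numeric(s, errors="coerce")

# Replace the NaN produced above with a default value.
filled = nums.fillna(0)

print(nums.tolist())
print(filled.tolist())
```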

17. Convert all at once

df = df.astype(
    {
        "date": "datetime64[ns]",
        "price": "int",
        "is_weekend": "bool",
        "status": "category",
    }
)

18. Rename columns

df = df.rename({"PRICE": "price", "Date (mm/dd/yyyy)": "date"}, axis=1)

19. Add prefix and suffix

df.add_prefix("pre_")
df.add_suffix("_suf")

20. Create new columns

# create a new column of Fahrenheit values from Celsius
df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)

21. Insert columns at specific positions

random_col = np.random.randint(10, size=len(df))

df.insert(3, 'random_col', random_col) # position is 0-based, so this becomes the fourth column
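
The position argument of insert is 0-based, which is easy to see on a small invented frame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1], "b": [2], "c": [3], "d": [4]})

# loc=3 places the new column at index 3, i.e. after the first three columns.
df.insert(3, "new", [0])

print(list(df.columns))
```

Note that insert modifies the DataFrame in place and returns None.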

22. if-then-else

df["logic"] = np.where(df["price"] > 5, "high", "low")

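23. Drop columns

# alternatives, not meant to be run in sequence
df.drop('col1', axis=1, inplace=True)         # in place
df = df.drop(['col1','col2'], axis=1)         # returns a new DataFrame
s = df.pop('col')                             # removes the column and returns it as a Series
del df['col']                                 # in place, by name
df.drop(columns=df.columns[0], inplace=True)  # by position; without columns= drop() looks for a row label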

24. String operations on column names

# on column names
df.columns = df.columns.str.lower()
df.columns = df.columns.str.replace(' ', '_')

25. String contains

df['name'].str.contains("John")

# regex: groups of three, three and four digits separated by hyphens
df['phone_num'].str.contains(r'\d{3}-\d{3}-\d{4}', regex=True)

df['email'].str.contains('gmail')

26. String findall

import re

# capture groups: user name, domain name, top-level domain
pattern = r'([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{2,4})'

df['email'].str.findall(pattern, flags=re.IGNORECASE)
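
To show what findall returns, here is the same pattern applied to an invented Series. Each row yields a list of tuples, one tuple per match, with one element per capture group.

```python
import re

import pandas as pd

pattern = r"([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{2,4})"

# Hypothetical values: one address, none, and two in the same string.
emails = pd.Series(["john@gmail.com", "no email here", "a.b@example.org and c@d.io"])

parts = emails.str.findall(pattern, flags=re.IGNORECASE)
print(parts.tolist())
```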

27. Check for missing values

def missing_vals(df):
    """Print the columns that have missing values, with the percentage missing."""
    # (column name, percent missing) for every column with at least one NA
    missing = [
        (col, perc)
        for col, perc in (df.isna().mean() * 100).items()
        if perc > 0
    ]

    if len(missing) == 0:
        print("no missing values")
        return

    # sort descending by percentage
    missing.sort(key=lambda x: x[1], reverse=True)

    print(f"There are a total of {len(missing)} variables with missing values\n")

    for col, perc in missing:
        print(f"{col:<20} => {round(perc, 3)}%")

missing_vals(df)

Example output:

There are a total of 16 variables with missing values

PoolQC               => 100.0%
Alley                => 94.0%
MiscFeature          => 91.0%
Fence                => 77.0%
FireplaceQu          => 54.0%
LotFrontage          => 14.0%
GarageType           => 6.0%
GarageYrBlt          => 6.0%
GarageFinish         => 6.0%
GarageQual           => 6.0%
GarageCond           => 6.0%
BsmtQual             => 3.0%
BsmtCond             => 3.0%
BsmtExposure         => 3.0%
BsmtFinType1         => 3.0%
BsmtFinType2         => 3.0%

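28. Handle missing values

# drop
df.dropna(axis=0)  # drop rows that contain any NA
df.dropna(axis=1)  # drop columns that contain any NA

# impute
df.fillna(0)
df.ffill()  # forward fill; replaces the deprecated fillna(method="ffill")
df.bfill()  # backward fill; replaces the deprecated fillna(method="bfill")

# replace sentinel values with NaN
df.replace(-999, np.nan)
df.replace("?", np.nan)

# interpolate (linear by default)
ts.interpolate() # ts: a time series (Series)
df.interpolate() # fill all consecutive NaNs, going forward
df.interpolate(limit=1) # fill at most one consecutive NaN, going forward
df.interpolate(limit=1, limit_direction="backward")
df.interpolate(limit_direction="both")

See also: Calculations with missing data (https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html#calculations-with-missing-data)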
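
The effect of limit and limit_direction is easiest to see on a short Series with a gap of three values. A minimal sketch with invented data:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])

full = s.interpolate()                                         # linear fill of the whole gap
one_forward = s.interpolate(limit=1)                           # only the first NaN after a valid value
one_backward = s.interpolate(limit=1, limit_direction="backward")  # only the last NaN before a valid value

print(full.tolist())
print(one_forward.tolist())
print(one_backward.tolist())
```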

29. Date operations: time offsets

from datetime import date, datetime, timedelta

# from today
datetime.now() + timedelta(hours=30)  # use datetime for hours; date arithmetic only counts whole days
date.today() + timedelta(days=30)
date.today() + timedelta(weeks=30)

# ago
date.today() - timedelta(days=365)
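
One detail worth knowing about these offsets: adding a timedelta to a date only uses its whole days, so sub-day offsets need a datetime. A small sketch with fixed dates (chosen for illustration) so the result does not depend on when it runs:

```python
from datetime import date, datetime, timedelta

d = date(2022, 4, 19)
dt = datetime(2022, 4, 19)

# 30 hours is 1 day and 6 hours; a date keeps only the whole day.
date_result = d + timedelta(hours=30)

# A datetime keeps the remaining 6 hours as well.
datetime_result = dt + timedelta(hours=30)

print(date_result, datetime_result)
```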

30. Date operations: filter between two dates

df[(df["Date"] > "2015-01-01") & (df["Date"] < "2017-01-01")]
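
The same filter can also be written with Series.between. This sketch assumes the Date column already has a datetime dtype and uses invented rows; inclusive="neither" reproduces the strict comparisons above.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2014-06-01", "2015-06-01", "2016-06-01", "2017-06-01"]),
        "Close": [1, 2, 3, 4],
    }
)

# Boolean mask, as in the snippet above.
mask = (df["Date"] > "2015-01-01") & (df["Date"] < "2017-01-01")
by_mask = df[mask]

# Equivalent filter with between(); both bounds are excluded.
by_between = df[df["Date"].between("2015-01-01", "2017-01-01", inclusive="neither")]

print(by_between)
```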

31. Filter by day/month/year

# filter by single day
df[df["Date"].dt.strftime("%Y-%m-%d") == "2017-03-01"]

# filter by single month
df[df["Date"].dt.strftime("%m") == "12"]

# filter by single year
df[df["Date"].dt.strftime("%Y") == "2017"]
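
An alternative sketch that avoids formatting every date as a string: the .dt accessor exposes year and month as integers, and normalize() strips the time of day. It assumes Date is a datetime column; the rows are invented.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2016-12-31", "2017-03-01", "2017-12-25"]),
        "Close": [10, 20, 30],
    }
)

# single day: compare the date part with a Timestamp
on_day = df[df["Date"].dt.normalize() == pd.Timestamp("2017-03-01")]

# single month (any year)
in_december = df[df["Date"].dt.month == 12]

# single year
in_2017 = df[df["Date"].dt.year == 2017]

print(on_day, in_december, in_2017, sep="\n\n")
```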

32. Format data

format_dict = {
    "Date": "{:%d/%m/%y}",
    "Open": "${:.2f}",
    "Close": "${:.2f}",
    "Volume": "{:,}",
}

# apply the per-column formats when the DataFrame is rendered
df.style.format(format_dict)

33. Color fills

(
    df.style.format(format_dict)
    .hide(axis="index")  # Styler.hide_index() on pandas < 1.4
    .highlight_min(["Open"], color="red")
    .highlight_max(["Open"], color="green")
    .background_gradient(subset="Close", cmap="Greens")
    .bar('Volume', color='lightblue', align='zero')
    .set_caption('Tesla Stock Prices in 2017')
)

See also: Table Visualization (https://pandas.pydata.org/pandas-docs/stable/user_guide/style.html)

34. Get the index labels of a column's minimum and maximum

df['col'].idxmin()
df['col'].idxmax()

35. Apply a function to a DataFrame

# element-wise; on pandas >= 2.1 the same method is available as df.map
df.applymap(lambda x: np.log(x))

36. Shuffle rows

df.sample(frac=1, random_state=7).reset_index(drop=True)

37. Percent change

Fractional change from the previous row, useful for time series.

For example, the price of BTC over 3 days: [30000, 33000, 31000] -> [NaN, 0.1, -0.06]

df['col_name'].pct_change()
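
The BTC example can be reproduced directly. Each value is current / previous - 1, so the first row has no predecessor and is NaN.

```python
import pandas as pd

prices = pd.Series([30000, 33000, 31000])

change = prices.pct_change()

# 33000 / 30000 - 1 = 0.1 and 31000 / 33000 - 1 is roughly -0.0606
print(change.round(4).tolist())
```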

38. Check the memory usage of a DataFrame

df.memory_usage().sum() / (1024**2) # bytes converted to MB

39. Explode list values into multiple rows

df.explode("col_name").reset_index(drop=True)
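
A minimal sketch of what explode does, on an invented frame whose tags column holds lists: each list element gets its own row and the other columns are repeated.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "tags": [["a", "b"], ["c"]]})

# Without reset_index the exploded rows would keep the original index (0, 0, 1).
long = df.explode("tags").reset_index(drop=True)

print(long)
```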

40. Group small categories into Others

Convert smaller categories to "Others". Start by counting how often each category occurs:

subclass = df.MSSubClass
subclass.value_counts()
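
The snippet above stops at the counts. One possible way to finish the job, shown here as a sketch rather than the original author's code, is to pick a frequency threshold and replace every category below it. The data, the threshold min_count and the new column name MSSubClass_grouped are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data; MSSubClass is the column name used in the snippet above.
df = pd.DataFrame({"MSSubClass": [20, 20, 20, 60, 60, 30, 45]})

# Work on string labels so that "Others" fits the column's dtype.
labels = df["MSSubClass"].astype(str)
counts = labels.value_counts()

min_count = 2  # assumed threshold: categories seen fewer times become "Others"
rare = counts[counts < min_count].index

# where() keeps a value when the condition is True and replaces it otherwise.
df["MSSubClass_grouped"] = labels.where(~labels.isin(rare), "Others")

print(df["MSSubClass_grouped"].value_counts())
```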

Last modified: April 30, 2022


