熊猫–计算两个系列之间的欧几里得距离
在各种机器学习算法中使用了许多距离度量。其中之一就是欧氏距离。欧几里得距离是最常用的距离度量,它只是两点之间的直线距离。点之间的欧几里得距离由公式给出:
我们可以用各种方法来计算两个数列之间的欧氏距离。这里有几个相同的方法: 例 1:
import pandas as pd
import numpy as np
# create pandas series
x = pd.Series([1, 2, 3, 4, 5])
y = pd.Series([6, 7, 8, 9, 10])
# here we are computing every thing
# step by step
p1 = np.sum([(a * a) for a in x])
p2 = np.sum([(b * b) for b in y])
# using zip() function to create an
# iterator which aggregates elements
# from two or more iterables
p3 = -1 * np.sum([(2 * a*b) for (a, b) in zip(x, y)])
dist = np.sqrt(np.sum(p1 + p2 + p3))
print("Series 1:", x)
print("Series 2:", y)
print("Euclidean distance between two series is:", dist)
输出:
例 2:
import pandas as pd
import numpy as np
x = pd.Series([1, 2, 3, 4, 5])
y = pd.Series([6, 7, 8, 9, 10])
# zip() function creates an iterator
# which aggregates elements from two
# or more iterables
dist = np.sqrt(np.sum([(a-b)*(a-b) for a, b in zip(x, y)]))
print("Series 1:")
print(x)
print("Series 2:")
print(y)
print("Euclidean distance between two series is:", dist)
输出:
示例 3: 在本例中,我们使用的是 np.linalg.norm() 函数,该函数返回八个不同矩阵范数中的一个。
import pandas as pd
import numpy as np
x = pd.Series([1, 2, 3, 4, 5])
y = pd.Series([6, 7, 8, 9, 10])
dist = (np.linalg.norm(x-y))
print("Series 1:")
print(x)
print("Series 2:")
print(y)
print("Euclidean distance between two series is:", dist)
输出:
示例 4: 我们现在来试试更大的系列:
import pandas as pd
import numpy as np
x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = pd.Series([12, 8, 7, 5, 6, 5, 3, 9, 7, 1])
dist = np.sqrt(np.sum([(a-b)*(a-b) for a, b in zip(x, y)]))
print("Series 1:")
print(x)
print("Series 2:")
print(y)
print("Euclidean distance between two series is:", dist)
输出: