10 Python Functions Every Data Scientist Must Memorize
Here are 10 essential Python functions that every data scientist should know to work efficiently with data:
1. len()
Returns the length (number of elements) of an object like a list, tuple, or string.
data = [1, 2, 3, 4]
print(len(data)) # Output: 4
2. type()
Returns the type of the given object. Useful for checking data types.
x = 42
print(type(x)) # Output: <class 'int'>
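When the goal is to check a type rather than display it, isinstance() is often the safer choice because it also accepts subclasses. A minimal sketch with illustrative values (the variable names are made up for this example):

```python
# bool is a subclass of int, so an exact type check and isinstance() disagree.
flag = True
exact_match = type(flag) is int          # exact type is bool, not int
subclass_match = isinstance(flag, int)   # bool inherits from int

# isinstance() also accepts a tuple of candidate types.
is_number = isinstance(3.5, (int, float))

print(exact_match, subclass_match, is_number)  # Output: False True True
```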
3. map()
Applies a function to every item in an iterable (like a list or tuple) and returns an iterator, so wrap the result in list() to see the values.
numbers = [1, 2, 3, 4]
squared = list(map(lambda x: x ** 2, numbers))
print(squared) # Output: [1, 4, 9, 16]
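map() also accepts several iterables at once, passing one item from each to the function, and the iterator it returns can be consumed only once. A short sketch, with made-up prices and quantities:

```python
prices = [2.5, 4.0, 1.5]
quantities = [4, 1, 2]

# The lambda receives one item from each list per call.
lazy_totals = map(lambda price, qty: price * qty, prices, quantities)

totals = list(lazy_totals)    # consumes the iterator
leftover = list(lazy_totals)  # nothing left the second time

print(totals)    # Output: [10.0, 4.0, 3.0]
print(leftover)  # Output: []
```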
4. filter()
Filters elements in an iterable based on a function that returns True or False.
numbers = [1, 2, 3, 4, 5]
even = list(filter(lambda x: x % 2 == 0, numbers))
print(even) # Output: [2, 4]
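Two variations are worth knowing: passing None as the function drops every falsy value, and a list comprehension expresses the same filter without a lambda. A small sketch with an illustrative list:

```python
raw = [0, 1, '', 'a', None, 3]

# With None as the function, filter() keeps only truthy items.
truthy = list(filter(None, raw))

# The even-number filter written as a list comprehension.
numbers = [1, 2, 3, 4, 5]
even = [x for x in numbers if x % 2 == 0]

print(truthy)  # Output: [1, 'a', 3]
print(even)    # Output: [2, 4]
```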
5. reduce() (from functools)
Applies a function cumulatively to the items of an iterable, reducing it to a single value.
from functools import reduce
numbers = [1, 2, 3, 4]
product = reduce(lambda x, y: x * y, numbers)
print(product) # Output: 24
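reduce() takes an optional third argument, an initial value, which matters when the iterable might be empty: without it, an empty input raises TypeError. A minimal sketch that uses operator.mul in place of the lambda:

```python
from functools import reduce
from operator import mul

numbers = [1, 2, 3, 4]

# The initial value 1 is the starting accumulator.
product = reduce(mul, numbers, 1)
product_of_empty = reduce(mul, [], 1)  # falls back to the initial value

# Without an initial value, an empty iterable is an error.
try:
    reduce(mul, [])
    raised = False
except TypeError:
    raised = True

print(product, product_of_empty, raised)  # Output: 24 1 True
```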
6. zip()
Aggregates elements from two or more iterables into tuples.
names = ['Alice', 'Bob', 'Charlie']
scores = [85, 90, 80]
result = list(zip(names, scores))
print(result) # Output: [('Alice', 85), ('Bob', 90), ('Charlie', 80)]
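Note that zip() stops at the shortest input, so extra items are silently dropped. The pairs can also be turned into a dict or "unzipped" again with the * operator. A short sketch, with one score deliberately left out:

```python
names = ['Alice', 'Bob', 'Charlie']
scores = [85, 90]  # one score missing on purpose

pairs = list(zip(names, scores))   # 'Charlie' has no partner and is dropped
lookup = dict(zip(names, scores))  # name -> score mapping

# zip(*pairs) reverses the pairing, giving one tuple per original column.
unzipped_names, unzipped_scores = zip(*pairs)

print(pairs)            # Output: [('Alice', 85), ('Bob', 90)]
print(lookup)           # Output: {'Alice': 85, 'Bob': 90}
print(unzipped_names)   # Output: ('Alice', 'Bob')
print(unzipped_scores)  # Output: (85, 90)
```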
7. enumerate()
Pairs each item of an iterable with a running index and returns an enumerate object that yields (index, item) tuples.
names = ['Alice', 'Bob', 'Charlie']
for index, name in enumerate(names):
    print(index, name)
# Output:
# 0 Alice
# 1 Bob
# 2 Charlie
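The index does not have to start at 0: enumerate() accepts a start argument, which is handy for human-readable numbering. A minimal sketch:

```python
names = ['Alice', 'Bob', 'Charlie']

# start=1 makes the counter begin at 1 instead of 0.
ranked = [f"{rank}. {name}" for rank, name in enumerate(names, start=1)]

print(ranked)  # Output: ['1. Alice', '2. Bob', '3. Charlie']
```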
8. sorted()
Sorts an iterable and returns a new sorted list.
numbers = [5, 2, 9, 1]
sorted_numbers = sorted(numbers)
print(sorted_numbers) # Output: [1, 2, 5, 9]
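sorted() also takes key and reverse arguments, and it leaves the original object untouched. A short sketch that orders illustrative (name, score) records by score, highest first:

```python
records = [('Alice', 85), ('Bob', 90), ('Charlie', 80)]

# key picks the value to compare; reverse=True sorts in descending order.
by_score_desc = sorted(records, key=lambda record: record[1], reverse=True)

print(by_score_desc)  # Output: [('Bob', 90), ('Alice', 85), ('Charlie', 80)]
print(records[0])     # Output: ('Alice', 85)  (original order is unchanged)
```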
9. pandas.DataFrame()
Creates a DataFrame, the core tabular data structure of the pandas library and the starting point for most data manipulation.
import pandas as pd
data = {'Name': ['Alice', 'Bob'], 'Score': [85, 90]}
df = pd.DataFrame(data)
print(df)
# Output:
#     Name  Score
# 0  Alice     85
# 1    Bob     90
10. numpy.mean()
Computes the arithmetic mean of an array or array-like object such as a list, useful in statistical analysis.
import numpy as np
data = [1, 2, 3, 4, 5]
mean_value = np.mean(data)
print(mean_value) # Output: 3.0
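For two-dimensional data, the axis argument controls whether the mean is taken over everything, down each column, or across each row. A minimal sketch with a made-up 2x2 score table:

```python
import numpy as np

# Rows are students, columns are exams (illustrative values).
scores = np.array([[85, 90],
                   [70, 80]])

overall = np.mean(scores)             # mean of all four values
per_column = np.mean(scores, axis=0)  # one mean per exam
per_row = np.mean(scores, axis=1)     # one mean per student

print(overall)     # Output: 81.25
print(per_column)  # Output: [77.5 85. ]
print(per_row)     # Output: [87.5 75. ]
```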
These functions cover a wide range of operations, from simple iteration and aggregation to essential data manipulation. Mastering them will significantly enhance your productivity in data science projects!