10 Python Functions Every Data Scientist Must Memorize

October 18, 2024

Here are 10 essential Python functions that every data scientist should know to work efficiently with data:

1. `len()`

Returns the length (number of elements) of an object like a list, tuple, or string.

data = [1, 2, 3, 4]
print(len(data))  # Output: 4
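
One detail worth seeing in action is that `len()` counts only the top-level elements of a container. The sketch below uses made-up sample values (`table` and `record` are hypothetical names, not from any dataset) to show what that means for nested lists, dicts, and strings.

```python
# Hypothetical sample data, chosen only for illustration.
table = [[1, 2, 3], [4, 5, 6]]           # a list of two rows
record = {"name": "Alice", "score": 85}  # a dict with two keys

n_rows = len(table)       # counts the outer list only
n_cols = len(table[0])    # length of the first row
n_keys = len(record)      # number of keys in the dict
n_chars = len("pandas")   # number of characters in the string

print(n_rows, n_cols, n_keys, n_chars)  # Output: 2 3 2 6
```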

2. `type()`

Returns the type of the given object. Useful for checking data types.

x = 42
print(type(x))  # Output: <class 'int'>
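
When checking types in a column of mixed values, it can help to compare `type()` with `isinstance()`. The following is a small sketch with invented values; the key difference is that `isinstance()` respects inheritance while an exact `type()` comparison does not.

```python
# Invented values standing in for a messy column.
values = [42, 3.14, "7", True, None]

# Collect the class name of each value to spot mixed types quickly.
type_names = [type(v).__name__ for v in values]
print(type_names)  # Output: ['int', 'float', 'str', 'bool', 'NoneType']

# bool is a subclass of int, so the two checks disagree for True.
print(type(True) is int)      # Output: False
print(isinstance(True, int))  # Output: True
```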

3. `map()`

Applies a function to every item in an iterable (like a list or tuple). In Python 3 it returns a lazy iterator, so wrap the result in `list()` to see the values.

numbers = [1, 2, 3, 4]
squared = list(map(lambda x: x ** 2, numbers))
print(squared)  # Output: [1, 4, 9, 16]
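
`map()` also accepts several iterables at once, passing one item from each to the function on every call. Here is a minimal sketch, assuming two made-up lists named `prices` and `quantities`; it also shows that a map object can be consumed only once.

```python
# Hypothetical order data for illustration.
prices = [10.0, 20.0, 30.0]
quantities = [3, 1, 2]

# With two iterables, the function receives one item from each per call.
totals = map(lambda p, q: p * q, prices, quantities)

# Nothing has been computed yet; list() consumes the iterator.
totals_list = list(totals)
print(totals_list)  # Output: [30.0, 20.0, 60.0]

# The iterator is now exhausted, so a second pass yields nothing.
leftover = list(totals)
print(leftover)  # Output: []
```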

4. `filter()`

Keeps only the elements of an iterable for which a function returns a truthy value. Like `map()`, it returns a lazy iterator.

numbers = [1, 2, 3, 4, 5]
even = list(filter(lambda x: x % 2 == 0, numbers))
print(even)  # Output: [2, 4]
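
Two variations come up often in data cleaning: passing `None` instead of a function, and writing the same filter as a list comprehension. The sketch below uses an invented `readings` list to illustrate both.

```python
# Invented sensor readings with some empty entries.
readings = [0, 12, None, 7, "", 3]

# Passing None as the function keeps only truthy items,
# so 0, None, and the empty string are all dropped.
truthy = list(filter(None, readings))
print(truthy)  # Output: [12, 7, 3]

# A list comprehension expresses the same even-number filter as above.
numbers = [1, 2, 3, 4, 5]
even_comp = [x for x in numbers if x % 2 == 0]
print(even_comp)  # Output: [2, 4]
```

Note that `filter(None, ...)` also removes a legitimate `0`, so an explicit condition is safer when zero is valid data.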

5. `reduce()` (from `functools`)

Applies a function cumulatively to the items of an iterable, reducing it to a single value.

from functools import reduce
numbers = [1, 2, 3, 4]
product = reduce(lambda x, y: x * y, numbers)
print(product)  # Output: 24
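
`reduce()` takes an optional third argument that seeds the accumulator. A short sketch of how that initial value behaves, including the empty-input case:

```python
from functools import reduce

numbers = [1, 2, 3, 4]

# The third argument is the starting value of the accumulator.
total = reduce(lambda acc, x: acc + x, numbers, 100)
print(total)  # Output: 110

# With an initial value, an empty iterable returns that value
# instead of raising TypeError.
empty_product = reduce(lambda acc, x: acc * x, [], 1)
print(empty_product)  # Output: 1
```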

6. `zip()`

Aggregates elements from two or more iterables into tuples.

names = ['Alice', 'Bob', 'Charlie']
scores = [85, 90, 80]
result = list(zip(names, scores))
print(result)  # Output: [('Alice', 85), ('Bob', 90), ('Charlie', 80)]
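
Three behaviors of `zip()` are easy to miss: it stops at the shortest input, its pairs convert directly into a dict, and `zip(*pairs)` reverses the operation. The sketch below deliberately shortens the `scores` list to show the truncation.

```python
names = ['Alice', 'Bob', 'Charlie']
scores = [85, 90]  # one score is missing on purpose

# zip() stops at the shortest input, so 'Charlie' is dropped.
pairs = list(zip(names, scores))
print(pairs)  # Output: [('Alice', 85), ('Bob', 90)]

# Pairs convert directly into a lookup table.
score_by_name = dict(pairs)
print(score_by_name['Bob'])  # Output: 90

# zip(*pairs) splits the pairs back into one tuple per column.
unzipped_names, unzipped_scores = zip(*pairs)
print(unzipped_names)  # Output: ('Alice', 'Bob')
```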

7. `enumerate()`

Pairs each item of an iterable with a running index (starting at 0 by default) and returns an enumerate object that yields `(index, item)` tuples.

names = ['Alice', 'Bob', 'Charlie']
for index, name in enumerate(names):
    print(index, name)
# Output:
# 0 Alice
# 1 Bob
# 2 Charlie
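
The counter does not have to start at zero. As a small sketch, the optional `start` argument can produce 1-based rank labels (the `ranked` name is just for illustration):

```python
names = ['Alice', 'Bob', 'Charlie']

# start=1 produces human-friendly rank numbers instead of 0-based indexes.
ranked = [f"{rank}. {name}" for rank, name in enumerate(names, start=1)]
print(ranked)  # Output: ['1. Alice', '2. Bob', '3. Charlie']
```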

8. `sorted()`

Sorts an iterable and returns a new sorted list, leaving the original unchanged.

numbers = [5, 2, 9, 1]
sorted_numbers = sorted(numbers)
print(sorted_numbers)  # Output: [1, 2, 5, 9]
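
Real data is rarely a flat list of numbers, so the `key` and `reverse` arguments matter in practice. The sketch below sorts hypothetical `(name, score)` tuples by score, highest first.

```python
# Hypothetical (name, score) records.
records = [('Alice', 85), ('Bob', 90), ('Charlie', 80)]

# key picks the value to sort by; reverse=True gives descending order.
by_score = sorted(records, key=lambda r: r[1], reverse=True)
print(by_score)  # Output: [('Bob', 90), ('Alice', 85), ('Charlie', 80)]

# The input list keeps its original order.
print(records[0])  # Output: ('Alice', 85)
```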

9. `pandas.DataFrame()`

Strictly speaking, `pandas.DataFrame()` is a class constructor rather than a function. It creates a DataFrame, the two-dimensional labeled table from the pandas library that underpins most data manipulation work.

import pandas as pd
data = {'Name': ['Alice', 'Bob'], 'Score': [85, 90]}
df = pd.DataFrame(data)
print(df)
# Output:
#     Name  Score
# 0  Alice     85
# 1    Bob     90

10. `numpy.mean()`

Computes the arithmetic mean of an array or array-like object such as a list, useful in statistical analysis.

import numpy as np
data = [1, 2, 3, 4, 5]
mean_value = np.mean(data)
print(mean_value)  # Output: 3.0

These functions cover a wide range of operations, from iteration and aggregation with built-ins to tabular data manipulation with pandas and numerical summaries with NumPy. Knowing them well saves time on routine data science work.
