10 Python Functions Every Data Scientist Must Memorize

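Here are 10 essential functions that every data scientist should know to work efficiently with data. Items 1 to 8 ship with Python (seven built-ins plus reduce() from the standard-library functools module); items 9 and 10 come from the third-party pandas and NumPy libraries: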


1. len()

Returns the length (number of elements) of an object like a list, tuple, or string.

data = [1, 2, 3, 4]
print(len(data))  # Output: 4
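len() works on any sized container, not just lists. The short sketch below, using only built-ins and made-up sample values, shows what it counts for a dict and a string, and how to handle lazy iterators (such as map objects), which have no length until they are materialized.

```python
# len() counts the items of any sized container
record = {"name": "Alice", "score": 85, "passed": True}
text = "data science"

n_keys = len(record)   # for a dict: the number of keys
n_chars = len(text)    # for a str: the number of characters, spaces included
print(n_keys, n_chars)  # Output: 3 12

# Lazy iterators such as map objects have no len(); convert them to a list first
lazy = map(str, [1, 2, 3])
n_items = len(list(lazy))
print(n_items)  # Output: 3
```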

2. type()

Returns the type of the given object, which is useful for checking what kind of data a variable actually holds.

x = 42
print(type(x))  # Output: <class 'int'>
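A typical data-cleaning use is spotting a column that mixes types, for example numbers that arrived as strings. This is a minimal sketch that assumes a plain Python list stands in for such a column; the variable names and values are illustrative.

```python
# A "column" from a messy source: mostly ints, but one value arrived as a string
column = [10, 20, "30", 40]

# Collect the distinct type names present in the column
type_names = {type(value).__name__ for value in column}
print(sorted(type_names))  # Output: ['int', 'str']

# Record the positions whose type is not exactly int
bad_positions = [i for i, value in enumerate(column) if type(value) is not int]
print(bad_positions)  # Output: [2]
```

The `type(value) is not int` test is deliberately strict: True and False would also be flagged, even though bool is a subclass of int. Use isinstance(value, int) when subclasses should pass.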

3. map()

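Applies a function to every item in an iterable (like a list or tuple). In Python 3 it returns a lazy map object rather than a list, which is why the example wraps the call in list().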

numbers = [1, 2, 3, 4]
squared = list(map(lambda x: x ** 2, numbers))
print(squared)  # Output: [1, 4, 9, 16]
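map() accepts any callable, including built-in type constructors such as float, and it can take several iterables at once, passing one item from each to the function. Here is a small sketch with illustrative data that converts string readings to numbers and adds two lists element-wise using operator.add:

```python
import operator

# Convert raw string readings to floats by passing the float constructor directly
raw = ["1.5", "2", "3.25"]
values = list(map(float, raw))
print(values)  # Output: [1.5, 2.0, 3.25]

# With two iterables, the function receives one item from each (stopping at the shorter one)
a = [1, 2, 3]
b = [10, 20, 30]
sums = list(map(operator.add, a, b))
print(sums)  # Output: [11, 22, 33]
```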

4. filter()

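Keeps the elements of an iterable for which a function returns True (more precisely, any truthy value). Like map(), it returns a lazy iterator, so the result is wrapped in list().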

numbers = [1, 2, 3, 4, 5]
even = list(filter(lambda x: x % 2 == 0, numbers))
print(even)  # Output: [2, 4]
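Passing None instead of a function keeps only truthy items, which is a quick way to drop empty strings and None values. Be careful, though: it also drops 0, which may be real data. The sketch below uses a made-up list to contrast that shortcut with an explicit predicate that removes only missing values.

```python
raw = ["a", "", "b", None, 0, "c"]

# filter(None, ...) keeps truthy items: "", None and 0 are all dropped
truthy = list(filter(None, raw))
print(truthy)  # Output: ['a', 'b', 'c']

# Drop only None and empty strings, keeping 0 as a legitimate value
present = list(filter(lambda x: x is not None and x != "", raw))
print(present)  # Output: ['a', 'b', 0, 'c']
```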

5. reduce() (from functools)

Applies a function cumulatively to the items of an iterable, reducing it to a single value.

from functools import reduce
numbers = [1, 2, 3, 4]
product = reduce(lambda x, y: x * y, numbers)
print(product)  # Output: 24
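reduce() also takes an optional third argument, an initial value. Without it, reducing an empty iterable raises TypeError. With it, the initial value is returned. The sketch below uses operator.mul in place of the lambda, which behaves the same way for this product.

```python
from functools import reduce
import operator

numbers = [1, 2, 3, 4]
product = reduce(operator.mul, numbers, 1)  # computes ((((1 * 1) * 2) * 3) * 4)
print(product)  # Output: 24

# With an initial value, an empty input simply returns that value
empty_product = reduce(operator.mul, [], 1)
print(empty_product)  # Output: 1

# Without an initial value, an empty input raises TypeError
try:
    reduce(operator.mul, [])
except TypeError:
    print("reduce() of an empty iterable needs an initial value")
```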

6. zip()

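Aggregates elements from two or more iterables into tuples, pairing items by position. It returns a lazy iterator and stops as soon as the shortest input is exhausted.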

names = ['Alice', 'Bob', 'Charlie']
scores = [85, 90, 80]
result = list(zip(names, scores))
print(result)  # Output: [('Alice', 85), ('Bob', 90), ('Charlie', 80)]
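Two patterns come up constantly: building a lookup dict from parallel lists, and "unzipping" a list of pairs back into separate sequences with zip(*pairs). The last lines in this sketch, which reuses the sample data above, also show how zip() silently truncates to the shortest input, which can hide a length mismatch.

```python
names = ['Alice', 'Bob', 'Charlie']
scores = [85, 90, 80]

# Build a name -> score lookup from two parallel lists
score_by_name = dict(zip(names, scores))
print(score_by_name['Bob'])  # Output: 90

# Unzip a list of pairs back into two tuples
pairs = [('Alice', 85), ('Bob', 90), ('Charlie', 80)]
unzipped_names, unzipped_scores = zip(*pairs)
print(unzipped_scores)  # Output: (85, 90, 80)

# zip() stops at the shortest iterable, so the third score is silently dropped
short = list(zip(['Alice', 'Bob'], scores))
print(short)  # Output: [('Alice', 85), ('Bob', 90)]
```

On Python 3.10 and later, zip(..., strict=True) raises ValueError instead of truncating when the inputs differ in length.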

7. enumerate()

Pairs each item of an iterable with a running index (starting at 0 by default) and yields (index, item) tuples, so you don't have to track a counter by hand in loops.

names = ['Alice', 'Bob', 'Charlie']
for index, name in enumerate(names):
    print(index, name)
# Output:
# 0 Alice
# 1 Bob
# 2 Charlie
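enumerate() accepts a start argument, which is handy for 1-based rankings. It also pairs naturally with a condition to find the positions of matching items. This is a small sketch; the score list is illustrative.

```python
names = ['Alice', 'Bob', 'Charlie']

# start=1 gives 1-based numbering, e.g. for a printed ranking
ranking = [f"{rank}. {name}" for rank, name in enumerate(names, start=1)]
print(ranking)  # Output: ['1. Alice', '2. Bob', '3. Charlie']

# Collect the (0-based) positions of scores below a threshold
scores = [85, 90, 60, 72]
low_positions = [i for i, score in enumerate(scores) if score < 75]
print(low_positions)  # Output: [2, 3]
```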

8. sorted()

Sorts an iterable and returns a new sorted list.

numbers = [5, 2, 9, 1]
sorted_numbers = sorted(numbers)
print(sorted_numbers)  # Output: [1, 2, 5, 9]
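The key and reverse parameters let sorted() order records by any field. The sketch below uses made-up (name, score) tuples and sorts them by score, highest first. Python's sort is stable, and stays stable with reverse=True, so tied records keep their original order. A tuple key adds a tie-breaker.

```python
records = [('Alice', 85), ('Bob', 90), ('Charlie', 80), ('Dana', 90)]

# Sort by the score field (index 1), highest first; Bob stays ahead of Dana on the tie
by_score = sorted(records, key=lambda r: r[1], reverse=True)
print(by_score)
# Output: [('Bob', 90), ('Dana', 90), ('Alice', 85), ('Charlie', 80)]

# Score descending, then name ascending to break ties explicitly
by_score_then_name = sorted(records, key=lambda r: (-r[1], r[0]))

# sorted() returns a new list; the original list is untouched
print(records[0])  # Output: ('Alice', 85)
```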

9. pandas.DataFrame()

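Strictly speaking this is a class constructor rather than a plain function. It creates a DataFrame, the two-dimensional table with labeled rows and columns that is the core data structure for data manipulation in pandas.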

import pandas as pd
data = {'Name': ['Alice', 'Bob'], 'Score': [85, 90]}
df = pd.DataFrame(data)
print(df)
# Output:
#     Name  Score
# 0  Alice     85
# 1    Bob     90
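The constructor accepts other inputs too, such as a list of row dicts, along with an optional columns argument that fixes the column order. This sketch assumes pandas is installed and uses made-up rows. A key missing from a row becomes NaN, and column methods such as mean() skip NaN by default.

```python
import pandas as pd

# One dict per row; Charlie has no 'Score', so that cell becomes NaN
rows = [
    {'Name': 'Alice', 'Score': 85},
    {'Name': 'Bob', 'Score': 90},
    {'Name': 'Charlie'},
]
df = pd.DataFrame(rows, columns=['Name', 'Score'])

print(df.shape)                  # Output: (3, 2)
print(df['Score'].isna().sum())  # Output: 1
print(df['Score'].mean())        # Output: 87.5 (the NaN is skipped)
```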

10. numpy.mean()

Computes the arithmetic mean of an array, a staple of statistical analysis. Plain Python lists are accepted as well, because NumPy converts them to arrays first.

import numpy as np
data = [1, 2, 3, 4, 5]
mean_value = np.mean(data)
print(mean_value)  # Output: 3.0
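For 2-D data, the axis parameter picks the direction: axis=0 averages down each column, and axis=1 averages across each row. A single NaN makes np.mean return nan, while np.nanmean ignores missing values. This is a sketch that assumes NumPy is installed; the arrays are illustrative.

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])

overall = np.mean(table)            # mean of all six values
col_means = np.mean(table, axis=0)  # one mean per column
row_means = np.mean(table, axis=1)  # one mean per row
print(overall)    # Output: 3.5
print(col_means)  # Output: [2.5 3.5 4.5]
print(row_means)  # Output: [2. 5.]

# NaN propagates through np.mean; np.nanmean skips it
with_gap = np.array([1.0, np.nan, 3.0])
mean_with_gap = np.mean(with_gap)
nanmean_value = np.nanmean(with_gap)
print(mean_with_gap)  # Output: nan
print(nanmean_value)  # Output: 2.0
```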

These functions cover a wide range of operations, from simple iteration and aggregation to essential data manipulation. Mastering them will significantly enhance your productivity in data science projects!
