10 Python Functions Every Data Scientist Must Memorize

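Here are 10 essential Python functions that every data scientist should know to work efficiently with data. The first eight are built-ins or come from Python's standard library. The last two come from the pandas and NumPy libraries.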

1. len()

Returns the length (number of elements) of an object like a list, tuple, or string.

data = [1, 2, 3, 4]
print(len(data))  # Output: 4
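len() is not limited to lists. The short sketch below uses made-up sample data to show it on a dict and a string, and what happens with a generator, which has no length until it is turned into a list.

```python
# len() works on any object that defines __len__.
record = {'name': 'Alice', 'score': 85}
n_keys = len(record)       # a dict's length is its number of keys
n_chars = len('Alice')     # a string's length is its number of characters
print(n_keys, n_chars)     # 2 5

# Generators have no length, so len() raises TypeError.
squares = (n * n for n in range(4))
try:
    len(squares)
except TypeError as exc:
    print('TypeError:', exc)

# Materializing the generator into a list makes len() work
# (note that this consumes the generator).
n_items = len(list(squares))
print(n_items)             # 4
```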

2. type()

Returns the type of the given object. Useful for checking data types.

x = 42
print(type(x))  # Output: <class 'int'>
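When exploring unfamiliar data, type() quickly reveals what each value actually is. The sketch below (sample values are illustrative) also shows why isinstance() is usually the better tool for type checks in code: it accepts subclasses, and bool is a subclass of int.

```python
# type() reports the exact class of each value.
values = [42, 3.14, '42', True, None]
type_names = [type(v).__name__ for v in values]
print(type_names)  # ['int', 'float', 'str', 'bool', 'NoneType']

# isinstance() also accepts subclasses, so True counts as an int.
flag = True
print(type(flag) == int)      # False: the exact type is bool
print(isinstance(flag, int))  # True: bool inherits from int
```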

3. map()

Applies a function to every item in an iterable (such as a list or tuple). In Python 3 it returns a lazy iterator, which is why the example wraps it in list().

numbers = [1, 2, 3, 4]
squared = list(map(lambda x: x ** 2, numbers))
print(squared)  # Output: [1, 4, 9, 16]
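A few more map() patterns, sketched with illustrative data: passing several iterables at once, passing a built-in such as float directly (handy when parsed values arrive as strings), and the equivalent list comprehension that many Python style guides prefer.

```python
numbers = [1, 2, 3, 4]

# map() returns a lazy map object until you consume it.
squares_iter = map(lambda x: x ** 2, numbers)
print(type(squares_iter).__name__)  # map

# With several iterables, the function receives one item from each.
offsets = [10, 20, 30, 40]
sums = list(map(lambda a, b: a + b, numbers, offsets))
print(sums)  # [11, 22, 33, 44]

# Built-in functions can be passed directly, e.g. to convert strings.
raw = ['1.5', '2.0', '3.25']
floats = list(map(float, raw))
print(floats)  # [1.5, 2.0, 3.25]

# The equivalent list comprehension:
squares = [x ** 2 for x in numbers]
print(squares)  # [1, 4, 9, 16]
```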

4. filter()

Keeps only the elements of an iterable for which a function returns True (or any truthy value). It returns a lazy iterator, so wrap it in list() to see the results.

numbers = [1, 2, 3, 4, 5]
even = list(filter(lambda x: x % 2 == 0, numbers))
print(even)  # Output: [2, 4]
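filter() is a quick way to drop missing values, but there is a common pitfall. Passing None as the function keeps only truthy items, which also throws away legitimate zeros. The sketch below uses made-up sensor readings, with None standing in for missing data.

```python
readings = [3.2, None, 0.0, 4.1, None]

# Explicit test: removes only the missing values, keeps 0.0.
valid = list(filter(lambda r: r is not None, readings))
print(valid)   # [3.2, 0.0, 4.1]

# filter(None, ...) keeps truthy items, so 0.0 is dropped too.
truthy = list(filter(None, readings))
print(truthy)  # [3.2, 4.1]
```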

5. reduce() (from functools)

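Applies a two-argument function cumulatively to the items of an iterable, from left to right, reducing it to a single value. Unlike map() and filter(), it is not a built-in in Python 3 and must be imported from functools.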

from functools import reduce
numbers = [1, 2, 3, 4]
product = reduce(lambda x, y: x * y, numbers)
print(product)  # Output: 24
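The sketch below shows three details worth knowing: operator.mul can replace the lambda, an initial value protects against empty input (without one, reduce() raises TypeError), and reduce() is handy for combining many sets. The ID sets are illustrative. For plain sums and products, sum() and math.prod() (Python 3.8+) are simpler.

```python
from functools import reduce
import math
import operator

numbers = [1, 2, 3, 4]

# operator.mul avoids writing a lambda for multiplication.
product = reduce(operator.mul, numbers)
print(product)  # 24

# An initial value (1, the multiplicative identity) makes empty input safe.
empty_product = reduce(operator.mul, [], 1)
print(empty_product)  # 1

# Intersect several sets of IDs to find those present in all of them.
id_sets = [{1, 2, 3, 4}, {2, 3, 4}, {3, 4, 5}]
common = reduce(lambda a, b: a & b, id_sets)
print(common)  # {3, 4}

# Built-in alternatives for the common cases.
print(sum(numbers), math.prod(numbers))  # 10 24
```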

6. zip()

Aggregates elements from two or more iterables into tuples, returning a lazy iterator.

names = ['Alice', 'Bob', 'Charlie']
scores = [85, 90, 80]
result = list(zip(names, scores))
print(result)  # Output: [('Alice', 85), ('Bob', 90), ('Charlie', 80)]
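A few zip() idioms, sketched with the same sample data: building a dict, what happens when the inputs differ in length (zip() silently stops at the shortest; on Python 3.10+ passing strict=True raises ValueError instead), and "unzipping" pairs with the * operator.

```python
names = ['Alice', 'Bob', 'Charlie']
scores = [85, 90, 80]

# Build a lookup table by passing zip() straight to dict().
score_by_name = dict(zip(names, scores))
print(score_by_name)  # {'Alice': 85, 'Bob': 90, 'Charlie': 80}

# zip() stops at the shortest input; 'Charlie' is silently dropped.
short = list(zip(names, [85, 90]))
print(short)  # [('Alice', 85), ('Bob', 90)]

# Unzip a list of pairs back into separate tuples.
pairs = [('Alice', 85), ('Bob', 90), ('Charlie', 80)]
unzipped_names, unzipped_scores = zip(*pairs)
print(unzipped_names)   # ('Alice', 'Bob', 'Charlie')
print(unzipped_scores)  # (85, 90, 80)
```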

7. enumerate()

Pairs each item of an iterable with a running index (starting at 0 by default) and returns an enumerate object that yields (index, item) tuples.

names = ['Alice', 'Bob', 'Charlie']
for index, name in enumerate(names):
    print(index, name)
# Output:
# 0 Alice
# 1 Bob
# 2 Charlie
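Two practical uses, sketched with the same names: the start parameter for 1-based numbering, and a dict comprehension that assigns an integer code to each label, a common step when encoding categorical data for a model. The name label_to_code is only illustrative.

```python
names = ['Alice', 'Bob', 'Charlie']

# start= changes the first index, e.g. for human-readable rankings.
for rank, name in enumerate(names, start=1):
    print(f'{rank}. {name}')
# 1. Alice
# 2. Bob
# 3. Charlie

# Map each label to an integer code.
label_to_code = {label: code for code, label in enumerate(names)}
print(label_to_code)  # {'Alice': 0, 'Bob': 1, 'Charlie': 2}

# Under the hood, enumerate() yields (index, item) tuples.
print(list(enumerate(names)))  # [(0, 'Alice'), (1, 'Bob'), (2, 'Charlie')]
```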

8. sorted()

Sorts an iterable and returns a new sorted list.

numbers = [5, 2, 9, 1]
sorted_numbers = sorted(numbers)
print(sorted_numbers)  # Output: [1, 2, 5, 9]
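In data work you usually sort records rather than bare numbers. The sketch below (illustrative data) uses the key and reverse parameters, confirms that sorted() leaves its input unchanged (unlike list.sort(), which sorts in place), and shows that strings sort case-sensitively unless you pass key=str.lower.

```python
names = ['Alice', 'Bob', 'Charlie']
scores = [85, 90, 80]
records = list(zip(names, scores))

# key= chooses what to compare; reverse=True sorts in descending order.
by_score = sorted(records, key=lambda r: r[1], reverse=True)
print(by_score)  # [('Bob', 90), ('Alice', 85), ('Charlie', 80)]

# The original list is not modified.
numbers = [5, 2, 9, 1]
sorted_numbers = sorted(numbers)
print(numbers)         # [5, 2, 9, 1]
print(sorted_numbers)  # [1, 2, 5, 9]

# Uppercase letters sort before lowercase ones by default.
words = ['banana', 'apple', 'Cherry']
print(sorted(words))                 # ['Cherry', 'apple', 'banana']
print(sorted(words, key=str.lower))  # ['apple', 'banana', 'Cherry']
```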

9. pandas.DataFrame()

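The pandas.DataFrame constructor creates a DataFrame, the two-dimensional labeled table at the core of the pandas library. It accepts a dict of columns (as below), a list of rows, or a NumPy array, among other inputs.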

import pandas as pd
data = {'Name': ['Alice', 'Bob'], 'Score': [85, 90]}
df = pd.DataFrame(data)
print(df)
# Output:
#     Name  Score
# 0  Alice     85
# 1    Bob     90
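A DataFrame can also be built from a list of row dictionaries, which is often how records parsed from JSON arrive. The sketch below uses made-up scores and then shows a few everyday operations: checking the shape, aggregating a column, filtering rows with a boolean mask, and adding a computed column. The 'Passed' column and its threshold are purely illustrative.

```python
import pandas as pd

# One dict per row; keys become column names.
rows = [
    {'Name': 'Alice', 'Score': 85},
    {'Name': 'Bob', 'Score': 90},
    {'Name': 'Charlie', 'Score': 80},
]
df = pd.DataFrame(rows)

print(df.shape)            # (3, 2): 3 rows, 2 columns
print(df['Score'].mean())  # 85.0

# Boolean indexing keeps the rows where the condition is True.
high = df[df['Score'] >= 85]
print(high['Name'].tolist())  # ['Alice', 'Bob']

# Vectorized comparison creates a new column in one step.
df['Passed'] = df['Score'] >= 82
```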

10. numpy.mean()

Computes the arithmetic mean of an array (or any array-like input, such as a list), optionally along a given axis. It is useful in statistical analysis.

import numpy as np
data = [1, 2, 3, 4, 5]
mean_value = np.mean(data)
print(mean_value)  # Output: 3.0
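On 2-D data the axis parameter matters: axis=0 averages down the columns and axis=1 across the rows. Missing values also need care, because a single NaN makes np.mean return nan while np.nanmean skips it. The sketch below uses a made-up 3 x 2 table of exam scores.

```python
import numpy as np

# Rows are students, columns are exams (illustrative data).
scores = np.array([[85, 90],
                   [80, 70],
                   [95, 80]])

overall = np.mean(scores)             # all 6 values: 500 / 6, about 83.33
per_exam = np.mean(scores, axis=0)    # column means: about 86.67 and 80.0
per_student = np.mean(scores, axis=1) # row means: 87.5, 75.0, 87.5
print(overall, per_exam, per_student)

# NaN propagates through np.mean; np.nanmean ignores it.
with_missing = np.array([1.0, 2.0, np.nan, 4.0])
print(np.mean(with_missing))     # nan
print(np.nanmean(with_missing))  # 7 / 3, about 2.333
```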

These functions cover a wide range of operations, from simple iteration and aggregation to essential data manipulation. Mastering them will significantly enhance your productivity in data science projects!
