10 Python Functions Every Data Scientist Must Memorize

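Here are 10 essential Python functions that every data scientist should know to work efficiently with data. The first eight are built-ins or come from the standard library. The last two come from pandas and NumPy, the core libraries for tabular and numeric data:
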
1. len()

Returns the length (number of elements) of an object like a list, tuple, or string.

data = [1, 2, 3, 4]
print(len(data))  # Output: 4
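len() works on any container that defines a length, not only lists. The small sketch below uses made-up sample values. It applies len() to a dict, where it counts keys, to a string, where it counts characters, and to a nested list, where only the outer level is counted.

```python
# Hypothetical sample data, for illustration only
record = {"name": "Alice", "score": 85, "city": "Seoul"}
word = "data"
matrix = [[1, 2, 3], [4, 5, 6]]

print(len(record))     # 3: a dict's length is its number of keys
print(len(word))       # 4: a string's length is its number of characters
print(len(matrix))     # 2: only the outer list is counted (two rows)
print(len(matrix[0]))  # 3: length of the first row
```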

2. type()

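Returns the type (class) of the given object. Useful for inspecting what kind of data you are dealing with, for example when a column you expected to be numeric turns out to contain strings.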

x = 42
print(type(x))  # Output: <class 'int'>
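In practice, type() is handy for spotting mixed types in raw data. For conditional checks, isinstance() is usually the better choice because it also accepts subclasses (bool, for example, is a subclass of int). The values below are invented for illustration.

```python
# Hypothetical raw values, as they might arrive from a CSV or JSON source
values = [42, "17", 3.5, None, True]

for v in values:
    print(repr(v), type(v).__name__)
# 42 int
# '17' str
# 3.5 float
# None NoneType
# True bool

# type() checks the exact class; isinstance() also accepts subclasses
print(type(True) == int)      # False: the exact type is bool
print(isinstance(True, int))  # True: bool is a subclass of int
```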

3. map()

Applies a function to every item in an iterable (like a list or tuple). It returns a lazy iterator rather than a list, which is why the example wraps the call in list().

numbers = [1, 2, 3, 4]
squared = list(map(lambda x: x ** 2, numbers))
print(squared)  # Output: [1, 4, 9, 16]
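Because map() is lazy, nothing is computed until the result is consumed, here with list(). It also accepts an existing function instead of a lambda, and it can take several iterables at once, in which case the function receives one item from each. A sketch with invented values:

```python
raw = ["1.5", "2.0", "3.25"]    # e.g. numbers read from a text file as strings
prices = list(map(float, raw))  # convert each string with the built-in float()
print(prices)  # [1.5, 2.0, 3.25]

quantities = [2, 1, 4]
# With two iterables, the function gets one item from each, pairwise
totals = list(map(lambda p, q: p * q, prices, quantities))
print(totals)  # [3.0, 2.0, 13.0]
```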

4. filter()

Keeps the elements of an iterable for which a function returns True (more precisely, any truthy value). Like map(), it returns an iterator, so the result is wrapped in list().

numbers = [1, 2, 3, 4, 5]
even = list(filter(lambda x: x % 2 == 0, numbers))
print(even)  # Output: [2, 4]
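A common data-cleaning use is dropping missing entries. The sketch below uses made-up sensor readings and removes None with an explicit test. Passing None as the function instead keeps only truthy items, which silently drops legitimate zeros as well.

```python
readings = [12, None, 0, 7, None, 3]

# Keep every value that is not None; 0 is a valid reading and survives
present = list(filter(lambda x: x is not None, readings))
print(present)  # [12, 0, 7, 3]

# filter(None, ...) keeps only truthy values, so the 0 is lost as well
truthy = list(filter(None, readings))
print(truthy)  # [12, 7, 3]
```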

5. reduce() (from functools)

Applies a function cumulatively to the items of an iterable, reducing it to a single value.

from functools import reduce
numbers = [1, 2, 3, 4]
product = reduce(lambda x, y: x * y, numbers)
print(product)  # Output: 24
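reduce() also takes an optional third argument, the initial value. It becomes the starting accumulator and is returned unchanged for an empty iterable; without it, reducing an empty list raises a TypeError. The sketch below uses an initial empty dict to count word occurrences. The words are invented, and in real code collections.Counter does this job more directly.

```python
from functools import reduce

words = ["cat", "dog", "cat", "bird", "cat"]

def count_word(counts, word):
    # Return a new dict with this word's count incremented
    updated = dict(counts)
    updated[word] = updated.get(word, 0) + 1
    return updated

counts = reduce(count_word, words, {})  # {} is the initial accumulator
print(counts)  # {'cat': 3, 'dog': 1, 'bird': 1}

# With an initializer, an empty list is fine: the initializer is returned
print(reduce(lambda x, y: x * y, [], 1))  # 1
```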

6. zip()

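Combines elements from two or more iterables position by position into tuples: the first items together, then the second items, and so on. Like map(), it returns an iterator, so the example wraps it in list().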

names = ['Alice', 'Bob', 'Charlie']
scores = [85, 90, 80]
result = list(zip(names, scores))
print(result)  # Output: [('Alice', 85), ('Bob', 90), ('Charlie', 80)]
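Two patterns come up often: building a dict from parallel lists, and "unzipping" a list of pairs with zip(*pairs). Note that zip() stops at the shortest input, so a length mismatch is silently truncated (Python 3.10 and later accept strict=True to raise an error instead). The sketch reuses the names and scores from the example above.

```python
names = ['Alice', 'Bob', 'Charlie']
scores = [85, 90, 80]

# Build a lookup table from two parallel lists
score_by_name = dict(zip(names, scores))
print(score_by_name)  # {'Alice': 85, 'Bob': 90, 'Charlie': 80}

# "Unzip" a list of pairs back into two tuples
pairs = [('Alice', 85), ('Bob', 90), ('Charlie', 80)]
unzipped_names, unzipped_scores = zip(*pairs)
print(unzipped_names)   # ('Alice', 'Bob', 'Charlie')
print(unzipped_scores)  # (85, 90, 80)

# zip() stops at the shortest iterable: the extra name is dropped
print(list(zip(names, [85, 90])))  # [('Alice', 85), ('Bob', 90)]
```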

7. enumerate()

Pairs each item of an iterable with its index, returning an enumerate object that yields (index, item) tuples. Counting starts at 0 by default.

names = ['Alice', 'Bob', 'Charlie']
for index, name in enumerate(names):
    print(index, name)
# Output:
# 0 Alice
# 1 Bob
# 2 Charlie
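enumerate() takes an optional start argument, which is convenient for 1-based labels such as row numbers in a report. It also pairs naturally with a condition, for example to record the positions of values that fail a check. The scores and the passing mark of 50 below are invented for illustration.

```python
scores = [85, 42, 90, 38]

# Number the rows from 1 instead of 0
for row, score in enumerate(scores, start=1):
    print(f"Row {row}: {score}")

# Collect the 0-based positions of scores below a passing mark of 50
failing = [i for i, s in enumerate(scores) if s < 50]
print(failing)  # [1, 3]
```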

8. sorted()

Sorts an iterable and returns a new sorted list, leaving the original unchanged (unlike list.sort(), which sorts a list in place).

numbers = [5, 2, 9, 1]
sorted_numbers = sorted(numbers)
print(sorted_numbers)  # Output: [1, 2, 5, 9]
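sorted() accepts key and reverse arguments, so records can be sorted by any field without modifying the original list. The sketch below sorts invented (name, score) tuples by score, highest first, and shows a case-insensitive string sort.

```python
records = [('Alice', 85), ('Bob', 90), ('Charlie', 80)]

# key picks the value to sort by; reverse=True gives descending order
by_score = sorted(records, key=lambda r: r[1], reverse=True)
print(by_score)  # [('Bob', 90), ('Alice', 85), ('Charlie', 80)]

# The original list is left untouched
print(records)   # [('Alice', 85), ('Bob', 90), ('Charlie', 80)]

# Strings sort case-sensitively by default (uppercase before lowercase)
fruits = ['banana', 'apple', 'Cherry']
print(sorted(fruits))                 # ['Cherry', 'apple', 'banana']
print(sorted(fruits, key=str.lower))  # ['apple', 'banana', 'Cherry']
```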

9. pandas.DataFrame()

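Strictly speaking a class constructor rather than a function, pandas.DataFrame() creates a DataFrame: a two-dimensional table with labeled rows and columns, and the central data structure of the pandas library for data manipulation.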

import pandas as pd
data = {'Name': ['Alice', 'Bob'], 'Score': [85, 90]}
df = pd.DataFrame(data)
print(df)
# Output:
#     Name  Score
# 0  Alice     85
# 1    Bob     90
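The constructor also accepts a list of dicts, one per row, which is the shape JSON records often arrive in. This minimal sketch builds a frame that way from invented rows, then checks its shape and a simple column statistic.

```python
import pandas as pd

# One dict per row; the keys become column names
rows = [
    {'Name': 'Alice', 'Score': 85},
    {'Name': 'Bob', 'Score': 90},
    {'Name': 'Charlie', 'Score': 80},
]
df = pd.DataFrame(rows)

print(df.shape)            # (3, 2): three rows, two columns
print(list(df.columns))    # ['Name', 'Score']
print(df['Score'].mean())  # 85.0
```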

10. numpy.mean()

Computes the arithmetic mean of an array or array-like input such as a list. It is one of the most frequently used summary statistics in data analysis.

import numpy as np
data = [1, 2, 3, 4, 5]
mean_value = np.mean(data)
print(mean_value)  # Output: 3.0
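On a 2-D array, the axis argument controls the direction of averaging: axis=0 averages down each column, and axis=1 averages across each row. np.mean() propagates NaN, so data with missing values usually calls for np.nanmean(), which skips them. A sketch with made-up numbers:

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])

print(np.mean(table))          # 3.5: mean of all six values
print(np.mean(table, axis=0))  # [2.5 3.5 4.5]: column means
print(np.mean(table, axis=1))  # [2. 5.]: row means

with_gap = np.array([1.0, np.nan, 3.0])
print(np.mean(with_gap))       # nan: a single NaN makes the whole mean NaN
print(np.nanmean(with_gap))    # 2.0: NaN entries are ignored
```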

These functions cover a wide range of operations, from inspecting, transforming, and aggregating sequences to building tables and computing statistics. Mastering them will significantly enhance your productivity in data science projects!
