10 Python Functions Every Data Scientist Must Memorize

Here are 10 Python functions that every data scientist should know: seven built-ins, one from the standard library's functools module, and one each from pandas and NumPy:


1. len()

Returns the length (number of elements) of an object like a list, tuple, or string.

data = [1, 2, 3, 4]
print(len(data))  # Output: 4

2. type()

Returns the type of the given object. Useful for checking data types.

x = 42
print(type(x))  # Output: <class 'int'>
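
As a small extension of the example above, the sketch below runs type() on a few common values and compares it with isinstance(), which is often the more convenient check because it also accepts subclasses. The variable names (values, type_names, flag) are illustrative only.

```python
# Inspect the types of a few common values (illustrative names).
values = [42, 3.14, "text", [1, 2], None]
type_names = [type(v).__name__ for v in values]
print(type_names)  # Output: ['int', 'float', 'str', 'list', 'NoneType']

# isinstance() also accepts subclasses: bool is a subclass of int.
flag = True
print(type(flag) == int)      # Output: False
print(isinstance(flag, int))  # Output: True
```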

3. map()

Applies a function to every item in an iterable (like a list or tuple). In Python 3 it returns a lazy iterator, so wrap it in list() to see the results.

numbers = [1, 2, 3, 4]
squared = list(map(lambda x: x ** 2, numbers))
print(squared)  # Output: [1, 4, 9, 16]

4. filter()

Keeps only the elements of an iterable for which a function returns a true value. Like map(), it returns a lazy iterator.

numbers = [1, 2, 3, 4, 5]
even = list(filter(lambda x: x % 2 == 0, numbers))
print(even)  # Output: [2, 4]
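
For comparison, the same map() and filter() results can be written as list comprehensions, which many Python programmers find easier to read. This is a sketch of the equivalent form, not a claim that one style is required; even_squared is an extra name introduced here to show both steps combined.

```python
numbers = [1, 2, 3, 4, 5]

# Same result as list(map(lambda x: x ** 2, numbers))
squared = [x ** 2 for x in numbers]

# Same result as list(filter(lambda x: x % 2 == 0, numbers))
even = [x for x in numbers if x % 2 == 0]

# Both steps at once: squares of the even numbers only
even_squared = [x ** 2 for x in numbers if x % 2 == 0]

print(squared)       # Output: [1, 4, 9, 16, 25]
print(even)          # Output: [2, 4]
print(even_squared)  # Output: [4, 16]
```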

5. reduce() (from functools)

Applies a function cumulatively to the items of an iterable, reducing it to a single value.

from functools import reduce
numbers = [1, 2, 3, 4]
product = reduce(lambda x, y: x * y, numbers)
print(product)  # Output: 24
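
reduce() also accepts an optional third argument, a starting value, which matters when the iterable might be empty. The sketch below assumes the same product calculation and swaps the lambda for operator.mul from the standard library; empty_product is an illustrative name.

```python
from functools import reduce
import operator

numbers = [1, 2, 3, 4]

# operator.mul replaces the lambda; 1 is the starting value.
product = reduce(operator.mul, numbers, 1)
print(product)  # Output: 24

# With a starting value, an empty iterable returns that value.
empty_product = reduce(operator.mul, [], 1)
print(empty_product)  # Output: 1

# Without one, an empty iterable raises TypeError.
try:
    reduce(operator.mul, [])
except TypeError:
    print("empty iterable with no starting value")
```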

6. zip()

Pairs up elements from two or more iterables position by position, yielding one tuple per position. It stops as soon as the shortest iterable runs out.

names = ['Alice', 'Bob', 'Charlie']
scores = [85, 90, 80]
result = list(zip(names, scores))
print(result)  # Output: [('Alice', 85), ('Bob', 90), ('Charlie', 80)]
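
One thing to watch for: zip() stops at the shortest input without raising an error. The sketch below uses a deliberately shortened scores list to show this, and also shows the common pattern of passing the pairs to dict(); score_by_name is an illustrative name.

```python
names = ['Alice', 'Bob', 'Charlie']
scores = [85, 90]  # one score missing on purpose

# zip() stops at the shortest input, so 'Charlie' is dropped.
pairs = list(zip(names, scores))
print(pairs)  # Output: [('Alice', 85), ('Bob', 90)]

# dict() accepts the pairs directly, giving a name -> score lookup.
score_by_name = dict(zip(names, scores))
print(score_by_name)  # Output: {'Alice': 85, 'Bob': 90}
```

In Python 3.10 and later, zip(names, scores, strict=True) raises a ValueError on mismatched lengths instead of silently dropping items.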

7. enumerate()

Returns an enumerate object that yields (index, item) pairs for an iterable, counting from 0 by default.

names = ['Alice', 'Bob', 'Charlie']
for index, name in enumerate(names):
    print(index, name)
# Output:
# 0 Alice
# 1 Bob
# 2 Charlie
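
enumerate() takes an optional start argument, which is handy when the numbering should begin at 1, for example in a ranked list. A minimal sketch, with ranked as an illustrative name:

```python
names = ['Alice', 'Bob', 'Charlie']

# start=1 makes the counter begin at 1 instead of 0.
ranked = [f"{rank}. {name}" for rank, name in enumerate(names, start=1)]
for line in ranked:
    print(line)
# Output:
# 1. Alice
# 2. Bob
# 3. Charlie
```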

8. sorted()

Sorts an iterable and returns a new sorted list.

numbers = [5, 2, 9, 1]
sorted_numbers = sorted(numbers)
print(sorted_numbers)  # Output: [1, 2, 5, 9]
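
sorted() also accepts reverse and key arguments, which cover most real sorting needs. The sketch below reuses the numbers list and adds a hypothetical list of (name, score) pairs to sort by score.

```python
numbers = [5, 2, 9, 1]

# reverse=True sorts in descending order.
descending = sorted(numbers, reverse=True)
print(descending)  # Output: [9, 5, 2, 1]

# key picks the value to sort by: here the score in each pair.
results = [('Alice', 85), ('Bob', 90), ('Charlie', 80)]
by_score = sorted(results, key=lambda pair: pair[1], reverse=True)
print(by_score)  # Output: [('Bob', 90), ('Alice', 85), ('Charlie', 80)]

# The original list is left unchanged.
print(numbers)  # Output: [5, 2, 9, 1]
```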

9. pandas.DataFrame()

Creates a DataFrame, the two-dimensional labeled table at the heart of the pandas library. Strictly speaking, pandas.DataFrame is a class, so calling it runs a constructor rather than an ordinary function.

import pandas as pd
data = {'Name': ['Alice', 'Bob'], 'Score': [85, 90]}
df = pd.DataFrame(data)
print(df)
# Output:
#     Name  Score
# 0  Alice     85
# 1    Bob     90

10. numpy.mean()

Computes the arithmetic mean of an array or array-like object such as a list, a common first step in statistical analysis.

import numpy as np
data = [1, 2, 3, 4, 5]
mean_value = np.mean(data)
print(mean_value)  # Output: 3.0

These functions cover a wide range of operations, from simple iteration and aggregation to essential data manipulation. Mastering them will significantly enhance your productivity in data science projects!
