Polars is a cutting-edge DataFrame library designed for high-speed data manipulation and analysis.
Written in Rust and leveraging the Apache Arrow columnar format, Polars provides a robust, multi-threaded, and memory-efficient solution for handling both small and large datasets.
It offers bindings for several programming languages, including Python, Rust, Node.js, and R, and it also provides a SQL interface for querying data.
Key Features
- Blazing Speed: Polars is optimized for performance through SIMD (Single Instruction, Multiple Data) vectorization and a query optimizer. It outperforms many traditional libraries, such as pandas, in speed benchmarks.
- Lazy and Eager Execution: Polars supports both lazy execution (ideal for complex pipelines) and eager execution (for immediate results), giving users flexibility in how they process data.
- Multi-Threading: The library utilizes multi-threading to maximize computational efficiency, making it ideal for modern multi-core processors.
- Larger-than-RAM Datasets: Polars can handle datasets that exceed system memory by processing queries in a streaming fashion, one batch at a time, instead of loading everything at once. This makes it possible to work with datasets as large as 250 GB on a standard laptop.
- Advanced Querying: Polars offers a powerful expression API for filtering, aggregating, and transforming data. It also supports SQL-like syntax for users familiar with relational databases.
- Lightweight: With minimal dependencies, Polars is lightweight and has fast import times compared to other libraries like Pandas or NumPy.
In Python, you can quickly create a DataFrame and perform complex operations:
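import polars as pl

# Build a small DataFrame from a dict of columns.
df = pl.DataFrame({
    "A": [1, 2, 3],
    "B": [4, 5, 6],
    "C": ["apple", "banana", "cherry"],
})

# Select two expressions: the sum of column A, and column C ordered by A.
result = df.select(
    pl.col("A").sum().alias("sum_A"),
    pl.col("C").sort_by("A").alias("sorted_C"),
)
print(result)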
Polars also supports SQL queries directly on DataFrames or via its CLI for terminal-based operations.
Polars can be installed via pip:
pip install polars
Optional dependencies can be added for extended functionality:
pip install 'polars[all]'