Core API

Public API from lance_array.core, in reading order: LanceArray (view type), TileCodec, open_array, and normalize_chunk_slices.

lance_array.core.LanceArray

2D view over a Lance dataset with one encoded tile per row.

Each row is addressed by its logical tile-grid key (tile_i, tile_j), which maps to the Lance positional row index passed to take_blobs. Each stored payload is decoded using the TileCodec chosen at write time; see decode_tile.

Indexing: NumPy-like view[row, col], where each index may be an int, a slice (including step ≠ 1), or ... (Ellipsis); view[row] is treated as view[row, :]. Advanced indices (integer or boolean ndarray, list) follow the same broadcasting rules as NumPy for 2D arrays. Overlapping tiles are read via batched take_blobs and stitched (including partial edge tiles).
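As a rough illustration of the "overlapping tiles" step, the sketch below computes which (tile_i, tile_j) keys a rectangular, step-1 window touches. The helper name tiles_for_window is hypothetical and uses only the standard library; it is not the library's implementation, which may differ.

```python
from itertools import product


def tiles_for_window(rows: slice, cols: slice,
                     shape: tuple[int, int],
                     chunks: tuple[int, int]) -> list[tuple[int, int]]:
    """Return the (tile_i, tile_j) keys overlapped by a step-1 window (hypothetical helper)."""
    r0, r1, _ = rows.indices(shape[0])
    c0, c1, _ = cols.indices(shape[1])
    if r1 <= r0 or c1 <= c0:
        return []  # an empty window touches no tiles
    # First and last tile index touched along each axis (inclusive).
    i0, i1 = r0 // chunks[0], (r1 - 1) // chunks[0]
    j0, j1 = c0 // chunks[1], (c1 - 1) // chunks[1]
    return list(product(range(i0, i1 + 1), range(j0, j1 + 1)))


# A 1024x1024 raster with 256x256 tiles: this window straddles four tiles.
keys = tiles_for_window(slice(200, 300), slice(500, 520), (1024, 1024), (256, 256))
print(keys)  # [(0, 1), (0, 2), (1, 1), (1, 2)]
```

Each returned key would then be looked up in coord_to_row to obtain the positional rows for one batched read.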

Assignment: only for views opened with mode="r+". Supported keys match basic NumPy indexing with slice step 1 on both axes; fancy and boolean assignment is not implemented.

Full raster: to_numpy materializes the entire grid in one batched read path.

Create on disk: LanceArray.to_lance writes a new dataset from a NumPy image and returns a read-only view; open with mode="r+" to mutate.
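The write, read, and reopen flow described above might look like the following sketch. It relies only on the signatures documented on this page and has not been checked against a specific release; the path "example.lance" and the function name roundtrip_example are placeholders.

```python
def roundtrip_example(path: str = "example.lance"):
    """Write a small raster, read a window, then reopen it writable (illustrative only)."""
    # Imports are deferred so this snippet loads even without the packages installed.
    import numpy as np
    from lance_array.core import LanceArray, TileCodec

    image = np.ones((512, 512), dtype=np.uint16)

    # to_lance returns a read-only view; 512 is a multiple of the 256-pixel tile size.
    view = LanceArray.to_lance(path, image, (256, 256), codec=TileCodec.RAW)
    window = view[100:300, 200:260]  # crosses tile boundaries on both axes

    # Reopen with mode="r+" for basic (step-1) slice assignment.
    writable = LanceArray.open(path, mode="r+")
    writable[0:256, 0:256] = np.zeros((256, 256), dtype=np.uint16)
    return window.shape, writable.to_numpy()
```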

Attributes:

Name Type Description
shape tuple[int, int]

Raster shape (H, W).

chunks tuple[int, int]

Tile shape (ch0, ch1).

dtype dtype

Pixel dtype of the logical raster.

Attributes

ndim: int property

Number of dimensions.

Returns:

Type Description
int

Always 2.

coord_to_row: dict[tuple[int, int], int] property

Map each tile grid index to its Lance positional row index.

Returns:

Type Description
dict[tuple[int, int], int]

Keys (tile_i, tile_j); values are indices for take_blobs.

blob_column: str property

Lance column name holding encoded tile payloads.

Returns:

Type Description
str

Name of the payload column. The constructor default is "blob"; LanceArray.to_lance writes "payload" unless blob_column is overridden at write time.

payload_layout: Literal['blob', 'bytes'] property

Physical payload layout for encoded tiles.

dataset: lance.LanceDataset property

Underlying Lance dataset handle.

Returns:

Type Description
LanceDataset

Use e.g. take_blobs for advanced access; most users rely on __getitem__ / to_numpy instead.

n_tile_rows: int property

Number of tile rows along axis 0.

Returns:

Type Description
int

shape[0] // chunks[0].

n_tile_cols: int property

Number of tile columns along axis 1.

Returns:

Type Description
int

shape[1] // chunks[1].

Functions

__init__(dataset: lance.LanceDataset, chunk_shape: tuple[int, int], image_shape: tuple[int, int], coord_to_row: dict[tuple[int, int], int], decode_tile: Callable[[bytes], np.ndarray], *, blob_column: str = 'blob', payload_layout: Literal['blob', 'bytes'] = 'blob', tile_row_col: str = _TILE_ROW_COL, tile_col_col: str = _TILE_COL_COL, dtype: np.dtype | None = None, encode_tile: Callable[[np.ndarray], bytes] | None = None) -> None

Build a view from an open Lance dataset (prefer LanceArray.open or LanceArray.to_lance).

Parameters:

Name Type Description Default
dataset LanceDataset

Open lance.LanceDataset with one row per tile and (i, j) keys.

required
chunk_shape tuple[int, int]

(ch0, ch1) tile size in pixels.

required
image_shape tuple[int, int]

Full raster shape (H, W).

required
coord_to_row dict[tuple[int, int], int]

Map (tile_i, tile_j) to the positional row index used by take_blobs (see module helpers).

required
decode_tile Callable[[bytes], ndarray]

Decode one blob payload to a chunk_shape array.

required
blob_column str

Lance blob column name for tile payloads.

'blob'
dtype dtype | None

Raster dtype; defaults to uint16 if omitted.

None
encode_tile Callable[[ndarray], bytes] | None

Encode a tile array to bytes; must be set for r+ writes, else None for read-only views.

None

open(path: str | Path, *, mode: str = 'r') -> LanceArray classmethod

Open a dataset written with LanceArray.to_lance using a sidecar manifest.

The dataset root must expose lance_array.json (written by LanceArray.to_lance). For local paths this is a file under the dataset directory; for URIs (e.g. s3://...) the manifest is read via optional smart-open.

Parameters:

Name Type Description Default
path str | Path

Lance dataset directory or URI (e.g. s3://bucket/prefix/array.lance). Remote roots need the optional smart-open dependency (pip install 'lance-array[cloud]'); credentials follow the normal cloud SDK / environment defaults.

required
mode str

"r" read-only. "r+" allows __setitem__ for basic indices (contiguous slices and integers; slice step must be 1).

'r'

Returns:

Type Description
LanceArray

View over the on-disk dataset; use mode="r+" for slice assignment.

Raises:

Type Description
ValueError

If mode is invalid, the manifest is missing/unsupported, or the table does not match manifest shape/chunks.

FileNotFoundError

If lance_array.json is missing for a local path.

ImportError

If a remote URI is used without smart-open installed.

to_lance(path: str | Path, image: np.ndarray, chunk_shape: tuple[int, int], *, codec: TileCodec | str = TileCodec.RAW, blosc_typesize: int | None = None, blosc_clevel: int = 5, blosc_cname: str = 'zstd', blob_column: str = 'payload', data_storage_version: Literal['stable', '2.0', '2.1', '2.2', '2.3', 'next', 'legacy', '0.1'] = '2.2', tile_order: Literal['row_major', 'morton', 'hilbert'] = 'morton', payload_layout: Literal['blob', 'bytes'] = 'bytes') -> LanceArray classmethod

Write a 2D image as one encoded tile per row and return a LanceArray.

The on-disk table stores tile coordinates (default columns tile_row, tile_col), morton_code, and an encoded payload column (blob-v2 or large-binary bytes depending on payload_layout). A sidecar lance_array.json stores shape, chunk grid, dtype, and codec parameters so LanceArray.open works.

Pass codec= as TileCodec or a string alias ("raw", "blosc_numcodecs", "blosc2"). Blosc presets use blosc_typesize (defaults to dtype itemsize), blosc_clevel, and blosc_cname where applicable.

Parameters:

Name Type Description Default
path str | Path

Output dataset directory.

required
image ndarray

Full raster (H, W). H must be divisible by chunk_shape[0] and W by chunk_shape[1].

required
chunk_shape tuple[int, int]

(ch0, ch1) height and width of each tile.

required
codec TileCodec | str

Built-in tile codec preset.

RAW
blosc_typesize int | None

Blosc typesize for BLOSC_NUMCODECS / BLOSC2 (default: dtype itemsize).

None
blosc_clevel int

Blosc compression level.

5
blosc_cname str

Blosc compressor name (e.g. "zstd"). For BLOSC2, only a subset is mapped (zstd, lz4, blosclz).

'zstd'
blob_column str

Name of the blob column in the Lance schema.

'payload'
data_storage_version Literal['stable', '2.0', '2.1', '2.2', '2.3', 'next', 'legacy', '0.1']

Passed to lance.write_dataset.

'2.2'
tile_order Literal['row_major', 'morton', 'hilbert']

Physical insertion order of tiles in the Lance table. "row_major" writes (i, j) in nested-loop order; "morton" writes by Morton (Z-order) code and "hilbert" writes by Hilbert code to improve 2D spatial locality for rectangular range reads.

'morton'
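To make the "morton" option concrete, here is a minimal sketch of Z-order keys built by bit interleaving, and of how sorting tiles by that key would yield a coord_to_row mapping. Which axis occupies the low bit, the 16-bit limit, and the helper names are assumptions for illustration; the library's actual morton_code column may be computed differently.

```python
def morton_code(i: int, j: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of i and j into one Z-order key."""
    code = 0
    for b in range(bits):
        code |= ((j >> b) & 1) << (2 * b)      # j fills the even bit positions
        code |= ((i >> b) & 1) << (2 * b + 1)  # i fills the odd bit positions
    return code


def morton_row_order(n_tile_rows: int, n_tile_cols: int) -> dict[tuple[int, int], int]:
    """Assign positional row indices to tiles in Morton order (hypothetical helper)."""
    keys = [(i, j) for i in range(n_tile_rows) for j in range(n_tile_cols)]
    keys.sort(key=lambda ij: morton_code(*ij))
    return {key: row for row, key in enumerate(keys)}


coord_to_row = morton_row_order(2, 4)
print(coord_to_row[(1, 1)], coord_to_row[(0, 2)])  # 3 4
```

In this ordering the 2x2 block of tiles (0, 0) through (1, 1) occupies rows 0 to 3, so a small rectangular read tends to touch neighbouring rows rather than rows a full raster width apart.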

Returns:

Type Description
LanceArray

Read-only view over the written dataset (use LanceArray.open with mode="r+" to assign slices).

Raises:

Type Description
ValueError

If image is not 2D, shape is not divisible by chunk_shape, or codec options are invalid.

decode_tile(data: bytes) -> np.ndarray

Decode one blob from storage into a single tile array.

Parameters:

Name Type Description Default
data bytes

Raw bytes from the blob column for one row.

required

Returns:

Type Description
ndarray

Shape chunks, dtype self.dtype.

to_numpy() -> np.ndarray

Decode all tiles and return the full raster (single batched read path).

Returns:

Type Description
ndarray

Shape self.shape, dtype self.dtype.

lance_array.core.TileCodec

Bases: Enum

How each tile is encoded in the Lance blob column.

Pass a member (or a string alias such as "raw", "blosc_numcodecs", "blosc2") to LanceArray.to_lance. BLOSC2 requires the blosc2 package (e.g. via the lance-array[zarr] extra). See the enum members below for each preset.

Attributes

RAW = 'raw' class-attribute instance-attribute

Uncompressed contiguous bytes (dtype itemsize × cells per tile).
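The size relationship for RAW (itemsize × cells per tile) can be shown with a small standard-library sketch for uint16 tiles. Little-endian byte order and row-major cell order are assumptions here, and the helper names are hypothetical.

```python
import struct


def encode_raw_u16(tile: list[list[int]]) -> bytes:
    """Pack a 2D uint16 tile row by row as little-endian bytes."""
    flat = [value for row in tile for value in row]
    return struct.pack(f"<{len(flat)}H", *flat)


def decode_raw_u16(data: bytes, chunks: tuple[int, int]) -> list[list[int]]:
    """Unpack bytes back into a tile of shape `chunks`."""
    ch0, ch1 = chunks
    if len(data) != 2 * ch0 * ch1:  # itemsize (2 bytes) x cells per tile
        raise ValueError("payload length does not match the tile shape")
    flat = struct.unpack(f"<{ch0 * ch1}H", data)
    return [list(flat[r * ch1:(r + 1) * ch1]) for r in range(ch0)]


tile = [[1, 2, 3], [4, 5, 65535]]
payload = encode_raw_u16(tile)
print(len(payload))                             # 12
print(decode_raw_u16(payload, (2, 3)) == tile)  # True
```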

BLOSC_NUMCODECS = 'blosc_numcodecs' class-attribute instance-attribute

Blosc1 via numcodecs.Blosc (typical Zarr Blosc parity).

BLOSC2 = 'blosc2' class-attribute instance-attribute

Blosc2 via blosc2.compress / decompress (install blosc2).
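Because each member's value equals its string alias, a codec argument given as either form can be resolved with a plain Enum lookup. The sketch below uses a stand-in enum that mirrors the members listed above; resolve_codec is a hypothetical helper, and whether the library resolves aliases this way is an assumption.

```python
from enum import Enum


class TileCodec(Enum):
    """Stand-in mirroring the documented members and values."""
    RAW = "raw"
    BLOSC_NUMCODECS = "blosc_numcodecs"
    BLOSC2 = "blosc2"


def resolve_codec(codec: "TileCodec | str") -> TileCodec:
    """Accept a member or its string alias (hypothetical helper)."""
    if isinstance(codec, TileCodec):
        return codec
    try:
        return TileCodec(codec)  # Enum lookup by value
    except ValueError:
        valid = ", ".join(repr(m.value) for m in TileCodec)
        raise ValueError(f"unknown codec {codec!r}; expected one of {valid}") from None


print(resolve_codec("blosc2"))       # TileCodec.BLOSC2
print(resolve_codec(TileCodec.RAW))  # TileCodec.RAW
```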

lance_array.core.open_array(store: str | Path, *, mode: str = 'r') -> LanceArray

Open a Lance tile dataset (Zarr-style entry point).

Like zarr.open_array, but for a dataset written by LanceArray.to_lance (includes lance_array.json).

Parameters:

Name Type Description Default
store str | Path

Dataset directory or URI passed to lance.dataset.

required
mode str

"r" read-only. "r+" allows LanceArray.__setitem__ for basic indices (see PRD prds/lance-array-slice-writes.md).

'r'

Returns:

Type Description
LanceArray

Same as LanceArray.open.

Raises:

Type Description
ValueError

Invalid mode, corrupt manifest, or dataset row count mismatch.

FileNotFoundError

Missing lance_array.json for a local path.

ImportError

Remote URI without smart-open installed.

Notes

For s3://, gs://, or https://, install smart-open (extra lance-array[cloud]). The manifest is read with smart_open before opening Lance.

lance_array.core.normalize_chunk_slices(s: slice, dim: int) -> tuple[int, int]

Normalize a slice to (start, stop) with step 1 and positive span.

Parameters:

Name Type Description Default
s slice

Slice along an axis of logical length dim (uses s.indices(dim)).

required
dim int

Size of that axis.

required

Returns:

Type Description
tuple[int, int]

Half-open interval (start, stop).

Raises:

Type Description
ValueError

If step != 1 or the slice is empty after normalization.
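The documented behaviour can be approximated in a few lines. This is a sketch reconstructed from the description above (it relies on slice.indices, which clamps bounds and resolves negative indices); the name normalize_slice_sketch is deliberately different from the library function, whose exact error messages and edge cases may differ.

```python
def normalize_slice_sketch(s: slice, dim: int) -> tuple[int, int]:
    """Return a half-open (start, stop) for a step-1, non-empty slice."""
    start, stop, step = s.indices(dim)  # clamps to [0, dim] and resolves negatives
    if step != 1:
        raise ValueError(f"slice step must be 1, got {step}")
    if stop <= start:
        raise ValueError("slice is empty after normalization")
    return start, stop


print(normalize_slice_sketch(slice(None), 10))      # (0, 10)
print(normalize_slice_sketch(slice(-3, None), 10))  # (7, 10)
print(normalize_slice_sketch(slice(5, 100), 10))    # (5, 10)
```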