# Data Processing

The data processing module provides utilities for loading and preprocessing point cloud data, with particular focus on KITTI format support.

## PointCloudLoader

Main class for loading and preprocessing point cloud data from various formats.

### Methods

#### `load_and_preprocess(file_path, format="auto")`

Loads point cloud data and applies preprocessing steps.

**Parameters:**
- `file_path`: Path to the point cloud file
- `format`: File format ("auto", "kitti", "ply", "pcd")

**Returns:**
- `coordinates`: Point coordinates (N, 3)
- `reflectivity`: Reflectivity values (N,)

#### `load_kitti_format(file_path)`

Loads point cloud data in KITTI format (.bin files).

**Parameters:**
- `file_path`: Path to KITTI .bin file

**Returns:**
- `coordinates`: Point coordinates (N, 3)
- `reflectivity`: Reflectivity values (N,)

#### `preprocess_point_cloud(coordinates, reflectivity, options=None)`

Applies preprocessing operations to loaded point cloud data.

**Parameters:**
- `coordinates`: Raw point coordinates
- `reflectivity`: Raw reflectivity values
- `options`: Preprocessing options dictionary

**Returns:**
- Preprocessed coordinates and reflectivity

### Example

```python
from rapid_seg.data import PointCloudLoader

# Initialize loader
loader = PointCloudLoader()

# Load KITTI format data
coordinates, reflectivity = loader.load_and_preprocess("data.bin", "kitti")

# Apply preprocessing
preprocessed_coords, preprocessed_reflectivity = loader.preprocess_point_cloud(
    coordinates,
    reflectivity,
    options={
        "normalize": True,
        "remove_outliers": True,
        "downsample": 0.1
    }
)
```

## Supported Formats

### KITTI Format (.bin)

Binary format commonly used in autonomous driving datasets:

```python
# Load KITTI data
coordinates, reflectivity = loader.load_kitti_format("velodyne/000001.bin")

# Data structure: [x, y, z, intensity]
# coordinates: (N, 3) - x, y, z coordinates
# reflectivity: (N,) - intensity/reflectivity values
```

### PLY Format (.ply)

ASCII/binary format for 3D point clouds:

```python
# Load PLY data
coordinates, reflectivity = loader.load_and_preprocess("model.ply", "ply")
```

### PCD Format (.pcd)

Point Cloud Data format:

```python
# Load PCD data
coordinates, reflectivity = loader.load_and_preprocess("cloud.pcd", "pcd")
```

## Preprocessing Options

### Normalization

```python
# Normalize coordinates to unit sphere
options = {"normalize": True, "normalization_type": "sphere"}
```

### Outlier Removal

```python
# Remove statistical outliers
options = {"remove_outliers": True, "outlier_threshold": 3.0}
```

### Downsampling

```python
# Voxel-based downsampling
options = {"downsample": 0.1, "voxel_size": 0.1}
```

### Filtering

```python
# Range filtering
options = {
    "range_filter": True,
    "min_range": 0.0,
    "max_range": 100.0
}
```

## Data Validation

The loader includes built-in validation:

```python
# Validate data integrity
is_valid = loader.validate_point_cloud(coordinates, reflectivity)
if not is_valid:
    print("Invalid point cloud data detected")
    # Handle invalid data
```

## Performance Features

- **Lazy Loading**: Load data only when needed
- **Memory Mapping**: Efficient handling of large files
- **Batch Processing**: Process multiple files simultaneously
- **Progress Tracking**: Monitor loading progress for large datasets
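
To make the KITTI layout and the memory-mapping and lazy-loading ideas above more concrete, here is a minimal NumPy sketch. It assumes the standard KITTI Velodyne layout of consecutive `float32` records `[x, y, z, intensity]`. The names `read_kitti_bin` and `iter_scans` are hypothetical helpers for illustration only; they are not part of `PointCloudLoader`, and the loader's own implementation may differ.

```python
import numpy as np


def read_kitti_bin(file_path, use_mmap=True):
    """Read one KITTI .bin scan as (coordinates, reflectivity).

    Hypothetical helper, assuming float32 records of [x, y, z, intensity].
    """
    if use_mmap:
        # Memory-map the file: pages are read from disk only when accessed.
        raw = np.memmap(file_path, dtype=np.float32, mode="r")
    else:
        # Read the whole file into memory at once.
        raw = np.fromfile(file_path, dtype=np.float32)

    if raw.size % 4 != 0:
        raise ValueError("file size is not a multiple of 4 float32 values")

    points = raw.reshape(-1, 4)   # (N, 4)
    coordinates = points[:, :3]   # (N, 3) x, y, z
    reflectivity = points[:, 3]   # (N,) intensity
    return coordinates, reflectivity


def iter_scans(paths):
    """Lazily yield (coordinates, reflectivity) for each path, one at a time."""
    for path in paths:
        yield read_kitti_bin(path)
```

Because `iter_scans` is a generator, a scan is opened only when the caller asks for the next item, so a loop over thousands of files holds one scan at a time instead of the whole dataset.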