Data Engineer Job
Employer: Callie Torresan
SpiderID: 14163303
Location: San Jose, California
Posted: 4/7/2026
Wage: $174,304 per year
Priority Review Date: 5/7/2026
Job Code / NOC / SOC: 15-2051.00
Category: General
Job Description:
- Analyze large-scale structured and unstructured lidar datasets to extract geospatial insights and support perception algorithm development.
- Design and implement end-to-end data pipelines for real-time ingestion, transformation, and storage of high-volume 3D point cloud data.
- Develop and manage spatial databases (e.g., PostgreSQL/PostGIS) to support efficient indexing, querying, and long-term sensor data retention.
- Apply machine learning and statistical techniques for object classification, scene segmentation, and behavioral prediction from sensor data.
- Build custom geospatial visualizations using tools such as Mapbox, CesiumJS, and Python for system validation and client-facing deliverables.
- Collaborate cross-functionally with perception, embedded software, and hardware engineering teams to ensure smooth data integration across lidar platforms.
- Develop and maintain internal tools for QA automation, sensor calibration tracking, and batch data reprocessing.
- Optimize data workflows for latency, scalability, and real-time performance using tools like Spark, Kafka, and Docker/Kubernetes.
- Implement data quality checks and validation processes to detect anomalies and ensure sensor output integrity.
- Participate in design reviews, present data-driven findings, and propose architectural improvements based on testing and deployment feedback.
- Research and evaluate advancements in AI/ML, spatial computing, and big data frameworks, and integrate relevant innovations into the lidar-based product stack.
- Support customer-specific analytics use cases by fine-tuning data pipelines and ML models for deployment-specific environments.
Requirements:
- Master’s degree in Data Science, Spatial Data Science, Computer Science, or a foreign equivalent.
- Must have at least one year of full-time work experience in the following areas:
a) Developing and deploying real-time 3D data pipelines for high-volume spatial or sensor data.
b) Designing and implementing spatial database systems using PostgreSQL/PostGIS or similar geospatial DBMS.
c) Building interactive 3D geospatial visualization platforms using WebGL-based JavaScript libraries: CesiumJS, Three.js, and Mapbox GL JS.
- Must possess knowledge of the deep learning models PointNet and PointCNN, and associated techniques used for 3D scene classification and segmentation.
Contact Information:
Contact Name: Callie Torresan
Type: Employer
Company: Cepton Technologies, Inc.