About the Role
Who are we looking for?
Strong software engineering foundation with production-level coding experience
Advanced SQL skills with the ability to write and optimize complex analytical queries
Advanced Python proficiency
Hands-on experience with Spark / PySpark and Databricks
Experience working with large-scale time-series data and advanced analytics techniques
Understanding of ML data pipelines and how curated datasets support training workflows
Ability to work independently in a remote, cross-timezone collaboration model
What will you be doing?
Analyze large-scale real-world sensor data to identify rare and safety-critical edge cases (e.g. hard braking, unusual traffic behavior)
Develop advanced SQL, Python, and Spark (PySpark) queries to filter, aggregate, and transform high-volume time-series datasets
Build and maintain scalable ETL pipelines converting raw multimodal data into structured simulation-ready formats
Collaborate with autonomous vehicle (AV) engineers and researchers to curate datasets for simulation and ML training workflows
Design and implement advanced data mining scripts to improve automated edge-case discovery
Contribute to internal data tooling (analytics workflows, search tools, labeling pipelines) to streamline large-scale data processing
Tech Stack
Python, SQL, Spark, PySpark, Databricks, ML data pipelines, Analytics, ETL