Data Engineer

Remote | Technology | Full-time | Fully remote


Who We Are as a Company:

GoKwik was founded in 2020 with one simple mission: to democratize the shopping experience and increase GMV realization for e-commerce businesses. The company is backed by Sequoia Capital India, Matrix Partners India, RTP Global, and marquee angel investors.

GoKwik is an e-commerce enablement company focused predominantly on solving crucial e-commerce issues, such as boosting conversion rates across the e-commerce funnel and reducing RTO (Return to Origin) for our partners. It leverages AI/ML technologies to solve hard-hitting problems like RTO and to increase CoD (Cash on Delivery) conversion rates. GoKwik's 1-click Kwik checkout improves checkout conversion rates, ensuring higher GMV realization and reduced marketing CAC.

What we do is unique, with no immediate competition in India today. We are therefore building a team of real rockstars in their fields to fuel this profit-making start-up, which is growing at a rapid pace.

We are looking for expertise in real-time data warehousing and in building large-scale streaming data processing systems using the latest database technologies.
The Data Engineer is responsible for building and running data pipelines, designing our local data warehouse and data frameworks, and supporting different data presentation techniques.

You’ll spend time on the following:

  • Define, execute, and manage large-scale ETL processes to build the data lake and data warehouse, and to support development of predictive models
  • Build real-time data analytics pipelines using Kafka, NiFi, Druid, and Airflow
  • Propose feasible solutions quickly, and effectively communicate strategy and risk-mitigation approaches to leadership
  • Build ETL pipelines in Spark, Python, Hive, or SAS that process transaction- and account-level data and standardize data fields across various data sources
  • Build and maintain high-performing ETL processes, including data quality and testing, aligned across technology, internal reporting, and other functional teams
  • Own the enhancements, data processing, and monthly/quarterly deliverables of the products/solutions assigned from the global solutions portfolio
  • Create data dictionaries, set up and monitor data validation alerts, and execute periodic jobs such as performance dashboards and predictive model scoring for client deliverables
  • Define and build technical and data documentation using code version control systems (e.g., Git); ensure data accuracy, integrity, and consistency
  • Develop and implement data pipelines for ML/AI, especially on billion-scale datasets; take small-scale models as input and implement them with the requisite configuration and customization while maintaining model performance
  • Find opportunities to create, automate, and scale repeatable financial and statistical analyses
  • Communicate technical insights and recommendations clearly, in writing and in person, to business customers and the leadership team
  • Apply model management and governance practices; make decisions around model drift to monitor and refine models continuously



We’re Excited About You If You Have:

  • 2-3 years' experience creating large-scale data engineering pipelines and performing data-based decision-making and quantitative analysis
  • Experience with SQL for extracting, aggregating, and processing big data, using Hadoop, EMR, and NoSQL databases
  • Experience with complex, high-volume, multi-dimensional data, as well as machine learning models based on unstructured, structured, and streaming datasets
  • Strong experience with real-time data management and exposure to code version control systems (e.g., Git)
  • Advanced experience writing and optimizing efficient SQL queries, as well as handling large datasets with Python and Hive in big-data environments
  • Experience creating and supporting production software/systems, with a proven track record of identifying and resolving performance bottlenecks in production systems
  • Experience with Unix/shell or Python scripting and exposure to scheduling tools such as Apache Airflow
  • Exposure to deploying large data pipelines to scale ML/AI models built by data science teams; experience with model development is a strong plus
  • 4+ years' work experience with a Bachelor's degree, or 3+ years' work experience with a Master's or advanced degree in an analytical field such as computer science, statistics, finance, economics, or a relevant area
  • Working knowledge of the Hadoop and Hive ecosystem and associated technologies (e.g., Apache Spark, EMR, Apache NiFi, Kafka, Python, pandas)

Some Important Traits We Look For in This Role:

  • Independent, resourceful, analytical, and able to solve problems effectively
  • Ability to be flexible, agile, and thrive in chaos
  • Excellent oral and written communication skills

Our Core Values:

  • Merchant 1st
  • Innovation
  • Talent

We’re A Remote-First Company

Our organization was established right in the middle of the pandemic, so we don’t have location barriers across our team. In fact, more than 90% of our employees “Work from Anywhere,” which gives us more flexibility in our personal lives and less time commuting. At the same time, being together in person is an important part of our culture and shared success, so we collaborate in person at a regular cadence and with purpose.

The pace of our growth is incredible. If you want to tackle hard and interesting problems at scale and create an impact within an entrepreneurial environment, come join us!