Data Engineer Interview Questions

A data engineer is an IT professional found in almost every industry. They track how data evolves and trends over time to inform the company's future strategy. A key part of the job is turning raw data into usable data by building data pipelines and data systems.

Common data engineer interview questions and how to answer them

Question 1: Describe in detail your level of expertise in programming languages.

How to answer: Before the interview, review your résumé and list the programs you are proficient in. If you find that you do not know a piece of software the company relies on heavily, emphasize your motivation and your willingness to learn it.

Question 2: Explain what data engineering means to you.

How to answer: Highlight your role within the company and relative to other roles, such as data scientist, so that your contribution is clearly defined. Explain the difference between an engineer focused on databases and an engineer focused on data pipelines.

Question 3: What is your experience with cloud data management and with Apache Hadoop?

How to answer: Find out which cloud data management software the company uses (Apache Hadoop in particular). A data engineer should be proficient in the programming languages and data management systems commonly used in the industry, including Apache Hadoop.

20,186 data engineer interview questions shared by candidates

Given a dictionary, print the key for the nth highest value in the dict. If more than one record has the nth highest value, sort the keys and print the first one alphabetically. N can be higher than the number of elements in the dictionary.

Data Engineer

Interviewed at Meta

Aug 17, 2021
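
One way to approach the nth-highest-value question is sketched below. It assumes "nth highest" refers to distinct values, and that the function should return `None` when N exceeds the number of distinct values; the question as reported does not specify either point. The function name `nth_highest_key` is hypothetical.

```python
def nth_highest_key(d, n):
    """Return the alphabetically first key whose value is the nth highest.

    Assumptions (not stated in the question): ranks are taken over
    distinct values, and None is returned when n is out of range.
    """
    # Distinct values, largest first.
    distinct = sorted(set(d.values()), reverse=True)
    if n < 1 or n > len(distinct):
        return None
    target = distinct[n - 1]
    # Among the keys tied on the target value, pick the smallest one.
    return min(k for k, v in d.items() if v == target)


scores = {"a": 3, "b": 5, "c": 3, "d": 1}
print(nth_highest_key(scores, 2))  # "a" and "c" tie on 3, so this prints a
print(nth_highest_key(scores, 9))  # n is out of range, so this prints None
```

Sorting the distinct values costs O(k log k) for k distinct values; for a single lookup on a large dictionary, `heapq.nlargest` over the distinct values would be an alternative.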

Given a list of ints, balance the list so that each int appears equally often. Return a dictionary where the key is the int and the value is the count needed to balance the list. Examples: [1, 1, 2] => {2: 1}; [1, 1, 1, 5, 3, 2, 2] => {5: 2, 3: 2, 2: 1}

Data Engineer

Interviewed at Meta

Aug 17, 2021
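
The two examples in the question suggest that "balanced" means every int is brought up to the count of the most frequent one. A minimal sketch under that reading follows; the function name `balance` is hypothetical, and returning an empty dict for an empty list is an assumption.

```python
from collections import Counter


def balance(nums):
    """Return {value: copies to add} so every value matches the max count."""
    counts = Counter(nums)
    if not counts:
        return {}  # assumption: nothing to balance in an empty list
    target = max(counts.values())
    # Values already at the target count need nothing and are left out.
    return {value: target - c for value, c in counts.items() if c < target}


print(balance([1, 1, 2]))              # {2: 1}
print(balance([1, 1, 1, 5, 3, 2, 2]))  # {5: 2, 3: 2, 2: 1}
```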

SQL questions on a promotions and sales schema: What percentage of products are both non-fat and trans fat? Find the top 5 products by sales that have promotions. What percentage of sales happened on the first and last day of the promotion? MySQL was used, and the interviewer asked whether this could be done without a subquery. Python: [1, None, 1, 2, None] --> [1, 1, 1, 2, 2]. Make sure to handle the case where the input is [None], that is, a list holding the None object. Find "s" in "mississippi".

Data Engineer

Interviewed at Meta

Jun 29, 2020
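
The Python part of this report can be read as a forward fill: each None is replaced by the most recent non-None value. The sketch below assumes that a None with no earlier value (as in the input [None]) stays None, and reads "find s" as "return the positions of the letter s"; the report does not spell out either expectation. Both function names are hypothetical.

```python
def forward_fill(values):
    """Replace each None with the last non-None value seen so far."""
    result = []
    last = None  # assumption: a leading None has nothing to copy, so it stays None
    for v in values:
        if v is not None:
            last = v
        result.append(last)
    return result


def find_positions(text, ch):
    """Return every index at which ch occurs in text."""
    return [i for i, c in enumerate(text) if c == ch]


print(forward_fill([1, None, 1, 2, None]))  # [1, 1, 1, 2, 2]
print(forward_fill([None]))                 # [None]
print(find_positions("mississippi", "s"))   # [2, 3, 5, 6]
```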

products                                sales
+------------------+---------+          +------------------+---------+
| product_id       | int     |--------->| product_id       | int     |
| product_class_id | int     |   +----->| store_id         | int     |
| brand_name       | varchar |   |  +-->| customer_id      | int     |
| product_name     | varchar |   |  |   | promotion_id     | int     |
| price            | int     |   |  |   | store_sales      | decimal |
+------------------+---------+   |  |   | store_cost       | decimal |
                                 |  |   | units_sold       | decimal |
                                 |  |   | transaction_date | date    |
                                 |  |   +------------------+---------+
                                 |  |
stores                           |  |   customers
+-------------------+----------+ |  |   +---------------------+---------+
| store_id          | int      |-+  +---| customer_id         | int     |
| type              | varchar  |        | first_name          | varchar |
| name              | varchar  |        | last_name           | varchar |
| state             | varchar  |        | state               | varchar |
| first_opened_date | datetime |        | birthdate           | date    |
| last_remodel_date | datetime |        | education           | varchar |
| area_sqft         | int      |        | gender              | varchar |
+-------------------+----------+        | date_account_opened | date    |
                                        +---------------------+---------+

Question 1: What brands have an average price above $3 and contain at least 2 different products?

Question 2: To improve sales, the marketing department runs various types of promotions. The marketing manager would like to analyze the effectiveness of these promotion campaigns. In particular, what percent of our sales transactions had a valid promotion applied?

Question 3: We want to run a new promotion for our most successful category of products (we call these categories "product classes"). Can you find out what are the top 3 selling product classes by total sales?

Question 4: We are considering running a promo across brands. We want to target customers who have bought products from two specific brands. Can you find out which customers have bought products from both the "Fort West" and the "Golden" brands?

Data Engineer

Interviewed at Meta

May 22, 2020
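
The four questions on the products and sales schema can be tried out with Python's built-in `sqlite3` module. The sketch below is one possible set of answers, not a reference solution. Several points are assumptions: the sample rows are invented purely for illustration, SQLite stands in for whatever database the interview used, a "valid promotion" is taken to mean a non-NULL `promotion_id`, and "total sales" is taken to mean the sum of `store_sales`. Only the two tables the questions touch are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INT, product_class_id INT,
                       brand_name TEXT, product_name TEXT, price INT);
CREATE TABLE sales (product_id INT, store_id INT, customer_id INT,
                    promotion_id INT, store_sales REAL, store_cost REAL,
                    units_sold REAL, transaction_date TEXT);
-- Invented sample rows, for illustration only.
INSERT INTO products VALUES
  (1, 10, 'Fort West', 'Chips',   4),
  (2, 10, 'Fort West', 'Salsa',   6),
  (3, 20, 'Golden',    'Waffles', 5),
  (4, 30, 'Solo',      'Gum',     9),
  (5, 40, 'Cheap',     'Candy',   1),
  (6, 40, 'Cheap',     'Mints',   2);
INSERT INTO sales VALUES
  (1, 1, 100, 7,    8.0, 3.0, 2, '2020-01-01'),
  (3, 1, 100, NULL, 5.0, 2.0, 1, '2020-01-02'),
  (2, 2, 200, 7,    6.0, 2.5, 1, '2020-01-03'),
  (4, 2, 300, NULL, 9.0, 4.0, 1, '2020-01-04');
""")

# Question 1: brands with average price above 3 and at least 2 products.
q1 = conn.execute("""
    SELECT brand_name
    FROM products
    GROUP BY brand_name
    HAVING AVG(price) > 3 AND COUNT(DISTINCT product_id) >= 2
    ORDER BY brand_name
""").fetchall()

# Question 2: percent of transactions with a promotion.
# COUNT(promotion_id) skips NULLs, so no subquery is needed.
q2 = conn.execute("""
    SELECT 100.0 * COUNT(promotion_id) / COUNT(*) FROM sales
""").fetchone()[0]

# Question 3: top 3 product classes by summed store_sales.
q3 = conn.execute("""
    SELECT p.product_class_id, SUM(s.store_sales) AS total_sales
    FROM sales AS s
    JOIN products AS p ON p.product_id = s.product_id
    GROUP BY p.product_class_id
    ORDER BY total_sales DESC
    LIMIT 3
""").fetchall()

# Question 4: customers who bought from both brands.
q4 = conn.execute("""
    SELECT s.customer_id
    FROM sales AS s
    JOIN products AS p ON p.product_id = s.product_id
    WHERE p.brand_name IN ('Fort West', 'Golden')
    GROUP BY s.customer_id
    HAVING COUNT(DISTINCT p.brand_name) = 2
""").fetchall()

print(q1)  # [('Fort West',)]
print(q2)  # 50.0
print(q3)  # [(10, 14.0), (30, 9.0), (20, 5.0)]
print(q4)  # [(100,)]
```

The `HAVING COUNT(DISTINCT ...)` pattern in questions 1 and 4 avoids self-joins and subqueries. If the real schema had a separate promotions table, question 2 would likely need a join against it to decide which promotions count as valid.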
