Basic concepts about Data Engineering
Data Engineer Interview Questions
A data engineer is an IT professional found in almost every industry. They track data developments and trends to guide the company's future strategies. An essential part of the job is turning raw data into usable data by building data pipelines and systems.
Common job interview questions for a data engineer and how to answer them
Question 1: Describe in detail your level of expertise in programming languages.
Question 2: Explain what, in your view, data engineering consists of.
Question 3: What is your experience with data management in the cloud and with Apache Hadoop?
20,273 data engineer interview questions shared by candidates
Spark optimizations: what optimizations can be made to the code snippet below?

shoppers_df (customer description DF): 250 MB, 15M records. Schema:
    schema: StructType = StructType(Array(
        StructField("shopper_id", LongType, nullable = True),
        StructField("retailer_id", StringType, nullable = True),
        StructField("shopper_group_id", StringType, nullable = True),
        StructField("join_date", DateType, nullable = True),
        StructField("shopper_type", StringType, nullable = True),
        StructField("gender", StringType, nullable = True)))

sku_df (dimension DF): 15 MB, 90K records.

purchase_df (transactions DF): 50 GB of compressed Parquet files, 5,000,000,000 records. Schema:
    schema: StructType = StructType(Array(
        StructField("shopper_id", LongType, nullable = True),
        StructField("product_id", LongType, nullable = True),
        StructField("pos_id", IntegerType, nullable = True),
        StructField("purchase_date", DateType, nullable = True),
        StructField("units", DoubleType, nullable = True),
        StructField("total_spent", DoubleType, nullable = True)))

Current code:
    products_purchased_df = (
        purchase_df.alias("purchase")
        .join(shoppers_df, on = "shopper_id", how = "left outer")
        .join(sku_df.alias("sku"), on = "product_id")
        .select(col("purchase.*"), col("sku.*"))
    )

Usage:
    status_df = products_purchased_df.groupBy(["shopper_id", "product_id"]).agg(...)

Optimize the join statement.
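One answer candidates often give to the question above is to avoid shuffling the 50 GB purchase_df: ship the small sku_df (15 MB) to every executor and join against it in place. In PySpark this is usually requested with the `broadcast` hint from `pyspark.sql.functions`. It may also be worth asking whether the left outer join to shoppers_df is needed at all, since the final select keeps only the purchase and sku columns. The plain-Python sketch below is not Spark code; it only illustrates the mechanics of a broadcast hash join on small in-memory rows. The function name `broadcast_hash_join` and the `sku_name` column are hypothetical and do not come from the question.

```python
from collections import defaultdict


def broadcast_hash_join(large_rows, small_rows, key, how="inner"):
    """Join dict rows by building a hash table from the small side only.

    The large side is streamed once and never sorted or repartitioned,
    which is the idea behind a broadcast (map-side) join.
    """
    # Build the lookup table from the small, "broadcast" side.
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key]].append(row)

    # Stream the large side and probe the lookup table row by row.
    for row in large_rows:
        matches = lookup.get(row[key])
        if matches:
            for match in matches:
                yield {**match, **row}
        elif how == "left":
            # Keep unmatched rows from the large side in a left join.
            yield dict(row)


# Tiny stand-ins for purchase_df and sku_df (sku_name is a made-up column).
purchases = [
    {"shopper_id": 1, "product_id": 10, "units": 2.0, "total_spent": 5.0},
    {"shopper_id": 2, "product_id": 11, "units": 1.0, "total_spent": 3.0},
    {"shopper_id": 1, "product_id": 99, "units": 1.0, "total_spent": 1.0},
]
skus = [
    {"product_id": 10, "sku_name": "milk"},
    {"product_id": 11, "sku_name": "bread"},
]

joined = list(broadcast_hash_join(purchases, skus, "product_id"))
left_joined = list(broadcast_hash_join(purchases, skus, "product_id", how="left"))
```

Only the small table is held in memory, so the cost of the join grows with a single pass over the large table. Whether the 250 MB shoppers_df is also small enough to broadcast depends on executor memory, which the question does not state.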
We will give you a take-home project; you will have to do research and come up with an architecture around it.
Two rounds: (1) an online technical test in multiple-choice question-and-answer format (skip questions that are not relevant); (2) technical questions on current problems the company faced and how you would solve them.
Talk about a project that involved databases.
What are your career goals for the next 5 years?
Job experience: what model did I use? What are the pros and cons of the model? What can you do to further improve its performance?
How does a lithium-ion cell work?
How do you work in a team?
Have you worked with AWS cloud tools?