Q1. Partitioner controls the partitioning of what data?
final keys
final values
intermediate keys
intermediate values
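The Partitioner decides which reduce partition receives each intermediate key emitted by the mappers. A minimal sketch of a custom partitioner, assuming the new org.apache.hadoop.mapreduce API; the class name and Text/IntWritable types are illustrative, and the logic mirrors the default HashPartitioner:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class KeyHashPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        // Mask the sign bit so the partition index is always non-negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}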
Q2. SQL Windowing functions are implemented in Hive using which keywords?
UNION DISTINCT, RANK
OVER, RANK
OVER, EXCEPT
UNION DISTINCT, EXCEPT
Q3. Rather than adding a Secondary Sort to a slow Reduce job, it is Hadoop best practice to perform which optimization?
Add a partitioned shuffle to the Map job.
Add a partitioned shuffle to the Reduce job.
Break the Reduce job into multiple, chained Reduce jobs.
Break the Reduce job into multiple, chained Map jobs.
Q4. Hadoop Auth enforces authentication on protected resources. Once authentication has been established, it sets what type of authenticating cookie?
encrypted HTTP
unsigned HTTP
compressed HTTP
signed HTTP
Q5. MapReduce jobs can be written in which language?
Java or Python
SQL only
SQL or Java
Python or SQL
Q6. To perform local aggregation of the intermediate outputs, MapReduce users can optionally specify which object?
Reducer
Combiner
Mapper
Counter
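A Combiner performs local, map-side aggregation of intermediate output before the shuffle. A sketch of wiring one into a driver, using the stock TokenCounterMapper and IntSumReducer classes that ship with Hadoop; the job name is arbitrary:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

public class CombinerWiring {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setMapperClass(TokenCounterMapper.class);  // emits (word, 1)
        // The Combiner collapses repeated keys in each map task's output;
        // reusing the Reducer works because summing is associative and commutative.
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
    }
}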
Q7. To verify job status, look for the value ___ in the ___.
SUCCEEDED; syslog
SUCCEEDED; stdout
DONE; syslog
DONE; stdout
Q8. Which line of code implements a Reducer method in MapReduce 2.0?
public void reduce(Text key, Iterator<IntWritable> values, Context context){…}
public static void reduce(Text key, IntWritable[] values, Context context){…}
public static void reduce(Text key, Iterator<IntWritable> values, Context context){…}
public void reduce(Text key, IntWritable[] values, Context context){…}
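Note that the actual MapReduce 2.0 (new API) Reducer contract receives the grouped values as an Iterable rather than an Iterator. A minimal sum reducer for comparison; the class name and Text/IntWritable types are illustrative:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();  // all values grouped under this key
        }
        context.write(key, new IntWritable(sum));
    }
}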
Q9. To get the total number of mapped input records in a map job task, you should review the value of which counter?
FileInputFormatCounter
FileSystemCounter
JobCounter
TaskCounter
Q10. Hadoop Core supports which CAP capabilities?
A, P
C, A
C, P
C, A, P
Q11. What are the primary phases of a Reducer?
combine, map, and reduce
shuffle, sort, and reduce
reduce, sort, and combine
map, sort, and combine
Q12. To set up Hadoop workflow with synchronization of data between jobs that process tasks both on disk and in memory, use the ___ service, which is ___.
Oozie; open source
Oozie; commercial software
Zookeeper; commercial software
Zookeeper; open source
Q13. For high availability, multiple nodes of which type should you use?
data
name
memory
worker
Q14. DataNode supports which type of drives?
hot swappable
cold swappable
warm swappable
non-swappable
Q15. Where does Spark process the data for a job?
on disk of all workers
on disk of the master node
in memory of the master node
in memory of all workers
Q16. In a MapReduce job, where does the map() function run?
on the reducer nodes of the cluster
on the data nodes of the cluster
on the master node of the cluster
on every node of the cluster
Q17. To reference a master file for lookups during Mapping, what type of cache should be used?
distributed cache
local cache
partitioned cache
cluster cache
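The distributed cache ships a small lookup file to every task node once, so each Mapper can read it locally in setup() instead of re-reading HDFS per record. A sketch using the new API; the HDFS path is a placeholder:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class LookupCacheSetup {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "map-side lookup");
        // Copied to every node before tasks start; tasks open it as a local file.
        job.addCacheFile(new URI("hdfs:///user/hue/lookup/master.csv"));
    }
}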
Q18. The skip-bad-records feature provides an option whereby a certain set of bad input records can be skipped when processing which type of data?
cache inputs
reducer inputs
intermediate values
map inputs
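Record skipping is configured through the SkipBadRecords helper class. A sketch; the threshold of 100 records is an arbitrary illustrative value:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.SkipBadRecords;

public class SkipConfig {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Permit the framework to skip up to 100 bad map input records
        // around a repeatedly failing record instead of failing the task.
        SkipBadRecords.setMapperMaxSkipRecords(conf, 100);
    }
}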
Q19. Which command imports data to Hadoop from a MySQL database?
Q21. Which library should be used to unit test MapReduce code?
JUnit
XUnit
MRUnit
HadoopUnit
Q22. To start the NameNode, which kind of user must you be?
hadoop-user
super-user
node-user
admin-user
Q23. State ___ between the JVMs in a MapReduce job.
can be configured to be shared
is partially shared
is shared
is not shared (https://www.lynda.com/Hadoop-tutorials/Understanding-Java-virtual-machines-JVMs/191942/369545-4.html)
Q24. To create a MapReduce job, what should be coded first?
a static job() method
a Job class and instance
a job() method
a static Job class
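A driver builds the Job instance first, then attaches the Mapper, Reducer, and I/O paths to it. A minimal skeleton, assuming the stock library classes and that input and output paths arrive as command-line arguments:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        // The Job instance is the first thing the driver creates.
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(TokenCounterMapper.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}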
Q25. To connect Hadoop to AWS S3, which client should you use?
S3A
S3N
S3
the EMR S3
Q26. HBase works with which type of schema enforcement?
schema on write
no schema
external schema
schema on read
Q27. HDFS files are of what type?
read-write
read-only
write-only
append-only
Q28. A distributed cache file path can originate from what location?
hdfs or ftp
http
hdfs or http
hdfs
Q29. Which library should you use to perform ETL-type MapReduce jobs?
Hive
Pig
Impala
Mahout
Q30. What is the output of the Reducer?
a relational table
an update to the input file
a single, combined list
a set of <key, value> pairs
The map function processes a given key/value pair and emits some number of key/value pairs; the reduce function processes the values grouped by the same key and emits another set of key/value pairs as output.
Q31. To optimize a Mapper, what should you perform first?
Override the default Partitioner.
Skip bad records.
Break up Mappers that do more than one task into multiple Mappers.
Combine Mappers that do one task into large Mappers.
Q32. When implemented on a public cloud, with what does Hadoop processing interact?
files in object storage
graph data in graph databases
relational data in managed RDBMS systems
JSON data in NoSQL databases
Q33. In the Hadoop system, what administrative mode is used for maintenance?
data mode
safe mode
single-user mode
pseudo-distributed mode
Q34. In what format does RecordWriter write an output file?
<key, value> pairs
keys
values
<value, key> pairs
Q35. To what does the Mapper map input key/value pairs?
an average of keys for values
a sum of keys for values
a set of intermediate key/value pairs
a set of final key/value pairs
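A Mapper turns each input key/value pair into zero or more intermediate key/value pairs, which the framework then groups by key for the reducers. A minimal word-count mapper as a sketch; the type choices assume TextInputFormat:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Input: (byte offset, line of text). Output: intermediate (word, 1) pairs.
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }
}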
Q36. Which Hive query returns the first 1,000 values?
SELECT…WHERE value = 1000
SELECT … LIMIT 1000
SELECT TOP 1000 …
SELECT MAX 1000…
Q37. To implement high availability, how many instances of the master node should you configure?
one
zero
shared
two or more (https://data-flair.training/blogs/hadoop-high-availability-tutorial)
Q38. Hadoop 2.x and later implement which service as the resource coordinator?
Kubernetes
JobManager
JobTracker
YARN
Q39. In MapReduce, ___ have ___.
tasks; jobs
jobs; activities
jobs; tasks
activities; tasks
Q40. What type of software is Hadoop Common?
database
distributed computing framework
operating system
productivity tool
Q41. If no reduction is desired, you should set the number of ___ tasks to zero.
combiner
reduce
mapper
intermediate
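Setting the reduce-task count to zero makes the job map-only; mapper output then goes straight to the output format and the shuffle/sort phase is skipped. A one-line sketch inside a driver:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class MapOnlyJob {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "map-only pass");
        job.setNumReduceTasks(0);  // no reduction: map output is the final output
    }
}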
Q42. MapReduce applications use which of these classes to report their statistics?
mapper
reducer
combiner
counter
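User-defined counters are declared as an enum and incremented through the task context; the framework aggregates them across all tasks and reports them with the job. A sketch with an illustrative data-quality counter:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ValidatingMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
    public enum Quality { WELL_FORMED, MALFORMED }  // becomes a counter group

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        if (value.toString().contains(",")) {
            context.getCounter(Quality.WELL_FORMED).increment(1);
            context.write(value, NullWritable.get());
        } else {
            context.getCounter(Quality.MALFORMED).increment(1);
        }
    }
}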
Q43. ___ is the query language, and ___ is the storage for NoSQL on Hadoop.
HDFS; HQL
HQL; HBase
HDFS; SQL
SQL; HBase
Q44. MapReduce 1.0 ___ YARN.
does not include
is the same thing as
includes
replaces
Q45. Which type of Hadoop node executes file system namespace operations like opening, closing, and renaming files and directories?
ControllerNode
DataNode
MetadataNode
NameNode
Q46. HQL queries produce which job types?
Impala
MapReduce
Spark
Pig
Q47. Suppose you are trying to finish a Pig script that converts text in the input string to uppercase. What code is needed on line 2 below?
1 data = LOAD '/user/hue/pig/examples/data/midsummer.txt'...
2
as (text:CHAR[]); upper_case = FOREACH data GENERATE org.apache.pig.piggybank.evaluation.string.UPPER(TEXT);
as (text:CHARARRAY); upper_case = FOREACH data GENERATE org.apache.pig.piggybank.evaluation.string.UPPER(TEXT);
as (text:CHAR[]); upper_case = FOREACH data org.apache.pig.piggybank.evaluation.string.UPPER(TEXT);
as (text:CHARARRAY); upper_case = FOREACH data org.apache.pig.piggybank.evaluation.string.UPPER(TEXT);
Q48. In a MapReduce job, which phase runs after the Map phase completes?
Combiner
Reducer
Map2
Shuffle and Sort
Q49. Where would you configure the size of a block in a Hadoop environment?
dfs.block.size in hdfs-site.xml
orc.write.variable.length.blocks in hive-default.xml
mapreduce.job.ubertask.maxbytes in mapred-site.xml
hdfs.block.size in hdfs-site.xml
Q50. Hadoop systems are ___ RDBMS systems.
replacements for
not used with
substitutes for
additions for
Q51. Which object can be used to distribute jars or libraries for use in MapReduce tasks?
distributed cache
library manager
lookup store
registry
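The distributed cache can also ship jars and archives and place them on each task's classpath. A sketch; the paths are placeholders that must already exist in HDFS:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class ClasspathSetup {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "job with extra libs");
        // Distributed to every task node and appended to the task JVM classpath.
        job.addFileToClassPath(new Path("/libs/custom-serde.jar"));
        job.addArchiveToClassPath(new Path("/libs/dependencies.zip"));
    }
}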
Q52. To view the execution details of an Impala query plan, which function would you use?
explain
query action
detail
query plan
Q53. Which feature is used to roll back a corrupted HDFS instance to a previously known good point in time?