Ways to Look Up a Table in Spark Scala

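import org.apache.spark.sql.functions.udf
import spark.implicits._
// Load the employee table and the small department (lookup) table
val empDF = spark.sql("select empID,empName,deptID from employee")
val deptDF = spark.sql("select deptID,deptName from department")
// Collect the department table to the driver as a deptID -> deptName map
val deptRdd = deptDF.rdd
val lookup_DeptMap = deptRdd.map(x => (x(0).toString, x(1).toString)).collectAsMap.toMap
// UDF that closes over the map; a deptID missing from the map yields a null DeptName
def lookup(lookupMap: Map[String,String]) = udf((deptID: String) => lookupMap.get(deptID))
val empDeptDF = empDF.withColumn("DeptName", lookup(lookup_DeptMap)($"deptID"))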
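// Size threshold (in bytes) under which Spark broadcasts a table automatically; the default is 10485760 (10 MB)
spark.conf.get("spark.sql.autoBroadcastJoinThreshold")
// Explicit broadcast join through the DataFrame API
import org.apache.spark.sql.functions.broadcast
val empDeptDF = empDF.join(broadcast(deptDF),"deptID")
// The same broadcast hint in SQL; the hint refers to the department table by its alias d
val empDeptDF = spark.sql("select /*+ MAPJOIN(d) */ e.empID,e.empName,e.deptID,d.deptName from employee e join department d on e.deptID = d.deptID")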
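// Plain join; Spark chooses the join strategy itself
val empDeptDF = empDF.join(deptDF,"deptID")
// Setting the threshold to -1 disables automatic broadcast joins
spark.sql("SET spark.sql.autoBroadcastJoinThreshold = -1")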
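
One difference between the approaches above is easy to miss: the map-based UDF lookup keeps every employee row and leaves DeptName null when a deptID is not in the map, while `join(deptDF, "deptID")` is an inner join by default and drops such rows. The following is a minimal plain-Scala sketch of that behaviour, not Spark code. The object name `LookupSemantics` and the sample rows are made up for illustration; only the deptID -> deptName map mirrors `lookup_DeptMap`.

```scala
// Plain-Scala sketch (no Spark needed); all sample data below is hypothetical.
object LookupSemantics {
  // Hypothetical employee rows: (empID, empName, deptID)
  val employees: Seq[(String, String, String)] = Seq(
    ("1", "Asha", "10"),
    ("2", "Ravi", "20"),
    ("3", "Mei", "99") // deptID 99 has no matching department
  )

  // Hypothetical department rows: (deptID, deptName)
  val departments: Seq[(String, String)] =
    Seq(("10", "Sales"), ("20", "Engineering"))

  // Same shape as lookup_DeptMap: deptID -> deptName
  val lookupDeptMap: Map[String, String] = departments.toMap

  // Map lookup: every employee is kept; a missing key yields None,
  // which a Spark UDF returning Option[String] turns into a null value.
  val viaLookup: Seq[(String, Option[String])] =
    employees.map { case (_, name, deptID) => (name, lookupDeptMap.get(deptID)) }

  // Inner join: employees without a matching department are dropped.
  val viaInnerJoin: Seq[(String, String)] =
    for {
      (_, name, deptID) <- employees
      (dID, deptName) <- departments
      if dID == deptID
    } yield (name, deptName)
}
```

If the join should also keep unmatched employees, the join type can be passed explicitly, for example `empDF.join(broadcast(deptDF), Seq("deptID"), "left")`.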

sasirekha

I’m a Data Engineer. Love sharing ideas and thoughts :)