Okay, so the first thing I should note is that you should avoid UDFs (User Defined Functions) like the plague unless you absolutely have to. I have spoken about why that is here. Now that I've warned you, let's talk about what a UDF is. A UDF is a User Defined Function: a function coded entirely by the user, rather than one of the out-of-the-box functions available in PySpark. It gives you ultimate flexibility in what you do with the data, but that flexibility comes at the cost of performance. Sometimes, though, there just isn't a good way to achieve your desired outcome without breaking out a good old UDF, so the performance hit is a necessary evil.

For this example, I have defined a function called do_they_live_here. I define it just like I would any Python function, with the function inputs contained within the parentheses; in this case, the inputs are city and hometown. Within the function, I have a simple IF statement that determines whether they live in Tokyo, are visiting Tokyo, or are somewhere else in the world.
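To make the idea concrete, here is a minimal sketch of what a function like do_they_live_here might look like and how it could be wrapped as a UDF. The exact conditions, the returned labels, the helper name add_status_column, and the column names city, hometown and status are all assumptions made for illustration, not the original code from this post.

```python
def do_they_live_here(city, hometown):
    """Classify a person from their current city and their hometown.

    The branching rules and labels here are assumed for illustration.
    """
    if city == "Tokyo" and hometown == "Tokyo":
        return "Lives in Tokyo"
    elif city == "Tokyo":
        return "Visiting Tokyo"
    else:
        return "Somewhere else"


def add_status_column(df):
    """Hypothetical helper: apply the function to a DataFrame as a UDF.

    Assumes df has string columns named "city" and "hometown".
    """
    # Imported here so the plain Python function above can be used
    # and tested without a Spark installation.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    # Wrap the Python function as a UDF that returns a string.
    live_here_udf = udf(do_they_live_here, StringType())
    return df.withColumn("status", live_here_udf(col("city"), col("hometown")))
```

Because the logic lives in an ordinary Python function, you can check it on plain values before handing it to Spark, which is handy given that a UDF runs row by row and is harder to debug once it is inside a job.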