Sunday, November 15, 2020

Sven interesting shell command -

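hadoop fs -text /liveperson/data/remote/DC=VA/storage_Shared/data_Platform/dwh_le/event_type=SatisfactionEvent/year=2020/month=11/day=11/* | grep "\"conversationHandlerAccountId\":{\"string\"" | sed -E 's/.*\"accountId\":\{\"string\":\"([0-9]+)\"\}.*/\1/g' | sort --human-numeric-sort | uniq | wc -l

hadoop fs -text /liveperson/data/remote/DC=VA/storage_Shared/data_Platform/dwh_le/event_type=SatisfactionEvent/year=2020/month=11/day=11/* | grep -v "\"conversationHandlerAccountId\":{\"string\"" | sed -E 's/.*\"accountId\":\{\"string\":\"([0-9]+)\"\}.*/\1/g' | sort --human-numeric-sort | uniq | wc -l

The two commands differ only in the grep: the first keeps the events that have a "conversationHandlerAccountId":{"string" value, and the second (grep -v) keeps the events that do not.

Pay attention to this:

sed -E 's/.*\"accountId\":\{\"string\":\"([0-9]+)\"\}.*/\1/g'

It takes the accountId value that comes right after "accountId":{"string":" in a record like this one:

{"header":{"schemaRevision":"5.0.0.1266","eventTimeStamp":1605078051486,"eventUniqueId":{"string":"7AkrXepoQD2zRajMk3LDnA"},"globalSessionId":null,"globalUserId":null,"accountId":{"string":"60270350"},"encrypted":"NONE","platform":"DEFAULT","component

So, if I understand correctly, it searches for the regex "accountId":{"string":"[0-9]+ and fetches the number matched by [0-9]+. The parentheses around [0-9]+ form a capture group, and the \1 token in the sed syntax refers to that group. Because the pattern starts and ends with .*, it matches the whole line, so the whole line is replaced by the number alone. A line that does not match the pattern is printed unchanged.

Pay attention to this:

sort --human-numeric-sort

Pay attention to uniq as well. uniq only collapses duplicate lines that are adjacent, which is why the sort has to come first. After that, wc -l counts the distinct account IDs.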
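
The sed extraction above can be tried locally without HDFS. This is only a minimal sketch: the three sample records and the extract_account_ids helper are made up for illustration (they are not real data or part of the original command), and it uses sed -n with the p flag, a small variation on the original, so that lines without an accountId are dropped instead of passed through.

```shell
#!/bin/sh
# Hypothetical sample records, shaped like the header fragment above (not real data).
sample='{"header":{"accountId":{"string":"60270350"},"encrypted":"NONE"}}
{"header":{"accountId":{"string":"60270350"},"encrypted":"NONE"}}
{"header":{"accountId":{"string":"12345"},"encrypted":"NONE"}}'

# Same pattern as in the post. -n suppresses the default output and the
# trailing p prints a line only when the substitution actually matched.
extract_account_ids() {
  sed -nE 's/.*"accountId":\{"string":"([0-9]+)"\}.*/\1/p'
}

# sort groups identical IDs together, uniq collapses the adjacent duplicates,
# wc -l counts what is left; tr strips the padding some wc versions add.
distinct_count=$(printf '%s\n' "$sample" | extract_account_ids | sort -n | uniq | wc -l | tr -d ' ')
echo "$distinct_count"
```

With the sample above, two of the three records share an account ID, so the script prints 2.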