
Spark (10)

Pyspark (on Jupyter notebook) In [1]: !pip list | grep pyspark → pyspark 3.3.0 In [2]: from pyspark.sql import SparkSession 1. Uploading a file to HDFS: hadoop fs -ls · hadoop fs -mkdir /user/philhoonoh · hadoop fs -put /Users/philhoonoh/Desktop/Hadoop/data.csv /user/philhoonoh/ 2. Loading data.csv as a DataFrame with a SparkSession: In [3]: spark = .. 2022. 8. 24.
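The preview above cuts off right where the SparkSession is built. A minimal PySpark sketch of the two steps it lists, assuming the same HDFS path and file name (data.csv under /user/philhoonoh/) and default builder options, since the post's actual code is not visible in the excerpt:

```python
# Minimal sketch: data.csv is assumed to already be in HDFS via `hadoop fs -put`;
# the app name and reader options below are illustrative, not taken from the post.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("load-csv-from-hdfs")  # hypothetical app name
    .getOrCreate()
)

# header/inferSchema are common CSV options; the post may configure the reader differently.
df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("hdfs:///user/philhoonoh/data.csv")
)

df.printSchema()
df.show(5)
```

The hdfs:/// URI resolves against the cluster's fs.defaultFS, so the same relative layout under /user/philhoonoh/ works whether the NameNode is local or remote.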
Spark Shell Basic Command Running the Spark Shell: cd $SPARK_HOME · ./bin/spark-shell · SparkContext (org.apache.spark.SparkContext): scala> sc · SparkSession (org.apache.spark.sql.SparkSession): scala> spark · master node (String = local[*]): scala> sc.master · Spark UI (Option[String] = Some(http://172.16.100.49:4040)): scala> sc.uiWebUrl · clear the Spark Shell screen: scala> (Ctrl+L) · exit the Spark Shell: scala> :quit · checking the Spark Shell (SparkSubmit) process: $ jps 406 96633 Jps 9502.. 2022. 8. 23.
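The commands above come from the Scala spark-shell. For reference, a rough PySpark counterpart of the same checks (master URL, Web UI address, version), assuming a local session; the host and port in the output will differ per machine:

```python
# Rough PySpark equivalents of the spark-shell checks shown above.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("shell-checks").getOrCreate()
sc = spark.sparkContext

print(sc.master)     # e.g. local[*]
print(sc.uiWebUrl)   # e.g. http://<host>:4040
print(spark.version)

spark.stop()  # the shell equivalent is :quit
```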
Hadoop HDFS CLI Basic Command Checking the Hadoop version: hadoop version · Hadoop dfs: cd $HADOOP_HOME · sbin/start-dfs.sh · sbin/stop-dfs.sh · Hadoop yarn: cd $HADOOP_HOME · sbin/start-yarn.sh · sbin/stop-yarn.sh · hadoop hdfs CLI (2 prefixes): hdfs dfs, hadoop fs · hadoop hdfs CLI + Linux: hadoop fs -mkdir /user/philhoonoh/input · hadoop fs -ls /user/philhoonoh · put: Local -> HDFS: hadoop fs -help put · $ ha.. 2022. 8. 23.
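A small sketch that drives the same HDFS CLI operations from Python via subprocess, in case the steps need to be scripted; it assumes the hadoop binary is on PATH and HDFS is running, and the local file name data.csv is illustrative:

```python
# Drive `hadoop fs` subcommands from Python; paths/file names are illustrative.
import subprocess

def hdfs(*args: str) -> None:
    """Run `hadoop fs <args>` and raise if the command fails."""
    subprocess.run(["hadoop", "fs", *args], check=True)

hdfs("-mkdir", "-p", "/user/philhoonoh/input")
hdfs("-put", "-f", "data.csv", "/user/philhoonoh/input/")  # Local -> HDFS (overwrite)
hdfs("-ls", "/user/philhoonoh/input")
```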
Apache Spark 6. Spark Monitoring/Runtime/Deployment Ref. 아파치 스파크 입문 (Introduction to Apache Spark) · Apache Hadoop 2022. 8. 23.
Apache Spark 5. Structured APIs (Dataframe, Spark SQL, Dataset) Ref. 아파치 스파크 입문 (Introduction to Apache Spark) · Apache Hadoop 2022. 8. 23.
Troubleshooting (Hadoop: Setting up a Single Node Cluster) Errors when running Hadoop locally as a single-node cluster: Misplaced & · Starting secondary namenodes XXX.XXX.XXX.XXX $ XXX.XXX.XXX.XXX: ssh: Could not resolve hostname XXX.XXX... 2022. 8. 22.
Apache Spark 4. What is RDD & DAG? Ref. 아파치 스파크 입문 (Introduction to Apache Spark) · Apache Hadoop 2022. 8. 22.
Apache Spark 3. Apache Spark Streaming Ref. 아파치 스파크 입문 (Introduction to Apache Spark) · Apache Hadoop 2022. 8. 22.
Apache Spark 2. What is Apache Spark? Ref. 아파치 스파크 입문 (Introduction to Apache Spark) · Apache Hadoop 2022. 8. 22.