Spark10 Pyspark (on Jupyter notebook)

In [1]: !pip list | grep pyspark
pyspark 3.3.0

In [2]: from pyspark.sql import SparkSession

1. Uploading a file to HDFS
$ hadoop fs -ls
$ hadoop fs -mkdir /user/philhoonoh
$ hadoop fs -put /Users/philhoonoh/Desktop/Hadoop/data.csv /user/philhoonoh/

2. Loading data.csv as a DataFrame with the SparkSession (see the sketch after this listing)
In [3]: spark = ..

2022. 8. 24.

Spark Shell Basic Command

Start the Spark shell:
$ cd $SPARK_HOME
$ ./bin/spark-shell

Check the SparkContext (org.apache.spark.SparkContext):
scala> sc

Check the SparkSession (org.apache.spark.sql.SparkSession):
scala> spark

Check the master node:
scala> sc.master
String = local[*]

Check the Spark UI address:
scala> sc.uiWebUrl
Option[String] = Some(http://172.16.100.49:4040)

Clear the Spark shell: (Ctrl+L)

Exit the Spark shell:
scala> :quit

Check the process (SparkSubmit) from outside the Spark shell:
$ jps
406 96633 Jps 9502..

2022. 8. 23.

Hadoop HDFS CLI Basic Command

Check the Hadoop version:
$ hadoop version

Start/stop Hadoop DFS:
$ cd $HADOOP_HOME
$ sbin/start-dfs.sh
$ sbin/stop-dfs.sh

Start/stop Hadoop YARN:
$ cd $HADOOP_HOME
$ sbin/start-yarn.sh
$ sbin/stop-yarn.sh

Using the Hadoop HDFS CLI (the two prefixes below are equivalent and are combined with Linux-style commands):
hdfs dfs <command>
hadoop fs <command>

$ hadoop fs -mkdir /user/philhoonoh/input
$ hadoop fs -ls /user/philhoonoh

put: Local -> HDFS
$ hadoop fs -help put
$ ha..

2022. 8. 23.

Apache Spark 6. Spark Monitoring/Runtime/Deployment
Ref. 아파치 스파크 입문 (Introduction to Apache Spark), Apache Hadoop

2022. 8. 23.

Apache Spark 5. Structured APIs (Dataframe, Spark SQL, Dataset)
Ref. 아파치 스파크 입문 (Introduction to Apache Spark), Apache Hadoop

2022. 8. 23.

Troubleshooting (Hadoop: Setting up a Single Node Cluster)
Errors occurred when running Hadoop locally as a single-node cluster:
Misplaced &
Starting secondary namenodes XXX.XXX.XXX.XXX
XXX.XXX.XXX.XXX: ssh: Could not resolve hostname XXX.XXX...

2022. 8. 22.

Apache Spark 4. What is RDD & DAG?
Ref. 아파치 스파크 입문 (Introduction to Apache Spark), Apache Hadoop

2022. 8. 22.

Apache Spark 3. Apache Spark Streaming
Ref. 아파치 스파크 입문 (Introduction to Apache Spark), Apache Hadoop

2022. 8. 22.

Apache Spark 2. What is Apache Spark?
Ref. 아파치 스파크 입문 (Introduction to Apache Spark), Apache Hadoop

2022. 8. 22.
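The first preview above ("Spark10 Pyspark (on Jupyter notebook)") cuts off at In [3]: spark = .., so the truncated cell stays as-is. Below is a minimal sketch of how that step could look, assuming a local SparkSession whose default filesystem points at the HDFS instance from step 1; the appName, master setting, and the header/inferSchema read options are illustrative assumptions, not the post's actual code.

    from pyspark.sql import SparkSession

    # Assumed builder options; the original notebook cell is truncated in the preview.
    spark = (
        SparkSession.builder
        .appName("pyspark-jupyter")   # hypothetical application name
        .master("local[*]")           # assumption, mirrors the spark-shell default shown above
        .getOrCreate()
    )

    # Read the CSV uploaded to HDFS in step 1.
    # header/inferSchema are assumptions about the file, not taken from the post.
    df = (
        spark.read
        .option("header", "true")
        .option("inferSchema", "true")
        .csv("/user/philhoonoh/data.csv")
    )

    df.printSchema()
    df.show(5)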