2024 Cultural Leisure Activity Analysis and Visualization

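Analyzes and visualizes cultural leisure activities. The survey has been collected weekly since November 2021; the files merged below contain survey weeks from 2023-05-03 to 2024-01-31 (see `EXAMIN_BEGIN_DE` in the `describe()` output).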

1. Environment Setup

Install Korean fonts

In [1]:

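# Install the Nanum Korean fonts, rebuild the font cache, and clear Matplotlib's cache
# (in Colab, restart the runtime afterwards so Matplotlib picks up the new fonts)
!sudo apt-get install -y fonts-nanum
!sudo fc-cache -fv
!rm ~/.cache/matplotlib -rf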

Additional setup when using Google Drive: mount the drive

In [2]:

from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).

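1-1. Merging the Data

Dataset: Characteristics of MZ-generation cultural leisure activities
Description
- Online (PC/mobile) consumer survey data, collected weekly since November 2021
- Responses from people in their 20s and 30s about their attitudes toward the leisure activities they are interested in and their leisure activities in general
- Includes the 1st and 2nd priorities for how leisure time is used, average daily leisure time, the share of leisure time spent in each main way, and the 1st- to 5th-ranked leisure activities of interest
- Includes respondent attributes (gender, age group, region of residence, income level, etc.)
Source: Consumer Insight Inc. (㈜컨슈머인사이트), regular planned survey 'Leisure, Culture and Sports Survey' (여가문화체육 조사)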

In [ ]:

import os

directory = "/content/drive/MyDrive/Project/Data_viz/2024_문화여가_활동_분석/data/"

In [ ]:

import pandas as pd

file_paths = os.listdir(directory)

# Read each file into a pandas DataFrame and concatenate them into one
df_list = [pd.read_csv(os.path.join(directory, file_path)) for file_path in file_paths]
combined_df = pd.concat(df_list, ignore_index=True)

combined_df.info()
combined_df.head()

In [ ]:

combined_df.to_csv("/content/drive/MyDrive/Project/Data_viz/2024_문화여가_활동_분석/2024_문화여가_활동_분석.csv", index=False)
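A slightly more defensive version of the merge step, as a sketch: it only reads `.csv` files (so a stray notebook or hidden file in the folder does not break `pd.read_csv`), reads them in sorted order so the row order is reproducible, and records which file each row came from. The function name `merge_csv_files` and the `SOURCE_FILE` column are hypothetical, not part of the original notebook.

```python
import os
import pandas as pd

def merge_csv_files(directory: str) -> pd.DataFrame:
    """Read every .csv file in `directory` (sorted by name) and stack them into one DataFrame."""
    frames = []
    for name in sorted(os.listdir(directory)):
        if not name.lower().endswith(".csv"):
            continue  # skip non-CSV files such as notes or hidden files
        frame = pd.read_csv(os.path.join(directory, name))
        frame["SOURCE_FILE"] = name  # hypothetical column: remember the originating file
        frames.append(frame)
    if not frames:
        raise FileNotFoundError(f"no .csv files found in {directory}")
    return pd.concat(frames, ignore_index=True)
```

Usage: `combined_df = merge_csv_files(directory)`. The extra column makes it easy to trace an odd row back to the weekly file it came from.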

2. Data Analysis

In [3]:

import pandas as pd

df = pd.read_csv("/content/drive/MyDrive/Project/Data_viz/2024_문화여가_활동_분석/2024_문화여가_활동_분석.csv")

In [5]:

df.describe()

Out[5]:

	RESPOND_ID	EXAMIN_BEGIN_DE	WORKDAY_DAY_AVRG_LSR_TIME_VALUE	WKEND_DAY_AVRG_LSR_TIME_VALUE	LSR_TIME_REST_RCRT_USE_RATE	LSR_TIME_HOBBY_USE_RATE	LSR_TIME_SELF_IMPT_USE_RATE	LSR_TIME_TWDPSN_RLTN_FLWSP_USE_RATE	LSR_TIME_ETC_USE_RATE
count	2.107300e+04	2.107300e+04	21073.000000	21073.000000	21073.000000	21073.000000	21073.000000	21073.000000	21073.000000
mean	5.162086e+07	2.023202e+07	3.104731	5.788497	42.454468	19.224363	12.887202	22.697575	2.736393
std	9.407137e+06	3.066950e+03	2.264285	3.471535	26.486978	20.176764	15.962274	19.555753	11.273799
min	4.449000e+03	2.023050e+07	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000
25%	5.333435e+07	2.023071e+07	2.000000	3.000000	20.000000	0.000000	0.000000	10.000000	0.000000
50%	5.334800e+07	2.023091e+07	3.000000	5.000000	40.000000	15.000000	10.000000	20.000000	0.000000
75%	5.336166e+07	2.023112e+07	4.000000	8.000000	60.000000	30.000000	20.000000	30.000000	0.000000
max	5.337753e+07	2.024013e+07	18.000000	18.000000	100.000000	100.000000	100.000000	100.000000	100.000000
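The five `LSR_TIME_*_USE_RATE` columns appear to be percentage shares of leisure time (rest/recreation, hobbies, self-improvement, relationships, other), and their means in the table above add up to about 100. A row-wise check, sketched here under that assumption, shows whether every individual response actually totals 100; the names `RATE_COLUMNS` and `check_rate_totals` are hypothetical.

```python
import pandas as pd

RATE_COLUMNS = [
    "LSR_TIME_REST_RCRT_USE_RATE",
    "LSR_TIME_HOBBY_USE_RATE",
    "LSR_TIME_SELF_IMPT_USE_RATE",
    "LSR_TIME_TWDPSN_RLTN_FLWSP_USE_RATE",
    "LSR_TIME_ETC_USE_RATE",
]

def check_rate_totals(data: pd.DataFrame, expected: float = 100.0, tol: float = 1e-6) -> pd.DataFrame:
    """Return the rows whose leisure-time shares do not add up to `expected` (within `tol`)."""
    totals = data[RATE_COLUMNS].sum(axis=1)
    mismatched = (totals - expected).abs() > tol
    # Attach the computed total so the offending rows are easy to inspect
    return data.loc[mismatched].assign(RATE_TOTAL=totals[mismatched])
```

Usage: `check_rate_totals(df)`; an empty result means every respondent's shares total 100.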

2-1. Checking Data Distributions

In [6]:

# Return the percentage distribution (in %) of the values in one column of `data`
def categorical_stats(data, column_name):
    return data[column_name].value_counts(normalize=True) * 100

In [7]:

# For each combination of group_columns (e.g. survey date and gender), return the most frequent 1st-ranked leisure activity of interest
def analyze_top_activities(group_columns):
    return df.groupby(group_columns)['INTRST_LSR_ACT_RN1_VALUE'].agg(
        lambda x: x.value_counts().index[0]).reset_index().rename(columns={'INTRST_LSR_ACT_RN1_VALUE': 'Top Leisure Activity'})

In [8]:

def analyze_overall_top_activities(group_column):
    return df.groupby(group_column)['INTRST_LSR_ACT_RN1_VALUE'].agg(
        lambda x: x.value_counts().index[0]).reset_index().rename(columns={'INTRST_LSR_ACT_RN1_VALUE': 'Top Leisure Activity'})
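Both helpers (identical apart from their argument name) take `value_counts().index[0]` as the top activity and read the global `df`. When two activities tie within a group, which one comes first is not determined by the data, so a single winner can hide a tie. One option, sketched below with the hypothetical names `top_with_ties` and `top_activity_by`, is to pass the DataFrame explicitly and report all tied activities joined by ` / `.

```python
import pandas as pd

def top_with_ties(values: pd.Series) -> str:
    """Return the most frequent value(s); tied values are joined with ' / ' in sorted order."""
    counts = values.value_counts()
    if counts.empty:
        return ""
    tied = sorted(counts[counts == counts.max()].index)
    return " / ".join(tied)

def top_activity_by(data: pd.DataFrame, group_columns,
                    column: str = "INTRST_LSR_ACT_RN1_VALUE") -> pd.DataFrame:
    """Group `data` and report the most frequent `column` value per group, keeping ties visible."""
    return (
        data.groupby(group_columns)[column]
        .agg(top_with_ties)
        .reset_index()
        .rename(columns={column: "Top Leisure Activity"})
    )
```

Usage: `top_activity_by(df, ['EXAMIN_BEGIN_DE', 'SEXDSTN_FLAG_CD'])` returns the same layout as `analyze_top_activities`, with ties shown instead of silently broken.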

2-1-1. Gender

In [9]:

# Calculate statistics for key columns
gender_distribution = categorical_stats(df, 'SEXDSTN_FLAG_CD')
gender_distribution

Out[9]:

M    50.643003
F    49.356997
Name: SEXDSTN_FLAG_CD, dtype: float64

In [10]:

top_activities_by_gender = analyze_top_activities(['EXAMIN_BEGIN_DE', 'SEXDSTN_FLAG_CD'])
top_activities_by_gender

Out[10]:

	EXAMIN_BEGIN_DE	SEXDSTN_FLAG_CD	Top Leisure Activity
0	20230503	F	산책-걷기
1	20230503	M	영상 컨텐츠 시청
2	20230510	F	산책-걷기
3	20230510	M	영상 컨텐츠 시청
4	20230517	F	산책-걷기
...	...	...	...
75	20240117	M	영상 컨텐츠 시청
76	20240124	F	영상 컨텐츠 시청
77	20240124	M	영상 컨텐츠 시청
78	20240131	F	영상 컨텐츠 시청
79	20240131	M	영상 컨텐츠 시청

80 rows × 3 columns

In [11]:

top_activities_by_gender[top_activities_by_gender['EXAMIN_BEGIN_DE'] > 20231231]

Out[11]:

	EXAMIN_BEGIN_DE	SEXDSTN_FLAG_CD	Top Leisure Activity
70	20240103	F	영상 컨텐츠 시청
71	20240103	M	게임
72	20240110	F	영상 컨텐츠 시청
73	20240110	M	게임
74	20240117	F	영상 컨텐츠 시청
75	20240117	M	영상 컨텐츠 시청
76	20240124	F	영상 컨텐츠 시청
77	20240124	M	영상 컨텐츠 시청
78	20240131	F	영상 컨텐츠 시청
79	20240131	M	영상 컨텐츠 시청
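`EXAMIN_BEGIN_DE` is stored as an integer in `YYYYMMDD` form, so the filter `> 20231231` works only because that encoding happens to sort in date order. Converting it to a real datetime makes date filters and axis labels less error-prone. A minimal sketch, assuming every value is a valid `YYYYMMDD` date; the `EXAMIN_DATE` column and both function names are hypothetical.

```python
import pandas as pd

def add_survey_date(data: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `data` with EXAMIN_BEGIN_DE (int YYYYMMDD) parsed into an EXAMIN_DATE column."""
    out = data.copy()
    out["EXAMIN_DATE"] = pd.to_datetime(out["EXAMIN_BEGIN_DE"].astype(str), format="%Y%m%d")
    return out

def survey_weeks_from(data: pd.DataFrame, start: str) -> pd.DataFrame:
    """Rows whose survey date is on or after `start` (e.g. '2024-01-01')."""
    dated = add_survey_date(data)
    return dated[dated["EXAMIN_DATE"] >= pd.Timestamp(start)]
```

Usage: `survey_weeks_from(top_activities_by_gender, "2024-01-01")` selects the same rows as the integer filter above.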

In [12]:

overall_top_activities_by_gender = analyze_overall_top_activities('SEXDSTN_FLAG_CD')
overall_top_activities_by_gender

Out[12]:

	SEXDSTN_FLAG_CD	Top Leisure Activity
0	F	영상 컨텐츠 시청
1	M	영상 컨텐츠 시청

2-1-2. Age Group

In [13]:

age_group_distribution = categorical_stats(df, 'AGRDE_FLAG_NM')
age_group_distribution

Out[13]:

50대    23.076923
40대    21.677027
60대    20.082570
30대    17.947136
20대    17.216343
Name: AGRDE_FLAG_NM, dtype: float64

In [14]:

top_activities_by_age_group = analyze_top_activities(['EXAMIN_BEGIN_DE', 'AGRDE_FLAG_NM'])
top_activities_by_age_group

Out[14]:

	EXAMIN_BEGIN_DE	AGRDE_FLAG_NM	Top Leisure Activity
0	20230503	20대	영상 컨텐츠 시청
1	20230503	30대	게임
2	20230503	40대	산책-걷기
3	20230503	50대	산책-걷기
4	20230503	60대	산책-걷기
...	...	...	...
195	20240131	20대	영상 컨텐츠 시청
196	20240131	30대	영상 컨텐츠 시청
197	20240131	40대	영상 컨텐츠 시청
198	20240131	50대	산책-걷기
199	20240131	60대	종교활동

200 rows × 3 columns

In [15]:

top_activities_by_age_group[top_activities_by_age_group['EXAMIN_BEGIN_DE'] > 20231231]

In [16]:

overall_top_activities_by_age_group = analyze_overall_top_activities('AGRDE_FLAG_NM')
overall_top_activities_by_age_group

Out[16]:

	AGRDE_FLAG_NM	Top Leisure Activity
0	20대	영상 컨텐츠 시청
1	30대	영상 컨텐츠 시청
2	40대	영상 컨텐츠 시청
3	50대	산책-걷기
4	60대	산책-걷기

2-1-3. By Region

In [17]:

area_distribution = categorical_stats(df, 'ANSWRR_OC_AREA_NM')
area_distribution

Out[17]:

경기도        23.926351
서울특별시      18.663693
부산광역시       7.269966
인천광역시       6.301903
경상남도        6.249703
경상북도        4.778627
대구광역시       4.626774
충청남도        4.038343
충청북도        3.729891
대전광역시       3.454658
광주광역시       3.198406
전라남도        3.112988
전라북도        3.079770
강원도         2.970626
울산광역시       2.467613
제주특별자치도     1.304987
세종특별자치시     0.825701
Name: ANSWRR_OC_AREA_NM, dtype: float64

In [18]:

top_activities_by_region = analyze_top_activities(['EXAMIN_BEGIN_DE', 'ANSWRR_OC_AREA_NM'])
top_activities_by_region

Out[18]:

	EXAMIN_BEGIN_DE	ANSWRR_OC_AREA_NM	Top Leisure Activity
0	20230503	강원도	종교활동
1	20230503	경기도	영상 컨텐츠 시청
2	20230503	경상남도	영상 컨텐츠 시청
3	20230503	경상북도	등산 직접 하기
4	20230503	광주광역시	걷기-속보-조깅 직접 하기
...	...	...	...
674	20240131	전라남도	걷기-속보-조깅 직접 하기
675	20240131	전라북도	영화관 관람
676	20240131	제주특별자치도	만화책 보기
677	20240131	충청남도	게임
678	20240131	충청북도	낮잠자기

679 rows × 3 columns

In [19]:

top_activities_by_region[top_activities_by_region['EXAMIN_BEGIN_DE'] > 20231231]

Out[19]:

	EXAMIN_BEGIN_DE	ANSWRR_OC_AREA_NM	Top Leisure Activity
594	20240103	강원도	국내 여행
595	20240103	경기도	게임
596	20240103	경상남도	영상 컨텐츠 시청
597	20240103	경상북도	영상 컨텐츠 시청
598	20240103	광주광역시	산책-걷기
...	...	...	...
674	20240131	전라남도	걷기-속보-조깅 직접 하기
675	20240131	전라북도	영화관 관람
676	20240131	제주특별자치도	만화책 보기
677	20240131	충청남도	게임
678	20240131	충청북도	낮잠자기

85 rows × 3 columns

In [20]:

top_activities_by_region[(top_activities_by_region['EXAMIN_BEGIN_DE'] > 20231231) & (top_activities_by_region['ANSWRR_OC_AREA_NM'].isin(["서울특별시", "강원도"]))]

Out[20]:

	EXAMIN_BEGIN_DE	ANSWRR_OC_AREA_NM	Top Leisure Activity
594	20240103	강원도	국내 여행
602	20240103	서울특별시	영상 컨텐츠 시청
611	20240110	강원도	게임
619	20240110	서울특별시	영상 컨텐츠 시청
628	20240117	강원도	걷기-속보-조깅 직접 하기
636	20240117	서울특별시	산책-걷기
645	20240124	강원도	국내 여행
653	20240124	서울특별시	걷기-속보-조깅 직접 하기
662	20240131	강원도	어학-기술-자격증 취득
670	20240131	서울특별시	영상 컨텐츠 시청

In [21]:

overall_top_activities_by_region = analyze_overall_top_activities('ANSWRR_OC_AREA_NM')
overall_top_activities_by_region

Out[21]:

	ANSWRR_OC_AREA_NM	Top Leisure Activity
0	강원도	산책-걷기
1	경기도	영상 컨텐츠 시청
2	경상남도	영상 컨텐츠 시청
3	경상북도	영상 컨텐츠 시청
4	광주광역시	영상 컨텐츠 시청
5	대구광역시	영상 컨텐츠 시청
6	대전광역시	영상 컨텐츠 시청
7	부산광역시	영상 컨텐츠 시청
8	서울특별시	영상 컨텐츠 시청
9	세종특별자치시	산책-걷기
10	울산광역시	영상 컨텐츠 시청
11	인천광역시	영상 컨텐츠 시청
12	전라남도	영상 컨텐츠 시청
13	전라북도	영상 컨텐츠 시청
14	제주특별자치도	걷기-속보-조깅 직접 하기
15	충청남도	영상 컨텐츠 시청
16	충청북도	영상 컨텐츠 시청

2-1-4. Income Level

In [22]:

income_level_distribution = categorical_stats(df, 'HSHLD_INCOME_DGREE_NM')
income_level_distribution

Out[22]:

300이상500만원 미만    28.572106
300만원 미만         23.442320
700만원 이상         21.022161
500이상700만원 미만    20.025625
무응답               6.937788
Name: HSHLD_INCOME_DGREE_NM, dtype: float64

In [23]:

top_activities_by_income_level = analyze_top_activities(['EXAMIN_BEGIN_DE', 'HSHLD_INCOME_DGREE_NM'])
top_activities_by_income_level

Out[23]:

	EXAMIN_BEGIN_DE	HSHLD_INCOME_DGREE_NM	Top Leisure Activity
0	20230503	300만원 미만	산책-걷기
1	20230503	300이상500만원 미만	영상 컨텐츠 시청
2	20230503	500이상700만원 미만	산책-걷기
3	20230503	700만원 이상	산책-걷기
4	20230503	무응답	영상 컨텐츠 시청
...	...	...	...
195	20240131	300만원 미만	영상 컨텐츠 시청
196	20240131	300이상500만원 미만	영상 컨텐츠 시청
197	20240131	500이상700만원 미만	영상 컨텐츠 시청
198	20240131	700만원 이상	영상 컨텐츠 시청
199	20240131	무응답	영상 컨텐츠 시청

200 rows × 3 columns

In [24]:

top_activities_by_income_level[top_activities_by_income_level['EXAMIN_BEGIN_DE'] > 20231231]

Out[24]:

	EXAMIN_BEGIN_DE	HSHLD_INCOME_DGREE_NM	Top Leisure Activity
175	20240103	300만원 미만	산책-걷기
176	20240103	300이상500만원 미만	영상 컨텐츠 시청
177	20240103	500이상700만원 미만	영상 컨텐츠 시청
178	20240103	700만원 이상	산책-걷기
179	20240103	무응답	영상 컨텐츠 시청
180	20240110	300만원 미만	영상 컨텐츠 시청
181	20240110	300이상500만원 미만	영상 컨텐츠 시청
182	20240110	500이상700만원 미만	영상 컨텐츠 시청
183	20240110	700만원 이상	영상 컨텐츠 시청
184	20240110	무응답	영상 컨텐츠 시청
185	20240117	300만원 미만	게임
186	20240117	300이상500만원 미만	영상 컨텐츠 시청
187	20240117	500이상700만원 미만	영상 컨텐츠 시청
188	20240117	700만원 이상	영상 컨텐츠 시청
189	20240117	무응답	영상 컨텐츠 시청
190	20240124	300만원 미만	산책-걷기
191	20240124	300이상500만원 미만	영상 컨텐츠 시청
192	20240124	500이상700만원 미만	영상 컨텐츠 시청
193	20240124	700만원 이상	산책-걷기
194	20240124	무응답	영상 컨텐츠 시청
195	20240131	300만원 미만	영상 컨텐츠 시청
196	20240131	300이상500만원 미만	영상 컨텐츠 시청
197	20240131	500이상700만원 미만	영상 컨텐츠 시청
198	20240131	700만원 이상	영상 컨텐츠 시청
199	20240131	무응답	영상 컨텐츠 시청

In [25]:

overall_top_activities_by_income_level = analyze_overall_top_activities('HSHLD_INCOME_DGREE_NM')
overall_top_activities_by_income_level

Out[25]:

	HSHLD_INCOME_DGREE_NM	Top Leisure Activity
0	300만원 미만	영상 컨텐츠 시청
1	300이상500만원 미만	영상 컨텐츠 시청
2	500이상700만원 미만	영상 컨텐츠 시청
3	700만원 이상	영상 컨텐츠 시청
4	무응답	영상 컨텐츠 시청

2-1-5. Leisure Activities of Interest

In [26]:

# Percentage distribution of the 1st- and 2nd-ranked leisure activities of interest
top_interest_activities_1 = categorical_stats(df, 'INTRST_LSR_ACT_RN1_VALUE')
top_interest_activities_2 = categorical_stats(df, 'INTRST_LSR_ACT_RN2_VALUE')

In [27]:

top_interest_activities_1.head()

Out[27]:

영상 컨텐츠 시청         9.310492
산책-걷기             6.918806
걷기-속보-조깅 직접 하기    5.632800
게임                5.424002
국내 여행             4.498648
Name: INTRST_LSR_ACT_RN1_VALUE, dtype: float64

In [28]:

top_interest_activities_2.head()

Out[28]:

없음                15.996773
영상 컨텐츠 시청          6.045651
산책-걷기              6.036160
걷기-속보-조깅 직접 하기     4.185451
쇼핑                 3.886490
Name: INTRST_LSR_ACT_RN2_VALUE, dtype: float64
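In the second-ranked column, 없음 ("none") is the most common answer (about 16%), so the percentages above include "no second choice" in the denominator. If the question is what respondents pick when they do name a second activity, that label can be dropped first. A sketch, assuming 없음 is the only placeholder value; the helper name `share_excluding` is hypothetical.

```python
import pandas as pd

def share_excluding(values: pd.Series, exclude=("없음",)) -> pd.Series:
    """Percentage distribution of `values`, ignoring placeholder answers such as 없음 (none)."""
    kept = values[~values.isin(exclude)]
    return kept.value_counts(normalize=True) * 100
```

Usage: `share_excluding(df['INTRST_LSR_ACT_RN2_VALUE']).head()`.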

3. Data Visualization

In [29]:

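import matplotlib.pyplot as plt

# Use the Nanum font installed above so Hangul labels render, and draw minus signs
# with the ASCII hyphen so they display correctly with this Korean font
plt.rc('font', family='NanumBarunGothic')
plt.rcParams['axes.unicode_minus'] = False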

In [31]:

import seaborn as sns

def visualize_pie_activities(data, title, ax):
    # Counting the frequency of each top leisure activity
    activity_counts = data['Top Leisure Activity'].value_counts()
    # Creating the pie plot
    activity_counts.plot(kind='pie', ax=ax, autopct='%1.1f%%', startangle=90, counterclock=False, legend=True)
    ax.set_ylabel('')  # Remove the y-label
    ax.set_title(title)

# Setting up the matplotlib figure for the grouped visualizations
fig, axs = plt.subplots(2, 2, figsize=(14, 14))
titles = [
    'Overall Top Leisure Activity by Gender',
    'Overall Top Leisure Activity by Age Group',
    'Overall Top Leisure Activity by Region',
    'Overall Top Leisure Activity by Income Level'
]

# Visualizing each set of top activities in pie charts
visualize_pie_activities(overall_top_activities_by_gender, titles[0], axs[0, 0])
visualize_pie_activities(overall_top_activities_by_age_group, titles[1], axs[0, 1])
visualize_pie_activities(overall_top_activities_by_region, titles[2], axs[1, 0])
visualize_pie_activities(overall_top_activities_by_income_level, titles[3], axs[1, 1])

plt.tight_layout()
plt.show()

[Figure: 2 × 2 grid of pie charts: Overall Top Leisure Activity by Gender, by Age Group, by Region, and by Income Level]

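4. Animated Data Visualization

4-1. Visualization for All Respondents

For each survey date, keep only the 10 most frequent activities and relabel the rest as 기타 (Other).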

In [ ]:

# For one group (e.g. one survey date), keep the 10 most frequent 1st-ranked activities and relabel the rest as '기타' (Other)
def keep_top_10_activities(group):
    group = group.copy()  # avoid modifying the slice that groupby passes in
    top_10_activities = group['INTRST_LSR_ACT_RN1_VALUE'].value_counts().nlargest(10).index
    group['INTRST_LSR_ACT_RN1_VALUE'] = group['INTRST_LSR_ACT_RN1_VALUE'].where(
        group['INTRST_LSR_ACT_RN1_VALUE'].isin(top_10_activities), '기타')
    return group

# group_keys=False keeps the original row index and avoids pandas' FutureWarning about group keys
df_transformed = df.groupby('EXAMIN_BEGIN_DE', group_keys=False).apply(keep_top_10_activities)

df_transformed.reset_index(drop=True, inplace=True)

In [ ]:

df_transformed[df_transformed['EXAMIN_BEGIN_DE'] == 20230614]['INTRST_LSR_ACT_RN1_VALUE'].value_counts()

Out[ ]:

기타                275
산책-걷기              50
영상 컨텐츠 시청          40
게임                 32
걷기-속보-조깅 직접 하기     28
국내 여행              22
영화관 관람             17
쇼핑                 17
친구-이성친구 만남         15
가족-친지 만남           14
낮잠자기               14
Name: INTRST_LSR_ACT_RN1_VALUE, dtype: int64
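The same per-date top-10 relabelling can also be done without `groupby(...).apply`: count each activity per group, rank the counts inside the group, and relabel everything outside the top n as 기타. A sketch; the function name `relabel_outside_top_n` is hypothetical, and ties at the n-th place may be resolved differently from `value_counts().nlargest(10)`.

```python
import pandas as pd

def relabel_outside_top_n(data: pd.DataFrame, group_columns, n: int = 10,
                          column: str = "INTRST_LSR_ACT_RN1_VALUE", other: str = "기타") -> pd.DataFrame:
    """Within each group, keep the n most frequent values of `column` and replace the rest with `other`."""
    if isinstance(group_columns, str):
        group_columns = [group_columns]
    keys = group_columns + [column]
    # Frequency of each value inside each group
    counts = data.groupby(keys).size().rename("count").reset_index()
    # Rank by frequency within the group; method="first" gives tied values distinct ranks
    counts["rank"] = counts.groupby(group_columns)["count"].rank(method="first", ascending=False)
    keep = counts.loc[counts["rank"] <= n, keys]
    # A left merge with indicator=True marks which rows belong to a kept (group, value) pair
    matched = data[keys].merge(keep, on=keys, how="left", indicator=True)["_merge"].eq("both")
    out = data.copy()
    out[column] = out[column].where(matched.to_numpy(), other)
    return out
```

Usage: `relabel_outside_top_n(df, 'EXAMIN_BEGIN_DE')` for section 4-1, or `relabel_outside_top_n(df, ['EXAMIN_BEGIN_DE', 'ANSWRR_OC_AREA_NM'])` for the per-region version in 4-4.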

Racing bar chart of the 10 most frequent INTRST_LSR_ACT_RN1_VALUE values for each survey date (EXAMIN_BEGIN_DE)

In [ ]:

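import plotly.express as px

# Assign a fixed color to every activity (including '기타') so each bar keeps its color across frames
unique_activities = df_transformed['INTRST_LSR_ACT_RN1_VALUE'].unique()

def generate_distinct_colors(n):
    # Concatenate several Plotly qualitative palettes to get enough distinct colors
    colors = px.colors.qualitative.Plotly + px.colors.qualitative.Alphabet + px.colors.qualitative.Vivid
    if n > len(colors):
        raise ValueError("Need more colors")
    return colors[:n]

color_map = {activity: color for activity, color in zip(unique_activities, generate_distinct_colors(len(unique_activities)))}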

In [ ]:

import plotly.graph_objects as go
import plotly.express as px

activity_counts_per_date = df_transformed[df_transformed['INTRST_LSR_ACT_RN1_VALUE'] != "기타"].groupby(['EXAMIN_BEGIN_DE', 'INTRST_LSR_ACT_RN1_VALUE']).size().reset_index(name='counts')

pivot_df = activity_counts_per_date.pivot(index='INTRST_LSR_ACT_RN1_VALUE', columns='EXAMIN_BEGIN_DE', values='counts').fillna(0)

dates = pivot_df.columns.tolist()

initial_data = pivot_df[dates[0]].nlargest(10)

fig = go.Figure()

frames = []
for date in dates:
    sorted_activities = pivot_df[date].nlargest(10).index
    frame_traces = []
    y_categories = []
    for activity in sorted_activities:
        count = pivot_df.loc[activity, date]
        if count > 0:
            frame_traces.append(go.Bar(
                x=[count],
                y=[activity],
                orientation='h',
                marker_color=color_map[activity],
                text=f"{activity}",
                textposition='inside',
                textfont=dict(color='white')
            ))
            y_categories.append(activity)

    frames.append(go.Frame(data=frame_traces, name=str(date), layout={'yaxis': {'categoryorder': 'array', 'categoryarray': y_categories}}))

if frames:
    fig.add_traces(frames[0].data)

fig.frames = frames

fig.update_layout(
    updatemenus=[{
        "buttons": [
            {
                "args": [None, {"frame": {"duration": 2000, "redraw": True}, "fromcurrent": True, "transition": {"duration": 500}}],
                "label": "Play",
                "method": "animate"
            },
            {
                "args": [[None], {"frame": {"duration": 0, "redraw": False}, "transition": {"duration": 0}}],
                "label": "Pause",
                "method": "animate"
            }
        ],
        "direction": "left",
        "pad": {"r": 10, "t": 87},
        "showactive": False,
        "type": "buttons",
        "x": 0.06,
        "xanchor": "right",
        "y": 0,
        "yanchor": "top"
    }],
    sliders=[{
        "steps": [{"args": [[f.name], {"frame": {"duration": 2000, "redraw": True}, "mode": "immediate", "transition": {"duration": 500}}], "label": str(f.name), "method": "animate"} for f in frames],
        "transition": {"duration": 500},
        "x": 0.1,
        "y": 0,
        "currentvalue": {"font": {"size": 20}, "prefix": "Date: ", "visible": True, "xanchor": "right"},
        "len": 0.9,
        "xanchor": "left",
        "yanchor": "top"
    }],
    width=1500,
    height=800,
    xaxis_range=[0, 65],
    yaxis={'categoryorder': 'total descending'},
    yaxis_autorange="reversed",
    font=dict(
        family="NanumGothic",
        size=20,
        color="RebeccaPurple"
        ),
    yaxis_showticklabels=False,
    showlegend=False
)

# Show figure
fig.show()
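In the chart above each bar is a separate `go.Bar` trace, and the number of traces can differ between frames. Plotly applies frame data to traces by position, so a bar can change color or jump during a transition when the ranking changes. One alternative, sketched here, is a single horizontal bar trace per frame whose `x`, `y` and `marker.color` arrays are already sorted. The function `racing_bar_figure_dict` is hypothetical; it builds a plain figure dictionary (which `go.Figure(...)` accepts) from a pivot table shaped like `pivot_df` (activities as rows, dates as columns).

```python
import pandas as pd

def racing_bar_figure_dict(pivot: pd.DataFrame, color_map: dict, top_n: int = 10,
                           frame_ms: int = 2000, transition_ms: int = 500) -> dict:
    """Plotly figure dict with one sorted horizontal bar trace per date column of `pivot`."""
    frames = []
    for date in pivot.columns:
        top = pivot[date].nlargest(top_n)
        top = top[top > 0].sort_values()  # ascending: the first category is drawn at the bottom
        frames.append({
            "name": str(date),
            "data": [{
                "type": "bar",
                "orientation": "h",
                "x": top.tolist(),
                "y": top.index.tolist(),
                "text": top.index.tolist(),
                "textposition": "inside",
                "marker": {"color": [color_map.get(a, "#B0B0B0") for a in top.index]},
            }],
        })
    # Shared x-axis range so bar lengths are comparable between frames
    x_max = float(pivot.to_numpy().max()) * 1.1 if pivot.size else 1.0
    step = {"frame": {"duration": frame_ms, "redraw": True},
            "transition": {"duration": transition_ms}, "mode": "immediate"}
    pause = {"frame": {"duration": 0, "redraw": False},
             "transition": {"duration": 0}, "mode": "immediate"}
    return {
        "data": frames[0]["data"] if frames else [],
        "frames": frames,
        "layout": {
            "xaxis": {"range": [0, x_max]},
            "yaxis": {"showticklabels": False},
            "showlegend": False,
            "updatemenus": [{"type": "buttons", "buttons": [
                {"label": "Play", "method": "animate", "args": [None, {**step, "fromcurrent": True}]},
                {"label": "Pause", "method": "animate", "args": [[None], pause]},
            ]}],
            "sliders": [{"steps": [
                {"label": f["name"], "method": "animate", "args": [[f["name"]], step]}
                for f in frames
            ]}],
        },
    }
```

Usage: `go.Figure(racing_bar_figure_dict(pivot_df, color_map)).show()`. Because the bars are sorted ascending, the y-axis should not also be reversed with `autorange="reversed"`.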

In [ ]:

fig.write_html("hobby_bar_plot.html")

4-2. Hobbies by Age Group

In [ ]:

df['AGRDE_FLAG_NM'].value_counts()

Out[ ]:

50대    4863
40대    4568
60대    4232
30대    3782
20대    3628
Name: AGRDE_FLAG_NM, dtype: int64

In [ ]:

import plotly.graph_objects as go
import plotly.express as px

def draw_graph_age(age='20대'):

    df_transformed = df[df['AGRDE_FLAG_NM'] == age].groupby('EXAMIN_BEGIN_DE', group_keys=False).apply(keep_top_10_activities)
    df_transformed.reset_index(drop=True, inplace=True)

    unique_activities = df_transformed['INTRST_LSR_ACT_RN1_VALUE'].unique()

    def generate_distinct_colors(n):
        colors = px.colors.qualitative.Plotly + px.colors.qualitative.Alphabet + px.colors.qualitative.Vivid
        if n > len(colors):
            raise ValueError("Need more colors")
        return colors[:n]

    color_map = {activity: color for activity, color in zip(unique_activities, generate_distinct_colors(len(unique_activities)))}

    activity_counts_per_date = df_transformed[df_transformed['INTRST_LSR_ACT_RN1_VALUE'] != "기타"].groupby(['EXAMIN_BEGIN_DE', 'INTRST_LSR_ACT_RN1_VALUE']).size().reset_index(name='counts')
    pivot_df = activity_counts_per_date.pivot(index='INTRST_LSR_ACT_RN1_VALUE', columns='EXAMIN_BEGIN_DE', values='counts').fillna(0)

    dates = pivot_df.columns.tolist()
    max_activity_counts = []

    for date in dates:
        max_count = pivot_df[date].max()
        max_activity_counts.append(max_count)

    max_max_activity_count = max(max_activity_counts)
    initial_data = pivot_df[dates[0]].nlargest(10)

    fig = go.Figure()

    frames = []
    for date in dates:
        sorted_activities = pivot_df[date].nlargest(10).index
        frame_traces = []
        y_categories = []
        for activity in sorted_activities:
            count = pivot_df.loc[activity, date]
            if count > 0:
                frame_traces.append(go.Bar(
                    x=[count],
                    y=[activity],
                    orientation='h',
                    marker_color=color_map[activity],
                    text=f"{activity}",
                    textposition='inside',
                    textfont=dict(color='white')
                ))
                y_categories.append(activity)

        frames.append(go.Frame(data=frame_traces, name=str(date), layout={'yaxis': {'categoryorder': 'array', 'categoryarray': y_categories}}))

    if frames:
        fig.add_traces(frames[0].data)

    fig.frames = frames

    fig.update_layout(
        updatemenus=[{
            "buttons": [
                {
                    "args": [None, {"frame": {"duration": 2000, "redraw": True}, "fromcurrent": True, "transition": {"duration": 500}}],
                    "label": "Play",
                    "method": "animate"
                },
                {
                    "args": [[None], {"frame": {"duration": 0, "redraw": False}, "transition": {"duration": 0}}],
                    "label": "Pause",
                    "method": "animate"
                }
            ],
            "direction": "left",
            "pad": {"r": 10, "t": 87},
            "showactive": False,
            "type": "buttons",
            "x": 0.06,
            "xanchor": "right",
            "y": 0,
            "yanchor": "top"
        }],
        sliders=[{
            "steps": [{"args": [[f.name], {"frame": {"duration": 2000, "redraw": True}, "mode": "immediate", "transition": {"duration": 500}}], "label": str(f.name), "method": "animate"} for f in frames],
            "transition": {"duration": 500},
            "x": 0.1,
            "y": 0,
            "currentvalue": {"font": {"size": 20}, "prefix": "Date: ", "visible": True, "xanchor": "right"},
            "len": 0.9,
            "xanchor": "left",
            "yanchor": "top"
        }],
        width=1500,
        height=800,
        xaxis_range=[0, max_max_activity_count+5],
        yaxis={'categoryorder': 'total descending'},  # This line is key for keeping the y-axis ordered
        yaxis_autorange="reversed",
        font=dict(
            family="NanumGothic",
            size=20,
            color="RebeccaPurple"
            ),
        yaxis_showticklabels=False,
        showlegend=False,
        title=age
    )

    fig.write_html(f'hobby_bar_{age}_plot.html')

    fig.show()
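Inside `draw_graph_age` (and `draw_graph_sex` below) the color map is rebuilt from each subset's `unique()` order, so the same activity can get a different color in the 20대 and 30대 charts, and more than the combined palette length raises `ValueError`. A sketch of a color map that depends only on the activity names; `build_color_map` is hypothetical, and using it would need a small change to `draw_graph_age` (for example an extra `color_map` parameter).

```python
from itertools import cycle

def build_color_map(activities, palette, other_label="기타", other_color="#B0B0B0"):
    """Map each activity name to a color in sorted order, so the mapping does not depend on row order.

    Colors are reused (cycled) when there are more activities than palette entries,
    and `other_label` always gets the neutral `other_color`.
    """
    names = sorted({a for a in activities if isinstance(a, str) and a != other_label})
    colors = cycle(palette)
    color_map = {name: next(colors) for name in names}
    color_map[other_label] = other_color
    return color_map
```

Usage: build it once from the full data, e.g. `build_color_map(df['INTRST_LSR_ACT_RN1_VALUE'].unique(), px.colors.qualitative.Plotly + px.colors.qualitative.Alphabet)`, and reuse it for every age and gender chart.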

In [ ]:

draw_graph_age('20대')

In [ ]:

draw_graph_age('30대')

In [ ]:

draw_graph_age('40대')

In [ ]:

draw_graph_age('50대')

In [ ]:

draw_graph_age('60대')

4-3. Hobbies by Gender

In [ ]:

import plotly.graph_objects as go
import plotly.express as px

def draw_graph_sex(sex='M'):

    df_transformed = df[df['SEXDSTN_FLAG_CD'] == sex].groupby('EXAMIN_BEGIN_DE', group_keys=False).apply(keep_top_10_activities)
    df_transformed.reset_index(drop=True, inplace=True)

    unique_activities = df_transformed['INTRST_LSR_ACT_RN1_VALUE'].unique()

    def generate_distinct_colors(n):
        colors = px.colors.qualitative.Plotly + px.colors.qualitative.Alphabet + px.colors.qualitative.Vivid
        if n > len(colors):
            raise ValueError("Need more colors")
        return colors[:n]

    color_map = {activity: color for activity, color in zip(unique_activities, generate_distinct_colors(len(unique_activities)))}

    activity_counts_per_date = df_transformed[df_transformed['INTRST_LSR_ACT_RN1_VALUE'] != "기타"].groupby(['EXAMIN_BEGIN_DE', 'INTRST_LSR_ACT_RN1_VALUE']).size().reset_index(name='counts')
    pivot_df = activity_counts_per_date.pivot(index='INTRST_LSR_ACT_RN1_VALUE', columns='EXAMIN_BEGIN_DE', values='counts').fillna(0)

    dates = pivot_df.columns.tolist()
    max_activity_counts = []

    for date in dates:
        max_count = pivot_df[date].max()
        max_activity_counts.append(max_count)

    max_max_activity_count = max(max_activity_counts)
    initial_data = pivot_df[dates[0]].nlargest(10)

    fig = go.Figure()

    frames = []
    for date in dates:
        sorted_activities = pivot_df[date].nlargest(10).index
        frame_traces = []
        y_categories = []
        for activity in sorted_activities:
            count = pivot_df.loc[activity, date]
            if count > 0:
                frame_traces.append(go.Bar(
                    x=[count],
                    y=[activity],
                    orientation='h',
                    marker_color=color_map[activity],
                    text=f"{activity}",
                    textposition='inside',
                    textfont=dict(color='white')
                ))
                y_categories.append(activity)

        frames.append(go.Frame(data=frame_traces, name=str(date), layout={'yaxis': {'categoryorder': 'array', 'categoryarray': y_categories}}))

    if frames:
        fig.add_traces(frames[0].data)

    fig.frames = frames

    fig.update_layout(
        updatemenus=[{
            "buttons": [
                {
                    "args": [None, {"frame": {"duration": 2000, "redraw": True}, "fromcurrent": True, "transition": {"duration": 500}}],
                    "label": "Play",
                    "method": "animate"
                },
                {
                    "args": [[None], {"frame": {"duration": 0, "redraw": False}, "transition": {"duration": 0}}],
                    "label": "Pause",
                    "method": "animate"
                }
            ],
            "direction": "left",
            "pad": {"r": 10, "t": 87},
            "showactive": False,
            "type": "buttons",
            "x": 0.06,
            "xanchor": "right",
            "y": 0,
            "yanchor": "top"
        }],
        sliders=[{
            "steps": [{"args": [[f.name], {"frame": {"duration": 2000, "redraw": True}, "mode": "immediate", "transition": {"duration": 500}}], "label": str(f.name), "method": "animate"} for f in frames],
            "transition": {"duration": 500},
            "x": 0.1,
            "y": 0,
            "currentvalue": {"font": {"size": 20}, "prefix": "Date: ", "visible": True, "xanchor": "right"},
            "len": 0.9,
            "xanchor": "left",
            "yanchor": "top"
        }],
        width=1500,
        height=800,
        xaxis_range=[0, max_max_activity_count+5],
        yaxis={'categoryorder': 'total descending'},  # This line is key for keeping the y-axis ordered
        yaxis_autorange="reversed",
        font=dict(
            family="NanumGothic",
            size=20,
            color="RebeccaPurple"
            ),
        yaxis_showticklabels=False,
        showlegend=False,
        title=sex
    )

    fig.write_html(f'hobby_bar_{sex}_plot.html')

    # Show figure
    fig.show()

In [ ]:

draw_graph_sex('M')

In [ ]:

draw_graph_sex('F')

4-4. Hobbies by Region

In [ ]:

# Same top-10 relabelling as keep_top_10_activities, applied within each (survey date, region) group
def keep_top_10_activities_by_date_and_area(group):
    return keep_top_10_activities(group)

# group_keys=False keeps the original row index and avoids pandas' FutureWarning about group keys
df_transformed = df.groupby(['EXAMIN_BEGIN_DE', 'ANSWRR_OC_AREA_NM'], group_keys=False).apply(keep_top_10_activities_by_date_and_area)

df_transformed.reset_index(drop=True, inplace=True)

In [ ]:

import geopandas as gpd

# Province-level (시도) boundaries for all 17 regions of Korea, despite the variable name
seoul_boundary_gdf = gpd.read_file('https://blog.kakaocdn.net/dn/dzgBUs/btrDtibaqaT/aYboMA5dCPJiEMBI15OSA1/SIDO_MAP_2022.json?attach=1&knm=tfile.json')

In [ ]:

seoul_boundary_gdf

Out[ ]:

	CTPRVN_CD	CTP_ENG_NM	CTP_KOR_NM	geometry
0	11	Seoul	서울특별시	POLYGON ((126.98400 37.63600, 126.94800 37.657...
1	26	Busan	부산광역시	POLYGON ((129.28800 35.32100, 129.26300 35.386...
2	27	Daegu	대구광역시	POLYGON ((128.47300 35.83300, 128.47000 35.806...
3	28	Incheon	인천광역시	MULTIPOLYGON (((126.34300 37.64400, 126.37500 ...
4	29	Gwangju	광주광역시	POLYGON ((126.76000 35.25900, 126.73600 35.251...
5	30	Daejeon	대전광역시	POLYGON ((127.40200 36.48600, 127.39800 36.490...
6	31	Ulsan	울산광역시	POLYGON ((129.34600 35.46500, 129.40800 35.493...
7	36	Sejong-si	세종특별자치시	POLYGON ((127.17800 36.59700, 127.19400 36.565...
8	41	Gyeonggi-do	경기도	POLYGON ((127.12700 37.46900, 127.07100 37.432...
9	42	Gangwon-do	강원도	POLYGON ((128.54900 38.30200, 128.51300 38.346...
10	43	Chungcheongbuk-do	충청북도	POLYGON ((127.49300 36.23800, 127.53300 36.251...
11	44	Chungcheongnam-do	충청남도	MULTIPOLYGON (((126.23600 36.86600, 126.20500 ...
12	45	Jeollabuk-do	전라북도	POLYGON ((126.62800 35.97800, 126.52300 35.968...
13	46	Jellanam-do	전라남도	MULTIPOLYGON (((126.59300 34.16200, 126.54000 ...
14	47	Gyeongsangbuk-do	경상북도	MULTIPOLYGON (((129.57900 36.05200, 129.54000 ...
15	48	Gyeongsangnam-do	경상남도	MULTIPOLYGON (((128.35100 34.84100, 128.40700 ...
16	50	Jeju-do	제주특별자치도	POLYGON ((126.76800 33.56400, 126.73000 33.560...

In [ ]:

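# Most frequent 1st-ranked activity in one (survey date, region) group;
# Series.mode() returns tied values in sorted order, so [0] picks the first of them
def get_top_activity(group):
    return group['INTRST_LSR_ACT_RN1_VALUE'].mode()[0]

# Note: counts the rows with this activity across the whole df_transformed
# (all dates and regions), not only within the (date, region) group
def get_top_activity_count(activity):
    return df_transformed[df_transformed['INTRST_LSR_ACT_RN1_VALUE'] == activity]['INTRST_LSR_ACT_RN1_VALUE'].count()

top_activities = df_transformed[df_transformed["INTRST_LSR_ACT_RN1_VALUE"] != "기타"].groupby(['EXAMIN_BEGIN_DE', 'ANSWRR_OC_AREA_NM']).apply(get_top_activity).reset_index()
top_activities.rename(columns={0: 'TOP_INTRST_LSR_ACT_RN1_VALUE'}, inplace=True)

top_activities["TOP_INTRST_COUNT"] = top_activities["TOP_INTRST_LSR_ACT_RN1_VALUE"].apply(get_top_activity_count)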

In [ ]:

top_activities

Out[ ]:

	EXAMIN_BEGIN_DE	ANSWRR_OC_AREA_NM	TOP_INTRST_LSR_ACT_RN1_VALUE	TOP_INTRST_COUNT
0	20230503	강원도	종교활동	384
1	20230503	경기도	영상 컨텐츠 시청	1915
2	20230503	경상남도	골프 직접 하기	271
3	20230503	경상북도	등산 직접 하기	267
4	20230503	광주광역시	걷기-속보-조깅 직접 하기	1075
...	...	...	...	...
674	20240131	전라남도	걷기-속보-조깅 직접 하기	1075
675	20240131	전라북도	영화관 관람	460
676	20240131	제주특별자치도	SNS -인터넷 커뮤니티 활동	320
677	20240131	충청남도	게임	1035
678	20240131	충청북도	낮잠자기	434

679 rows × 4 columns

In [ ]:

# Attach each province's boundary polygon by matching the Korean province name (CTP_KOR_NM)
df_gdf = pd.merge(top_activities, seoul_boundary_gdf, how='left', left_on='ANSWRR_OC_AREA_NM', right_on='CTP_KOR_NM')
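Because the merge uses `how='left'`, any region name in `top_activities` without an exact match in `CTP_KOR_NM` is kept with a missing `geometry`, and that row would silently fall off the map. In pandas this can be checked with `pd.merge(..., indicator=True)` and filtering on `_merge == 'left_only'`. The same idea as a minimal stdlib sketch; the helper name and the example names are hypothetical, not taken from the data:

```python
def unmatched_keys(left_keys, right_keys):
    """Return the sorted left-side keys that have no exact match on the right side.

    In a left join these rows are kept, but their right-side columns
    (here: geometry) would be missing.
    """
    return sorted(set(left_keys) - set(right_keys))


# Hypothetical example: survey region names vs. boundary-file names.
survey_regions = ["강원도", "경기도", "강원특별자치도"]
boundary_regions = ["강원도", "경기도", "서울특별시"]
print(unmatched_keys(survey_regions, boundary_regions))  # ['강원특별자치도']
```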

In [ ]:

# Keep only the columns needed for the map
df_gdf = df_gdf[["EXAMIN_BEGIN_DE", "ANSWRR_OC_AREA_NM", "TOP_INTRST_LSR_ACT_RN1_VALUE", "TOP_INTRST_COUNT", "geometry"]]

In [ ]:

df_gdf

Out[ ]:

	EXAMIN_BEGIN_DE	ANSWRR_OC_AREA_NM	TOP_INTRST_LSR_ACT_RN1_VALUE	TOP_INTRST_COUNT	geometry
0	20230503	강원도	종교활동	384	POLYGON ((128.54900 38.30200, 128.51300 38.346...
1	20230503	경기도	영상 컨텐츠 시청	1915	POLYGON ((127.12700 37.46900, 127.07100 37.432...
2	20230503	경상남도	골프 직접 하기	271	MULTIPOLYGON (((128.35100 34.84100, 128.40700 ...
3	20230503	경상북도	등산 직접 하기	267	MULTIPOLYGON (((129.57900 36.05200, 129.54000 ...
4	20230503	광주광역시	걷기-속보-조깅 직접 하기	1075	POLYGON ((126.76000 35.25900, 126.73600 35.251...
...	...	...	...	...	...
674	20240131	전라남도	걷기-속보-조깅 직접 하기	1075	MULTIPOLYGON (((126.59300 34.16200, 126.54000 ...
675	20240131	전라북도	영화관 관람	460	POLYGON ((126.62800 35.97800, 126.52300 35.968...
676	20240131	제주특별자치도	SNS -인터넷 커뮤니티 활동	320	POLYGON ((126.76800 33.56400, 126.73000 33.560...
677	20240131	충청남도	게임	1035	MULTIPOLYGON (((126.23600 36.86600, 126.20500 ...
678	20240131	충청북도	낮잠자기	434	POLYGON ((127.49300 36.23800, 127.53300 36.251...

679 rows × 5 columns

In [ ]:

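import geopandas as gpd
import plotly.express as px

gdf = gpd.GeoDataFrame(df_gdf, geometry='geometry')  # pd.merge with a DataFrame on the left returns a plain DataFrame

gdf['longitude'] = gdf.centroid.x  # centroid computed directly on lon/lat coordinates (see warnings below)
gdf['latitude'] = gdf.centroid.y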

<ipython-input-136-580457f7b645>:6: UserWarning:

Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.


<ipython-input-136-580457f7b645>:7: UserWarning:

Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

In [ ]:

gdf.head()

Out[ ]:

	EXAMIN_BEGIN_DE	ANSWRR_OC_AREA_NM	TOP_INTRST_LSR_ACT_RN1_VALUE	TOP_INTRST_COUNT	geometry	longitude	latitude
0	20230503	강원도	종교활동	384	POLYGON ((128.54900 38.30200, 128.51300 38.346...	128.300619	37.718909
1	20230503	경기도	영상 컨텐츠 시청	1915	POLYGON ((127.12700 37.46900, 127.07100 37.432...	127.182402	37.534171
2	20230503	경상남도	골프 직접 하기	271	MULTIPOLYGON (((128.35100 34.84100, 128.40700 ...	128.260329	35.330286
3	20230503	경상북도	등산 직접 하기	267	MULTIPOLYGON (((129.57900 36.05200, 129.54000 ...	128.748324	36.349367
4	20230503	광주광역시	걷기-속보-조깅 직접 하기	1075	POLYGON ((126.76000 35.25900, 126.73600 35.251...	126.836172	35.155369

In [ ]:

import plotly.express as px

# Convert the YYYYMMDD dates to strings to use as animation frame labels
gdf['EXAMIN_BEGIN_DE_STR'] = gdf['EXAMIN_BEGIN_DE'].astype(str)

# Visualize with Plotly Express
fig = px.scatter_geo(gdf,
                     lon='longitude',
                     lat='latitude',
                     size="TOP_INTRST_COUNT",  # bubble size follows TOP_INTRST_COUNT
                     color="TOP_INTRST_LSR_ACT_RN1_VALUE",  # one color per hobby
                     hover_name="ANSWRR_OC_AREA_NM",  # region name shown on hover
                     hover_data=["TOP_INTRST_LSR_ACT_RN1_VALUE", "TOP_INTRST_COUNT"],  # extra hover info
                     animation_frame="EXAMIN_BEGIN_DE_STR",  # date column used for the animation frames
                     title="날짜별 지역의 가장 인기 있는 취미와 관심도 시각화")  # "Most popular hobby and interest level by region, over time"

# Initial map center and zoom level
fig.update_geos(center={"lat": 36.5, "lon": 128}, projection_scale=20)  # centered on South Korea

fig.show()

* Depending on your browser and device, the animation may not display properly.
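`EXAMIN_BEGIN_DE` stores survey start dates as `YYYYMMDD` integers (e.g. 20230503), so the slider labels read like `20230503`. If more readable frame labels are wanted, the values could be formatted before plotting. A minimal stdlib sketch; the helper name is hypothetical:

```python
from datetime import datetime


def format_survey_date(value):
    """Convert a YYYYMMDD integer or string (e.g. 20230503) to 'YYYY-MM-DD'."""
    return datetime.strptime(str(value), "%Y%m%d").strftime("%Y-%m-%d")


print(format_survey_date(20230503))  # 2023-05-03
```

It could be applied as `gdf['EXAMIN_BEGIN_DE'].apply(format_survey_date)` and the result passed as `animation_frame`; zero-padded `YYYY-MM-DD` strings also sort in chronological order.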

[Visualization][Animation] Is the Future of YouTubers Bright? - 2024 Culture and Leisure Activity Analysis
